Connectionist Models: Implications in Second Language Acquisition

Farid Ghaemi
Islamic Azad University, Karaj Branch, Department of Humanities, Iran
ghaemi@kiau.ac.ir

Laleh Fakhraee Faruji
Islamic Azad University, Science and Research Branch, Department of Literature and Foreign Languages, Tehran, Iran
fakhraeelaleh@yahoo.com

Abstract
In language acquisition, ‘Emergentists’ claim that simple learning mechanisms, of the kind attested elsewhere in cognition, are sufficient to bring about the emergence of complex language representations (Gregg, 2003). The connectionist model is one of several models proposed by emergentists. This paper attempts to clarify the basic assumptions of this model, its advantages, and the criticisms leveled at it.

Keywords: Associationism, Connectionism, Emergentism, Information Processing (IP)

1. Introduction
‘Emergentism’ is the name that has recently been given to a general approach to cognition that stresses the interaction between organism and environment and that denies the existence of predetermined, domain-specific faculties or capacities (Gregg, 2003). Emergentism offers itself as an alternative to modular, ‘special nativist’ theories of the mind, such as theories of Universal Grammar (UG). One of the models proposed by emergentists is called ‘connectionism’. Richards & Schmidt (2002, p. 108) defined connectionism as a theory in cognitive science that assumes that the individual components of human cognition are highly interactive and that knowledge of events, concepts and language is represented diffusely in the cognitive system. According to them, this theory has been applied to models of speech processing, lexical organization, and first and second language learning. Connectionism provides mathematical models and computer simulations that try to capture the essence of both information processing and thought processes.
The basic assumptions of the theory are as follows (Richards & Schmidt, 2002, p. 108):
1. Information processing takes place through the interactions of a large number of simple units, organized into networks and operating in parallel.
2. Learning takes place through the strengthening and weakening of the interconnections in a particular network in response to examples encountered in the input.
3. The result of learning is often a network of simple units that acts as though it “knows” abstract rules, although the rules themselves exist only in the form of association strengths distributed across the entire network.

Connectionist approaches to learning have much in common with Information Processing (IP) perspectives (Saville-Troike, 2006, p. 80), but they focus on the increasing strength of associations between stimuli and responses rather than on the inferred abstraction of “rules” or on restructuring. Indeed, from a connectionist perspective, learning essentially is change in the strength of these connections. Some version of this idea has been present in psychology at least since the 1940s and 1950s, but connectionism has received widespread attention as a model for first and second language acquisition only since the 1980s.

Ahlsén (2006, p. 10) describes connectionism as a form of associationism, in which higher functions are assumed to depend on the connections between different centers in the cortex. Linguistic ability is seen as the relationship between images and words, and on this view aphasia results from broken connections between the centers that are needed for linguistic function. Connectionist models challenge the traditional separation of item memory and abstract, symbolic rule systems, and provide an empirically testable alternative to nativist approaches to learning (Williams, 2005).

BRAIN. Broad Research in Artificial Intelligence and Neuroscience Volume 2, Issue 3, September 2011, ISSN 2067-3957 (online), ISSN 2068 - 0473 (print)
Gass & Selinker (2008, p. 494) summarize connectionism with a premise and a metaphor:
• Premise: Emphasizes the influence of the environment and exposure to language in the gradual buildup of language.
• Metaphor: The builder needs the assistance of others to help construct the house; he cannot do it entirely on his own.

An early connectionist model was a network trained by Rumelhart and McClelland (1986, as cited in Jordan, 2004, p. 243) to predict the past tense of English verbs. The network showed the same tendency to overgeneralize as children do, but there is still no agreement about the ability of neural networks to learn grammar.

2. Elements of connectionist models
Connectionist models consist of a large number of simple processing units with multiple connections linking them. Activation flows along the connections, much as electrical impulses transmit information through neurons in the brain. The ease with which activation spreads from one unit to another is determined by the strength of the connections along which it travels: the stronger the connection to a unit, the more readily that unit becomes activated. A connection’s strength depends upon how frequently it is used. Thus, over time, connections to a frequent word become strong, ensuring that the word is activated more rapidly than other, less common ones (Field, 2004, p. 73).

According to Field (2004, p. 73), a strength of the connectionist approach is that, besides modeling processes such as word recognition, it can also model learning. In a computer simulation, each connection receives a number, or weight, to indicate its relative strength. At the outset, these weights can be set at 0; but, as connections are used, their weights are adjusted by means of a complex formula. If a connection is not used, its weight declines to a negative value, indicating an inhibitory relationship.
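The frequency-weight relationship described above can be made concrete with a small sketch. The update rule is an assumption made purely for illustration (a fixed increment each time a connection is used and a small decrement each time it is not), not the “complex formula” Field refers to, and the word list and counts are invented. The sketch only reproduces the qualitative pattern in the text: weights start at 0, frequently used connections grow strong, and an unused connection drifts below zero.

```python
import math

# Hypothetical toy input: the word forms and their counts are invented
# for illustration; they are not data from Field (2004).
EXPOSURES = {"go": 50, "house": 10, "builder": 2, "aphasia": 0}

LEARNING_RATE = 0.1  # assumed increment each time a connection is used
DECAY = 0.01         # assumed decrement each time it is not used


def train(exposures):
    """Give each word one connection weight, starting at 0.

    Every occurrence of a word strengthens its connection; every
    occurrence of some other word weakens it slightly. Because this
    assumed rule is additive, applying it once per exposure gives the
    same result as applying it count times in a single step.
    """
    weights = {word: 0.0 for word in exposures}
    total = sum(exposures.values())
    for word, count in exposures.items():
        weights[word] += LEARNING_RATE * count - DECAY * (total - count)
    return weights


def activation(weight):
    """Logistic squashing: the stronger the connection, the higher the activation."""
    return 1.0 / (1.0 + math.exp(-weight))


weights = train(EXPOSURES)
for word, w in sorted(weights.items(), key=lambda item: -item[1]):
    print(f"{word:8s} weight={w:+.2f} activation={activation(w):.2f}")
```

With these invented counts, the most frequent form ends with a large positive weight and the highest activation, and the never-used connection ends with the most negative (inhibitory) weight. With the assumed decay rate, even the rarely used form ends slightly below zero; a different balance of the two rates would shift that boundary.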
The effectiveness of this learning process has been increased by a feedback mechanism known as backpropagation, which provides the program with a kind of memory: it compares what a connectionist network outputs for a particular stimulus with what it should output.

2.1. Backpropagation
By dint of many repeated presentations of the input, some connections within the network become strengthened while others become weakened. In this way, the network can gradually be ‘trained’ to produce correct responses through a process of error reduction (Field, 2004, p. 75). Using backpropagation, a connectionist program has managed to simulate the acquisition of a set of regular and irregular English Past Simple verb forms. It succeeded in discriminating between cases where an -ed inflection was appropriate (walk-walked) and those where a new form had to be learnt (write-wrote). In the process, it also manifested the kind of U-shaped development observed in both first and second language acquisition, where a speaker acquires a correct irregular form, then later replaces it with a regular form that has been overgeneralised (writed). Simulations such as this are sometimes cited in support of an empiricist view of language acquisition. They suggest that linguistic patterns can be identified through exposure to multiple examples, with no need to presuppose a genetically transmitted mechanism that drives the acquisition process (Field, 2004, p. 75).

2.2. Parallel Distributed Processing (PDP)
Network performance in PDP is determined by several main factors (MacDonald & Seidenberg, 2006, p. 587): (1) the architecture of the system (e.g., the configuration of units and connections); (2) the characteristics of the input and output representations; (3) the characteristics of the patterns used in training the model; and (4) the characteristics of the learning algorithm. In other words, the model’s performance depends on its initial state, what it experiences, and how it learns from those experiences.

F. Ghaemi & L. F. Faruji - Connectionist Models: Implications in Second Language Acquisition

MacDonald & Seidenberg (2006, p. 588) focused on three properties that inform the probabilistic constraints approach to comprehension:

First, the networks incorporate a theory of statistical learning. The main idea is that one way that people learn (there may be others) is by gathering information about the frequencies and distributions of environmental events. This type of learning is thought to be general rather than language specific. The reading models in particular developed the idea that lexical knowledge consists of statistical relations between orthographic, phonological, and semantic codes. Learning then involves acquiring this statistical knowledge over time. Subsequent research on statistical learning in infants and adults has provided strong evidence consistent with this view. This learning mechanism provides a way to derive regularities from relatively noisy data, a property that is likely to be highly relevant to the child’s experience in learning language.

Second, the models provide a basis for understanding why particular types of statistics are relevant and not others. Given the properties of these representations, other aspects of the model architecture (e.g., number of units or layers, patterns of connectivity between layers), and a connectionist learning algorithm, the model will pick up on particular statistical regularities implicit in the examples on which it is trained. Thus, motivating the various elements of a model and how it is trained is very important, but the model itself determines which statistics are computed.

Third, the framework provides a powerful processing mechanism: the exploitation of multiple simultaneous probabilistic constraints. Information in a network is encoded by the weights.
The weights determine (“constrain”) the output that is computed in performing a task. Processing involves computing the output that satisfies these constraints. This output changes depending on what is presented as input (e.g., the current word being processed in a network that comprehends sentences word by word). This type of processing, known as constraint satisfaction, has several interesting properties. One is that the network’s output is determined by all of the weights. Such models illustrate how a large number of constraints can be utilized simultaneously without imposing excessive demands on memory or attention. Constraint satisfaction is passive (activation spreads through a network, modulated by the weights on connections) rather than a resource-limited, active search process. Another important property is that the constraints combine in a nonlinear manner. Bits of information that are not very informative in isolation become highly informative when taken together with other bits of information. Much of the power and efficiency of the language comprehension system arises from this property. Languages exhibit much partial regularity: different types of information are correlated, but only weakly. The comprehender cannot wholly rely on any one type of information, but combinations of these partial cues are highly reliable. This concept may seem paradoxical at first. If individual cues are unreliable, wouldn’t combinations of these cues be even more unreliable? MacDonald & Seidenberg (2006, p. 587) answer: no, not if cue combination is nonlinear. The informativeness of each cue varies as a function of other cues.

3. Connectionism and SLA
Even though connectionist approaches have been around for a number of years, it is only recently that research within a second language context has begun to take place (Gass & Selinker, 2008, p. 220). Gasser (1990) examines the implications of connectionist models of cognition for second language theory.
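The claim that individually uninformative cues can become reliable when combined nonlinearly can be checked with a minimal sketch. This is a toy constructed for illustration, not a model from MacDonald & Seidenberg: two hypothetical binary cues jointly signal an interpretation through an exclusive-or relation, the textbook worst case for single cues. Each cue alone is at chance, no single threshold unit over a weighted sum of the two cues classifies all four patterns correctly (XOR is not linearly separable; the grid search merely makes this concrete), but a network with one hidden layer does. The hidden-layer weights are set by hand for transparency; in a trained network they would have to be found by an error-reducing procedure such as backpropagation.

```python
from itertools import product

# Two hypothetical binary cues and the interpretation they jointly signal.
# The target is 1 exactly when one cue is present and the other absent
# (an exclusive-or relation), an invented worst case for single cues.
PATTERNS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def step(x):
    """Threshold unit: fires (1) only for positive net input."""
    return 1 if x > 0 else 0


def cue_reliability(index):
    """P(target = 1 | cue present), looking at one cue in isolation."""
    targets = [t for cues, t in PATTERNS if cues[index] == 1]
    return sum(targets) / len(targets)


def best_linear_accuracy():
    """Best accuracy of one threshold unit over a weighted sum of the cues,
    found by brute-force search over a grid of weights and biases."""
    grid = [k / 2 for k in range(-8, 9)]  # -4.0 to 4.0 in steps of 0.5
    best = 0
    for w1, w2, b in product(grid, repeat=3):
        correct = sum(step(w1 * a + w2 * c + b) == t for (a, c), t in PATTERNS)
        best = max(best, correct)
    return best / len(PATTERNS)


def hidden_layer_net(a, c):
    """Hand-set network with one hidden layer: one hidden unit detects
    'a OR c', the other 'a AND c', and the output fires for OR but not AND."""
    h_or = step(a + c - 0.5)
    h_and = step(a + c - 1.5)
    return step(h_or - h_and - 0.5)


print("cue 1 alone:", cue_reliability(0))           # 0.5 (chance)
print("cue 2 alone:", cue_reliability(1))           # 0.5 (chance)
print("best linear unit:", best_linear_accuracy())  # 0.75
print("hidden-layer net:", [hidden_layer_net(a, c) for (a, c), _ in PATTERNS])  # [0, 1, 1, 0]
```

The design choice worth noting is that the hidden units recode the cues into new features (“at least one cue” and “both cues”) in which the target becomes a simple weighted difference; that recoding is what the nonlinearity buys.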
Connectionism offers a challenge to the symbolic models that dominate cognitive science. In connectionist models, all knowledge is embodied in a network of simple processing units joined by connections that are strengthened or weakened in response to regularities in input patterns. These models avoid the brittleness of symbolic approaches, and they exhibit rule-like behavior without explicit rules. In SLA, the assumption is that transfer from L1 to L2 occurs because strong associations already established in the L1 interfere with the establishment of the L2 network. Because frequency is the primary determinant of connection strength, it might be predicted that the most common patterns in the L1 would be the most likely to cause interference in the L2, but research on transfer from linguistic perspectives does not support this conclusion in any strong sense; L1-L2 relationships are not that simple. Proponents of connectionist approaches to language acquisition note that while frequency is “an all-pervasive causal factor” (Ellis, 2002, as cited in Saville-Troike, 2006, p. 81), it interacts with other determinants, including how noticeable the language patterns are in the input learners receive, and whether the patterns are regular or occur with many variations and exceptions.

According to Saville-Troike (2006, p. 80), the best-known connectionist approach within SLA is Parallel Distributed Processing, or PDP. On this view, processing takes place in a network of nodes (or “units”) in the brain that are connected by pathways. As learners are exposed to repeated patterns of units in the input, they extract regularities in the patterns; probabilistic associations are formed and strengthened. These associations between nodes are called connection strengths or patterns of activation.
The strength of the associations changes with the frequency of input and the nature of feedback. The claim that such learning depends neither on a store of innate knowledge (such as Universal Grammar) nor on rule formation is supported by computer simulations. For example, Rumelhart and McClelland (1986, as cited in Saville-Troike, 2006, p. 80) demonstrated that a computer programmed with a “pattern associator network” can learn to associate English verb bases with their appropriate past tense forms without any a priori “rules,” and that it does so with much the same learning curve as that exhibited by children learning English as an L1. The model provides an account of both regular and irregular tense inflections, including transfer to unfamiliar verbs, and of the “U-shaped” developmental curve, which is often cited in linguistic models and in other cognitive approaches as evidence for rule-based learning.

Assumptions about processing from a connectionist/PDP viewpoint differ from traditional IP accounts in other important ways (McClelland, Rumelhart, & Hinton, 1986; Robinson, 1995, as cited in Saville-Troike, 2006, p. 80):
i. Attention is not viewed as a central mechanism that directs information between separate memory stores, which IP claims are available for controlled versus automatic processing. Rather, attention is a mechanism that is distributed throughout the processing system in local patterns (p. 81).
ii. Information processing is not serial in nature; i.e., it is not a “pipeline . . . in which information is conveyed in a fixed serial order from one storage structure to the next” (Robinson, 1995, as cited in Saville-Troike, 2006, p. 81). Instead, processing is parallel: many connections are activated at the same time.
iii. Knowledge is not stored in memory or retrieved as patterns, but as “connection strengths” between units, which account for the patterns being recreated.
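The pattern-associator idea, with knowledge held only as connection strengths, can be illustrated under heavy simplifying assumptions by a single layer of weights trained with an error-correcting (perceptron-style) rule. This is not Rumelhart and McClelland’s model: their network used far richer phonological input, whereas the three spelling-based cues here (an i_e rime, a final k, and a constant bias input) and the toy verb list are invented for illustration. The associator only decides which pattern a stem takes; spelling out the form is left to a fixed helper, which is a further simplification. Still, no past-tense rule is stored anywhere in what is learned: there are just three weights, and the same weights handle regular verbs, the write/wrote class, and verbs never seen in training.

```python
import re


# Hypothetical, drastically simplified coding of verb stems by spelling.
# These cues and the verb list below are invented for illustration.
def features(stem):
    return [
        1.0 if re.search(r"i[^aeiou]e$", stem) else 0.0,  # i_e rime: write, ride
        1.0 if stem.endswith("k") else 0.0,                # final k: walk, talk
        1.0,                                               # constant bias input
    ]


# 1 = vowel-change past (write -> wrote), 0 = add -ed (walk -> walked).
TRAINING = {"walk": 0, "jump": 0, "talk": 0, "play": 0,
            "write": 1, "ride": 1, "drive": 1}


def net_input(weights, stem):
    return sum(w * x for w, x in zip(weights, features(stem)))


def train(data, epochs=20, rate=0.5):
    """Error-correcting learning: after each verb, every weight moves by
    rate * (target - output) * input, so weights change only when the
    associator's answer was wrong."""
    weights = [0.0, 0.0, 0.0]  # one weight per feature, all starting at 0
    for _ in range(epochs):
        for stem, target in data.items():
            output = 1 if net_input(weights, stem) > 0 else 0
            error = target - output
            weights = [w + rate * error * x
                       for w, x in zip(weights, features(stem))]
    return weights


def past_tense(stem, weights):
    """Choose the pattern with the trained weights, then spell it out."""
    if net_input(weights, stem) > 0:
        return re.sub(r"i([^aeiou])e$", r"o\1e", stem)   # write -> wrote
    return stem + ("d" if stem.endswith("e") else "ed")  # walk -> walked


weights = train(TRAINING)
for verb in ["walk", "write", "drive", "stride", "kick", "bake", "hike"]:
    print(verb, "->", past_tense(verb, weights))
```

Training settles on a positive weight for the i_e rime and a negative (inhibitory) weight for a final k. The unseen verbs follow from feature similarity alone: stride comes out as strode, kick as kicked, bake as baked, and hike as hoke, an irregularization that shows the associator extending the write/ride/drive pattern to any stem that looks like those verbs.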
Parallel processing is obviously at work when tasks simultaneously tap entirely different resources, such as talking on a cell phone while riding a bicycle, but it also occurs, less obviously, within integrated tasks such as simply talking or reading, when the encoding/decoding of phonology, syntactic structure, meaning, and pragmatic intent occur simultaneously (Saville-Troike, 2006, p. 81). To account for the successful production and interpretation of language, many connections in the brain must be activated all at once rather than processed in sequence (i.e., one after the other).

4. Advantages of connectionist models
Connectionist models appear to offer a number of advantages (Feldman & Ballard, 1982). Ellis (2003, as cited in Doughty & Long, 2003, p. 85) argued that a detailed transition theory is needed to describe the process of language acquisition. If language is not informationally encapsulated in its own module, then we must eventually show how generic learning mechanisms can result in complex and highly specific language representations. We need dynamic models of the acquisition of these representations and the emergence of structure. And we need processing models in which the interpretation of particular utterances is the result of the mutual satisfaction of all of the available constraints. For these reasons, emergentists look to connectionism, since it provides a set of computational tools for exploring the conditions under which emergent properties arise (Ellis, 2003, as cited in Doughty & Long, 2003, p. 85).
According to Ellis (2003, as cited in Doughty & Long, 2003, p. 85), connectionism has various advantages for this purpose: neural inspiration; distributed representation and control; data-driven processing, with prototypical representations emerging rather than being innately pre-specified; graceful degradation; emphasis on acquisition rather than static description; slow, incremental, nonlinear, content- and structure-sensitive learning; blurring of the representation/learning distinction; graded, distributed, and non-static representations; generalization and transfer as natural products of learning; and, since the models must actually run, less scope for hand-waving.

Connectionist studies are important in that they directly show how language learning takes place through the gradual strengthening of associations between co-occurring elements of language, and how learning the distributional characteristics of the language input results in the emergence of rule-like, but not rule-governed, regularities. Given that connectionist models have been used to understand various aspects of child language acquisition, the successful application of connectionism to SLA suggests that similar mechanisms operate in children and adults, and that language acquisition, in its essence, is the distributional analysis of form-function mappings in a neural network that attempts to satisfy simultaneously the constraints of all the other constructions represented therein (Ellis, 2003, as cited in Doughty & Long, 2003, p. 91).

Traditional symbolic models (taking the classical approach to categorization) work on an all-or-none basis and can satisfy hard constraints only. That is, if even one condition is not met, a rule does not apply.
A connectionist system can satisfy soft constraints, meaning that, where multiple constraints compete, it finds the best solution by meeting as many of them as possible, even if none of the conditions is met completely. There is one more general point to be made. Motivated by the recognition that the brain is a neural network, connectionism equates mental representations with patterns of neural activity; this is what connectionism shares with enactivism, and both present a clear contrast to traditional cognitivism in this respect (Zalewski, 2010, p. 95).

5. Criticisms of connectionist models
According to Gregg (1996), nonmodular approaches in L2 acquisition research have been largely motivated by a misguided concern for communicative competence, and there has been little in the way of explicit nonmodular theorizing. Instead, there has been a more or less implicit denial of modularity: learning is seen as a general process irrespective of its object. That is, it is often assumed by L2 acquisition theorists that such processes as hypothesis testing, generalization, analogy, automatization, and so forth apply equally to any learning task, linguistic or otherwise (McLaughlin, 1987, as cited in Gregg, 1996, p. 58 [6]).

One counterargument to a strongly deterministic role for input frequency in language learning is that some of the most frequent words in English (including the most frequent, “the”) are relatively late to appear, and among the last (if ever) to be mastered. Still, whatever one’s theoretical perspective, the effects of frequency on SLA clearly merit more attention than they have typically received since repetition drills went out of fashion in language teaching. Researchers from several approaches to SLA that focus on learning processes are taking a renewed look at how frequency influences learning (Saville-Troike, 2006, p. 81 [10]).
Connectionists take a very different, antimodular approach to language learning, one that moves away from rules, structures, and so on, and instead sees learning as the relative strengthening (or spreading activation) of associations, or connections, between interconnected units or nodes. The rules or principles proposed by linguists are, in this view, simply epiphenomena; they have no non-metaphoric existence whatever, and thus play no role in language learning. The fact (if indeed it is a fact) that one can successfully model the acquisition of English past-tense forms with a connectionist computer program does not say anything as to whether the human mind goes about acquisition in that way; nor is a spreading activation model necessarily incompatible with a more traditional concept of mental structure (J. A. Fodor & Pylyshyn, 1988, as cited in Gregg, 1996, p. 59 [6]). Further, as Carroll and Meisel (1990, as cited in Gregg, 1996, p. 59 [6]) pointed out, connectionist accounts fail to deal with the fact that humans have knowledge that transcends the input; but that, of course, is the very heart of the logical problem. Something like spreading activation could possibly be involved in establishing connections between, say, irregular verbs and their past-tense endings; but we cannot appeal to a lack of activation for our knowledge that one sentence (e.g., She may have been being misled) is a possible sentence of English whereas another (e.g., She may been have being misled) is impossible. After all, one has probably never heard either type of sentence, and thus they are equally unlikely to have been activated.
Given the serious shortcomings of the connectionist model, it is hard to see any good reason for taking it up in place of a structured system of mental representations, as assumed by almost any other view of the mind. It is true, of course, that the mind is instantiated in the brain, and that any mental activity whatever must have a neurological basis. But it by no means follows from this that the neurological level is an appropriate, let alone the only proper, level at which to account for mental activity (Gregg, 1996, p. 59). What otherwise widely different theories such as O’Grady’s “general nativism” and connectionism have in common is a reductionist animus, an attempt to account for language acquisition phenomena as nothing more than special cases of other kinds of phenomena (Gregg, 1996, p. 59).

6. Concluding remarks
Connectionism now offers a radical alternative to the innatist view. The old questions about innateness, the mind and the body, and what it means to know are being asked again, and a new set of answers is being proposed. Not everyone agrees that connectionist models make good neuropsychological sense (Segalowitz & Bernstein, 1997, as cited in Segalowitz & Lightbown, 1999 [11]). However, as Ellis (1999 [2]) has suggested, one does not necessarily have to view connectionism as an attempt to model neural architecture for it to be useful (as cited in Segalowitz & Lightbown, 1999 [11]). Connectionist analysis can, instead, be understood as a mathematical tool that reveals how the process of simple associative learning can contain hidden within it the basis for establishing response patterns that appear to be rule-governed in the absence of explicit rule representation. This use of connectionism as a tool is analogous to, say, the way one uses Fourier analysis to analyze complex waveforms and reveal underlying, hidden, simpler acoustic patterns (e.g., the component frequencies and formant structures of speech) (Segalowitz & Lightbown, 1999 [11]).
How the brain actually performs the equivalent analysis is another question.

References
[1] Ahlsén, E. (2006). Introduction to neurolinguistics. Amsterdam: John Benjamins Publishing Company.
[2] Ellis, N. C. (2003). Constructions, chunking, and connectionism: The emergence of second language structure. In C. J. Doughty & M. H. Long (Eds.), The handbook of second language acquisition (pp. 63-104). Oxford: Blackwell Publishing.
[3] Feldman, J. A., & Ballard, D. H. (1982). Connectionist models and their properties. Cognitive Science, 6, 205-254.
[4] Field, J. (2004). Psycholinguistics: The key concepts. London & New York: Routledge.
Gass, S. M., & Selinker, L. (2008). Second language acquisition: An introductory course (3rd ed.). New York & London: Routledge.
[5] Gasser, M. (1990). Connectionism and universals of second language acquisition. Studies in Second Language Acquisition, 12, 179-199.
[6] Gregg, K. R. (1996). The logical and developmental problems of second language acquisition. In T. K. Bhatia & W. C. Ritchie (Eds.), Handbook of second language acquisition (pp. 50-75). London: Academic Press.
[7] Jordan, G. (2004). Theory construction in second language acquisition. Amsterdam & Philadelphia: John Benjamins Publishing Company.
[8] MacDonald, M. C., & Seidenberg, M. S. (2006). Constraint satisfaction accounts of lexical and sentence comprehension. In M. J. Traxler & M. A. Gernsbacher (Eds.), Handbook of psycholinguistics (pp. 581-613). London: Elsevier.
[9] Richards, J. C., & Schmidt, R. (2002). Longman dictionary of language teaching and applied linguistics (3rd ed.). London: Longman.
[10] Saville-Troike, M. (2006). Introducing second language acquisition. Cambridge: Cambridge University Press.
[11] Segalowitz, N., & Lightbown, P. M. (1999). Psycholinguistic approaches to SLA. Annual Review of Applied Linguistics, 19, 43-63.
[12] Williams, J. N. (2005). Associationism and connectionism. Retrieved November 2010 from people.pwf.cam.ac.uk/jnw12/associationism.pdf
[13] Zalewski, J. (2010). A connectionist–enactivist perspective on learning to write. In J. Arabski & Wojtaszek (Eds.), Neurolinguistic and psycholinguistic perspectives on SLA (pp. 93-106). Toronto: Multilingual Matters.