PERMEABLE MODULES: ON EVOLVING AND ACQUIRING LANGUAGE-SPECIFIC CAPACITIES

Lise Menn, University of Colorado

Ann M. Peters, University of Hawai'i

Stories about beginnings come in only two basic modes. An entity either has an explicit point of origin...or else it evolves... (Gould 1991:48)

Introduction

Here at the end of the 20th century, linguists and psycholinguists have increasing opportunities to contribute to the understanding of how our brains represent and process language. However, the emerging picture is already too complex to deal with in terms of conventional notions of discrete language areas and cognitive modules. This is because we are dealing with an evolved system, not an engineered one - which implies that it is, like other complex evolved systems, an enormous, wondrous mess. To understand it, we must understand the nature of evolved systems; therefore, we must think about how evolution works.

The intent of this paper is mostly heuristic: we suggest some ways of thinking about modularity that will allow us to deal with this wonderful messiness in an evolutionary context. We will illustrate these approaches by showing how they apply to a relatively simple and very familiar evolved system, the human vocal tract.

We consider the question of modularity in language processing without regard to the particular theories of grammar that are typically associated with particular views on modularity. "Autonomy" of syntactic processing and an "autonomous" theory of syntax are actually very different notions, as Bock (1995) emphasizes, and one's views on one of them need have no particular connection with one's views on the other. This paper is intended to be theory-neutral as far as syntax is concerned; the focus is simply on how a system can have internal structure - internal clumps and layers which may develop at different times and different rates - and yet still have integrated functioning in the skilled user.

Psycholinguistics requires a conceptual model for modularity that is subtle enough to deal with current findings from neurolinguistics, as well as with language evolution. Rather than adopting a notion of modules as monolithic and impermeable, we consider a gradient conceptualization of the components of language as complex and interwoven, with degrees of permeability. The strong notion of a module is a processing component (transforming some kind of "input" to some kind of "output") whose internal activity is not affected by the state of any other part of the system. The modularity of a system is the extent to which a system is composed of such self-contained modules; the modularity of a component is the extent to which its processing is independent of simultaneously occurring events.

Anatomically, there is no difficulty in realizing a system whose components have differing degrees of modularity; as Wilkins & Wakefield (1995) point out, this is a matter of varying the proportion of neural connections that are internal to a component as compared to the proportion that connects that component to others. The higher the proportion of internal connections, the more encapsulated the component.
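
The proportion just described can be made concrete with a small sketch. It is an illustration only, not a measure proposed by Wilkins & Wakefield: the function name `internal_connection_ratio`, the toy units, and the component labels are all hypothetical, and real neural connectivity is weighted and far denser than this.

```python
from collections import defaultdict


def internal_connection_ratio(connections, component_of):
    """For each component, return the share of its connections that are internal.

    connections  : iterable of (unit, unit) pairs (hypothetical toy data)
    component_of : dict mapping each unit to the label of its component
    """
    internal = defaultdict(int)  # connections with both ends in the component
    total = defaultdict(int)     # connections with at least one end in it
    for a, b in connections:
        ca, cb = component_of[a], component_of[b]
        if ca == cb:
            internal[ca] += 1
            total[ca] += 1
        else:
            # A cross-component connection counts once for each side.
            total[ca] += 1
            total[cb] += 1
    return {c: internal[c] / total[c] for c in total}


# Toy network (assumed for illustration): three units in A, two in B.
toy_component_of = {"a1": "A", "a2": "A", "a3": "A", "b1": "B", "b2": "B"}
toy_connections = [
    ("a1", "a2"), ("a2", "a3"), ("a1", "a3"),  # internal to A
    ("a3", "b1"),                              # links A to B
    ("b1", "b2"),                              # internal to B
]
toy_ratios = internal_connection_ratio(toy_connections, toy_component_of)
print(toy_ratios)
```

In this toy network, component A has three of its four connections internal and so counts as more encapsulated than B, which has one of two. Nothing forces either value to be 0 or 1, which is the sense in which modularity is a matter of degree.
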
An important principle of evolution is that it is characterized by a great deal of inertia. Nature is, in the classical phrase, "a tinkerer": evolution works on what is at hand - on the variation inherent in the population under a particular existing range of environmental conditions. In this view, humans, as well as all our co-inhabitants of the planet, are contrived from old parts, stretched and twisted and reorganized, with bits of new tubing and wiring stuck in here and there (for a formal anatomical exposition, see Wilkins & Wakefield, 1995).

There are two important misconceptions about the evolution of language that we have found in recent literature. The first is the assumption that, since syntax as it currently exists is a complex integrated system, it must have been invented "all at once". The second is the more subtle but related claim - one on which we disagree with Wilkins & Wakefield - that there must have been a definable moment at which the brain had reached the point where it was capable of syntactic computations worthy of the name of Language (cf. Pinker & Bloom, 1990; Wang, 1991).

The modern theory of "punctuated equilibrium" - available to nonspecialists like ourselves in, for example, the monthly Natural History magazine columns by Stephen Jay Gould - does not require that enormous linked changes in the structure and behavior of a species (for example, wings and powered flight) must arise, or even tend to arise, either in tandem or as a result of a single mutation. On the contrary, there can be great asynchrony between changes in structures and concomitant changes in behaviors: a new structure may make a new behavior possible, but that behavior may emerge only later. Or a new environment may bring out a new behavior, and individuals may then be selected for a structure which makes that behavior more effective. It follows that the large brain that enables modern human behaviors and the behaviors themselves did not have to arise either suddenly or simultaneously. If they indeed arose at different times, as we argue below, we will need to consider how an integrated system such as syntax could have evolved.

Modularity and evolution of the vocal tract

Consider the human vocal tract and its control mechanisms. The vocal tract is a complex, integrated, but "lumpy" system involving many elements which have enormously varied phylogenetic depth and overlaid functions. Its gross anatomical basis is a set of "old" respiratory and ingestive organs - nose, mouth, and throat - which look roughly similar in all primates. The vocal tract takes part in many behaviors, most of them highly automatized, with a rich variety of hard-wired and acquired control loops, both involuntary and voluntary. The more automatic, hard-wired functions like sneezing are almost completely introspectively inaccessible and cognitively impenetrable (i.e. unaffected by knowledge); other functions were acquired at ontogenetically and phylogenetically different times, and are subject to both involuntary and voluntary control. The entire vocal tract works together smoothly in the normal mature human, but it may show its seams in cases of brain injury or abnormal development.

Consider the control of airflow in the vocal tract, as outlined in Table 1A. Breathing is controlled by largely involuntary neural circuitry that responds to at least two needs: the body's physiological need for oxygen, and involuntary phonation: that is, involuntary crying, laughing, screaming, sighing, and other emotionally controlled sounds, driven by the limbic system. The oxygen-supply aspect of the respiratory control system is evolutionarily very old; more recent, but still at least as old as the mammals, are yawns, coughs, and sneezes. Yawning and coughing also must involve voluntary circuits (experienced air travelers can yawn voluntarily).

In humans, the respiratory control apparatus acquires another automatic control capacity: it learns to produce language-specific prosodic patterns. These appear to the hearer to be largely in place and automatic by the time a child is one year old, so that we hear the familiar intonation contours of "jargon" prosody. These patterns then increasingly come under syntactic and pragmatic control over the next several years, as knowledge of language grows. The syntactic control of prosodic patterns also becomes automatized: hard to access metalinguistically, and resistant to voluntary control. We have two kinds of evidence for this: first, fully native-sounding prosody can rarely be acquired for a second language after middle childhood; second, intonation contour and timing are difficult for the prelingually profoundly deaf to acquire after early childhood, even when the normal fundamental frequency range of speech lies within the range of signals that are made audible with amplification. The syntactic respiratory control mechanism can also be selectively disrupted in adulthood by brain lesions (leading to the mis-named "foreign accent syndrome").

Tension, depression, amusement, and many other emotional states also have major involuntary effects on phonatory respiration; depression leads to restricted pitch range, tension to heightened pitch, and so on (Scherer & Oshinsky, 1977; Williams & Stevens, 1972). Some of these involuntary effects can be controlled with effort, but we have all experienced others - say, a fit of giggles at a wedding - that cannot be repressed.

In summary, the overall prosodic control mechanism must be in some way specific to language, must be acquired during a critical window, and varies across languages. But it is simultaneously dependent on factors external to language structure: the respiratory mechanism, and affective social/cognitive interactions. It is under considerable voluntary control, yet highly susceptible to involuntary influences, only some of which can be overridden.

TABLE 1. Functions of the vocal tract in evolutionary perspective


A. Respiratory control - airways and diaphragm (a few major functions):
  1. Land animals:
  2. Humans, neonate:
  3. Humans, by 1 year:
  4. Humans, before adulthood:
B. Jaw, mouth, tongue movement (a few major functions):
  1. Complex animals:
  2. Vertebrates:
  3. Mammals:
  4. Humans:

Now let us turn to the control of oral configuration for speech sound articulation, as outlined in Table 1B. Here again we see a basic set of automatic commands - in this case, motor commands related to sucking, biting, chewing, swallowing, spitting, licking, buccal grooming, and buccal components of emotional signalling. We infer that oral configuration must be responsive to a range of stimuli, processed by a range of different neural circuits, in all animals that have mouths and guts. From this basis, the human individual develops, refines, and automatizes speech articulation over the first seven years or so (Templin, 1957; Irwin & Wong, 1983). This automatic control is again determined in great detail by the specific language being acquired; and it is also susceptible to specific disruption by brain damage, as in acquired dysarthrias.

So linguistic phonation and oral configuration are also controlled by many areas of the brain that are necessary for their operation, but not exclusively dedicated to it; some of these control circuits are clearly hard-wired, while others which interact with them are just as clearly acquired during the first years of life, becoming trained through internal feedback, and ending up in the adult as fully automatic and highly modular, with dedicated neural circuits.

What can we infer about how this vocal tract and its multiple control systems evolved? Although spoken language is referred to as a communicative function that is overlaid on the respiratory and ingestive organs, that does not imply that these organs were simply appropriated and used in their original form. Being used for speech changed the organs, but the mouth, nose, throat, etc. had to remain good at their original jobs - after all, we developed no alternative way to eat or to breathe!

Can we date the evolutionary change which makes speech possible for us, but not for chimpanzees? The major vocal tract change discernible in the fossil record is an elongation of the neck, which had not yet taken place in the Neanderthal (Lieberman & Crelin, 1971). This stretching out of the neck greatly increased the size of the pharynx, which in turn increased the variety of vowels that could be produced. Roughly the same change is present in the ontogenetic development of each human today between birth and age two - newborns have almost no neck, as everyone who has bathed one knows. This increase in speech production capacity comes at considerable cost, however: an increased risk of inhaling food and choking to death.

The fact that the longer neck came at such a cost has an important implication which is often not appreciated, although it has been mentioned by Lieberman and others (Lieberman & Crelin, 1971; Barber & Peters, 1992; Lieberman, 1995). This implication is that our ancestor whose neck got longer already had a highly vocal communication system of some sort, even if it was not a Language (having neither syntax nor phonology). Otherwise, at that point in time, there would have been no compensating payoff for elongating the neck.1

Bringing the bulk of vocalization under voluntary control required a brain change that could logically have occurred before or after the neck began to lengthen.2 Current thinking places it, in fact, much earlier, tying it into the enormous increase in size of the frontal lobes, as shown from skulls about 250,000 years old (Deacon, 1992). There is no reason to assume that the payoff for this frontal lobe increase was initially linguistic; the most likely payoff, from what we know about the functions of the frontal lobe, would have been increased capacity for long-range planning. Barber and Peters (1992) argue that early artifacts support such an increase in foresight: core-based stone tools, which require extensive pre-envisioning of how blades will be struck off from a pre-shaped core, date from 200,000 years ago; and the Lazaret cave, which was left with a pair of wolf skulls neatly guarding its sheltered entrances, argues for an active human imagination by 150,000 years ago.

Language ontogeny and evolution

To support our earlier assertion that there is no clear ontogenetic boundary between pre-language and language, and that therefore there is no reason to expect a sharp phylogenetic boundary either, we present Table 2, which reviews well-known facts about the ontogeny of language in terms of language 'design features' (cf. Hockett 1958). We suggest that a dynamic systems view that sees the 'parts' of language as heterogeneous, interlocking, and overlaid - just like the 'parts' of the vocal tract and its control systems, but more complex and hooked into many more aspects of cognition - can help us address both evolutionary and developmental questions. Rather than deciding whether our ancestors or our children do or do not 'have' a particular function of the brain, we can consider what kinds of functional precursors could have developed into the present integrated adult system.

TABLE 2. Design features of the continuum from pre-language to language, in order of ontogenetic emergence

(modified from Barber & Peters 1992, p. 337)

1) Communication, 2-12 months
  Linearity (turn-taking)
  Vocal tract as medium (for hearing infants)
  External feedback, establishment of dynamic stability (imitation)
2) Contrast (discrete symbols), 10-15 months
  Attention, rejection, other pragmatic functions
  Noun-like words, predications
3) Arbitrariness, conventionality, 16-20 months
  Naming explosion
4) Efficiency
  Phonological and syntactic patterning
  Productivity, redundancy
5) Displacement, 22-28 months
6) Recursiveness, perhaps 30 months

A major encouragement for this approach is work in cognitive modeling which demonstrates that task-specific modularity can develop from an initially undifferentiated processing system (e.g. Jacobs, Jordan, & Barto, 1991). A relatively undifferentiated neonate cortex might similarly be able to develop a modular structure. Indeed, Posner's studies of reading (Posner & Carr, 1991) have long supported the viability of the notion of ontogenetically acquired modularity, and recent models in various cognitive areas show how automaticity can be acquired through training with internal feedback (Givon, 1995, ch. 9; Markey, 1994). On such a view, the mechanisms supporting a complex hierarchical structure such as syntactically complex language would not have had to emerge all at once.

We assert further that it is not just unnecessary, but also improbable that language appeared full-blown, like Athene from the head of Zeus. We support this claim by considering another property of language development: the intricate interaction between exposure to language and development of a brain that can process it, within the history of each individual.

According to widely accepted interpretations of the stories of language-deprived children (Curtiss, 1977; Johnson & Newport, 1989; Goldin-Meadow, 1979), the human brain can develop syntactically complex language only if it is exposed fairly early to input of sufficient complexity. Home sign - the rudimentary signing used in uninstructed hearing families of deaf children - is not complex enough. Indeed, Christine Yoshinaga-Itano's research group at the University of Colorado has been finding that outreach to tutor such families in ASL (or its English-adapted variants) gives no guarantee of linguistic success for the deaf children; the hearing parents may never reach enough competence in Sign for them to give the children sufficiently complex input. So the syntactic potential of the modern brain may never be realized in such cases. Furthermore, Curtiss (1977) indicates that the syntactic potential itself will degenerate and be lost in the individual, if it is not stimulated to develop.

In order to understand how organisms are shaped by evolution (phylogeny), it is also necessary to consider the development of the individual (ontogeny). A serious ambiguity in the term "capacity" can confuse matters here. "Capacity" is sometimes used to mean what one will have achieved as a mature organism, but at other times to mean what one could have achieved in all other possible worlds. Because we have neural (and other kinds of) plasticity, we are born with the potential to develop in different ways in alternative worlds; that is, we have potentials which will only be realized in particular environments. In this sense, most of the potential of every organism is latent and unrealized, since each of us lives in only one of these possible worlds. For example, chimpanzees have the capacity to learn to communicate with signs and lexigrams (Savage-Rumbaugh, 1986), but they cannot realize that capacity without the special environment of skilled human intervention (cf. Wilkins & Wakefield, 1995:176). Likewise, humans require an environment containing a certain level of language use - more than what some of Yoshinaga-Itano's subjects have been getting - in order to realize their capacity for language. Apparently, the input must be richer, perhaps as rich as a pidgin, for syntactically complex language to develop.3 So the first hominids who had a brain with the capacity for language - that is, who had the developmental potential for language - could not have actually realized a syntactically complex language, because there would have been no adequate model for them to hear. This state of affairs might have persisted for millennia, until at least a pidgin-like level had been reached. The phylogenetic development of language might have been excruciatingly slow, spiralling up gradually; generations of pre-syntactic speakers might have put together a few words, gestures, and facial expressions, with recursive structures only at the discourse level.

We cannot assume that a Great Leap Forward took place even once a level comparable to recent pidgins was reached, as one might think from reading Bickerton (1981). Pidgins develop in a world where most of the speakers have command of some other, fully developed language - although, in the worst case, it is a language that no one around them knows. Pidgin speakers have therefore become mentally capable of complex syntax in some language, and so they can push the limited resources of the pidgin to communicate complex ideas, creating patterns of usage which can be grammatized by young learners. Without that kind of model and that kind of early experience, we suggest that humans could only have gradually developed to and through the level of syntactic complexity found in pidgins of the modern era. This presumably happened through gradual abbreviation and automatization, as we see in grammatization of aspects of spoken and signed language (Traugott & Heine, 1991; Frishberg, 1975) and in the development of writing systems (Daniels & Bright, 1996).

Conclusion

We suggest that psycholinguists should start trying to link up whatever set of 'parts' they propose, neither as boxes nor as homogeneous masses, but as interlocking and multiply controlled elements, on the analogy of the vocal tract: components partly hard-wired and partly acquired, partially independent and partially interdependent, having degrees of autonomy from each other and from other cognitive and pre-cognitive systems. This is the only way, we think, to cope with the functional imaging reports that are showing more and more areas of brain activation during language tasks (e.g. Martin, Wiggs, Ungerleider, & Haxby, 1996) and with reports of localization of category-specific naming disorders (Warrington & Shallice, 1984; Sheridan & Humphreys, 1993).

Since the brain is a real-world object and cannot perform language tasks in isolation from meaning and understanding, models must further consider how and to what extent conceptually-distinct types of linguistic and real-world information are integrated. Many specialized, local relations - mini-modules, one might say - may be set up during the acquisition of skill and knowledge.

If we, as linguists and psycholinguists, do not think clearly and creatively - in fact, imaginatively and aggressively - about how language may be represented in the brain, our potential for input to the emerging neurological research paradigms will simply be seen as irrelevant, and our concerns will be ignored. Surely, Language, like its subsystem Speech, is a complex, multiply-controlled system whose parts vary in the nature and richness of their interconnections. We can no longer afford to approach experimental design and data interpretation with arguments that essentially go "It's modular." "It isn't." "It is TOO!" "Isn't!" "Phooey on you!"

NOTES

1. Unless the payoff was in some other area, but it's not obvious what that might be.
2. We might argue that the potential for voluntary phonatory control given by the larger prefrontal lobes was in fact realized in Neanderthal, and that it helped to drive the neck elongation. However, any kind of signaling system - continuous (like pitch range) or discrete (like phonemes), finite (like a small lexicon) or open (with hierarchical structure and recursion) - can profit from having a greater output range, because any increase in range will make contrasts among the signals easier to perceive.
3. Recent work on Nicaraguan Sign Language (Morford & Kegl, 1996; Senghas, 1996) may soon help to specify this threshold further.

REFERENCES

Barber, E. J. W., & Peters, A. M. W. (1992). Ontogeny and phylogeny: What child language and archaeology have to say to each other. In J. A. Hawkins & M. Gell-Mann (Eds.), The evolution of human languages (Santa Fe Institute Studies in the Sciences of Complexity, 9, pp. 305-352.) Redwood City, CA: Addison-Wesley.
Bickerton, D. (1981). The roots of language. Ann Arbor: Karoma.
Bock, J. K. (1995). Sentence production: From mind to mouth. In J. L. Miller & P. D. Eimas (Eds.), Handbook of perception & cognition, Vol. 11: Speech, language, and communication (pp. 181-216.) Orlando: Academic Press.
Curtiss, S. (1977). Genie: A psycholinguistic study of a modern-day "wild child". New York: Academic Press.
Daniels, P., & Bright, W. (Eds.) (1996). The world's writing systems. New York: Oxford University Press.
Deacon, T. W. (1992). Brain co-evolution. In J. A. Hawkins & M. Gell-Mann (Eds.), The evolution of human languages. (Santa Fe Institute Studies in the Sciences of Complexity, 9.) Redwood City, CA: Addison-Wesley.
Frishberg, N. (1975). Arbitrariness and iconicity: Historical change in American Sign Language. Language 51, 696-719.
Givon, T. (1995). Functionalism and grammar, ch. 9. Amsterdam: Benjamins.
Goldin-Meadow, S. (1979). Structure in a manual communication system developed without a conventional language model: Language without a helping hand. In H. A. Whitaker & H. A. Whitaker (Eds.), Studies in neurolinguistics, vol. 4. New York: Academic Press.
Gould, S. J. (1991). The creation myths of Cooperstown. Bully for Brontosaurus (pp. 42-58). New York: Norton.
Hockett, C. F. (1958). A course in modern linguistics. New York: Macmillan.
Irwin, J. V., & Wong, S. P. (Eds.) (1983). Phonological development in children: 18 to 72 months. Carbondale: Southern Illinois University Press.
Jacobs, R. A., Jordan, M. I., & Barto, A. G. (1991). Task decomposition through competition in a modular connectionist architecture: The what and where vision tasks. Cognitive Science 15, 219-250.
Johnson, J., & Newport, E. L. (1989). Critical period effects in second language learning: The influence of maturational state on the acquisition of English as a second language. Cognitive Psychology 21, 60-99.
Lieberman, P. (1995). Manual versus speech motor control and the evolution of language. Behavioral and Brain Sciences 18, 197-198.
Lieberman, P., & Crelin, E. S. (1971). On the speech of Neanderthal man. Linguistic Inquiry 2, 203-222.
Markey, K. L. (1994). The sensorimotor foundations of phonology: A computational model of early childhood articulatory and phonetic development. Ph.D. thesis. Technical report CU-CS-752-94. Boulder CO: University of Colorado, Department of Computer Science.
Martin, A., Wiggs, C. L., Ungerleider, L. G., & Haxby, J. V. (1996). Neural correlates of category-specific knowledge. Nature 379, 649-652.
Morford, J. P., & Kegl, J. (1996). Grammaticization in a newly emerging signed language in Nicaragua. Fifth International Conference on Theoretical Issues in Sign Language Research, Montreal, Canada.
Pinker, S., & Bloom, P. (1990). Natural language and natural selection. Behavioral and Brain Sciences 13, 707-784.
Posner, M. I., & Carr, T. H. (1991). Lexical access and the brain: Anatomical constraints on models of word recognition. TR91-5, Institute of Cognitive and Decision Sciences, U. of Oregon.
Savage-Rumbaugh, E. S. (1986). Ape language: From conditioned response to symbol. New York: Columbia University Press.
Scherer, K. R., & Oshinsky, J. S. (1977). Cue utilization in emotion attribution from auditory stimuli. Motivation and Emotion 1, 331-346.
Senghas, A. (1996). The creolization of agreement in Nicaraguan Sign Language. Fifth International Conference on Theoretical Issues in Sign Language Research, Montreal, Canada.
Sheridan, J., & Humphreys, G. W. (1993). A verbal-semantic category-specific recognition impairment. Cognitive Neuropsychology 10, 143-184.
Templin, M. C. (1957). Certain language skills in children: Their development and interrelationships. Institute of Child Welfare Monographs, 26. Minneapolis: University of Minnesota Press.
Traugott, E. C., & Heine, B. (Eds.) (1991). Approaches to grammaticalization, vol. 1. Amsterdam: Benjamins.
Wang, W. S-Y. (1991). Explorations in language evolution. In Wang, William S-Y. (Ed.), Explorations in language (pp. 105-130). Taipei: Pyramid Press.
Warrington, E. K., & Shallice, T. (1984). Category-specific semantic impairments. Brain 107, 829-854.
Wilkins, W. K., & Wakefield, J. (1995). Brain evolution and neurolinguistic preconditions. Behavioral and Brain Sciences 18, 161-226.
Williams, C. E., & Stevens, K. N. (1972). Emotions and speech: Some acoustic correlates. Journal of the Acoustical Society of America 52, 1238-1250.
Yoshinaga-Itano, C. (1997). Personal communication.