Linguistics 431/631: Connectionist language modeling

Ben Bergen

 

Meeting 5: Morphology

September 19, 2006

 

Rules vs. emergence

Language users display knowledge of systematicity in their linguistic systems.

      These can be characterized as rules

o      Generalized from exposure to language

o      Innate, and discovered through language exposure.

      On an alternative view, the rule-based behavior is the result of a mechanism in which there is no explicit representation of a rule

o      This perspective is known as emergentism

o      Rumelhart and McClelland (R&M) use honeycombs as an example of emergent structure

 

Acquisition of morphology

Children acquiring the English past tense tend to go through three stages (not clearly demarcated).

      Stage 1: children use a few frequent and irregular past tense verbs, such as went, had, gave, took, etc.

      Stage 2: children use many more past tense verbs, including some new irregular verbs, but mostly a lot more regular verbs, like looked and dropped.

o      There's evidence that children know a rule at this stage: they generalize the regular past-tense formation scheme to new verbs, like wug.

o      They also incorrectly apply the regular past tense to irregular verbs they knew in stage 1.

      Stage 3: children/adults use regulars & irregulars, including groups of irregulars, e.g. ing-ung verbs.

 

R&M wanted to see if they could capture these aspects of acquisition in a simple learning model that had no way to learn explicit representations of rules.

      They built a connectionist model, which crucially had only an input and an output layer, representing the present and past tense forms of verbs, respectively.

      The forms were represented in terms of Wickelfeatures, a pared-down version of Wickelphones.

o      Wickelphones represent each segment in a word along with its directly preceding and directly following context - top [tap] would be represented as three Wickelphones: #Ta, tAp, and aP#

o      Wickelfeatures break down phones into phonological features: place (front, middle, back), type (interrupted, continuant, vowel), subtype (stop, nasal; fricative, liquid; high, low), and voiced/voiceless and short/long.

      When all is said and done, each three-phone word can be represented as a pattern of activation over at most 48 of the 460 Wickelfeatures

      Notice that this representation is localist but also distributed.

o      In a localist representation, each node represents some thing

o      In a distributed representation, each thing is represented as a pattern of activation over a set of nodes.
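The Wickelphone scheme described above can be sketched in a few lines. This is only an illustration: the function name `wickelphones` and the use of tuples are assumptions made here, and the further breakdown into Wickelfeatures is not attempted.

```python
def wickelphones(segments):
    """Return (previous, current, next) triples for each segment of a word.

    `segments` is a sequence of phone symbols; '#' marks a word boundary.
    """
    padded = ["#"] + list(segments) + ["#"]
    # One triple per real segment, centered on that segment.
    return [tuple(padded[i - 1 : i + 2]) for i in range(1, len(padded) - 1)]


# "top" [tap] yields three context-sensitive units.
for triple in wickelphones("tap"):
    print(triple)
# prints ('#', 't', 'a'), then ('t', 'a', 'p'), then ('a', 'p', '#')
```

Each triple stands for one thing (a phone in its context), while the word as a whole is the set of triples that are active together, which is the sense in which the representation is both localist and distributed.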

 


R&M's simulations

 

Their data were composed of

      10 high-frequency verbs (e.g. come, get, give, look, take, go, have; 8 irregular and 2 regular)

      410 medium-frequency verbs (334 regular and 76 irregular)

      86 low-frequency verbs (72 regular and 14 irregular)
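Tallying the counts above shows how sharply the mix of regular and irregular verbs changes once the medium-frequency verbs are added. A small sketch (the dictionary layout and the helper name `regular_share` are illustrative, not R&M's):

```python
# Verb counts as listed above, by frequency band and verb type.
corpus = {
    "high": {"regular": 2, "irregular": 8},
    "medium": {"regular": 334, "irregular": 76},
    "low": {"regular": 72, "irregular": 14},
}


def regular_share(bands):
    """Proportion of regular verbs among the verbs in the given bands."""
    regular = sum(corpus[b]["regular"] for b in bands)
    total = sum(sum(corpus[b].values()) for b in bands)
    return regular / total


phase1_share = regular_share(["high"])            # 2 of 10 verbs
phase2_share = regular_share(["high", "medium"])  # 336 of 420 verbs
print(f"High-frequency set only: {phase1_share:.0%} regular")
print(f"High + medium sets: {phase2_share:.0%} regular")
# prints 20% regular, then 80% regular
```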

 

They assumed:

      Children learn first about the present and past tenses of the most frequent verbs

      Although children will learn present and past tenses of different verbs throughout acquisition, they mostly learn the past tenses of verbs they already know the present tense form of.

 

Course of training

      They trained the model on the high-frequency verbs for 10 cycles

o      the product of which is interpreted as being like Stage 1

      Then added the medium-freq. verbs and gave the system 190 more learning trials with all 420 verbs

o      responses early in this phase correspond to Stage 2

o      the end of this phase corresponds to Stage 3

      Finally the low-frequency verbs were presented to the network without training it on them
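The training schedule above can be sketched with a toy two-layer network. Only the schedule (10 passes on the high-frequency set, 190 more on the combined set, then testing on unseen items) comes from the description above. The perceptron-style update, the deterministic threshold units, the toy set sizes, and the `make_pairs` data are all assumptions made for illustration, and the sketch is not meant to reproduce R&M's results.

```python
import random


def make_pairs(n, n_units, rng):
    """Toy data (hypothetical): target copies the input, last unit switched on."""
    pairs = []
    for _ in range(n):
        x = [rng.randint(0, 1) for _ in range(n_units)]
        pairs.append((x, x[:-1] + [1]))
    return pairs


def predict(weights, biases, x):
    """Threshold each output unit on its weighted input sum."""
    return [
        1 if b + sum(w * xi for w, xi in zip(row, x)) > 0 else 0
        for row, b in zip(weights, biases)
    ]


def train_epoch(weights, biases, pairs, rate=0.1):
    """One pass over the data; weights change only where an output unit errs."""
    for x, target in pairs:
        output = predict(weights, biases, x)
        for j, (t, o) in enumerate(zip(target, output)):
            err = t - o
            if err:
                biases[j] += rate * err
                weights[j] = [w + rate * err * xi for w, xi in zip(weights[j], x)]


def accuracy(weights, biases, pairs):
    """Proportion of output units that match their targets."""
    hits = total = 0
    for x, target in pairs:
        output = predict(weights, biases, x)
        hits += sum(t == o for t, o in zip(target, output))
        total += len(target)
    return hits / total


def run_schedule(high, medium, low, n_units, phase1=10, phase2=190):
    """Train on `high`, then on `high + medium`; test on untrained `low`."""
    weights = [[0.0] * n_units for _ in range(n_units)]
    biases = [0.0] * n_units
    history = []
    for _ in range(phase1):
        train_epoch(weights, biases, high)
        history.append(accuracy(weights, biases, high))
    for _ in range(phase2):
        train_epoch(weights, biases, high + medium)
        history.append(accuracy(weights, biases, high + medium))
    return history, accuracy(weights, biases, low)


rng = random.Random(0)
N_UNITS = 8  # toy size, far smaller than the 460 Wickelfeature units
high = make_pairs(10, N_UNITS, rng)
medium = make_pairs(40, N_UNITS, rng)
low = make_pairs(10, N_UNITS, rng)
history, low_accuracy = run_schedule(high, medium, low, N_UNITS)
print(len(history), "training passes; accuracy on unseen items:", low_accuracy)
```

Note that the network has only an input layer and an output layer, so there is nowhere for an explicit rule to be stored apart from the connection weights.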

 

Results

      Up until the 11th trial, there is no difference in performance on regular versus irregular verbs

      Then through the course of training on the larger verb set, the regular verbs show better performance than the irregular verbs.

      Phase 2 does not interfere with regular verb performance, but it does affect irregular verb performance (in a so-called U-shaped learning curve)

                                                                       

What kinds of errors is the network making during U-shaped learning?

      In Phase 1, the correct responses are quickly found

      In Phase 2, many irregular verbs are regularized, but these are progressively replaced by the correct past tense forms

      By contrast, the regular verbs are learned on a progressively increasing curve.
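One way to make the error types above concrete is to sort each response into "correct", "regularized", or "other". The sketch below is a simplification: it works on spelling rather than on Wickelfeature patterns, and the helper names `regularize` and `classify` are hypothetical.

```python
def regularize(stem):
    """Toy orthographic stand-in for the regular past tense.

    Ignores consonant doubling (drop/dropped) and other spelling details.
    """
    return stem + "d" if stem.endswith("e") else stem + "ed"


def classify(stem, correct_past, response):
    """Label a past-tense response for a given verb."""
    if response == correct_past:
        return "correct"
    if response == regularize(stem):
        return "regularized"
    return "other"


print(classify("go", "went", "went"))       # correct
print(classify("go", "went", "goed"))       # regularized
print(classify("give", "gave", "gived"))    # regularized
print(classify("look", "looked", "looked")) # correct
```

Counting the "regularized" label for irregular verbs over the course of training is what would trace out the dip in a U-shaped curve.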

 

Summary of results

      The model goes through the three stages identified in the child language acquisition literature

      It captures most differences in performance on different types of regular and irregular verbs

      It responds appropriately to verbs it has seen during training and also to verbs it has never seen before.