Linguistics 431/631: Connectionist language modeling
Ben Bergen
October 31, 2006
Drawbacks of TRACE
Despite its successes, TRACE has some major drawbacks:
• Units and connections are repeated for each time slice, so the same unit is duplicated over time
o This would be very hard to learn, since backprop is a local learning algorithm
o It predicts that the same unit at a different absolute time in the word will have different properties (false)
• It's insensitive to global parameters like speaking rate, accent, and ambient acoustic characteristics
• Priming is impossible, because activation of a given set of nodes has no relevance to nodes representing those same sounds/words/features elsewhere in the utterance.
• All of these limitations derive from the fact that there aren't unique representations for given linguistic structures.
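A toy sketch of the time-slice problem may help. It assumes a hypothetical bank of phoneme units copied once per time slice; the inventory, the number of slices, and the function name are illustrative only and are not TRACE's actual parameters.

```python
# Toy illustration: every unit is duplicated for every time slice.
PHONEMES = ["k", "ae", "p", "t"]   # hypothetical mini-inventory
N_SLICES = 6                       # hypothetical number of time slices

# One separate unit per (phoneme, time slice) pair, all starting at rest.
activation = {(p, t): 0.0 for p in PHONEMES for t in range(N_SLICES)}

def hear(phoneme, t, amount=1.0):
    """Activate the unit for `phoneme` at time slice `t` only."""
    activation[(phoneme, t)] += amount

hear("k", 0)

# The number of units grows with the number of time slices.
n_units = len(activation)

# Hearing /k/ at slice 0 leaves the /k/ unit at slice 4 untouched,
# so nothing carries over to a later occurrence of the same sound.
k_early = activation[("k", 0)]
k_late = activation[("k", 4)]
print(n_units, k_early, k_late)
```

Because the two /k/ units are distinct objects, anything learned or activated at one slice has no effect on the other, which is the sense in which there is no unique representation for /k/.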
The COHORT model
The COHORT model of speech perception is the other major theory and is based on a few simple ideas:
• Upon hearing the start of a word, all potential candidates (the 'cohort') become activated
• As more of the word is heard, candidates that don't fit the signal or the context drop out of the cohort until only one is left
• The speed of recognition depends mainly on the word's uniqueness point
The COHORT model has been rendered in a connectionist architecture, the Distributed COHORT model.
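The three ideas above can be sketched in a few lines. This is a minimal illustration, not an implementation of the model: the mini-lexicon is hypothetical, words are spelled as letter strings rather than phoneme sequences, and only mismatch with the signal (not with the context) removes candidates.

```python
# Hypothetical mini-lexicon; letters stand in for phonemes.
LEXICON = ["captain", "captive", "capital", "cat", "dog"]

def cohort(heard, lexicon=LEXICON):
    """Return the candidates still consistent with the input so far."""
    return [w for w in lexicon if w.startswith(heard)]

def uniqueness_point(word, lexicon=LEXICON):
    """Number of segments needed before `word` is the only candidate.

    Returns None if the word never becomes unique (for example, when it
    is a prefix of another word in the lexicon).
    """
    for n in range(1, len(word) + 1):
        if cohort(word[:n], lexicon) == [word]:
            return n
    return None

# Watch the cohort shrink as more of "captain" is heard.
for n in range(1, len("captain") + 1):
    print("captain"[:n], cohort("captain"[:n]))
```

In this toy lexicon "dog" is unique after its first segment, while "captain" and "captive" stay in each other's cohort through "capt", so the sketch predicts that "dog" is recognized sooner.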

This model differs from those we've seen so far:
• It does not have lexical representations per se
• Rather, a hidden layer mediates between the phonetic representation on the one hand and phonological and semantic representations on the other
• The representations at each level are distributed
o In localist models (like TRACE), ambiguity is represented by the co-activation of two nodes, each representing one of the possible interpretations
o In distributed COHORT, an ambiguous phonetic input will yield interference at the semantic and phonological levels
• Input to this recurrent network is passed in sequentially. Given the beginning of a word, the output layers will show activation over the various lexical possibilities; activation for candidates that no longer fit subsides as the word progresses towards its uniqueness point
o Input [kaept] will yield activation in the semantic and phonological nodes for 'captain' and 'captive'.
o As soon as a nasalized [I] is input, the representation for 'captive' will no longer be active, and the other will win out.
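The data flow just described (phonetic input, a hidden layer with recurrence, and separate phonological and semantic outputs) can be sketched as a simple recurrent network forward pass. Everything specific here is an assumption for illustration: the layer sizes, the feature vectors, and the random weights. The network is untrained, so its outputs are not meaningful; the point is only the shape of the computation.

```python
import math
import random

random.seed(0)

N_IN, N_HID, N_PHON, N_SEM = 4, 6, 5, 5   # hypothetical layer sizes

def rand_matrix(rows, cols):
    """Random weights; a real model would learn these by training."""
    return [[random.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

def layer(weights, inputs):
    """Weighted sum followed by a logistic squashing function."""
    return [1 / (1 + math.exp(-sum(w * x for w, x in zip(row, inputs))))
            for row in weights]

W_hid = rand_matrix(N_HID, N_IN + N_HID)   # input + context -> hidden
W_phon = rand_matrix(N_PHON, N_HID)        # hidden -> phonological output
W_sem = rand_matrix(N_SEM, N_HID)          # hidden -> semantic output

def run(segments):
    """Feed feature vectors in one at a time; return outputs per step."""
    context = [0.0] * N_HID                # copy of the previous hidden state
    outputs = []
    for seg in segments:
        hidden = layer(W_hid, seg + context)
        outputs.append((layer(W_phon, hidden), layer(W_sem, hidden)))
        context = hidden
    return outputs

# Three made-up phonetic feature vectors standing in for successive segments.
word = [[1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 0, 0]]
steps = run(word)
```

Note that there is no layer of word nodes anywhere: the only place a word "lives" is in the pattern over the two output banks, and the context copy is what lets the same segment produce different outputs depending on what preceded it.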
More distributed COHORT
Evidence for parallel activation of a cohort (Zwitserlood 1989)
• Present subjects with /kaept/, which is ambiguous between captain and captive
• Then present a word related to either of those continuations, such as ship (related to captain) or guard (related to captive)
• There's facilitatory priming of both of those related words
Localist vs. distributed models
• In localist models, remember, an arbitrarily large number of lexical nodes can be active at the same time, although there may be some competition between them.
• However, in the distributed model, since a single array of nodes represents all the words in a distributed fashion, this isn't possible.
• The activation of a given word can be measured in a distributed model as the root mean squared (RMS) error between the network's actual output activation and the target activation for that word; lower error means the word is more strongly represented.
• Gaskell & Marslen-Wilson (G&M) ran a number of statistical simulations to determine how well the distributed COHORT model could represent a number of competing lexical items at various cohort sizes
o Each word was represented as a set of 1s and 0s distributed evenly over 200 nodes.
o G&M compared the RMS error of members of the cohort with that of mismatched words at a number of cohort sizes.
• The results showed that when the cohort size is large, the model does not effectively distinguish members of the cohort from mismatched words, but it does much better when the cohort size is small.
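A rough sketch of this kind of comparison is given below. It is not a reconstruction of G&M's actual simulations. It assumes that each node of a word is 1 with probability 0.5, that co-activating a cohort yields the node-by-node average of its members' patterns, and that the lexicon has 300 words; only the 200-node vector length comes from the description above.

```python
import math
import random

N_NODES = 200      # vector length, as in the simulations described above
N_WORDS = 300      # hypothetical lexicon size

def random_word(rng):
    """A random binary pattern; each node is 1 with probability 0.5."""
    return [rng.randint(0, 1) for _ in range(N_NODES)]

def blend(vectors):
    """Node-by-node average: the assumed result of co-activating words."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def rms_error(output, target):
    """Root mean squared difference between output and target patterns."""
    return math.sqrt(
        sum((o - t) ** 2 for o, t in zip(output, target)) / len(target)
    )

def mean_rms(cohort_size, rng):
    """Average RMS error for cohort members and for mismatched words."""
    lexicon = [random_word(rng) for _ in range(N_WORDS)]
    members, others = lexicon[:cohort_size], lexicon[cohort_size:]
    output = blend(members)
    member_err = sum(rms_error(output, w) for w in members) / len(members)
    other_err = sum(rms_error(output, w) for w in others) / len(others)
    return member_err, other_err

rng = random.Random(0)
results = {size: mean_rms(size, rng) for size in (1, 2, 8, 64)}
for size, (member_err, other_err) in results.items():
    print(size, round(member_err, 3), round(other_err, 3))
```

Under these assumptions, a cohort of one is matched perfectly, while at a cohort size of 64 the blended pattern is almost as far from its own members as from unrelated words, which is the qualitative pattern reported above.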

Summing up
Connectionist models of speech production and perception use a similar range of methods:
• Representational approaches
o Distributed (distributed COHORT)
o Localist (TRACE)
o Hybrid (Aphasia Model, Phonological Error Model)
• Dealing with time
o Spatialization (Aphasia Model)
o Sequentialization: using recurrent networks (distributed COHORT, Phonological Error Model)
o Hybrid (TRACE)
• Activation flow
o Feed-forward (COHORT)
o Interactive activation (TRACE, Aphasia Model)
o Hybrid (Phonological Error Model)