Linguistics 431/631: Connectionist language modeling
Ben Bergen
October 10, 2006
Speech production
Connectionist models have long been used to model various aspects of speech production.
There are two basic types of model:
• Sequential models: no feedback between modules; information is passed forward only
• Interactive activation models: downstream nodes provide feedback to upstream nodes
Producing speech is usually seen as involving the following components:
• Message construction: putting together a non-verbal representation of the utterance
• Utterance formulation: taking the message and translating it into a linguistically structured, sequential utterance
o Grammatical encoding: accessing and ordering words appropriate to the message
o Lexical access: retrieving and organizing the sounds of those words for articulation
Lexical access
Lexical access is the process of going from a semantic representation of a word to the sounds used to express it.
This mapping is less than transparent in several ways:
• The semantic features to be expressed do not uniquely select a word; context plays a large role too
Sheila, she, mom (the same person, referred to with different words depending on context)
• The output of lexical access is a sequence of sounds, so the mapping from meaning to sound goes from a static to a dynamic representation
o So far we have been unrealistically representing time spatially
• The output is more than just a sequence of units: the units are related to one another by the prosodic structure of the word
• Words with similar meanings do not necessarily have similar sounds
mom, dad
o Take just four words: man, woman, mother, and father
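o The input is two binary nodes: female (0 or 1) and parent (0 or 1)
o The output is whether the word has an initial /m/
o The relationship is effectively an XNOR: /m/ appears when both features are on (mother) or both are off (man)
o Because XNOR is not linearly separable, no direct one-layer mapping from these features can compute it; this implies that there must be an intermediate level of representation between meaning and form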
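The four-word argument can be checked directly. The sketch below is a minimal illustration using hand-picked step-function units (the weights, the grid, and the function names are assumptions, not taken from any published production model). It first searches a grid of weights for a single unit that maps (female, parent) straight onto initial /m/ and finds none. It then shows that two hidden units, each acting like a word-specific intermediate representation, solve the problem.

```python
from itertools import product

# (female, parent) -> (word, does the word start with /m/?)
WORDS = {
    (0, 0): ("man", 1),
    (1, 0): ("woman", 0),
    (1, 1): ("mother", 1),
    (0, 1): ("father", 0),
}


def step(x):
    """Threshold unit: fires only for positive net input."""
    return 1 if x > 0 else 0


def single_unit(w1, w2, b, female, parent):
    return step(w1 * female + w2 * parent + b)


# Try every single unit with weights and bias in -2.0 .. 2.0 (steps of 0.5).
grid = [v / 2 for v in range(-4, 5)]
solutions = [
    (w1, w2, b)
    for w1, w2, b in product(grid, repeat=3)
    if all(single_unit(w1, w2, b, f, p) == m for (f, p), (_, m) in WORDS.items())
]
print(len(solutions))  # 0: no single unit maps meaning directly onto initial /m/


def two_layer(female, parent):
    both = step(female + parent - 1.5)      # hidden unit on only for "mother"
    neither = step(-female - parent + 0.5)  # hidden unit on only for "man"
    return step(both + neither - 0.5)       # output: /m/ if either hidden unit is on


print([(WORDS[(f, p)][0], two_layer(f, p)) for (f, p) in WORDS])
```

The grid search is only a demonstration. The general result is that XNOR is not linearly separable, so no weights for a single threshold unit can compute it, whatever their values.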
The architecture of most speech production models thus also includes a lemma layer
• The words in their non-phonological, non-semantic form are represented
• Often along with their grammatical features
• Support for this comes from evidence that speakers experiencing the tip-of-the-tongue (TOT) phenomenon are often able to retrieve grammatical features of the word they're searching for, such as its grammatical gender
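A lemma layer separates "which word, with which grammatical properties" from "which sounds". The sketch below is a hypothetical illustration of that split. The `Lemma` class, the `retrieve` function, and its `form_access_ok` flag are invented for the example, and the two Italian nouns are chosen only because they carry grammatical gender. A TOT state is modeled as successful lemma access followed by failed form access.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Lemma:
    concept: str            # link back to the semantic level
    category: str           # grammatical category
    gender: Optional[str]   # grammatical gender, if the language marks it


# Hypothetical Italian fragment: lemmas and phonological forms are stored separately.
LEMMAS = {
    "FORK": Lemma("FORK", "noun", "feminine"),     # la forchetta
    "KNIFE": Lemma("KNIFE", "noun", "masculine"),  # il coltello
}
FORMS = {"FORK": "forchetta", "KNIFE": "coltello"}


def retrieve(concept, form_access_ok=True):
    """Two-step access: concept -> lemma, then lemma -> phonological form."""
    lemma = LEMMAS[concept]
    form = FORMS[concept] if form_access_ok else None
    return lemma, form


# A tip-of-the-tongue state: the lemma (and so its gender) is available, the form is not.
lemma, form = retrieve("FORK", form_access_ok=False)
print(lemma.gender, form)  # feminine None
```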

[Figure: The aphasia model]
Speech errors
• Formal errors
"She knew a Bolshevik" -> "She knew a bolster pillow"
• Semantic errors
"I ate the cake" -> "I took the cake"
• Mixed errors
"I tripped in the bar" -> "I stripped in the bar"
• Nonword errors
"He stubbed his toe" -> "He tubbed his toe."
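The four error types differ on two questions. Is the error a real word? Is it related to the target in form, in meaning, or in both? The sketch below is a minimal version of that decision procedure. It uses letter overlap (longest common subsequence) as a rough stand-in for phonological similarity, and a hand-listed table of semantically related pairs. The lexicon, the relatedness table, the 0.5 threshold, and all function names are hypothetical.

```python
# Hypothetical mini-lexicon; "tubbed" is deliberately absent (a nonword).
LEXICON = {"bolshevik", "bolster", "ate", "took", "tripped", "stripped", "stubbed"}

# Hypothetical table of semantically related pairs. "tripped"/"stripped" is listed
# only so that the example follows the notes' classification of it as a mixed error.
RELATED = {frozenset({"ate", "took"}), frozenset({"tripped", "stripped"})}


def lcs_len(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, bj in enumerate(b):
            cur.append(prev[j] + 1 if ch == bj else max(prev[j + 1], cur[j]))
        prev = cur
    return prev[-1]


def form_similar(a: str, b: str, threshold: float = 0.5) -> bool:
    # Letters stand in for phonemes here, which is only a rough approximation.
    return lcs_len(a, b) / min(len(a), len(b)) >= threshold


def classify(target: str, error: str) -> str:
    if error not in LEXICON:
        return "nonword"
    semantic = frozenset({target, error}) in RELATED
    formal = form_similar(target, error)
    if semantic and formal:
        return "mixed"
    if semantic:
        return "semantic"
    if formal:
        return "formal"
    return "unrelated"


for target, error in [("bolshevik", "bolster"), ("ate", "took"),
                      ("tripped", "stripped"), ("stubbed", "tubbed")]:
    print(target, "->", error, ":", classify(target, error))
```

The nonword check comes first because a nonword cannot stand in a semantic relation to the target, whatever its form.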
Aphasia
• When damage to the brain affects language centers, speech errors of all these types become more frequent
• Depending on the brain area affected, recovery can be more or less likely and more or less complete
The aphasia model
• This is an interactive activation model
o First the semantic layer is activated, which activates the lemma level, which in turn activates the phoneme level
o Activation also feeds back through the layers, until at a given time step the most highly activated lemma is selected
o A burst of activation is then passed from that lemma to the phoneme level
• In the aphasia model, activation decayed at a constant rate, and random noise was added on each connection
• Decreased performance could result from increased noise or decreased connection strengths
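A toy version of the model's two steps (lemma selection, then a burst to the phonemes) can be sketched as below. Everything specific here is an assumption made for illustration: the four-word network (CAT, DOG, RAT, MAT), the semantic features, the values of `p` (connection strength) and `q` (decay rate), Gaussian noise added to each connection's contribution, and the number of time steps. As the notes describe, a lesion is simulated by raising the noise or lowering `p`.

```python
import random

# Hypothetical four-word network: lemma -> (semantic features, phonemes by slot).
LEMMAS = {
    "CAT": (["animal", "pet", "feline"], ["k", "ae", "t"]),
    "DOG": (["animal", "pet", "canine"], ["d", "o", "g"]),
    "RAT": (["animal", "rodent"], ["r", "ae", "t"]),
    "MAT": (["object", "floor"], ["m", "ae", "t"]),
}
SLOTS = {"onset": ["k", "d", "r", "m"], "vowel": ["ae", "o"], "coda": ["t", "g"]}

# Each lemma is linked to its features and phonemes; every link carries activation both ways.
EDGES = [(lemma, node) for lemma, (feats, phones) in LEMMAS.items() for node in feats + phones]
NODES = {node for edge in EDGES for node in edge}


def spread(act, steps, p, q, noise, rng):
    """Synchronous updates: decay by q, then add p * neighbour activation plus noise per link."""
    for _ in range(steps):
        new = {n: (1 - q) * a for n, a in act.items()}
        for lemma, node in EDGES:
            new[lemma] += p * act[node] + rng.gauss(0.0, noise)
            new[node] += p * act[lemma] + rng.gauss(0.0, noise)
        act = new
    return act


def produce(target, p=0.1, q=0.5, noise=0.0, steps=8, rng=None):
    if rng is None:
        rng = random.Random(0)
    act = {n: 0.0 for n in NODES}
    for feature in LEMMAS[target][0]:
        act[feature] += 1.0                          # jolt to the target's semantic features
    act = spread(act, steps, p, q, noise, rng)
    lemma = max(LEMMAS, key=lambda name: act[name])  # select the most active lemma
    act[lemma] += 1.0                                # burst of activation from that lemma
    act = spread(act, steps, p, q, noise, rng)
    phones = [max(cands, key=lambda ph: act[ph]) for cands in SLOTS.values()]
    return lemma, phones


def error_rate(p, noise, trials=200, seed=1):
    rng = random.Random(seed)
    correct = ("CAT", ["k", "ae", "t"])
    return sum(produce("CAT", p=p, noise=noise, rng=rng) != correct for _ in range(trials)) / trials


print(produce("CAT"))  # ('CAT', ['k', 'ae', 't']) with no noise
for p, noise in [(0.1, 0.02), (0.1, 0.1), (0.05, 0.02)]:  # baseline, noisier, weaker links
    print(f"p={p} noise={noise} error rate={error_rate(p, noise)}")
```

With the noise set to 0, the intact toy network says CAT as /k ae t/. DOG shares meaning with CAT, and RAT and MAT share sounds with it, so noisy runs can end in semantic, mixed, or formal substitutions, or in nonword strings at the phoneme step. The printed error rates depend entirely on the arbitrary parameters and seed.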
The phonological error model
The aphasia model fails to account for the dynamic nature of the phonological output.
The phonological error model described in Dell et al., however, produces a dynamic output
• The input is a lexical representation; this could be:
o A representation of a lemma
o A semantic representation
o A representation of underlying phonological structure
• There is a hidden layer
• The output is a pattern of activation over 18 phonological features for the particular phone to be produced
• Two sets of context nodes make exact copies of the output units (the external context) and the hidden units (the internal context), and feed these copies back into the hidden units

For example, to produce the word "dog" /dag/:
1. The static lexical representation is input, with the internal context nodes set to 0 and the external context nodes set to 0.5 (which represents a word boundary)
2. The hidden units become active to some degree
3. This activates the output, which matches more or less closely the first sound in the word, /d/
4. Through backpropagation, the weights are adjusted so that the actual output moves closer to the desired output
5. The output and hidden layer activations are copied to the context nodes
6. The hidden units become active again, now with input from the lexical nodes and both sets of context nodes.
7. The output approximates the second sound, /a/, more or less well
8. The process repeats for the final sound, /g/
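Steps 1 to 8 can be sketched as a small simple recurrent network in plain Python. This is a minimal illustration, not Dell et al.'s implementation. The 4-feature phone codes (instead of 18), the 2-unit lexical input, the hidden layer size, the learning rate, and the number of training passes are all assumptions. In this sketch the context nodes are treated as ordinary inputs at each step, and the error signal is not propagated back through earlier time steps.

```python
import math
import random

# Hypothetical 4-feature phone codes; the model in the notes used 18 features per phone.
PHONES = {"d": [1, 0, 1, 0], "a": [0, 1, 0, 1], "g": [1, 0, 0, 1]}
WORD = ["d", "a", "g"]      # /dag/ "dog"
LEXICAL = [1.0, 0.0]        # hypothetical static lexical code for "dog"

N_LEX, N_HID, N_OUT = len(LEXICAL), 6, 4
N_IN = N_LEX + N_HID + N_OUT  # lexical input + internal context + external context
LR = 0.5

rng = random.Random(0)
# Each weight row has one extra entry for a bias input fixed at 1.0.
W_ih = [[rng.uniform(-0.5, 0.5) for _ in range(N_IN + 1)] for _ in range(N_HID)]
W_ho = [[rng.uniform(-0.5, 0.5) for _ in range(N_HID + 1)] for _ in range(N_OUT)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def layer(weights, inputs):
    xs = inputs + [1.0]
    return [sigmoid(sum(w * x for w, x in zip(row, xs))) for row in weights]


def run_word(train):
    internal = [0.0] * N_HID  # step 1: internal context (copy of hidden) starts at 0
    external = [0.5] * N_OUT  # step 1: external context (copy of output) 0.5 = word boundary
    outputs = []
    for phone in WORD:
        x = LEXICAL + internal + external
        h = layer(W_ih, x)    # steps 2 and 6: hidden units become active
        y = layer(W_ho, h)    # steps 3 and 7: output approximates the current phone
        if train:             # step 4: one backpropagation step toward the target features
            t = PHONES[phone]
            d_out = [(tk - yk) * yk * (1 - yk) for tk, yk in zip(t, y)]
            d_hid = [h[j] * (1 - h[j]) * sum(d_out[k] * W_ho[k][j] for k in range(N_OUT))
                     for j in range(N_HID)]
            for k, row in enumerate(W_ho):
                for j, hj in enumerate(h + [1.0]):
                    row[j] += LR * d_out[k] * hj
            for j, row in enumerate(W_ih):
                for i, xi in enumerate(x + [1.0]):
                    row[i] += LR * d_hid[j] * xi
        outputs.append(y)
        internal, external = h, y  # step 5: copy activations into the context nodes
    return outputs


def nearest_phone(features):
    return min(PHONES, key=lambda p: sum((a - b) ** 2 for a, b in zip(PHONES[p], features)))


for _ in range(3000):
    run_word(train=True)

print([nearest_phone(y) for y in run_word(train=False)])
```

After training, decoding each output vector to the nearest phone code should give d, a, g. The only thing that tells the network it is at the first sound is the context: zeros in the internal copy and 0.5 on every external unit.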
This model is not only able to learn to produce the dynamic activation patterns that correspond to speech production, but can also model speech errors.
General properties of speech errors
• They tend to preserve phonotactics
nine cats -> kine cats, not *ŋine cats (word-initial /ŋ/ is not allowed in English)
• They reflect syllable structure: errors tend to affect onsets rather than codas or rimes
nine cats -> kine cats, not nites cats
• Exchanges of speech sounds are common
nine cats -> kine nats
These have been used as evidence for a distinction between phonological content (the set of segments) and the phonological frame it occurs in (which may include syllabic and stress information).
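The content/frame distinction can be illustrated with a small sketch. Each syllable is a frame with an onset slot and a rime slot, and segments are the content that fills those slots. The `Syllable` class, the `exchange` and `anticipate` helpers, and the rough phonemic spellings ("ine", "kats") are hypothetical. Errors here only move fillers between slots of the same type, so an onset can never end up in a rime slot.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Syllable:
    onset: str  # content filling the frame's onset slot
    rime: str   # content filling the frame's rime slot

    def spell(self) -> str:
        return self.onset + self.rime


def exchange(phrase, slot):
    """Swap the fillers of one slot type between the first two syllables."""
    a, b = phrase[0], phrase[1]
    a2 = replace(a, **{slot: getattr(b, slot)})
    b2 = replace(b, **{slot: getattr(a, slot)})
    return [a2, b2] + phrase[2:]


def anticipate(phrase, slot):
    """Copy the second syllable's filler into the same slot of the first syllable."""
    return [replace(phrase[0], **{slot: getattr(phrase[1], slot)})] + phrase[1:]


def say(phrase):
    return " ".join(s.spell() for s in phrase)


nine_cats = [Syllable("n", "ine"), Syllable("k", "ats")]  # rough phonemic spelling
print(say(exchange(nine_cats, "onset")))    # kine nats
print(say(anticipate(nine_cats, "onset")))  # kine kats
```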
The phonological error model displays the first two of these error properties.
• Phonotactics is preserved simply because the network has been trained on a large number of words of the given language, so even its errors follow the well-formed sound sequences it has encoded.
• Onsets are most likely to be affected because
o At the start of a word the context layers carry no information about preceding sounds (only the word-boundary signal), which makes the output less constrained
o There are more possible sounds in onset position, so there is less certainty about which one it will be.
However, there are cases where a model that represents syllable position explicitly, like the aphasia model, would be more useful:
• Complex onsets are treated in errors much like simple onsets, which the phonological error model does not predict
• Exchanges are not predicted by the phonological error model.