Linguistics 431/631: Connectionist language modeling

Ben Bergen


Meeting 9: Speech perception

October 25, 2006


Speech perception


The basic problem: how do you go from an acoustic input to retrieving morphemes and words?


As with production, there are two approaches to speech perception: feed-forward and interactive activation.


TRACE: An early and influential interactive activation model (McClelland & Elman 1986).


Today: how TRACE works. Next week: A recurrent model of speech perception.


Architecture of TRACE


Three layers: feature level, phoneme level and word level.

•      The feature level has 7 features: power, vocalicness, diffuseness, acuteness, consonantal, voicing, and burst amplitude. A set of values (ranging from 0-8) on each of these characterizes the speech input at a given time slice. Units representing these 7 features and their 8 values are replicated at each time slice, up to the maximum number of time slices for a word.

•      The phoneme units each span 6-11 time slices and start every three time slices.

•      The word units start every 3 slices, and cover a span corresponding to the length of the word.
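The tiling of units over time can be sketched in a few lines. This is only an illustration of the "start every 3 slices" layout described above: the trace length of 24 slices is a made-up value, the span of 6 is the shortest span mentioned in the notes, and the function names are hypothetical.

```python
# Sketch of how copies of a phoneme unit tile the trace over time.
N_SLICES = 24      # assumed trace length in time slices (illustrative only)
UNIT_SPACING = 3   # phoneme and word units start every 3 slices (from the notes)
PHONEME_SPAN = 6   # shortest phoneme span given in the notes (6-11 slices)

def unit_starts(n_slices, spacing):
    """Start slices of the successive copies of one unit."""
    return list(range(0, n_slices, spacing))

def units_covering(slice_index, starts, span):
    """Start slices of the copies whose span includes slice_index."""
    return [s for s in starts if s <= slice_index < s + span]

starts = unit_starts(N_SLICES, UNIT_SPACING)
print(starts)                                   # [0, 3, 6, 9, 12, 15, 18, 21]
print(units_covering(7, starts, PHONEME_SPAN))  # [3, 6]
```

Because the span is longer than the spacing, every time slice is covered by more than one copy of each phoneme unit, so adjacent copies overlap in time.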


Each unit inhibits incompatible units at the same level and has excitatory connections to compatible units at other levels. All connections are bi-directional.
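A minimal sketch of how one unit's activation might change under these excitatory and inhibitory connections, using an update rule of the kind standard in interactive activation models. The parameter values (maximum, minimum, resting level, decay) are placeholders, not TRACE's actual settings, and the function name is hypothetical.

```python
def update_activation(a, net, a_max=1.0, a_min=-0.2, rest=0.0, decay=0.1):
    """One interactive-activation update for a single unit.

    a   : current activation of the unit
    net : summed excitatory input minus summed inhibitory input
    All default parameter values are illustrative assumptions.
    """
    if net > 0:
        # Excitation pushes the unit toward its maximum.
        delta = net * (a_max - a)
    else:
        # Inhibition pushes the unit toward its minimum.
        delta = net * (a - a_min)
    # Decay pulls the unit back toward its resting level.
    delta -= decay * (a - rest)
    # Keep the activation inside its allowed range.
    return min(a_max, max(a_min, a + delta))

print(update_activation(0.0, 0.5))   # excited unit rises
print(update_activation(0.5, -0.5))  # inhibited unit falls
```

The effect of a given input shrinks as the unit approaches its ceiling or floor, which keeps activations bounded however strong the input is.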


Featural input is presented one slice at a time, from left to right.

Properties of TRACE


Phonotactic rule effects

•      Subjects presented with a stimulus containing an ambiguous segment, making it ambiguous between two non-words, tend to identify the phoneme as conforming to the language's phonotactics.

•      E.g. presented with something in between /sli/ and /sri/, subjects will be more likely to identify the second segment as /l/.

•      TRACE also displays this behavior, because a sequence of sounds will partially activate lexical representations for words with those phonemes (in this case words like sled and slip), and these will in turn activate the /l/ at the phoneme level.
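The idea that the apparent phonotactic "rule" falls out of the lexicon can be illustrated by counting how many words support each candidate segment. This is a crude stand-in for graded activation, not TRACE itself: the toy lexicon is made up, spellings stand in for phoneme strings, and the function name is hypothetical.

```python
# Hypothetical toy lexicon; spellings stand in for phoneme strings.
TOY_LEXICON = ["sled", "slip", "sleep", "slow", "shrimp", "street"]

def lexical_support(prefix, candidates, lexicon):
    """Count the words that begin with prefix + candidate, per candidate."""
    return {
        c: sum(word.startswith(prefix + c) for word in lexicon)
        for c in candidates
    }

support = lexical_support("s", ["l", "r"], TOY_LEXICON)
print(support)  # {'l': 4, 'r': 0}
```

No word in the lexicon begins with "sr", so only /l/ receives top-down support, and no explicit phonotactic rule needs to be stated.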


Lexical effects on phoneme perception.

•      Given a lexicon containing plug, plus, blush, and blood, an ambiguous input {p,b}lug will in humans and in TRACE be perceived as having an initial p.

o      In TRACE, this is due to activation fed back from the plug node at the word level to the p, l, u, and g at the phoneme level.
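The feedback account can be sketched as a toy simulation with the four-word lexicon above. This is a heavily simplified illustration rather than the TRACE implementation: only the initial /p/ and /b/ units are updated dynamically, the later segments are treated as fixed clear input, and the phoneme codes (^ for the vowel, S for the final sound of blush), weights, and number of steps are all assumptions chosen for the example.

```python
# Toy interactive-activation run for the ambiguous input {p,b}lug.
WORDS = {"plug": "pl^g", "plus": "pl^s", "blush": "bl^S", "blood": "bl^d"}
CLEAR_SEGMENTS = "l^g"  # segments 2-4 of the input, heard unambiguously

def positive(x):
    """Only positively activated units send output."""
    return max(x, 0.0)

def step(a, net, decay=0.1, a_max=1.0, a_min=-0.2):
    """One interactive-activation update of a single unit (resting level 0)."""
    delta = net * (a_max - a) if net > 0 else net * (a - a_min)
    return min(a_max, max(a_min, a + delta - decay * a))

def run(feedback, n_steps=30):
    """Return the final activations of the initial /p/ and /b/ units."""
    phon = {"p": 0.0, "b": 0.0}
    word = {w: 0.0 for w in WORDS}
    for _ in range(n_steps):
        new_word = {}
        for w, segs in WORDS.items():
            # Bottom-up support: matching clear segments plus the first phoneme.
            match = sum(s == c for s, c in zip(segs[1:], CLEAR_SEGMENTS))
            bottom_up = 0.1 * (match + positive(phon[segs[0]]))
            # Within-level inhibition from the other word units.
            rivals = sum(positive(a) for v, a in word.items() if v != w)
            new_word[w] = step(word[w], bottom_up - 0.3 * rivals)
        new_phon = {}
        for p, rival in (("p", "b"), ("b", "p")):
            # Top-down feedback from words that begin with this phoneme.
            top_down = feedback * sum(
                positive(a) for w, a in word.items() if WORDS[w][0] == p
            )
            # Equal bottom-up input (0.05) to both, since the segment is ambiguous.
            net = 0.05 + top_down - 0.2 * positive(phon[rival])
            new_phon[p] = step(phon[p], net)
        phon, word = new_phon, new_word
    return phon

with_feedback = run(feedback=0.3)
without_feedback = run(feedback=0.0)
print(with_feedback["p"] > with_feedback["b"])
print(without_feedback["p"] == without_feedback["b"])
```

In this sketch the only asymmetry between /p/ and /b/ is that plug matches the final segment, so /p/ wins only when word-to-phoneme feedback is switched on. With feedback set to 0 the two units receive identical input and stay tied.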

•      Lexical effects on word-initial targets are seen with some delay in both humans and TRACE.

o      In TRACE, this occurs simply because the word node takes longer to activate when early input is ambiguous.

•      Lexical effects are strongest at the ends of words in both humans and TRACE.

o      In TRACE, this is because when the initial segments are unambiguous, the correct word is already strongly activated when the final segment is presented.