Linguistics 431/631: Connectionist language modeling

Ben Bergen

 

Meeting 9: Speech perception

October 25, 2006

 

Speech perception

 

The basic problem: how do you go from an acoustic input to retrieving morphemes and words?

 

Like production, two approaches to speech perception: feed-forward & interactive activation

 

TRACE: An early and influential interactive activation model (McClelland & Elman 1986).

 

Today: how TRACE works. Next week: A recurrent model of speech perception.

 

Architecture of TRACE

 

Three layers: feature level, phoneme level and word level.

      The feature level has 7 features: power, vocalicness, diffuseness, acuteness, consonantal, voicing, burst amplitude. A set of values (ranging from 0-8), one per feature, characterizes the speech input at a given time slice. The units representing these 7 features and their 8 values are replicated across time, with one copy per time slice up to the maximum number of time slices for a word.

      The phoneme units each span 6-11 time slices and start every three time slices.

      The word units start every 3 slices, and cover a span corresponding to the length of the word.
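The layout of units over time can be sketched in a few lines of Python. This is only an illustration of the bookkeeping described above, not the original TRACE code: the function names are hypothetical, the span of 6 slices is just one value picked from the 6-11 range, and the number of values per feature is left as a parameter.

```python
FEATURES = ["power", "vocalicness", "diffuseness", "acuteness",
            "consonantal", "voicing", "burst amplitude"]
STEP = 3  # phoneme and word units start every 3 time slices (from the notes)

def n_feature_units(n_slices, n_values):
    """Feature units: one per feature, per value, per time slice."""
    return len(FEATURES) * n_values * n_slices

def unit_windows(n_slices, span, step=STEP):
    """(start, end) slice windows for the copies of one phoneme or word unit.
    `end` is exclusive and is clipped to the end of the trace."""
    return [(s, min(s + span, n_slices)) for s in range(0, n_slices, step)]

def units_covering(t, windows):
    """Windows whose span includes time slice t."""
    return [w for w in windows if w[0] <= t < w[1]]

# Assumed toy trace: 12 slices, each phoneme copy spanning 6 slices.
windows = unit_windows(n_slices=12, span=6)
print(windows)                     # [(0, 6), (3, 9), (6, 12), (9, 12)]
print(units_covering(4, windows))  # [(0, 6), (3, 9)]
```

Because copies start every 3 slices but span more than 3, any given slice falls under several overlapping copies of the same phoneme, which then compete with one another.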

 

Each unit inhibits incompatible units at the same level and has excitatory connections to compatible units at other levels. All connections are bi-directional.
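A minimal sketch of one update cycle under this connectivity is given below. The notes do not spell out the update rule, so the code assumes the usual interactive activation form (net input scaled by the distance to the ceiling or floor, minus decay toward rest); the parameter values, unit names, and function names are all illustrative assumptions.

```python
A_MAX, A_MIN, REST, DECAY = 1.0, -0.2, 0.0, 0.1   # illustrative values

def net_input(unit, acts, excitors, inhibitors, w_exc=0.1, w_inh=0.1):
    """Excitation from compatible units at other levels minus inhibition
    from incompatible units at the same level. Only units with positive
    activation send any signal."""
    sent = lambda u: max(acts[u], 0.0)
    return (w_exc * sum(sent(u) for u in excitors[unit])
            - w_inh * sum(sent(u) for u in inhibitors[unit]))

def update(a, net):
    """Assumed interactive activation update for a single unit."""
    if net > 0:
        delta = net * (A_MAX - a)      # push toward the ceiling
    else:
        delta = net * (a - A_MIN)      # push toward the floor
    delta -= DECAY * (a - REST)        # decay back toward rest
    return max(A_MIN, min(A_MAX, a + delta))

# Toy network: /p/ and /b/ compete; the word "plug" and /p/ excite each other.
acts = {"/p/": 0.3, "/b/": 0.3, "plug": 0.5}
excitors = {"/p/": ["plug"], "/b/": [], "plug": ["/p/"]}
inhibitors = {"/p/": ["/b/"], "/b/": ["/p/"], "plug": []}

new_acts = {u: update(a, net_input(u, acts, excitors, inhibitors))
            for u, a in acts.items()}
print(new_acts)
```

In this toy case /p/ and /b/ start out equal, but after one cycle /p/ is ahead because it alone receives support from the word level.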

 

Featural input is presented one slice at a time, from left to right.


Properties of TRACE

 

Phonotactic rule effects

      Subjects presented with a stimulus containing an ambiguous segment, one that leaves the stimulus ambiguous between two non-words, tend to identify the phoneme as the one that conforms to the language's phonotactics.

      E.g. presented with something in between /sli/ and /sri/, subjects will be more likely to identify the second segment as /l/.

      TRACE also displays this behavior, because a sequence of sounds partially activates the lexical representations of words containing those phonemes (in this case words like sled and slip), and these in turn activate the /l/ at the phoneme level.
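The idea that partially matching words jointly favor /l/ can be approximated with a crude one-pass calculation. This is a static stand-in for the feedback, not the TRACE dynamics: the mini-lexicon, the use of spelling in place of phonemes, the overlap measure, and the threshold are all assumptions made for illustration.

```python
# Hypothetical mini-lexicon, spelled rather than transcribed.
LEXICON = ["sled", "slip", "slit", "sit", "rip"]

def overlap(word, candidate):
    """Fraction of the candidate's positions that the word matches."""
    return sum(a == b for a, b in zip(word, candidate)) / len(candidate)

def top_down_support(candidates, position, lexicon=LEXICON, threshold=0.5):
    """Summed partial activation of words that contain each candidate's
    segment at the ambiguous position."""
    support = {}
    for cand in candidates:
        segment = cand[position]
        support[segment] = sum(
            overlap(w, cand) for w in lexicon
            if w[position:position + 1] == segment
            and overlap(w, cand) >= threshold)
    return support

support = top_down_support(["sli", "sri"], position=1)
print(support)
```

Here /l/ collects support from sled, slip, and slit, while /r/ collects none, because no word in the toy lexicon has r in second position. No phonotactic rule is stated anywhere; the preference falls out of the lexicon.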

 

Lexical effects on phoneme perception.

      Given a lexicon containing plug, plus, blush, and blood, an ambiguous input {p,b}lug is perceived, by humans and by TRACE alike, as having an initial p.

o      In TRACE, this is due to activation fed back from the plug node at the word level to the p, l, u, and g at the phoneme level.

      Lexical effects on word-initial targets are seen with some delay in both humans and TRACE.

o      In TRACE, this occurs simply because the word node takes longer to activate when early input is ambiguous.

      Lexical effects are strongest at the ends of words in both humans and TRACE.

o      In TRACE, this is because when the initial segments are unambiguous, the correct word is already strongly activated when the final segment is presented.
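The {p,b}lug case can be played out in a toy interactive activation network. The sketch below is not the TRACE implementation: it uses one phoneme unit per position instead of units replicated over time slices, the phoneme spellings ("^" for the vowel, "S" for "sh") and all parameter values are assumptions, and the update rule is the assumed interactive activation form. Segments arrive one after another, left to right.

```python
LEXICON = {                       # assumed phoneme spellings
    "plug":  ("p", "l", "^", "g"),
    "plus":  ("p", "l", "^", "s"),
    "blush": ("b", "l", "^", "S"),
    "blood": ("b", "l", "^", "d"),
}
# Bottom-up support per position; position 0 is ambiguous between p and b.
INPUT = [{"p": 0.5, "b": 0.5}, {"l": 1.0}, {"^": 1.0}, {"g": 1.0}]

EXT, W_UP, W_DOWN, INH, DECAY = 0.2, 0.1, 0.1, 0.1, 0.1   # illustrative
A_MAX, A_MIN, REST = 1.0, -0.2, 0.0
STEPS_PER_SEGMENT = 10            # cycles between successive segment onsets

def update(a, net):
    """Assumed interactive activation update for a single unit."""
    delta = net * (A_MAX - a) if net > 0 else net * (a - A_MIN)
    return max(A_MIN, min(A_MAX, a + delta - DECAY * (a - REST)))

def simulate(n_steps=60):
    units = sorted({(i, ph) for seq in LEXICON.values() for i, ph in enumerate(seq)}
                   | {(i, ph) for i, seg in enumerate(INPUT) for ph in seg})
    phon = {u: 0.0 for u in units}
    word = {w: 0.0 for w in LEXICON}
    history = []                  # (activation of p, activation of b) per cycle
    for step in range(n_steps):
        out_p = {u: max(a, 0.0) for u, a in phon.items()}
        out_w = {w: max(a, 0.0) for w, a in word.items()}
        new_phon = {}
        for (i, ph), a in phon.items():
            heard = step >= i * STEPS_PER_SEGMENT      # has segment i arrived?
            ext = EXT * INPUT[i].get(ph, 0.0) if heard else 0.0
            top_down = W_DOWN * sum(out_w[w] for w, seq in LEXICON.items()
                                    if seq[i] == ph)
            rivals = INH * sum(o for (j, q), o in out_p.items()
                               if j == i and q != ph)
            new_phon[(i, ph)] = update(a, ext + top_down - rivals)
        new_word = {}
        for w, seq in LEXICON.items():
            bottom_up = W_UP * sum(out_p[(i, ph)] for i, ph in enumerate(seq))
            rivals = INH * sum(o for v, o in out_w.items() if v != w)
            new_word[w] = update(word[w], bottom_up - rivals)
        phon, word = new_phon, new_word
        history.append((phon[(0, "p")], phon[(0, "b")]))
    return phon, word, history

phon, word, history = simulate()
print("p:", round(phon[(0, "p")], 3), "b:", round(phon[(0, "b")], 3))
print("most active word:", max(word, key=word.get))
```

In this sketch p and b stay tied until the final g arrives, since up to that point the input supports p-words and b-words equally. Once g arrives, plug pulls ahead at the word level and its feedback tips the initial segment toward p, which is the delayed lexical effect on word-initial targets described above.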