Ling 431/631: Corpus Linguistics

Ben Bergen

 

Meeting 11: Norming stimuli

October 29, 2007

 

Norm!

 

Linguistic stimuli to be presented to participants are used in most linguistic methodologies: laboratory-based, survey-based, elicitation-based, even introspection-based.

 

These stimuli, being by definition non-identical, will differ along a number of dimensions that may influence results, depending on the method. Some relevant dimensions of variation [vaguely from easiest to hardest to norm yourself]:

 

 

You can probably come up with tasks in which each of these might be a factor.

 

 

Work that's been done for you

 

The MRC Psycholinguistic Database [http://www.psych.rl.ac.uk/] contains 150,837 words with up to 26 linguistic and psycholinguistic attributes for each:

 


Number of letters

Number of phonemes

Number of syllables

Brown written frequency

Brown number of categories

Brown number of samples

Thorndike-Lorge (T-L) written frequency

Brown verbal frequency

Familiarity rating

Concreteness rating

Imageability rating

Meaningfulness [Colorado]

Meaningfulness [Paivio]

Age of acquisition rating

Type [variant of other word]

POS [10 categories]

Common POS [N/V/Adj/Other]

Alphasyll [pref/suff/abbrv/hyph]

Status [colloquial/dialect/alien]

Variant phoneme

Capitalization of words

Irregular plural

Phonetic transcription

Edited phonetic transcription

Stress pattern
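Once entries have been exported from a database like this, building a matched stimulus set often comes down to filtering on several attributes at once. The sketch below makes several assumptions. Each entry is assumed to be already loaded as a Python record carrying three of the attributes above: number of letters, number of syllables, and concreteness rating. The `Entry` type, its field names, and the treatment of a 0 rating as "missing" are all hypothetical. The concreteness values are invented for illustration and are not real MRC ratings.

```python
from dataclasses import dataclass


@dataclass
class Entry:
    # Hypothetical record: a word plus three MRC-style attributes.
    # In this sketch a concreteness of 0 means "no rating available" (an assumption).
    word: str
    letters: int
    syllables: int
    concreteness: int


def select(entries, letters, syllables, conc_min, conc_max):
    """Return entries with the given length and syllable count whose
    concreteness rating is present and falls in [conc_min, conc_max]."""
    return [
        e for e in entries
        if e.letters == letters
        and e.syllables == syllables
        and e.concreteness != 0
        and conc_min <= e.concreteness <= conc_max
    ]


# Invented concreteness values, for illustration only (not real MRC ratings).
lexicon = [
    Entry("apple", 5, 2, 600),
    Entry("truth", 5, 1, 300),
    Entry("chair", 5, 1, 600),
    Entry("table", 5, 2, 0),
]

concrete_monosyllables = select(lexicon, letters=5, syllables=1, conc_min=500, conc_max=700)
print([e.word for e in concrete_monosyllables])  # ['chair']
```

Running the same filter with different criteria for each condition yields candidate sets that are matched by construction on the filtered attributes. Any remaining attributes still need to be checked.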


 

The English Lexicon Project [http://elexicon.wustl.edu/] includes length [in letters, phonemes, syllables, and morphemes], frequency, orthographic neighborhood size, bigram frequency, pronunciation, and POS, as well as:

 

 

Adam Kilgarriff's BNC word frequency counts [http://www.kilgarriff.co.uk/bnc-readme.html] provide:

 

sort-order   frequency   word      word-class
5            2186369     a         det
2107         4249        abandon   v
5204         1110        abbey     n
966          10468       ability   n
321          30454       able      a
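Raw counts like these are heavily skewed (compare *a* with *abbey*), so it is common to compare log-transformed frequencies when norming. The minimal sketch below assumes a file laid out like the table above: four whitespace-separated fields per line, in the order sort-order, frequency, word, word-class. The function names are hypothetical. The same word form can appear with more than one word class, so entries are keyed by (word, word-class).

```python
import math


def parse_freq_list(lines):
    """Parse lines of the form 'sort-order frequency word word-class'
    into a dict mapping (word, word-class) -> raw frequency."""
    freqs = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 4:
            continue  # skip blank or malformed lines
        _sort_order, freq, word, pos = fields
        freqs[(word, pos)] = int(freq)
    return freqs


def log_freq(freqs, word, pos):
    """Return log10(frequency + 1); the +1 maps unattested words to 0."""
    return math.log10(freqs.get((word, pos), 0) + 1)


# The five rows from the table above.
sample = """\
5 2186369 a det
2107 4249 abandon v
5204 1110 abbey n
966 10468 ability n
321 30454 able a
""".splitlines()

freqs = parse_freq_list(sample)
print(freqs[("abbey", "n")])                     # 1110
print(round(log_freq(freqs, "abbey", "n"), 2))   # 3.05
```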

     

 

Norm it yourself

 

Some things you can do

 

 

Statistical Analysis

 

The goal of norming is to determine that two or more sets of stimuli are not significantly different along some dimension or set of dimensions.
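As a minimal sketch of such a check, the code below compares two stimulus sets on one dimension. It uses a two-sided permutation test on the difference in means, written with only the Python standard library. A t-test or ANOVA is the more familiar choice; the permutation test is swapped in here only because it needs no external packages. The function name and the log-frequency values are hypothetical. Keep in mind that a non-significant result means no difference was detected, not that the sets are shown to be identical, so it is worth reporting the set means as well.

```python
import random
from statistics import mean


def permutation_test(a, b, n_perm=10000, seed=0):
    """Two-sided permutation test for a difference in means between a and b.
    Returns (observed difference in means, estimated p-value)."""
    rng = random.Random(seed)  # seeded so results are reproducible
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # randomly reassign items to the two sets
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        # small tolerance so floating-point noise doesn't hide ties
        if abs(diff) >= abs(observed) - 1e-12:
            extreme += 1
    # +1 in numerator and denominator counts the observed labelling itself
    p = (extreme + 1) / (n_perm + 1)
    return observed, p


# Hypothetical log10 frequencies for two candidate stimulus sets.
set_a = [3.1, 2.8, 3.4, 2.9, 3.0, 3.3]
set_b = [3.0, 3.2, 2.7, 3.1, 2.9, 3.2]

diff, p = permutation_test(set_a, set_b)
print(f"difference in means = {diff:.3f}, p = {p:.3f}")
```

The same function can be run once per dimension (length, frequency, concreteness, and so on). When several dimensions are checked, keep in mind that each additional test adds another chance for a spurious result.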