Ling 431/631: Corpus Linguistics

Ben Bergen

 

Meeting 7: Syntax I

October 1, 2007

 

Corpora have been widely used to study syntax. This can be done with untagged or part-of-speech-tagged corpora. However, a part-of-speech-tagged corpus makes word searches more powerful, and in the same way a parsed corpus makes syntactic searches more powerful. This week we'll concentrate on what you can do with an unparsed corpus, and then we'll move on to parsed corpora.

 

Some syntactic phenomena that you can straightforwardly study using a corpus-based approach

 

Heaviness and Newness (Arnold et al., 2000)

 

Word order is variable, even in a language with relatively rigid word order like English:

 

a. The waiter brought the wine we had ordered to the table.

b. The waiter brought to the table the wine we had ordered.

a. Chris gave a bowl of Mom’s traditional cranberry sauce to Terry.

b. Chris gave Terry a bowl of Mom’s traditional cranberry sauce.

a. Sandy picked the freshly baked apple pie up.

b. Sandy picked up the freshly baked apple pie.

 

Two factors have been invoked to explain which of these orders speakers choose: heaviness and newness.

 

"Put more grammatically complex or longer elements later (to facilitate processing/production)"

It turns out that different definitions of heaviness all correlate strongly. So in practice it makes little difference whether you define heaviness as the number of words or as the number of nodes in a hierarchical structure. (Good news if you don't have a parsed corpus.)
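Because the measures correlate, heaviness can be approximated from raw text alone. Below is a minimal sketch that assumes heaviness is simply the number of whitespace-separated words. The function names `heaviness` and `heavier_last_order` are hypothetical. The ordering rule applies only the heaviness principle above and ignores newness.

```python
def heaviness(constituent: str) -> int:
    """Heaviness operationalized as the number of whitespace-separated words."""
    return len(constituent.split())


def heavier_last_order(x: str, y: str) -> tuple:
    """Order two constituents so the heavier one comes last.

    If x is strictly heavier than y, swap them; otherwise (including ties)
    keep the input order. Newness is not considered here.
    """
    return (y, x) if heaviness(x) > heaviness(y) else (x, y)


# Constituents from the waiter example: direct object vs. PP.
wine = "the wine we had ordered"
table = "to the table"
print(heaviness(wine), heaviness(table))  # 5 3
print(heavier_last_order(wine, table))    # ('to the table', 'the wine we had ordered')
```

For the first example pair, the rule picks order (b). This is because "the wine we had ordered" (5 words) is heavier than "to the table" (3 words).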

"Put new things later (according to general tendency for topic-comment structure)"

The authors hand-coded each NP as given (mentioned in the previous discourse), inferable (inferable from the previous discourse), or new (not mentioned in the previous discourse).
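The newness principle can be stated over these hand codes. Below is a minimal sketch. It assumes that the three categories form an ordinal scale with inferable falling between given and new, which is an assumption the notes do not state. `Givenness` and `newer_last_order` are hypothetical names, and the codes themselves still come from a human coder.

```python
from enum import IntEnum
from typing import Tuple


class Givenness(IntEnum):
    # ASSUMED ordinal scale: a higher value means newer information.
    GIVEN = 0      # mentioned in the previous discourse
    INFERABLE = 1  # inferable from the previous discourse
    NEW = 2        # not mentioned in the previous discourse


Constituent = Tuple[str, Givenness]  # (text, hand-coded information status)


def newer_last_order(a: Constituent, b: Constituent) -> Tuple[Constituent, Constituent]:
    """Order two constituents so the newer one comes last; ties keep input order."""
    return (b, a) if a[1] > b[1] else (a, b)


# Cranberry-sauce example, assuming a context where Terry is given and the bowl is new.
terry = ("Terry", Givenness.GIVEN)
bowl = ("a bowl of Mom's traditional cranberry sauce", Givenness.NEW)
first, second = newer_last_order(bowl, terry)
print(first[0], "|", second[0])  # Terry | a bowl of Mom's traditional cranberry sauce
```

In this context, the newness principle predicts the double-object order (b), "Chris gave Terry a bowl of Mom's traditional cranberry sauce."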

 

To tease apart the contributions of heaviness and newness, the authors combine corpus and laboratory studies. We'll focus on the corpus study.

 

 

 

Corpus study

 

Used the Aligned-Hansard corpus

 

Searched for examples of heavy NP shift (HNPS) and the dative alternation (DA)
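In an unparsed corpus, one practical way to gather candidate DA tokens is to pull concordance (KWIC) lines for the target verb and then hand-code each line. The notes do not spell out the authors' actual search procedure. What follows is only a sketch of such a retrieval step, assuming plain text that is already split into sentences. The regex `GIVE` and the helper `kwic` are hypothetical. HNPS is harder to retrieve this way because it is not tied to a single verb.

```python
import re
from typing import Iterable, Iterator, Tuple

# Hypothetical pattern: the inflected forms of "give" as whole words.
GIVE = re.compile(r"\b(?:give|gives|gave|given|giving)\b", re.IGNORECASE)


def kwic(sentences: Iterable[str], pattern: "re.Pattern", width: int = 4) -> Iterator[Tuple[str, str, str]]:
    """Yield (left context, keyword, right context) for each match,
    keeping up to `width` words on each side for later hand-coding."""
    for sentence in sentences:
        for m in pattern.finditer(sentence):
            left = sentence[:m.start()].split()[-width:] if width else []
            right = sentence[m.end():].split()[:width]
            yield " ".join(left), m.group(0), " ".join(right)


sentences = [
    "Chris gave a bowl of Mom's traditional cranberry sauce to Terry.",
    "Chris gave Terry a bowl of Mom's traditional cranberry sauce.",
    "The waiter brought the wine we had ordered to the table.",
]
for left, kw, right in kwic(sentences, GIVE):
    print(f"{left:>20} [{kw}] {right}")
```

The pattern deliberately over-collects: each retrieved line still has to be checked by hand to decide whether it is a dative, a double-object, or neither.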

 

Procedure

 

Results

 

For give:

  • When the theme is shorter than the goal, the dative is always used (give a book to your three cousins). When the theme is longer than the goal, the double-object construction is nearly always used (give Mary the new book).
  • Newness plays a role when the two are equal in length: the new constituent goes later.

For HNPS:

  • When the direct object (DO) is longer than the PP, the DO tends to go later, and vice versa.
  • Newer DOs tend to go later.
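Results like these come from cross-tabulating each hand-coded token's construction against the relative length of its two constituents. Below is a minimal sketch of that tabulation for give. It assumes each token has already been coded as (theme length, goal length, construction). `relative_length` and `crosstab` are hypothetical names. The four tokens are the example sentences from these notes, not the study's data.

```python
from collections import Counter
from typing import Iterable, Tuple


def relative_length(theme_len: int, goal_len: int) -> str:
    """Bin a token by comparing theme and goal length (in words)."""
    if theme_len < goal_len:
        return "theme shorter"
    if theme_len > goal_len:
        return "theme longer"
    return "equal"


def crosstab(tokens: Iterable[Tuple[int, int, str]]) -> Counter:
    """Count (length relation, construction) pairs over hand-coded tokens."""
    return Counter((relative_length(t, g), c) for t, g, c in tokens)


# Illustrative tokens taken from the example sentences in these notes,
# NOT data from Arnold et al. (2000). Each is (theme words, goal words, construction).
tokens = [
    (2, 3, "dative"),         # give a book to your three cousins
    (3, 1, "double-object"),  # give Mary the new book
    (7, 1, "dative"),         # Chris gave a bowl of ... sauce to Terry
    (7, 1, "double-object"),  # Chris gave Terry a bowl of ... sauce
]
for (relation, construction), n in sorted(crosstab(tokens).items()):
    print(f"{relation:<14} {construction:<14} {n}")
```

The same tabulation carries over to HNPS. To apply it, substitute DO and PP lengths for theme and goal lengths, and use shifted/unshifted as the construction labels. Adding a givenness column to each token would allow the tie cases (equal length) to be broken down by newness.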