Ling 431/631: Corpus Linguistics
Ben Bergen
Meeting 7: Syntax I
October 1, 2007
Corpora have been widely used to study syntax. This can be done in untagged or
part-of-speech tagged corpora, but just as you get much more power in your
searches of words when you use part-of-speech tagged corpora, so you gain more
power in your searches of syntax when you use a parsed corpus. We're going to
concentrate on things you can do with an unparsed corpus this week, then move
on to parsed corpora.
Some syntactic phenomena that you can straightforwardly study using a
corpus-based approach
Heaviness and Newness (Arnold et al., 2000)
Word order, even in a relatively rigid word order language like English, is
variable:
a. The waiter brought the wine we had ordered to the table.
b. The waiter brought to the table the wine we had ordered.
a. Chris gave a bowl of Mom’s traditional cranberry sauce to Terry.
b. Chris gave Terry a bowl of Mom’s traditional cranberry sauce.
a. Sandy picked the freshly baked apple pie up.
b. Sandy picked up the freshly baked apple pie.
Two factors that have been called upon to explain which of these ordering
patterns is used:
Heaviness: "Put more grammatically complex or longer elements later (to
facilitate processing/production)"
It turns out that different definitions of heaviness all correlate
strongly, so it doesn't matter whether you define it in terms of number of
words or number of nodes in a hierarchical structure. (Good news if you don't
have a parsed corpus.)
Newness: "Put new things later (according to the general tendency for
topic-comment structure)"
The authors hand-coded NPs for whether they were given (mentioned in previous
discourse), inferable (not mentioned, but inferable from previous discourse),
or new (unmentioned in previous discourse).
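As a rough illustration of the word-count definition of heaviness, the sketch below compares the lengths of the two postverbal constituents in one annotated example. The `Example` record, its field names, and the `relative_weight` helper are hypothetical and are not taken from Arnold et al.; the discourse-status labels are entered by hand (here purely for illustration), not computed.

```python
from dataclasses import dataclass

# The three discourse-status categories named in the notes.
STATUSES = {"given", "inferable", "new"}


@dataclass
class Example:
    """One hand-annotated clause (hypothetical record layout)."""
    first: str          # constituent that comes first after the verb
    second: str         # constituent that comes second
    first_status: str   # hand-coded discourse status of `first`
    second_status: str  # hand-coded discourse status of `second`

    def __post_init__(self):
        # Reject labels outside the three-way coding scheme.
        for status in (self.first_status, self.second_status):
            if status not in STATUSES:
                raise ValueError(f"unknown discourse status: {status!r}")


def heaviness(constituent: str) -> int:
    """Approximate heaviness as the number of whitespace-separated words."""
    return len(constituent.split())


def relative_weight(ex: Example) -> int:
    """Positive when the later constituent is longer than the earlier one."""
    return heaviness(ex.second) - heaviness(ex.first)


# Example (b) from the notes: "brought [to the table] [the wine we had ordered]".
# The status labels here are made up for illustration.
shifted = Example("to the table", "the wine we had ordered", "given", "new")
print(relative_weight(shifted))  # 5 words - 3 words = 2
```

Because word counts need no parse, a measure like this can be computed over an unparsed corpus once the constituents have been marked off by hand.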
To fruitfully address this question, the authors use a combination of corpus
and laboratory studies. We'll focus on the former.
Corpus study
Used the Aligned-Hansard corpus
Searched for examples of heavy NP shift (HNPS) and the dative alternation (DA)
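A search of this kind over an unparsed, untagged corpus might start from word forms. The sketch below is an assumption about how one could pull out dative-alternation candidates for the verb give; it is not the authors' procedure. The function names are hypothetical, and the "to" test is only a crude cue, so every hit would still need checking by hand.

```python
import re

# Inflected forms of "give"; this also matches adjectival "given", so hits
# need manual filtering.
GIVE = re.compile(r"\b(give|gives|gave|given|giving)\b", re.IGNORECASE)


def candidate_sentences(sentences):
    """Keep only the sentences that contain some form of 'give'."""
    return [s for s in sentences if GIVE.search(s)]


def has_to_phrase(sentence):
    """Crude cue for the prepositional variant: 'to' after the verb form."""
    match = GIVE.search(sentence)
    if match is None:
        return False
    return re.search(r"\bto\b", sentence[match.end():]) is not None


sentences = [
    "Chris gave a bowl of Mom’s traditional cranberry sauce to Terry.",
    "Chris gave Terry a bowl of Mom’s traditional cranberry sauce.",
    "The waiter brought the wine we had ordered to the table.",
]

for s in candidate_sentences(sentences):
    label = "prepositional?" if has_to_phrase(s) else "double object?"
    print(label, s)
```

The question marks in the labels are deliberate: a string search only narrows down what a human coder has to look at.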
Procedure
Results
For give: [figure not recovered]
For HNPS: [figure not recovered]