Ling 431/631: Corpus Linguistics

Ben Bergen

 

Meeting 10: CHILDES

October 22, 2007

 

CHILDES

 

The CHIld Language Data Exchange System (CHILDES) is a free online collection of transcribed interactions between children and adults, intended to be shared for child language acquisition research.

 

The collection:

 

Transcripts contributed to CHILDES are encoded using the CHAT notational scheme, which uses line-initial tags to mark file headers, speaker turns, and annotations such as comments. The sample from the article:

 

@Begin

@Languages: en

@Participants: ROS Ross Child BRI Brian Father

*ROS: why isn't Mommy coming?

%com: Mother usually picks Ross up around 4PM

*BRI: don't worry.

*BRI: she'll be here soon.

*ROS: good.

@End

 

You can probably figure out what all the special characters are for.
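For reference, the three line types in the sample are headers (@), speaker turns (*), and annotations attached to the preceding turn (%). The following is a minimal sketch of how such a transcript could be read into a program, assuming only the line types that appear in the sample. The function name parse_chat and the dictionary layout are hypothetical choices for illustration. They are not part of CHAT or CLAN, and real CHAT files contain many conventions this sketch ignores.

```python
SAMPLE = """\
@Begin
@Languages: en
@Participants: ROS Ross Child BRI Brian Father
*ROS: why isn't Mommy coming?
%com: Mother usually picks Ross up around 4PM
*BRI: don't worry.
*BRI: she'll be here soon.
*ROS: good.
@End
"""


def parse_chat(text):
    """Split a CHAT-style transcript into headers and utterances.

    Assumption: every non-blank line starts with '@', '*', or '%',
    followed by a label and an optional ':' plus content.
    """
    headers = {}
    utterances = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines between tiers
        marker, body = line[0], line[1:]
        label, _, content = body.partition(":")
        label, content = label.strip(), content.strip()
        if marker == "@":
            # header line, e.g. @Languages: en (content is empty for @Begin/@End)
            headers[label] = content
        elif marker == "*":
            # main line: one utterance by the speaker with this code
            utterances.append({"speaker": label, "text": content, "tiers": {}})
        elif marker == "%" and utterances:
            # dependent line: attach it to the most recent utterance
            utterances[-1]["tiers"][label] = content
    return headers, utterances


if __name__ == "__main__":
    headers, utterances = parse_chat(SAMPLE)
    print(headers["Participants"])
    for u in utterances:
        print(u["speaker"], "|", u["text"], "|", u["tiers"])
```

Attaching each % line to the utterance before it keeps the comment about Ross's mother together with the question it explains.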

 

In addition, there is the usual array of more advanced features you would like to have in a corpus:

 

 

 

The CLAN system [Computerized Language ANalysis] is a suite of tools for analyzing language data tagged in the CHAT format. It can be downloaded from the same site.
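To give a rough sense of the kind of analysis such tools automate, here is a small sketch that computes per-speaker word frequencies and mean utterance length in words over the sample transcript above. It is an illustration only, not CLAN itself and not a reimplementation of any CLAN command. The function names and the simple tokenizer are assumptions made for this example, and length is counted in words here, whereas acquisition research often measures utterance length in morphemes.

```python
import re
from collections import Counter, defaultdict

# (speaker, utterance) pairs copied from the sample transcript
UTTERANCES = [
    ("ROS", "why isn't Mommy coming?"),
    ("BRI", "don't worry."),
    ("BRI", "she'll be here soon."),
    ("ROS", "good."),
]


def tokenize(utterance):
    """Lowercase the utterance and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", utterance.lower())


def word_frequencies(utterances):
    """Return a word-count Counter for each speaker."""
    counts = defaultdict(Counter)
    for speaker, text in utterances:
        counts[speaker].update(tokenize(text))
    return counts


def mean_length_in_words(utterances):
    """Return the average number of words per utterance for each speaker."""
    totals = defaultdict(lambda: [0, 0])  # speaker -> [word count, utterance count]
    for speaker, text in utterances:
        totals[speaker][0] += len(tokenize(text))
        totals[speaker][1] += 1
    return {speaker: words / n for speaker, (words, n) in totals.items()}


if __name__ == "__main__":
    for speaker, counter in word_frequencies(UTTERANCES).items():
        print(speaker, counter.most_common(3))
    print(mean_length_in_words(UTTERANCES))
```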

 

It includes functions for:

 

Strengths

 

Weaknesses