Data-Driven Language Learning
1. What is Data-Driven Language Learning?
"Data-driven language learning refers to giving students large quantities of language data (corpus) and the tools (concordancer) to examine it. Students can then build their own explanations of how language works. Having discovered the linguistic rules themselves, students are more likely to remember and use them. " (from Warschauer & Healey, 1998, Language Teaching, 31, 57-71)
2. What is a Corpus?
- A collection of naturally occurring language text, chosen to characterize a state or variety of a language.
- A collection of texts, spoken and/or written, which has been designed and compiled based on a set of clearly defined criteria.
- A collection of texts when considered as an object of language or literary study.
3. What are the Benefits of DDL & the use of Corpora in the classroom?
- encourages active and student-centered learning
- provides better quality of language samples
- provides fast and reliable tools for searching examples
(from Chen, 2004, Internet TESL Journal)
4. What is a Concordancer?
A concordancer is a computer program that searches rapidly through large quantities of text for a target item and then prints out every occurrence of the target in context.
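To make this definition concrete, here is a minimal sketch of a keyword-in-context (KWIC) search in Python. It is only an illustration, not how MonoConc, WordSmith, or TextStat are implemented; the function name concordance, the width parameter, and the sample sentence are all assumptions made for this example.

```python
import re

def concordance(text, target, width=30):
    """Return one keyword-in-context line per occurrence of target."""
    lines = []
    # Match the target as a whole word or phrase, ignoring case.
    pattern = re.compile(r"\b" + re.escape(target) + r"\b", re.IGNORECASE)
    # Collapse line breaks and repeated spaces so the context reads as one line.
    flat = " ".join(text.split())
    for m in pattern.finditer(flat):
        left = flat[max(0, m.start() - width):m.start()]
        right = flat[m.end():m.end() + width]
        # Right-align the left context so the target lines up in a column.
        lines.append(f"{left:>{width}} [{m.group()}] {right}")
    return lines

# Hypothetical sample text; a real corpus would be read from files.
sample = "She always arrives early. He is always late, and they always complain."
for line in concordance(sample, "always", width=20):
    print(line)
```

Lining up the target word in a single column is what lets students scan many examples at once and notice patterns in the words on either side.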
- Popular concordancer programs:
- MonoConc Pro 2.2 (Beginner & Classroom, Not free)
- WordSmith (Advanced & Research, Not free)
- TextStat (Free concordancer)
of MonoConc & WordSmith
5. Where can I find plain texts on the web?
- Balanced Corpora:
- Brown Corpus (1 million words sampled from 15 different text categories in 1961 so that the corpus is balanced).
- BNC (British National Corpus: a 100-million-word collection of samples of written and spoken English).
- Online Text Archives to Create Your Own Corpus:
- Project Gutenberg (over 100,000 books available, started by Michael Hart, 1971, Univ. of Illinois)
- Archive.org (Internet Archive): open-access text, audio, and video archives
- Google Book Search (Search for Books, Magazines, ...)
- More than 7,000,000 books and magazines have been scanned since 2004 (as of Dec. 2008).
- 10/28/2008, "Authors, Publishers, and Google reach landmark settlement. Copyright Accord would make millions more books available online"
- Google Books Library Project - "Our ultimate goal is to work with publishers and libraries to create a comprehensive, searchable, virtual card catalog of all books in all languages that helps users discover new books and publishers discover new readers"
- Bibliomania (Literature with study guide)
- ManyBooks (Literature), FeedBooks (books & news in many formats), WOWIO (books - Registration required)
(* For e-book reading, there are several e-paper based devices such as iRex iLiad, Sony Reader, Amazon Kindle, Bookeen CyBook, Hanlin eReader, and so on)
- Author, Title, Category, and Languages with various download formats
- Example: Of Human Bondage (in 20 different formats)
- Script-O-Rama (Scripts for movies, tv dramas)
- Encarta (Encyclopedia - an excellent source of standard English)
- The Literature Page (classic books, plays, speeches, poems, and essays)
- Awesome Film Script (movie script compendium. Some links may not be current.)
- Audio Books (mp3)
- Audio Gutenberg
- Telltale Weekly
- Literal Systems
- Create an audio file on your own
- TextAloud (google keyword: text aloud)
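Once plain texts from archives such as those listed above have been saved locally, a small personal corpus can be assembled with a few lines of Python. The sketch below assumes the texts sit in one folder as .txt files; the function names (load_corpus, tokenize, word_frequencies) and the demo strings are hypothetical, and the tokenizer is deliberately crude.

```python
import re
from collections import Counter
from pathlib import Path

def load_corpus(folder):
    """Read every .txt file in folder into a dict of {filename: text}."""
    return {p.name: p.read_text(encoding="utf-8", errors="ignore")
            for p in sorted(Path(folder).glob("*.txt"))}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())

def word_frequencies(texts):
    """Count word tokens across an iterable of text strings."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts

# In-memory stand-ins for downloaded files, so the sketch runs without downloads.
demo = {"a.txt": "The cat sat on the mat.", "b.txt": "The dog didn't sit."}
freq = word_frequencies(demo.values())
print(freq.most_common(3))
```

With real files, word_frequencies(load_corpus("my_texts").values()) would give a word list for the whole folder, where "my_texts" is a placeholder folder name.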
6. Are there any online concordancers?
- BYU Time Corpora (100 Million Words, 1923-2006)
- Try "always" to check where the adverb "always" appears in a sentence.
- Try "him to", ... to see which verbs take the pattern "Verb + object + to-infinitive".
- Leeds Reuter Corpora (Sharoff, 2006)
- Try "a piece of", "a cup of", or "a number of" to check what kind of noun follows.
- Hong Kong VLC & VLC1 (Hong Kong Virtual Learning Center)
- Try "explanation" to see what phrase follows it.
- Try "difficulty" to see what phrase follows it.
- Web Corpus (UK, Web as a Corpus)
- LDC (Linguistic Data Consortium, Registration required)
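The "Try ..." queries above can also be approximated offline on any plain text. The following sketch counts the word that comes immediately before or after a search phrase, which is roughly what a student reads off a concordance when checking which verbs precede "him to" or which nouns follow "a piece of". The function name neighbors and the sample sentences are assumptions for illustration; none of the online tools listed is claimed to work this way.

```python
import re
from collections import Counter

def neighbors(text, phrase, side="before"):
    """Count the word immediately before or after each occurrence of phrase."""
    # Crude tokenization: punctuation is dropped, so sentence breaks are ignored.
    words = re.findall(r"[a-z']+", text.lower())
    target = phrase.lower().split()
    n = len(target)
    counts = Counter()
    for i in range(len(words) - n + 1):
        if words[i:i + n] == target:
            j = i - 1 if side == "before" else i + n
            if 0 <= j < len(words):
                counts[words[j]] += 1
    return counts

# Hypothetical sample sentences standing in for a real corpus.
sample = ("I asked him to wait. She told him to leave. They wanted him to stay, "
          "so I asked him to come back. We shared a piece of cake and a piece of advice.")
print(neighbors(sample, "him to", side="before"))
print(neighbors(sample, "a piece of", side="after"))
```

On a larger text, the most frequent neighbors give students a quick shortlist of candidate patterns to confirm by reading the full concordance lines.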
7. Example Lessons for Data-Driven Learning
- 6 Teaching Units for Data-Driven Learning (by Passapong Sripicharn, University of Birmingham)