Those who rely heavily on traditional monolingual and bilingual English dictionaries to look up the meaning of a new word often find little information about how that word co-occurs with other words in a natural context of use.
The meaning of a word can be understood unambiguously if we treat the word as a unified unit and if instances of its use in naturally occurring sources are clear.
With the progress of information technology, collections of naturally occurring words from both spoken and written media can now be easily accessed via the Internet. Such a collection, known as a corpus (plural: corpora), has been widely used for pedagogical purposes.
One such purpose is the teaching of English to non-native speakers worldwide. The availability of corpora has in fact benefited teachers in designing teaching materials.
Rather than basing their pedagogical decisions solely on dictionaries and textbooks, which are mostly designed on the basis of intuition, teachers can now ground their decisions in actual patterns of language use that are well documented in corpora. The argument in favor of using corpora is that they are a more reliable guide than the intuition of native speakers.
Most traditional dictionaries and textbooks offer little information about language use in terms of register (e.g. academic, journalistic, literary), formality (e.g. colloquial, formal, frozen), and frequency of use (e.g. high or low frequency). Corpora can remedy this limitation by providing rich insights into these aspects of language use.
Among the corpora available online are the British National Corpus (BNC), the Lancaster-Oslo/Bergen (LOB) Corpus, the Helsinki Corpus of English Texts, the Cambridge and Nottingham Corpus of Discourse in English (CANCODE), the International Corpus of Learner English (ICLE), and the Michigan Corpus of Academic Spoken English (MICASE).
A noncommercial corpus that can be readily accessed is Mark Davies’ Corpus of Contemporary American English (COCA), which is freely available online at www.americancorpus.org. Users can register simply by entering an email address and a password.
This corpus reports the frequency of word occurrences across an assortment of registers: spoken, fiction, newspaper, magazine and academic. The search form offers a field for the word(s) one wishes to look up; a collocates option, for the words with which the search word co-occurs; and a POS list, a list of word classes (parts of speech) to which one can restrict the search. To activate either the collocates option or the POS list, one simply clicks on it.
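To make the idea of register-based frequencies and collocates concrete, here is a minimal Python sketch. It does not use COCA or any real corpus interface; the three sample sentences are invented for illustration, and the function names (tokenize, frequency_by_register, collocates) are hypothetical.

```python
import re
from collections import Counter

# Invented toy sample keyed by register; NOT drawn from COCA or any real corpus.
SAMPLE = {
    "spoken": "well I think about it a lot and I think of you too",
    "fiction": "she could not think of a single reason to stay",
    "academic": "researchers rarely think about frequency in isolation",
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_by_register(word, corpus):
    """Count how often `word` occurs in each register of `corpus`."""
    return {
        register: Counter(tokenize(text))[word.lower()]
        for register, text in corpus.items()
    }


def collocates(word, corpus, window=1):
    """Count the words found within `window` tokens to the right of `word`."""
    counts = Counter()
    for text in corpus.values():
        tokens = tokenize(text)
        for i, token in enumerate(tokens):
            if token == word.lower():
                counts.update(tokens[i + 1 : i + 1 + window])
    return counts


print(frequency_by_register("think", SAMPLE))
print(collocates("think", SAMPLE).most_common(2))
```

A real corpus contains millions of words, so its counts are far more informative than this toy sample, but the principle is the same: tally a word per register, then tally its neighbors.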
How can we use a corpus to investigate a word whose patterning a dictionary does not adequately describe?
Let’s take commonly spoken phrases such as think of and think about. It has been observed that non-native speakers of English have difficulty understanding how these two phrases are used (with which lexical units they can co-occur) and what exactly they mean.
A corpus of informal spoken conversation, for example, will display numerous examples of these two distinct phrases in use, from which users can see their range of syntactic patterning and eventually interpret their distinct meanings. The phrase being searched, known as the node word, is displayed in bold at the center of the screen.
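The display described above is commonly called a keyword-in-context (KWIC) concordance. The following Python sketch shows one way such a display could be produced. The four utterances are invented for illustration and are not taken from any corpus, the kwic function is hypothetical, and square brackets stand in for the bold type used on screen.

```python
import re

# Invented example utterances for illustration; not taken from any real corpus.
UTTERANCES = [
    "I can't think of his name right now",
    "Let me think about it and call you back",
    "What do you think of the new teacher",
    "I often think about moving abroad",
]


def kwic(phrase, lines, width=25):
    """Return concordance lines with `phrase` (the node) aligned in one column."""
    pattern = re.compile(r"\b" + re.escape(phrase) + r"\b", re.IGNORECASE)
    rows = []
    for line in lines:
        for match in pattern.finditer(line):
            left = line[: match.start()][-width:]   # context before the node
            right = line[match.end():][:width]      # context after the node
            rows.append(f"{left:>{width}}[{match.group(0)}]{right:<{width}}")
    return rows


for node in ("think of", "think about"):
    print(node.upper())
    for row in kwic(node, UTTERANCES):
        print(row)
```

With the node aligned in a single column, the words to its left and right can be scanned vertically, which is how recurring patterns become visible to the learner.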
Given the shortcomings of most traditional dictionaries, lexicographers are now aware that they should reduce their reliance on native speakers’ intuition in dictionary making, and create contemporary user dictionaries by incorporating insights from corpus studies. This certainly adds to the rigor of the dictionary-making process.
Yet, because of the limited space a printed dictionary allows, only a few instances of natural language use can be included. Corpora can thus be considered an ideal supplement that compensates for this space limitation.