How to find correct collocations using a corpus

British National Corpus (BYU-BNC)

A corpus (korpus) is simply a collection of texts in a language, usually searchable and accompanied by some additional information. In the case of the BNC, the language is British English and the corpus is available for free, which makes it a great tool for students of English (its providers also offer a larger corpus of American English).

You can find the corpus at http://corpus.byu.edu/bnc/. You have to register to use it, but registration is easy and free of charge. (If you only want to try it out, you might be able to run it without registering.)

Do not hesitate to e-mail me if you need help using the corpus.

First easy task

Let's say that we want to know the preposition (předložka) that goes together with the word interested - as in I am interested ________ football. (zajímám se o fotbal).

First, you have to work out what you actually want to look for in the corpus. Here, we want to know which preposition comes right after the word interested.

In the left column of your browser window (with the BNC open) you can see the section "SEARCH STRING". This is where you write your queries (that is, questions for the corpus). So first of all, type the word interested into the "WORD(S)" field. That way you tell the corpus that you want to know something about the word interested.

Next, turn on the other input fields by clicking on the labels "COLLOCATES" and "POS LIST". We want to know the word right after the word interested, so select 0 in the first drop-down (for 0 words before interested) and 1 in the second drop-down (for 1 word after interested). That way, we look only at the one word after interested, that is, the word that immediately follows it.
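If you like to see ideas in code, the "0 words before, 1 word after" setting can be sketched in a few lines of Python. This is only an illustration on a tiny made-up sample, not the way the BNC website works inside; the function name right_collocates and the sample sentences are my own assumptions.

```python
import re
from collections import Counter

def right_collocates(text, word, span=1):
    """Count the words found up to `span` positions after each occurrence of `word`."""
    # Very rough tokenisation: lowercase the text and keep runs of letters.
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == word:
            # 0 words to the left, `span` words to the right of the search word.
            counts.update(tokens[i + 1 : i + 1 + span])
    return counts

# Made-up sample text, standing in for the corpus.
sample = (
    "I am interested in football. She is interested in music. "
    "All interested parties were invited."
)
counts = right_collocates(sample, "interested")
print(counts.most_common())
```

Even in this toy sample the list mixes a preposition (in) with another kind of word (parties), which is the same thing you will see in the real results.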

Now you can hit the "SEARCH" button to get some results (the result should be something like this). We are lucky here: the preposition "IN" appears first in the results, and it is the correct word to complete the sentence I am interested in football. Also note the number in the "TOT" column, which shows that interested in occurs 5569 times in the BNC. Still, there are some other words as well which we could filter out - we are interested in a preposition, and words like parties or bodies are definitely not prepositions!

In the SEARCH STRING section, look at the options in POS LIST. There you can choose which Part Of Speech (slovní druh) you want to search for. In this case it is a preposition, so choose "prep.ALL" (for all prepositions - there isn't any other choice for prepositions anyway). Hit the "SEARCH" button again… et voilà, you get a single result, the preposition IN. So this tells you that in is the best preposition after interested and, as far as the BNC shows, even the only possible one.
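The part-of-speech filter can be pictured as a simple "keep only prepositions" step. The sketch below uses a short hand-written list of prepositions, which is an assumption made just for this example; the BNC instead uses part-of-speech tags attached to every word. The function name keep_prepositions is also made up.

```python
# A tiny hand-written list of prepositions: an assumption for this sketch only.
# The BNC itself relies on part-of-speech tags, not on a word list.
PREPOSITIONS = {"about", "at", "by", "for", "in", "of", "on", "over", "to", "with"}

def keep_prepositions(collocate_counts):
    """Keep only the collocates that appear in the preposition list."""
    return {w: n for w, n in collocate_counts.items() if w in PREPOSITIONS}

# 5569 for "in" is the number quoted from the search;
# the counts for "parties" and "bodies" are made up for illustration.
collocates = {"in": 5569, "parties": 100, "bodies": 50}
filtered = keep_prepositions(collocates)
print(filtered)
```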

Further tasks

With some very common verbs (slovesa), for example, you can get many results. Try to look for all prepositions that follow the word look - but before you do that, try to guess which 4 prepositions could be the most frequent (and perhaps even in which order). Now do the search (just replace the word interested with the word look in your previous search). These should be the results. Did you guess correctly? Do you know the meanings of all of the first four uses of look? If you are not sure, click on the preposition in the results to see sentences in which it appears. (If you noticed the 5th result, which is of - or it was when I did the search - it might seem a bit surprising to you. However, if you click the word of, you will soon notice that in that case, look is not used as a verb (sloveso) but as a noun (podstatné jméno). If you want to search for look as a verb only, you can change the search word from look to look.[v*], which means look as a verb. See the results.)

Let's do one more search. First, try to remember which preposition you use after the word worried (as in Where were you yesterday? I was worried ________ you.). Next, search for it (and get these results). You can see that the preposition you probably expected is the first one in the list, so congratulations to you :-) But there are some more. You can usually ignore the lowest numbers (if the top result has 1575 occurrences like here, do not worry about words with e.g. 12 or fewer - over, to, on, as… - these are either mistakes or special cases). However, 194 for by is much lower than 1575, yet it is not negligible (zanedbatelný) either. So this is a good candidate for further investigation (click the word "BY"). This use is definitely also correct, but it has a slightly different meaning. Can you work out the difference in meaning from the sentences?
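The rule of thumb about ignoring the lowest numbers can also be written down as a small Python sketch. The 5% cut-off is my own assumption, not a rule of the BNC, and so is the function name worth_checking. The numbers 1575 and 194 come from the search; "TOP" stands for the preposition you guessed, and the counts for over and to are made up.

```python
def worth_checking(counts, min_share=0.05):
    """Return collocates whose count is at least `min_share` of the top count."""
    top = max(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [word for word, n in ranked if n >= top * min_share]

# "TOP" is a placeholder for the most frequent preposition after "worried".
# 1575 and 194 are quoted from the search; 12 and 9 are illustrative only.
counts = {"TOP": 1575, "by": 194, "over": 12, "to": 9}
candidates = worth_checking(counts)
print(candidates)
```

With these numbers only the top result and by stay in the list, which matches the advice to look more closely at by and to skip the rest.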

Follow up

Try to do some more searches. Rather than inventing examples, try to find an exercise you already have, maybe something from FCE Use of English, and try to find the correct answer using the BNC. There is a lot of help included, just click the question marks (otazník) or the link "HELP…" - or ask me if you are desperate! :-) (There is a limit of 100 queries per day for you.)

Please let me know whether you tried the tasks, how successful you were and, most importantly, whether you found the BNC useful. My e-mail address is rur@seznam.cz.




 