CTI Textual Studies Q & A

Language Studies

Q Can you assist me in my search for contemporary English language corpora, including legal and economic documents?

A I hope that the following mixed bag of links will be of some help. Finding downloadable corpora of modern English is a little more difficult than anticipated. I think there are more datasets for US law and politics, for example, than for UK law and politics. Let me know if American English is amongst your requirements.

Information about available corpora can be found at the following sites:

Michael Barlow, "Corpus Linguistics" http://www.ruf.rice.edu/~barlow/corpus.html (probably one of the most extensive lists of available corpora, particularly for a wide range of languages).

The Oxford Text Archive does not currently have many texts available for free download, but its catalogue lists many texts that are available free of charge once a licence form has been signed. You may find items of interest in the catalogue under English-Collections/Corpora at http://sable.ox.ac.uk/ota/

The Linguistic Data Consortium (http://www.ldc.upenn.edu/) and the Summer Institute of Linguistics (http://www.sil.org/) might have further items of interest, though they offer little in the way of downloadable material. See also the WWW site of the Linguist email forum at http://linguist.tamu.edu/linguist/

The British National Corpus might be worth looking at. The online version is not yet available, but the BNC itself contains a 100-million-word sample of modern British English drawn from a variety of written and spoken sources. More information, including contact details, is available at http://info.ox.ac.uk:80/bnc/

For the language of law and politics you might wish to look at the HMSO Lords page at http://www.parliament.the-stationery-office.co.uk/pa/ld/ldhome.htm, which has proceedings from Hansard, judicial business, etc. The IOLIS CD-ROM produced by the TLTP Law Consortium apparently has a large corpus of English legal material on it. The corpus, however, is not available separately from the rest of the CD-ROM, and you need to use its built-in search tools. It might be worth enquiring whether it is available on a network or in a law department at your institution. One of the better starting points for political resources is based at Keele University (http://www.keele.ac.uk/depts/po/psr.htm).

For economics the Virtual Library of Economics might be of use (http://www.helsinki.fi/WebEc/), and I am sure the Social Science Information Gateway at Bristol (SOSIG) will also be useful (http://www.sosig.ac.uk/).

Many of the smaller corpora tend to be tucked away. For example, on my travels I discovered transcriptions of student/advisor sessions from Columbia at ftp://ftp.cs.columbia.edu:/pub/fuf/ (see the file corpus.readme).




HTML Author: Sarah Porter
Document created: 27 May 1997
Document last modified:

The URL of this document is http://info.ox.ac.uk/ctitext/enquiry/lan01.html