Despite its provocative title, this lunchtime seminar given at King's College London on 16 Jan by Tony McEnery (Lancaster University) was both persuasive and thoughtful. I am pleased to report also that the CHC at King's had on this occasion done itself proud in signposting the whereabouts of the event; Claire and I thus found it almost impossible to get lost on our way down to the second basement Room 2B13 beneath the Strand, to join a small select group (about half a dozen) of other corpus nuts.
Tony began, however, by giving a brief introduction to corpus linguistics and its origins in the tension between rationalists, most clearly exemplified by Chomsky, and the older British empiricist tradition. He positions himself at the midpoint, relying on theoretical insight to explain empirical data (and thus annoying extremists of either persuasion). His particular interest is in collocation, more particularly the extent to which demonstrably frequent collocates are consciously identified by human respondents with or without benefit of computer usage. For example, we probably all associate `Great' or `bulldog' or `stiff upper lip' with the term `British'; but all the evidence in the BNC indicates that in fact the word most frequently closely associated with `British' is `breakfast'.
Some writers on collocation (e.g. Kjellmer) have seen it as a way of identifying well-structured fixed multi-word sequences that operate as if they were single units; others (notably Renouf and Sinclair) have seen it rather as a means of identifying fixed phrases containing variable slots. Of particular interest in the latter case is the nature of the subcategorical constraints affecting what may fill the slot, since such constraints may be syntactic, or phonological, or (most strikingly) semantic, or a combination of these. Hoey, following Halliday, proposes the term colligation for the specific case where the constraints are purely syntactic, and gives a good summary of what a study of collocations should determine: see his presentation at PALC '97.
Whatever it is that determines our preferences for one near-synonym over another in particular contexts, there is plenty of evidence that speakers don't choose haphazardly; McEnery, together with colleagues Paul Baker and Andrew Wilson, is running a little experiment to see to what extent they choose consciously. If choices are made systematically, there is a good case for regarding the underlying system as an important part of linguistic description, and thus for extending the scope of linguistics to include it. McEnery's particular claim is that human estimates of collocational preferences are often wildly wrong when compared with statistically derived measures, which suggests that there is a gap in our understanding of how language functions. Church and Hanks demonstrated that the near-synonyms `powerful' and `strong' are systematically associated with different semantic classes of noun (a `powerful man' is not the same kind of thing as a `strong man', for example) and proposed the use of a statistic called mutual information to measure the strength of that association.
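For readers who have not met the statistic, here is a minimal sketch of mutual information in its basic form, I(x, y) = log2( P(x, y) / (P(x) P(y)) ), with probabilities estimated from raw counts. It is not the modified version used in McEnery's experiment, whose details were not given. The function name, the one-word window, and the twelve-word toy corpus are all invented for illustration; real figures would come from a corpus such as the BNC.

```python
import math
from collections import Counter


def mutual_information(tokens, window=1):
    """Return a scoring function for ordered word pairs (hypothetical helper).

    A pair (x, y) is counted each time y occurs within `window` words
    after x. Probabilities are estimated as simple relative frequencies.
    """
    n = len(tokens)
    word_freq = Counter(tokens)
    pair_freq = Counter()
    for i, left in enumerate(tokens):
        for right in tokens[i + 1 : i + 1 + window]:
            pair_freq[(left, right)] += 1

    def score(x, y):
        joint = pair_freq[(x, y)]
        if joint == 0:
            return None  # pair never observed: the measure is undefined
        # log2( (joint/n) / ((f(x)/n) * (f(y)/n)) ) simplifies to:
        return math.log2(joint * n / (word_freq[x] * word_freq[y]))

    return score


# Invented toy corpus, NOT real BNC data.
tokens = (
    "strong tea strong tea powerful computer "
    "strong coffee powerful computer powerful engine"
).split()

score = mutual_information(tokens)
print(score("strong", "tea"))         # observed pair: positive score
print(score("powerful", "computer"))  # observed pair: positive score
print(score("powerful", "tea"))       # unobserved pair: None
```

A high positive score means the pair turns up together far more often than the individual frequencies of the two words would predict; on a corpus this small the numbers mean nothing, and even on a large one the measure is known to be unreliable for very rare words.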
McEnery's experiment uses a modified version of this as well as some other less intuitively comprehensible measures, together with data collected from a sample of human `raters', each of whom is given a set of concordance lines from the BNC and asked to identify the significant collocations they demonstrate. His results so far suggest that raters often disagree very widely, sometimes seeing only the corpus evidence which supports their intuitive judgments, sometimes disregarding entirely evidence which contradicts them. In an oral re-run of the experiment, Tony asked his audience to propose significant collocations for the words `German', `English', and `European', before revealing the extent to which these coincided with what the BNC sampler suggests -- not a great deal, except for `European', which we all associated with terms such as `parliament' or `union', drawn from what McEnery described as a `political prosody'.
In the space of an hour, Tony gave a clear overview of the basic notions of corpus usage and analysis, as well as outlining an interesting and suggestive line of enquiry, which provoked considerable discussion. Not bad for a lunchtime seminar.
I should also note that afterwards the CHC (in the shape of Willard McCarty) generously paid for a very respectable Italian lunch, over which we discussed TALC, the ELRA evaluation work, and life in general, though not necessarily in that order.
Automagically generated by lite2html on 25 Jan 1998