This report summarizes activities at OUCS since the last meeting of the BNC Committee on 17 Sept 1996. At that meeting, the following areas were reported as in progress:
Progress has been made towards meeting all of these goals, as reviewed below, although there continue to be technical difficulties. Promotional activities, within the University and elsewhere, have also continued.
The first year of the EPSRC-funded contract for continued provision of support for the BNC has now expired, and a request for its extension is in preparation. One of the conditions of this contract was the User Survey carried out during the summer, a preliminary draft of which was provided at the last meeting. The final draft is now available (supplied as Annex 1 to this report) and confirms that the developments proposed are broadly endorsed by the community. As regards support activities, a request for renewal of the grant made from the News International fund of the University's English Faculty has already been successful, and we are therefore able to guarantee continued support for base-level distribution and promotion of the corpus for a further year at least. Dr. Claire Warwick, who was appointed to this post in August 1996, has carried out a number of support activities as described in her report.
The ELRA contract has now been signed, and a final version of the contract was circulated to Committee members. Four copies of the corpus have so far been distributed to ELRA members.
A total of 110 licensed copies of the corpus have now been distributed; a full list of licensees was circulated to Committee members in advance of this meeting.
Production of the sampler continues to be dogged by technical problems, most recently the discovery of a systematic error introduced in conversion from Lancaster format to SGML. Fortunately, this has now been rectified, and an index of the whole sampler, for use with the SARA program, should be available by the date of the meeting. We hope to distribute copies of a CD at the ICAME conference later this year; it would contain the sampler, the latest version of SARA, at least some parts of the BNC Handbook in browsable (HTML) format, and any other software tools available in time. No progress has been made on the possibility of including at least some audio material with the transcribed speech, though evidence from the BNC survey indicates that this would greatly enhance the attractiveness of the package.
This 300-page tutorial manual is now complete and copies are being submitted for consideration by three publishers. As noted above, we would like to include an electronic version of the work with the sampler; however, since the practical exercises it describes all relate to the full corpus rather than the sampler, some revision would be necessary.
Work on bug correction and enhancements continued. Two new versions of the client and server have been produced during the last six months; the most recent (version 929) clears a long-standing error in the underlying sockets code, which has been delaying release of a proper 32-bit version of the client. The error was reported to Microsoft in October, and a fix was received in February: this is not (apparently) any kind of record.
Although basic problems with installing the software continue to appear, the availability of consistent, reliable instructions on the Web has much reduced their number. We are now also beginning to hear from happy users of the system, the most recent being someone at the University of Zurich, who has developed a method of using the SARA server direct from a network of Macintosh computers. We are continuing to demonstrate the software at a variety of conferences and workshops throughout Europe and in the US, where it continues to arouse considerable interest.
Progress with the proposed British Library online service continues to be disappointingly slow, in part because of technical problems in rebuilding the corpus index at the Library site, but mainly because those chiefly responsible for developing the ancillary software have many other calls on their time. However, a start has been made on developing an automated registration system, a prototype of which can now be viewed on the project's internal web pages. Production of the new BNC index was successfully completed in February, and the system is now under test: so far it has performed very well.
LB gave a paper on the BNC project at the international SGML conference in Boston in November, attended by over 300 delegates, which aroused considerable interest; a similar presentation was given in March at the University of Wuerzburg, and the paper is due for publication later this year. He also demonstrated the system to graduate seminars at Oxford, Berkeley, and Michigan. CW gave a presentation at a EuroCALL Workshop at Aston University in February, and has demonstrated the BNC at several training workshops and seminars in Oxford and elsewhere.
The BNC's web site now includes a full bibliography of publications about the Corpus, to which additions are welcomed. The bnc-discuss electronic mailing list was relaunched, and now has 309 members. It has been most recently used as a means of assessing the extent of public concern about the current restrictions on access to the BNC.
We hope to announce very shortly both the online service to the BNC, and availability of the BNC sampler CD. Thereafter, we expect to concentrate on the need for a new release of the BNC, following completion of Lancaster's current re-tagging of the whole corpus. If our EPSRC bid is successful, we will be installing a new computer system during the summer, which should make production of a BNC "second edition" considerably easier, particularly in light of the experience gained at the British Library's site.

Lou Burnard,