Computers & Texts 12: Robertson

Computers & Texts No. 12

July 1996

An Information Network for Students of Literature

Hugh Robertson
University of Huddersfield

How will we best use digital resources in the teaching of literature? Or, indeed, can we use them effectively at all? This paper is based on my experience over a number of years of working with hypertext learning materials, and of working with English students on modules involving the use of computer software or data. I am neither wholly discouraged nor persuaded that I have found entirely satisfactory ways of integrating computer-based work into the curriculum. So I am writing as much about what I would like to try in future as about what current experience tells me works.

There are a number of issues to be addressed which significantly affect the kind of digital resources we will want to use and how we present them. What kind of teaching or learning resources will they be? Will they be used to complement or replace lectures or seminars or tutorials? How will they be related to or integrated into the syllabus? How much access to them will students have in theory, and how much will students use the resources in practice? Perhaps most importantly, how can these resources be used to discourage students from regarding them as an easy route to pre-digested material, and to encourage active learning and participation?

Information Technology in English Studies

Nowadays more students use word processors confidently, and many use email and browse the World Wide Web. Most of our students now present major assignments in printed form. In a few modules students are required to submit some or all of the material to be assessed on a floppy disk. Among Arts students, however, many remain nervous of computers or feel insecure in using them for any but the most familiar functions, and there are certainly some who are suspicious of or hostile to the computer as a tool alien to their conception of the arts, and who resist its introduction into their studies. We offer options on Computer-Aided Text Analysis and on Writing for Hypertext, but only a minority of students choose them. For many of those who do, an increase in general confidence with computers is one of the most important benefits they acknowledge from the module.

Sources of Texts

The most obvious digital resources to make available for students of English literature are the texts themselves. These can be obtained from a number of sources. They can be keyed in-house, or scanned in from existing printed teaching materials. There are many ftp sites from which literary texts can be downloaded, including many publicly available texts from the Oxford Text Archive. The current catalogue of the Oxford Text Archive provides a much larger listing of texts which can be acquired from the Text Archive (http://info.ox.ac.uk/ota); those available for use in teaching are marked with an asterisk. There is a great deal of useful material on the World Wide Web; pages such as the Voice of the Shuttle (mirrored at http://info.ox.ac.uk/~enginfo/shuttle/english.html) or the page at the University of Dundee (http://www.dundee.ac.uk/English/favelink.htm) provide excellent links to many of the available literary resources. Where students have access to the World Wide Web, they can be given pointers to useful materials, and where WWW pages are accessed frequently, caching systems will deliver them quickly even from transatlantic sources. In many cases, moreover, the copyright and permissions arrangements make it possible to copy material from the WWW, with appropriate acknowledgements, into a local intranet or a hypertext package, or to make a TACT database available for students to interrogate.

Technically it is possible to do similar things with material from a CD-ROM textual database; but it is necessary to see what is permitted under the licensing agreement. Obvious sources of literary data are the full-text databases currently released in CD-ROM versions by Chadwyck-Healey (and whatever reservations one may have about the choices of poets and editions, or about the limitations of the standard search interface, the Full-Text Poetry database available in the UK through CHEST at £200 per year is a wonderful bargain). These can obviously be used in a network alongside other learning resources. We have a CD-ROM tower with the five CD-ROMs of the English Poetry database, and two of English Verse Drama, available in this way. From a hypertext system it should be straightforward to provide links to material in a CD-ROM textual database (in Microcosm, for example) or to use a script to take users from Guide or a WWW-like network into the Chadwyck-Healey search interface (or the equivalent in another CD-ROM application). However, one would also like to be able to copy extracts into an online lecture/tutorial/guided tour on a local information network, linked to other learning materials, and to make it possible for students to do the same in constructing their own online 'seminar paper' or contribution to a discussion group. The licence for Chadwyck-Healey databases quite explicitly permits the printing out of non-copyright material for use in classes or courses; it does not explicitly allow the digital equivalent, even within secure local networks which only registered users can access. For obvious reasons the licence prohibits the copying of extracts of the database onto networks which can be accessed by those who are not registered users of the database. The licence allows users to make electronic copies of extracts from the database for their own research.
This would permit the manipulation of the data with a concordancing program, and results derived from this could be communicated to students within an information network. But again it is not clear whether it is permissible to make available a TACT database created in this way for student use, even if the students are all registered users of the Chadwyck-Healey database. It is perfectly permissible for students or staff to copy and paste extracts from the databases into an essay, seminar paper, or lecture.

What Kind of Learning Environment?

For a number of years I have been experimenting with various ways of making information relevant to their modules available to students through computers. The process becomes easier as workstations grow more powerful, as they are linked into networks, and as large text databases become available and affordable. One relatively simple approach is to make available TACT databases of some of the texts being studied. Preparing the texts, however, is not necessarily simple or quick. For sixteenth- and seventeenth-century texts there are decisions to make about consistency of spelling and whether or not to modernise, and even texts that already exist as older digital transcriptions may need converting from an all-upper-case format. There is also normally a time-consuming process of encoding the text to indicate its structure (volume and chapter, or act, scene and speaker, or titles of poems) so that proper references are available to users.
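Concordance programs such as TACT and OCP read structural references from tags embedded in the text, commonly in the COCOA style of angle-bracketed category letters and values. As a minimal sketch of why that encoding matters, and not TACT's own database-building procedure, the Python below carries the latest value of each tag forward and attaches a reference to every line of text. The tag letters (A for author, T for title), the function name `assign_references` and the sample lines are all hypothetical.

```python
import re

# COCOA-style reference tag: "<", a one-letter category, whitespace, a value, ">".
# The category letters used in the sample (A = author, T = title) are hypothetical.
TAG = re.compile(r"<([A-Z])\s+([^>]*)>")


def assign_references(lines):
    """Yield (reference, text) pairs for each line of text.

    The latest value of every tag is carried forward, and line numbering
    restarts whenever a tag marks the start of a new structural unit.
    """
    state = {}
    line_no = 0
    for line in lines:
        tags = TAG.findall(line)
        for category, value in tags:
            state[category] = value.strip()
        if tags:
            line_no = 0  # a new unit (poem, scene, chapter) begins here
        text = TAG.sub("", line).strip()  # remove the tags, keep the words
        if text:
            line_no += 1
            prefix = ", ".join(f"{cat} {val}" for cat, val in sorted(state.items()))
            ref = f"{prefix}, line {line_no}" if prefix else f"line {line_no}"
            yield ref, text


# Hypothetical tagged input: two short "poems" by one author.
sample = [
    "<A Anon>",
    "<T Poem 1>",
    "First line of the poem",
    "Second line of the poem",
    "<T Poem 2>",
    "First line of the next poem",
]
for ref, text in assign_references(sample):
    print(ref, "|", text)
```

Without the tags, every line would carry only a running line number; with them, a match can be reported as, say, the second line of a particular poem by a particular author.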

Another method, which makes it possible to present texts, analysis and other information together with related graphics (and potentially sound, animation and video), is to use a hypertext system. This could use a hypertext authoring package such as Guide or Microcosm or Toolbook, or combinations of different tools and packages (such as Guide with Visual Basic). It could use a simple intranet structure, with Netscape or Mosaic as browsers and simple HTML authoring tools. In our context, where students already use Word for Windows 6 for word-processing, Internet Assistant is an obvious tool for student (and often staff) authoring. This solution obviously offers fewer functions, but it has advantages in familiarising students with World Wide Web technology and practices, in ease of authoring, and in cost.

Cost is obviously one consideration in choosing a system. (I suspect, for example, that cost will be one reason for our not continuing with Guide 4, though the cost of the new licence is not yet known.) If a package is not attractive to students and easy to use (easy enough, that is, in relation to the benefits users perceive they get from it), then students vote with their absent fingers and hands. Ease of use here ought to include ease of contributing to the information network: making one's own comments and notes, asking questions, and adding material, which can include providing new links. An information network provides opportunities for 'guided tours', or staff-produced online tutorials; the temptation is probably to produce something more like online lectures. (Actually there is no reason why lectures shouldn't be replicated on an information network, perhaps as PowerPoint presentations, so that students can go back to them for revision, or ponder them at their own pace.) The package ought also to enable students to produce their own online seminar paper or electronic assignment. One reason why we have not persevered with Microcosm is that it requires various files, including a registry file, to be updated before any authoring can take place. In our system students have no rights to save files on the fileserver, and neither staff nor students can save files in the Applications area of the fileserver where the Microcosm files are stored.

If the students cannot and do not contribute to the information network, there is a very real danger that it becomes simply an easier and narrower source of information than the Library. At least the information on it is available (if you can get to a terminal), and not missing or out on loan. However, its availability may encourage very passive learning. But the time required to prepare good materials, even to borrow good material from the Internet (and much material can be used non-commercially on an intranet, subject to proper acknowledgement), is considerable, so the information available on the local network is inevitably restricted. The risk is that you create a narrower local super-canon, even, as George Landow has pointed out, if the material included is not canonical (1992, 151).

What Kind of Resources?

Digital resources probably ought to complement rather than replace what we expect students to acquire for themselves or borrow in printed form from the Library. However, these are shifting categories, and electronic information will surely play an increasing part in future. In some modules the core information might be presented in a combination of digital resources and printed teaching materials (perhaps duplicated on the network to facilitate searching and linking). More often, in the near future, the material available digitally may be a mixture of interesting texts and materials to study alongside prescribed texts, concordances (perhaps in the form of a TACT database) of some prescribed and related texts, case studies, summaries of critical debates, bibliographical pointers to further reading, and discussions by participants in the course. In some cases we envisage making available linguistic data that would otherwise be difficult for students to access; equally, we will encourage students whose projects involve gathering linguistic data to make it available in a common pool (e.g. of data for conversational analysis).

What do we want students to do with it? Well, read it, analyse it, cut and paste evidence for their assignments, challenge it, re-interpret it, add their own examples which extend or contradict the data that is there. We tend, I think, to separate teaching from research, and to think of a lot of digital data as research data. What we need to do is to encourage our students to develop research skills, to interrogate the data for themselves, and to ask what are, to them, new questions rather than peddle the old answers from secondary books: not so that they can become researchers when they graduate, but so that they are (re)searchers as undergraduates and, when they graduate, leave confident that they can investigate, explore and find out.

What that means is that the information networks we provide must offer not just data, links and answers to questions, but opportunities to analyse, to browse and to search. The Microcosm package includes both overt and hidden links and, more interestingly, the ability to index a body of texts so that users can ask the package to 'compute links' on the basis of matches to highlighted words or blocks of text. In practice there are serious drawbacks to its implementation. The user can specify whether the 'matches' should be documents, blocks of text, or paragraphs. The smaller the unit you specify, the more matches there are likely to be, and perhaps the greater the risk of overlooking significant ones. The larger the unit you specify, the more difficult it is to identify the matching passage(s), since the relevant words are not highlighted. If you are dividing an originally printed text into a number of files, then it seems to me important to retain as much as possible of the structure of the original text, normally by having files correspond to individual poems, or sections, or scenes or chapters. This can result in documents (files) which are several thousand words long. It is no great help to be told that there are one or more matches in a document of that length when they are not highlighted. In many ways the search facility in Guide is more useful. You can (and have to) pre-specify which set of documents you want to include in a given search. When you search on a word or a string, you are given the number of matches in each file, and the matches are highlighted. In these respects it resembles the results of a search on a Chadwyck-Healey text database, where the number of hits is listed writer by writer or text by text, and users can go either to the start of the text or to the context, with the matched words highlighted in both cases. In both Guide and the Chadwyck-Healey databases more complex queries can also be formulated.
It would be very desirable in an intranet to have a facility which searched the text files for strings, and which presented the results in a user-friendly way.
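A facility of that kind need not be elaborate. The sketch below, in Python and assuming plain-text files already loaded into a dictionary keyed by file name, reports the number of whole-word, case-insensitive matches per file and shows each match in context, marked `>>like this<<` in place of screen highlighting. The function `search` and the sample documents are hypothetical; this is not how Guide or the Chadwyck-Healey interface is implemented.

```python
import re


def search(documents, term, width=30):
    """Search every document for `term` as a whole word, ignoring case.

    Returns {document name: [context, ...]}, listing only documents with at
    least one match. Each context shows up to `width` characters either side
    of the match, with the match itself marked >>like this<<.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    results = {}
    for name, text in documents.items():
        contexts = []
        for m in pattern.finditer(text):
            left = text[max(0, m.start() - width):m.start()]
            right = text[m.end():m.end() + width]
            contexts.append(f"...{left}>>{m.group(0)}<<{right}...")
        if contexts:
            results[name] = contexts
    return results


# Hypothetical documents standing in for text files on a local network.
documents = {
    "poem_a.txt": "The fancy wanders, and the fancy returns.",
    "poem_b.txt": "A fanciful tale.",
    "poem_c.txt": "Fancy is here.",
}
for name, contexts in search(documents, "fancy", width=10).items():
    print(f"{name}: {len(contexts)} match(es)")
    for context in contexts:
        print("   ", context)
```

Because the search is for whole words, 'fanciful' in the second file is not reported; a student who wanted related forms as well would need a pattern or a list of variants, which is one reason a more flexible query language is useful.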

Both Chadwyck-Healey and (to a much larger extent) Microcosm make decisions about the most frequent words which are restrictive and unhelpful in some contexts. The Chadwyck-Healey databases discourage searches on words like 'the', 'and', 'of', and 'that'. In my experience one either retrieves no results after a long wait or, in another database, the search engine will obtain a number of hits (e.g. for 'that'). Professor Burrows's work on Jane Austen depends precisely upon the most common words (1987, passim). It seems a pity that Chadwyck-Healey deny users information which has been so painstakingly gathered. (The information is available if you have the text database on tape rather than CD-ROM and employ some other searching software.) It would be helpful if we could use the Chadwyck-Healey databases to establish norms for the usage of the most common words in the language in different periods, in different genres and in different writers. (I have been given a very interesting essay, based on data acquired using micro-OCP, on the use of 'the' in the poetry of Pope and Johnson, for example.) In practice, one of the most useful results of getting the number of hits for the most common words in a period or in the work of a poet in the Chadwyck-Healey poetry database would be to establish the approximate size of the body of work being searched. While it is easy to find out how often 'I' occurs in the poetry of Wordsworth, Keats, and Gray, for example, there is no easy way to establish its comparative frequency per 1000 words in each. There is an opportunity here, of course, to set students to take some samples and to establish probable frequencies from them, using a program like TACT or OCP.
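The arithmetic students would need is simple. Assuming the database counts every occurrence of a word as a hit, and that a sample shows how often a common word such as 'the' occurs per 1000 words in the relevant period or poet, the hits reported for 'the' give a rough estimate of the total number of words searched, and other hit counts can then be converted into rates per 1000 words. The Python sketch below uses hypothetical function names and invented illustrative figures; the estimate is only as good as the assumption that the sample is representative.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case word tokens; apostrophes inside words (e.g. "summer's") are kept."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def rate_per_thousand(word, tokens):
    """Occurrences of `word` per 1000 tokens in a sample."""
    if not tokens:
        raise ValueError("empty sample")
    return 1000 * Counter(tokens)[word] / len(tokens)


def estimate_total_words(database_hits, sample_rate):
    """Estimate the size of a body of work from the hits reported for one
    common word and that word's rate per 1000 words in a sample.
    Assumes the sample is representative of the whole body of work."""
    return database_hits * 1000 / sample_rate


def rate_from_hits(word_hits, estimated_total):
    """Convert raw hits for a word into a rate per 1000 words."""
    return 1000 * word_hits / estimated_total


# Hypothetical figures: 'the' at 50 per 1000 words in a student's sample,
# and a database reporting 20,000 hits for 'the' and 6,000 for 'I'.
total = estimate_total_words(20_000, 50.0)
print(total, rate_from_hits(6_000, total))  # 400000.0 15.0
```

With a rate for 'I' worked out in this way for each poet, the comparison between Wordsworth, Keats and Gray becomes a comparison of like with like, although samples from different genres or periods of a poet's career may well give different rates for 'the'.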

It is reasonable enough that Microcosm has been designed not to index every word. If it did, the 'computed links' for blocks of text would tend to propose as matches those documents or blocks in which the high-frequency words occurred most often. However, it is unhelpful that the package does not tell you which words have been excluded from the indexing on the grounds that they are not 'significant'. It is possible to establish them experimentally by trying to compute links from them; at least Microcosm 3.1, unlike version 3.0, tells you that no links have been found. These 'non-significant' words include most of the personal pronouns. 'Me' and 'my' are indexed, but 'I' is not, so the 'compute links' facility will not help you to explore the egotistical sublime in poetry or in prose fiction. Again, words like 'can', 'will', and 'should', as well as the various forms of the verb 'to be', are not indexed. Quite apart from the polysemous qualities of 'can' and 'will', this inhibits a whole variety of potential stylistic explorations of literary texts. The idea of using an inverted index of the texts to suggest links to other passages or documents with more or less closely matching vocabulary is a good one, but there are problems with the way it is currently implemented, certainly in the context of a body of literary texts. The program also needs a more conventional searching tool alongside it.
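The principle behind 'compute links', and the cost of a stop list, can be shown in miniature. The Python sketch below builds an inverted index over blocks of text, skipping a stop list, and ranks blocks by the number of distinct indexed words they share with a selection. The stop list, block texts and function names are hypothetical, and Microcosm's actual matching procedure may well differ; the point is only that a selection consisting of 'I' can never produce a link.

```python
import re
from collections import defaultdict

# A hypothetical stop list modelled on the exclusions described above; it is
# not Microcosm's actual list, which the package does not publish.
STOP_WORDS = {"the", "and", "of", "i", "can", "will", "should",
              "be", "am", "is", "are", "was", "were", "been"}


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def build_index(blocks, stop_words=STOP_WORDS):
    """Inverted index: each indexed word maps to the set of block ids containing it."""
    index = defaultdict(set)
    for block_id, text in blocks.items():
        for word in tokenize(text):
            if word not in stop_words:
                index[word].add(block_id)
    return index


def compute_links(selection, index, stop_words=STOP_WORDS):
    """Rank blocks by how many distinct indexed words they share with the selection."""
    scores = defaultdict(int)
    for word in set(tokenize(selection)) - stop_words:
        for block_id in index.get(word, ()):
            scores[block_id] += 1
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical blocks of text (e.g. paragraphs or stanzas).
blocks = {
    "p1": "I wandered by the river",
    "p2": "The river and the sea",
    "p3": "I am what I am",
}
index = build_index(blocks)
print(compute_links("river sea", index))  # [('p2', 2), ('p1', 1)]
print(compute_links("I", index))          # [] because 'i' is never indexed
```

Reporting the contents of `STOP_WORDS` to the user, and letting the user edit it, would cost almost nothing in a design like this, which is essentially the complaint made above.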

Integrating an Information Network

If you do not find a way to integrate textual databases or an information network into the syllabus, and probably if they do not play some part in the assessment process, most students will not make much use of them. If you create computer-centred options in the syllabus, the benefits are restricted to a minority of enthusiasts and those who perceive potential career benefits in improving their familiarity with IT. With the support of some remission from teaching through a University Teaching Fellowship scheme, we have been looking at the benefits of making digital resources available to English students, including a CAL package on syntax for first-year students. We are coming to the conclusion that we need to integrate the use of such resources into the core modules in the first and second year, so that all students become aware of what is available and what is possible, and so that the time taken to produce good-quality learning materials can be offset by their use by a large number of students.

We are planning to develop further computer-based support for first-year language study, to integrate use of Chadwyck-Healey databases and of case-studies in hypertext, with opportunities for student contributions, into first-year modules on poetry and on Shakespeare, and to develop similar digital resources and IT activities for core second-year modules on language and on literary theory. This does not mean that there will be no digital resources to support the smaller optional modules. But we expect that through familiarity with the available resources, students will come to make better use of them, in some cases to contribute to them, and to frame their own questions for the Chadwyck-Healey databases. On a small scale this is already happening. To cite a couple of recent examples: one student wanted to find the contexts in which Coleridge used 'fancy' in his poetry. More esoterically, another student wanted, following on from a seminar discussion, to find out how the name 'Pamela' was scanned in the poetry of Richardson's contemporaries, and a search through the whole Chadwyck-Healey poetry database led to interesting examples from Sidney's Arcadia and to Lovelace referring to Sidney's Pamela too.

Access

Arts faculties in Higher Education in Great Britain are not usually richly endowed with computing resources for their students. Student access to digital resources is made more difficult if, for copyright reasons, learning resources have to be restricted to secure local networks. In our case, about 400 to 500 students in Theatre Studies, Media Studies and English share some 18 networked PCs (when all are working). Currently the heaviest use of them is for word processing assignments. Students have access to other computers in the institution including, in some cases, in study bedrooms; but it is unlikely that we would be able to make our learning and information network available outside the small local network. Subsets of digital resources (where there are no copyright restrictions) can be made available to students on floppy disks for use elsewhere, provided that the appropriate software is also available. As well as supporting our developing information network, our local network also serves a variety of other computing needs, including access to the Library catalogue, Microsoft Office (and Internet Assistant), email, CD-ROM datasets, TACT, micro-OCP, and, currently, Guide.

A Learning Environment for English Students

Ideally, what I would like to see is a learning (and authoring) environment in which students can access, and in some controlled way contribute to, a hypertext information network, which may well incorporate extracts from large textual databases and results from concordancing programs such as micro-OCP and TACT. But it should be easy for students to switch to using such programs directly themselves, or to access further data from CD-ROMs, consult the Library catalogue, send email messages, or copy and paste material from the information network into their own essays, projects and seminar papers. Of course, students will need to be taught how to reference such material; but if it is easier for students to copy digital material than printed library material, it is correspondingly easier for staff to check whether a suspect phrase exists on an information network than in the library. The underlying program might be a hypertext authoring package such as Guide or Microcosm; with some loss of functionality, but also gains in ease of use and familiarity, it could be a local Web using authoring and browsing tools developed for the World Wide Web.

References

J. F. Burrows (1987). Computation into Criticism: A Study of Jane Austen's Novels and an Experiment in Method. Oxford: Clarendon Press.

George P. Landow (1992). Hypertext: The Convergence of Contemporary Critical Theory and Technology. Baltimore: Johns Hopkins University Press.

In a posting to quill-list@chadwyck.co.uk (13/06/96), Duncan Christelou (Chadwyck-Healey Ltd) wrote: 'I'm happy to reassure Mr Robertson and others that if a licence has been purchased, and if the only users of the network are licensed users of the database, it is perfectly legal to transmit extracts of the database (in the form of course materials or otherwise) across that network'.


Computers & Texts 12 (1996), 15. Not to be republished in any form without the author's permission.

HTML Author: Michael Fraser (mike.fraser@oucs.ox.ac.uk)
Document Created: 22 August 1996

The URL of this document is http://info.ox.ac.uk/ctitext/publish/comtxt/ct12/roberts.html