Computers & Texts No. 12
July 1996

Review: Chaucer, Johnson, and Shakespeare on CD-ROM

Michael Fraser

The Wife of Bath's Prologue

The typical critical edition is produced in relative isolation. A single researcher wanders from library to library, transcribing in pencil manuscripts both local and remote. Index cards, cutting and pasting: these are the tools of final assembly. The published edition generally includes a textual apparatus listing variant readings from the numerous manuscripts scrutinised by this poor unfortunate individual. The reader receives a single text, with the variants submerged to the foot of the page, scattered like potsherds with little hope of reassembly. There is a sense that much of the real work of producing an edition is necessarily set to one side by the limitations of the publisher. On the other hand, these same limitations give the author and subsequent readers the impression of a definitive edition, and the scholarly world may once again rejoice that the text of another ancient work has been decided for the next century.

The Wife of Bath's Prologue on CD-ROM is everything that the printed edition is not. Lest anyone think otherwise, it is unfinished: it is the first release of a project which has declared its intention to produce new transcriptions of every pre-1500 manuscript and edition of the whole of The Canterbury Tales. But this is not the end of the matter. The CD-ROMs (or whatever medium is chosen to present these issues in the future) are the means to another end, the beginning of something larger: 'to use the computer methods now available to us to determine as thoroughly as we could the textual history of The Canterbury Tales'. Thus what is before us in the form of the CD-ROM is a small fraction of ongoing research to answer one quite considerable question. The equivalent of a single individual's lifetime's work, whose public face would once have been the publication of the final textual history while the rest remained hidden from view, is here shared with the scholarly world. The CD-ROM, like its editor Peter Robinson and transcriber Elizabeth Solopova, exudes the excitement of scholarship finding new answers to old questions. That excitement has bubbled over into a CD-ROM which, by its very existence, invites fellow scholars to share the research as it proceeds rather than merely in its finished form.

This bundle of notes, this collection of pencilled transcriptions, has been more than tidied up for us. Within this small fraction of The Canterbury Tales lies the structure into which the rest of the Project can grow. The search and display software is DynaText, whose basic screens will be familiar to users of Chadwyck-Healey's full-text databases. The left window contains the table of contents, each chapter of which can be expanded or contracted. The right window displays the selected text, whether transcription, spelling variants, or article.

[Screen Shot]

The Wife of Bath's Prologue: Displaying variant lines in all 58 witnesses.

Every critical edition has to begin somewhere, and this electronic edition is no different in supplying a base text (MS Hengwrt) against which the witnesses may be collated. Clicking on any one word in this base text displays the readings of that word in the 58 witnesses; clicking on the line number displays the same for the complete line. This is not so different from the printed edition and, if the user desires, there is the opportunity to view either a regularized or an unregularized collation of the Prologue in which all the readings of the witnesses are present on the screen. Any similarity with the printed edition stops at this point. Not only is it possible to view the variants of any selected word or line, but one further click of the mouse will give you the complete text of the Wife of Bath's Prologue in any of the 58 witnesses.

The witnesses are accessible from the table of contents as well as from within the base text. Any one witness can be viewed in its entirety and, once again, selecting any word within the witness will display the collation of that line against the other witnesses. The apparatus for the selected line allows the further selection of either a regularized or an unregularized collation of any one word, encouraging movement from one witness to another. Each witness has been newly transcribed, and the CD-ROM includes a lengthy article on the process and guidelines used in transcribing the Wife of Bath's Prologue for an electronic edition. The system of transcription, the authors are careful to note, is not definitive but is itself part of the ongoing process of research; the guidelines are presented for comment and discussion. Transcription notes for each witness highlight the features which caused particular difficulty in transcribing, and the transcription also preserves any marginalia. However, the reader is not wholly dependent upon the transcriber. Taking this collation still further beyond any printed work is the inclusion of a digitized image of every folio or page of all 58 witnesses. Thus one can agree with the transcriber that the scribe of MS Chicago 564 sometimes used an emphatic A at the beginning of words that bear no emphasis, or observe that the 'e' of 'experience' is missing from MS Bodl. 414 because the scribe, although leaving a space for it, never got round to executing the initial. The images are monochrome and their quality varies. For the most part they are relatively far removed from the original source, being images of photocopies of microfilms of the manuscripts. This can result in curious notes such as the one associated with Paris BN Angl. 39: 'the rest of the line cannot be seen on the photocopy'.
Such (honest) glosses only reinforce the declared position that the Wife of Bath's Prologue on CD-ROM is research in progress. Given that some manuscripts are considered more significant for the study of Chaucer than others, it is very likely that future releases from the Project will include electronic editions of single manuscripts with colour, higher-resolution images of each folio. Each witness has its own catalogue description incorporating the details one would expect (format, provenance, date, hands, etc.) together with a bibliography. Finally, each witness has its own section of the spelling database, which permits the reader to examine spellings within the manuscript itself or in comparison with the other witnesses.

[Screen Shot]

Wife of Bath's Prologue: Regularized collation together with transcription and digitized folio.

The combination of TEI-SGML and DynaText allows relatively complex searching of the ten million words contained on the CD-ROM. A search form offers searching by line, by proximity to other words, by cited manuscript, person, or place, and by deletion or addition. One irritating flaw I discovered in searching was its apparent inability to find many of the special characters in the transcriptions (for whose display a font was specially developed). The spelling database helps in locating single words; otherwise one has to resort to wildcards (* or ?) to locate phrases. The manual gives detailed examples of how to employ the underlying SGML encoding for best results in searching, including the useful ability to limit the witnesses being searched.

There is something rather refreshing in the notion that, whilst the project editors are busy compiling and transcribing the manuscripts for the next stage of the project (The General Prologue), other scholars are able to begin some of the work of analysing the text of the Wife of Bath's Prologue and of answering some of the wider questions to which this project is leading. The scholar is even helped in this by DynaText's provision for further annotation and hypertext links, which may be added by the user (and kept on a local hard disk). At first sight this is a collection of fifteenth-century manuscripts containing the text of Geoffrey Chaucer, one of the stalwarts of the English canon. On a second look the CD-ROM seems rather postmodern in its destruction of a single authoritative edition of the text. It is true that The Wife of Bath's Prologue on CD-ROM has an editor (Peter Robinson), but one cannot speak of a single edition of the Prologue. The editor's role has been to compile the material and implement the hypertext links that facilitate the creation, if the reader so wishes, of personal editions. It is true that the editor decided that his base text would be the Hengwrt manuscript, but I can quite easily decide to begin with Fitzwilliam McLean MS 181. Every reader can be an editor, and each reader's answers are the fruit of his or her own research.

A Dictionary of the English Language

I knew that the work in which I engaged is generally considered as drudgery for the blind, as the proper toil of artless industry; a task that requires neither the light of learning, nor the activity of genius, but may be successfully performed without any higher quality than that of bearing burdens with dull patience, and beating the track of the alphabet with sluggish resolution.
So Samuel Johnson commenced his letter to the Earl of Chesterfield on The Plan of an English Dictionary (1747). Some (who know no better) might be tempted to suggest that, nearly 250 years later, this description might better be applied to those who spend their time encoding electronic texts so that the rest of us might easily navigate and search their contents. But, as the Wife of Bath's Prologue amply demonstrates, this is most certainly not the case. The encoding of an electronic edition so that the structure is made apparent, the content easily searchable, and the whole attractively presented is a task that requires the light of learning, even if a considerable portion of it is fairly harmless drudgery.

Two editions of Samuel Johnson's Dictionary of the English Language have been published on CD-ROM by Cambridge University Press: the first, produced by Johnson in 1755, and the fourth, revised and published by Johnson in 1773. Entries from both editions can be viewed simultaneously on the screen. The electronic edition, like the Wife of Bath's Prologue, is encoded in TEI-SGML and presented with DynaText. This gives the CD-ROM a similar appearance to the Wife of Bath's Prologue, and indeed one need only install a single DynaText reader, together with the specific fonts, in order to view any of the three CD-ROMs reviewed here.

The electronic Dictionary is divided between the transcriptions and the digitized images of each page of each edition. Although it is possible to navigate the dictionary through the transcription, moving, for example, from the letter A to ABE... to Abecedary (belonging to the alphabet), it is more useful to locate words using the search forms provided.

The value of a work must be estimated by its use; it is not enough that a dictionary delights the critick, unless, at the same time, it instructs the learner; as it is to little purpose that an engine amuses the philosopher by the subtilty of its mechanism, if it requires so much knowledge in its application as to be of no advantage to the common workman.
The subtilty of the underlying encoding system might well amuse the philosophically inclined. However, the common academic need not understand more than the basics in order to make good use of it. Readers who have been duly impressed by the search capabilities of the Oxford English Dictionary on CD-ROM will be pleased to know that similar searches can be carried out on the OED's illustrious predecessor. Such searches are possible only because the editor, Anne McDermott, encoded many of the elements identified by the TEI Guidelines for print dictionaries (headword, part of speech, etymology, usage, sense, definition, etc.).

[Screen Shot]

Johnson's Dictionary: Entry, transcription, and digitized image from the first edition.

The forms interface gives the option of searching the complete dictionary for a keyword or limiting the search to the headword, definition, quotation, first or fourth edition, quoted author or title. If that is not sufficient, more complex searches can be entered using the underlying markup. This is particularly useful for proximity or Boolean-type searching, but also for gaining access to the additional features encoded in the dictionary.

Barbarous, or impure, words and expressions, may be branded with some note of infamy, as they are carefully to be eradicated wherever they are found; and they occur too frequently, even in the best writers.
One of the pleasures afforded this common workman in reviewing Johnson's Dictionary was attempting to uncover the voice of Johnson beneath the dull (as, to make dictionaries is dull work) defining of everyday words. Familiar even before one inspects the electronic edition are Johnson's often-cited definitions of lexicographer (a harmless drudge), oats (a grain, which in England is generally given to horses, but in Scotland supports the people), or to worm (to deprive a dog of something, nobody knows what, under his tongue, which is said to prevent him, nobody knows why, from running mad).

One of Johnson's primary concerns in compiling his dictionary was the purity of the English language. A substantial number of 'barbarous' words are nonetheless to be found in both editions. They were placed there, one suspects, not because the dictionary was intended as a snapshot of eighteenth-century English usage, but because such words, being offensive to Johnson's ideal of purity through etymology, needed to be pointed out to the common workman as precisely the words he should not be using. In total 49 words are described by Johnson as 'barbarous'. A search specified in the form '<entryfree> cont (<note> with type=usg cont barbarous) and (<author> cont shakespeare)' will find, amongst others, those occurrences where Shakespeare himself employed such words (vastidity, worser).

Far more common are instances of 'low' (258) or 'cant' (154) words. Cant is defined by Johnson as 'a corrupt dialect used by beggars and vagabonds', 'barbarous jargon' or 'a whining pretension to goodness, in formal and affected terms'. Examples of cant include 'black-guard' (a cant word amongst the vulgar), 'confounded' (hateful; detestable; odious, as in 'He was a most confounded Tory' (Swift)), 'mundungus' (stinking tobacco) and 'slim' (slender, thin of shape; a cant word as it seems, and therefore not to be used). The last is an example of Johnson's attempt to educate by proscription. Johnson's aim to rid the English language of cant or spurious words reaches its peak in the few instances where he presents the headword, then the definition, followed by the comment 'in this sense it is not used'. One can perhaps understand this where 'not used' is an addendum in the fourth edition to a word defined without comment in the first (Calmy: calm; peaceful, or Preach: noun, a discourse, religious oration). This is not the case with the first definition given for 'snuff' (Snot. In this sense it is not used), which appears in both the first and the fourth editions. On finding this one immediately wishes to consult the Oxford English Dictionary, which has duly taken note of Johnson's claim and not included 'snot' among the definitions given for 'snuff'. Unfortunately, it is nearly impossible to search for all occurrences of 'not used' in the dictionary, because 'not' has been designated a stopword and is thus ignored in all searches. As one might expect in a work of this nature, the word 'used' occurs with great frequency, so one tends to stumble on Johnson's proclamations quite by accident. The work of purifying the English language continues, however, even if on occasion literature impedes its progress (Primal: First. A word not in use, but very commodious for poetry).
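Why the stopword defeats a search for 'not used' can be seen in a toy full-text index. The Python sketch below is hypothetical: it is not DynaText's indexing, its stopword list is an assumption, and its four entries are simply definitions quoted in this review. Because stopwords are discarded both when the index is built and when the query is parsed, the phrase 'not used' collapses to 'used' and matches every entry containing that word; with no stopwords, only the 'snuff' entry matches.

```python
import re

# Toy entries: definitions as quoted in this review (not a full transcription).
ENTRIES = {
    "cant": "a corrupt dialect used by beggars and vagabonds",
    "snuff": "Snot. In this sense it is not used.",
    "slim": "slender, thin of shape; a cant word as it seems, and therefore not to be used",
    "primal": "First. A word not in use, but very commodious for poetry.",
}

# Hypothetical stopword list; the real list used on the CD-ROM is not given.
STOPWORDS = {"a", "in", "it", "is", "not", "the", "to", "be", "by", "and", "of"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(entries, stopwords):
    """Map each headword to its token list, dropping stopwords at index time."""
    return {head: [t for t in tokenize(defn) if t not in stopwords]
            for head, defn in entries.items()}


def phrase_search(index, query, stopwords):
    """Return headwords whose tokens contain the query as a contiguous phrase.

    Stopwords are removed from the query as well, mirroring the index.
    """
    terms = [t for t in tokenize(query) if t not in stopwords]
    n = len(terms)
    hits = []
    for head, tokens in index.items():
        if any(tokens[i:i + n] == terms for i in range(len(tokens) - n + 1)):
            hits.append(head)
    return sorted(hits)


# 'not' is discarded, so the query degenerates to the single word 'used'.
print(phrase_search(build_index(ENTRIES, STOPWORDS), "not used", STOPWORDS))
# ['cant', 'slim', 'snuff']

# With no stopwords the phrase survives and only Johnson's proclamation matches.
print(phrase_search(build_index(ENTRIES, set()), "not used", set()))
# ['snuff']
```

In this sketch the loss is unrecoverable because the stopword is dropped when the index is built, not merely when the query is read; no query, however phrased, can then distinguish 'not used' from 'used'.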

So that in search of the progenitors of our speech, we may wander from the tropick to the frozen zone, and find some in the valleys of Palestine, and some upon the rocks of Norway.
Picking out the Norwegian, the Indian, the Icelandic, the Irish and the Saxon, the Greek and the Hebrew words is, at first sight, easy enough. Searching with the '<etym>' tag containing some specified language shows 4131 words with some reference to Saxon etymology, 9763 from Latin, 5655 from French, and only 433 with reference to German. However, one cannot be sure of the accuracy of this method of searching. Greek words serve to demonstrate the point. A search for '<etym> cont Greek' retrieves a paltry 42 words. A browse through the dictionary shows that Johnson was not consistent in his specification of etymology: he also uses the abbreviation Gr. or, on most occasions, leaves it to the reader to recognise Greek on sight. The advantage of Greek words in the electronic edition of the Dictionary, however, is that they require a separate character set. Thus there is an extra tag within '<etym>' which specifies Greek (<lang="gk">). Searching for all instances of Greek within the etymology retrieves a far more realistic 4307 words. It is unfortunate that a similar tag was not used for all languages, so regularizing the etymological entries. One could then ensure that a search for 'latin' would also pick up 'lat.' (a total of 19035 entries) and instances where neither form is used. I was rather disappointed to discover that '<lang>' had not been used for Hebrew words, presumably because no separate character set was defined; instead the publishers decided to insert a graphic image of each Hebrew word.
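The regularization wished for above can be sketched as a lookup table that maps Johnson's varying labels onto one normalized language name before counting. This is a hypothetical Python illustration: the alias table (in particular 'Fr.' and 'Sax.') and the five sample etymology strings are assumptions invented for the example, not transcriptions from the Dictionary. Like any label-based method, it still cannot catch etymologies where no language name is given at all and the reader is left to recognise the language on sight.

```python
import re
from collections import Counter

# Hypothetical alias table: label variants -> one normalized language name.
# 'Gr.' and 'lat.' are mentioned in the review; 'Fr.' and 'Sax.' are assumptions.
LANG_ALIASES = {
    "latin": "latin", "lat": "latin",
    "greek": "greek", "gr": "greek",
    "french": "french", "fr": "french",
    "saxon": "saxon", "sax": "saxon",
    "german": "german",
}

# Invented sample etymology strings, for illustration only.
SAMPLES = ["[Lat.]", "[latin]", "[Gr.]", "[Fr. from Lat.]", "[Sax.]"]


def normalize_etym(etym):
    """Return the set of normalized language names mentioned in one etymology."""
    words = re.findall(r"[a-z]+", etym.lower())
    return {LANG_ALIASES[w] for w in words if w in LANG_ALIASES}


def count_languages(etymologies):
    """Count etymologies per language, counting each entry once per language."""
    counts = Counter()
    for etym in etymologies:
        counts.update(normalize_etym(etym))
    return counts


# A plain substring search for 'latin' misses the abbreviated forms.
print(sum("latin" in e.lower() for e in SAMPLES))  # 1
# After regularization all three Latin references are found.
print(count_languages(SAMPLES)["latin"])  # 3
```

Normalizing to a set per entry means an etymology such as '[Fr. from Lat.]' counts once for French and once for Latin, which matches the kind of per-language totals quoted in the review.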

Finally, one word was recognised by Johnson as predating the Flood and the fall of the Tower of Babel. That word was 'sack', said to be 'found in all languages, and it is therefore conceived to be antediluvian'. The Oxford English Dictionary very nearly agrees with Johnson on this point, but confines itself to describing the word as having a prehistoric type.

This, my Lord, is my idea of an English dictionary; a dictionary by which the pronunciation of our language may be fixed, and its attainment facilitated; by which its purity may be preserved, its use ascertained, and its duration lengthened.
An idea hardly fulfilled by Johnson's Dictionary. What was cant then is elegant today, and Johnson's refinements are today's slang. The English language evolves as it ever did. Yet the ease and variety of ways in which an eighteenth-century representation of English can be consulted on a twentieth-century spinning disc fulfil, for the Dictionary itself, much of what Johnson had hoped his Dictionary would do for the language of England: its attainment is facilitated and its duration lengthened. He would surely have approved of this, and of its future provision on something reticulated or decussated, at equal distances, with interstices between the intersections.

The World Shakespeare Bibliography

The final CD-ROM in this review is The World Shakespeare Bibliography. This is unlike the other two CD-ROMs reviewed in that it provides access to a database of references about texts rather than varieties of the text itself.

The World Shakespeare Bibliography currently covers four years (1990-93) of annotated references to all aspects of Shakespearean studies. Subsequent editions will result in a database embracing publications from 1900 to the present; in this regard the present version should be considered a sample of something greater. This CD-ROM differs from many other subject-specific databases in that it is not merely an electronic version of a printed bibliography or journal of abstracts. It is published in association with the Folger Shakespeare Library, which publishes Shakespeare Quarterly, but additional material has been compiled by the editor, James L. Harner (Texas A&M University), relying upon the contributions of a long list of international correspondents.

Shakespearean studies have evidently been flourishing in the relatively short period covered by this CD-ROM: some 12,000 works and several thousand reviews are recorded. Each entry includes a full bibliographical citation (or other means of retrieving the item), a descriptive annotation, and further information dependent upon the type of entry. There are hyperlinks between essays and the full bibliographic details of the multi-author works containing them, cross-references between items related by keyword, and hyperlinks from the author index to records. A typical record might look something like this (however implausible the subject):

6110 McCombs, Gillian M.
"'Once more unto the breach, dear friends': Shakespeare's Henry V as a Primer for Leaders."
Journal of Academic Librarianship 18 (1992): 218-20.
[Uses Henry V and Kenneth Branagh's film to suggest lessons for developing strong, dynamic leaders among librarians.]

A database, even of this breadth, should not be considered exhaustive. Most entries appear only after either the editor or one of the correspondents has inspected the item. Using an online database such as OCLC FirstSearch and confining searches to an appropriate year, it is possible to find further references not included in the CD-ROM database. However, the OCLC catalogue does not include the annotations or the cross-references. Nor does OCLC contain details from as many different media as the World Shakespeare Bibliography, which records stage productions, films, audio recordings, journalism, obituaries of great Shakespearean players, and software. There are 10 references to Kenneth Branagh as author-director and a further 20 references to articles and the like about Branagh (Much Ado About Nothing was released in 1993).

[Screen Shot]

The World Shakespeare Bibliography: Searching the database using the underlying SGML markup.

From the DynaText search window it is possible to search by title, author, record number, or keyword (combined with language). The range of languages represented on the CD-ROM demonstrates, if nothing else, the breadth of international study devoted to Shakespeare. A combination of the underlying SGML and ISO language codes allows one to note the numerous translations of Shakespeare into Japanese, King Lear and the Merchant of Venice into Afrikaans, the Hindi Titus Andronicus and Hamlet's famous speech in Solomon Islands pidgin English. This is leaving aside various items in Arabic, Catalan, Russian, Croatian, Finnish, Greek, and Korean.

Between Chaucer and Johnson stands Shakespeare, who was surely influenced by Chaucer (see Fleissner (1991), no. 8894) and whose work was perhaps edited in the eighteenth century by Samuel Johnson (Kliman (1992), no. 634). Whatever the research, these three CD-ROMs are commendable fruits of traditional scholarship, planted in the fertile ground of the Text Encoding Initiative and tended by an established academic publisher. It is these three components combined that ensure the future success of computer-based humanities scholarship.

Computers & Texts 12 (1996), 21. Not to be republished in any form without the author's permission.

