[Presentation delivered at the History of the Book seminar, Oxford, Jan 1998]

The Electronic Text and the Future of the Codex

Presentation I. The History of the Electronic Text

Michael Fraser
Oxford University

As the first of this evening's presenters, it is my duty to provide a few words by way of introduction to this seminar. There will be three presentations, each of which will last no longer than twenty minutes. This should give us at least half an hour for (hopefully) lively discussion. Between us we do not intend to cover everything which might be said about the electronic text, and each of our presentations, though planned together in outline, will be particular to the interests of the individual presenter. Undoubtedly there will be some overlap in the subject matter covered, and hopefully such overlap will reveal divergent views. Broadly, we have divided the presentations into the past, the present, and the future. For my part I will deal with the very brief history of the electronic text, and will attempt to pin down more precisely what is meant by an electronic text and its place in the history of the book. During the discussion period I would welcome any comparisons which strike you between the development of the electronic text and the developments which led from scroll to codex, and from manuscript to print.

It interests me greatly that this seminar, 'The Electronic Text and the Future of the Codex', should appear within a series dealing with the history of the book. It is not unusual, of course: taught courses on the history of the book often conclude with more than one lecture and reading list on the electronic text in all its manifestations. However, how far can one really say that the electronic text is part of the history of the book? And, assuming it is, is its place in the history of the book (or of the codex, if we want to emphasise the physical object) as successor to the book, in parallel to the book (for ever), or something else? Are there any features of the electronic text which might give us clues as to whether it is part of the history, or evolution, of the book? It might, for example, be interesting to compare the perceived advantages of the electronic text over the printed book with the advantages once claimed for the printed book over the manuscript (accessibility, accuracy, literacy, cost, and so on). Or perhaps we should take a broader view and see the development of the electronic text as part of the development of other media: film and, particularly, television (especially as electronic text increasingly becomes multimedia).

So, what is meant by electronic text? Most of us would imagine writing within a word processor, or viewing a page on the World Wide Web; others may think of a digital folio (perhaps from Oxford's Celtic Manuscripts project) or of text-analysis tools such as WordCruncher or the Oxford Concordance Program (OCP). An electronic text is invariably associated with a computer monitor. Many electronic texts can be defined simply as digital representations of some physical object, whether manuscript or printed book, and what becomes important is the degree to which the electronic version represents that physical object.

It used to be more common to speak of machine-readable texts in place of electronic texts. This phrase immediately presupposes two things: something mechanical, and something mechanical which can read.

To understand better what in particular is meant by machine-readable, as well as to place our current perceptions of the electronic text within some sort of historical context, it might be a good idea to start at the point which most people reckon to be the beginning of the electronic text.

There would have been a pleasant sense of satisfaction in being able to say that the first book to be converted to machine-readable form was the Bible. Unfortunately, it was not. However, the first books, as far as I can determine, to be converted to electronic form were the complete works of Thomas Aquinas. This monumental project, the Index Thomisticus, commenced in 1949 under the directorship of Fr Roberto Busa. The initial result of the project was the publication, on paper, of the Index Thomisticus (all 31 volumes of it). Later would come the CD-ROM entitled "Thomae Aquinatis Opera Omnia cum hypertextibus in CD-ROM".

The next landmark in this brief history was the successful defence of a two-volume doctoral thesis submitted by John William Ellison (Harvard), entitled "The use of electronic computers in the study of the Greek New Testament text", which, unfortunately, I have not had the good fortune to read. I do know, however, that in the same year Ellison assisted in the publication of a complete concordance to the Revised Standard Version of the Bible, constructed using Harvard's central computing facilities. Other notable projects during the 1960s and early 1970s included the Centre for the Electronic Processing of Documents (CETEDOC), founded at the Catholic University of Louvain in 1968 with the aim to "develop automation in the field of the study of documents", and the Thesaurus Linguae Graecae project, begun in 1972 to encode 69 million words of Greek with a transcription system commonly known as Beta Code, consisting of a combination of upper-case characters and ASCII diacriticals. By 1965 some fifty computer projects in Shakespearean studies had been catalogued, and in 1976 Lou Burnard set up the Oxford Text Archive with around thirty machine-readable texts.
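
To give a flavour of what such a transcription scheme involves, here is a much-simplified sketch, in modern Python and purely by way of illustration. The handful of mappings below is a tiny fraction of the real scheme, and the details are my own reconstruction rather than the project's specification: Greek letters are carried by Latin capitals, and ASCII punctuation stands in for the diacritics, which follow the vowel they modify.

    import unicodedata

    # A few illustrative mappings only; the real scheme is far larger.
    BETA = {
        "A": "α", "D": "δ", "E": "ε", "H": "η", "I": "ι",
        "M": "μ", "N": "ν", "Q": "θ",
        ")": "\u0313",   # smooth breathing
        "(": "\u0314",   # rough breathing
        "/": "\u0301",   # acute accent
        "\\": "\u0300",  # grave accent
        "=": "\u0342",   # circumflex
    }

    def beta_to_greek(beta):
        # Decode character by character, then compose letter + accent pairs.
        decoded = "".join(BETA.get(c, c) for c in beta)
        return unicodedata.normalize("NFC", decoded)

    # The opening words of the Iliad.
    print(beta_to_greek("MH=NIN A)/EIDE QEA\\"))   # μῆνιν ἄειδε θεὰ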

The Index Thomisticus, the concordances to the Greek and English Bibles, and the other projects of this period were printed on paper. The text was not typed, edited, or viewed on a screen. For the most part it was encoded onto punch card, paper tape, and later magnetic tape. The text was not prepared directly in binary form; if it had been, every character, punctuation marks included, would have required an eight-digit binary number. Instead the text was transcribed at a keypunch, which in turn produced a record such as a punch card (effectively a card perforated with a series of small holes, the position of each hole being significant), to be translated into binary and fed directly into the computer. There were, I imagine, two sets of punch cards: one containing the text, the other a series of instructions on what to do with the text. In went one set, in went the other, the sun fell, the sun rose, and out came the results on reams of continuous paper. Welcome back to the scroll. So you see, machine-readable texts were not electronic at all. They were created using mechanical devices, were in essence a collection of cards or a roll of paper, and were read by another mechanical device which sat between the user and the computer.
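
To make the arithmetic of that counterfactual concrete, the following few lines (modern Python, and using today's ASCII codes rather than the character sets of the period) write out a short sample, punctuation included, as the eight-digit binary numbers each character would require:

    # Each character, punctuation marks included, expands to an
    # eight-digit binary number. ASCII values stand in here for the
    # character codes actually used on the machines of the period.
    for ch in "Summa,":
        print(ch, format(ord(ch), "08b"))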

In 1949 the container of Aquinas' text was the punch card rather than the book. The punch card was fed into the reader. The reader decoded or interpreted the punch card; this mechanical device read a text which had been translated (or punched) into machine language. Thus, in an era when the storage device was the punch card or paper tape there was, properly speaking, no electronic text. If one takes the metaphor of the machine reading the text (translated into its own language), then it follows that the read text was held in the memory of the machine only for the purposes of performing a particular task. The text existed temporarily in the mind of the computer. Is this comparable to the reading/decoding of a text written in any language on paper, which then exists temporarily within the mind of the reader? We still speak of the computer's memory. The written text in many instances exists to preserve memory; when it is not written, it exists in the memory or the mind. Neurologists would claim that the activity of the brain is the result of electrical impulses. However, even as we enter the 21st century, many are uncomfortable with the notion of reducing the brain to a collection of electrical impulses or to something physical. When we speak of digital objects, whether they be text, film, or something else, we are uncomfortable with speaking of them as something physical (so we use 'virtual' instead). Our perception of the workings of the computer is not unlike our perception of the workings of the mind. Conversely, then, is it so bizarre, if the brain is powered by electrical impulses, to speak of the text in one's own memory as electronic text?

The point I am trying to establish is an observation regarding the relationship of electronic text to the codex or physical book. Electronic text in its purest form cannot be found on punch cards or even magnetic tape. Electronic text must be, well, electronic: a series of electrical pulses which have traditionally been seen in terms of binary on/off positions. The electronic text is therefore one of two things. It is either an intermediate stage between the codex and the scroll (i.e. one begins with the codex, converts it to machine-readable form, processes it as electronic text, and then prints it in the form of a scroll on continuous paper or on reams of unbound paper); or it is another way of preserving memory and the output of the mind, perhaps closer in form to the human mind than other means of preservation.

It is the latter which represents electronic text in its purest form, and its real origin lies in the projection of text on to the computer screen. This is not ancient history. Robert Oakman, for example, in an introduction to computers for literary scholars, wrote that 'entering materials and editing them at a C[athode] R[ay] T[ube] terminal promises to become one of the best means of input for literary scholars'. He bemoans the fact that literary scholars had not leapt at the chance to make use of such devices. His book was published in 1980, less than two decades ago. Just after it was published, IBM and Apple brought the personal computer to the mass market, and children such as I started fiddling with the Sinclair ZX80 and its successors.

The purpose of creating early machine-readable texts

The vast majority of projects in this era created machine-readable texts as a means to an end. Very rarely was the machine-readable text an end in itself; rather, it served the purposes of some computer-assisted analysis. The production of printed concordances is an obvious example, and the advantages of the computer in the slicing and re-ordering of a text become apparent when one knows that in 1965 Ione Dobson published the complete concordance to the works of Byron. Compiled by hand over twenty-five years, it was described by its maker as "the last of the handmade concordances", and she left muttering that "much pleasure would be lost on the unthinking machine". Apart from concordances, another particularly popular and controversial area of study was authorship attribution. The methodology itself was controversial within the academic arena, but the results created quite a stir in the popular imagination as well: Andrew Q. Morton made the New York Times in 1963 for his computer-assisted study of sentence length and Greek function words, 'proving' that St Paul wrote only four of the letters attributed to him.

Ben Ross Schneider, in his Travels in Computerland, provides a record of the London Stage Information Bank and the conversion of The London Stage 1660-1800 to machine-readable form. His primary purpose in doing so was to make an analysis of the 1,027 characters played by around 200 actors in nearly 30,000 performances, against a possible 113 characteristics. And if all these characteristics were to be tagged in the London Stage (a task he initially undertook with index cards) then there would be around 500,000 instances of each characteristic. Having realised the momentous task that lay before him, Professor Schneider wrote: "I could not help thinking, though, during the months, how much easier it would have been for me if the London Stage had already existed in a form the computer could 'read'. We could then give it the list of 1027 roles, have it sort performances of them as desired, and save thereby a great deal of time and effort." Six months of effort, in fact, compared with the single day the computer would take. This was his motive for the conversion of the London Stage to machine-readable form. However, it required funding, and how did he set about attracting it? By putting in proposals saying that the conversion was necessary in order to analyse roles and characters, ostensibly for his personal research? Not at all; rather, he "continually pestered theatre-historians with the idea that the London Stage ought to be put on computer tape so that the wealth of information it contained, not just about actors and roles, but about backstage affairs [and so on] would be instantly accessible to all. It would be like having an index to every kind of thing in the book, from candlemakers to His Majesty the King, only the computer would even turn the pages and take notes for you" (pp. 10-11, 1974). [where is it now?]

In general, however, it was the analysis of the text which was important, the results of which usually appeared in a book. Once the research was accomplished, the machine-readable texts, if not discarded, were privately stored somewhere, sleeping whilst technology moved on. There was not much idea of creating machine-readable texts purely for dissemination, so that others might share a text for their own research. Quite apart from the rarity of scholars making use of computers at all, machine-readable texts were often mutually incompatible, tied to the medium on which they were held. Machine-readable texts used for concordances and stylistic analysis were often heavily embedded with markup codes, essential if software such as COCOA or the OCP were to produce publishable results. It need hardly be said that markup schemes tended to be idiosyncratic. It was for this reason that the Oxford Text Archive was created: for the preservation of machine-readable texts (which more often than not meant conversion from one format to another) and for access to them.
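
By way of illustration, the sketch below invents a tiny COCOA-style sample. The angle-bracketed references follow the general convention of single-letter category codes, but these particular tags, and the parsing, are hypothetical; the sketch shows how a concordance program might carry the current reference along as it files each word:

    import re

    # A hypothetical COCOA-style extract: bracketed references set the
    # current author (A), title (T) and scene (S) for the words which
    # follow them. The category letters are illustrative, not standard.
    sample = """<A SHAKESPEARE>
    <T THE TEMPEST>
    <S 1.1>
    Boatswain! Here master: what cheer?"""

    reference = {}
    for line in sample.splitlines():
        line = line.strip()
        match = re.match(r"<(\w+)\s+(.+)>$", line)
        if match:
            # A reference line updates the running citation.
            reference[match.group(1)] = match.group(2)
        else:
            # A text line: file every word under the current reference.
            for word in line.split():
                print(word, reference)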

There is one project, however, which commenced before the Oxford Text Archive and which is still going today. Infamous amongst seekers of electronic texts is Project Gutenberg. Project Gutenberg began in 1971 when Michael Hart was given a computer account containing $100,000,000 worth of computer time at the University of Illinois. Not having any great 'normal computing' plans in mind to occupy this amount of computing time, he decided (in less than two hours) that the greatest value to which the computer could be put was the storage, retrieval, and searching of electronic texts. To prove his point he typed in the US Declaration of Independence and sent it to everyone logged into that computer, thereby simultaneously creating a text archive and a piece of junk email. The Bible and Shakespeare came later, book by book, since the machine could not hold both simultaneously. The preface to the Gutenberg edition of the Declaration explains that the text was originally stored as a set of email instructions which accessed a disk; the disk in turn had to be mounted manually and was the size of a large cake in its box. In addition, backups were kept, one of them on paper tape. This single small file marked the beginning of a project which aimed to have 10,000 texts available in machine-readable form by the end of 2001 (thirty years from the start date). The fundamental philosophy behind Project Gutenberg (assuming its history has not been rewritten to take account of recent technology) is that anything stored in machine-readable form can be reproduced indefinitely.

Michael Hart's assertion that any text held in machine-readable form can be continually copied raises one of the fundamental issues relating to the existence of electronic texts. It also raises another issue, that of the transmission of the electronic text. I do not mean the means by which the electronic text is transmitted, but rather the reproduction and reuse of electronic texts. I have touched upon something of the history of electronic or machine-readable texts in general, but each individual electronic text has a history of transmission and reception, something which is not always apparent. By way of an example, Lou Burnard drew my attention to the transmission of Shakespeare's First Folio in machine-readable form. In the late 1960s Dr T.H. Howard-Hill prepared a series of machine-readable transcripts of the First Folio, as well as a selection of quartos. The work, carried out here in Oxford, formed the basis of the Oxford Shakespeare Concordances, published by OUP. When that was complete, the machine-readable texts were sent to Dr Robert Eagleson for use in his revision of the Shakespeare Glossary. The texts were then archived on magnetic tape and lay dormant for two years. When it was realised that the technology on which the tapes depended was about to become obsolete, the texts were transferred to a more standard form of tape (a process directed by Lou Burnard at the National Physical Laboratory). During this process some manual editing was undertaken to increase the usefulness of the transcripts (correcting markup and so on). The resulting texts were then used by Stanley Wells and Gary Taylor in the preparation of the new Oxford Shakespeare, after which the texts were lodged with the Oxford Text Archive. The First Folio texts have subsequently been made conformant to the Text Encoding Initiative's markup scheme (SGML) and are delivered over the Internet together with the TEI header, which documents, to a certain extent, the source of and changes made to the text. One wonders how much trouble future historians of electronic texts will have in tracing the transmission history of particular texts.
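
The TEI header's revision description gives a flavour of how such a transmission history can travel with the text itself. The fragment below is invented for illustration (it is not the actual header of the Oxford texts, and the dates and wording are hypothetical), followed by a few lines of Python which read the recorded history back out:

    import re

    # An invented fragment in the style of a TEI header's revision
    # description; the entries are hypothetical, not the actual record.
    header = """
    <revisionDesc>
      <change><date>1968</date><item>Transcribed from the First Folio by T.H. Howard-Hill</item></change>
      <change><date>1970s</date><item>Transferred from obsolete tape; markup corrected</item></change>
      <change><date>1990s</date><item>Converted to TEI-conformant SGML</item></change>
    </revisionDesc>
    """

    # List the transmission history recorded in the header fragment.
    for date, item in re.findall(r"<date>(.*?)</date><item>(.*?)</item>", header):
        print(date, "-", item)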

So, to conclude, what can the history of the electronic text tell us about the future of the codex? Machine-readable texts did not replace the codex. What the creation of machine-readable texts replaced was the labour involved in undertaking certain interpretations of the text. For the most part machine-readable texts were an intermediary form between one codex and another. The texts distributed by the Oxford Text Archive were intended for similar purposes. Project Gutenberg, however, did aim to replace the purchase of books. The project did not aim to replace the need for paper, but underlying it is a strong (some would say obsessive) sense that that which is in the public domain (i.e. out of copyright) should be freely available in as many places and to as many people as possible.

The seeds of the thought that the electronic text might, in some cases, replace the printed book had been sown. The early stages of the electronic text reflect the emulation of what is familiar (and all sorts of terms and metaphors from the world of the book, the manuscript, and the printed page have been imposed on the electronic text and the process by which it is created). However, slowly, as the technology develops, the possibilities become clearer and the electronic text evolves into something which, with hindsight, seems unfamiliar. One designs for the medium eventually, and the evolution of the electronic text reaches a point where it is impossible to return to that which was emulated.

Have we reached that point? Time to turn to Dr Lee.

Appendix

Book in the OED: "A written or printed treatise or series of treatises, occupying several sheets of paper or other substance fastened together so as to compose a material whole."

Codex in the OED: "codex. Pl. codices. [a. L. codex, later spelling of caudex trunk of a tree, wooden tablet, book, code of laws.] 1. = CODE sb.1 1, 2. Obs. 1581 MULCASTER Positions xl. (1887) 228 In the fourth booke of Iustinians new Codex, the thirtenth title. 1622 FLETCHER Sp. Curate IV. vii, The codexes o' th' law. 1659 Gentl. Call. iv. §24. 408 The whole codex of Christian precepts. 1753 Scots Mag. Sept. 460/1 A new codex, or body of the laws.

2. A manuscript volume: e.g. one of the ancient manuscripts of the Scriptures (as the Codex Sinaiticus, Alexandrinus, Vaticanus, etc.), or of the ancient classics. 1845 M. STUART O.T. Canon viii. (1849) 185 Account for the speedy loss or destruction of most codices once in circulation. 1875 SCRIVENER Lect. Text N. Test. 26 Tischendorf's great discovery, the Codex Sinaiticus. Ibid. 59 The characters in Codex B are somewhat less in size than those of Codex A.

3. 'In medicine, a collection of receipts for the preparation of drugs' (Syd. Soc. Lex.); spec. the French Pharmacopoeia."