[CTI Logo]

A Hypertextual History of Humanities Computing: Towards Independence from the Scientist

The 1970s saw a decline in the wild generalizations about what computers would do for humanities scholarship ("make ignorance impossible" was apparently one prediction).

1970 First symposium on Literary and Linguistic Computing held at Cambridge, organised by R. A. Wisbey (Cambridge, where he was director of the Centre) and Michael Farringdon (Swansea).

1970 The Michigan Early Modern English Materials Project began work to disseminate its unpublished archives on microfilm, first editing them on a computer and then printing directly to microfilm (Computer Output Microfilm). In the seventies this was a compromise between expensive hard-bound publication and the rather horrible computer printouts. Described as a quite remarkable output device was the process known as photocomposition: the text, together with all formatting codes, was stored on magnetic tape, and a lens device photographically recorded the resulting page. David Packard's concordance to Livy was the first example of the effective use of photocomposition. The procedure was also used for the Index Thomisticus. (Oakman, 1980)

Hlink 1. Michigan Early Modern English Materials today

Using a computer to aid in the collation of different versions of a text gained strength in the 1970s. In 1970 George Petty and William Gibson described Project OCCULT (Ordered Computer Collation of Unprepared Literary Text), which printed the results of comparing two texts in two columns. Penny Gilbert was best known for her program COLLATE, whose printout attempted to emulate a traditional edition, with variants beneath the base text. The process of collation ran in a number of stages (a sketch of the basic idea follows the list):

  1. Create the base text with a list of all words and their references
  2. Compare each further text against the base (all done using input cards); the variants are stored in a new file and subsequently merged.
  3. Output the final collation, including the ability to discard insignificant variants.
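
The following is a modern sketch of those three stages, not Penny Gilbert's COLLATE itself: it compares a witness against a referenced base text, using Python's difflib in place of punched-card input, and reports the variants by word number. The sample texts and names are invented for illustration.

    # Sketch of computer collation: base text vs. one witness (illustrative only).
    from difflib import SequenceMatcher

    def collate(base, witness):
        base_words, wit_words = base.split(), witness.split()
        variants = []
        for op, b1, b2, w1, w2 in SequenceMatcher(None, base_words, wit_words).get_opcodes():
            if op != "equal":                      # a point where the witness diverges
                variants.append({
                    "ref": b1 + 1,                 # 1-based word reference in the base text
                    "base": " ".join(base_words[b1:b2]) or "-",
                    "witness": " ".join(wit_words[w1:w2]) or "-",
                })
        return variants

    base = "softly in the dead of night she came"
    witness = "softly in the depth of the night she came"
    for v in collate(base, witness):
        print(f"word {v['ref']}: base '{v['base']}' ] witness '{v['witness']}'")

Insignificant variants (spelling, punctuation) could be discarded at the final stage by normalising both word lists before comparison.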

Hlink 2. Peter Robinson's Collate software

1971 The publication by Andrew Morton and S. Michaelson of a computer-generated concordance to the Johannine epistles. The concordance was, as one expected, exhaustive and provided the usual keyword-in-context index. Its usefulness for serious academic research was questionable, however, since the Greek transcription scheme by which the texts had been entered into the machine included no breathings or accents, rendering the frequency tables inaccurate. In addition, every word form present in the text was reproduced without any reference to the root word. Stuart Hall in his review of this work concluded, "The computer has in fact much to offer [the exegete], but he will reject it if it is offered as a substitute for scholarship rather than a handmaid". (Hall 1973, 221) The days of the simple computer-generated concordance were really at an end.

1972 A mature student in a class taught by Theodore Brunner at the University of California, Irvine, asked him why there was no Thesaurus Linguae Graecae comparable to the Thesaurus Linguae Latinae. On being told that the main objection was expense and that between one and four million dollars would be required, the student wrote a cheque for one million dollars. A subsequent meeting of international classicists and the selection of 1,400 editions of Greek texts commenced the TLG project. The method of input was to retype all the texts in an OCR typeface and then pass them through an optical scanner. By 1974 the database comprised one million words of Greek text; by 1976 the TLG had encoded 18 million words, stored one text per file on nine-track magnetic tape. Twenty-three years on, the project is nearing completion with the publication on CD-ROM of 58 million words of Greek. The finished project will be an electronic corpus of 2,884 authors and their 8,203 works, totalling a massive 69 million words of Greek. Before the advent of the CD-ROM the texts were provided on magnetic tape to subscribing institutions. To encode the Greek, with its special characters and accents, an ASCII scheme known as beta code was developed. The corpus is thus machine independent, and the software which can search, browse and decode the beta code into fully accented Greek is purchased separately for each computing platform.
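
As a rough illustration of how an ASCII beta-code text can be turned back into accented Greek, here is a deliberately partial modern sketch; the table covers only the lower-case letters and a few diacritics, whereas the full beta-code specification also handles capitals, final sigma, iota subscript and much more.

    # Partial, simplified beta-code decoder (illustration only; not the full TLG scheme).
    BETA_LETTERS = {
        "A": "α", "B": "β", "G": "γ", "D": "δ", "E": "ε", "Z": "ζ",
        "H": "η", "Q": "θ", "I": "ι", "K": "κ", "L": "λ", "M": "μ",
        "N": "ν", "C": "ξ", "O": "ο", "P": "π", "R": "ρ", "S": "σ",
        "T": "τ", "U": "υ", "F": "φ", "X": "χ", "Y": "ψ", "W": "ω",
    }
    BETA_DIACRITICS = {")": "\u0313", "(": "\u0314", "/": "\u0301",
                       "\\": "\u0300", "=": "\u0342", "+": "\u0308"}

    def decode_beta(text):
        out = []
        for ch in text.upper():
            if ch in BETA_LETTERS:
                out.append(BETA_LETTERS[ch])
            elif ch in BETA_DIACRITICS:
                out.append(BETA_DIACRITICS[ch])   # combining mark follows its vowel, as in beta code
            else:
                out.append(ch)                    # spaces and punctuation pass through
        return "".join(out)

    print(decode_beta("MH=NIN A)/EIDE QEA/"))     # opening words of the Iliad (simplified)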

Hlink 3. The Thesaurus Linguae Graecae Project today

1972 A scanner with Optical Character Recognition cost upwards of £100,000.

Hlink 4. A scanner...

1972 As a result of the second symposium on the use of computers in literary and linguistic research, held in Edinburgh, the Association for Literary and Linguistic Computing was formed. The first issue of the Association's bulletin was published the following year, edited by Joan Smith at Manchester and Michael Farringdon of Swansea. Joan Smith had previously circulated a newsletter and organised seminars at Manchester, an initiative from which the computer centre had subsequently (rather foolishly) decided to withdraw support.

Hlink 5. The cover of ALLC Bulletin, Vol 1 Issue 1 and ALLC Conferences are still going strong - Bergen '96

The opening article in the first volume of the ALLC's bulletin was by Joseph Raben (then editor of Computers and the Humanities), entitled "The Humanist in the Computer Lab: Thoughts on Technology in the Study of Literature". It was a reflection on the state of computer-assisted literary studies as he then saw it. The production of concordances everywhere caught more than his eye: he compares the finished concordances, ugly and hard to read in their upper-case print, with the volumes of previous centuries on the library shelves beside which they sit. His article concludes with words which remain relevant today,

A final concern for the immediate future is computer-assisted instruction. Those who predicted an instant success with machine teaching have learned, like the early enthusiasts of machine translation, that the processes of learning are as complex as those of communication; indeed, they have much in common, especially their refusal to submit to simplistic analysis. Humanists trained to approach subjects which have no readily visible hierarchical structure may already have mastered the philosophy of multi-branched searching techniques that will bring computer assisted instruction out of the drill and practice phase into the broad realm of true learning - that is, self-teaching. If humanists do not involve themselves in this new application, it will, by default, become the province of merely mechanical minds, a means of thrusting information into the unwilling students and another triumph for technological impersonality over humanity. If humanists do not concern themselves with directing the future of computer-assisted instruction, they will have themselves to blame when only those factual aspects of a subject which most readily lend themselves to objective presentation drive out the intangible, the nuanced in our approach to humanistic learning. (Raben 1973).

We might well bear his words in mind next time we are browsing the TLTP Catalogue looking for humanities-based learning projects.

The second issue of the ALLC Bulletin that year was guest edited by Andrew Morton who had his own comments to make about the usefulness of such an association,

Not long ago the Director of the Institute of Classical Studies had at his disposal a research budget of £30 per annum. This might be compared with the costly capital equipment and large annual budget at the disposal of Professor Zampolli in his National University Computing Centre at Pisa. It was Oscar Wilde's advice to the rising politician to think with the Radicals but to dine with the Tories. The Association must think with the Literate but beg from the numerate...To convince Mrs Thatcher and her successors that the use of computers takes literary studies from the pencil and paper scale to the crystallographic scale of funds would, by itself, justify the Association. (Morton 1974?)

Mrs Thatcher, for those who may not be aware, was then minister for education. Alas, she was not convinced, at least not to the scale of funding of which Andrew Morton dreamt.

1972/73 A new version of COCOA was announced by the Atlas Laboratory. This was a portable version, written in Fortran and distributed on punch cards. The new features, some of which were based on the Edinburgh Concord program, included greater flexibility in delimiting the output by, for example, word endings, word beginnings, frequencies, and the collocation of two words. The program size, if I've interpreted the data correctly, was a mere 40K. A word count of 10,000 words took approximately 5 minutes and a concordance of the same text, 16 minutes. Although COCOA was renowned for its ability to handle the character sets of non-English languages, the text input was constrained by the output device, i.e. the printer. The standard line printer could only print 64 characters (all upper case plus punctuation and mathematical symbols). However, Susan Hockey reported that the various transliteration schemes devised for different language character sets found the 64-character set quite adequate. The transliteration scheme used to represent accented Greek, for example, bears some resemblance to that developed by the TLG project. The final output could be sent through the GROATS package, which enabled the printing on to microfilm of non-Roman alphabets such as Greek or Arabic.
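
To make concrete what such a program produced, here is a toy keyword-in-context concordance and word count in Python. It illustrates the kind of output only and is not COCOA itself (which was a Fortran batch program); the upper-casing simply mimics the 64-character line printer.

    # Toy keyword-in-context (KWIC) concordance and word count (illustrative only).
    import re
    from collections import Counter

    def words_of(text):
        return re.findall(r"[A-Za-z']+", text.upper())   # upper case, as on a line printer

    def kwic(text, keyword, width=4):
        words = words_of(text)
        lines = []
        for i, w in enumerate(words):
            if w == keyword.upper():
                left = " ".join(words[max(0, i - width):i])
                right = " ".join(words[i + 1:i + 1 + width])
                lines.append(f"{left:>30}  {w}  {right}")
        return lines

    sample = "In the beginning was the Word, and the Word was with God."
    print("\n".join(kwic(sample, "word")))
    print(Counter(words_of(sample)).most_common(5))       # simple word-frequency count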

A new journal, Computer Calepraxis, was published, devoted to computers and classical studies (edited by S. Michaelson and Andrew Morton of Edinburgh, and A. Winspear at Calgary). The name, as one might have expected from Andrew Morton, was a word play, emphasising the origins of the institutions in Caledonia and Calgary and also epitomising the journal's aim to establish by computer, h kale praxis.

1974 EYEBALL was developed to analyse the sentences of unedited English text as an aid to stylistic analysis (though not without problems when it was run on local computers).

1975 Roberto Busa, as guest editor of the Association for Literary and Linguistic Computing Bulletin, argued that we still do very little with the computer. Despite the publication of his indices to Aquinas that year, he admitted that "what we can do today by computer for publishing documents is no more than processing their indexes". He further wrote,

Electronic data processing marked the beginning of a new era in the transfer of human information...At Gutenberg’s time typesetting started a new era in the distribution of human knowledge. Today, we have made another jump: we are now able to use an electronic alphabet which can be processed by machine at ‘electronic’ speeds and distances. But we are still at the starting point of the new era as far as language processing is concerned...[the computer] is just a tool, an off-line continuation of man’s fingers, as fingers may be described as the bodily computer on-line to the mind. It is man’s ingenuity which has to feed data and programs into it. (Busa 1975)

The comparison with Gutenberg continues to be made today, sometimes as if computing in the nineties were the most significant event since Gutenberg. Clearly, this feeling was present amongst humanities scholars in the seventies as well.

Busa asks not for better machines but for more data, more knowledge to feed into the computer. But the temptation to ask computers to do things in the same way as before should be avoided (he has in mind here simple concordances). He concludes with words which still retain their relevance,

In language processing the use of computers is not aimed towards less human effort, or for doing things faster and with less labour, but for more human work, more mental effort; we must strive to know more systematically, deeper, and better, what is in our mouth at every moment, the mysterious world of our words.

In 1975 Andrew Morton defended Steve Raymond of Brixton Prison at the Old Bailey. Morton convinced the court that the police statements had more than one author.

1975/76 In a paper read at the International Conference on Computers and the Humanities, Todd Bender argued for the development of a computer methodology which reflected the entire creation process of a text rather than merely the final printed text. The perception of the electronic text was determined by its physical structure on the printed page: that is, it was a one-time completed work. The text in the computer and the text on the page were perceived no differently. In the early years, of course, texts were typed into the computer, the computer processed them out of sight, and the results were viewed on a print-out. As far as the scholar was concerned, the text only briefly left the safety of the printed page. Bender predicted that the most accurate edition would be a computer-readable copy of all the relevant editions stored one behind the other as layers of machine memory:

In the library of the future the works of authors will be conceived in a way radically different from the conception which underlies our current practices. The process of creation; the variations of spelling, punctuation, and style to suit shifting audiences; the development of the author's thought and attitudes in sequences of revisions will not be simplified and reduced to an editor's single choice. Ambiguities of the textual transmission will be preserved and investigated as a source of interest and not resolved or hidden in editorial selection. The more sophisticated view of the work is possible only when the work is conceived free from the limitations of printed paper as the vehicle for its representation. (Bender 1976)

Susan Wittig, writing the following year, developed Bender's concerns by asking the question which is endlessly asked today: has the computer altered "our view of the literary universe"? It is the idea of the text as a fixed and immobile form which Wittig is particularly keen to address. She writes,

If computer based criticism has tended to approach the text as a fixed and immobile form, it has also tended to view this form as an autonomous form. The text stands alone, utterly complete, completely coherent, coherently unified, and the critic’s job is to address himself to this perfectly finished artefact: to weigh it, measure it, objectively interpret it. This interpretation which is electronically facilitated through the computer, must be as thoroughly objective and scientific as possible... (Wittig 1977, 213)

Susan Wittig herself draws attention to the role of the reader, rather than (or as well as) the author, in giving meaning to a text. She does not, however, suggest any concrete proposals for computer-based research which places the electronic text within the context of this new criticism. Reading the article with the benefit of hindsight, of course, we can see that the whole notion of hypertext is one area where the electronic text and new forms of literary criticism have converged.

In the same volume of Computers & the Humanities in which Susan Wittig's article appeared, James Joyce, then of the University of California at Berkeley, advised humanists what hardware they should consider purchasing. The first microcomputers had arrived. The "smart terminal", a terminal with a computer inside, was still in development but gave the vague hope of word processing systems and even the ability to display Roman and Greek fonts on the "cathode ray tube":

Yet an aspect of such inexpensive computing that I believe should be a matter of concern is the period of time between the present and the future when home computers will be a standard part of a home television set and can be used by a humanist for either computer-aided research or recreational computing...Inexpensive, small computer systems allow academic departments and even individuals to purchase computer systems dedicated to their particular projects...Indeed, the central campus computing facility is in danger of being phased out as more and more departments acquire their own, smaller, computers... (Joyce 1977, 300).

1976 The Oxford Text Archive was set up by Lou Burnard, a graduate in English Studies who was working at the Computing Service as a database programmer. Initially Lou obtained, or already had in his possession, thirty texts. It is now one of the world's largest collections of machine-readable texts, with over 1,500 titles. The policy (which still exists) was to collect electronic texts from scholars and various institutions; these would, at the very least, be securely archived and in most cases made available to other scholars for research. This intention to archive all and any texts which came their way was termed by Lou "the dustbin policy of archiving". The original proposal for the archive acknowledges that "no standard coding is currently envisaged [for the texts], as requirements vary considerably between texts, but some conversions may be carried out where possible. For example, where possible and requested, conversion of single-case (with shift characters) to upper and lower case will be carried out."
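
That last kind of conversion can be pictured with a toy example. The '*' shift convention below is an assumption made purely for illustration, not a statement of the OTA's actual coding scheme.

    # Toy conversion of single-case text with a shift character to mixed case.
    # The '*' convention is assumed for illustration; it is not the OTA's scheme.
    def expand_shift_case(text, shift="*"):
        out, capitalise_next = [], False
        for ch in text.lower():
            if ch == shift:
                capitalise_next = True
            else:
                out.append(ch.upper() if capitalise_next else ch)
                capitalise_next = False
        return "".join(out)

    print(expand_shift_case("*IT WAS THE BEST OF TIMES, WROTE *MR *DICKENS"))
    # -> It was the best of times, wrote Mr Dickens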

Hlink 6. One of the first texts in the Oxford Text Archive and the Rest. The Oxford Text Archive today

In 1976 Susan Hockey was giving a series of lectures on computing in the arts at Oxford University, including "Concordances, word indexes and dictionaries", "Stylistic analysis and authorship studies", a short course on COCOA, and a crash course on the programming language SNOBOL [developed at Bell Laboratories in the 1960s], described by Susan as "the obvious choice of programming language for arts people. Initially it is very simple in syntax and structure, and also concise so that a useful program can be written in two lines." (Hockey 1976)

1977 The Oxford Concordance Program was first conceived. Funding for its development was received in 1978 and the first release occurred in 1981.

1978 Literary Detection was published by Andrew Morton, using statistics and stylometry to identify authors (or not, as the case may be). Morton has, apparently, appeared as an expert witness in legal cases involving disputed authorship.

1978 A. J. P. Kenny published his study of the Aristotelian Ethics, comparing the Nicomachean Ethics and the Eudemian Ethics with three disputed books which appear within both Ethics. Using word frequency (concentrating, like Morton, on common words) he showed that the differences between the disputed books and the Nicomachean Ethics were far greater than those between them and the Eudemian Ethics. Using discrete chunks from each work, Kenny also studied word length and sentence length.
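
As a rough sketch of the kind of measures involved, the snippet below computes, for fixed-size chunks of a text, the relative frequency of a handful of common words together with mean word and sentence length. The word list and chunk size are assumptions for illustration, not Kenny's own (he worked on the Greek text).

    # Sketch of simple stylometric measures per chunk (illustrative only; the
    # function-word list and chunk size are assumed, not Kenny's own).
    import re
    from statistics import mean

    FUNCTION_WORDS = {"and", "the", "of", "in", "to", "is", "that", "it"}

    def chunk_profile(chunk):
        words = re.findall(r"[a-z']+", chunk.lower())
        sentences = [s for s in re.split(r"[.;?!]+", chunk) if s.strip()]
        return {
            "function_word_rate": sum(w in FUNCTION_WORDS for w in words) / len(words),
            "mean_word_length": mean(len(w) for w in words),
            "mean_sentence_length": mean(len(s.split()) for s in sentences),
        }

    def profile_text(text, chunk_size=1000):
        words = text.split()
        chunks = (" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size))
        return [chunk_profile(c) for c in chunks]

Comparing such profiles across works would then show whether the disputed books sit closer to one treatise than to the other.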


[Introduction] | [Pioneers] | [Independence] | [Convergence]

This document created: 4 February 1996
This document last revised: 4 February 1996
Author: Michael Fraser
The URL of this document is http://info.ox.ac.uk/history/independ.html