A Hypertextual History of Humanities Computing: The Pioneers

We begin in 1949, when Roberto Busa started work on the Index Thomisticus, an index and linguistic analysis of the works of Thomas Aquinas. The project took one million man-hours over five years by a team of 66 workers. The result, in 1974, was thirty-one printed volumes with around 36,000 pages.

It seems appropriate, then, that the first humanities scholar to employ the computer as a significant part of his work was the Jesuit Roberto Busa in 1949, and that he should have used it to compile indices to the works of Aquinas, who himself had attempted a structured, systematic presentation of theology as a science. Busa's work formed the basis of the Thomae Aquinatis Opera Omnia cum hypertextibus in CD-ROM. Who, in 1949, would have imagined such a portable Aquinas, never mind the association of "hypertextus" with his "opera omnia"?

1957 John William Ellison successfully defended a groundbreaking two-volume thesis at Harvard entitled "The use of electronic computers in the study of the Greek New Testament text". The same year saw the publication of a complete concordance to the RSV, which Ellison had supervised and which had been produced with the help of the Harvard mainframe computing facilities. The concordance was, of course, printed just as concordances had always been. But it was not long before a small mental leap was made from simply printing out a concordance for further (traditional) analysis to actually persuading the computer to do some of the data analysis.

In the same year that Ellison defended his thesis, Andrew Q. Morton decided, with George H. C. MacGregor, to use a computer to aid them in the stylistic analysis of the New Testament.

Meanwhile, in 1959, the Cornell Concordance to the poetry of Matthew Arnold was published photographically from the computer printout. It required 38 hours of machine time.

1960 Ellison & Morton purchased a teletypewriter (with a Greek character set), a (paper) tape reader, and a control unit. They then set about typing in a machine-readable copy of the text. Andrew Morton recounts his own memories of this feat:

Memories of the early days are all of paper tape. It waved in and out of every machine, it dried and then cracked and split or it got damp when it lay limp and then sagged and stretched. Sometimes it curled round you like a hungry anaconda, at others it lay flat and lifeless and would not wind. Above all it extended to infinity in all directions. A Greek New Testament, half a million characters, ran to a mile of paper tape, and the complete concordance of it ran to seven miles (Morton 1980, 197).

1961 The Structure of the Fourth Gospel by G. MacGregor and Andrew Morton was published. The work had involved the use of the computer to analyse the length of sentences and paragraphs of John's gospel. On the basis of this research the authors concluded that the fourth gospel had two sources. Although the computer presented the authors with the same data, they differed on the reasoning behind the conclusion. Morton believed the gospel had originally been written in codex form (permitting the dislocation of pages), whilst MacGregor had already arrived at a two-source hypothesis by more traditional methods of scholarship. Computers then, as now, were an aid to scholarship; they did not create it. As Busa, Ellison, Morton, and others found, the computer was best suited to doing the donkey work, to be treated as a sophisticated labour-saving device. The reasoning and final judgement still belonged to the scholar. Even so, the accusation that computers would encourage scholarly laziness rather than improve scholarship was never entirely avoided.

Morton was reported in the New York Times on 7 November 1963 for his claim that a computer study of sentence length and Greek function words 'proved' that Paul wrote only four of the letters attributed to him. John Ellison, in "Computers and the Testaments" (Computers for the Humanities?, 1965), replied by applying Morton's methods to James Joyce's Ulysses (five authors) and to Morton's own essays (several authors). Morton was also heavily criticised for determining sentence length by modern punctuation. Subsequently Morton steered clear of the Bible and remained with less contentious Greek texts.

1964 Sally and Walter Sedelow coin the term "computational stylistics" (which I assume developed into computational linguistics).

1965 Ione Dobson completed, by hand, the complete concordance to the works of Byron. It took her 25 years. She described it as "the last of the handmade concordances". However, she also muttered that "much pleasure would be lost on the unthinking machine" (Oakman 1980).

1965 Ted Nelson coins the term "hypertext" in the context of his Xanadu system, which aimed to be a repository of everything that had ever been written. Nelson believed that all knowledge is entangled and should therefore be linked together in a system on which everyone is on-line. Of course, Xanadu was never completed and is never likely to be.

By 1965 there were some 50 catalogued computer projects in Shakespearean studies.

1966 Allen Ellis and Andre Favat suggested that the computer would do for literary study what the telescope did for enlarging the picture of the world. In the same year A. Q. Morton and James McLeman, applying some of the work of W. C. Wake, published Paul, the Man and the Myth, a statistical analysis of the letters attributed to St Paul with the aim of determining once and for all the authorship of the letters. According to their analysis of sentence length and the comparative occurrence of common words, four of the thirteen letters could safely be attributed to Paul.

Stylometry is not unique to the development of computer-assisted literary studies, although the computer was undoubtedly responsible for its rapid growth in the 1960s and 1970s. Indeed, probably the first attempt to make stylometric judgements about authorship is preserved in a letter written in 1851 by Augustus de Morgan (professor of mathematics, London). The letter suggests using word length to decide the contentious issue of Pauline authorship.
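The measures mentioned so far (word length, as de Morgan suggested, and sentence length, as Morton used) are simple to compute. The following is only a minimal sketch in Python, not a reconstruction of any program used at the time; the function names are hypothetical. Note that it splits sentences on modern punctuation, which is exactly the practice for which Morton was criticised.

```python
import re
from statistics import mean

# A "word" here is any unbroken run of letters (works for Greek as well as English).
WORD = re.compile(r"[^\W\d_]+")

def word_lengths(text):
    """Return the length, in characters, of every word in the text."""
    return [len(w) for w in WORD.findall(text)]

def mean_word_length(text):
    """Average word length, the measure de Morgan proposed; 0.0 for empty text."""
    lengths = word_lengths(text)
    return mean(lengths) if lengths else 0.0

def sentence_lengths(text):
    """Words per sentence. ASSUMPTION: sentences end at '.', '!' or '?',
    i.e. at modern editorial punctuation."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(WORD.findall(s)) for s in sentences]

if __name__ == "__main__":
    sample = "In the beginning was the Word. And the Word was with God."
    print(mean_word_length(sample))
    print(sentence_lengths(sample))
```

Comparing such figures across texts says nothing by itself about authorship; as the episodes above show, the interpretation remains the scholar's.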

1966 The journal Computers and the Humanities commenced publication.

1967 The COCOA word count and concordance program was developed at the Atlas Computer Laboratory in Chilton (Berks) by Donald Russell. COCOA stood for Count and Concordance Generation on Atlas. The great benefit of COCOA was the user's ability to specify an alphabet and to declare up to three characters as a single unit. This permitted the use of COCOA with non-English alphabets, and it also made it possible to sort together pairs of collocating items (rather than just single words). In 1969, for example, St Cross College in Oxford had a terminal link to the Atlas Laboratory, where COCOA was employed in the analysis of Turkish newspapers and the poems of the Persian poet Hafiz.
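The idea of a user-declared alphabet in which up to three characters count as one unit can be illustrated with a short Python sketch. This is not COCOA's own code or command syntax; the function names and the sample alphabet (a traditional Spanish-style ordering in which "ch" and "ll" are letters in their own right) are assumptions chosen purely for illustration.

```python
def split_units(word, alphabet):
    """Split a word into units of the declared alphabet, always taking the
    longest matching unit (at most three characters)."""
    units = []
    i = 0
    while i < len(word):
        for size in (3, 2, 1):
            chunk = word[i:i + size]
            if len(chunk) == size and chunk in alphabet:
                units.append(chunk)
                i += size
                break
        else:
            # No unit of any size matched at this position.
            raise ValueError(f"{chunk!r} is not in the declared alphabet")
    return units

def sort_key(word, alphabet):
    """Rank a word by the position of each of its units in the alphabet."""
    order = {unit: rank for rank, unit in enumerate(alphabet)}
    return [order[u] for u in split_units(word, alphabet)]

# Hypothetical alphabet: "ch" sorts after "c", and "ll" after "l".
ALPHABET = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k", "l",
            "ll", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x",
            "y", "z"]

if __name__ == "__main__":
    words = ["llama", "chico", "luz", "dama", "cuna"]
    print(sorted(words, key=lambda w: sort_key(w, ALPHABET)))
```

With this alphabet "cuna" sorts before "chico" and "luz" before "llama", whereas an ordinary character-by-character sort would reverse both pairs.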

1968 The Centre for the Electronic Processing of Documents (CETEDOC) was established at the Catholic University of Louvain under the directorship of Paul Tombeur (who remains its director today). Its aim, simply stated, was to "develop automation in the field of the study of documents". CETEDOC was and still is a centre of research, especially in the field of medieval Latin. In 1972, for example, the Centre was involved in the computer-assisted comparative study of manuscript traditions. Paul Tombeur outlines the three-stage process involved as 1) Storage (in his case on magnetic tape, one record for each variant), 2) Automatic comparison, and 3) Analysis, the most difficult stage, involving the formulation of instructions for the computer to construct the most primitive text. One might note that at this time the Centre ran this and other text analysis programs on an IBM 360 with 128K of RAM (to be upgraded to an IBM 370 with 1MB of RAM). Much of the work of the Centre then laid the foundations for the products for which CETEDOC is best known: the Library of Christian Latin Texts, In Principio, and the Archive of Celtic-Latin Literature. The high quality of these text corpora very much reflects the concerns of Professor Tombeur. He wrote in 1972:

One important advantage of the application of data-processing methods lies in the complete analysis of the problems. The necessity of applying rigorous logic to all the stages of a study, of breaking down each stage into many simple elements, gives paramount importance to the methodological approaches to our problems. The computer forces us to master our problems as perfectly as possible; otherwise we run the risk of being furnished with deceptive output, and of having our principal questions left unanswered. (The Computer & Literary Studies, 340).

Meanwhile, in 1969, the American Philological Association had established a repository of Greek and Latin machine-readable texts at Dartmouth College in New Hampshire (more famous now for its Dante archive).

In the 1960s, the U.S. Department of Defense grappled with the problem of building a decentralized computer network that would have no single "point of failure", that is, no network hub which could be targeted and disabled in a nuclear attack. This experiment, administered by the Defense Advanced Research Projects Agency, became known as the ARPANet. Between September and December 1969 the first four network sites were installed on the ARPANet. The seeds of the Internet had been planted. It was not until 1987, however, that the Internet as an international network truly took off. In 1987, the National Science Foundation created the NSFNet, a TCP/IP network to connect its supercomputing centers with universities. It was opened up to the public and to allies of the USA. The combination of the NSFNet and the regional networks that it spawned became known as the Internet.

By the end of the 1960s humanities computing was sufficiently well established for there to be a number of university-based centres. At Cambridge, for example, a centre for literary and linguistic computing was created.

