[Paper delivered to the New Testament Research Seminar, Faculty of Theology, University of Oxford, June 1996]

Tools and Techniques for Computer-Assisted Biblical Studies

Michael Fraser
CTI Textual Studies
Graduate Research Seminar, University of Oxford

I. Past

"A concordance is a dictionary, or an index to the Bible, wherein all the words, used thro' the Inspired Writings are ranged alphabetically, and the various places where they occur, are referred to, to assist us in finding out passages, and comparing the several significations of the same word". Thus wrote Alexander Cruden in the preface to the first edition of his Compete Concordance to the Holy Scriptures. As well as a definition of what his work is, Cruden was also thoughtful enough to provide a brief history of the concordance. The first concordance, to the Vulgate Bible, was compiled by Hugo de S. Charo (who died soon after in 1262). He employed 500 monks to assist him. In 1448 Rabbi Mordecai Nathan completed a concordance to the Hebrew Bible. It took him ten years. 1599 saw a concordance to the Greek New Testament published by Henry Stephens and the Septuagint was done a couple of years later by Conrad Kircher in 1602. The first concordance to the English bible was published in 1550 by Mr Marbeck, according to Cruden it did not employ the verse numbers devised by Robert Stephens in 1545 but "the pretty large concordance" of Mr Cotton did.

Cruden's concordance was first published in 1737, one of the first copies being personally presented to Queen Caroline on November 3, 1737. Cruden began work on his concordance in 1735 whilst a bookseller in London. It had taken the assistance of 500 monks for Hugo to complete his concordance of the Vulgate; Cruden worked alone from 7am to 1am every day and completed the bulk of the work in less than a year. The proofreading and layout took a little longer. His brain was occupied with nothing else, so much so that he failed to notice the diminishing stock in his bookshop and the consequent lack of custom. "Was there ever, before or since the year 1737", writes his biographer Edith Olivier, "another enthusiast for whom it was no drudgery, but a sustained passion of delight, to creep conscientiously word by word through every chapter of the Bible, and that not once only, but again and again?". Some commentators would have us believe that compiling the concordance sent Cruden insane. Indeed, Cruden was committed to private asylums on three occasions. But his madness was not the effect of generating concordances; it was love (or rather his failure in love). Cruden was not driven mad by the concordance; rather, the concordance was driven on because Cruden was mad. The completion of the concordance left him without a project with which to occupy his mind. Inspiration came, however, from a prophecy which he spent the rest of his days attempting to fulfil, that "the Corrector would be Sir Alexander Cruden, twice Lord Mayor of London, and member of parliament for the said city". Alexander the Corrector was a title of which Cruden could be proud. It gave him a mission: to be appointed the nation's Corrector and so rid the land of profanity, blasphemy, and lewd looks. He was well received in Oxford, dined at high table, and no one considered him in the slightest mad. At Cambridge he was mocked by the undergraduates, and back in London, solus contra omnes (alone against all), a young man who dared to swear in front of Alexander the Corrector was duly corrected with the aid of a shovel. Alexander was seized and carted off in a strait-waistcoat to Inskip's Asylum, Chelsea.

It might not have been insanity which produced Cruden's concordance, but it would certainly be considered madness to attempt to compile a biblical concordance today in the same manner as Alexander Cruden or his predecessors. What took Cruden over a year can be accomplished within minutes by a personal computer.

[Demo concordancer]
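
In outline, such a concordancer need do no more than the following. This is a minimal sketch in Python, assuming a plain-text Bible with one verse per line, reference and text separated by a tab; the file name and format are illustrative assumptions, not any particular package's layout.

    import re
    from collections import defaultdict

    def build_concordance(path):
        # Index every word to the verses in which it occurs. The input
        # format (reference, tab, verse text per line) is an assumption.
        concordance = defaultdict(list)
        with open(path, encoding="utf-8") as f:
            for line in f:
                ref, _, verse = line.rstrip("\n").partition("\t")
                for word in set(re.findall(r"[a-z']+", verse.lower())):
                    concordance[word].append((ref, verse))
        return concordance

    # What occupied Cruden from 7am to 1am now takes seconds:
    # concordance = build_concordance("bible.txt")  # hypothetical file
    # for ref, verse in concordance["charity"]:
    #     print(ref, verse)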

Computers have a long history in biblical, especially New Testament, studies. The start of humanities computing is generally placed around 1949, when Roberto Busa realised that he required the assistance of a computer to analyse the vocabulary of St Thomas Aquinas. The work of Busa resulted in two products: a concordance, the Index Thomisticus (1974), which was printed just as concordances and indexes had always been, and in 1991 the far more exciting (especially from a Latinist's point of view) Thomae Aquinatis Opera Omnia cum hypertextibus in CD-ROM.

The first study of computers in New Testament studies was probably the thesis successfully defended by John William Ellison in 1957, entitled "The use of electronic computers in the study of the Greek New Testament text". It is a title for which a prospective PhD candidate could still write a proposal today. Ellison's work resulted in a complete concordance of the RSV. In 1960 Andrew Morton and George H. C. Macgregor purchased a teletypewriter (with a Greek character set), a (paper) tape reader, and a control unit, and set about typing in a machine-readable copy of the Greek New Testament. The biblical texts were prepared by putting the text first onto paper tape or punch cards; this was then transferred onto magnetic tape by drawing the card or tape over a light source, the light passing through the holes to photoelectric cells and creating pulses of electricity. Andrew Morton recounts his own memories of this feat,

"Memories of the early days are all of paper tape. It waved in and out of every machine, it dried and then cracked and split or it got damp when it lay limp and then sagged and stretched. Sometimes it curled round you like a hungry anaconda, at others it lay flat and lifeless and would not wind. Above all it extended to infinity in all directions. A Greek New Testament, half a million characters, ran to a mile of paper tape, and the complete concordance of it ran to seven miles" (Morton 1980, 197).

For the sake of comparison, Homer's Odyssey was twice the size: one million characters, two miles of paper tape, or 25,000 punch cards. Having been transferred onto magnetic tape the text was immediately printed out and compared with the original. This, writes Morton, was the real work, the task of proofreading the machine-readable text, the text often being checked by five or more readers. "The cost of the first copy of the Homer concordance was about 15,000 dollars, omitting the capital cost (about ten million dollars) of the computer involved in the manufacture" (It's Greek to the Computer, 39). Around this time the New York Times reported that, "It is maintained by a few literary men and computer scientists ... that the use of the computer will enormously lessen the burden on page and pencil" (Studies in Honour of Roberto Busa, 225).

Computer-generated and printed concordances were all the rage in other disciplines as well, but it was not long before a small step was taken from simply printing out the concordance for further traditional analysis to persuading the computer itself to do some of the text analysis. This took off particularly with a form of stylistics used to test authorship. Much of this work had its roots in classical Greek literature, testing the authorship of works of Homer or Aristotle. Apart from the appropriate computing equipment, such work required a statistical analyst, a computer programmer, and a classical scholar. However, as Andrew Morton notes, "the contribution of the classical scholar is bound to be relatively modest", something which characterised humanities computing in this era. It was a brave man indeed who turned his hand to the Pauline corpus to decide (perhaps) once and for all just which letters had been penned by the Apostle Paul. Andrew Morton was that man.

Authorship tests start with a fundamental premise, defined by Andrew Morton in these terms, "Exhaustive tests have shown that every writer has his own unique skeletal structure of language, peculiar to himself ... only such texts as the twelfth book of Isocrates, written in extreme old age and when obvious signs of senility had set in, are exceptions to this general rule" (Morton, It's Greek to the Computer, 9).

Concerning St Paul, Morton's second premise was that it was fairly indisputable that Paul had written the letter to the Galatians. This was to be his base text against which all others were to be measured.

On the 7th November 1963 the New York Times ran a story about Andrew Morton and his claim that a computer study of sentence length and Greek function words showed that Paul wrote only four of the letters attributed to him. The following year saw the publication of Morton and James McLeman's Paul, the Man and the Myth, a statistical analysis of the letters attributed to St Paul with the aim of determining once and for all the authorship of the letters. By the analysis of sentence length and the comparative occurrence of common words, four of the thirteen letters could safely be attributed to Paul.
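
For those unfamiliar with the method, the two measures at its heart are easily stated. The following sketch, in Python, computes a crude stylistic profile of a text from mean sentence length and the rate per thousand words of a handful of common words; the word list and file names are purely illustrative, not Morton's actual test set.

    import re
    from statistics import mean

    def sentence_lengths(text):
        # Sentence length in words, splitting on full stops and the Greek
        # question mark / raised point; the punctuation is itself editorial.
        sentences = re.split(r"[.;\u00b7]", text)
        return [len(s.split()) for s in sentences if s.strip()]

    def rate_per_thousand(text, word):
        # Occurrences of a common word per 1,000 running words.
        words = text.lower().split()
        return 1000 * words.count(word) / len(words)

    def profile(text, function_words):
        return {"mean sentence length": mean(sentence_lengths(text)),
                **{w: rate_per_thousand(text, w) for w in function_words}}

    # Comparing the base text against a disputed letter is then a matter
    # of comparing profiles (file names hypothetical):
    # for name in ("galatians.txt", "ephesians.txt"):
    #     text = open(name, encoding="utf-8").read()
    #     print(name, profile(text, ["και", "δε", "γαρ"]))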

Homer is a fairly uncontentious author with which to work. Paul is not. Soon after the publicity surrounding the analysis of the Pauline epistles, Morton received a letter sent airmail from Chicago. It simply read, "Dear Sir, You are a dirty pig. Yours in Christ, Anon." It was probably reactions such as these which led to the publication of a curious work entitled Christianity and the Computer by Morton and McLeman, apparently at the request of their publishers (Hodder & Stoughton), who desired a simple and non-technical explanation of the work done using a computer, together with the authors' views on wider issues. What the publishers actually got was around 22 pages dealing with stylistic analysis and a further fifty pages examining the state of the Church today and the roots of religion, including the observation that the church's claim to truth in the twentieth century was akin to a schoolboy whistling in the dark.

Stylistic analysis is not without its problems (which is why studies continue on the authorship of the Pauline, Lukan, or Johannine works). Using sentence length, for example, assumes a sentence structure which is apparent in modern editions of the Greek NT text but certainly not to be assumed in the ancient manuscripts. Statistical analysis, by definition, requires a significant statistical sample with which to work; the entire Pauline corpus does not provide such a sample in all cases. Stylistic analysis does not always allow for different styles for different occasions and audiences (or, for that matter, for different literary styles, such as dialogue). Finally, T. M. Knox commented, "The Spirit moveth where it listeth and is not to be reduced to the numerical terms with which alone a computer can cope" (Kenny, Stylometric Study, 116). John Ellison's response to Morton was to turn the same methods on James Joyce's Ulysses (yielding five authors) and on Morton's own essays (several authors). Someone else observed that statistics could demonstrate without a doubt the Pauline authorship of the Letter to the Hebrews. As Anthony Kenny points out towards the end of his A Stylometric Study of the New Testament, statistical analysis is really analogous to aerial photography: it observes patterns. Scholars have long observed the long loose sentences of Ephesians; the computer merely confirms the observation. And the Tübingen School had already proclaimed Paul to be the author of four of the epistles attributed to him, more than a century before Morton and his confounded machine.
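
The sample-size objection can be made concrete with the standard chi-squared test. The sketch below assumes the scipy library and uses invented counts, purely for illustration, comparing occurrences of a single common word in a long base text and in a letter the size of Philemon; the short text simply cannot yield a significant result either way.

    from scipy.stats import chi2_contingency  # scipy assumed available

    # Invented counts for illustration: one common word versus all
    # other words, in a long base text and in a very short letter.
    base_text    = [120, 9880]   # a text of c. 10,000 words
    short_letter = [3, 332]      # a letter about the size of Philemon

    chi2, p, dof, expected = chi2_contingency([base_text, short_letter])
    print(f"chi-squared = {chi2:.2f}, p = {p:.2f}")
    # With so few words the test has almost no power: a high p value
    # here neither confirms nor denies common authorship.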

II. Tools and Techniques Today

Computers and software for the manipulation of texts have come a long way since the days about which Morton and others wrote. But, one might ask, have the techniques changed at all? Certainly, the likes of David Mealand at Edinburgh continue to publish on the stylistic analysis of Paul and Luke. A browse through the acts of the Association Internationale Bible et Informatique (AIBI), an association expressly set up to study and discuss the impact of computers on biblical studies, demonstrates a bias towards linguistic studies and, in fact, linguistic studies of the Hebrew Bible. There has, of course, been a move away from viewing computer-assisted biblical scholarship as something so novel that the results of the research demand some reference to the presence of a computer in the title, preface, or appendices. The majority of biblical scholars are quite used to creating electronic texts using a word-processor with Greek, Hebrew, or Syriac fonts. Many use electronic mail not only to send round departmental memoranda but also to discuss research with far-flung colleagues, either privately or through one of the numerous electronic mail discussion lists. Some either own or use biblical software packages such as BibleWorks, BibleWindows, Logos Bible Software, MacBible, or the Gramcord applications. Some have also ventured onto the World Wide Web, perhaps querying the Bible Gateway, downloading electronic texts, or generally being amused at the diversity of material available. A few might even have contributed to this mosaic of information and created their own WWW pages. However, rarely would one consider referencing the use of the Bible package, or even preliminary email discussions, in a finished article. Much computer-assisted biblical scholarship remains hidden, which probably explains why the highly visible scholarship tends towards the heavy use of statistical analysis, much of which appears not in biblical journals but in journals for linguistic computing. This is a good thing, for it demonstrates the process of normalization whereby the computer fades into the background. Computer-assisted scholarship should, of course, be evaluated on the results it produces, as are the fruits of research using any other tool or technique, rather than being treated differently merely because it was computer-assisted. This applies, with some urgency, also to so-called computer-assisted learning.

Unlike in the days of Andrew Morton, we do not require capital funding of 10 million dollars for the appropriate computer equipment, and we rarely require the services of a computer scientist or programmer (though a little IT support can be useful). Smaller, cheaper, powerful machines on our desks, together with relatively large amounts of disk space, enable us to do the same sort of tasks performed on mainframes even ten years ago. Nor do we have to run machines overnight to get results, or send the results to a printer in order to inspect them. After thirty years it should certainly be the biblical scholar's last resort to set about creating their own electronic text of the Greek New Testament (for example). The fruits of earlier computer-assisted research are available to today's biblical scholar. Biblical texts in which each word is morphologically parsed for grammatical searches are available with software like BibleWindows or from text archives such as the Center for Computer Analysis of Texts (CCAT) at the University of Pennsylvania. Other electronic texts of good quality and provenance are being provided by the Oxford Text Archive and the Universities of Virginia, Michigan, and Princeton. Using ready-made biblical packages we can search across the Greek NT, the LXX, the Hebrew Bible, the Vulgate, and numerous other texts in a matter of minutes; concordances for specific words or phrases can be viewed on screen, saved to a file, copied and pasted into a word-processor (nothing like copying and pasting chunks from the TLG CD-ROM into an article to ensure accuracy), or simply discarded. Today's software encourages experimentation and the testing of hypotheses. Hypertext environments, whether the World Wide Web or multimedia CD-ROMs, encourage exploration. It is simply more convenient to follow up references when the primary sources are there on the computer in front of you rather than embedded in the Bodleian or even behind you on the bookshelf.
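
A grammatical search over such a parsed text is, in essence, no more than a filter on the morphological tags. Here is a minimal sketch in Python, assuming a hypothetical token-per-line format (reference, surface form, lemma, and parsing code, tab-separated); the format, file name, and tag scheme are illustrative assumptions rather than the layout of any actual archive text.

    def grammatical_search(path, lemma=None, parsing_prefix=""):
        # Yield tokens matching a lemma and/or the start of a parsing
        # code, from a hypothetical tab-separated tagged text:
        # reference <TAB> surface form <TAB> lemma <TAB> parsing code.
        with open(path, encoding="utf-8") as f:
            for line in f:
                ref, surface, lem, parsing = line.rstrip("\n").split("\t")
                if lemma and lem != lemma:
                    continue
                if not parsing.startswith(parsing_prefix):
                    continue
                yield ref, surface, parsing

    # Usage: every form of a given verb with a given parsing, in minutes
    # rather than months (file, lemma, and code scheme illustrative):
    # for ref, form, code in grammatical_search("gnt_parsed.txt",
    #                                           lemma="λυω",
    #                                           parsing_prefix="V-A"):
    #     print(ref, form, code)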

Statistical analysis, perhaps strangely given its earlier popularity, is still performed with a variety of tools and still requires knowledge of statistical tests and so on. As far as I am aware there is no biblical package which will perform on the fly the sort of analysis done by Morton and Kenny and continued by Mealand. The Gramcord applications, which specialise in tools for scholars particularly interested in grammatical analysis, might well integrate such a feature in the future. At present BibleWorks includes simple statistical output for single words or groups of words. Unfortunately, it tends to fall over when asked to do anything more (like a statistical presentation of the use of particles throughout the Greek New Testament). Anyway, it works something like this....

[Demonstrate BibleWorks]

The last couple of years have been marked by the rise of the digital library, the hallmark of which is a uniform interface for accessing electronic texts, together with standards for their cataloguing and presentation. Logos Bible Software 2.0 demonstrates a particularly well-structured digital library for biblical studies.

[demonstrate Logos 2.0]

So what then should one look out for when purchasing a biblical package for research?

Aiming for the full manipulation of the text.

Whilst all of the Logos texts are on CD-ROM, it is becoming increasingly common for text corpora to be delivered over the network. The texts reside on a server, perhaps at the publisher's, and licensed users are given a client for their own machine. Occasionally commercial texts can be searched using a WWW browser like Netscape. Institutions like Oxford have taken CD-ROMs and copied them to hard disks in order to network them across the University; thus the TLG and the Patrologia Latina database, for example, can be searched here in the faculty. It is ceasing to matter to the search interface whether the data it searches is on a CD-ROM, a local hard disk, or a computer across the other side of the world. Thus, for large bodies of text the CD-ROM is probably a transitional medium, and the future of text searching and delivery lies in the client-server world. Chadwyck-Healey are expected to launch an online service of a selection of their literary text databases later this year.
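
The client side of that arrangement can be remarkably thin. The sketch below, in Python, queries a purely hypothetical text server over HTTP; the host name and query parameters are invented, but the shape of the transaction is the point: the search interface neither knows nor cares where the corpus resides.

    from urllib.parse import urlencode
    from urllib.request import urlopen

    def remote_search(query, corpus="vulgate"):
        # Send a search to a (hypothetical) licensed text server and
        # return its plain-text response; no local copy of the corpus.
        params = urlencode({"corpus": corpus, "q": query})
        url = "https://texts.example.org/search?" + params  # invented host
        with urlopen(url) as response:
            return response.read().decode("utf-8")

    # print(remote_search("in principio"))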

CD-ROM still has a life for the presentation of multimedia packages; the network is not yet able to handle the large volumes of data associated with video and sound files. Theology is not particularly well served by multimedia CD-ROM products (which might not be a bad thing given the content of some products in other humanities disciplines). However, one package which is particularly relevant to biblical studies is The Dead Sea Scrolls Revealed, published under licence from the Israel Antiquities Authority and distributed by Logos Research Systems.

[Demonstrate DSSR]

The DSSR is an introduction to the world of Qumran; it is not intended as a serious research tool. However, Oxford University Press (in collaboration with the Oxford Centre for Postgraduate Hebrew Studies and the Israel Antiquities Authority) are in the final stages of releasing a CD-ROM edition of the scrolls containing 3,500 black-and-white photographs of the scrolls (though strangely, it seems, no transcriptions). The Ancient Biblical Manuscript Center are also involved in a project to digitize their photographic holdings (mainly for preservation reasons). Once digitized, rare and precious manuscripts offer other possibilities to the scholar. In 1993 infra-red imaging technology from NASA's Jet Propulsion Laboratory (equipment this time probably worth its ten million dollars) was used to examine fragments from the Genesis Apocryphon. The digital infra-red camera made previously illegible characters readable.

One of the first institutions to make serious use of the World Wide Web was the Library of Congress, and one of the first exhibitions it mounted was on the Dead Sea Scrolls. The exhibition can still be viewed with a WWW browser and, although it does not provide video or audio, it does have a fairly extensive archive of images, both of the scrolls and of related artefacts. The WWW is now what the majority of recent users identify as the Internet. The WWW is famous for being thought up by particle physicists at CERN, famous for the easy sharing of research articles; it is famous for its ability to link together texts distributed over the world, embed images, and link to other digital media. It is also famous for giving access to a wide range of material which it might have been better for authors and readers alike to have remained hidden. Theology, as you can imagine, has its fair share of the eccentric on the WWW. It is also fortunate to have a large number of potentially useful resources, especially in biblical studies. Not only can primary texts be easily downloaded, but the Web can also act as an interface to other databases. The Bible Gateway and Richard Goerwitz's Bible Browser allow online searching of, amongst others, English and Latin editions of the Bible. Torrey Seland (Volda College, Norway) maintains an impressive index of biblical studies resources which includes links to the ancient Gnostic writings, the Gospel of Thomas site, the Hypertext Halacha and Rabbinic works, the Perseus site including its online Liddell-Scott-Jones Greek lexicon, the Interpreting Ancient Manuscripts Web by Timothy Seid (which teaches textual criticism and the process of manuscript transmission), the Society of Biblical Literature homepage and directory of members, TC, the electronic journal of biblical textual criticism, the University of Michigan's Mediterranean Archaeology site, and so on.

The Internet is not only the WWW. Electronic mail is still the most common use of the Internet, and electronic mail discussion lists have an early history in biblical studies. Ioudaios is still cited as the gathering place for all interested in first-century Judaism and Christianity. Founded in 1990 by a small group of scholars with a shared interest in the works of Josephus, the group has grown to represent a large range of international scholars researching first-century Judaism and early Christianity. The group is a self-styled community, appropriately based on the lost community of Qumran (though it could hardly be described as a desert community, and it certainly does not have a strict admission policy). The friendliness of the community was epitomised in 1994 by an online festschrift presented to Robert Kraft by email on the occasion of his sixtieth birthday. The recent upsurge in the numbers having access to electronic mail has unfortunately led lists like Ioudaios to veer off topic or to send out numerous postings per day, many of which are less than academic in their content. However, it is easy enough to delete email messages...

III. The future of Computer-Assisted Biblical Studies

As I indicated earlier, I believe the future of digital resources for the humanities lies in networked resources. I also believe that as the creation of digital resources increases, and different editions of the same texts or themes compete with or complement each other, in much the same way as published books have done, then the fact of the computer will fade into the background, digital resources in whatever form will appear on undergraduate reading lists alongside books, and I won't be asked to give presentations on computer-assisted anything! Whether a good or a bad thing, digitization is driving us back to the primary sources. One very pragmatic reason is copyright: thus we rarely find digital editions of modern secondary works unless created with that purpose in mind. Projects like the Thesaurus Linguae Graecae and the CETEDOC Library of Christian Latin Texts are the exception in their choice of the best available edition of each text. Nineteenth-century editions abound, and it is now possible to download the entire Ante-Nicene, Nicene, and Post-Nicene Fathers series. On the positive side, ever-increasing computing power means that projects are not simply stopping at the digital version of the printed edition but driving ever faster back to the extant manuscripts used to construct the printed edition in the first place. One exciting development is the Electronic New Testament Manuscripts Project, an international, scholarly, volunteer effort to make images and transcriptions of New Testament manuscripts freely available on the Internet. The directorate consists of Tim Finney and James Tauber, and the advisory board includes scholars (like David Mealand and Keith Elliott), a publisher (Cambridge University Press), and advisers on the digitization of primary source material. The WWW site currently includes an in-progress catalogue of papyri and manuscripts and related papers on collating New Testament manuscripts.

And if you want to know what it might all look like when the project is finally complete, you need look no further than CUP's CD-ROM of The Wife of Bath's Prologue, which was developed by the Universities of Sheffield and Oxford.

[Demonstrate WOB]