There are three main series of published catalogues of the Western
manuscripts at the Bodleian Library: the so-called
Broadly speaking, this project focusses on two rather different tasks: firstly, the cataloguing of the manuscripts themselves, and secondly, the publishing of the resultant catalogue descriptions in both printed and electronic form. Each of these two main strands of work presents its own challenges, while the second, in particular, involves breaking new ground at the Library.
This paper reports on some of the problems addressed by the project, primarily from the bibliographic point of view, together with the technical approaches we have adopted for their resolution. Our basic approach has been to build on existing work as far as possible, while at the same time seeking to develop a system adequate to the Bodleian's arguably rather specialist needs. In that spirit, we have developed a set of extensions to the Text Encoding Initiative (TEI) proposals for general-purpose text encoding (Sperberg-McQueen and Burnard 1994), tailored to the needs of manuscript cataloguers. A detailed appendix to the paper documents this set of extensions as currently formulated.
It should be emphasized that the proposals here presented are very
much
The material falling within the orbit of the project is diverse. It includes manuscripts written in most of the major languages and scripts current in Europe in the medieval period (excluding Greek), represented by fragments as well as complete codices, and ranging in date from the ninth to the early sixteenth century. A parallel project to compile and publish descriptions of Greek manuscripts acquired this century is also under way, but has not yet reached the stage of developing automated procedures.
Some of these manuscripts are well known and extensively published,
whilst others have never even been mentioned in print. Most fall
somewhere between these two extremes: the majority have been described
in one or more unpublished typescript catalogues prepared by successive
generations of Bodleian staff, notably the loose-leaf
One of the first questions to be addressed was which cataloguing methods should be employed. Although the 1971 Lyell catalogue was widely considered exemplary, we nevertheless thought it right to re-examine questions of content and format in the light of developments during the last quarter of a century. During this period, cataloguing of both existing collections and new accessions has steadily continued, but the pressure of other duties has meant that no catalogue of any group of manuscripts has yet been brought to the point of publication. We thus had the opportunity to re-think, in some cases from first principles, what information we should be aiming to provide in the new catalogue. As with any limited-term project, the crux of the task was to find an acceptable compromise between the ideal and the realistic: the ideal might be an extremely detailed catalogue, embodying new research and with a large number of reproductions, but such a goal might well be unattainable with the limited resources at our disposal. On the other hand, if we set ourselves the much more modest aim of producing only a basic summary description of the manuscripts, prompt completion would be achieved, but the end product would, in all probability, fail to meet most of the needs of its intended audience. Our aim was to strike this balance by providing information at different levels of detail.
At the most fundamental level, a catalogue may be little more than
an inventory, simply informing potentially interested readers of the
existence of the manuscripts in a given collection, and providing a
shelfmark or other reference number to allow them to be located. To
satisfy this requirement a very brief
It is interesting to note
that this list of headings, which was substantially defined before
the Studley Priory meeting was held, already includes most of the
At the opposite extreme, a catalogue may contain so much detail, so
clearly expressed, that students are able to glean the information they
need about manuscripts from the catalogue, without recourse to the
manuscripts themselves. A good catalogue will inform the user which
manuscripts do
A middle way between these extremes is to be tested in the foreseeable future. As stated above, many of the manuscripts covered by the Project have existing unpublished catalogue descriptions in typescript, prepared over the course of the past several decades. While not always meeting today's demanding standards, these descriptions were prepared by the Library's professional staff, and contain a wealth of unpublished information. It is therefore planned that these descriptions will be entered onto the online system, being checked for accuracy in the process, but otherwise with minimal alteration or expansion, to serve as yet another level of finding aid and information provision.
The preparation of a new catalogue involves reconciling two potentially conflicting forces: the provision of information for the ever-developing needs and interests of the scholarly (and, increasingly, the not-so-scholarly) community, as reflected in the evolving methods employed in catalogues of other collections; and in-house styles, conventions, and methods, which cannot lightly be altered or abandoned.
There is no common standard for the cataloguing of medieval manuscripts, although various countries have each begun to form their own general consensus about cataloguing methods, often as a result of a major cataloguing effort or project. In the USA and UK there has been a tendency in recent years to follow the format and conventions developed by Neil Ker in his pioneering
The automation of manuscript cataloguing has to be able to handle descriptions at all the varying levels of detail and complexity discussed above. The interface for displaying these descriptions has to allow the user to search for manuscripts via a number of different paths, based on searchable and scrollable indexes (e.g. authors and texts, scribes and artists, owners and donors, iconographic subjects, etc.), and free-text searching; it has to be easily navigable, and visually acceptable to those who are familiar only with conventional printed media; it has to enable downloading and printing of catalogue entries; it should not be software-specific or require high-specification hardware; and it needs to co-exist with catalogues of other types of material, including post-medieval items.
Various automation options were explored over the course of a year,
with these aims in mind: we evaluated proprietary systems such as Cairs,
and experimented with the production of our own relational databases
using the
FoxPro package. The optimum solution which we have so far
identified, however, is SGML, and this has formed the basis of most of
our work during the project: for our collection-level, and minimal
item-level, descriptions we have been using the Encoded Archival
Description (EAD) (Library of Congress 1996)
and for detailed records we have extended the Text Encoding
Initiative
(TEI) (Sperberg-McQueen and Burnard 1994)
to improve its handling of manuscript descriptive information
(metadata).
The EAD had reached its alpha version when we began encoding our finding aids, and it has readily proved itself suitable for providing the information which we had traditionally included in our printed versions of collection descriptions. It does not, however, provide enough specific elements at the item level to allow the marking up of catalogue records for individual manuscripts in as much depth as had been used in the most recent printed Bodleian catalogues. This can be circumvented by using the generic ODD (Other Descriptive Data) element, but this is something of an evasion. It was therefore decided to use the EAD for information from collection level down to a minimal item-level description, and then link from an EAD entry to a corresponding TEI record, in which much greater detail could be encoded: the EAD has several ways of linking to external files, of which the simplest is to use an entity reference. We plan to employ the same user interface for both DTDs: the user will not have to know which one applies at any given point.
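The entity-reference link can be pictured with a short Python sketch. It assumes, for illustration only, that the EAD file declares each detailed record as an external entity (in SGML syntax, `<!ENTITY item1 SYSTEM "item1.sgm">`) and that an item points at it through an `entityref` attribute on an `extref` element, loosely modelled on EAD's linking elements. The simplified `collection`/`item` markup (written as XML so that Python's standard parser can read it), the file names, and the functions `entity_table` and `tei_links` are all hypothetical; a production system would let an SGML parser resolve the declarations rather than a regular expression.

```python
import re
import xml.etree.ElementTree as ET

# Matches external entity declarations of the form <!ENTITY name SYSTEM "file">.
ENTITY_DECL = re.compile(r'<!ENTITY\s+([\w.-]+)\s+SYSTEM\s+"([^"]+)"')


def entity_table(declarations):
    """Return {entity name: system identifier (file name)} for each declaration."""
    return dict(ENTITY_DECL.findall(declarations))


def tei_links(ead_body, entities):
    """Map each item id to the TEI file it links to, or None for a brief entry only."""
    links = {}
    for item in ET.fromstring(ead_body).iter("item"):
        ref = item.find("extref")
        name = ref.get("entityref") if ref is not None else None
        # An undeclared entity name also yields None here; a real SGML parser would reject it.
        links[item.get("id")] = entities.get(name) if name else None
    return links


# Hypothetical sample data: two declared TEI records, one of them linked from the EAD.
DECLS = """<!ENTITY item1 SYSTEM "item1.sgm">
<!ENTITY item2 SYSTEM "item2.sgm">"""
EAD_BODY = """<collection>
  <item id="i1"><unittitle>Book of Hours</unittitle><extref entityref="item1"/></item>
  <item id="i2"><unittitle>Psalter</unittitle></item>
</collection>"""

print(tei_links(EAD_BODY, entity_table(DECLS)))  # {'i1': 'item1.sgm', 'i2': None}
```

An item with no link simply remains a brief EAD entry, which matches the division of labour described above: the EAD carries every item, and only some items lead on to a detailed TEI record.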
Initially we considered designing our own DTD for our detailed, item-level descriptions of manuscripts, but soon rejected this approach in favour of using an extended version of the TEI. Several reasons prompted this decision: the TEI is a robust, well-tested DTD, already used for manuscript transcriptions, and using it obviates much of the basic groundwork involved in setting up a new DTD from scratch. It is readily extensible, with well-established mechanisms for incorporating new elements and entities, and for modifying those already present. The ease with which metadata, texts, and images can be linked in a single TEI file also allows records to serve as the basis of a wider range of applications, including electronic editions or image databases.
The TEI already includes a small number of elements designed specifically for the encoding
and description of manuscripts, particularly
These elements tend to prove inadequate for detailed manuscript metadata. Of
Ker's sixteen points, only the first (Date) is readily available within the TEI, and
one other (Script) is partially covered by the
The most radical extension we have designed to the TEI DTD is the
incorporation of a new element, the
The term
The
Further elements provide for descriptions of the script used (with
links via a
Provenance information for an entire manuscript or a constituent
part is contained within a repeatable
In addition to the new
An additional global
In section
Within the Bodleian, we have been marking up catalogue records
directly in SGML format using SoftQuad's SGML authoring package, Author/Editor.
A blank sample record is used by the cataloguer as a template:
each important section within the template is marked by an identifying
number, which corresponds to an entry in a detailed cataloguing manual
designed specifically for this project. The manual provides the
cataloguer with information on what is required within a given element,
how it should be expressed, and what attributes are used. In practice,
we have found that marking up a record directly into TEI takes little
longer than producing a version in a standard word-processor. The bulk
of the time spent producing a record is in fact taken up by the
bibliographic analysis of the item, rather than the encoding of
cataloguing details in TEI format.
Once a record is complete, the cataloguer validates it, exports it
from Author/Editor's proprietary format into SGML, and moves
it to a specified directory on our server. Here it is processed in batch
mode by a script which loads it onto our in-house WWW system.
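The batch step can be sketched as follows. This is a minimal Python illustration, not the project's script: it assumes three directories (incoming, loaded and rejected, all hypothetical), treats a file as acceptable if it parses as XML (a stand-in for proper SGML validation against the DTD), and hands each parsed record to a caller-supplied `load_record` function, a hypothetical hook standing in for whatever actually updates the WWW system.

```python
import shutil
import xml.etree.ElementTree as ET
from pathlib import Path


def load_batch(incoming, loaded, rejected, load_record):
    """Process each .sgm file in `incoming` once; return (loaded names, rejected names)."""
    ok, bad = [], []
    for path in sorted(Path(incoming).glob("*.sgm")):
        try:
            # Stand-in check: the file must parse as XML. A real SGML workflow
            # would run a validating parser against the EAD or TEI DTD instead.
            root = ET.parse(str(path)).getroot()
        except ET.ParseError:
            shutil.move(str(path), str(Path(rejected) / path.name))
            bad.append(path.name)
            continue
        load_record(path.name, root)  # hypothetical hook that updates the WWW system
        shutil.move(str(path), str(Path(loaded) / path.name))
        ok.append(path.name)
    return ok, bad
```

Because each file leaves the incoming directory once it has been handled, re-running the script does not load the same record twice, and rejected files are kept aside for the cataloguer to correct.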
The design and implementation of a user-interface for our manuscript
system is proving the most complex and time-consuming part of the whole
endeavour. We decided fairly early on to attempt to design our own WWW
interface, instead of using an SGML browser such as SoftQuad's Panorama.
Several factors told against the Panorama approach:
We have been designing our own in-house WWW interface to allow the
browsing of both EAD and TEI records: it also offers users the facility
to search the full text of entries, or given indexes, by keyword, and to
browse alphabetically through the indexes themselves. It aims to make
our records easily accessible via any frames-compatible Web browser,
without the need to install any specialized software. This frames-based
application is implemented in Tcl, and uses Open Text's PAT
software for searching and browsing.
The interface allows the user to browse up and down the hierarchies of an EAD file, viewing information relating to the current level and moving down to any lower level present. In addition, it can carry out keyword searches either on the full text of catalogue entries or on a number of given indexes: full Boolean searching is available here, and the user can search across all collections or within a single one. The user may also browse a number of dynamically created indexes (such as personal names and geographic names), which can contain multiple levels of description.
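The search and browse functions just described rest on two familiar structures: an inverted index from words to records, over which Boolean operators become set operations, and a sorted list of headings for alphabetical browsing. The Python sketch below illustrates both under stated assumptions; it is not a model of Open Text's PAT, which the system actually uses, and the `RecordIndex` class, its method names and the sample entries are invented for the example.

```python
import bisect
import re
from collections import defaultdict


class RecordIndex:
    """Toy inverted index over catalogue entries, keyed by (collection, record id)."""

    def __init__(self):
        self.postings = defaultdict(set)  # lower-cased word -> {(collection, record id)}
        self.headings = []                # index headings, kept in sorted order

    def add(self, collection, record_id, text, headings=()):
        """Index the words of one entry and file its headings for browsing."""
        for word in re.findall(r"\w+", text.lower()):
            self.postings[word].add((collection, record_id))
        for heading in headings:
            if heading not in self.headings:
                bisect.insort(self.headings, heading)

    def _hits(self, word):
        return self.postings.get(word.lower(), set())

    def search(self, all_of=(), any_of=(), none_of=(), collection=None):
        """Boolean search: every word in all_of AND at least one in any_of, NOT any in none_of."""
        candidates = set().union(*self.postings.values())
        for word in all_of:
            candidates &= self._hits(word)
        if any_of:
            candidates &= set().union(*(self._hits(w) for w in any_of))
        for word in none_of:
            candidates -= self._hits(word)
        if collection is not None:  # restrict to one collection; None means all
            candidates = {hit for hit in candidates if hit[0] == collection}
        return sorted(candidates)

    def browse(self, start, count=5):
        """Alphabetical browse: up to `count` headings from the first one >= start."""
        i = bisect.bisect_left(self.headings, start)
        return self.headings[i:i + count]


# Invented sample entries for illustration.
idx = RecordIndex()
idx.add("buchanan", "i1", "Book of Hours", headings=["Books of Hours", "Buchanan, T. R."])
idx.add("buchanan", "i2", "Psalter with calendar", headings=["Psalters", "Buchanan, T. R."])
idx.add("liturgical", "j1", "Book of Hours with calendar", headings=["Books of Hours"])

print(idx.search(all_of=["book", "hours"]))  # [('buchanan', 'i1'), ('liturgical', 'j1')]
print(idx.search(all_of=["calendar"], none_of=["psalter"]))  # [('liturgical', 'j1')]
print(idx.browse("Bu"))  # ['Buchanan, T. R.', 'Psalters']
```

Restricting a search to one collection is then just a filter on the first element of each hit, which mirrors the choice offered to the user between searching all collections and a single one.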
The link from an EAD to a TEI record is invisible to the user: it appears as a further hierarchical level below an item description in the EAD record. The same frames interface is used to display the TEI record, reformatted to HTML: the user can choose to browse a record's basic details, contents, decoration, physical description, provenance or its attached bibliography.
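The conversion of a TEI record into the browsable sections listed above can be sketched in Python. The record format here is a deliberately simplified, TEI-like XML document whose element names (`summary`, `contents`, `decoration`, `physDesc`, `provenance`, `bibl`) are assumptions rather than the project's extended DTD; `section_html` and `SECTIONS` are likewise hypothetical. Each call renders just one section, matching the way the frames interface lets the user choose what to view.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical mapping from the headings offered in the browsing frame
# to section elements of a simplified, TEI-like record.
SECTIONS = {
    "Basic details": "summary",
    "Contents": "contents",
    "Decoration": "decoration",
    "Physical description": "physDesc",
    "Provenance": "provenance",
    "Bibliography": "bibl",
}


def section_html(record_xml, heading):
    """Render the section named by `heading` as a small HTML fragment."""
    if heading not in SECTIONS:
        raise ValueError(f"unknown section: {heading}")
    element = ET.fromstring(record_xml).find(SECTIONS[heading])
    if element is None:
        body = "(none recorded)"
    else:
        # itertext() drops inline tags such as <hi>; split/join tidies the whitespace.
        body = " ".join("".join(element.itertext()).split())
    return f"<h2>{html.escape(heading)}</h2>\n<p>{html.escape(body)}</p>"


SAMPLE = """<record>
  <summary>Book of Hours, on parchment.</summary>
  <decoration>Initials in <hi>gold</hi> &amp; blue.</decoration>
</record>"""

print(section_html(SAMPLE, "Decoration"))
```

Keeping the heading-to-element mapping in a single table means that a change to an element name in the DTD touches only one line of the conversion script.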
The
It is planned that the printed version of the detailed catalogue descriptions will initially be issued in a series of fascicules: rather than wait until the completion of the entire Project, we think it more beneficial to make groups of catalogue entries available in printed form as and when they are completed. Thus, the collection of about twenty-five medieval illuminated manuscripts assembled by T. R. Buchanan was chosen as the first group to be tackled, since it has a certain homogeneity of content and provenance, and has provided a suitable test group with which to develop the cataloguing methods and the automated system; this will be followed by the larger group of liturgical manuscripts from all other sources, and so on. Once all the manuscripts have been published in this form, it may be desirable to reprint the descriptions as a single volume, with addenda and corrigenda, and cumulative indexes.
The user interface is likely to be subject to major revision once XML (Extensible Markup
Language) becomes established and new WWW browsers are able to display XML-encoded
texts directly. Instead of converting to HTML, the interface will rely on stylesheets based on
XML-conformant DSSSL (Document Style Semantics and Specification Language), which should
prove faster, more elegant, and thus easier to maintain than the current, complicated Tcl-based
scripts. It is hoped to incorporate digitised images into the system
shortly: both the EAD and TEI provide facilities to link to image files,
and it should prove relatively simple to include in-line images and
pointers to external files. In-line images may be useful for collation
diagrams, for instance, while links to external files would allow us to
use our catalogue records as the core of a digital archive of manuscript
images. The first such links may be to the high-resolution images
produced by the Celtic Manuscripts Project
(Oxford University 1997), currently
under way at Oxford.
SGML has proved a useful medium for encoding information about manuscripts at both the collection and item level: its hierarchical functionality is ideal for expressing the intellectual structure of a collection, and its combination of flexibility and rigour makes it suitable for a detailed item-level description. Our experience so far is that these features have made it far easier to implement an SGML-based solution than a complicated relational database equivalent. The TEI itself has proved a solid basis for a cataloguing standard, its modular structure and easy extensibility paying dividends both in building up a set of elements for manuscript metadata and in providing a structure in which to place them.
The Bodleian's archivists and cataloguers have been able to adopt
SGML as their cataloguing medium with very little difficulty, and can
now easily encode directly into SGML without the need for an intervening
interface (such as a database form). A sophisticated authoring package
such as
Author/Editor can make the process of encoding much faster, by
use of macros for instance, and allows fast and easy navigation of a
complex document.
If a well thought out interface is provided, the system's users
themselves need know nothing of SGML or the structures of the DTDs used.
The WWW has proved a convenient and powerful medium for dissemination
of SGML-based metadata: the conversion from SGML to HTML is straightforward
with any common scripting language, and powerful SGML-aware software,
such as Open Text's PAT, can provide performance to match any
conventional database. There are, unfortunately, few suitable turnkey
SGML systems which can do the same, but, for those without the resources
to design their own interface, Panorama provides an acceptable
medium for making records available over the Internet, albeit with the
provisos noted above.
The cataloguing Project at the Bodleian Library described above has depended for its success on co-operation and consultation on a number of levels. At the local level, every catalogue entry is scrutinised in detail by the medieval specialists on the Bodleian's permanent staff, and benefits enormously from the dialogue that results from their comments and contributions. Similarly, every aspect of the cataloguing method has been (or is still in the process of being) discussed with SGML specialists, so that they may better understand the needs of the medievalist scholar, and so that the medievalists involved in the Project may better understand the possibilities and limitations of the SGML system being developed.
At the national and international level, it is sincerely hoped that the cataloguing and encoding methods developed at the Bodleian will be discussed, commented upon, and constructively criticised by the participants in the MASTER Project, and by others besides, so that the final solutions reached can be as widely applicable as possible, bringing us a significant step closer to meeting the needs of our readers and providing an aid to research that will be of benefit for decades to come.
As noted above, the work presented here is an on-going project, and
we are conscious of several things in the present description which we
propose to change. Although our chief goal at this stage has been to
test the adequacy of our provisions for cataloguing only part
of the Bodleian's collection of medieval Western manuscripts, we hope
that they will also provide at least a useful first attempt at the problem
of describing other types of handwritten resources, ranging from clay
tablets and classical graffiti to modern notes and
For the most up-to-date version of the TEI extensions used in this
project, and related discussion, the interested reader should
consult the Project's web pages at