Initially we considered designing our own DTD for our detailed, item-level descriptions of manuscripts, but soon rejected this approach in favour of using an extended version of the TEI. Several reasons prompted this decision: the TEI is a robust, well-tested DTD, already used for manuscript transcriptions, and using it obviates much of the basic groundwork involved in setting up a new DTD from scratch. It is readily extensible, with well-established mechanisms for incorporating new elements and entities, and for rewriting those already present. The ease with which metadata, texts, and images can be linked in a single TEI file also allows records to serve as the basis for a wider range of applications, including electronic editions or image databases.
The TEI already includes a small number of elements designed specifically for the encoding and description of manuscripts, particularly <hand> and <handShift>, which are used to record information on distinct scribes or handwriting styles. Other parts of the manuscript catalogue record are already catered for by the TEI's standard tagsets: shelfmarks, for instance, are covered by the <idno> element, which has a type attribute that can be used to distinguish present from past shelfmarks; dates are covered by the standard <date> element; and bibliographies attached to records can be represented by the <listBibl>, <biblStruct> or <bibl> elements. The languages in which a manuscript is written are covered, as for any transcribed text, by the global lang attribute.
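As a rough illustration of how these standard elements might be combined, consider the fragment below. The shelfmarks, the type values, the language code, and the bibliographic entry are all invented placeholders, and the choice of <bibl> as the containing element is an assumption; the fragment is a sketch and has not been validated against the TEI DTD.

```xml
<!-- Illustrative fragment only: all values below are invented placeholders -->
<bibl lang="la">
  <!-- the type attribute distinguishes the present shelfmark from an earlier one -->
  <idno type="current">MS. Example 123</idno>
  <idno type="former">Example 45</idno>
  <!-- date of the manuscript, given here in conventional cataloguing notation -->
  <date>s. xii</date>
</bibl>
<!-- a bibliography attached to the record -->
<listBibl>
  <bibl>A. N. Author, A Hypothetical Catalogue of Manuscripts</bibl>
</listBibl>
```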
These elements tend to prove inadequate for detailed manuscript metadata. Of Ker's sixteen points, only the first (Date) is readily available within the TEI, and one other (Script) is partially covered by the <hand> element. This paucity of manuscript-specific elements has caused some problems in the past for the compilers of electronic editions: the Canterbury Tales Project, for instance, has tried to rectify this by incorporating project-specific extensions to the DTD. For the detailed cataloguing of the Bodleian's manuscripts, it was decided to produce a set of extensions, using the facilities for modification provided within the TEI DTD, that were planned to have as generic an application as possible.
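The modification facilities referred to here work, in TEI P3, through parameter entities declared in the document type declaration subset. The sketch below shows the general shape of such a declaration; it assumes the P3 conventions for the TEI.extensions.ent and TEI.extensions.dtd entities, and the extension file names (mss-ext.ent, mss-ext.dtd) are hypothetical, not those used by the project.

```xml
<!-- Sketch of a TEI P3-style document type declaration; file names are hypothetical -->
<!DOCTYPE TEI.2 SYSTEM "tei2.dtd" [
  <!-- select the base tag set for prose -->
  <!ENTITY % TEI.prose "INCLUDE">
  <!-- hypothetical extension files: redefined entities first, then new element declarations -->
  <!ENTITY % TEI.extensions.ent SYSTEM "mss-ext.ent">
  <!ENTITY % TEI.extensions.dtd SYSTEM "mss-ext.dtd">
]>
```

Keeping the extensions in separate files of this kind leaves the distributed TEI DTD files untouched, which is what makes a generic, shareable set of extensions feasible.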
The most radical extension we have designed to the TEI DTD is the incorporation of a new element, the <mssStmt>, as a child of the <sourceDesc> within the TEI header. This acts as a container for an extensive set of sub-elements designed specifically to encode the types of catalogue data which we already incorporated into our printed catalogues. In figure 1 we show the overall structure of this element, which is more fully described in section 12 below.
Figure 1: Overall structure of the proposed Manuscript Statement
The term manuscript can refer to a wide variety of physical objects: for example, a number of quite disparate items may be bound together and re-foliated, so requiring descriptions for the item as a whole as well as for each of its components. The <mssStmt> may be repeated and nested to allow close modelling of these often complicated structures: in the case of the above example, one <mssStmt> may be used to describe features common to all parts of the manuscript, and further <mssStmt> elements within this parent may cover features applicable to each component. Subsidiary <mssStmt> elements need only record those features by which a component differs from the whole item of which it forms a part: if nothing is noted, those values declared in the parent <mssStmt> are inherited by its children.
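The nesting and inheritance described above might be sketched as follows for a composite volume of two components. The use of <provenance> as a direct child of <mssStmt> is an assumption made for illustration, and the descriptive text is invented; the full content model is the one given in section 12.

```xml
<!-- Illustrative sketch only: child elements and text are assumptions, not the project's data -->
<sourceDesc>
  <mssStmt>
    <!-- features common to the whole composite volume -->
    <provenance>Invented note: the volume as a whole was owned by a hypothetical collector.</provenance>
    <mssStmt>
      <!-- first component: records no provenance of its own,
           so it inherits the value declared in the parent mssStmt -->
    </mssStmt>
    <mssStmt>
      <!-- second component: records only the feature by which it differs from the whole -->
      <provenance>Invented note: this component had a different earlier owner.</provenance>
    </mssStmt>
  </mssStmt>
</sourceDesc>
```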
The <mssStmt> contains most of the new elements added to the TEI: its major constituents are as follows:-
Further elements provide for descriptions of the script used (with links via a hand attribute to information on scribes and handwriting styles enumerated in the <handList> element within the TEI header's <profileDesc>), of rubrication, of the binding, and of secundo folio.
Provenance information for an entire manuscript or a constituent part is contained within a repeatable <provenance> element. If more than one of these is present, a containing <listProvenance> element can be used to group them together (on the model of <bibl> and <listBibl> in the standard TEI guidelines).
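A minimal sketch of this grouping, with invented provenance notes standing in for real catalogue data, might look like this:

```xml
<!-- Illustrative sketch only: the provenance notes are invented placeholders -->
<listProvenance>
  <!-- each stage in the manuscript's history gets its own provenance element -->
  <provenance>Invented note: written for a hypothetical religious house.</provenance>
  <provenance>Invented note: later acquired by a hypothetical private collector.</provenance>
</listProvenance>
```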
In addition to the new <mssStmt> element, some further modifications have been made to the TEI DTD to incorporate important information that can be used in both the catalogue description and transcribed text. The <bibl> element has been extended to include elements to mark up the name of a repository, the place of origin of a manuscript and the collection to which it belongs. New phrase-level elements <incipit>, <explicit>, <colophon>, and <heading> have been added to allow the mark-up of the corresponding features within the header or main text. These are particularly useful for the creation of indices of incipits etc. A new phrase-level element, <iconTerm>, is used, primarily within the <decoration> sub-elements, to describe iconographic subjects, and includes an optional alphanumeric Iconclass code (ICONCLASS Research and Development Group 1997). A <summary> sub-element is available for inclusion in all <div> elements to incorporate an abstract of their contents compiled by the cataloguer.
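The following fragment suggests how some of these phrase-level elements might appear in practice. The attribute name code on <iconTerm> is hypothetical (the text above says only that an Iconclass code may be included), its value is a placeholder rather than a real Iconclass notation, and the abstract is invented.

```xml
<!-- Illustrative sketch only: the attribute name "code" and all values are assumptions -->
<div>
  <!-- cataloguer's abstract of the contents of this division -->
  <summary>Invented abstract: the opening of the book of Genesis.</summary>
  <p><incipit>In principio creavit Deus caelum et terram</incipit></p>
</div>
<decoration>
  <miniatures>
    <!-- iconographic subject, with a placeholder where an Iconclass notation would go -->
    <iconTerm code="ICONCLASS-NOTATION">The Annunciation</iconTerm>
  </miniatures>
</decoration>
```

Marking incipits with a dedicated element in this way is what allows an index of incipits to be generated by a simple search for that element.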
An additional global range attribute is defined, which can be used to specify the physical span of pages or folios covered by a given element. The <summary> element, for instance, uses this attribute to indicate the span of folios represented by the division, without the need to explicitly mark them in the text with <milestone> tags. In the description of collation, this attribute can be used to record the range of folios covered by a single quire sequence, or a larger grouping of these sequences. Within the <miniatures> sub-element of <decoration>, it is used to indicate the position of each miniature by its folio reference.
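A sketch of the range attribute in use is given below. The notation chosen for folio spans (for example "1r-24v") is an assumption, since the format of the attribute value is not specified here, and the descriptive text is invented.

```xml
<!-- Illustrative sketch only: the folio notation in range values is an assumed format -->
<div>
  <!-- the division spans folios 1 recto to 24 verso; no milestone tags are needed in the text -->
  <summary range="1r-24v">Invented abstract of the text occupying these folios.</summary>
</div>
<decoration>
  <!-- position of a miniature given by its folio reference -->
  <miniatures range="12v">Invented description of a miniature.</miniatures>
</decoration>
```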
In section 12 below, we supply a fuller technical description of these extensions, including examples of their proposed use.