Computers & Texts No. 13
December 1996

Arts and Humanities Data Service

Daniel Greenstein
Jennifer Trant
Arts and Humanities Data Service Executive
King's College London

Dan Greenstein, Director of the Arts and Humanities Data Service Executive, describes the AHDS and discusses some of the issues of greatest significance to the work of the Service.

Introduction

Increasing scholarly use of computers and electronic resources raises a number of related challenges.

Computer-based research produces digital data with significant secondary use value. Yet that value cannot fully be realised unless the data are created and described according to relevant standards, systematically collected, preserved, and reported to the widest possible community.

The outpouring of digital resources which make up a growing share of our cultural heritage makes digital preservation an urgent cause. Commercial publishers and information services, the entertainment industry, and more traditional repositories - museums, archives, and libraries - are regularly 'publishing' in electronic form. The worldwide web hosts other forms of cultural expression.

Scholars need to search for relevant information across the numerous online indices and catalogues which provide references to the resources they require.

Aims and Organisation

In 1995 the Joint Information Systems Committee (JISC) of the Higher Education Funding Councils committed £1,500,000 over three years to establish The Arts and Humanities Data Service (AHDS). The AHDS will respond to and address these several problems as they confront a well-defined range of academic disciplines. It will collect, describe, and preserve the electronic resources which result from scholarly research in the humanities, and make its collections readily available to scholars through an online catalogue designed to interoperate with other electronic finding aids.

Because these aims can only be achieved by adopting community-wide standards, and because our users demand and deserve seamless access to scholarly resources irrespective of where or by whom they are managed and of the form that they take (e.g. paper-based, digital, or artefactual), the AHDS will also seek the widest possible collaboration to develop a generalised and extensible framework for digital resource creation, description, preservation, and location. That framework will act as an essential guide to the Service's own work, and be documented in a Service Providers Handbook and Standards Reference Guidelines. The AHDS will also produce less technical Guides to Good Practice which will help raise awareness amongst the scholarly community about the importance and value of electronic information and provide guidance in its creation, description, and use.

The Service Providers will:

The Executive will co-ordinate and support the work of the Service Providers, and take the lead in developing the AHDS's collection policy and in producing the Service Providers Handbook and Standards Reference Guidelines, the AHDS catalogue, and its Guides to Good Practice. [See screen shot for further details of the Service Providers]

Data Creation

To ensure the long-term viability and interchange of electronic resources, some attention must be paid to data creation standards. A common approach to data creation is required at no fewer than four different levels:

  1. the formal language which is used to represent the syntactic and semantic features of a digital resource (e.g. SGML for an electronic text, the use of tables and fields in a database);
  2. how such languages are used to record the categories of information which communities need to identify in their digital research resources;
  3. the shared vocabularies defined by specific communities for recording content (viz. the numerous name and subject authority files which are used by bibliographers and many others world wide);
  4. the technical standards used to store that content in machine-readable form (viz. compression standards such as JPEG and file formats such as GIF and TIFF for image files).

Agreement about the use of common, or at least interoperable, standards ensures a level of consistency across accessioned digital materials without which electronic resource management and preservation are impossible. Such standards also enable the migration of electronic information from one processing (hardware and software) platform to another as technologies change.

Yet standards are not exclusively in the interest of digital archivists and data creators. They serve users' needs as well, ensuring, for example, that the electronic resources which they require can be obtained in a form that is compatible with their local hardware and software environments. It is precisely because data creation standards promise to benefit the widest possible community that they must be identified, documented, and actively promoted. Data creation standards should not be conceived as a set of prescriptive or restrictive practices. Rather, we need to develop a flexible standards framework that accommodates local practice while ensuring the consistency essential for effective information interchange. Such a framework of data creation standards will be identified by the AHDS in the widest possible consultation and documented closely in the Service Providers Handbook and Standards Reference Guidelines. It will also feature largely in the Guides to Good Practice, which will offer explanation and instruction to a wider community.

Collection

Collection is central to the AHDS's mission but it is essential to redefine the term as appropriate to this extensively networked and increasingly digital age. Certainly the AHDS Service Providers will act as repositories for digital research data and will actively encourage scholars who are conducting computer-based research to safeguard their electronic outputs by depositing them with the AHDS. Indeed, with its Guides to Good Practice, the AHDS will point potential depositors to the data creation standards and practices which they will need to consider in order to secure the longevity of their materials. Yet preservation is only one of the incentives with which the AHDS will attract depositors. Documentation is another. Without information about its contents and form, an electronic resource cannot be accessed. Imagine using a traditional library which does not have a catalogue or other signposts to the books, journals, monographs, manuscripts and other objects that comprise its collection. Nor is it sufficient for data creators simply to provide that level of documentation which they think is most appropriate for their electronic datasets. Just as preservation relies upon attention to data creation standards, location requires some level of conformity in the way that electronic resources are described. By depositing their data with the AHDS, data creators will ensure that their electronic products are documented according to the data description standards which are beginning to emerge internationally. Accordingly, they will enhance the possibility that potentially interested users will find the resources they require.

The promises of preservation, documentation, and location have proved attractive to several UK funding agencies which support computer-based humanities research. So far, the Economic and Social Research Council and the Humanities Research Board of the British Academy require grant-holders to offer any datasets they create to the AHDS for deposit; the Leverhulme Trust and the Wellcome Unit for the History of Medicine encourage their grant-holders to consider the same. The combined experience of the Historical Data Service and the Oxford Text Archive (two Service Providers which predate the AHDS) demonstrates that the same promises are attractive to data creators more generally, and they are invited to approach the AHDS to discuss the long-term disposition of their electronic resources.

The AHDS's collection policy must also take account of the fact that scholarly resources know no national boundaries. The AHDS's users will want to identify electronic resources which result from scholarly research outside the UK. Accordingly, we need to extend our definition of 'collection' to include datasets stored by other digital archives with which the AHDS can negotiate reciprocal agreements. There are good precedents for this already within the AHDS. The Oxford Text Archive has an agreement with the electronic text centre in Michigan. The Historical Data Service is part of an international network of social science data archives and benefits substantially from an integrated catalogue which allows users to search across their respective holdings. The AHDS seeks actively to multiply such agreements and to extend them into areas which are appropriate for the other arts communities that it serves, notably in archaeology and the visual and performing arts.

Not all datasets need to be deposited with the AHDS or with one of its associated data archives in order to be known to the AHDS's catalogue and its users. Increasingly, computer-based scholarly research results in datasets which are made available over the network from numerous sites. The AHDS shares the scholarly community's interest in preserving these materials and in enabling users to locate them. Where data creators and the AHDS can agree compatible data preservation and description procedures, data deposit is neither necessary nor desirable. Accordingly, our concept of 'collection' needs to be extended still further to include electronic resources which are known to the AHDS's catalogue but neither stored at nor managed by any of its Service Providers or associated data archives.

Just as the AHDS collection policy cannot require central deposit, it cannot require that every item in the collection be made freely available. Scholars need to find the materials upon which their research and teaching depend, irrespective of whether those materials are in the public domain. Equally, those responsible for commercial and other resources to which access may be restricted have an interest in preserving those resources and making their existence known to the wider scholarly community. The AHDS's collection policy must take these realities into account. Accordingly, it will negotiate the acquisition of some commercial and other resources to which access may be restricted. Equally, it will ensure that its catalogue acts as a gateway to restricted resources that are maintained and managed at other sites and by other agencies.

In sum, the AHDS's collection policy will be built on our understanding that the time has passed (if it ever even existed) when any single agency could create a vast and comprehensive collection of scholarly digital resources. The challenge today is to develop 'collections' which can be preserved according to the same minimum standards and which may be integrated from the user's point of view, that is, accessed globally through several information gateways.

Preservation

Digital resource preservation is vital to the scholarly community. Linguistic corpora, for example, may act as the building blocks of increasingly comprehensive and synoptic analyses of language. They may also hold a key to accurate and instantaneous machine-translation of spoken and written texts. And of course, library, museum, and archive catalogues, indeed all indices of scholarly and other information, are themselves a kind of database without which access to information would be improbable if not impossible. The case for preserving digital databases is not therefore parochially academic. It is universal and it is compelling.

The case for computer-tractable texts is different, yet again, but no less urgent. Texts are fundamental to scholarship in the humanities and are regularly rendered into machine-readable form. For more than a generation, arts scholars have been producing electronic texts in support of linguistic content, stylistic and other analyses which are most effectively conducted by computer (Hockey 1987). More recently, scholars have begun to deliver electronic critical editions (Cover 1995). As the corpus of electronic text expands, so do the horizons for scholarly investigation, but only if the corpus can be maintained over time. Commercial and scholarly book and journal publishers are turning increasingly toward electronic editions, many of which are not or cannot be mirrored by more traditional paper-based ones (Zweig 1992). The situation with images and with digital audio and video recordings is similar to that of texts; only the technology is newer, so the corpus of currently available material is not perhaps so large. Yet the high tide is approaching. The entertainment industry is actively developing digital technologies and it is only a matter of time before its combined outputs are only available in computer-tractable form. Museums, archives, and libraries are also experimenting with digitising collections in order to extend access to them (some good virtual exhibitions already exist on the 'net') or to protect the rarest objects from the ravages of physical handling and use. Without viable methods of digital resource preservation, these databases, texts, images, and sounds will be lost to future generations. What is at stake is nothing less than our cultural heritage.

It is one thing to recognise the urgent case for digital preservation. It is another to address it. The problems are vast and as yet without satisfactory solutions (Hedstrom 1995). There are technical problems to be sure. For example, no satisfactory or reliable estimates exist regarding the longevity of particular magnetic media. Strategic issues are more intractable. There is no agreement even about what preservation entails. Is it possible to preserve electronic information independently of the processing platforms upon which it is initially mounted without any loss of significant content? Does the content of a multimedia installation, in other words, comprise simply a collection of digital texts, images, and sounds linked together by a set of explicit pointers?

As may be expected, those communities which are traditionally responsible for preserving our cultural heritage (the library, archive, and museum communities) are the ones struggling to define the problems inherent in digital resource preservation and to recommend tentative steps which may produce solutions. In particular, these communities are seeking experimentation with different models of digital preservation, which can be applied to particular and well-defined subsets of electronic information, and then documented carefully to enable them to be costed and scaled to fit the needs of other preservation initiatives. The AHDS was established fully with this approach in view. Focusing on its own holdings, and in consultation with the wider community, it will develop and implement strategies for digital preservation, and document them in the Service Providers Handbook and Standards Reference Guidelines. Here we will not merely describe our practices. We will also cost them so that they may be scaled either up or down and evaluated with respect to their prospective application to other digital collections and in digital archives organised differently from the AHDS. In this respect, the AHDS hopes to make a significant contribution to the wider discussion which must take place within the library and archive communities in order to ensure that the electronic outputs of today are available for use and evaluation tomorrow.

Data Description

Data description standards are crucial and must be adopted, documented, and implemented on a community-wide basis if we are to enable scholars to search seamlessly across the numerous online finding aids which point to the resources they require. Accordingly, the AHDS will collaborate extensively with other agencies to identify appropriate data description standards, document these in the Service Providers Handbook and Standards Reference Guidelines, and implement them with regard to its own collections and catalogue. The problem that we face is integrating the very different descriptions which are used to document the various resources upon which scholars depend. For example, records from a library catalogue may provide MARC-conformant information. Those from digital text archives, museums and archives may reveal information more closely conformant to the recommendations made by the TEI, the Consortium for the Computer Interchange of Museum Information (CIMI) and the Encoded Archival Description (EAD), respectively. What is required is a means of positioning the rich and distinctive descriptions that are appropriate to particular resources within a more general framework.
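
One way to picture such a general framework is a crosswalk: each scheme keeps its own rich record, while a small mapping lifts a few shared elements into a common core. The Python sketch below is purely illustrative and is not drawn from any AHDS specification. The CoreRecord class, the to_core function, and the flattened field keys are assumptions made for this example; the keys loosely echo MARC fields 245 and 100, the TEI header's title statement, and the EAD unittitle and origination elements, and the sample record is invented.

```python
from dataclasses import dataclass, field


@dataclass
class CoreRecord:
    """Hypothetical minimal description shared by every catalogue."""
    title: str
    creator: str
    source_scheme: str
    # The full scheme-specific description is kept, not discarded.
    native: dict = field(default_factory=dict)


# Assumed, simplified crosswalk: core element -> key in the native record.
CROSSWALKS = {
    "marc": {"title": "245a", "creator": "100a"},
    "tei": {"title": "titleStmt.title", "creator": "titleStmt.author"},
    "ead": {"title": "unittitle", "creator": "origination"},
}


def to_core(scheme: str, native: dict) -> CoreRecord:
    """Lift the shared elements out of a scheme-specific record."""
    mapping = CROSSWALKS[scheme]
    return CoreRecord(
        title=native.get(mapping["title"], ""),
        creator=native.get(mapping["creator"], ""),
        source_scheme=scheme,
        native=native,
    )


# Invented sample record for illustration only.
library_item = {"245a": "Hamlet", "100a": "Shakespeare, William", "260c": "1890"}
core = to_core("marc", library_item)
```

The design choice worth noting is that the native record travels with the core record, so the richer description remains available once a user has located the resource through the shared elements.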

Resource Location

No framework may be developed for the preservation, integration, and location of scholarly electronic resources which does not benefit from the lessons of practical application. Accordingly, the AHDS catalogue will be developed as a means of testing, evaluating, and refining those recommendations which bear directly on data description, resource location and interchange. The catalogue will provide users with seamless access to the resources that are deposited at and managed by the AHDS Service Providers and to those which reside at sites with which the AHDS has data exchange agreements. To test our extension of the definition of 'catalogue' we also seek participation from a select number of institutions which manage online catalogues of both digital and non-digital materials, notably from the university, library, archive, and museum communities. Though we will concentrate initially on resources managed within the UK, we hope to extend our efforts at least on a limited international basis.

To elucidate, take as an example an Elizabethan scholar who is interested in the Bard. That user must be able to enter a gateway (or, more probably, one of several gateways) to humanities resources and search for 'Shakespeare, William'. An initial query may return some very rudimentary information about the many resources which are known to a variety of interoperating electronic catalogues, indices, and other finding aids. Accordingly, the first five records returned by such a query may be drawn from catalogues which are maintained by the Oxford Text Archive, the Archaeology Data Service, a Theatre Museum, a manuscript archive, and a university library.

To permit this level of integration, metadata records describing these holdings must share at least a small range of information. Yet this range of information is not yet sufficient for the scholar to assess whether the resources identified are worth acquiring or pursuing further. A richer level of description is required for the electronic text, the digital excavation record, and the objects listed respectively in the performing arts, museum, archive, and library catalogues. We may imagine, then, that the user conducts a second-order search to retrieve fuller information on the excavation record from the Archaeology Data Service and acquires the more specific detail appropriate to that resource. In addition, specific information may be required to enable the scholar to acquire, mount, and use the data locally. This more technical description may be acquired in a third-order search and may only be needed for digital data.

Accordingly, our resource location tools need knowledge of electronic resources regardless of their physical location. Our aim is not to construct a single gateway to humanities resources or to centralise the management of them; only to construct a working prototype which may demonstrate the prospects for interoperability and interchange on a far wider scale.
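
The three orders of search described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not a description of the AHDS catalogue: the function names first_order, second_order, and third_order, the record layout, and every sample holding are invented for illustration, and the catalogues are plain in-memory lists rather than networked services.

```python
# Invented sample holdings; none of these describe real catalogue entries.
CATALOGUES = {
    "text archive": [
        {"id": "t1", "creator": "Shakespeare, William",
         "title": "Sample electronic text",
         "detail": {"encoding": "SGML"},
         "technical": {"file_format": "SGML", "size_kb": 200}},
    ],
    "university library": [
        {"id": "b1", "creator": "Shakespeare, William",
         "title": "Sample printed edition",
         "detail": {"shelfmark": "X.1"},
         "technical": None},  # non-digital item: no technical description
        {"id": "b2", "creator": "Marlowe, Christopher",
         "title": "Another printed edition",
         "detail": {"shelfmark": "X.2"},
         "technical": None},
    ],
}


def first_order(creator):
    """Search every catalogue; return only rudimentary shared information."""
    return [
        (name, record["id"], record["title"])
        for name, records in CATALOGUES.items()
        for record in records
        if record["creator"] == creator
    ]


def _find(catalogue, record_id):
    """Locate one record within one named catalogue."""
    for record in CATALOGUES[catalogue]:
        if record["id"] == record_id:
            return record
    raise KeyError(record_id)


def second_order(catalogue, record_id):
    """Fetch the richer, resource-specific description from one catalogue."""
    return _find(catalogue, record_id)["detail"]


def third_order(catalogue, record_id):
    """Fetch the technical description; None for non-digital items."""
    return _find(catalogue, record_id)["technical"]


hits = first_order("Shakespeare, William")
```

A user would scan the brief hits first, then call second_order or third_order only for the records worth pursuing, so the fuller descriptions never need to be harmonised across catalogues.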

Awareness and Education

We must educate a larger community of data creators about the importance of digital resource preservation and interchange and about the practices which they should consider adopting if we are collectively to achieve these dual aims. The AHDS's contribution to what must inevitably be a much broader exercise is a series of publications collectively referred to as Guides to Good Practice. These will target scholars contemplating data creation or secondary analysis and highlight issues and methods which they need to consider. They will also identify potential pitfalls and provide comprehensive references for further reading about particular subjects. Perhaps most importantly, they will be written by subject specialists (e.g. literary scholars) for like-minded subject specialists (e.g. other literary scholars) and thus employ vocabulary and illustrative examples which are more approachable than so much of the methodological literature that is available today.

The Case for Collaboration

A framework for data creation, description, preservation, and interchange cannot be developed by the AHDS working in isolation. Success requires substantial collaboration on at least two fronts. We must solicit input from both scholars who have an interest in using our collections and those who will add to the collections through deposit. Members of these two most crucial communities will be invited to inform us of their requirements so that we may ensure that they are met by the resources we choose to collect, by the framework that we document in our Service Providers' Handbook and Standards Reference Guidelines, by the operation of our catalogue, and by the instructional materials we provide in the Guides to Good Practice.

On another front, we must collaborate with other information services. The development of robust and viable strategies for digital resource preservation requires experimentation with different models and substantial collaboration amongst digital archivists and librarians. It also requires dialogue with organisations which document and promote the data creation standards that we all require. To give scholars more coherent and uniform access to the vast and growing number of online catalogues, indices, and digital resources, the organisations which construct and maintain such finding aids and collections must work together to develop compatible approaches to data description and to build interoperable systems. While input from standards initiatives is crucial, we must also prototype common solutions in collaboration with the institutions which create and maintain the online tools on which scholars increasingly rely.

Elsewhere the AHDS is described as a broker facilitating collaboration amongst these various communities. This function derives directly from the AHDS's very narrowly defined remit and from our recognition that our goals cannot adequately be fulfilled without extensive consultation and co-operation. We believe that by collaborating in the development of a generalisable framework for the preservation and interchange of electronic resources, all stakeholders have the opportunity to improve their own services or practices, extend and encourage access to their own collections, and elaborate their own institutional or professional identities. In the hope that our causes are one and the same, the AHDS invites the widest possible participation in its work.

References

Cover, Robin and Peter Robinson. (1995). 'Encoding Textual Criticism', Computers and the Humanities 29:2, 123-136.

Hedstrom, Margaret. (1995). 'Mass storage and long-term preservation', paper delivered at 'Reconnecting Science and Humanities in Digital Libraries'. A Symposium Sponsored by The University of Kentucky and The British Library, 19-21 October, Lexington, Kentucky.

Hockey, Susan. (1987). 'An historical perspective', in Sebastian Rahtz (ed), Information Technology in the Humanities: Tools, Techniques and Applications (Colchester), 22.

Zweig, Ron. (1992). 'Virtual Records and Real History', History and Computing 4, 174-82.

Arts and Humanities Data Service Executive
King's College London
London WC2R 2LS
Email: daniel.greenstein@kcl.ac.uk
Tel/Fax: 0171 873-2445
http://www.kcl.ac.uk/projects/ahds/home.html

The full version of this article was first published in Ariadne 4 (July 1996) http://www.ukoln.ac.uk/ariadne/issue4/ It is reproduced here with permission of the authors.



Computers & Texts 13 (1996), 10. Not to be republished in any form without the author's permission.

HTML Author: Michael Fraser (mike.fraser@oucs.ox.ac.uk)
Document Created: 7 January 1997

The URL of this document is http://info.ox.ac.uk/ctitext/publish/comtxt/ct13/ahds.html