3 Preservation Description Information

3.1 Reference Information

The reference information stored within an AIP serves to identify and describe the archived object for access purposes. In principle, as metadata standards converge, it should become possible to dispense with any specification for such metadata peculiar to the CEDARS implementation: an AIP could simply indicate the location of its associated standard metadata record. In practice, there is no sign of any convergence on a single metadata standard appropriate to the full range of material an OAIS archive must handle. At the same time, rich metadata descriptions are increasingly prepared for different resources according to different domain- or medium-specific standards. Archives are typically described using EAD, digital texts using TEI Headers, commercially published resources using MARC, web-based resources using RDF or Dublin Core, museum resources using CIMI, and so on.

It would be foolish to discard or to duplicate these rich sources of metadata when they are available. However, they may not be available in every case, and it may also be difficult to choose amongst them. We therefore propose that there should be a minimum guaranteed set of resource description elements present in any CEDARS AIP, irrespective of any additional set of resource discovery elements provided by some non-CEDARS rich metadata record.

Following AACR2 and ISBD, we propose that this minimal set should comprise information to identify the work, those agencies responsible for creating its intellectual content, and those agencies responsible for distributing it in its original (pre-ingest) form: in terms of traditional bibliography, we mandate a title, a responsibility statement, and a publication statement.

In addition we allow for the existence of reference labels which supply standard identifiers for the work alternative or supplementary to its titles (such as ISBNs), and also for metadata records conforming to other standards.

The <reference> element is therefore defined as follows:

<!-- *  *-->
<!ELEMENT reference
           (resourceDesc, refLabel*, otherRecord*)>

3.1.1 PDI: Reference: Resource Description

The <resourceDesc> element is mandatory within any CEDARS AIP, as it provides the minimal functionality needed for resource discovery and for rights management within the archive. It has the following three sub-components, each of which is further described below:

<title>: provides title information for a digital object within the archive
<respStmt>: provides information about intellectual responsibility for an object
<publicationStmt>: specifies the original distributor

Here is an example resource description:
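
Pending an authoritative example, the following is a minimal sketch of what a resource description might look like. All values are invented for illustration, including the level, type, and reg attribute values, which this draft does not constrain.

```xml
<!-- Illustrative only: every name, date, and identifier here is hypothetical -->
<resourceDesc>
  <title level="main" type="uniform">An Invented Survey of Digital Preservation</title>
  <respStmt>
    <resp>author</resp>
    <name reg="Doe, Jane">Jane Doe</name>
  </respStmt>
  <respStmt>
    <resp>editor</resp>
    <name reg="Roe, Richard">Richard Roe</name>
  </respStmt>
  <publicationStmt>
    <name>Example University Press</name>
    <publDate>1999</publDate>
    <publPlace>Exampletown</publPlace>
    <publID>EUP-0042</publID>
  </publicationStmt>
</resourceDesc>
```

The sketch assumes that <respStmt> may be repeated, once for each kind of responsibility recorded.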

<!-- to be supplied -->

Title information

The title information for a digital object should conform to standard bibliographic practice. This implies in particular that:

These distinctions are conveyed by means of the level and type attributes on the <title> element. The present draft does not attempt to constrain the values of these attributes in any way; at a later stage it may be preferable to define an explicit set of enumerated values, or to use the enhanced facilities offered by the XML Schema language for this purpose.

Similarly, at this stage, no thought has been given to further defining the content of the <title> element, for example to distinguish foreign words or proper names embedded within it.

The <title> element is defined as follows:

<!-- *  *-->
<!ELEMENT title (#PCDATA) >
<!ATTLIST title    type  CDATA  #IMPLIED
                   level CDATA  #IMPLIED>

Statements of Responsibility

The Metadata Framework document proposes use of an <author> element, or (following Dublin Core practice) extending the scope of this term to include any form of intellectual responsibility for the content of an object. Our suggestion here is to recognise the diversity of kinds of responsibility which it may be judged appropriate to record for digital objects by providing a generic repeatable <respStmt> element.

Each <respStmt> element has at least two subcomponents: a <resp> element, which specifies the kind of responsibility concerned, and a <name> element, which specifies the name of the agency (person, organization, etc.) responsible, and which might also take attributes to specify the type of name (corporate, controlled, etc.) and a regularized or encoded form.

The <respStmt> element and its components are defined as follows:

<!-- *  *-->
<!ELEMENT respStmt (resp,name+) >
<!ELEMENT resp (#PCDATA) >
<!ELEMENT name (#PCDATA) >
<!ATTLIST name    type  CDATA  #IMPLIED
                  reg   CDATA  #IMPLIED>

Publication Statement

The purpose of the <publicationStmt> element is to record information about the agency responsible for distribution of the object at the time of its ingest into the Archive. This information is important for purposes of resource description, and also as a starting point for rights management information; it is therefore mandatory within the CEDARS AIP.

The <publicationStmt> element has the following components, which should be given in the order specified:

<name>: name of the agency of publication or distribution; this is the same element as that described under Statements of Responsibility above; mandatory.
<publDate>: date of publication of the resource; mandatory.
<publPlace>: place of publication; mandatory.
<publID>: any identifier or name used by the publisher additional to that cited elsewhere in the CEDARS metadata; optional.

At some stage, it will be important to establish cataloguing rules or other mechanisms (such as authority files) to ensure consistency in such matters as the naming of places and dates. We do not address this issue at this stage.

The <publicationStmt> element and its components are defined as follows:

<!-- *  *-->
<!ELEMENT publicationStmt (name, publDate, publPlace, publID?)>
<!ELEMENT publDate (#PCDATA)>
<!ELEMENT publPlace (#PCDATA)>
<!ELEMENT publID (#PCDATA)>

3.1.2 PDI: Reference: Label

In addition to titles, publishers' identifiers, and CRIDs, a digital resource may also carry a number of other identifiers or labels. Typically, these will be assigned by someone other than the publisher, and will persist throughout the object's life. Any number of such identifiers (examples include International Standard Book Numbers and Uniform Resource Names) may be associated with an AIP, using the <refLabel> element. This bears a scheme attribute to specify the scheme from which the identifier concerned was taken, as in the following example:

  <refLabel scheme="isbn">123.456.789</refLabel>
  <refLabel scheme="uri">foo.bar</refLabel>

The <refLabel> element is defined as follows:

<!-- *  *-->
<!ELEMENT refLabel (#PCDATA) > 
<!ATTLIST refLabel scheme (isbn|uri|url|other) "uri">

3.1.3 PDI: Reference: Other Record

It will increasingly be the case that rich descriptive metadata is either provided together with a digital resource or is created before the resource is archived. Such rich metadata may be stored within the AIP itself in any of the following ways:

Examples of each kind of strategy:

<!-- to be supplied -->
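
Pending authoritative examples, the fragments below sketch three strategies that the attribute list for <otherRecord> appears to permit. This reading is an inference from the content model and attributes, not a statement of the intended design, and every value shown (type names, the entity name marcRec, the CRID) is hypothetical.

```xml
<!-- 1. Record carried directly as character data (markup escaped) -->
<otherRecord type="DublinCore">&lt;dc:title&gt;An Invented Title&lt;/dc:title&gt;</otherRecord>

<!-- 2. Record held in an external unparsed entity; "marcRec" is a
     hypothetical entity that would have to be declared in the DTD subset -->
<otherRecord type="MARC" object="marcRec"/>

<!-- 3. Record archived as an object in its own right and cited by its CRID -->
<otherRecord type="EAD" refCrid="crid-0001"/>
```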

The <otherRecord> element is defined as follows:

<!-- *  *-->
<!ELEMENT otherRecord    (#PCDATA) >
<!ATTLIST otherRecord refCrid CDATA #IMPLIED
                      object ENTITY #IMPLIED
                      type    CDATA #IMPLIED    >

3.2 Context Information

If this element is to be used as a distinct part of the AIP, its function and the distinction between it and the information held in the <provenance> element need further clarification.

The present proposal assumes that there is a requirement to document simply the function of the archived object here, i.e. the reasons for which it was decided to give it long term preservation. This information is supplied within a <function> element, which consists of one or more paragraphs or descriptive notes, tagged with the <p> element. For the moment, we assume no substructure within the <p> element, although almost certainly there will be a requirement for elements to mark e.g. foreign words, lists, highlighted phrases etc.

If required, any number of <relatedObject> elements may be included within a <context> to supply details of other significantly related objects. If these objects have themselves been ingested into the archive, their CRIDs should be supplied using the refCrid attribute, but this is not mandatory.

The <context> element and its components are defined as follows:

<!-- *  *-->
<!ELEMENT context           (function, relatedObject*)>
<!ELEMENT function          (p+)>
<!ELEMENT relatedObject     (p+)>
<!ATTLIST relatedObject refCrid CDATA #IMPLIED >
<!ELEMENT p                 (#PCDATA)>
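
As an illustration of these declarations, a <context> element might look like the sketch below. The prose and the CRID value are invented; the second <relatedObject> shows the case where the related object has not been ingested and so carries no refCrid.

```xml
<!-- Illustrative only: descriptions and the CRID are hypothetical -->
<context>
  <function>
    <p>Selected for preservation as a representative early electronic journal issue.</p>
  </function>
  <relatedObject refCrid="crid-0002">
    <p>An earlier version of the same resource, already held in the archive.</p>
  </relatedObject>
  <relatedObject>
    <p>A companion dataset that has not been ingested.</p>
  </relatedObject>
</context>
```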

3.3 Provenance Information

The <provenance> element is used to record information about four distinct aspects of the archived object, each of which forms the content of a distinct element:

<origin>: information about events affecting the archived object before its inclusion within the archive;
<management>: information about events affecting the archived object after its inclusion within the archive;
<environment>: information about the environment within which the archived object was originally used;
<rightsDesc>: information about intellectual and other property rights relevant to the archived object.

Here is an example <provenance> element:

<!-- to be supplied -->

The <provenance> element is defined as follows:

<!-- *  *-->
<!ELEMENT provenance (origin?, management, environment, rightsDesc)>

3.3.1 PDI: Provenance: Origin

The <origin> element may be used to record information about the original function of the archived object, including the reason for its creation, its original status and usage etc., where these are not described by the <context> element. It should also be used to record any detailed information available about processes carried out on the object before it was included within the archive.

The following elements are used to represent actions or processes carried out on an archived object:

<action>: describes any one event or action during the custodial or original history of an archived object
<date>: contains a date
<change>: contains a description of some event or action, or a note indicating that no action was performed
<respStmt>: contains a statement of responsibility (this is the same element as that described under Statements of Responsibility in 3.1.1)

The <origin> element has two parts, an optional <function> element, which contains one or more <p> elements describing the various categories of information suggested at the start of this section, and a second more structured <origHist> element, which consists of a series of one or more <action> elements organized as described above. Each <action> element contains a date, a change, and a responsibility statement, as in the following examples:

<!-- to be supplied -->
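
Pending an authoritative example, a minimal sketch of an <origin> element follows. The dates, descriptions, and names are invented, and the date format shown is only an assumption, since this draft does not prescribe one.

```xml
<!-- Illustrative only: all values are hypothetical -->
<origin>
  <function>
    <p>Created as teaching material for an undergraduate course.</p>
  </function>
  <origHist>
    <action>
      <date>1998-03-14</date>
      <change>Converted from a word-processor file to SGML.</change>
      <respStmt>
        <resp>conversion</resp>
        <name>A. N. Other</name>
      </respStmt>
    </action>
    <action>
      <date>1998-06-01</date>
      <change>No further changes made before deposit.</change>
      <respStmt>
        <resp>depositor</resp>
        <name>Example Department</name>
      </respStmt>
    </action>
  </origHist>
</origin>
```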

The <origin> element and its components are defined as follows:

<!-- *  *-->
<!ELEMENT origin         (function?, origHist) >
<!ELEMENT function       (p+)>
<!ELEMENT origHist       (action+)>
<!ELEMENT action         (date, change, respStmt+) >
<!ELEMENT date           (#PCDATA) >
<!ELEMENT change         (#PCDATA) > 

3.3.2 PDI: Provenance: Management

Management information, whether relating specifically to ingest or to post-ingest custody, is similarly recorded using the <action> element introduced in 3.3.1. Actions relating to the ingest of an object are grouped together in date order within a mandatory <ingest> element; subsequent custodial actions (if any) are grouped within a <custHist> element, as in the following example:

<!-- to be supplied -->
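
Pending an authoritative example, a <management> element might be sketched as follows. All dates, descriptions, and names are invented for illustration.

```xml
<!-- Illustrative only: all values are hypothetical -->
<management>
  <ingest>
    <action>
      <date>1999-11-02</date>
      <change>Received from the depositor on CD-ROM; files copied to the archival store.</change>
      <respStmt>
        <resp>ingest</resp>
        <name>Archive staff</name>
      </respStmt>
    </action>
  </ingest>
  <custHist>
    <action>
      <date>2000-05-20</date>
      <change>Copied to new storage media; no change to content.</change>
      <respStmt>
        <resp>media refresh</resp>
        <name>Archive staff</name>
      </respStmt>
    </action>
  </custHist>
</management>
```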

The <management> element and its components are defined as follows:

<!-- *  *-->
<!ELEMENT management     (ingest, custHist) >
<!ELEMENT ingest         (action+) >
<!ELEMENT custHist       (action*)>

3.3.3 PDI: Provenance: Environment

Information about the original environment in which an archived object was deployed may be of interest even in the presence of detailed representation information. In its absence, it provides essential input to emulation software. We propose here a slightly simpler model than that described by the current Metadata Framework document.

The <environment> element holds prose description of three different categories of information relating to an object's original operating environment, which may be mixed in any way appropriate:

<prerequisite>: describes a single hardware, software, or operating system component originally necessary for the successful deployment of an archived object
<procedure>: describes a single operational procedure originally necessary either to run or to install an archived object
<documentation>: describes any associated documentation

To distinguish amongst different types of <prerequisite>, it is proposed that rather than employing discrete elements, a type attribute with specific values (HW for hardware, OS for operating system, etc.) should be used. Similarly, a type attribute may be used to distinguish different kinds of procedure.

The present model provides no way of indicating whether the presence of, for example, multiple <procedure> elements within an <environment> should be interpreted as implying that all are needed, or that any one is sufficient. It is also at least arguable that this element more logically belongs within the <origin> element discussed in 3.3.1. These and other refinements of the model remain important areas for further work.

The <environment> element and its components are defined as follows:

<!-- *  *-->
<!ELEMENT environment (prerequisite|procedure|documentation|p)* >
<!ATTLIST environment n CDATA #IMPLIED> 
<!ELEMENT prerequisite (#PCDATA) >
<!ATTLIST prerequisite type (HW|OS|SW) "SW"> 
<!ELEMENT procedure (#PCDATA) >
<!ATTLIST procedure type (run|install) "install" > 
<!ELEMENT documentation (bibl+)>
<!ATTLIST documentation scheme CDATA #IMPLIED
                           value CDATA #IMPLIED   > 
<!ELEMENT bibl (#PCDATA) >
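
As an illustration of these declarations, an <environment> element might look like the sketch below. The component descriptions are invented, and the third <prerequisite> relies on the declared default (SW) for its type attribute.

```xml
<!-- Illustrative only: all descriptions are hypothetical -->
<environment>
  <prerequisite type="HW">A desktop computer with a CD-ROM drive and a colour display.</prerequisite>
  <prerequisite type="OS">A graphical desktop operating system of the period.</prerequisite>
  <prerequisite>A viewer application for the document format used.</prerequisite>
  <procedure type="install">Copy the contents of the CD-ROM to a local directory.</procedure>
  <procedure type="run">Open the file index.htm in the viewer application.</procedure>
  <documentation>
    <bibl>User guide supplied as a printed booklet with the original distribution.</bibl>
  </documentation>
</environment>
```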

3.3.4 PDI: Provenance: Rights Description

The present metadata framework document identifies a large number of categories of information relating to the storage of IPR information within the PDI. At least one such category (the <publicationStmt>, see 3.1.1) is already provided for elsewhere within the PDI and therefore is not repeated here. The present proposals simplify the rest considerably, but are not fully worked out.

The <rightsDesc> element has the following components:

<negotiation>: contains a description of the negotiations leading to submission of the digital object for preservation.
<iprStmt>: contains a statement of the copyright position applicable to the digital object.
<actor>: names a person or class of persons permitted to perform one or more actions according to the rights specified, together with the set of actions permitted.

The <negotiation> element, if present, consists of paragraphs of prose description; the description should also include the relevant dates.

The <iprStmt> element has two subcomponents:

<rightsWarning>: contains the text of any standard IPR warning relating to the archived object
<otherRightsHolder>: contains the name and contact details of an agency owning any other intellectual rights in the archived object, together with a description of those rights

The <actor> element also has two subcomponents:

<name>: contains the name of an individual or a class of individuals
<permittedAction>: describes an action or class of actions permitted

A type attribute is used to distinguish <permittedAction> elements where the action is permitted by licence, by statute, or for some other reason. Additionally, an authority attribute is used to supply a pointer to the authority by which the action is permitted. In the case of an action permitted by statute, this will be a piece of legislative text; in the case of an action permitted by licence, it will be a specific licence or contract. Both the metadata framework and the current document are vague as to the exact location of the text so indicated: this remains an area for further work!

The <rightsDesc> element and its components are defined as follows:

<!-- *  *-->
<!ELEMENT rightsDesc (negotiation, iprStmt, actor+)>
<!ELEMENT negotiation (p+)> 
<!ELEMENT iprStmt (rightsWarning, otherRightsHolder*)>
<!ELEMENT rightsWarning (#PCDATA)>
<!ELEMENT otherRightsHolder (#PCDATA|name)*>
<!ELEMENT actor (name+, permittedAction) >
<!ELEMENT permittedAction (#PCDATA)>
<!ATTLIST permittedAction type (statute|licence|other) "statute"
                        authority CDATA #IMPLIED>
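
As an illustration of these declarations, a <rightsDesc> element might look like the sketch below. All names, dates, and the authority pointer are invented; since the form of the authority pointer is left open above, the value shown is a placeholder only. The second <permittedAction> omits its type attribute and so takes the declared default, statute.

```xml
<!-- Illustrative only: all values, including the authority pointer, are hypothetical -->
<rightsDesc>
  <negotiation>
    <p>Deposit agreed with the publisher by exchange of letters, concluded 1999-10-15.</p>
  </negotiation>
  <iprStmt>
    <rightsWarning>Copyright in this material is retained by the publisher.</rightsWarning>
    <otherRightsHolder><name>Example Picture Library</name>, which holds rights in the illustrations.</otherRightsHolder>
  </iprStmt>
  <actor>
    <name>Registered archive users</name>
    <permittedAction type="licence" authority="licence-1999-07">View and print single copies for private study.</permittedAction>
  </actor>
  <actor>
    <name>Archive staff</name>
    <permittedAction>Copy for preservation purposes.</permittedAction>
  </actor>
</rightsDesc>
```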

3.4 Fixity Information

OAIS defines fixity as information which can be used to validate the authenticity of information extracted from an archived object, such as a digital signature or other authenticating data.

The <fixity> element within the AIP is used to hold one or more <fixityIndicator> elements, each of which contains specific data required for a particular type of authentication procedure applicable to the archived object, and attributes to authenticate the values quoted.

Further work is needed to define the range of attributes and values appropriate for a variety of authentication mechanisms.

The <fixity> element is currently defined as follows:

<!-- *  *-->
<!ELEMENT fixity (fixityIndicator+) >
<!ELEMENT fixityIndicator (#PCDATA)>
<!ATTLIST fixityIndicator type (digsig|other) "digsig"
                          signatory CDATA #IMPLIED
                          authority CDATA #IMPLIED >
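
As an illustration of these declarations, a <fixity> element might look like the sketch below. The signatory, authority, and indicator values are placeholders, and how a signature or checksum should be encoded as element content is an assumption, since that remains to be defined.

```xml
<!-- Illustrative only: signatory, authority, and content values are placeholders -->
<fixity>
  <fixityIndicator type="digsig"
                   signatory="Archive signing key"
                   authority="Example Certification Authority">ENCODED-SIGNATURE-VALUE</fixityIndicator>
  <fixityIndicator type="other">CHECKSUM-VALUE computed by a locally agreed method</fixityIndicator>
</fixity>
```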