On the hermeneutic implications of text encoding

Lou Burnard

November 1998

This paper explores the meaning of the term hermeneutics, proposing for it an important new role in the applications of information technology within the humanities. The advent of new markup technologies, I claim, offers new possibilities for the preservation of the interpretative processes lying at the centre of humanistic scholarship. Markup theory is thus at the heart of the humanistic tradition, rather than an incidental technology or an irrelevant appendage to it.

1 Defining hermeneutics

The word hermeneutics has a long and informative history. In exploring its senses, as for those of other unfamiliar words, we might begin by recourse to a dictionary: the Shorter Oxford for example informs us that hermeneutics is ``the art or science of interpretation, esp. of Scripture. Commonly dist. from exegesis or practical exposition.'' The lexicographer who composed that entry seems to have been more than usually cautious -- are we dealing with an art or a science here? And is hermeneutics really best defined by contrast, as something inherently impractical (unlike, presumably, the eminently practical pursuit of biblical exegesis)? But the meaning of a word is often more than can be summed up by a lexicographer, however cautious or eminent. Learned words like this one frequently carry with them senses more directly derived from their etymological roots, and this one is no exception.

In Greek myth, Hermes was the mediator, responsible for explaining the messages of the gods to mere mortals, whence the Greek word hermeneus, an interpreter. In Egypt, his identity was merged with that of Thoth, the god of arcane knowledge and wisdom who later, in the shape of Hermes Trismegistus (Hermes the thrice-powerful), is credited with authorship of the hermetic scriptures in which all knowledge is embodied. The notion that all knowledge can be encapsulated in written texts, from which it may be recovered only by dint of divine mediation, is thus a very ancient one. It is also very modern: we see it in the seductive myth of the all-inclusive `virtual library', and also in the ways we privilege certain modes of interaction with text (`scholarship') above others (`reading'). A distinguishing feature of scholarship (as opposed to mere reading) seems to be provision of access to (or evidence of) the hermetic knowledge preserved by the hermeneutic tradition, made tangible in print form by such characteristically academic modes of discourse as the monograph, the footnote, or the critical edition. This association with the arcane seems particularly appropriate in a term which appears most frequently in the British National Corpus as an example of a word which is hard or impossible to understand.

What is incomprehensible for a secular rationalist is often revelatory for a mystic, and it is surely no coincidence therefore that we find a quasi-mystical paradox at the heart of much thinking about hermeneutics: we read, for example, of the hermeneutic circle, an intrinsically mystical notion, derived from the observation that to understand a part, its function in the whole must be clear; yet the function of the whole can only be derived from an understanding of its parts. Every explication thus becomes at once an exploration of the other, while remaining at the same time a project of self-discovery.

Viewed historically, the focus of hermeneutics shifts from the divine to the secular, in tandem with that of society as a whole. If the business of hermeneutics for the 18th century and beyond was largely confined to biblical exegesis, under the influence of theorists such as Schleiermacher and Dilthey, it had become by the end of the 19th century the basis of a general methodology for the humanities.

In attempts to explicate how interpretations work, we see a similar shift from the presupposition of some kind of universal empathy (characteristic of 19th century romanticism) to a more linguistically-based notion of universal pragmatism, characteristic of 20th century philosophies of discourse and social interaction. In our own time, we now find the term applied in fields such as medicine (which talks of a hermeneutic approach to nursing) and social anthropology, as well as the more predictable and profound role it plays in linguistics. Like that other hard word ontology, it cannot be long before we find it entering the technical vocabulary of computer science and natural language processing.

This is surely because, in the post-modern and post-structuralist world, hermeneutics has become a key part of what might be termed cultural cognition. Cultural objects do not simply require an interpretation: it is the act of bestowing that interpretation which validates their status as cultural objects in the first place. Texts, and other artefacts alike, are invested with meaning by our use of them, and it is therefore interpretation alone which confers value on them. Small wonder that Derrida, citing Montaigne, takes it as self-evident that ``We need to interpret interpretations more than to interpret things''.

If hermeneutics is the study of interpretation itself, it seems useful to investigate the goals of that process: what is the object of the hermeneutic process? Many of those goals seem to have been discredited by current thinking: we no longer see the objective of our analysis as being to uncover an eternal verity. In the more restricted world of literary criticism, we have seen such goals as the establishment of authorial intention, of the original authentic context, or the effect on an ideal reader, all become increasingly unfashionable.

This is only partly because the observer effect -- still a novelty in the sciences but central to the humanities -- raises the question as to whose interpretation it is we are seeking to apply. There is ample evidence that not all interpretations are equally useful or have equal explanatory force, yet on what grounds do we decide which interpretations can be disregarded?

The British semiotician Daniel Chandler remarks somewhere: ``We cannot write `without bias', but we can learn to become more aware of our biases, to make them more explicit for others, and to reflect critically on their implications. This is an aim which tends to distinguish social science from such arts as literary criticism.'' Although most literary critics of my acquaintance would probably wish to question the implied slur of the latter sentence, they would generally endorse the force of the former.

It is an odd characteristic of the way we currently deal with our written heritage that, despite the debunking of author-ity, the canon remains alive and well in the marketplace. We may no longer identify literature exclusively with the production of dead white European males; instead however we create new canonical collections, of pop culture, of Afro-American literature, of women's writing. Canonicity itself, the desire to catch the whole of some class of valued cultural phenomena, often defined by exclusion, seems inescapable.

A curious enthusiasm surrounds us for reconstructions of imagined past times, whether in the fad for music performed on `authentic' instruments, or in unthinking reactions against modernism in architecture. I suggest this antiquarianism is also at the root of our contemporary desire to value the reproduction of allegedly primary textual sources, unmediated by the skill of the textual critic. My point here is not to suggest that such activities are worthless, but simply to call into question their ostensible claim to a higher purer reality. This seems to me (as it once did to Pope) to be dignifying antiquarianism with a higher purpose. Wittgenstein reminds us that if lions could talk we would not understand them: we should equally bear in mind that even if we could go back to the first night of King Lear at the Globe we would not be able to erase from our memories all the 300 years of interpretation of that text which have intervened -- or if we could, it would be at the loss of our own identity.

A vigorous skepticism thus remains necessary. We would do well to bear in mind the etymological connexion between the words ``text'' and ``textile'', between what is written and what is woven. Every text should be seen as a tissue of explication, reacted against or reaffirmed, in the light of a continued tension between continuity and tradition, the driving force of this particular loom. It is part of the nature of (for example) women's writing that it has been systematically undervalued at several periods of history.

The hermeneutic act thus seems to have a crucial role in mediating and determining our experience of cultural objects. In selecting interpretations of such objects, we seek to explain those Others who created them but also to explain ourselves and our tangled reactions to them. In this complex business, hermeneutics has an important social function, not simply in broadening and enriching individual experience of the world, but also in motivating social coherence and social change. It is at the very heart of humanism, and of human society.

2 Text, textuality, encoding

One suggestive insight gained by investigating the difference between speech and writing seems to be the extent to which both forms of text seem to depend on semiotic systems beyond their immediate constituents. In speech, contextual features such as the relationship of the speakers to each other or their surroundings have at least as important an explicative role as what they actually say. In writing, the physical appearance of a text, the medium by which it is presented, and its audience's expectations of such forms are of equally great significance. (Some have even famously asserted that the ``medium is the message''.) It is not for purely technical reasons therefore that we require of scholarship an understanding of the relationship between the technologies of text and their application, as well as the historical results of that process.

In this section I focus on the semiotic aspects of hermeneutics, in the specific field of text encoding. I begin with a brief attempt to identify key characteristics of the coding systems associated with texts, whether these may be said to exist within, behind, or amongst, texts.

It seems self-evident that a text has at least three major axes along which we may attempt to analyse it, and thus at least three interlocking semiotic systems. A text is simultaneously an image (which may be transferred from one physical instance to another, by various imaging techniques), a linguistic construct (which may equally be encoded using different modalities, as when a written text is performed), and an information structure (it has semantic content relating to a perception of the world at large). It may be noteworthy that these three dimensions seem also to be reflected in three different kinds of software: word processing software focussing on the appearance of text, text retrieval software focussing on its linguistic components, and database systems focussing on its `meaning'.

Texts and their meanings are not however to be constrained by the capabilities of software. They remain defiantly both linguistic and physical objects; their formal organization may seem to be linear but is generally not, being characterized by multiple hierarchic structures and interlinked components. Moreover, as cultural objects, they are at once products of and definers of specific contexts. (By context I mean here not simply a consideration of the agency carrying intellectual responsibility for a text, but also its intended, presumed, or actual audience, its intended or assumed function, and so forth.) And in a highly textualized society such as ours, no text is an island: an important aspect of any text is thus the properties it shares with other texts, the reference it makes to itself and to others, its inter-textuality. And the same is true of the readings of texts.

The scope and variety of the encoding systems we need to envisage in developing a unified account of the way that hermeneutics works in texts may thus seem very large indeed. The claim of this paper is however that a unified approach remains feasible. As an example, we consider a much studied piece of parchment, sometimes known as MS Cotton Vitellius A xv, a rather poor representation of the start of which appears here. [Image of the first few lines of the ms] What exactly is going on when we process this image, when we make an interpretation of it? Clearly, there is a mapping process in which the various visual signals here are classified as either irrelevant noise or as signifiers of some kind -- as letters, punctuation, decoration, and so on. A scholarly reading goes further, identifying not just discrete letter forms, but also forms which appear to be discrete but are in fact common variants of each other (such as upper and lower case, italic and bold, etc). Structural signifiers -- the use of white space between words, in this example -- must also be identified. Not to labour the obvious, it is interesting to note that in a printed or written text the mapping between signifier and signified is fixed and conventional: though it may become inaccessible or misunderstood, it is not inherently flexible.

Here, then, is one fixed reading of the above text (based on Wrenn's edition of 1953):

Hwæt! we Gar-Dena in gear-dagum
þeod-cyninga þrym gefrunon,
hu ða æþelingas ellen fremedon.
Oft Scyld Scefing sceaþena þreatum,
monegum mægþum meodo-setla ofteah;
egsode Eorl[e], syððan ærest wearð
feasceaft funden...

In this printed rendition, white space and lineation are used to flag explicitly the boundaries of metrical units (lines, stanzas, and even the hemistichs of Old English verse) not actually explicit in the manuscript. These units are the result of an act of interpretation; they both represent and determine a particular reading. The particular mapping chosen for each visual signal is informed by expectation, convention, and often somewhat arcane knowledge: we call this transcription, or editing, and it is, I contend, the first step in a hermeneutic continuum. For many scholars, better experienced than I, the distinction between transcription and editing is a clear one, largely defined by differing goals. Transcription aims to represent the actual object, while editing aims to represent an idealized version of the object, which may never have existed in physical form. It seems to me, however, that while these goals may be clearly distinguishable, the processes by which they are achieved are strikingly similar, involving the same essentially interpretative relationship between the agent/reader and the object/text.

This is particularly evident in the process of making a digital transcription. A digital transcription comes about as the result of applying a fixed selection from the many possible interpretive strategies which might be applied, effectively reifying the mapping chosen. Before a text can be encoded, it must first be decoded. This decoding process implies a selection from the many features implicit in the reading of a text, and their re-encoding in explicit and unambiguous terms.

For example, we might choose to encode the manuscript lines above in either of the two following ways (amongst many others):

<lg><l>Hwæt we Gar-Dena in gear-dagum</l>
<l>þeod-cyninga þrym gefrunon,</l>
<l>hu ða æþelingas ellen fremedon.</l></lg>
<lg><l>Oft Scyld Scefing sceaþena þreatum,</l>
<l>monegum mægþum meodo-setla ofteah; </l>
<l>egsode Eorle, syððan ærest wearð</l>
<l>feasceaft funden...</l></lg>

In this version, letter forms are normalized, by means of entity references where necessary, and spacing is silently normalized. Most strikingly however, the metrical structure is made explicit by the addition of tags which mark the boundaries of verse lines and stanzas. Much information about the original lineation and rendition is lost, but much information not explicitly present in the original is added. Contrast this with the following encoding:

<lb n=1><hi rend='caps'>&H;&wynn;æt we gar-de</hi>
<lb n=2>na in gear-dagum þeod cyninga
<lb n=3>þrym ge frunon huða æþelinga&s; ellen
<lb n=4>fremedon. oft scyld scefing sceaþe<add>na</add>
<lb n=5>þreatum, moneg<expan sic=&ubar;>um</expan> mægðum meodo-setla
<lb n=6>of<damage desc=blot>teah</damage> egsode <sic corr=Eorle>eorl</sic> syððan ærest wearð
<lb n=7>feasceaft funden...

In this second version, a rather different set of decisions has been taken. Again, the individual characters and interword spacing have been normalized, though the linguistically invisible space between ``ge'' and ``frunon'' has here been made explicit, and a slightly larger set of entity references has been used to make explicit some extra characters such as the peculiar initial ``H'', and the wynn following it. The lineation of the original has been represented explicitly using the <lb> tag, to which numbering has been added for convenience of reference, and the use of uppercase for the first few words has been made explicit. Moreover, letters not visible because of damage to the carrier (such as the ``na'' needed at the end of the fourth line) or manuscript convention (such as the ``um'' of ``monegum'' indicated in the manuscript by the breve mark) have been added, together with an explicit indication of their status by means of the <add> and <expan> tags respectively. The presence of an inkblot is signalled by the <damage> tag, and the corr attribute of the <sic> tag has been used to make explicit a fairly non-controversial but still conjectural editorial emendation (``Eorle'' for ``eorl'' in line 6).

Neither of these digital versions is in any sense `wrong' or even necessarily `superior' to the other: they merely reflect differing priorities, differing research agendas, and consequently differing markup schemes.
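
The point may be illustrated with a small sketch in Python. It is offered only as an illustration, under several assumptions of my own: the fragments are restated as well-formed XML rather than SGML, special characters are written directly rather than as entity references, the wrapping <text> and <ab> elements are hypothetical, and the manuscript fragment covers only two lines. Each encoding answers readily the question its markup was designed for, and only that question.

```python
import xml.etree.ElementTree as ET

# Assumption: simplified XML restatements of the two encodings discussed above.
VERSE = """<text>
<lg><l>Hwæt we Gar-Dena in gear-dagum</l>
<l>þeod-cyninga þrym gefrunon,</l>
<l>hu ða æþelingas ellen fremedon.</l></lg>
</text>"""

MANUSCRIPT = """<ab><lb n="3"/>þrym ge frunon huða æþelingas ellen
<lb n="4"/>fremedon. oft scyld scefing</ab>"""


def verse_lines(xml_source):
    """Return the metrical lines, grouped by stanza (<lg>)."""
    root = ET.fromstring(xml_source)
    return [
        ["".join(line.itertext()).strip() for line in group.iter("l")]
        for group in root.iter("lg")
    ]


def manuscript_lines(xml_source):
    """Return the text following each <lb/> milestone, keyed by its number.

    Simplification: this assumes no other elements occur inside a line.
    """
    root = ET.fromstring(xml_source)
    return {
        int(lb.get("n")): " ".join((lb.tail or "").split())
        for lb in root.iter("lb")
    }


print(verse_lines(VERSE))
print(manuscript_lines(MANUSCRIPT))
```

The first function cannot recover the scribe's lineation, and the second knows nothing of metre: the priorities of each encoder determine what can later be asked of the text.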

3 The scope of markup

The term ``markup'' covers a range of interpretive acts. Like other semiotic systems, markup has its own lexis and its own syntax. The former determines which features are available for marking, the latter how those features co-exist; we focus here on the former. It seems clear that no violence is done to the term markup if we give it a rather wide ranging scope. We may use it to describe the process by which individual components of a writing or other scheme are represented, and for the simple reduction to linear form which digital recording requires. We can also use it for the more obvious acts of representing structure and appearance, whether original or intended. And markup is also able to represent characterizations such as analysis, interpretation, the affect of a text, or the contexts in which it was or is to be articulated -- the metadata associated with it. Since the range of such features is now more or less co-extensive with the range of interesting things one might want to say, the term is probably in need of some subcategorization. We therefore propose here three broad classes for the myriad textual features which text markup may make explicit: compositional, contextual, and interpretive.

Some typical compositional features include the formal structure of a text -- its constituent sections, chapters, headings etc., as well as its linguistic structure -- its constituent sentences, clauses, words, morphemes etc. From a different perspective, we might identify as compositional features the components of a text's discourse structure -- its exchanges, moves, acts, etc. A third view concerns itself more with the ontological status of a text's composition: its constituent revisions, deletions, additions etc., or its history as a shifting nexus of discrete fragments.

Some typical contextual features include a consideration of the agencies by which a text came into being or is identified as such (its author, title, publisher...) and of the situation in which it is experienced (the intended or actual audience, the mode of performance itself, the predefined category of text to which it explicitly or implicitly belongs...). Some may be identifiable only externally (its subject, text-type, mode), while others are internal (size, encoding, revision status).

Some typical interpretive features include linguistic properties such as morpho-syntactic classifications, lemmatization, sense-disambiguation, identification of particular semantic or discourse features, and in general all kinds of annotation and commentary, for example associating passages in one text with passages in another, or citing instances of a more abstract knowledge structure.

Despite the convenience of this kind of triage, it has to be stressed that at bottom all markup is interpretive. In most encoded texts, features of all three kinds typically co-occur. For example, the emendation of ``eorl'' to ``Eorle'' requires an understanding of both morphological information (a plural noun is appropriate) and semantic information (the sense ``prince'' is inappropriate here); its ontological status as an emendation is also important.

It now should be apparent why the availability of a single encoding scheme, a unified semiotic system, is of such importance to the emerging discipline of digital transcription. By using a single formalism we reduce the complexity inherent in representing the interconnectedness of all aspects of our hermeneutic analysis, and thus facilitate a polyvalent analysis.

Markup has however another function, in some ways a more critical one. By making explicit a theory about some aspect of a document, markup maps a (human) interpretation of the text into a set of codes on which computer processing can be performed. It thus enables us to record human interpretations in a mechanically shareable way. The availability of large language corpora enables us to improve on impressionistic intuition about the behaviour of language users with reference to something larger than individual experience. In rather the same way, the availability of encoded textual interpretations can make explicit, and thus shareable, a critical consensus about the status of any of the textual features discussed in the previous section for a given text or set of texts. It provides an interlingua for the sharing of interpretations, an accessible hermetic code.
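
A minimal sketch in Python may make this concrete. It assumes, for illustration only, a well-formed XML restatement of the emended phrase from the second encoding above, wrapped in a hypothetical <seg> element; the function name and its parameter are likewise my own. Because the editorial judgement is recorded in the markup rather than silently applied, a program can produce either the manuscript reading or the emended one from the same source.

```python
import xml.etree.ElementTree as ET

# Assumption: an XML restatement of the emended phrase; <seg> is a
# hypothetical wrapper added so that the fragment parses on its own.
FRAGMENT = '<seg>egsode <sic corr="Eorle">eorl</sic> syððan ærest wearð</seg>'


def reading(xml_source, emended):
    """Return the text, taking either the manuscript form or the
    editor's correction wherever a <sic> element occurs."""
    root = ET.fromstring(xml_source)
    parts = [root.text or ""]
    for child in root:
        if child.tag == "sic" and emended:
            # Prefer the recorded correction; fall back to the original.
            parts.append(child.get("corr", child.text or ""))
        else:
            parts.append("".join(child.itertext()))
        parts.append(child.tail or "")
    return "".join(parts)


print(reading(FRAGMENT, emended=False))  # egsode eorl syððan ærest wearð
print(reading(FRAGMENT, emended=True))   # egsode Eorle syððan ærest wearð
```

A reader who disputes the emendation loses nothing, since both the evidence and the interpretation remain available for inspection.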

If we see digitized and encoded texts as nothing less than the vehicle by which the scholarly tradition is to be maintained, questions of digital preservation take on a more than esoteric technical interest. And even here, in the world of archival stores and long term digital archiving, a consideration of hermeneutic theory is necessary. The continuity of comprehension on which scholarship depends implies, necessitates indeed, a continuity in the availability of digitally stored information. Digital media, however, are notoriously short-lived, as anyone who has ever tried to rescue last year's floppy disk knows. To ensure that data stored on such media remains usable, it must be periodically `refreshed', that is, transferred from one medium to another. If this copying is done bit for bit, that is, with no intervening interpretation, the new copy will be indistinguishable from the original, and thus as usable as the original.

In that last phrase, however, there lurks a catch. Digital media suffer not only from physical decay, but also from technical obsolescence. The bits on a disk may have been preserved perfectly, but if a computer environment (software and hardware) no longer exists capable of processing them, they are so much noise. Computer environments have changed out of all recognition during the last few years, and show no sign of stabilizing at any point in the future. To ensure that digital data remains comprehensible therefore, simple refreshment of its media is not enough. Instead the data must periodically be `migrated' from one computer environment to another. Migration, in this context, is exactly analogous to the processes of decoding and encoding carried out by a human being when copying from one stored form of a text to another: there is a potential for information loss or transformation in both decoding and encoding stages.
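
The distinction between refreshment and migration can be sketched in a few lines of Python. This is an illustration under assumptions of my own rather than a preservation procedure: character encodings stand in for whole computer environments, the stored bytes are a hypothetical legacy file, and the function names are invented for the purpose.

```python
import hashlib


def refresh(data: bytes) -> bytes:
    """Copy bit for bit, with no intervening interpretation."""
    return bytes(data)


def migrate(data: bytes, old_encoding: str, new_encoding: str) -> bytes:
    """Decode under the old convention, then re-encode under the new one.

    Either stage can fail, or silently alter the text, if the wrong
    convention is assumed.
    """
    text = data.decode(old_encoding)   # decoding stage
    return text.encode(new_encoding)   # encoding stage


def checksum(data: bytes) -> str:
    """Fingerprint used to confirm that two byte sequences are identical."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical legacy content, stored as Latin-1 bytes.
stored = "þrym gefrunon".encode("latin-1")

# Refreshment: the bits are identical, so the checksums agree.
print(checksum(refresh(stored)) == checksum(stored))

# Migration: the bits change, but the text is the same once decoded.
migrated = migrate(stored, "latin-1", "utf-8")
print(migrated == stored)
print(migrated.decode("utf-8") == stored.decode("latin-1"))

# Decoding under the wrong convention succeeds, yet yields a different text.
print(stored.decode("cp437") == stored.decode("latin-1"))
```

The last comparison is the instructive one: nothing fails, yet the thorn has become some other character. A target convention that cannot represent the thorn at all, such as plain ASCII, would cause the encoding stage to fail outright.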

Where digital encoding techniques may perhaps have an advantage over other forms of encoding information is in their clear separation of markup and content. As we have seen, the markup of a printed or written text may be expressed using a whole range of conventions and expectations, often not even physically explicit (and therefore not preservable) in it. By contrast, the markup of an electronic text may be carried out using a single semiotic system in which any aspect of its interpretation can be made explicit, and therefore preservable. If moreover this markup uses as metalanguage some scheme which is independent of any particular machine environment (for example international standards such as SGML, XML, or ASN.1), the migration problem is reduced to preservation only of the metalanguage used to describe the markup rather than of all its possible applications.

5 Conclusions

Far from being peripheral or in opposition to the humanistic endeavour, text encoding and markup are central to it. Text encoding provides us with a single semiotic system for expressing the huge variety of scholarly knowledge now at our disposal, through which, by means of which, and in spite of which, our cultural tradition persists. Text markup is currently the best tool at our disposal for ensuring that the hermeneutic circle continues to turn, that our cultural tradition endures.
