On the hermeneutic implications of text encoding

Lou Burnard

Humanities Computing Unit, Oxford University

Abstract for Literature, Philology and Computers, University of Edinburgh, September 1998

1. Hermeneutics

hermeneutics: the art or science of interpretation, esp. of Scripture. Commonly dist. from exegesis or practical exposition. (Shorter Oxford Dictionary)

In Greek myth, Hermes was responsible for explaining the messages of the gods to mere mortals; his name was applied also to Thoth the Egyptian god of arcane wisdom, who as Hermes Trismegistus, the scribe of the gods, is credited with the authorship of the 42 hermetic scriptures embodying all knowledge.

In our time, ownership of hermeneutic expertise forms the basis of the recurrent struggle between Catholic and Protestant, and then between church and state, as the focus of hermeneutics has moved beyond the arcane to the generality of human experience, under the direction of Schleiermacher and Heidegger, before refocussing on the textual and the linguistic in the works of Gadamer and Ricoeur. The hermeneutic circle remains intrinsically mystical: to understand a part, its function in the whole must be clear; yet the function of the whole can only be derived from an understanding of its parts. Every explication thus becomes at once an exploration of the other and a project of self-discovery.

Hermeneutic activities are at the core of the humanities: without explication and interpretation the objects and processes of human culture have no more intrinsic value than the rags, clay, or stone of which they are composed, or with which they are performed. Moreover, whereas physical scientists have only recently begun to wonder whether their perceptions of the natural world might be affected by some kind of observer effect, we humanists have known all along that we are ourselves the objects of our study. To understand the human, one must be human (whatever it is that language-understanding systems understand, it is not language). This is not, of course, to ground our hermeneutics in any particular set of subjectivities -- authorial intention, historicism, Marxism, or any other unfashionable -ism that may come along -- but equally certainly not to retreat into mere relativism and anti-realism. Texts and other cultural artefacts are invested with meaning by our use of them, but not all meanings are equally useful or valid. The business of hermeneutics cannot exclude a value system.

Hermeneutics offers a way both of conferring value on artefacts from other times and cultures and of interpreting the value so conferred. As Gadamer has shown, this is best achieved not by submerging the reader's own context in some hypothetical reconstruction of the work's "original" or "authentic" context (a process which would imply the possibility of closure), but rather by engaging with the text with our current context and prejudices intact. The works of art we explicate are in large part the history of their own explication, whether by us or by those interpreters who preceded us, whose explications we inherit, react against, or re-affirm: this continuity is what we call tradition, and the spadework needed for its establishment is an important part of the business of scholarship.

Modern hermeneutics however goes beyond simple antiquarianism, or the explication of alien cultures; it has for example a psychological role in facilitating the exploration of alternative modes of perception, a social role in broadening and enriching experience, a political role in motivating social change and much more besides. In this paper I will argue for the centrality of text encoding to at least some part of the hermeneutic enterprise.

2. Text Encoding

Structuralist critics such as Greimas and Todorov take from transformational grammar the notion of an underlying formal system from which texts are generated or encoded; similarly, many workers in NLP have used "text grammars", "frames", "knowledge representations" and similar devices as means of expressing an abstract text from which many concrete texts may be generated. To date such efforts have been successful only in limited and highly formalized domains, (though the number of such domains in which linguistic competence is required continues to grow). More convincingly, Barthes has demonstrated the existence of multiple coding systems, evidenced by the reading of a text.

It is in this latter sense that we appropriate the term encoding for the seemingly mechanical activity of inserting markup into a digitized text. The markup inserted evidences a particular reading of the text, which may relate to any number of distinct aspects -- for example, its structural organization, its affect, its original format, or its context. We might describe it as the channel by which a (human) reading of the text is converted into a set of codes on which a (computer) reading of the text can be performed, if we were willing to entertain the notion that what a computer does when processing a text might be called "reading".
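By way of a concrete sketch of this point (the element names below follow TEI practice, but the fragment itself is a hypothetical illustration, not an encoding prescribed by any particular project): the same words support quite different markup depending on which reading the encoder wishes to evidence.

```xml
<!-- A structural reading: the words as a numbered verse line -->
<l n="1">Shall I compare thee to a summer's day?</l>

<!-- An interpretive reading of the same words: a rhetorical question,
     with the comparison singled out as an object of interest -->
<seg type="rhetoricalQuestion">Shall I compare thee to
  <seg type="comparison">a summer's day</seg>?</seg>
```

Neither encoding is the "correct" one; each makes a different human reading available to machine processing.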

A text, digitized or not, requires interpretation, requires a hermeneutic. Applying Zipf's principle of least effort, we should not be surprised to find that our interpretive procedures tempt us continually towards unfashionable form/content and type/token distinctions: hermeneutics is hard, and cognition is pattern-based. A formal markup system (whether expressed as printing conventions, editorial guidelines, or an SGML dtd) helps us make such distinctions, by defining the background against which deviations can be identified, by providing the form into which and against which content is realized.
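To illustrate how such a formal system defines the background against which deviation becomes visible, here is a fragment of a hypothetical document type declaration of the kind alluded to above (the element names are invented for this sketch): it fixes the form -- a poem is a sequence of stanzas, each a sequence of lines -- and any text that fails to fit that form is thereby identified as deviant, or as demanding a revised model.

```xml
<!-- Hypothetical DTD fragment: the formal "background" of a poem -->
<!ELEMENT poem   (stanza+)>
<!ELEMENT stanza (line+)>
<!ELEMENT line   (#PCDATA)>
<!ATTLIST line n CDATA #IMPLIED>
```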

It is probably useful to ask how a digital encoding differs from any other, though the only answers I have are rather unexciting. First, and most obvious, instances of digitally encoded texts are paradoxically both easier and more difficult to maintain than non-digital ones. In either case, preservation requires a continuity of comprehension, a continued hermeneutic tradition. Digitally encoded texts, because of the ease with which they may be transferred from one medium to another, offer at least the opportunity of indefinitely long preservation, though at the price of continued maintenance. This ease of transfer is however conditional on a separation of medium and message, and thus entails some form of information loss or transformation at each step: it is only to the extent that our encodings are medium-independent therefore that we can be reasonably sure of maintaining our hermeneutic tradition.

Secondly, I believe that instances of digitally encoded texts may be read in qualitatively different ways from others. I noted above that only a play on words allows us to describe the processing of digital text by computer as "reading", but there are at least two respects in which human modes of interaction with digitally stored texts seem to redefine that notion. The first, and most frequently commented on, is the way in which the digital representation of texts facilitates decentred, non-linear, fragmented, and associative modes of cognition. I want to redress slightly the balance of attention paid to this aspect, by commenting more on a second facility which digitization offers: the ability to apprehend language use in a non-narrational, micro-contextual manner. Whether we use specifically designed language corpora, or the web itself, we now have access to evidence of language use grounded in something far larger and more varied than the experience and wisdom of even the most informed and assiduous of readers. This does not of course fundamentally alter the interpretative process, but it opens up new opportunities and new kinds of evidence, much of it crucially dependent on the existence of formally standardized markup.
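The keyword-in-context (KWIC) concordance is perhaps the simplest instance of this non-narrational, micro-contextual mode of apprehension. A minimal sketch, in which the corpus is a toy stand-in for a real one and the function name is my own:

```python
def kwic(tokens, keyword, width=3):
    """Yield each occurrence of keyword with `width` tokens of context
    on either side -- a micro-contextual view, detached from narrative order."""
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            yield f"{left} [{tok}] {right}"

# A toy corpus standing in for a real one
corpus = "the cat sat on the mat and the dog sat by the door".split()
for line in kwic(corpus, "sat"):
    print(line)
# prints:
#   the cat [sat] on the mat
#   and the dog [sat] by the door
```

Trivial as it is, such a view of usage across millions of words is unavailable to even the most assiduous linear reader, and its reliability depends directly on consistent, standardized encoding of the underlying texts.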

In summary, this paper asserts that, far from being peripheral or in opposition to the humanistic endeavour, text encoding and markup alike are central to it. Markup is the best tool at our disposal for ensuring that the hermeneutic circle continues to turn, that our cultural traditions endure.