<?xml version="1.0"?>
<!DOCTYPE TEI.2 PUBLIC "-//TEI//DTD TEI Lite XML ver. 1//EN" "../../../../tei-emacs/xml/dtds/tei/teixlite.dtd">
<TEI.2><teiHeader><fileDesc><titleStmt><title>Metadata for corpus work</title></titleStmt><publicationStmt><p>first draft</p></publicationStmt><sourceDesc><p>none</p></sourceDesc></fileDesc><revisionDesc><change><date>11 feb 03</date><respStmt><name>LB</name></respStmt><item>first draft</item></change></revisionDesc></teiHeader><text><body><div><head>What is metadata and why do you need it?</head><p>Metadata is usually defined as <q>data about data</q>. It is no exaggeration to say that without it, analysis of language corpora is virtually impossible.</p><p>A typical corpus analysis will gather together many examples of linguistic usage, each taken out of the context in which it occurred and presented together with a number of others. Metadata is the information which specifies that context: at its simplest </p><p>In many kinds of corpus analysis, the objective is to detect patterns of linguistic behaviour which are common to particular groups of texts. Sometimes, the analyst examines occurrences of particular linguistic phenomena across a broad range of language samples, to see whether certain phenomena are more characteristic of some categories of text than others. Alternatively, the analyst may attempt to characterize the linguistic properties or regularities of a particular pre-defined category of texts. In either case, it is the metadata which defines the category of text; without it, we have no way of distinguishing or grouping the component texts which make up a large heterogeneous corpus, nor even of talking about the properties of a homogeneous one.</p><p>Even where the purpose of the corpus analysis is simply to nothing to do with </p></div><div><head>The scope of metadata: what might it include?</head><p>There are at least four primary kinds of metadata of use in language work. 
<list><item>identification of the corpus text, such as its title and source</item><item>editorial information about the relationship between the corpus text in its original and encoded forms</item><item>classificatory information, concerned with internal or external properties of the corpus text</item><item>documentary information about the corpus text, such as the date it was last revised, or the filename used to store it</item></list></p></div><div><head>How to represent metadata</head><p><?xm-replace_text {p}?></p></div></body></text></TEI.2>

