2 Locating parts of element content

The TEI extended pointer syntax is most reliably used to locate particular SGML element (or pseudo-element) occurrences. In the TEI scheme any SGML element can bear an ID attribute, which (together with the tree location methods described above) means that this is less of a restriction than it might appear.

Sometimes, however, the target of a cross reference does not correspond to any particular feature of a text, and so may not be tagged as an element, or its position within the SGML document tree may not be reliably known. If the desired target is simply a point in the current document, the easiest way to mark it is by introducing an <anchor> element at the appropriate spot. If the target is some sequence of words not otherwise tagged, the <seg> element may be introduced to mark them.

In the following (imaginary) example, <xref> elements have been used to represent points in this text which are to be linked in some way to other parts of it; in the first case to a point, and in the second, to a sequence of words:

Returning to <xref from="id(ABCD)">the point where I dozed off</xref>, I noticed that <xref from="id(EFGH)">three words</xref> had been circled in red by a previous reader.

This encoding requires that elements with the specified identifiers (ABCD and EFGH in this example) are to be found somewhere else in the current document. Assuming that no element already exists to carry these identifiers, the <anchor> and <seg> elements might be used:

  .... <anchor type=bookmark id='ABCD'> ....
   ....<seg type=target id='EFGH'> ... </seg> ...

An alternative approach, useful when identifiers or other markup cannot be introduced into the target document, is to use the token, string, or pattern location methods, which the TEI extended pointer syntax provides through the keywords token, str, and pattern.

These three methods should not be used to count across element boundaries: they are provided chiefly to locate fine detail within a given document element, where such points are not already explicitly marked up. The token and str methods are defined as behaving in exactly the same way as the HyTime dataloc method, with quanta token and str respectively. The syntax used to define pattern locations is (yet another) subset of the regular expression syntax used by most Unix systems.

Some examples follow:

<p>This <xptr from="HERE token(3 5)">is not a very good idea.
selects the three tokens `a very good'.
<p>This <xptr from="HERE str(3 5)">is not a very good idea.
selects the string ` no' (i.e. space, n, o).
<p>This <xptr from="HERE pattern([aeiou][aeiou])">is not a very good idea.
selects the first pair of adjacent vowels following the pointer, i.e. the string `oo' in `good'.
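
The three selections can be reproduced with a minimal Python sketch. It is not part of the TEI scheme: the function names are hypothetical, the two numbers are assumed (from the examples here) to be a 1-based inclusive range counted from the pointer, tokens are assumed to be whitespace-delimited, and Python's `re` module stands in for the TEI pattern subset, which it does not match exactly. The sketch works on the character content of a single element only, in line with the rule that these methods do not count across element boundaries.

```python
import re

def locate_token(text, first, last):
    """Return tokens first..last (1-based, inclusive).

    Assumption: tokens are runs of non-whitespace characters.
    """
    tokens = text.split()
    return " ".join(tokens[first - 1:last])

def locate_str(text, first, last):
    """Return characters first..last (1-based, inclusive)."""
    return text[first - 1:last]

def locate_pattern(text, pattern):
    """Return the first match of pattern in text, or None if there is none.

    Assumption: Python's re syntax is used in place of the TEI subset.
    """
    match = re.search(pattern, text)
    return match.group(0) if match else None

# Character content that follows the pointer in the examples.
content = "is not a very good idea."

print(repr(locate_token(content, 3, 5)))                 # 'a very good'
print(repr(locate_str(content, 3, 5)))                   # ' no'
print(repr(locate_pattern(content, "[aeiou][aeiou]")))   # 'oo'
```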

Thus, assuming that the three words circled in red in the example above occurred at the start of the third paragraph in the chapter with identifier `C5', a pointer like the following would point to them:

I noticed that <xref from="id(C5) child(3 p) token(1 3)">three words</xref> had been circled in red by a previous reader
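
A pointer of this kind is read as a sequence of location steps, each narrowing the result of the one before it. The following Python sketch illustrates only how such an expression might be split into steps. The names `STEP` and `parse_pointer` are hypothetical, the keyword `token` is written as in the earlier examples, and the sketch assumes every step has the form keyword(arguments): it does not handle bare keywords such as HERE, nor patterns that themselves contain parentheses.

```python
import re

# One location step: a keyword followed by parenthesised arguments.
# Assumption: the arguments contain no nested parentheses.
STEP = re.compile(r"([A-Za-z]+)\(([^()]*)\)")

def parse_pointer(expr):
    """Split a pointer expression into (keyword, [arguments]) steps, in order."""
    return [(keyword, args.split()) for keyword, args in STEP.findall(expr)]

steps = parse_pointer("id(C5) child(3 p) token(1 3)")
for keyword, args in steps:
    print(keyword, args)
# id ['C5']
# child ['3', 'p']
# token ['1', '3']
```

Read in order, the steps select the element with identifier C5, then its third p child, then the first three tokens of that paragraph.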

