Proposal for Anonymous Block Element

C. M. Sperberg-McQueen

3 February 1997, revised 4 March

TEI TC Core W04

1 Background

The DTD of TEI P3 defines a large number of element types, with a wide variety of meanings. In addition, it defines one element (<seg>), which has no specified meaning. The <seg> element may be used:

Because <seg> has no defined meaning of its own beyond that inherent in the concept of an SGML element type, it may be regarded as a sort of "anonymous" element type (by analogy with the anonymous functions provided by some programming languages).

The <seg> element can be used only at the phrase level, because it is a member of the class phrase. It can thus appear within paragraphs and similar elements (strictly: within any element whose content model is paraContent, specialPara, or phrase.seq), but not between paragraphs, i.e. not directly within text divisions.

It would be convenient to have an anonymous element type usable at the component level of documents; this would allow a cleaner markup of

To answer this need, this document proposes the creation of a new element at the chunk or inter level, for which the name <ab> (anonymous block) is suggested. This element makes it possible for an encoder to make explicit the structural distinction between anonymous phrase-level elements and anonymous chunk-level elements.

2 Proposal

The element <ab> will be added to the additional tag set for linking and alignment, in section 14.3, where <seg> is defined.

It will have the following description: "<ab>: contains any arbitrary component-level unit of text". As a member of the seg class, it will inherit the attributes type and ident (the latter should be given a more meaningful name; function is proposed). Like <seg>, <ab> should also take an additional attribute subtype, with the description "provides a subcategorization of the text block, if needed".
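By way of illustration only, the attributes just described might be combined on a single block as follows. The element content and the values of type, subtype, and function are hypothetical, and function is used under its proposed name (under the current naming the same information would be carried by ident):

<div1 type='letter'>
<!-- hypothetical attribute values, for illustration only -->
<ab type='postscript' subtype='marginal' function='addendum'>Give my
regards to your brother.</ab>
<!-- ... -->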

The tag list at the beginning of the section should list the elements in the order <anchor>, <seg>, and <ab>, and the discussion of the <anchor> element should be moved from the end of the discussion section, where it is currently lost, to the beginning.

The discussion of <seg> and <ab> should read as follows:

The <seg> and <ab> elements can be used at the encoder's discretion to mark almost any segment of the text which is of interest for processing. One use of these elements is to mark text features for which these Guidelines otherwise provide no appropriate markup, i.e. as a simple extension mechanism. Another use is to provide an identifier for some segment which is to be pointed at by some other element, i.e. to provide a target, or a part of a target, for a <ptr> or other similar element.
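As a sketch of the second use, an anonymous block might carry an identifier to which a pointer elsewhere in the text refers. The fragment assumes the global id attribute and the target attribute of <ptr> as defined in P3; the identifier GEN1.1 and the citing paragraph are hypothetical:

<ab id='GEN1.1' n='1'>In the beginning God created the heaven and the
earth.</ab>
<!-- ... -->
<p>The opening verse of Genesis <ptr target='GEN1.1'> is much quoted.</p>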

Several examples of uses for the <seg> element are provided elsewhere ...

(Continue with current discussion of <seg> element.)

The remainder of this chapter contains a number of examples of the use of the <seg> element simply to provide an element to which an identifier may be attached, for example so that another segment may be linked or related to it in some way.

The <ab> element performs a similar function for portions of the text which occur not within paragraphs or other component-level elements, but at the component level themselves. It may be used, for example, to tag the canonical verse divisions of Biblical texts:

<div1 type='book' n='Gen'>
<head>The First Book of Moses, Called</head>
<head type='main'>Genesis</head>
<div2 type='chapter' n='1'>
<ab n='1'>In the beginning God created the heaven and the earth.</ab>
<ab n='2'>And the earth was without form, and void; and darkness
<hi>was</hi> upon the face of the deep.  And the Spirit of God
moved upon the face of the waters.</ab>
<ab n='3'>And God said, Let there be light:  and there was light.</ab>
<!-- ... -->

In other cases, where the text clearly indicates paragraph divisions containing one or more verses, the <p> element may be used to tag the paragraphs, and the <seg> element used to subdivide them. The <ab> element is provided as an alternative to the <p> element; it may not be used within paragraphs. The <seg> element, by contrast, may appear only within paragraphs (or anonymous block elements), not between them.

<div1 type='book' n='Gen'><head>Das Erste Buch Mose.</head>
<div2 type='chapter' n='1'><p>
<seg n='1'>Am Anfang schuff Gott Himel vnd Erden.</seg>
<seg n='2'>Vnd die Erde war wüst vnd leer / vnd es war
finster auff der Tieffe / Vnd der Geist Gottes schwebet auff
dem Wasser.</seg>
<seg n='3'>Vnd Gott sprach / Es werde Liecht / Vnd es ward Liecht.</seg>
<!-- ... -->
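The placement rules stated above can be summarized in a single hypothetical fragment; the comments mark which uses the proposed declarations permit and which they exclude, and the element content is placeholder text:

<div2 type='chapter' n='1'>
<!-- permitted: <ab> directly within a text division -->
<ab n='1'>A block of text.</ab>
<!-- permitted: <seg> within an anonymous block -->
<ab n='2'><seg n='2a'>A segment.</seg> <seg n='2b'>Another.</seg></ab>
<!-- permitted: <seg> within a paragraph -->
<p><seg n='3a'>A segment.</seg> <seg n='3b'>Another.</seg></p>
<!-- excluded: <ab> within a paragraph -->
<p><ab n='4'>A block inside a paragraph.</ab></p>
<!-- excluded: <seg> directly within a text division -->
<seg n='5'>A segment between paragraphs.</seg>
</div2>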

The <ab> element is also useful for marking dramatic speeches when it is not clear whether the speech is to be regarded as prose or verse. If, for example, an encoder does not wish to express an opinion as to whether the opening lines of The Tempest are prose or verse, they might be tagged as follows:

<div1 type=act n='I'>
<div2 type=scene n='1'>
<head rend=italic>Actus primus, Scena prima.</head>
<stage type=setting rend=italic>A tempestuous noise of Thunder and
Lightning heard:  Enter a Ship-master, and a Boteswaine.</stage>
<sp><speaker>Master.</speaker><ab>Bote-swaine.</ab></sp>
<sp><speaker>Botes.</speaker><ab>Heere Master: What cheere?</ab></sp>
<sp><speaker>Mast.</speaker><ab>Good: Speake to th' Mariners: fall too't,
yarely, or we run our selues a ground, bestirre, bestirre.
<stage type=move>Exit.</stage></ab></sp>
<stage type=move>Enter Mariners.</stage>
<sp><speaker>Botes.</speaker><ab>Heigh my hearts, cheerely, cheerely my
harts: yare, yare: Take in the toppe-sale: Tend to th' Masters whistle:
Blow till thou burst thy winde, if roome e-nough.</ab></sp>
<!-- ... -->
(See further section 6.11.2, "Core Tags for Drama," on p. 212, and section 10.2.4, "Speech Contents," on p. 285.)


References to <seg> in section 10.2.4, such as the following ("or <seg> elements, in case of doubt as to whether the material should be treated as verse or prose"), should be changed to refer to <ab>.

Section 14.3 should be renamed "Segments, Blocks, and Anchors".

The declaration for <ab> should be:

<!ELEMENT ab         - O  (%paraContent;)                    >
<!ATTLIST ab       %a.global;
          %a.seg;
          subtype            CDATA               #IMPLIED
          TEIform            CDATA               'ab'        >

Automagically generated by lite2html on 5 Mar 1997