<?xml version="1.0"?>
<?xml-stylesheet href="teixlite.css" type="text/css"?>
<!DOCTYPE TEI.2 SYSTEM "teixlite.dtd">
<TEI.2>
  <teiHeader>
<fileDesc>
<titleStmt>
  <title>XML: Fulfilling the SGML dream</title>
  <author>Lou Burnard</author>
</titleStmt>
<publicationStmt>
  <p>Distributed by the author</p>
</publicationStmt>
<sourceDesc>
  <p>Paper to be presented at the Online Information Conference,
London, December 1999.</p>
</sourceDesc>
</fileDesc>
<revisionDesc>
<list>
  <item>
<date>10 Oct 1999</date>First draft</item>
</list>
</revisionDesc>
  </teiHeader>
  <text>
<body>
<head>XML: fulfilling the SGML dream</head>

<div type="foo">
<head>Introduction</head>
<p>Let me start by reminding you about the SGML dream. It had three chief
ingredients: 
<list>
  <item>the ability to share data between applications </item>
  <item>the ability for users to regain and retain control of data</item>
  <item>and, putting those together, the ability to proceed towards a
seamless integration of data resources, in a new kind of processing
interlingua, a digital demotic.</item>
</list></p>

<p>But the way was hard, and the path was only for the upright in
spirit, and financially unchallenged. Then, round about 1994 or 1995,
something rather unusual happened to an obscure experiment in
scientific documentation methods at a well known European Research
Centre in Switzerland. Suddenly, before anyone quite understood what
was happening, we were all ensnared in a world wide web.</p>

<p>I think it took a while for people to realise what this meant: for
example, at the SGML Europe conference in 1996, the major talking
points were not the decision (taken a year earlier) to open up the
internet to commercial competition, not the publication of HTML 3.2,
but rather the publication of DSSSL; and some reorganization and
realignment of various competing areas of HyTime and DSSSL activities,
notably the definition of the Standard Document Query Language and of
the HyTime "general facilities" (aka the useful bits: architectural
forms, property sets, groves, formal system identifiers etc).</p>

<p>The web was mentioned of course, but am I alone in remembering that
there seemed to be a little reluctance amongst the SGML
illuminati to invite this ill-bred grubby infant into the parlour?
HTML, it was easy to point out, was only retrospectively an SGML
application. If only it had been done <hi>right</hi>! But of
course those dreadful web heads would never listen to reason...</p>

<p>It's worth remembering also that the original objective of the XML
effort, which came along shortly afterwards, was purely and simply to
enable the delivery of SGML over the web. But in achieving this goal,
almost accidentally, Bosak's Boys and Girls found themselves
refocussing the whole of web development: effectively rewriting the
rules of the game by adding intelligence to data. It's worth wondering
why that came about: what underlies the unbelievable success of this
wonderful afterthought.</p>

<p>XML has exactly the same stated goals as SGML (that's why it's such
a great way of re-purposing old SGML talks and training materials): it
gives users control of their data, by allowing them to define their
own markup language in a way that can be formally verified; it makes
sharing of data between applications simple by adding an abstraction
layer (a data model, if you like); it strongly encourages the
development of semantically meaningful data models by separating data
processing from data representation; it unlocks the information buried
in plain text files, and so on. You heard all this in the "SGML for
beginners" course you went to, and were excited by -- until you
started trying to use the SGML tools.</p>

<p>But XML grafted on to those principles a few additional
priorities which the developers of SGML had perhaps not
considered so important. XML was designed from the start to do without
the perceived obscurities and rococo complexities of SGML. Producing a
fully conformant validating SGML parser (never mind an application)
requires thousands of lines of code and a team of skilled programmers.
Producing a fully conformant validating XML parser is a viable
assignment for a computer science graduate, while producing useful XML
applications is something even I can manage.</p>

<p>If XML was the right idea, it also had the good fortune to happen at
just the right time during the development of the web -- it may be
hard for some to remember that without XML we might well have seen the
greatest strength of the web disappear as a consequence of an
uncontrolled and uncontrollable proliferation of vendor-specific
dialects of HTML. XML was also blessed in gaining the co-operation of
all the major industry players without the dominance of any one of
them, in itself a fairly remarkable achievement. Which is not to deny
the value of the more or less unprecedented amount of co-operative
human energy which was lovingly poured into its creation during the 18
hectic months of its evolution by some of the best brains on the
planet.</p>




<div><head>The XML Social Agenda</head>
<p>For some of you I am sure that an important goal is business as
usual: at a recent conference, I heard that using XML had been
found to confer the following three benefits: 
<list>
  <item>reduction in production time</item>
  <item>reduction in time to market</item>
  <item>and some improvement in quality of service</item>
</list>all of which sounds fine. But as an incorrigible dreamer, I found
myself wondering whether the points really should have been placed in
that order of priority.</p>

<p>After all, XML is not just about exchanging data between
machines. It's also about communication between humans. XML is not
just about the web. It's about information in general. XML is not just
about technology. It's also about the social and political
relationship between content creators and software vendors. </p> 

<p>As I said earlier, there is a key part of the SGML dream which has
to do with user ownership of content, and that is something which the
social agenda of XML has inherited. By enabling, in a practical and
visible way, the elusive freedom from proprietary data formats, vendor
neutrality, platform neutrality, and language neutrality, to which all
open systems vendors pay lip service, XML has brought back on to the
agenda some rather interesting and important economic and political
considerations. </p>

<p>Jon Bosak has pointed out that, if realised, the dream-team
combination of XML and XSL could easily replace all
existing word-processing and publishing formats. This kind of change,
the practical liberation of users from a combination of proprietary
formats, should mean an end to domination of the market by a few big
companies, and an end to domination of the market by a few big
countries. What can prevent that? Clearly companies whose business
models have been built on control through proprietary formats can be
expected to resist it. But other companies, with more flexible
business models, more in touch with the technological realities of
today, will replace them as surely as small furry creatures replaced
large cold-blooded ones during the ice ages. The information business
thrives on change; systems that facilitate change and flexibility
therefore stand a better chance of survival than those which do
not.</p>

<p>There is a problem here: to say that the XML agenda is one of
user-empowerment is easy enough. But empowerment is not so easily
achieved. It's not unusual to encounter significant resistance, from
users themselves, to the idea that the user should be able to take
control: empowerment is not quite such an easy sell when your
clientele really wants a completely packaged solution. That's why the
production of cheap and easily customised tools should be given a much
higher priority within our community: only with tools that offer the
full power of XML to the mass market will that market be able to
develop. We should question marketing factoids about the need to dumb
down our software. Software for the masses doesn't have to be
feature- and complexity-free: you heard it here first.</p>

<p>Turning to a question which maybe has exercised some of you for a
long time: why exactly is it that SGML and XML have not been
universally taken up as the technologies of choice for the digital
library? After all, librarians have been trying to set information
free from proprietary forms (that's books, to you and me) for
centuries; they were the first (some might say the best)
infomediaries, and they were the first to develop really powerful
platform-independent metadata repositories which could communicate in
a reliable way (that's catalogues and interlibrary loan, to you and
me). So why do all librarians swear by (and occasionally at) the
decidedly non-standard MARC standard, developed in the 1970s, and
vigorously maintained ever since, rather than migrate to more modern
interchange formats? The answer, of course, is that the
conversion-cost is simply not warranted once you have a tool that does
the job -- however unfashionably. What benefit is there in "going-XML"
for the librarian who has already made massive investment in a well
tested and debugged solution to the same problems? Only by focussing
on the value-added, on areas where solutions don't already exist, will
XML make its case: as witness the fact that the most enthusiastic
proponents of XML in the library community are those concerned with
areas that traditional library systems handle only grudgingly or
incompletely -- such as full text electronic libraries, repositories
of digitized images and archival document descriptions. </p>

<p>A new term has emerged for something that XML makes easier:
<term>data warehousing</term>. Now, like many other metaphors,
this is an insidious one: it sounds alluring, but it leads you
astray. Data is not really something you keep in a warehouse: a
warehouse is designed to keep commodities like bicycles or fruit safe
until you decide to take them out and sell them.  Once out of the
warehouse they are gone. But is that true of data too? If I sell you
an apple, I don't have it to sell to someone else anymore. But if I
sell you access to my data -- you have it and I <hi>still</hi> have
it. Is this metaphor helping us understand the way information should
be managed in the next millennium, or is it getting in the way?</p>
</div>

<div><head>Does SGML have a future?</head>

<p>Everything (or almost) said so far applies equally to SGML and to
XML. I'd like to close by trying to pick at the differences between
the two: in particular, the technical considerations which might make
one more appropriate than the other for a given project.</p>

<p>I stressed earlier the simplicity and ease of use of XML. This does
not come without a price. To achieve it, the designers of XML had to
discard a number of SGML features, nearly all of them relating to the
metalanguage itself. Consequently, although the differences between
SGML and XML documents are mostly trivial surface features such as the
requirement for end-tags, and the insistence on a single concrete
syntax, the differences between SGML and XML document type definitions
are far from trivial.</p>

<p>A well-formed XML document can be processed perfectly satisfactorily
without any separate document type definition -- this is one of the
benefits of insisting on a single concrete syntax. An SGML document,
however, cannot be understood without its accompanying DTD -- this is
the downside of permitting not only variant concrete syntaxes, but
also such seemingly abstruse features as SGML exceptions,
minimization, marked sections, and the SGML optional features.</p>
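
<p>A small illustration may help here. What follows is only a sketch:
the element names are invented for this example and belong to no real
DTD. The first fragment is well-formed XML, and a parser can recover
its tree with no declarations at all, because every element is
explicitly closed and the empty element marks itself as such. The
second is the kind of minimized markup that an SGML DTD might permit:
without the DTD, a parser cannot tell whether <gi>body</gi> is a child
or a sibling of <gi>to</gi>, nor that <gi>pagebreak</gi> has no
content.</p>
<eg><![CDATA[<memo>
  <to>All staff</to>
  <body>Meeting at noon.<pagebreak/>Agenda to follow.</body>
</memo>

<memo>
<to>All staff
<body>Meeting at noon.<pagebreak>Agenda to follow.
</memo>]]></eg>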

<p>These features are not however mere esoterica. They have an
important role to play in several application areas. In an
intensive document production environment, where highly complex
document structures must be enforced rigidly, and where economies of
scale are of particular importance, several of these SGML features
have proved their worth, and cannot be lightly abandoned. For example,
the use of inclusion and exclusion exceptions in a document model
greatly reduces the complexity of a DTD and improves its
manageability, while
optional marked sections are an excellent way of coping with the
management of shared document production. Similarly, as long as we
have to manage legacy data, we will need to be able to cope with
variant character encoding schemes and variant markup syntaxes.</p>
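
<p>By way of illustration, consider a sketch in which notes may occur
anywhere within a body but may not nest. The element names are
invented for the example. With SGML exceptions the rule is stated
once, on the elements concerned. An XML DTD has no exceptions, so the
note has to be written into every content model in which it may
appear. The XML version given here is only an approximation: it does
not, for instance, allow a note directly between the children of
<gi>body</gi>, which the SGML inclusion does.</p>
<eg><![CDATA[<!-- SGML: one inclusion, one exclusion -->
<!ELEMENT body - - (head?, p+)        +(note)>
<!ELEMENT head - - (#PCDATA | emph)*>
<!ELEMENT p    - - (#PCDATA | emph)*>
<!ELEMENT emph - - (#PCDATA)>
<!ELEMENT note - - (#PCDATA)          -(note)>

<!-- XML: roughly the same constraints, spelled out model by model -->
<!ELEMENT body (head?, p+)>
<!ELEMENT head (#PCDATA | emph | note)*>
<!ELEMENT p    (#PCDATA | emph | note)*>
<!ELEMENT emph (#PCDATA | note)*>
<!ELEMENT note (#PCDATA)>]]></eg>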

<p>In terms of software, while it is true that new XML tools and
applications are announced daily, it is far from true that SGML tools
and applications have disappeared from the market.  On the contrary,
most SGML products have now reached a degree of maturity and stability
that XML vendors can only envy.</p>

<p>It is also important to remember that neither SGML nor XML is
standing still: the critical question for the survival of both is
perhaps the extent to which their present high degree of compatibility
(a declared goal of both) can be retained. With the development of the
XML Schema language, for example, XML applications will gain
datatyping features long perceived as lacking in SGML. At the same
time, such SGML concepts as architectural forms and concurrent
hierarchies look set to transform the way in which we realize the SGML
dream.</p>


</div>

  

</div></body></text></TEI.2>
