Oxford University Computing Services
Conference report on XML Europe 2004
This report is about my attendance at the XML Europe 2004 conference, held in Amsterdam, 19-21 April 2004.
This is the sixth conference in this series which I have attended on and off since the mid 90s. It was held this year in conjunction with a Seybold meeting, which was mostly about PDF workflows, so far as I could see.
The meeting was held at the Amsterdam RAI centre, a concrete wasteland on the edge of the city; I stayed in an expectedly soulless nearby business hotel which lied about its networking. Aah, Holland, you can forgive them anything for the quality of their cycle lanes.
The conference (perhaps 200 delegates?) was spread over four parallel sessions, and a vendor exhibition shared with a Seybold seminar. Lunch was held with the exhibits, none of which delayed me for more than a cursory glance. The food was pretty bad, facilities were poor (no networking to speak of), and the organisation looked non-existent at times - weak chairing (some of the sessions I attended did not even have a chairman, apparently), and speakers were missing. There were no printed abstracts or proceedings, not even on a CD, and they have still not appeared on the web—bad show, boys! However, more or less all the papers I attended were interesting, so it was in fact rather more worthwhile than I had expected.
My experience started badly on the Monday morning with a demo and talk about XForms and Web services from a company called Export.Net: an XForms processor for Windows IE, using something called XFormation. They make a form partly driven by an XSD schema: the application reads the schema, asks which elements you want, and makes a form. Um. Flaky demo indeed. He actually had to reboot! Hard to see the point, really.
Another way to re-use XML, varying the usual routes to HTML or PDF. The customer requirement was to work with Word 2000, on the basis of a workflow in XML; they needed Word at the end for an unspecified reason, and wanted a cheap and easy solution, not using new software. This ruled out:
So they decided on a custom solution, using COM to build a document. It is free, takes advantage of Word's power, and is known to work. They scripted it in Python, which has good COM and XML programming support. Basically, they walk over the XML, and push data to Word via COM calls. COM understands documents, paragraphs, ranges of characters, and tables. Most block items are paragraphs with a style, with no concept of nesting; it assumes a pre-existing Word template for new styles.
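The walk-the-XML-and-push-to-Word idea might look roughly like this in Python. This is only a sketch: the element names and style map are invented, and a stand-in document object replaces the real COM calls (real code would use something like win32com.client.Dispatch("Word.Application")).

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from XML block elements to Word paragraph styles
STYLE_MAP = {"title": "Heading 1", "para": "Normal"}

def push_to_word(xml_text, doc):
    """Walk block-level XML elements and append styled paragraphs.
    With real COM each call would be roughly:
        para = doc.Paragraphs.Add(); para.Range.Text = text; para.Style = style
    """
    root = ET.fromstring(xml_text)
    for el in root:
        style = STYLE_MAP.get(el.tag, "Normal")
        doc.add_paragraph("".join(el.itertext()), style)

class FakeDoc:
    """Stand-in for the Word COM document object, for illustration only."""
    def __init__(self):
        self.paras = []
    def add_paragraph(self, text, style):
        self.paras.append((style, text))

doc = FakeDoc()
push_to_word("<doc><title>Report</title><para>Body text.</para></doc>", doc)
```

As the talk noted, this keeps no nesting: every block becomes a flat paragraph carrying a style name defined in a pre-existing template.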
A demonstration showed that it worked; a pretty simple idea, but it does the job. The performance is not brilliant, and the table implementation is simplistic. It does depend on having Word running on the server box.
WordML for typesetters. Mary McRae, DMSi (http://www.dmsi-world.com)
This time, XML inside Word 2003. The basic edition supports WordML; the higher-spec professional edition does XML editing. You can set up with your own schema, and add a WordXML Toolbox to provide new facilities. The WordML markup has a complete description of Word documents. When it is all working properly, you can load a schema and edit like XMetal. Or you can apply styles, and save as plain XML or as Word. You can also have both your XML and Word in the same file, where the relationship is managed by transformations of WordML, stored with multiple namespaces. The paper discussed Word styles and all their complexities in XML; it is a pretty flat model, and very full of attributes. A document can be associated with an `expansion pack' which contains stylesheets, schemas, templates, macros etc. which take effect when XML is loaded. An XSLT transformation takes the original, gets the style, and adds formatting markup in the WordML namespace. This game of mixed namespaces starts to make sense to me.
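To get a feel for how flat and attribute-heavy the WordML model is, here is a small sketch that builds one styled paragraph in the Word 2003 WordprocessingML namespace using Python's ElementTree (the style name "Heading1" is an arbitrary example):

```python
import xml.etree.ElementTree as ET

W = "http://schemas.microsoft.com/office/word/2003/wordml"
ET.register_namespace("w", W)  # serialize with the conventional w: prefix

def wordml_para(text, style):
    """Build a <w:p> with a style reference and a single text run."""
    p = ET.Element(f"{{{W}}}p")
    ppr = ET.SubElement(p, f"{{{W}}}pPr")             # paragraph properties
    ET.SubElement(ppr, f"{{{W}}}pStyle", {f"{{{W}}}val": style})
    run = ET.SubElement(p, f"{{{W}}}r")               # a run of characters
    ET.SubElement(run, f"{{{W}}}t").text = text       # the actual text
    return p

xml = ET.tostring(wordml_para("Chapter One", "Heading1"), encoding="unicode")
```

Note that even this trivial paragraph needs three levels of wrapper elements and carries its style as an attribute value, which is what makes round-tripping through XSLT attractive.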
He wants sophisticated, intelligently searchable web sites with multiple output formats. The application is his site about antiquarian books and images with lots of metadata. RDF is used for explaining what is going on in the pictures. Data includes things like a geopolitical database about English counties in 1907, and a database with info about physical image values. A CGI script creates XQuery which accesses the resources, runs the query and generates XHTML; it also does graphical display of results using SVG. Information about graphics held in a relational database is served as URLs by a helper script; ideally XQuery would be able to access SQL as well. The whole idea was amusing enough. Liam liked the strong datatyping in XQuery. He used the Qizx/open engine (http://www.xfra.net/qizxopen/).
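The CGI-generates-XQuery step might be sketched like this; the query shape and element names are invented, not taken from his site, and a real script would escape the parameter before interpolating it:

```python
def build_query(county):
    """Turn a request parameter into an XQuery string for an engine
    such as Qizx/open. NB: county is interpolated unescaped here,
    which a production script must not do."""
    return (
        'for $img in collection("images")//image\n'
        f'where $img/metadata/county = "{county}"\n'
        'return <li>{$img/title/text()}</li>'
    )

query = build_query("Oxfordshire")
```

The appeal of the approach is that the CGI layer stays tiny: it only assembles the FLWOR expression, and the XQuery engine does the real work against the XML resources.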
The usual set of demos of how clever Cocoon is; maybe we should be using it. He says it is fun :-) It is built around a Java servlet which accesses repositories and serves up output as desired. The key feature is separating responsibilities into their own components. The sitemap joins things together. It even goes as far as transformation to Word (http://jakarta.apache.org/poi/plan/POI20Vision.html), though I'd take that with a pinch of salt.
All the pipelines do the same three actions (generate, transform, serialize),
with plenty of cacheing at all stages. Generators can include file, http, XSP, JSP etc etc. Transformers include Trax, LDAP, SQL, and I18N tools. Serializers include (X)HTML, SVG, PNG, PDF, Excel, ZIP. There is matching of URIs in complex ways, selectors for user agents, etc. All good stuff. They have their own form language, a whole sublanguage called Woody.
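The generate/transform/serialize shape of a Cocoon pipeline can be caricatured in a few lines of Python. These are toy components, nothing like the real servlet machinery, but they show why the separation of responsibilities composes so cleanly:

```python
def generate(source):
    """Generator: turn the raw input into a stream of events (here, lines)."""
    return source.splitlines()

def transform(events):
    """Transformer: the XSLT-like middle step (here, just upper-casing)."""
    return [line.upper() for line in events]

def serialize(events):
    """Serializer: render the transformed events as bytes for the client."""
    return "\n".join(events).encode("utf-8")

def pipeline(source):
    # The sitemap's job, in miniature: wire the three stages together.
    return serialize(transform(generate(source)))
```

In real Cocoon each stage is a pluggable component chosen by the sitemap's URI matching, and the output of any stage can be cached.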
Eric talked about the ISO structure for a family of schema languages (eg Relax NG, Schematron) and interfaces. No one language can cover all of structure, datatyping and integrity constraints (including between documents, so beyond ID/IDREF), and business rules. W3C XML Schema is too complex and tries to do too much (going beyond validation, ie providing a PSVI for other applications down the tool chain). So the ISO committee provides a choice of related standards.
Overall, it is a mish-mash of new stuff, and people trying to put the SGML kitchen sink back in. Will it be the Brabazon of our times?
Some interesting constraints in FO, ie some elements behave differently in different contexts depending on attribute values and on position — this may involve inheritance of attributes. They also described a nasty case of explicitly inherited attribute values; calculated values using expressions in attributes cause problems. Some things like this are almost impossible to do fully in Relax NG; it needs a new datatype library plugged in.
Relax NG is a bit weak on error reporting; this is done better in XSLT validation. Their answer is a double validation: the first pass looks for serious errors, the second warns about less important stuff. Another answer may be a two-stage approach using Relax NG and then XSLT.
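The double-validation idea, a strict pass for serious errors followed by an advisory pass that only warns, reduces to something like this (the checks here are invented placeholders, not real schema rules):

```python
def validate(doc, structural_checks, advisory_checks):
    """Two-pass validation: stop on serious errors, otherwise collect warnings."""
    errors = [msg for check, msg in structural_checks if not check(doc)]
    if errors:
        return errors, []          # serious problems: do not bother warning
    warnings = [msg for check, msg in advisory_checks if not check(doc)]
    return [], warnings

# Toy document and rules standing in for Relax NG / XSLT passes
doc = {"title": "", "body": "text"}
structural = [(lambda d: "body" in d, "missing body")]
advisory = [(lambda d: d.get("title"), "empty title")]
errs, warns = validate(doc, structural, advisory)
```

The point of splitting the passes is that the second one can afford to be chatty and human-oriented, since nothing downstream depends on it.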
Downloadable from http://xep.attic.com
Vincent Quint (http://wwww.fuchsia-design.com), SVG 1.2
SVG does rich rendering in vector and raster, with interactivity and animation. It is properly standards-based and device-independent. There is, for instance, demonstrable use on mobile phones for doing location mapping.
SVG 1.2 adds bits from other specs; it now includes XPath and XML Events. Pure W3C XML, unlike Microsoft (XAML) or Macromedia (Flash, SWF). They are also adding audio, transitions, video, and "pages" for printing; the latter is needed for cartoons, for instance. Some rather fun video demos, showing integration of video with normal SVG drawing.
For writing user interfaces, they have covered mouse capture, editable text, rendering hints, flowing text and graphics, and network API (eg cookies). XBL adds a clean way to specify extensions, which are encapsulated, recursive and re-useable.
W3C schema has the notion of degrees of validity, attached to every element and attribute in the infoset. Was validation attempted? Was the result completely OK, partially valid, or invalid? The value may also be "unknown", for an element which occurs unexpectedly. Eg if <A> allows <x>, <y> and <z> as its content model, but the instance has <x>, <y>, and <foo>, then <x> and <y> are still valid. We should be able to isolate this situation in the PSVI and check to see if it can be solved by another version of the schema. Alternatively, strip out the unknown nodes and see if the result validates against the original schema. Then you can still use your application, which only processes against the original schema.
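The strip-and-revalidate trick can be illustrated with a toy content model; this is nothing like the real PSVI machinery, just the idea, using the <A>/<x>/<y>/<foo> example above:

```python
import xml.etree.ElementTree as ET

# Toy content model: <A> may contain <x>, <y> and <z>
ALLOWED = {"A": {"x", "y", "z"}}

def strip_unknown(elem):
    """Remove children not in the content model; return the names removed."""
    allowed = ALLOWED.get(elem.tag, set())
    unknown = [c for c in elem if c.tag not in allowed]
    for c in unknown:
        elem.remove(c)
    return [c.tag for c in unknown]

doc = ET.fromstring("<A><x/><y/><foo/></A>")
removed = strip_unknown(doc)
# doc is now <A><x/><y/></A>, which the original schema would accept
```

An application that only knows the original schema can then process the stripped document, while the removed names tell you what a newer schema version would have to allow.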
This all needs a pipeline to be able to describe the process; amazingly, there is a W3C note about pipelining (http://www.w3.org/Submission/2002/01/). Thompson's company, Markup Technologies (http://www.markup.co.uk), removes the dependency bits of that scheme, and implements a basic pipeline. No, sorry, I did not really get this. I don't quite see the need, or how it deals with anything other than versions which simply increment previous editions. Thompson himself admitted that there were all sorts of problems.
Eliot dealt with general issues of information being re-used; basically, a problem of technical documentation. These documents are quite simple but often have shared components. There is a short life cycle in some cases, very long cycles in others. Very personalized in yet other cases, even when a lot is common; multiple languages, too. It must often be highly accurate, and that implies document history.
XInclude (now a W3C CR, http://www.w3.org/TR/xinclude/) provides some help, as it allows content re-use by reference. It is easy to implement but can be more powerful than expected. It is important to differentiate between this and entity references, which are purely syntactic conveniences, resolved by the XML parser; these cannot be addressed, are tied to context, and cannot be extended. XInclude is about semantic referencing: includes are distinct, parseable objects which can be annotated, and link resolution happens after parsing. The <xi:include> element has two attributes, "xpointer" and "href"; <xi:fallback> is a child element which specifies what to do if the main include fails.
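Python's standard library has just enough XInclude support to show the mechanism. Here a custom loader stands in for fetching the shared component (the href and its content are invented; note also that xml.etree.ElementInclude's support for <xi:fallback> is limited):

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI = "http://www.w3.org/2001/XInclude"
doc = ET.fromstring(
    f'<chapter xmlns:xi="{XI}">'
    '<xi:include href="shared/legal.xml"/>'
    '</chapter>'
)

def loader(href, parse, encoding=None):
    """Resolve an href to an element (a canned answer stands in for I/O)."""
    if href == "shared/legal.xml":
        return ET.fromstring("<para>All rights reserved.</para>")
    raise OSError(href)

ElementInclude.include(doc, loader=loader)   # replaces xi:include in place
```

Because resolution happens after parsing, the loader could equally consult a catalogue or a version-controlled repository, which is exactly the re-use Eliot was after.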
It is not so easy to get authors using this, or to provide for it in a DTD; it makes nonsense of the constraining rules of a DTD, as there is no way to say what can be found at the other end of the link. Kimber makes new DTD elements which mimic and extend XInclude, to make it clear to authors what is happening; eg he includes a "reftype" attribute to specify what should arrive. From HyTime, he says. This raises the problem of links across compound documents, and of separating the ones inside the compound document from the genuine external ones. What about the ambiguity of links to places which might get included twice? Allowing an author to preview what is going on, or providing a list of targets to authors, is hard.
Showed the language behind ReportLab (http://www.reportlab.com), which is something like XSL FO but simpler, used for dynamic, fast PDF delivery. They find FO too slow and complex, so they use a templating language instead of XSLT, targeting just high-level PDF. It does not go for complex typographical standards or things like writing directions. The system models pages, paragraphs, and stories, with high-level tags like "barcode". Sophisticated reuse of external PDF pictures. Clever stuff.
A practical study of designing DTDs for all the authoring of material for state government. Dale went all out for a DTD simple enough to use in real time and modular enough to make it useable in all areas of the workplace. Perhaps an old-fashioned talk about the "real world", but entertainingly delivered.
I cannot summarize this talk, as I was delivering it. However, the paper on which it was based is at http://www.tei-c.org/Activities/META/xmleurope2004.pdf. There was a good discussion afterwards led by Henry Thompson; people did seem to be interested in the ideas.
WSDL is the language layer where Web services are described; version 2.0 is targeted for mid-2005. Note the different layers:
WSDL 1.1 is proprietary, from IBM and Microsoft. 2.0 will be a pure W3C standard.
WSDL consists of type definitions, abstract messages and operations (port types), and the concrete bindings and services that realise them,
used in two modes, RPC mode and "document" mode.
Alex went through ideas about XML, which was mildly entertaining. For instance, does not the FAQ about when to use an attribute vs an element indicate that XML still is not right? http://www.pault.com/pault/pxml/xmlalternatives.html has a list of XML alternatives; this paper is yet another. Alex wants a subset of XML which leaves out DTDs, comments, PIs, and CDATA marked sections. Just elements, or did he allow attributes? Anyway. Do we really want a schism in XML? Alex proposes a new simplification. Hmm. I don't think this is the start of a revolution, however much one may agree that XML is a bit weird. The fact that some groups are upset by XML probably shows it is OK.
They (Mark and Dan Connolly) are looking at adding RDF-like semantic information to XHTML directly (http://www.w3.org/MarkUp/2004/02/xhtml-rdf.html). Mark started by trying to explain RDF triples, but as usual I didn't really get it. Anyway, their proposal uses RDF to replace <meta> and <link> and make them much more global and useful. For instance, add a "content" attribute to <span> like that from the old <meta>, or a "property" attribute, eg <span property="dc.creator">Fred Flintstone</span>. Such things relate to the nearest ID'ed parent (though this may cause problems) or otherwise to the document as a whole; an "about" attribute can point to something specific, and a datatype can be given, eg <span content="2004-04-21" datatype="xsd:date">yesterday</span>. People argued about how to identify objects in the world, but that is rather a different subject. If adopted, this would make for richer websites (cool mouseovers), better searching, and something for the semantic web.
The final plenary session was Edd Dumbill talking about where XML is going. The poor man is chair of this conference, so I guess he drew the short straw. Sadly, they changed the time of the talk by quite a long way, and I had to leave for the airport, so I didn't hear it. Luckily, the guts of it are at http://www.xml.com/pub/a/2004/04/21/state.html.
None, really. I had a stimulating time, and have got a small list of things to look at in the short and medium term. Where is XML going? Probably nowhere, as an abstract idea. It is a convenience, not a subject in its own right.