Oxford University Computing Services

Conference report on XML Europe 2004



This report is about my attendance at the XML Europe 2004 conference, held in Amsterdam, 19-21 April 2004.

1. History and setting

This is the sixth conference in this series which I have attended on and off since the mid 90s. It was held this year in conjunction with a Seybold meeting, which was mostly about PDF workflows, so far as I could see.

The meeting was held at the Amsterdam RAI centre, a concrete wasteland on the edge of the city; I stayed in an expectedly soulless nearby business hotel which lied about its networking. Aah, Holland, you can forgive them anything for the quality of their cycle lanes.

The conference (perhaps 200 delegates?) was spread over four parallel sessions, and a vendor exhibition shared with a Seybold seminar. Lunch was held with the exhibits, none of which delayed me for more than a cursory glance. The food was pretty bad, facilities were poor (no networking to speak of), and the organisation looked non-existent at times - weak chairing (some of the sessions I attended did not even have a chairman, apparently), and speakers were missing. There were no printed abstracts or proceedings, not even on a CD, and they have still not appeared on the web—bad show, boys! However, more or less all the papers I attended were interesting, so it was in fact rather more worthwhile than I had expected.

I used my Sharp Zaurus PDA for taking all notes, which worked surprisingly well—once I got the font big enough. Now I have to get the calendaring sorted.

My experience started badly on the Monday morning with a demo and talk about XForms and Web services from a company called Export.Net. They showed an XForms processor for Windows IE, using something called XFormation. They make a form partly driven by an XSD schema: the application reads a schema, asks which elements you want, and makes a form. Um. Flaky demo indeed. He actually had to reboot! Hard to see the point, really.

2. Presentations

Josh Reynolds, ISOGEN, Word documents from XML

Another way to re-use XML, to vary the usual routes to HTML or PDF. The customer requirement was to work with Word 2000, on the basis of a work flow in XML; they needed Word at the end for an unspecified reason, and wanted a cheap and easy solution, not using new software. This ruled out:

So they decided on a custom solution, using COM to build a document. It is free, takes advantage of Word power, and is known to work. They scripted it in Python, which has good COM and XML programming support. Basically, they walk over the XML, and push data to Word via COM calls. COM understands documents, paragraphs, ranges of characters, and tables. Most block items are paragraphs with a style, with no concept of nesting. The approach assumes a pre-existing Word template for new styles.

The downside was that style was embedded in code. They changed to a preceding XSLT transformation which makes intermediate XML which matches the Word object model.
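The walk-and-push idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the element names, the STYLE_MAP table and the RecordingSink class are all hypothetical, and the sink merely records what would be sent rather than making real COM calls to Word.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from source element names to Word paragraph styles.
STYLE_MAP = {"title": "Heading 1", "para": "Normal", "note": "Note"}


class RecordingSink:
    """Stands in for the Word COM layer; it records calls instead of making them."""

    def __init__(self):
        self.calls = []

    def add_paragraph(self, text, style):
        self.calls.append((style, text))


def push_document(root, sink):
    """Walk the tree in document order, emitting one flat paragraph per mapped element."""
    for elem in root.iter():
        style = STYLE_MAP.get(elem.tag)
        if style is not None:
            # Flatten any inline markup and normalise whitespace.
            text = " ".join("".join(elem.itertext()).split())
            sink.add_paragraph(text, style)


SOURCE = (
    "<doc><title>Report</title>"
    "<para>First <b>bold</b> point.</para>"
    "<note>Check this.</note></doc>"
)

sink = RecordingSink()
push_document(ET.fromstring(SOURCE), sink)
for style, text in sink.calls:
    print(style, "|", text)
```

Keeping the style table outside the walking code is one way to avoid the "style embedded in code" problem; the speakers' own answer was an intermediate XML produced by XSLT, which this sketch does not attempt.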

A demonstration showed that it worked; a pretty simple idea, but it does the job. The performance is not brilliant, and the table implementation is simplistic. It does depend on having Word running on the server box.

WordML for typesetters. Mary McRae, DMSi (http://www.dmsi-world.com)

(Delivered by Dale Waldt)

This time, XML inside Word 2003. The basic support involves WordML; the higher-spec professional edition does XML editing. You can set up with your own schema, and add a WordXML Toolbox to provide new facilities. The WordML markup has a complete description of Word documents. When it is all working properly, you can load a schema, and edit like XMetal. Or you can apply styles, save as plain XML or as Word. You can also have both your XML and Word in the same file, where the relationship is managed by transformations of WordML, stored with multiple namespaces. The paper discussed Word styles and all their complexities in XML; it is a pretty flat model, and very full of attributes. A document can be associated with an `expansion pack' which contains stylesheets, schemas, templates, macros etc. which take effect when XML is loaded. An XSLT transformation takes the original, gets style, and adds formatting markup in the WordML namespace. This game of mixed namespaces starts to make sense to me.

An interesting paper. Good to see Word 2003 XML in real life. This must be something we have to master in Oxford.

Liam Quin, Practical XQuery, RDF, XHTML, SVG and web site experiences

He wants sophisticated, intelligently searchable web sites with multiple output formats. The application is his site about antiquarian books and images with lots of metadata. RDF is used for explaining what is going on in the pictures. Data includes things like a geopolitical database about English counties in 1907, and a database with info about physical image values. A CGI script creates XQuery which accesses the resources, runs the query and generates XHTML; it also does graphical display of results using SVG. Information about graphics held in a relational database is served via a URL by a helper script; ideally XQuery would be able to access SQL as well. The whole idea was amusing enough. Liam liked strong datatyping in XQuery. He used the Qizx/open engine (http://www.xfra.net/qizxopen/).

Steven Noels, Belgian SME, Apache Cocoon

The usual set of demos of how clever Cocoon is; maybe we should be using it. He says it is fun :-) It is built around

Basically a Java servlet which accesses repositories and serves up output as desired. The key feature is separating responsibilities into their own components. The sitemap joins things together. It even goes as far as transformation to Word (http://jakarta.apache.org/poi/plan/POI20Vision.html), though I'd take that with a pinch of salt.

Avalon provides connection pooling and the configuration. State is provided by `continuations', and there is a portlet framework which does JSR-168.

All the pipelines do the same three actions (generate, transform, serialize), with plenty of caching at all stages. Generators can include file, http, XSP, JSP etc. Transformers include Trax, LDAP, SQL, and I18N tools. Serializers include (X)HTML, SVG, PNG, PDF, Excel, ZIP. There is matching of URIs in complex ways, selectors for UA, etc. All good stuff. They have their own form language, a whole subset called Woody.
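The generator, transformer, serializer shape is easy to illustrate outside Cocoon. The following is a minimal sketch of the pattern only; it is not Cocoon's API or sitemap syntax, and every function name in it is made up for the illustration.

```python
import xml.etree.ElementTree as ET
from html import escape


def generate(source):
    """Generator stage: turn a source string into an element tree."""
    return ET.fromstring(source)


def uppercase_titles(tree):
    """Transformer stage: one example transformation on the tree."""
    for title in tree.iter("title"):
        title.text = (title.text or "").upper()
    return tree


def serialize_html(tree):
    """Serializer stage: render a very small HTML view of the top-level children."""
    parts = ["<html><body>"]
    for child in tree:
        tag = "h1" if child.tag == "title" else "p"
        parts.append("<%s>%s</%s>" % (tag, escape(child.text or ""), tag))
    parts.append("</body></html>")
    return "".join(parts)


def run_pipeline(source, transformers, serializer):
    """Run generate, then each transformer in order, then the serializer."""
    tree = generate(source)
    for transform in transformers:
        tree = transform(tree)
    return serializer(tree)


page = run_pipeline(
    "<doc><title>cocoon</title><para>Hello</para></doc>",
    [uppercase_titles],
    serialize_html,
)
print(page)
```

Because each stage only sees the output of the one before, a different serializer can be swapped in without touching the transformers, which is presumably the separation of responsibilities being advertised.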

It looks as if this needs a lot of thought to get right, but we will have to try.

Eric van der Vlist, ISO DSDL

Eric talked about the ISO structure for a family of schema languages (eg Relax NG, Schematron) and interfaces. No one language can cover all of the structure, datatyping and integrity constraints (including between documents, so beyond ID / IDREF), and business rules. W3C XML Schema is too complex and tries to do too much (beyond validation, ie providing PSVI for other applications down the tool chain). So the ISO committee provides a choice of related standards.

DSDL contains:

  1. The overview
  2. Relax NG (identical to OASIS Relax NG). Interestingly, Relax NG allows for ambiguous schemas, which W3C cannot allow because of the need to do PSVI. ie a document can be provably valid, but for several reasons; Relax NG does not commit itself to which route…
  3. Schematron (still work in progress). They will add rules for attributes, and support for languages other than XPath.
  4. Selection of validation candidates. Includes Clark's NRL. One reason is to specify different schemas for different namespaces; this would also cover rules for how they are allowed to be combined.
  5. Datatypes. To allow one to create new primitive types. Proposal from Jeni Tennison on the table, an XML notation for describing syntax of datatypes.
  6. Path-based integrity constraints (ID and IDREF, and beyond)
  7. Character Repertoire Validation. To specify rules for text.
  8. Declarative Document Architectures. Son of architectural forms.
  9. Namespace and Datatype-aware DTDs. Keep the old stuff alive.
  10. Validation Management. A structure which says which schema to apply.
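The idea in part 4, of choosing a different schema for each namespace, might be pictured like this. It is only a sketch: the schema file names are hypothetical, no validation is actually performed, and it is not an implementation of NRL or of any DSDL part.

```python
import xml.etree.ElementTree as ET

# Hypothetical table saying which schema should check which namespace.
SCHEMA_FOR_NAMESPACE = {
    "http://www.w3.org/1999/xhtml": "xhtml.rng",
    "http://www.w3.org/2000/svg": "svg.rng",
}


def namespace_of(tag):
    """ElementTree writes qualified names as '{namespace}local'."""
    return tag[1:].split("}", 1)[0] if tag.startswith("{") else ""


def validation_plan(root):
    """Map every namespace found in the document to its schema (None if unknown)."""
    plan = {}
    for elem in root.iter():
        ns = namespace_of(elem.tag)
        plan.setdefault(ns, SCHEMA_FOR_NAMESPACE.get(ns))
    return plan


DOC = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<svg xmlns="http://www.w3.org/2000/svg"><circle r="1"/></svg>'
    '<x:note xmlns:x="urn:example:notes"/>'
    "</body></html>"
)

plan = validation_plan(ET.fromstring(DOC))
for ns, schema in plan.items():
    print(ns, "->", schema)
```

A namespace with no entry comes back as None, which is where rules about what may be combined with what would have to take over.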

Overall, it is a mish-mash of new stuff, and people trying to put the SGML kitchen sink back in. Will it be the Brabazon of our times?

Alexander Peshkov and David Tolpin, A Relax NG schema for XSL FO

Strangely, XSL FO never had a normative schema or DTD. XEP used to use a DTD, then XSLT; they are now switching to Relax NG. It produces the fastest validation, and is efficient and simple.

There are some interesting constraints in FO, ie some elements behave differently in different contexts depending on attribute values and on position; this may involve inheritance of attributes. They also described a nasty case of explicitly inherited attribute values using expressions. Calculated values using expressions in attributes cause problems. Some things like this are almost impossible to do in Relax NG fully; it needs a new datatype library plugged in.

Relax NG is a bit weak on error reporting. This is done better in XSLT validation. Their answer is a double validation; the first looks for serious errors, the second is for warning about less important stuff. Another answer may be doing Relax NG and XSLT for two stages.

Downloadable from http://xep.attic.com

Really a surprisingly interesting and informative paper, which gave me lots of ideas for TEI schemas.

Vincent Quint (http://wwww.fuchsia-design.com), SVG 1.2

Vincent talked about using SVG to create an XML client. This makes for more interesting XML which does something; it defines a rendering of information, dynamically interpreted on the client.

SVG does rich rendering in vector and raster, with interactivity and animation. It is properly standards-based and device-independent. There is, for instance, demonstrable use on mobile phones for doing location mapping.

SVG 1.2 adds bits from other specs; it now includes XPath and XML Events. It is pure W3C XML, unlike the offerings from Microsoft (XAML) or Macromedia (Flash, SWF). They are also adding audio, transitions, video, and "pages" for printing. The latter is needed for cartoons, for instance. There were some rather fun video demos, showing integration of video with normal SVG drawing.

For writing user interfaces, they have covered mouse capture, editable text, rendering hints, flowing text and graphics, and a network API (eg cookies). XBL adds a clean way to specify extensions, which are encapsulated, recursive and reusable.

All a bit breathless and excitable. Is a client implementation there? Authoring support? Not quite.

Henry Thompson, W3C Schema and XML Pipelines for versioning

The issue is that of versions of schemas for a document type. Old code supporting old versions is hard to get rid of. Some schema authors prepare for versioning by wildcards, allowing for extension.

W3C schema has the notion of degrees of validity, attached to every element and attribute in the infoset. Was validation attempted? Was the result completely OK, partially valid, or invalid? The value may also be "unknown", for an element which occurs unexpectedly. Eg if <A> allows <x>, <y> and <z> as its content model, but the instance has <x>, <y>, and <foo>, then <x> and <y> are still valid. We should be able to isolate this situation in the PSVI and check to see if it can be solved by another version of the schema. Alternatively, strip out the unknown nodes and see if that validates against the original schema. Then you can use your application OK which only processes against the original schema.
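The <A> example above can be mimicked with a toy content model. This is a sketch under loud assumptions: the dictionary below merely stands in for a schema, the "valid"/"unknown" labels imitate the idea of per-element validity without being a real PSVI, and only one level of children is checked (no ordering or required-element rules).

```python
import xml.etree.ElementTree as ET

# Toy content model for version 1: element name -> permitted child names.
CONTENT_MODEL_V1 = {"A": {"x", "y", "z"}}


def assess(parent, model):
    """Label each child 'valid' or 'unknown' against the toy model."""
    allowed = model.get(parent.tag, set())
    return [
        (child.tag, "valid" if child.tag in allowed else "unknown")
        for child in parent
    ]


def strip_unknown(parent, model):
    """Return a copy of parent with the unknown children removed."""
    allowed = model.get(parent.tag, set())
    cleaned = ET.Element(parent.tag, parent.attrib)
    cleaned.text = parent.text
    for child in parent:
        if child.tag in allowed:
            cleaned.append(child)
    return cleaned


instance = ET.fromstring("<A><x/><y/><foo/></A>")
report = assess(instance, CONTENT_MODEL_V1)
cleaned = strip_unknown(instance, CONTENT_MODEL_V1)

print(report)
print(ET.tostring(cleaned, encoding="unicode"))
```

After stripping, every remaining child is known to the original model, so an application written against version 1 could carry on with the cleaned tree.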

This all needs a pipeline to be able to describe the process; amazingly, there is a W3C note about pipelining (http://www.w3.org/Submission/2002/01/). Thompson's company, Markup Technologies (http://www.markup.co.uk), removes the dependency bits of that scheme, and implements a basic pipeline. No, sorry, I did not really get this. I don't quite see the need, or how it deals with anything other than versions which simply increment previous editions. Thompson himself admitted that there were all sorts of problems.

Eliot Kimber (ISOGen), XInclude

Eliot dealt with general issues of information being re-used; basically, a problem of technical documentation. These documents are quite simple but often have shared components. There is a short life cycle in some cases, very long cycles in others. Very personalized in yet other cases, even when a lot is common; multiple languages, too. It must often be highly accurate, and that implies document history.

We want to maximize efficiency of this business because technical docs generate no revenue; but re-use is hard and expensive. The author interface must be very good.

XInclude (now a W3C CR, http://www.w3.org/TR/xinclude/) provides some help, as it allows content re-use by reference. It is easy to implement but can be more powerful than expected. It is important to differentiate between this and syntactic entity references, which are purely notational conveniences resolved by the XML parser. These cannot be addressed, are tied to context, and cannot be extended. XInclude is about semantic referencing; the targets are separate, parseable objects which can be annotated. Link resolution happens after parsing. The <xi:include> element has two main attributes, "href" and "xpointer". <xi:fallback> is a child element which specifies what to do if the main include fails.
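To show inclusion happening after parsing, here is a minimal sketch using the ElementInclude module from the Python standard library. The document, the fragment name "warning.xml" and the in-memory FRAGMENTS table are all invented for the illustration; as far as I know this module handles only "href" and "parse", so "xpointer" and fallback are not exercised here.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical shared components, held in memory rather than on disk.
FRAGMENTS = {
    "warning.xml": "<para>Disconnect the power before opening the case.</para>",
}

MANUAL = """<manual xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>Widget maintenance</title>
  <xi:include href="warning.xml"/>
  <para>Remove the four screws.</para>
</manual>"""


def loader(href, parse, encoding=None):
    """Resolve an include target from the in-memory table (XML targets only)."""
    if parse != "xml":
        raise ValueError("this sketch only handles parse='xml'")
    return ET.fromstring(FRAGMENTS[href])


def resolve(source):
    """Parse first, then expand the xi:include elements in the parsed tree."""
    root = ET.fromstring(source)
    ElementInclude.include(root, loader=loader)
    return root


resolved = resolve(MANUAL)
paras = [p.text for p in resolved.findall("para")]
print(paras)
```

The include is an ordinary element in the parsed tree until resolve() replaces it, which is the difference from an entity reference that the parser has already expanded.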

It is not so easy to get authors using this, or to provide for it in a DTD. Makes nonsense of the constraining rules of a DTD—there is no way to say what can be found at the other end of the link. Kimber makes new DTD elements which mimic and extend XInclude, to make it clear to authors what is happening; eg he includes a "reftype" attribute to specify what should arrive. From HyTime, he says. This raises the problem of links across compound documents, and separation of the ones in the compound doc from the genuine external ones. What about the ambiguity of links to places which might get included twice? Allowing for an author being able to preview what is going on, providing list of targets to authors, is hard.

In his demo Kimber resolved xincludes using an XSLT transformation.

Phew. Much scarier than one would have thought. I hope I never have to do this for real.

Andy Robinson, ReportLab

Showed the language behind ReportLab (http://www.reportlab.com), which is something like XSL FO but simpler, used in dynamic, fast, PDF delivery. They find FO too slow and complex, so they use a templating language instead of XSLT, targeting just high-level PDF. It does not go for complex typographical standards or things like writing directions. The system models pages, paragraphs, and stories, with high-level tags like "barcode". Sophisticated reuse of external PDF pictures. Clever stuff.

Dale Waldt, Simplifying DTDs for the Minnesota State Legislature.

A practical study of designing DTDs for all the authoring of material for state government. Dale went all out for a DTD simple enough to use in real time and modular enough to make it usable in all areas of the workplace. Perhaps an old-fashioned talk about the "real world", but entertainingly delivered.

Sebastian Rahtz, Norm Walsh, and Lou Burnard, Combining TEI and Docbook

I cannot summarize this talk, as I was delivering it. However, the paper on which it was based is at http://www.tei-c.org/Activities/META/xmleurope2004.pdf. There was a good discussion afterwards led by Henry Thompson; people did seem to be interested in the ideas.

Jean-Jacques Moreau (Canon France), WSDL 2.0

WSDL is the language layer where Web services are described; version 2.0 is targeted for mid-2005. Note the different layers:

  SOAP: language for message exchange
  UDDI: language for searching for services
  WSDL: language for describing a given service

WSDL 1.1 is proprietary, from IBM and Microsoft. 2.0 will be a pure W3C standard.

WSDL consists of:

used in two modes, RPC mode and "document" mode.

All rather dry, and I was not really understanding it.

Alex Brown, Refactoring XML

Alex went through ideas about XML, which was mildly entertaining. For instance, does not the FAQ about when to use an attribute versus an element indicate that XML still is not right? http://www.pault.com/pault/pxml/xmlalternatives.html has a list of XML alternatives; this paper is yet another. Alex wants a subset of XML, which leaves out DTDs, comments, PIs, CDATA marked sections. Just elements, or did he allow attributes? Anyway. Do we really want a schism in XML? Alex proposes a new simplification. Hmm. I don't think this is the start of a revolution, however much one may agree that XML is a bit weird. The fact that some groups are upset by XML probably shows it is OK.

The audience was unconvinced; Henry Thompson blew it away. The good thing about XML is that it was designed by committee.

Mark Birbeck (x-Port), RDF and XHTML

They (Mark and Dan Connolly) are looking at adding RDF-like semantic information to XHTML directly (http://www.w3.org/MarkUp/2004/02/xhtml-rdf.html). Mark started by trying to explain RDF triples, but as usual I didn't really get it. Anyway, their proposal uses RDF to replace <meta> and <link> and make them much more global and useful. For instance, add a "content" attribute to <span>, which is like that from the old <meta>, or a "property" attribute, eg <span property="dc.creator">Fred Flintstone</span>. Another example is <span content="2004-04-21" datatype="xsd:date">yesterday</span>. Such things relate to the nearest ID'ed parent (though this may cause problems) or otherwise the document as a whole. An element can use an "about" attribute to point to something specific. People argued about how to identify objects in the world, but that is rather a different subject. If adopted, this would make for richer websites (cool mouseovers), better searching, something for the semantic web.
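A triple is just (subject, property, value), and the rules as I noted them can be sketched in a few lines. This is an illustration of my notes only, not of the draft itself: the page, the "dc.date" and "dc.title" properties and the example.org URI are made up, and the "datatype" attribute is ignored.

```python
import xml.etree.ElementTree as ET


def extract_triples(root, document_uri):
    """Collect (subject, property, value) for every element with a 'property' attribute."""
    # ElementTree has no parent pointers, so build a child -> parent table.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    def subject_for(elem):
        # An explicit "about" wins; otherwise use the nearest ancestor with an id,
        # and failing that the document as a whole.
        if elem.get("about"):
            return elem.get("about")
        node = parent_of.get(elem)
        while node is not None:
            if node.get("id"):
                return document_uri + "#" + node.get("id")
            node = parent_of.get(node)
        return document_uri

    triples = []
    for elem in root.iter():
        prop = elem.get("property")
        if prop is None:
            continue
        # A "content" attribute overrides the visible text.
        value = elem.get("content")
        if value is None:
            value = "".join(elem.itertext()).strip()
        triples.append((subject_for(elem), prop, value))
    return triples


PAGE = """<body>
  <div id="report">
    <span property="dc.creator">Fred Flintstone</span>
    <span property="dc.date" content="2004-04-21">yesterday</span>
  </div>
  <span property="dc.title">Conference notes</span>
</body>"""

triples = extract_triples(ET.fromstring(PAGE), "http://example.org/page")
for triple in triples:
    print(triple)
```

The second span shows the point of "content": a human reads "yesterday" while a machine gets the date.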

Liam Quin made interesting criticism about use of namespaces in attribute values. Not clear where to go with that.

Good interesting stuff but far from finished.

The final plenary session was Edd Dumbill talking about where XML is going. The poor man is chair of this conference, so I guess he drew the short straw. Sadly, they changed the time of the talk by quite a long way, and I had to leave for the airport, so I didn't hear it. Luckily, the guts of it is at http://www.xml.com/pub/a/2004/04/21/state.html.

3. Conclusions

None, really. I had a stimulating time, and have got a small list of things to look at in the short and medium term. Where is XML going? Probably nowhere, as an abstract idea. It is a convenience, not a subject in its own right.



Date: April 23rd 2004 (revised 06/05/2002) Author: Sebastian Rahtz (revised : rahtz).
This page is copyrighted