DSDL Examined
June 26, 2002
What is DSDL?
While researching this week's article, I was surprised to find that although DSDL has been mentioned for some time in several forums, including XML-DEV and this column, there isn't much published information about the project, which is still at a very embryonic stage. Some quality googling turned up a bit more information, which I'll summarize here.
DSDL, as its home page notes, aims to "create a framework within which multiple validation tasks of different types can be applied to an XML document in order to achieve more complete validation results than just the application of a single technology". DSDL will provide a framework for mixing together multiple schema languages, which allows the user to avoid choosing a single language and to benefit from the strengths of several languages. Originally, as noted in Ken Holman's now dated DSDL Part 0 - Overview, the project encompassed all four of the major schema languages: RELAX NG, Schematron, DTD and W3C Schema.
However, as noted in my coverage of the XML Europe 2002 conference, DSDL has recently changed its focus from a general schema framework to one targeted specifically at publishing use cases. This involved adding several new modules to the framework, e.g. "Character repertoire validation", and removing W3C XML Schemas entirely. W3C XML Schemas Part 1 is gone, but Part 2 is likely to make an appearance within the Datatypes module, given that RELAX NG currently supports it.
Unfortunately, there's not a great deal more information publicly available. The RELAX NG module ("Part 2 Grammar-based validation") has been published, although it's just a lightly edited version of the current RELAX NG specification. Schematron may not be far behind, although Rick Jelliffe has noted that there is likely to be at least one new feature: the ability to define variables to make XPath expressions more manageable. A dig through the meeting notes of ISO/IEC JTC 1/SC 34 unearths the list of editors for each of the DSDL modules. Only DSDL Part 9, "Datatype- and namespace-aware DTDs", hasn't had an editor appointed.
XML-DEV recently debated the possibilities for Part 9, following John Cowan's posting of a "ragbag of ideas" which explored the possibility of adding namespace and data type support to DTDs by creating an enhanced syntax. Cowan also introduced the concept of a Validation DTD, which would be applied to a document for validation purposes only (possibly in conjunction with a 'native DTD') and which therefore wouldn't allow attribute defaulting and similar features. The complete debate, which involved some of the finer points of DTD syntax, is too long to summarize here. However, it highlighted the fact that this module may also never see the light of day. Ken Holman explained that the DSDL editors were still performing their "due-diligence" to unearth use cases that will determine whether the work is actually worthwhile. Holman noted that he'd be "...interested to hear from others if they think DTD syntax is even needed for the features we've quoted here ... or will people just rely on Part 2 (RELAX-NG) for their grammar needs?"
The DSDL Framework
The core of DSDL will be the Interoperability Framework (Part 1): the glue that binds together the other modules. This week Eric van der Vlist, who is the appointed editor of this section, and Rick Jelliffe have separately produced proposals that aim to explore these kinds of framework structures in more detail. The two proposals, neither of which has any formal standing, take very different approaches to the same problem.
Van der Vlist's XML Validation Interoperability Framework (XVIF) takes the approach of embedding validation and transformation pipelines within another vocabulary. The specification and online demonstrator both show how this could be achieved by embedding the pipelines within a schema language, but in principle XVIF is language-neutral, so it could equally be embedded within an XSLT transformation, for example. XVIF elements simply rely on their container to provide the context node on which they will operate. The embedded pipelines may generate other nodes or a simple boolean validation flag. Van der Vlist has produced a prototype that supports pipelines containing XPath expressions, XSLT transformations, and content manipulation using either simple regular expressions or Regular Fragmentations.
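To make the "embedded" idea concrete, here is a minimal Python sketch of a pipeline that lives inside a host vocabulary and borrows its context node from the enclosing declaration. It is only an illustration under stated assumptions: the `schema`, `element`, `pipe` and `step` names are invented for this example and are not XVIF syntax, and the sketch covers only the boolean-flag outcome, not node generation.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical host vocabulary: <schema>, <element>, <pipe> and <step> are
# invented for this sketch and are not XVIF syntax.
HOST = r"""
<schema>
  <element name="date">
    <pipe>
      <step kind="strip"/>
      <step kind="regex" pattern="\d{4}-\d{2}-\d{2}"/>
    </pipe>
  </element>
</schema>
"""

def run_pipe(pipe, value):
    """Run the embedded steps in order; return a boolean validation flag."""
    for step in pipe.findall("step"):
        kind = step.get("kind")
        if kind == "strip":
            value = value.strip()
        elif kind == "regex":
            if re.fullmatch(step.get("pattern"), value) is None:
                return False
        else:
            raise ValueError(f"unknown step kind: {kind}")
    return True

def validate(host, instance):
    """Each host declaration supplies the context nodes for its own pipeline."""
    for decl in host.findall("element"):
        pipe = decl.find("pipe")
        if pipe is None:
            continue
        for node in instance.iter(decl.get("name")):
            if not run_pipe(pipe, node.text or ""):
                return False
    return True

host = ET.fromstring(HOST)
good = ET.fromstring("<doc><date> 2002-06-26 </date></doc>")
bad = ET.fromstring("<doc><date>June 26</date></doc>")
print(validate(host, good), validate(host, bad))
```

The first instance should pass and the second should fail. The point of the sketch is that the pipeline never says where it applies: the surrounding declaration decides that.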
In contrast, Rick Jelliffe's proposal, "Schemachine", is closer to other pipeline frameworks such as XPipe and Cocoon in that the pipelines are defined by a separate vocabulary. In fact, Jelliffe notes that the proposal borrows a lot from XPipe and Schematron in that it has a number of similar elements and structures, e.g. phases. Schemachine divides pipeline elements up into particular roles such as Selectors (e.g. XPath expressions), Tokenizers (e.g. Regular Fragmentations) and Validators (e.g. RELAX NG, Schematron). Jelliffe differentiated XVIF and Schemachine as "innies and outies".
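The division into roles can be sketched as a pipeline described separately from any schema. The Python below is a rough illustration, not Schemachine's vocabulary: the `Stage` class, the `run` function and the sample `order`/`dims` document are all assumptions made for this example. ElementTree's limited path syntax stands in for XPath, a regular-expression split stands in for Regular Fragmentations, and a plain predicate stands in for a real validator such as RELAX NG or Schematron.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Stage:
    """One hypothetical pipeline stage combining the three roles."""
    select: str                             # selector: ElementTree path
    tokenize: Callable[[str], List[str]]    # tokenizer: text -> tokens
    validate: Callable[[List[str]], bool]   # validator: tokens -> pass/fail

def run(stages: List[Stage], root: ET.Element) -> List[Tuple[str, str]]:
    """Apply every stage to the document and collect the failures."""
    failures = []
    for stage in stages:
        for node in root.findall(stage.select):
            text = node.text or ""
            if not stage.validate(stage.tokenize(text)):
                failures.append((stage.select, text))
    return failures

# The pipeline is defined on its own, outside the document and any schema.
PIPELINE = [
    Stage(
        select=".//dims",
        tokenize=lambda s: re.split(r"x", s),
        validate=lambda tokens: all(t.isdigit() for t in tokens),
    ),
]

doc = ET.fromstring("<order><dims>10x20x30</dims><dims>10xwide</dims></order>")
print(run(PIPELINE, doc))
```

Only the second `dims` element should be reported, since its second token is not a number. Keeping the pipeline as standalone data is what makes this an "outie": the same stages could be pointed at any document without touching a schema.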
Technology aside, the important aspect of these proposals is the intent: publicly exploring strawman proposals and implementations to gather feedback before considering standardization. That path not only seems likely to produce viable results, but may also deliver useful tools that others can benefit from in the shorter term.
My Last XML-Deviant
After writing this column for a bit more than two years, I've decided that this will be my last XML-Deviant.
It was a hard decision to step down, but I want to spend more time with my family. Having recently become a father -- my son, Ethan, is nearly 9 months old -- I'm keen to spend less time at work. I'm also hoping to explore some personal XML-based projects that I've had on the back-burner for some time. XML has grown so quickly over the last few years that it's become increasingly hard to keep track of the interesting discussions happening in parallel on a multitude of mailing lists each week.
I've enjoyed writing the XML-Deviant column. Some of my favorite columns include reporting on the birth of RDDL; the Schemarama discussion, which no doubt set the stage for DSDL; community projects like the RDF Calendar Task Force; and helping to capture and document best practices, including how to choose a database for an XML application.
If I have one abiding annoyance, it's that progress on issues like the processing model and packaging hasn't been as rapid as it might have been, even though the issues seem to be well-understood. It's also a shame when ideas such as the Collected Works of SAX falter because of lack of time. I am hopeful, though, that summarizing debates and ideas might eventually contribute to some forward movement. I know that I've learned a great deal from writing this column, and I hope that others have benefited as well.
I'd like to thank Kendall Clark for editing the column, and Edd Dumbill for giving me the opportunity to write it in the first place. I'd also like to thank the members of the XML-DEV mailing list for invariably delivering enough juicy debate to fill this column each week. Keep up the good work.