Old Ghosts: XML Namespaces
January 10, 2001
While some of us were enjoying holiday celebrations, XML-DEV was haunted once more by that ghostly question, "what does a Namespace URI resolve to?" This time, however, the community was reluctant to descend into a two thousand message discussion. This article summarizes the promising progress made to date.
Negotiating Peace
To say that XML Namespaces have been hotly debated over the last two years is a major understatement. Much of the confusion stems from the use of URIs, or more commonly URLs, as the unique identifier for a Namespace. For many developers this naturally raises the question: "What does this URL resolve to?" In the Namespace FAQ you'll find that the answer is undefined. It can be something or nothing. The Namespace Recommendation is not forthcoming on what could or should be placed at a URL used as a Namespace identifier. Much debate has revolved around whether the specification should have gone further and mandated some behavior, or whether its job is complete in simply defining a means of uniquely identifying XML elements.
The current situation of undefined behavior has been described by Rick Jelliffe as a "lucky dip." The increasingly common practice of placing a schema for the namespace at the URI has been decried by many vocal members of the XML community, primarily because this doesn't take into account the many and varied schema languages currently in use.
Shortly before Christmas, XML-DEV returned to this topic once more, exploring a worrying possibility raised by Paul Tchistopolskii. In this scenario, if some popular tool were to define its own behavior for dereferencing a Namespace URI, that de facto usage would limit the possibility of standardizing the intended behavior in the future: a potentially serious loophole. Michael Champion dubbed it the "Tool X Horror Scenario."
The example that the "Tool X" discussion brings to my mind is Microsoft's implementation of a draft XSLT spec in IE5. Even though this was done with the best of intentions, and MS has worked very hard to provide updates that supported the XSLT Recommendation, IE5 became an "evil de facto implementation" that has caused immense confusion and additional work for the foot soldiers of XML.
I could easily imagine a similar round of confusion if some popular tool (IE6, or Xerces perhaps?) -- with the best of intentions, and in full compliance with the letter of the namespaces spec -- were to offer some added functionality *if* namespace URIs point to some specific type of object. (If you really want a nightmare scenario, imagine that it is some proprietary object, not one defined by an open standard.) As the history of HTML shows, the Ordinary Joe developers of the world care a lot more about the "standards" defined by the behavior of popular tools than by the words in some document on the W3C website.
Naturally, this scenario prompted a lot of heated debate, threatening to reignite the Namespace discussion. Rick Jelliffe attempted to defuse the situation, suggesting that effort would be better invested in discussing solutions to the problem than in debating the Namespace specification itself. Jelliffe colorfully named this the "Treaty of Wulai" after the hot springs he'd spent the day enjoying.
Jelliffe suggested how the issue might be resolved by proposing a convention to be used when dereferencing a Namespace URI.
For example, the convention could be that dereferencing the namespace URI (when it is an http:, at least) results in:
- a structural schema (XML Schema, DTD, or HTML documentation, or other schema like Schematron, RELAX, XDR, SOX, DSD, etc determined by content negotiation);
- a semantic schema (RDF Schema) also containing links to structural schema(s) according to a well-known convention;
- some definite kind of directory or resource discovery document, to be decided, which allows systematic retrieval of lots of different kinds of resource, including links to semantic and structural schemas;
- or nothing.
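The first option relies on HTTP content negotiation: the client states which kind of document it wants and the server at the namespace URI picks a matching representation. A minimal Python sketch of the client side follows. The namespace URI, the `build_namespace_request` helper, and the media types in `ACCEPT_BY_KIND` are all illustrative assumptions; none of them comes from Jelliffe's message or from any specification discussed here. The request is only built, never sent.

```python
from urllib.request import Request

# Hypothetical mapping from "kind of thing wanted" to an Accept media type.
# These media types are assumptions chosen for illustration only.
ACCEPT_BY_KIND = {
    "schema": "application/xml",    # a structural schema in some XML syntax
    "rdf": "application/rdf+xml",   # a semantic (RDF) schema
    "docs": "text/html",            # human-readable documentation
}


def build_namespace_request(namespace_uri, kind="docs"):
    """Build (but do not send) a GET request for an http: namespace URI.

    The Accept header asks for the preferred representation and falls
    back to anything else with a low quality value.
    """
    if not namespace_uri.startswith("http:"):
        raise ValueError("this sketch only covers http: namespace URIs")
    accept = ACCEPT_BY_KIND[kind] + ", */*;q=0.1"
    return Request(namespace_uri, headers={"Accept": accept})


# Hypothetical namespace URI, used only as an example.
req = build_namespace_request("http://example.org/ns/invoice", "rdf")
print(req.full_url)
print(req.get_header("Accept"))
```

Whether a server honors the header is entirely up to the namespace owner, which is why the convention also has to allow for the final option, "nothing."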
While many XML-DEV members couldn't resist the opportunity to do battle on the Namespace issue once more, for many the prospect of a peace treaty was very welcome.
Under the Radar
Of the four options presented by Jelliffe, only the last two found supporters: providing a resource discovery document, or returning nothing and deprecating the dereferencing of Namespace URIs altogether. Paul Tchistopolskii in particular strongly urged deprecation.
I think it is not sane to keep this hole open. I think it will take years to understand what could *really* be pointed by that URL/URI and until that - let us close the door ? Right?
... Let us look at this hole. Whatever will be attached to that URI - it will affect almost every XML document in the world and this could be done at any point of time.
Tchistopolskii believed that closing the loophole was the important first step, giving the community time to fully debate the alternative options.
Tim Bray took the opposing view, believing that an XHTML-based resource discovery format could be devised relatively quickly.
I think this would be the ideal kind of thing to retrieve when you dereference a namespace URI. Readable by humans and also machine-processable and fully extensible.
If I were feeling particularly grandiose, I'd also describe such a thing as a key building block for the Semantic Web.
Bray's plan found favor very quickly, and proposals rapidly began appearing. Dan Brickley had already observed that RSS 1.0 and Dublin Core use a similar technique (human-readable documentation referencing machine-processable schema). Simon St. Laurent made a quick XLink-based proposal, Jonathan Borden contributed XMLCatalog, and Tim Bray himself proposed XML Namespace Related-resource Language. Debate quickly centered on the relative merits of the proposals, prompting further revisions. Sean Palmer began the winnowing process, synthesizing Borden's and Bray's separate proposals into XML Namespace Catalog Language.
There isn't space here to discuss the differences between each proposal, but, happily, they have rapidly stabilized into a clear favorite: Resource Directory Description Language (RDDL).
Resource Directory Description Language
RDDL is an extension of XHTML Basic. It provides a simple way to document your namespaces and provide links to additional resources useful when processing documents containing those Namespaces. As an XHTML-based specification it is directly browseable, so human-oriented documentation can also be included. The resources referenceable from an RDDL document are not restricted in any way and could include schemas (of varying types), CSS documents, stylesheets, executable code, etc. Michael Brennan provided an enthusiastic summary of the advantages of RDDL.
I'm very excited about RDDL. RDDL is simple, lightweight, easy to implement, and offers the added bonus that it is displayable in ordinary web browsers. The approach of placing human readable documentation at the end of the URL that also contains a machine readable catalog of other related resources is a perfect approach, IMO. I love this. This is great!
XML-DEV has begun discussing an API for RDDL which will provide a means to process RDDL documents from within XML applications. It should be noted that although the primary use case for RDDL is as the format retrieved when dereferencing a Namespace URI, it is not limited to that. It could be used when an application requires any additional resources while processing an XML document.
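To make the idea of processing an RDDL document concrete, here is a minimal Python sketch that lists the resources in a document and picks out those serving a given purpose. It assumes the shape described in the RDDL specification published at rddl.org: `rddl:resource` elements in the `http://www.rddl.org/` namespace, carrying XLink attributes where `xlink:role` names the nature of the resource and `xlink:arcrole` its purpose. The article itself does not give these details, and the draft under discussion on XML-DEV may differ. The sample document, its file names, the `example.org` nature and purpose URIs, and the two helper functions are hypothetical; this is not the API being designed on the list.

```python
import xml.etree.ElementTree as ET

RDDL_NS = "http://www.rddl.org/"
XLINK_NS = "http://www.w3.org/1999/xlink"

# Hypothetical RDDL document: XHTML prose for people, with
# rddl:resource elements for machines. All URIs are placeholders.
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:rddl="http://www.rddl.org/"
      xmlns:xlink="http://www.w3.org/1999/xlink">
  <body>
    <h1>Example invoice namespace</h1>
    <rddl:resource xlink:type="simple"
        xlink:href="invoice.xsd"
        xlink:role="http://example.org/nature/xml-schema"
        xlink:arcrole="http://example.org/purpose/validation">
      <p>A schema for validating invoices.</p>
    </rddl:resource>
    <rddl:resource xlink:type="simple"
        xlink:href="invoice.css"
        xlink:role="http://example.org/nature/css"
        xlink:arcrole="http://example.org/purpose/presentation">
      <p>A stylesheet for displaying invoices.</p>
    </rddl:resource>
  </body>
</html>"""


def list_resources(rddl_text):
    """Return one dict per rddl:resource element, in document order."""
    root = ET.fromstring(rddl_text)
    found = []
    for el in root.iter("{%s}resource" % RDDL_NS):
        found.append({
            "href": el.get("{%s}href" % XLINK_NS),
            "nature": el.get("{%s}role" % XLINK_NS),
            "purpose": el.get("{%s}arcrole" % XLINK_NS),
        })
    return found


def find_by_purpose(resources, purpose):
    """Return the hrefs of all resources declared for the given purpose."""
    return [r["href"] for r in resources if r["purpose"] == purpose]


resources = list_resources(SAMPLE)
for r in resources:
    print(r["href"], r["nature"], r["purpose"])
print(find_by_purpose(resources, "http://example.org/purpose/validation"))
```

Because the document is also ordinary XHTML, the same file that feeds this lookup can be opened directly in a browser.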
However, there are still issues to be resolved. Notably, Rick Jelliffe has been urging XML-DEV to consider existing situations where Namespace URIs are being resolved directly to useful resources.
I don't see that RDDL "solves" the namespaces problem, because the problem is not "what is the best thing a namespace URL ought to point to?" but "how do we support what people are doing and want to do with namespace URIs?" I am not sure how RDDL can flourish if it ignores the people who actually are overloading the namespace URI for retrieving useful things now.
These issues are important to address early in the evolution of RDDL. It should be possible to incorporate the use cases that Jelliffe is highlighting without significant changes to the RDDL proposal as it stands.
Open issues aside, the promising thing is that XML-DEV is at last talking about concrete solutions rather than indulging in Namespace flame wars. It's worth noting that the last project that garnered such immediate progress on XML-DEV was the SAX API. In this light, RDDL can be seen as a promising start to the New Year.