News from the Trenches
May 24, 2000
Last week's XML-Deviant introduced a debate at the W3C concerning issues with the Namespaces Recommendation and URIs. Since then, the discussion on the newly created xml-uri mailing list has exploded, with well over 400 messages being posted in the first week! (And this from many who were spending the week in Amsterdam at WWW9.)
Approaching such an in-depth discussion is not an easy task. This week's column poses some questions -- and attempts to provide some answers -- to help guide developers concerned over the current debate. (If you've not read last week's column yet, it's probably a good idea to give it a read before carrying on.)
Why are we having a debate?
We gave good coverage of the background to this debate in last week's column, "Namespace Trouble." What has become clear during subsequent discussion is that, before it became public, the issue was rigorously tackled by the XML Plenary (the collective group of all members of the W3C's XML activities). Despite that, as Joe Kesselman noted, a consensus still wasn't reached:
I think everyone on the plenary understood the issues and honestly disagreed anyway, and I was very impressed with the amount of time and skill invested in considering the alternatives.
It would appear this deadlock is what precipitated the now notorious "Pope 32767" leak to XML-DEV.
What's the problem?
The main point under debate is the use of relative URI references within namespace identifiers. There is a general consensus that they are a cause for concern, and that significant action is needed to address them. To recap, the Namespaces Recommendation says that two namespace identifiers are equal only if they match character for character. However, this conflicts with their essential "URI-ness": a relative reference identifies different resources depending on the base URI of the document it appears in. String equality therefore brings unpredictability into any system that relies on dereferencing a relative namespace URI.
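The mismatch between the two notions of equality can be sketched in a few lines of Python. This is only an illustration: the function names and the example.org URIs are invented for the purpose, and `urllib.parse.urljoin` stands in for whatever resolution step a real processor might apply.

```python
from urllib.parse import urljoin


def same_by_string(ns_a, ns_b):
    """Equality as the Recommendation defines it: character for character."""
    return ns_a == ns_b


def same_after_resolution(ns_a, base_a, ns_b, base_b):
    """Equality a URI-aware application might expect: compare resolved forms."""
    return urljoin(base_a, ns_a) == urljoin(base_b, ns_b)


# Hypothetical documents that both declare the namespace name "schema".
DOC_A = "http://example.org/a/doc.xml"
DOC_B = "http://example.org/b/doc.xml"

# Same string, different resources once each is resolved against its document.
print(same_by_string("schema", "schema"))                       # True
print(same_after_resolution("schema", DOC_A, "schema", DOC_B))  # False

# Different strings, same resource when resolved against the same document.
print(same_by_string("schema", "./schema"))                       # False
print(same_after_resolution("schema", DOC_A, "./schema", DOC_A))  # True
```

Both pairs disagree, which is the unpredictability described above: the two tests give opposite answers for the same namespace declarations.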
Tim Bray, co-editor of the Namespaces Recommendation, has stated that he believes relative URIs are a bug in the recommendation:
... it is my view a huge bug that the Namespace Recommendation doesn't forbid the use of relative URI references.
The obvious reaction to finding a bug is to fix it. While this works for software, it's a little more complex with W3C Recommendations. There are already many documents in existence that make use of relative URIs in namespace declarations. Some Microsoft tools make use of relative URIs to reference intra-document schemas (i.e., the schema definition is given within the referencing document). This usage is quite legal, and well within the letter of the specification.
John Cowan, who has been leading much of the debate, has made it clear that any solution must not break these documents:
These documents were issued in conformance with a W3C Recommendation, which is supposed to be stable, and people are supposed to be able to rely on it.
Cowan has characterized the issue as the "Moral Problem." In a nutshell, any fixes must be backwards compatible.
We'll go on to look at proposed solutions, but first we'll cover two points germane to the debate. The first is the definition of "absolutization," a technique featured in several solutions. The second is the significance (if any) of a namespace URI to a processing application.
What does "absolutize" mean?
The desperately unwieldy term absolutization has been defined on the xml-uri list as "RFC2396-style relative reference resolution." The relevant section in RFC 2396 is Section 5.2, "Resolving Relative References to Absolute Form." Basically, the term describes the process of turning a relative URI into an absolute URI by resolving it against a base URI, typically that of the containing document.
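As a rough illustration of absolutization, the sketch below uses Python's standard `urllib.parse.urljoin`. Note the assumptions: `urljoin` follows RFC 3986, the successor to RFC 2396, which gives the same results for simple cases like these; the base URI and the namespace names are invented for the example.

```python
from urllib.parse import urljoin

# Hypothetical base URI: the location of the document declaring the namespaces.
BASE = "http://example.org/docs/report.xml"


def absolutize(namespace_name, base_uri=BASE):
    """Resolve a (possibly relative) namespace name against a base URI."""
    return urljoin(base_uri, namespace_name)


for name in ("schema", "../schema", "#inline-schema", "http://other.example/ns"):
    print(f"{name} -> {absolutize(name)}")

# schema -> http://example.org/docs/schema
# ../schema -> http://example.org/schema
# #inline-schema -> http://example.org/docs/report.xml#inline-schema
# http://other.example/ns -> http://other.example/ns
```

An absolute reference passes through unchanged; only relative references pick up the location of the document.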
What does a namespace URI point to?
This is a common question, and has been raised again because the most common use of relative URIs is to reference a schema (for instance, in some of the Microsoft documents we mentioned above). Another way of asking the question would be "What happens when I dereference a Namespace URI?" In English, this means, "If I point my browser at a Namespace URI, what is returned?"
The answer: whatever you like.
The Namespaces Recommendation leaves room for an individual application, or specification, to define what the namespace identifier signifies. It may be human-readable documentation, a schema, or even nothing. It is a point of debate whether or not this freedom is useful, but there is some precedent for it. For example, the RDF specification says that
... the namespace name serves as the identifier for the corresponding RDF schema ... An RDF processor can expect to use the schema URI to access the schema content. This specification places no further requirements on the content that might be supplied at that URI, nor how (if at all) the URI might be modified to obtain alternate forms or a fragment of the schema.
So, in RDF, the namespace URI reference identifies the location of the RDF schema, but not necessarily the format of that schema. However, the W3C XML Schemas specification uses a different mechanism, based around an optional schemaLocation attribute. This solution avoids mandating the use of namespace URIs for locating schemas.
Additionally, to assume that a namespace URI dereferences to a single kind of resource is quite limiting. As Tim Bray has illustrated, there is a wide range of content that could potentially be retrieved:
I see a multiplicity of interesting schema facilities (XML-Schema, RDF Schema, Relax, Schematron, DTDs, more coming), a multiplicity of other interesting stuff about vocabularies that isn't captured by schemas (style sheet contents, Java classes, PICS ratings, tons and tons of interesting non-schema RDF), and for any one schema or other related resource, the versioning problem, the human-language problem, the application-segment problem (the authoring version vs. the database-load version) ... and a single URI is going to guide me through this swamp? I don't think so.
As Bray goes on to suggest, this is a packaging issue and one that could usefully be addressed in the short term. (See "Good Things Come in Small Packages" for related discussion.)
What are the proposed solutions?
The following three proposals have been discussed so far, and are extracted from an excellent summary by Joe Kesselman:
Proposal | Description |
---|---|
FORBID | Forbid relative URI references. All namespace identifiers would have to be absolute URI references. The most significant downside to this proposal is that it would break existing documents. |
ABSOLUTIZE | Relative references would be allowed, but would be absolutized by the XML processor before being seen by the application. The downside here is that the references are not stable: move the document and the namespace changes. This could have repercussions on how the document is handled by applications. |
LITERAL | This would make namespace names plain text strings, whose syntax conforms to that of URIs. Relative URIs would still be allowed. |
There are obviously more details to each proposal, but these are the key points. From certain perspectives, the different proposals can blur into one another. For example, a short term solution may be to follow the LITERAL option, with the qualification that relative URIs are to be phased out, and that ultimately all namespace identifiers will be absolute URI references -- which gives, in effect, the FORBID proposal.
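To make the practical difference between the three proposals concrete, here is a minimal sketch of each as a policy function that decides what namespace name an application would see. Everything in it is an assumption made for illustration: the function names, the exception class, and the example.org locations do not come from the proposals themselves, and a reference is treated as relative simply when it has no scheme.

```python
from urllib.parse import urljoin, urlsplit


class RelativeNamespaceError(ValueError):
    """Hypothetical error raised under the FORBID policy."""


def is_relative(namespace_name):
    # Treat a reference with no scheme (e.g. "schema", "#frag") as relative.
    return urlsplit(namespace_name).scheme == ""


def forbid(namespace_name, base_uri):
    """FORBID: reject relative references outright."""
    if is_relative(namespace_name):
        raise RelativeNamespaceError(namespace_name)
    return namespace_name


def absolutize(namespace_name, base_uri):
    """ABSOLUTIZE: resolve against the document's base URI."""
    return urljoin(base_uri, namespace_name)


def literal(namespace_name, base_uri):
    """LITERAL: the name is just a string; the base URI is ignored."""
    return namespace_name


# The same document at two hypothetical locations.
OLD = "http://example.org/old/doc.xml"
NEW = "http://example.org/new/doc.xml"

print(literal("schema", OLD) == literal("schema", NEW))        # True: stable
print(absolutize("schema", OLD) == absolutize("schema", NEW))  # False: moved
try:
    forbid("schema", OLD)
except RelativeNamespaceError as exc:
    print("rejected:", exc)                                    # rejected: schema
```

The second line of output shows the instability noted for ABSOLUTIZE: moving the document changes the namespace. The exception shows why FORBID would break existing documents.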
Ignoring the philosophical debates about "What's in a Name?", "What is a Resource?", and so on, a bald statement of the problem is that it is one of change management. There is a bug in an installed base of software (or documents, in this case). How do we manage the roll-out of a bug fix? For the developer on the ground this is the main concern.
What does the debate mean to the XML developer?
If you're using relative URIs in your namespace identifiers then you should probably reconsider whether or not they add real benefit. Tim Berners-Lee has presented a brief example of a potentially valid use for relative namespaces in "virtual" documents. These are documents created on the fly by a database and thus have no representation outside a database request.
Here are two questions to consider:
- If you're using a relative URI to reference a schema locally (which seems to be the most common usage), can you guarantee that the schema will always be distributed with the document?
- If you're using a relative URI to reference a schema defined within your document, can you guarantee that no XSLT transformation will separate the schema and the content?
If the answer is "no" in either case, then using an absolute URI that references a fixed location is going to be a better option.
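If you are unsure whether your own documents use relative namespace names, a quick scan along the following lines may help. This is a sketch rather than a vetted tool: it relies on the `start-ns` events of `xml.etree.ElementTree.iterparse`, the function name and sample document are made up, and it treats any namespace name without a scheme as relative.

```python
import io
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def relative_namespaces(xml_text):
    """Return sorted (prefix, namespace name) pairs whose name has no scheme."""
    found = []
    source = io.BytesIO(xml_text.encode("utf-8"))
    # "start-ns" events yield a (prefix, uri) tuple for each declaration;
    # the default namespace is reported with an empty prefix.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        if uri and not urlsplit(uri).scheme:
            found.append((prefix, uri))
    return sorted(found)


# Hypothetical document mixing absolute and relative namespace names.
SAMPLE = (
    '<doc xmlns="http://example.org/ns" '
    'xmlns:s="#inline-schema" xmlns:t="schemas/local.xml">'
    "<s:item/><t:item/></doc>"
)

print(relative_namespaces(SAMPLE))
# [('s', '#inline-schema'), ('t', 'schemas/local.xml')]
```

Any pair it reports is a candidate for replacement with an absolute URI.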
If you're not using relative URI references in namespaces, then you are largely immune to the debate. The Namespaces Recommendation still stands, and it can be used safely.
As to the fall-out from this debate, it remains to be seen what will happen. It's possible that one or more W3C specifications will get revised. The Namespaces Recommendation is the obvious one, although XPath, RDF, and the DOM all interact with namespaces in different ways and some revisions may have to propagate through the specifications.
If you're working on XML tools or parsers, then there may be changes required at various levels -- from the parser right through to DOM implementations. This shouldn't be a cause for undue concern, but it's something to be aware of.
How will it all end?
Some order needs to be imposed onto the xml-uri list shortly, to better propose a plan of action. Because this is the first time a W3C issue has been taken public in this manner, there is no defined process for how closure will be reached. Judging from current comments, it would appear that even the W3C doesn't quite know how to end it yet. Michael Champion suggested that a poll would be extremely useful to determine the current course of the debate:
... I'll bet many of us would like to know quantitatively where people in the XML community, not just the W3C, stand on the issue.
It's by no means certain that the community at large will be any more effective at finding a clear way through the problem than the XML Plenary. Yet the public discussion has not been in vain. Although we've not covered "philosophical" issues in this article, the recent debate has revealed an interesting contrast in vision between the W3C core team and the XML community, and provided both groups with a useful mirror in which to see themselves.
Whatever the outcome, you can be sure that we'll be here to keep you informed!