Namespace Nuances

July 5, 2001

Q: I can't validate a document using namespaces, can I?

I'm trying to validate an XML file. It uses XML namespaces, but I can't figure out how to express them inside the DTD. Here's a sample XML document:

<?xml version="1.0"?>
<!DOCTYPE checkbook SYSTEM "checkbook.dtd">
<checkbook
    xmlns:f="http://schemas.ar-ent.net/soap/file/"
    xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:m="http://schemas.ar-ent.net/test/soap.tr/checkbook/"
    xmlns:ars="http://schemas.ar-ent.net/soap/">
  <f:deposit type="direct-deposit">
    <payor>Bob's Bolts</payor>
    <amount>987.32</amount>
    <date>21-6-00</date>
    <description category="income">Paycheck</description>
  </f:deposit>
</checkbook>

And here's a portion of the DTD, covering the markup included above,

And this is the error message I'm getting:

Unknown element 'f:deposit'

A: I can see why you're frustrated. It certainly appears as though you've got everything accounted for in your document, with all those namespace declarations.

To help resolve your problem, let's review the basics of namespaces and their declarations, covering only what you need to know to deal with this particular problem. (A terrific resource for all kinds of questions about namespaces is Ron Bourret's XML Namespaces FAQ.)

Why namespaces?

Namespaces enable you to mix, in one XML document, element (and sometimes attribute) names from more than one XML vocabulary. Let's assume you've got a document in some vocabulary which looks, in part, like the following:

<furniture>
  <table material="mahogany" type="dining"/>
  <chair material="mahogany" type="dining"/>
  <chair material="mahogany" type="dining"/>
  <lamp material="brass" type="chandelier"/>
</furniture>

You show this to your boss at the furniture warehouse. He's not exactly the brightest bulb in the lamp, but he is your boss, and he says, "Well, that's okay. But I really want you to enclose all those individual types of furniture in a table."

"A table?" you ask.

"Sure. Like in a Web page. Rows and columns. A table."

What he's looking for, in short, would be something like this:

<furniture>
  <table>
    <tr>
      <td><table material="mahogany" type="dining"/></td>
      <td><chair material="mahogany" type="dining"/></td>
      <td><chair material="mahogany" type="dining"/></td>
      <td><lamp material="brass" type="chandelier"/></td>
    </tr>
  </table>
</furniture>

See the problem? You've got two element types, representing two different things, both called table. You need to disambiguate the names -- that is, make it clear which kind of table element you're referring to at any point in the document.

Declaring a Namespace

To declare a namespace is to declare which vocabulary an element name comes from. The specific device for doing so is a special attribute, xmlns, which can be placed on any element in a document. This attribute takes the form:
xmlns:prefix="namespaceURI"
This is what we refer to when we speak of a "namespace declaration." As for the pieces of the declaration:

xmlns is required; it identifies this as an XML namespace declaration.
:prefix (note the leading colon) is optional. If you include it, all element names in the document from the indicated namespace (vocabulary) must be prefixed with these characters, followed by a colon. (We'll see examples in a moment.)
namespaceURI is required. It uniquely identifies a namespace in this document and perhaps in others. The term "URI" here is a little misleading; although it looks like a familiar Web URI, it needn't actually "point to" anything in particular. Even many of the standard W3C namespace URIs don't locate a document, like a DTD, which formally describes the vocabulary. The important feature of the namespace URI is that it be unique among all namespace URIs in the document.
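
To make the roles of these three pieces a bit more concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree, which is namespace-aware. The element name item and the URI http://example.org/vocab are invented for the illustration. It suggests that the prefix is only a local shorthand: whichever prefix you pick (or none at all), the parser ends up with the same expanded name, and it never tries to retrieve anything from the URI.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; it is used purely as an identifier,
# and the parser does not attempt to fetch it.
URI = "http://example.org/vocab"

with_prefix = f'<v:item xmlns:v="{URI}"/>'   # prefix "v"
other_prefix = f'<x:item xmlns:x="{URI}"/>'  # a different prefix, same URI
no_prefix = f'<item xmlns="{URI}"/>'         # default (unprefixed) declaration

# ElementTree reports each element name in the form "{namespaceURI}localname".
tags = [ET.fromstring(doc).tag for doc in (with_prefix, other_prefix, no_prefix)]

for tag in tags:
    print(tag)
```

All three documents should print the same expanded name, because it is the URI, not the prefix, that identifies the vocabulary.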

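For the furniture example above, you could do something like the following (the changes are the two namespace declarations on the root element and the furn: prefix on every furniture-related element name):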

<furn:furniture
    xmlns:furn="http://myfurn/namespace"
    xmlns="http://www.w3.org/1999/xhtml">
  <table>
    <tr>
      <td><furn:table material="mahogany" type="dining"/></td>
      <td><furn:chair material="mahogany" type="dining"/></td>
      <td><furn:chair material="mahogany" type="dining"/></td>
      <td><furn:lamp material="brass" type="chandelier"/></td>
    </tr>
  </table>
</furn:furniture>

Now the furniture element and all its descendants will "know about" the furn namespace prefix. (Namespaces, like special attributes such as xml:lang, are in effect for the scope of whatever element declares them, unless redefined by some descendant.) The second namespace declaration asserts that all unprefixed element names in the document also come from a particular namespace (XHTML 1.0, in this case). Consequently, any namespace-aware application which processes this document will recognize two distinct types of table element in this document: one from the furniture-related vocabulary and one from XHTML 1.0.
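
As a rough check on that last claim, the following Python sketch feeds a shortened version of the furniture document to xml.etree.ElementTree (one namespace-aware processor among many; the choice of library is mine) and looks up each kind of table by its namespace URI plus local name.

```python
import xml.etree.ElementTree as ET

FURN = "http://myfurn/namespace"
XHTML = "http://www.w3.org/1999/xhtml"

# Shortened version of the furniture document from the article.
doc = f"""
<furn:furniture xmlns:furn="{FURN}" xmlns="{XHTML}">
  <table>
    <tr>
      <td><furn:table material="mahogany" type="dining"/></td>
      <td><furn:chair material="mahogany" type="dining"/></td>
    </tr>
  </table>
</furn:furniture>
"""

root = ET.fromstring(doc)

# Search by expanded name: "{namespaceURI}localname".
html_tables = root.findall(f".//{{{XHTML}}}table")
furn_tables = root.findall(f".//{{{FURN}}}table")

print(len(html_tables), "XHTML table element(s)")
print(len(furn_tables), "furniture table element(s)")
print("furniture table material:", furn_tables[0].get("material"))
```

The two searches never collide, even though both element types have the local name table.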

Complications ensue...

You may have picked up on a couple of odd, unexplained features of the preceding document.

Once you start using namespaces in a particular document, you must commit to going the whole way. In theory, only the names of the two table element types needed to be disambiguated. In practice, though, you use namespaces to disambiguate entire vocabularies -- even the names of elements, like td and chair above, which are already unambiguous. Thus, if you decide to require that the furniture-type table have a furn: prefix, you're committed to using that prefix on the names of the furniture, chair, and lamp elements as well.

As previously noted, a particular namespace prefix's associated URI need not be the "web address" of anything in particular. There is nothing at the URI http://myfurn/namespace (except a "document not found" error). On the other hand, there's definitely something at the http://www.w3.org/1999/xhtml URI associated with the "empty namespace prefix." (I'll leave for you the exercise of inspecting that "something.")

But the above document introduces some more profound questions.

Deeper mysteries

The first mystery is that the document above no longer contains two distinct elements named table. It now contains a table element, and a furn:table element. The prefix is part of the element name.
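
One way to see this for yourself is to parse the same markup twice: once with namespace processing switched off (roughly the view a DTD takes) and once with it switched on. The sketch below uses Python's xml.parsers.expat for that purpose; the helper name element_names is made up for the example, and the document is a trimmed version of the furniture one.

```python
import xml.parsers.expat

doc = ('<furn:furniture xmlns:furn="http://myfurn/namespace" '
       'xmlns="http://www.w3.org/1999/xhtml">'
       '<table><tr><td><furn:table/></td></tr></table>'
       '</furn:furniture>')

def element_names(xml_text, namespace_aware):
    """Return element names in document order, as the parser reports them."""
    names = []
    # With a separator, expat expands names to "namespaceURI localname";
    # with None, it reports the raw name exactly as typed, prefix included.
    separator = " " if namespace_aware else None
    parser = xml.parsers.expat.ParserCreate(namespace_separator=separator)
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(xml_text, True)
    return names

raw = element_names(doc, namespace_aware=False)
expanded = element_names(doc, namespace_aware=True)

print(raw)       # names as a DTD would have to declare them
print(expanded)  # names as a namespace-aware application sees them
```

In the first list the prefix is simply part of each name; only in the second does it get swapped for the URI it stands for.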

The second mystery is the real killer, and it's the reason why the original questioner is having trouble with the checkbook application. If you mix element types from two different vocabularies, how can you possibly validate a document at all, given that a valid document may contain no more than one DOCTYPE declaration, referencing no more than one DTD?

The answer is weird but also (once you think about it) obvious. Either (a) you can't validate it at all, or (b) you can validate it only if the one referenced DTD declares every element name, prefix included, along with every namespace-declaring attribute.

Case (a) isn't as outlandish an option as you might imagine. It's one of the most common solutions, thanks in part to XSLT's popularity. An XSLT style sheet must contain elements from the XSLT vocabulary, such as xsl:stylesheet and xsl:template, and these are intermingled in the style sheet with elements from the result tree vocabulary. Validating an XSLT style sheet is a remote -- but only remote -- possibility. The whole thing works wonderfully using the simpler alternative of well-formedness.

(For some reason, case (a) seems to drive many otherwise sane users of XML absolutely batty: "If I can't validate a document, how do I know it's correct?" This has never bothered me because, in terms of XML 1.0, well-formedness is just as "correct" as validity. If a document works in the application that needs to use it, who cares whether it works in the framework of some other arbitrary application -- like a validating parser?)

The question at hand

For starters, the XML document with which this whole discussion opened is a little strange -- given what you now know about namespaces. Its root element, checkbook, declares four namespace prefixes and their associated URIs: f, s, m, and ars. Of these, only one is actually used anywhere in the document: f, on the f:deposit element. Furthermore, there is no namespace declaration for the "empty prefix" -- which is actually, by default, implicit in the names of all other elements in the document (amount, date, and so on).
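
If you'd like to confirm those observations mechanically, here is a small Python sketch using xml.etree.ElementTree.iterparse. It works on a shortened copy of the checkbook document with the DOCTYPE line left out (so no DTD defaults come into play), and the variable names are my own. It collects the declared prefixes, notes which namespace URIs are actually used by some element name, and lists the elements that end up in no namespace at all.

```python
import io
import xml.etree.ElementTree as ET

# Shortened copy of the checkbook document, without its DOCTYPE declaration.
doc = b"""<checkbook
    xmlns:f="http://schemas.ar-ent.net/soap/file/"
    xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
    xmlns:m="http://schemas.ar-ent.net/test/soap.tr/checkbook/"
    xmlns:ars="http://schemas.ar-ent.net/soap/">
  <f:deposit type="direct-deposit">
    <payor>Bob's Bolts</payor>
    <amount>987.32</amount>
  </f:deposit>
</checkbook>"""

declared = {}      # prefix -> URI, taken from the xmlns:* declarations
used_uris = set()  # namespace URIs that some element name actually uses
no_namespace = []  # element names that belong to no namespace

for event, payload in ET.iterparse(io.BytesIO(doc), events=("start-ns", "start")):
    if event == "start-ns":
        prefix, uri = payload
        declared[prefix] = uri
    elif payload.tag.startswith("{"):
        # Expanded names look like "{namespaceURI}localname".
        used_uris.add(payload.tag[1:].split("}", 1)[0])
    else:
        no_namespace.append(payload.tag)

unused = sorted(p for p, uri in declared.items() if uri not in used_uris)

print("declared but unused prefixes:", unused)
print("elements in no namespace:", no_namespace)
```

Note that this only looks at element names; a prefix used solely on attributes would also show up as "unused" here.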

Let's assume that validation must be achieved somehow, that simple well-formedness won't suffice. Let's also assume that the original document is a fragment of a more complete one, which actually does at some point need to use the s, m, and ars prefixes as well as f. Here's how the fragment of a DTD above, way back at the beginning, could be modified to accommodate both validation and namespaces.

<!ELEMENT checkbook (f:deposit|payment)*>
<!ATTLIST checkbook
  xmlns:f   CDATA #FIXED "http://schemas.ar-ent.net/soap/file/"
  xmlns:s   CDATA #FIXED "http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:m   CDATA #FIXED "http://schemas.ar-ent.net/test/soap.tr/checkbook/"
  xmlns:ars CDATA #FIXED "http://schemas.ar-ent.net/soap/"
  xmlns     CDATA #FIXED "http://mycheckbookURI">
<!ELEMENT f:deposit (payor, amount, date, description?)>
<!ATTLIST f:deposit
  type (cash|check|direct-deposit|transfer) #REQUIRED>
<!ELEMENT amount (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT payor (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ATTLIST description
  category (cash|entertainment|food|income|work) 'food'>

Now your application will find an element named f:deposit in the DTD, whereas before the DTD declared only an element named deposit (no prefix). And now the rest of the document can use any of the four explicit prefixes on any element name, as long as those names, including prefixes, are declared in the DTD. If an element named s:envelope appears in the document, an element named s:envelope must be declared in the DTD. A declaration for a simple envelope element won't suffice.
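
The name-matching rule can be sketched in a few lines of Python. To be clear about the assumptions: expat is not a validating parser, so this is only a comparison of declared names against used names, not real validation; the DTDs are placed in an internal subset rather than in checkbook.dtd so the example is self-contained; and OLD_DTD is my guess at the shape of the "before" declarations (an unprefixed deposit), trimmed to a few elements.

```python
import xml.parsers.expat

BODY = """<checkbook xmlns:f="http://schemas.ar-ent.net/soap/file/">
  <f:deposit type="direct-deposit">
    <payor>Bob's Bolts</payor>
    <amount>987.32</amount>
  </f:deposit>
</checkbook>"""

# Hypothetical "before" DTD: deposit is declared without its prefix.
OLD_DTD = """<!DOCTYPE checkbook [
  <!ELEMENT checkbook (deposit)*>
  <!ELEMENT deposit (payor, amount)>
  <!ELEMENT payor (#PCDATA)>
  <!ELEMENT amount (#PCDATA)>
]>"""

# "After" DTD: the same declarations, but naming f:deposit, prefix and all.
NEW_DTD = OLD_DTD.replace("deposit", "f:deposit")

def undeclared_elements(dtd, body):
    """List element names used in the body but not declared in the DTD."""
    declared, used = set(), set()
    # No namespace_separator: names are compared as raw strings,
    # which is how a DTD treats them.
    parser = xml.parsers.expat.ParserCreate()
    parser.ElementDeclHandler = lambda name, model: declared.add(name)
    parser.StartElementHandler = lambda name, attrs: used.add(name)
    parser.Parse(dtd + "\n" + body, True)
    return sorted(used - declared)

before = undeclared_elements(OLD_DTD, BODY)
after = undeclared_elements(NEW_DTD, BODY)

print("undeclared with old DTD:", before)
print("undeclared with new DTD:", after)
```

With the old declarations, f:deposit is the odd name out, which matches the error message in the question; once the DTD declares the prefixed name, nothing is left undeclared.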

Simple? Probably not. Possible? You bet.