Namespace Nuances
July 5, 2001
Q: I can't validate a document using namespaces, can I?
I'm trying to validate an XML file. It uses XML namespaces, but I can't figure out how to express them inside the DTD. Here's a sample XML document:
<?xml version="1.0"?>
<!DOCTYPE checkbook SYSTEM "checkbook.dtd">
<checkbook xmlns:f="http://schemas.ar-ent.net/soap/file/"
           xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
           xmlns:m="http://schemas.ar-ent.net/test/soap.tr/checkbook/"
           xmlns:ars="http://schemas.ar-ent.net/soap/">
  <f:deposit type="direct-deposit">
    <payor>Bob's Bolts</payor>
    <amount>987.32</amount>
    <date>21-6-00</date>
    <description category="income">Paycheck</description>
  </f:deposit>
</checkbook>
And here's the portion of the DTD covering the markup above:
<!ELEMENT checkbook (deposit|payment)*>
<!ELEMENT deposit (payor, amount, date, description?)>
<!ATTLIST deposit
  type (cash|check|direct-deposit|transfer) #REQUIRED>
<!ELEMENT amount (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT payor (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ATTLIST description
  category (cash|entertainment|food|income|work) 'food'>
And this is the error message I'm getting:
Unknown element 'f:deposit'
A: I can see why you're frustrated. It certainly appears as though you've got everything accounted for in your document, with all those namespace declarations.
To help resolve your problem, let's review the basics of namespaces and their declarations, covering only what you need to know to deal with this particular problem. (A terrific resource for all kinds of questions about namespaces is Ron Bourret's XML Namespaces FAQ.)
Why namespaces?
Namespaces enable you to mix, in one XML document, element (and sometimes attribute) names from more than one XML vocabulary. Let's assume you've got a document in some vocabulary which looks, in part, like the following:
<furniture>
  <table material="mahogany" type="dining"/>
  <chair material="mahogany" type="dining"/>
  <chair material="mahogany" type="dining"/>
  <lamp material="brass" type="chandelier"/>
</furniture>
You show this to your boss at the furniture warehouse. He's not exactly the brightest bulb in the lamp, but he is your boss, and he says, "Well, that's okay. But I really want you to enclose all those individual types of furniture in a table."
"A table?" you ask.
"Sure. Like in a Web page. Rows and columns. A table."
What he's looking for, in short, would be something like this:
<furniture>
  <table>
    <tr>
      <td><table material="mahogany" type="dining"/></td>
      <td><chair material="mahogany" type="dining"/></td>
      <td><chair material="mahogany" type="dining"/></td>
      <td><lamp material="brass" type="chandelier"/></td>
    </tr>
  </table>
</furniture>
See the problem? You've got two element types, representing two different things, both called table. You need to disambiguate the names; that is, make it clear which kind of table element you're referring to at any point in the document.
Declaring a Namespace
To declare a namespace is to declare which vocabulary an element name comes from. The specific device for doing so is a special attribute, xmlns, which can be placed on any element in a document. This attribute takes the form:
xmlns:prefix="namespaceURI"
This is what we refer to when we speak of a "namespace declaration." As for the pieces of the declaration:
xmlns is required; it identifies this as an XML namespace declaration.
:prefix (note the leading colon) is optional. If you include it, all element names in the document from the indicated namespace (vocabulary) must be prefixed with these characters, followed by a colon. (We'll see examples in a moment.) If you omit it, the declaration instead sets the default namespace, which applies to every unprefixed element name within its scope.
namespaceURI is required. It uniquely identifies a namespace in this document and perhaps in others. The term "URI" here is a little misleading; although it looks like a familiar Web URI, it needn't actually "point to" anything in particular. Even many of the standard W3C namespace URIs don't locate a document, such as a DTD, that formally describes the vocabulary. The important feature of the namespace URI is that it be unique to its vocabulary: no other vocabulary used in the document may share it.
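As a minimal sketch of the two forms, here are two independent one-element documents, borrowing the made-up URI http://myfurn/namespace that the furniture example uses. In the first, the namespace is attached through a prefix; in the second, through a default (unprefixed) declaration.

```xml
<!-- Document 1: prefixed declaration. Only names written with
     the furn: prefix belong to http://myfurn/namespace. -->
<furn:chair xmlns:furn="http://myfurn/namespace" material="mahogany"/>

<!-- Document 2: default declaration (no :prefix). Every unprefixed
     element name in scope belongs to http://myfurn/namespace. -->
<chair xmlns="http://myfurn/namespace" material="mahogany"/>
```

To a namespace-aware application, both chair elements have the same namespace URI and the same local name. Note that the unprefixed material attribute is in no namespace under either form, since a default declaration applies only to element names.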
For the furniture example above, you could do something like the following, declaring both namespaces on the root element and adding a furn: prefix to the furniture names:
<furn:furniture xmlns:furn="http://myfurn/namespace"
                xmlns="http://www.w3.org/1999/xhtml">
  <table>
    <tr>
      <td><furn:table material="mahogany" type="dining"/></td>
      <td><furn:chair material="mahogany" type="dining"/></td>
      <td><furn:chair material="mahogany" type="dining"/></td>
      <td><furn:lamp material="brass" type="chandelier"/></td>
    </tr>
  </table>
</furn:furniture>
Now the furn:furniture element and all its descendants will "know about" the furn namespace prefix. (Namespace declarations, like special attributes such as xml:lang, remain in effect throughout the element that carries them and all its descendants, unless redefined by some descendant.) The second namespace declaration asserts that all unprefixed element names in the document come from a particular namespace (XHTML 1.0, in this case). Consequently, any namespace-aware application that processes this document will recognize two distinct types of table element: one from the furniture-related vocabulary and one from XHTML 1.0.
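A short sketch of scope and redefinition, again using the made-up furniture URI: the inner default declaration covers only the element that carries it (and its descendants, if it had any), so the two unprefixed table elements below land in different namespaces.

```xml
<furn:furniture xmlns:furn="http://myfurn/namespace"
                xmlns="http://www.w3.org/1999/xhtml">
  <!-- Unprefixed, so this table is in the XHTML namespace -->
  <table>
    <tr>
      <td>
        <!-- The default namespace is redefined for this element only:
             this unprefixed table is in http://myfurn/namespace -->
        <table xmlns="http://myfurn/namespace"
               material="mahogany" type="dining"/>
      </td>
      <!-- Back in the outer scope: td is XHTML again, and the
           furn: prefix is still in effect -->
      <td><furn:chair material="mahogany" type="dining"/></td>
    </tr>
  </table>
</furn:furniture>
```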
Complications ensue...
You may have picked up on a couple of odd, unexplained features of the preceding document.
Once you start using namespaces in a particular document, you must commit to going the whole way. In theory, only the names of the two table element types needed to be disambiguated. In practice, though, you use namespaces to disambiguate entire vocabularies, including the names of elements, like td and chair above, that are already unambiguous. Thus, if you decide to require that the furniture-type table have a furn: prefix, you're committed to using that prefix on the names of the furniture, chair, and lamp elements as well.
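Here's a sketch of what goes wrong if you don't go the whole way. With an XHTML default declaration in force, an unprefixed chair doesn't stay "neutral"; it silently becomes an XHTML element.

```xml
<furn:furniture xmlns:furn="http://myfurn/namespace"
                xmlns="http://www.w3.org/1999/xhtml">
  <table>
    <tr>
      <td><furn:table material="mahogany" type="dining"/></td>
      <!-- Mistake: with no prefix, this chair falls under the default
           declaration above, so it is an element from the XHTML
           namespace (which has no chair element), not furniture -->
      <td><chair material="mahogany" type="dining"/></td>
    </tr>
  </table>
</furn:furniture>
```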
As previously noted, a particular namespace prefix's associated URI need not be the "Web address" of anything in particular. There is nothing at the URI http://myfurn/namespace (except a "document not found" error). On the other hand, there's definitely something at http://www.w3.org/1999/xhtml, the URI associated with the "empty namespace prefix." (I'll leave inspecting that "something" as an exercise for you.)
But the above document introduces some more profound questions.
Deeper mysteries
The first mystery is that the document above no longer contains two distinct elements named table. It now contains a table element and a furn:table element. As far as XML 1.0 (and therefore a DTD) is concerned, the prefix is part of the element name.
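A small sketch of what "the prefix is part of the element name" means in practice. Both prefixes below are bound to the same made-up URI, so a namespace-aware application sees two elements of the same type; a DTD, which knows nothing about namespaces, sees two unrelated names, furn:table and f2:table, each needing its own declaration. The f2 prefix is hypothetical, introduced only for this illustration.

```xml
<furn:furniture xmlns:furn="http://myfurn/namespace"
                xmlns:f2="http://myfurn/namespace">
  <!-- Same namespace URI and local name as the next element,
       but a different qualified name as far as a DTD is concerned -->
  <furn:table material="mahogany" type="dining"/>
  <f2:table material="oak" type="kitchen"/>
</furn:furniture>
```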
The second mystery is the real killer, and it's the reason why the original questioner is having trouble with the checkbook application. If you mix element types from two different vocabularies, how can you possibly validate a document at all, given that a valid document may contain no more than one DOCTYPE declaration, referencing no more than one DTD?
The answer is weird but also (once you think about it) obvious. Either (a) you can't validate it at all, or (b) you can validate it only if the one referenced DTD declares every element name exactly as it appears in the document, prefix included, along with all the namespace-declaring attributes.
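As a sketch of option (b), here is a DTD that the prefixed furniture document could validate against. Every element name is declared exactly as spelled in the document, and the namespace declarations are declared as #FIXED attributes of the root. The XHTML elements are simplified stand-ins covering only what the example uses; this is an illustration, not a real XHTML DTD.

```xml
<!-- Root element: the prefix is part of the declared name -->
<!ELEMENT furn:furniture (table)>
<!-- To a DTD, namespace declarations are ordinary attributes -->
<!ATTLIST furn:furniture
  xmlns:furn CDATA #FIXED "http://myfurn/namespace"
  xmlns      CDATA #FIXED "http://www.w3.org/1999/xhtml">

<!-- Simplified stand-ins for the XHTML elements used -->
<!ELEMENT table (tr)+>
<!ELEMENT tr (td)+>
<!ELEMENT td (furn:table | furn:chair | furn:lamp)>

<!-- Furniture elements, prefix included -->
<!ELEMENT furn:table EMPTY>
<!ATTLIST furn:table material CDATA #REQUIRED type CDATA #REQUIRED>
<!ELEMENT furn:chair EMPTY>
<!ATTLIST furn:chair material CDATA #REQUIRED type CDATA #REQUIRED>
<!ELEMENT furn:lamp EMPTY>
<!ATTLIST furn:lamp material CDATA #REQUIRED type CDATA #REQUIRED>
```

The document would then reference this DTD with a DOCTYPE declaration whose name, like the root element's, includes the prefix, for example <!DOCTYPE furn:furniture SYSTEM "furniture.dtd">, where furniture.dtd is a hypothetical file name.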
Case (a) isn't as outlandish an option as you might imagine. It's one of the most common solutions, thanks in part to XSLT's popularity. An XSLT style sheet must contain elements from the XSLT vocabulary, such as xsl:stylesheet and xsl:template, and these are intermingled in the style sheet with elements from the result tree vocabulary. Validating an XSLT style sheet is a remote (but only remote) possibility. The whole thing works wonderfully using the simpler alternative of well-formedness.
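To make the XSLT case concrete, here is a minimal XSLT 1.0 sketch that turns the original, namespace-free furniture document into the boss's XHTML table. The xsl: instructions and the unprefixed XHTML result elements are interleaved throughout, which is what makes a single DTD so awkward to write; the XSLT processor only needs the style sheet to be well-formed and namespace-correct.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">
  <!-- Matches the root furniture element of the original document -->
  <xsl:template match="/furniture">
    <!-- Unprefixed literal result elements: these are XHTML -->
    <table>
      <tr>
        <!-- One cell per piece of furniture, holding its
             material followed by its element name -->
        <xsl:for-each select="*">
          <td><xsl:value-of select="concat(@material, ' ', name())"/></td>
        </xsl:for-each>
      </tr>
    </table>
  </xsl:template>
</xsl:stylesheet>
```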
(For some reason, case (a) seems to drive many otherwise sane users of XML absolutely batty: "If I can't validate a document, how do I know it's correct?" This has never bothered me because, in terms of XML 1.0, well-formedness is just as "correct" as validity. If a document works in an application that needs to use the document, who cares if it works in the framework of some other arbitrary application, like a validating parser?)
The question at hand
For starters, the XML document with which this whole discussion opened is a little strange, given what you now know about namespaces. Its root element, checkbook, declares four namespace prefixes and their associated URIs: f, s, m, and ars. Of these, only one is actually used anywhere in the document: f, on the f:deposit element. Furthermore, there is no namespace declaration for the "empty prefix," which is implicitly used by the names of all the other elements in the document (checkbook, amount, date, and so on). With no such declaration, those elements belong to no namespace at all.
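Annotated with comments, here is where each element of the original document lands, given only the declarations the document actually contains. The markup and namespace URIs are the questioner's; only the comments are new.

```xml
<!-- checkbook: no prefix and no default declaration, so no namespace -->
<checkbook xmlns:f="http://schemas.ar-ent.net/soap/file/"
           xmlns:s="http://schemas.xmlsoap.org/soap/envelope/"
           xmlns:m="http://schemas.ar-ent.net/test/soap.tr/checkbook/"
           xmlns:ars="http://schemas.ar-ent.net/soap/">
  <!-- f:deposit: in http://schemas.ar-ent.net/soap/file/, the only
       element that uses any of the four declared prefixes -->
  <f:deposit type="direct-deposit">
    <!-- payor, amount, date, description: no namespace -->
    <payor>Bob's Bolts</payor>
    <amount>987.32</amount>
    <date>21-6-00</date>
    <description category="income">Paycheck</description>
  </f:deposit>
</checkbook>
```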
Let's assume that validation must be achieved somehow, that simple well-formedness won't suffice. Let's also assume that the original document is a fragment of a more complete one, which actually does at some point need to use the s, m, and ars prefixes as well as f. Here's how the DTD fragment from the very beginning could be modified to accommodate both validation and namespaces.
<!ELEMENT checkbook (f:deposit|payment)*>
<!ATTLIST checkbook
  xmlns:f   CDATA #FIXED "http://schemas.ar-ent.net/soap/file/"
  xmlns:s   CDATA #FIXED "http://schemas.xmlsoap.org/soap/envelope/"
  xmlns:m   CDATA #FIXED "http://schemas.ar-ent.net/test/soap.tr/checkbook/"
  xmlns:ars CDATA #FIXED "http://schemas.ar-ent.net/soap/"
  xmlns     CDATA #FIXED "http://mycheckbookURI">
<!ELEMENT f:deposit (payor, amount, date, description?)>
<!ATTLIST f:deposit
  type (cash|check|direct-deposit|transfer) #REQUIRED>
<!ELEMENT amount (#PCDATA)>
<!ELEMENT date (#PCDATA)>
<!ELEMENT payor (#PCDATA)>
<!ELEMENT description (#PCDATA)>
<!ATTLIST description
  category (cash|entertainment|food|income|work) 'food'>
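Now your application will find an element named f:deposit in the DTD, whereas before the DTD declared only an element named deposit (no prefix). And now the rest of the document can use any of the four explicit prefixes on any element name, as long as those names, including prefixes, are declared in the DTD. (The last attribute declaration also gives the unprefixed elements a default namespace, the made-up http://mycheckbookURI; because it is #FIXED, a parser that reads the DTD supplies that value even though the document itself never declares it.)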
If an element named s:envelope appears in the document, an element named s:envelope must be declared in the DTD. A declaration for a plain, unprefixed envelope element won't suffice.
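A sketch of what that means for the hypothetical s:envelope element. The content model given to s:envelope, and the revised checkbook declaration that lets it appear, are illustrative assumptions; the point is only that the declared name must match the prefixed name character for character.

```xml
<!-- Replaces the earlier checkbook declaration so that s:envelope
     may appear as a child (illustrative content model) -->
<!ELEMENT checkbook (f:deposit | payment | s:envelope)*>

<!-- Matches <s:envelope> in the document -->
<!ELEMENT s:envelope (f:deposit)*>

<!-- Would NOT match <s:envelope>; to a DTD, "envelope" is a
     different name altogether:
     <!ELEMENT envelope (f:deposit)*> -->
```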
Simple? Probably not. Possible? You bet.