Little Back Corners
February 25, 2004
Q: I can't find what I'm looking for in my GML document.
Whenever I try to specify an XPath location path in a GML document, I receive a message saying XPath returned no results. Queries I have used include //FeatureCollection and //FeatureCollection/gml:featureMember.
The document in question looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<FeatureCollection xmlns="http://mydomain/schemas"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns:gml="http://www.opengis.net/gml"
    xsi:schemaLocation="http://mydomain/schemas event3.xsd">
  <gml:boundedBy>
    <gml:Box>
      <gml:coordinates>100,100,100,100</gml:coordinates>
    </gml:Box>
  </gml:boundedBy>
  <gml:featureMember>
    <event>
      <geometryProperty>
        <gml:Point>
          <gml:coordinates>100,100</gml:coordinates>
        </gml:Point>
      </geometryProperty>
      <siteCode>AL1234</siteCode>
      <date>2004-10-10</date>
      <locationDescription>Somewhere</locationDescription>
      <eventType>Excavation</eventType>
      <period>Roman</period>
    </event>
  </gml:featureMember>
</FeatureCollection>
A: I'll get to your question in a moment. First, though, allow me a bit of a digression.
One of the things I like most about writing this column is the opportunity to poke into little back corners of the XML universe. Of course, these aren't little to the people who deal with them every day; they're little only in the sense that they're well known to a relatively small number of people. We all know about the W3C standards (although I daresay nobody knows everything about every one of them); we all know about the big corporate and open-source players and tools. But only a few of us get to deal with some of the truly intriguing uses of XML.
Your question deals with one of these interesting niches. GML -- the Geography Markup Language -- is a specification of the Open GIS Consortium (OGC). According to the GML FAQ, the markup language:
provides an XML-based encoding of geospatial data; it can be viewed as a basic application framework for handling geographic information in an open and non-proprietary way. By leveraging related XML technologies (e.g. XML Schema, XSLT, XLink, XPointer, SVG) a GML dataset becomes easier to process in heterogeneous environments, and it can be readily intermixed with other types of data: text, video, imagery, etc.
The GML specification is now at version 3.1, most recently updated in June 2003. Included in the mix are specifications for use of XLink and SMIL with "pure" GML. You can find the various schemas at the Open GIS site. An excellent, albeit slightly overwhelming, source of information is the GML (version 3) Implementation Specification, a 548-page PDF behemoth.
For more information about GML, check the GML entry in Robin Cover's indispensable Cover Pages, the above-mentioned GML FAQ, and the GML Central site. A company called Snowflake Software offers a GML viewer called OS MasterMap Viewer, which purports to read not only raw GML documents but also ones compressed in WinZip or gzip format.
Your document includes five elements in the GML namespace: gml:boundedBy, gml:featureMember, gml:Box, gml:coordinates, and gml:Point. These elements are used to assert the characteristics of a given geographic feature; taken together with the elements in the default namespace (such as siteCode, period, and so on), they seem to describe an archaeological site from the Roman era. (This is all presumably just "play data," right down to the coordinates in gml:Box. A box is defined by two corner points, and since both corners here are 100,100, the "box" is really just a single point in space.)
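If you want to check which elements land in which namespace, here's a minimal sketch using Python's standard xml.etree.ElementTree module. That's my choice for illustration, not part of your setup; it isn't an XPath 1.0 engine, but it stores every element name in expanded form. DOC is a trimmed copy of your document (some event children omitted), and split_tag is a hypothetical helper:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed copy of the sample document (some event children omitted).
DOC = """<FeatureCollection xmlns="http://mydomain/schemas"
    xmlns:gml="http://www.opengis.net/gml">
  <gml:boundedBy>
    <gml:Box>
      <gml:coordinates>100,100,100,100</gml:coordinates>
    </gml:Box>
  </gml:boundedBy>
  <gml:featureMember>
    <event>
      <geometryProperty>
        <gml:Point>
          <gml:coordinates>100,100</gml:coordinates>
        </gml:Point>
      </geometryProperty>
      <siteCode>AL1234</siteCode>
      <period>Roman</period>
    </event>
  </gml:featureMember>
</FeatureCollection>"""

def split_tag(tag):
    """Split an ElementTree tag such as '{uri}local' into (uri, local).

    Elements in no namespace have plain tags; they come back as (None, tag).
    """
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag

root = ET.fromstring(DOC)

# Group the distinct local names by namespace URI.
by_ns = defaultdict(set)
for el in root.iter():
    uri, local = split_tag(el.tag)
    by_ns[uri].add(local)

for uri, names in sorted(by_ns.items()):
    print(uri, sorted(names))
```

Note that every unprefixed element ends up in http://mydomain/schemas; none of them is in "no namespace," which is the crux of the problem that follows.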
The fault in the default
On to your question, which is not really about GML per se but about how to use XPath to find content in a GML document. (I hope the irony isn't lost on you: you can't locate something in a document whose very purpose is to locate things in the physical world.) The problem with your XPath location paths, it turns out, is not their syntax as such but how XPath treats namespace declarations. In particular, the problem is your declaration for the default namespace:
xmlns="http://mydomain/schemas"
I don't know why you need that declaration, since the namespace URI is clearly a dummy or placeholder. In any case, if you remove that namespace declaration, you'll find that the location paths //FeatureCollection and //FeatureCollection/gml:featureMember work just fine.
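Here's a rough reproduction of that effect. Again it's a sketch in Python's xml.etree.ElementTree rather than in your XPath tool: Element.iter() with a plain tag name matches only elements in no namespace, much as an unprefixed XPath 1.0 name test does, and it includes the starting element itself, so it can stand in for //FeatureCollection. TEMPLATE and the helper functions are hypothetical names:

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"

# Trimmed sample; {default_ns} is filled in with or without the declaration.
TEMPLATE = """<FeatureCollection{default_ns} xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <event><siteCode>AL1234</siteCode></event>
  </gml:featureMember>
</FeatureCollection>"""

with_default = ET.fromstring(
    TEMPLATE.format(default_ns=' xmlns="http://mydomain/schemas"'))
without_default = ET.fromstring(TEMPLATE.format(default_ns=""))

def feature_collections(root):
    # Stand-in for //FeatureCollection: a plain tag means "no namespace".
    return list(root.iter("FeatureCollection"))

def feature_members(root):
    # Stand-in for //FeatureCollection/gml:featureMember.
    return [child
            for fc in feature_collections(root)
            for child in fc.findall("gml:featureMember", {"gml": GML})]

for label, doc in [("with default ns", with_default),
                   ("without default ns", without_default)]:
    print(label, len(feature_collections(doc)), len(feature_members(doc)))
# Expected: "with default ns 0 0", then "without default ns 1 1"
```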
What's going on here?
In the XPath spec, we learn that
a node test that is a QName is true if and only if the type of the node (see [5 Data Model]) is the principal node type and has an expanded-name equal to the expanded-name specified by the QName.
What this means in practice is that an XPath processor doesn't match plain old element names, except for elements that are in no namespace at all. If an element is in a namespace, including the default (unprefixed) namespace, the processor identifies it by its expanded-name: the pair made up of its namespace URI and its local name. (Don't confuse this with the "qualified name," or QName, such as gml:Box, which is the prefixed form that appears in the markup.)
While the XPath spec defines an expanded-name only as that pair and prescribes no way of writing it out, a de facto convention (often called Clark notation, after James Clark) is widely used among XPath processors and other namespace-aware tools: replace the namespace prefix with the namespace URI enclosed in "curly braces," the { and } characters. In the case of your elements in the non-default namespace, such as gml:Box, the XPath processor is therefore, in effect, expanding both the element name in the XPath expression and the element name(s) as they appear in the source document, as follows:

{http://www.opengis.net/gml}Box
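The expansion step takes only a few lines of Python. expand_qname is a hypothetical helper (no library function by that name is implied); its prefixes dictionary stands in for the namespace bindings of the XPath evaluation context. Because only the URI survives expansion, two different prefixes bound to the same URI produce identical expanded names, and both match the tag ElementTree records for gml:Box:

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"

def expand_qname(qname, prefixes):
    """Expand 'prefix:local' into Clark notation, '{uri}local'.

    The prefix is looked up in `prefixes`, standing in for the namespace
    bindings of the XPath evaluation context. Unprefixed names are returned
    unchanged, i.e. treated as being in no namespace.
    """
    if ":" in qname:
        prefix, local = qname.split(":", 1)
        return "{%s}%s" % (prefixes[prefix], local)
    return qname

# The document spells the prefix 'gml'...
box = ET.fromstring('<gml:Box xmlns:gml="http://www.opengis.net/gml"/>')
# ...while the expression context binds two prefixes to the same URI.
context = {"gml": GML, "g": GML}

print(box.tag)                                      # {http://www.opengis.net/gml}Box
print(expand_qname("gml:Box", context) == box.tag)  # True
print(expand_qname("g:Box", context) == box.tag)    # True
```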
This works marvelously to solve the problems associated with "real" namespaces; in particular, it allows you to use more than one namespace prefix to represent the same namespace, should you want to do that. But it introduces a very weird problem of its own when dealing with element names in a declared default namespace. In essence, the expanded name of your original FeatureCollection element (in the default namespace) is

{http://mydomain/schemas}FeatureCollection

while the unprefixed FeatureCollection in your location path expands to a name with nothing between the braces:

{}FeatureCollection
The real difficulty is that XPath 1.0 syntax has to serve two requirements that unprefixed names can't satisfy at once: handling elements in a declared default (unprefixed) namespace, and handling elements in no namespace at all, whose expanded names have a null namespace URI. The XPath spec resolves this dilemma in favor of the latter: an unprefixed name in an XPath expression is taken to be in no namespace, and the default namespace is not used, even when the name as it appears in the instance document has (as a result of a default namespace declaration) a non-null namespace URI. Thus, your //FeatureCollection "query" is instructing XPath to locate an element that does not exist: a FeatureCollection element in no namespace.
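To make the mismatch concrete, here's a sketch that models expanded names as the (namespace URI, local name) pairs the XPath data model talks about, and applies the XPath 1.0 rule for unprefixed name tests. The helper names are hypothetical, and find_all_named is only a rough stand-in for a //name location path, not a real XPath evaluator:

```python
import xml.etree.ElementTree as ET

DOC = """<FeatureCollection xmlns="http://mydomain/schemas"
    xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <event><siteCode>AL1234</siteCode></event>
  </gml:featureMember>
</FeatureCollection>"""

def element_expanded_name(el):
    """(namespace URI, local name) of an element; URI is None if it has none."""
    if el.tag.startswith("{"):
        uri, local = el.tag[1:].split("}", 1)
        return (uri, local)
    return (None, el.tag)

def name_test_expanded_name(qname, prefixes):
    """XPath 1.0 rule: a prefixed name uses the context's binding; an
    unprefixed name gets a null URI, and the default namespace is NOT used."""
    if ":" in qname:
        prefix, local = qname.split(":", 1)
        return (prefixes[prefix], local)
    return (None, qname)

def find_all_named(root, qname, prefixes):
    """Rough stand-in for //qname: compare expanded names of every element."""
    wanted = name_test_expanded_name(qname, prefixes)
    return [el for el in root.iter() if element_expanded_name(el) == wanted]

root = ET.fromstring(DOC)
prefixes = {"gml": "http://www.opengis.net/gml"}

print(element_expanded_name(root))  # ('http://mydomain/schemas', 'FeatureCollection')
print(name_test_expanded_name("FeatureCollection", prefixes))  # (None, 'FeatureCollection')
print(len(find_all_named(root, "FeatureCollection", prefixes)))  # 0
print(len(find_all_named(root, "gml:featureMember", prefixes)))  # 1
```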
The same holds true for the //FeatureCollection/gml:featureMember location path, by the way. Since, to the XPath processor's squinty eyes, there is no FeatureCollection element, it has no children at all, named "gml:featureMember" or anything else. If you want to locate the gml:featureMember element, just remove the reference to its non-existent FeatureCollection parent:

//gml:featureMember
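In ElementTree terms the shortened path becomes .//gml:featureMember (ElementTree's limited XPath subset works relative to an element, hence the leading dot). As with any XPath engine, the gml prefix has to be bound in the evaluation context; in this sketch that's the namespaces argument, and ns is just an illustrative name:

```python
import xml.etree.ElementTree as ET

DOC = """<FeatureCollection xmlns="http://mydomain/schemas"
    xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <event><siteCode>AL1234</siteCode></event>
  </gml:featureMember>
</FeatureCollection>"""

root = ET.fromstring(DOC)
ns = {"gml": "http://www.opengis.net/gml"}

# No FeatureCollection step, so the default namespace never gets in the way.
members = root.findall(".//gml:featureMember", ns)
print(len(members))  # 1

try:
    root.findall(".//gml:featureMember")  # no prefix bindings supplied
except SyntaxError as err:
    # ElementTree refuses to guess what an unbound prefix means.
    print("unbound prefix:", err)
```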
Suppose you can't, for some reason, simply strip out the default namespace declaration? In this case you have to jump through a minor hoop: instruct XPath to locate all elements in the document, and then refine (via a predicate and the local-name() function) the node-set of candidates to those with a local name of "FeatureCollection". (The local name is the element name sans namespace prefix, and it is not subject to expansion even if the element is in the declared default namespace.) Your location path will now look like this:

//*[local-name()="FeatureCollection"]
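ElementTree has no local-name() function, but the same "visit every element, then filter by local name" idea is easy to emulate. This is a sketch: local_name is a hypothetical helper mimicking the XPath function, not an ElementTree API:

```python
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"
DOC = """<FeatureCollection xmlns="http://mydomain/schemas"
    xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <event><siteCode>AL1234</siteCode></event>
  </gml:featureMember>
</FeatureCollection>"""

def local_name(el):
    """Mimic XPath's local-name(): the tag minus any '{uri}' part."""
    return el.tag.rpartition("}")[2]

root = ET.fromstring(DOC)

# //*[local-name()="FeatureCollection"]: visit every element, keep by local name.
collections = [el for el in root.iter() if local_name(el) == "FeatureCollection"]

# //*[local-name()="FeatureCollection"]/gml:featureMember
members = [child
           for fc in collections
           for child in fc.findall("gml:featureMember", {"gml": GML})]

print(len(collections), len(members))  # 1 1
```

One caveat applies to the XPath version too: local-name() ignores namespaces entirely, so it would also match a FeatureCollection element from some other vocabulary. In XPath 1.0 you can tighten the predicate with namespace-uri(), as in //*[local-name()="FeatureCollection" and namespace-uri()="http://mydomain/schemas"].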
You can also use this technique to locate the gml:featureMember element:
//*[local-name()="FeatureCollection"]/gml:featureMember
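A tidier alternative, if your XPath tool lets you declare namespace prefixes for the expression (XPath 1.0 leaves those bindings to the evaluation context, so how you supply them varies by tool), is to bind a prefix of your own choosing to http://mydomain/schemas and write //my:FeatureCollection/gml:featureMember. The prefix needn't appear in the document, because only the namespace URI survives expansion. Here's the same idea sketched in ElementTree, where "my" is a hypothetical prefix:

```python
import xml.etree.ElementTree as ET

DOC = """<FeatureCollection xmlns="http://mydomain/schemas"
    xmlns:gml="http://www.opengis.net/gml">
  <gml:featureMember>
    <event><siteCode>AL1234</siteCode></event>
  </gml:featureMember>
</FeatureCollection>"""

root = ET.fromstring(DOC)

# Our own prefix bindings; 'my' need not appear in the document at all.
ns = {"my": "http://mydomain/schemas", "gml": "http://www.opengis.net/gml"}

# The root element's expanded name is exactly what 'my:FeatureCollection' expands to.
print(root.tag == "{%s}FeatureCollection" % ns["my"])  # True

# Prefixed names match elements in the default namespace...
print(root.findtext("gml:featureMember/my:event/my:siteCode", namespaces=ns))  # AL1234
# ...while unprefixed names still mean "no namespace" and find nothing.
print(root.findtext("gml:featureMember/event/siteCode", namespaces=ns))  # None
```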
Don't feel chagrined by not having previously picked up on XPath's treatment of expanded names in a default namespace. While it does make sense of a sort -- they needed to reconcile the irreconcilable somehow -- it remains one of the strangest little back corners of the XPath universe, even to people who deal with XPath every day!