Finding IDs
June 25, 2003
Before getting into this month's questions, I wanted to discuss a change in this column's standard operating procedures. In the nearly three years that I've been writing "XML Q&A," all questions I've answered have been drawn from a single source: the O'Reilly Network XML Forum. (Annual exceptions -- the August columns -- have been the "Nobody Asked Me, But..." pieces, based on questions that no one in particular asked but that I wanted to tackle anyway.) In May, the XML Forum was discontinued by O'Reilly; its subject matter, after all, overlapped with numerous other online resources.
Starting this month, I'll be perusing those other online resources for questions to answer here. Included are the following mailing lists and newsgroups, in addition to others (links are to archives, subscription pages, or the Google Groups root for the given resource):
In all cases, I'll focus on questions which haven't yet been answered and continue to focus on questions of broad interest.
Now on to this month's items.
Q: XPath to IDs?
I want something like this:
<!-- a.xml -->
<a>
  <elt id="a" value="1"/>
  <elt id="b" value="2"/>
  <elt id="c" value="3"/>
</a>

<!-- a.xsl -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <output>
      <xsl:value-of select="#b/@value"/>
    </output>
  </xsl:template>
</xsl:stylesheet>
and to get the output:
<output>2</output>
Is there a syntax for this in XPath 1 or one being considered for XPath 2.0?
A: First, I assume you know that simply naming an attribute "id" doesn't make it an ID-type attribute. The only way to ensure that an attribute is of the ID type is to declare it so, via either DTD or XML Schema. (For XPath 1.0 purposes the DTD declaration is the one that counts; a processor isn't obliged to recognize IDs declared only in a schema.) So most of my answer takes it for granted that the id attributes in your document are formally declared as such.
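For the sample document, a declaration along these lines would do. This is a minimal sketch using an internal DTD subset; the content model and the CDATA type chosen for value are assumptions about the questioner's vocabulary, not something stated in the question.

```xml
<?xml version="1.0"?>
<!DOCTYPE a [
  <!ELEMENT a (elt*)>
  <!ELEMENT elt EMPTY>
  <!-- The declared type ID is what matters; the attribute's name is incidental -->
  <!ATTLIST elt
    id    ID    #REQUIRED
    value CDATA #REQUIRED>
]>
<a>
  <elt id="a" value="1"/>
  <elt id="b" value="2"/>
  <elt id="c" value="3"/>
</a>
```

Keeping the declarations in the internal subset means that even a non-validating parser should see them, since such parsers are still required to read attribute declarations found there.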
Just to highlight the portion of your stylesheet which you're proposing will locate the correct element:
<xsl:value-of select="#b/@value"/>
It's not quite that easy, but it's not much harder, either. (Aside from its simplicity, though, the above is syntactically incorrect. An XPath-aware processor, such as an XSLT engine, will complain about the # character.) Instead of a simple "named anchor"-style selection, use the id() function to locate the element in question. It takes one argument, the ID value(s) you're looking for. Replace your xsl:value-of element with this one:
<xsl:value-of select="id('b')/@value"/>
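Put back into the questioner's stylesheet, the corrected expression might look like the sketch below. The version attribute (which XSLT 1.0 requires on xsl:stylesheet) and the xsl:output element are additions of mine, not part of the original question, and the result depends on id being declared as an ID-type attribute.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Suppress the XML declaration so the output is just the output element -->
  <xsl:output method="xml" omit-xml-declaration="yes"/>
  <xsl:template match="/">
    <output>
      <!-- id('b') selects the element whose ID-type attribute equals "b" -->
      <xsl:value-of select="id('b')/@value"/>
    </output>
  </xsl:template>
</xsl:stylesheet>
```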
And what if the id attributes are undeclared? You can still locate the right element (assuming no two elements share the same id value) with:
<xsl:value-of select="descendant::*[@id='b']/@value"/>
As an aside, note that you needn't pass the id() function just a single value. You can pass it multiple values in a whitespace-delimited list, as here:
<xsl:value-of select="id('b c')/@value"/>
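Strictly speaking, this selects both elements: id() always returns a node-set, and here that node-set holds the elements whose IDs are b and c. The xsl:value-of element, however, outputs only the string-value of the first selected node in document order, so what you see is the value belonging to b. Furthermore, the argument needn't be a string. If it's a number or Boolean, the argument will be converted to a string. This behavior is consistent with that of other XPath functions. But, and this is the interesting part, if the argument is a node-set, the id() function behaves differently. Rather than converting the node-set to a single string, it takes the string-value of each node in turn and returns a node-set containing all element nodes whose ID-type attributes match any of those string-values. Thus the id() function can locate more than one node, which may seem to contradict the uniqueness of IDs but doesn't: each ID value still identifies at most one element.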
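Because xsl:value-of emits only the string-value of the first node in a node-set, iterating with xsl:for-each is one way to see everything that id() selected. The sketch below assumes the questioner's a.xml with id declared as an ID-type attribute; the values and value element names are hypothetical, chosen only for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/">
    <values>
      <!-- Visits every element whose ID is b or c, in document order -->
      <xsl:for-each select="id('b c')">
        <value>
          <xsl:value-of select="@value"/>
        </value>
      </xsl:for-each>
    </values>
  </xsl:template>
</xsl:stylesheet>
```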
The notion is hard to visualize with the sample document the questioner has provided, since there's no correspondence between any string-values in the document and the values of the id attributes.
But consider a common scenario: You control your own XML vocabulary, but not some other XML-based resource whose contents you want to use to locate ID-based values in one of your own documents. For instance, say you've got a document listing book titles (call it, say, books_details.xml):
<books_details>
  <book isbn="b0684833395">Catch-22</book>
  <book isbn="b0440180295">Slaughterhouse 5</book>
  <book isbn="b0764547771">XML: A Primer</book>
  <book isbn="b0446670251">The Virgin Suicides</book>
  <book isbn="b0440215625">Dragonfly in Amber</book>
  <book isbn="b088184800X">Crossed Wires</book>
  <book isbn="b0679736379">Sophie's Choice</book>
  <book isbn="b0596002521">XML Schema</book>
</books_details>
(Note that isbn is declared as an ID-type attribute. Also note a side-effect of this declaration: the attribute's value may not start with a digit.)
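The declaration itself isn't shown here, but a DTD along the following lines would make isbn an ID-type attribute. Treat it as a sketch: the content models are assumptions inferred from the sample document.

```xml
<!DOCTYPE books_details [
  <!ELEMENT books_details (book*)>
  <!ELEMENT book (#PCDATA)>
  <!-- An ID value must be a legal XML name, so it cannot begin with a digit -->
  <!ATTLIST book isbn ID #REQUIRED>
]>
```

The leading b on each isbn value in the sample is what keeps those values legal as XML names.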
Elsewhere, in some other document, you have a list of books arranged by subject (as they might be shelved in a bookstore, for example). This document (books_shelves.xml) might look something like this:
<books_details>
  <category shelf="fiction">
    <isbn>b0684833395</isbn>
    <isbn>b0440180295</isbn>
    <isbn>b0446670251</isbn>
    <isbn>b0679736379</isbn>
  </category>
  <category shelf="tech">
    <isbn>b0764547771</isbn>
    <isbn>b0596002521</isbn>
  </category>
  <category shelf="romance">
    <isbn>b0440215625</isbn>
  </category>
  <category shelf="mystery">
    <isbn>b088184800X</isbn>
  </category>
</books_details>
Obviously, if you controlled both of these vocabularies, a simple solution would be to merge the two documents into one. But if you can't do so, for any of a thousand reasons, you can still use the second document to locate in the first all books which are shelved as, say, fiction. A stylesheet template to achieve this, by transforming books_details.xml, might look like the following:
<xsl:template match="/">
  <xsl:for-each
      select="id(document('books_shelves.xml')//isbn[../@shelf='fiction'])">
    <output>
      <xsl:value-of select="."/>
    </output>
  </xsl:for-each>
</xsl:template>
The operative portion of this template, the value of the xsl:for-each element's select attribute, uses the id() function, in conjunction with the document() function, to locate multiple nodes in the first document (books_details.xml) based on the string-values of nodes in the second (books_shelves.xml). Translated into English, that select attribute might read something like this:
- The inner call to the document() function locates, in books_shelves.xml, a node-set consisting of all isbn elements whose parents have a shelf attribute with a value of "fiction."
- The outer call to id() locates, in books_details.xml, each element with an ID-type attribute equal to the string-value of one of the nodes in the node-set located in the preceding step.
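The same two steps can also be written out separately, which some readers may find easier to follow. This is only a sketch: the variable name fiction-isbns is hypothetical, and it assumes the template is applied to books_details.xml with isbn declared as an ID-type attribute. It should select the same books as the one-line version.

```xml
<xsl:template match="/">
  <!-- Step 1: collect the isbn elements shelved as fiction in the second document -->
  <xsl:variable name="fiction-isbns"
      select="document('books_shelves.xml')//isbn[../@shelf='fiction']"/>
  <!-- Step 2: id() looks up each string-value in the context node's document,
       which here is books_details.xml -->
  <xsl:for-each select="id($fiction-isbns)">
    <output>
      <xsl:value-of select="."/>
    </output>
  </xsl:for-each>
</xsl:template>
```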
The result tree from this transformation is:
<output>Catch-22</output>
<output>Slaughterhouse 5</output>
<output>The Virgin Suicides</output>
<output>Sophie's Choice</output>
By the way, note that this result tree isn't well-formed on its own, consisting as it does of more than one root element.
There's more than one way to obtain these results. Instead of using the id() function, for instance, you could use keys to locate the desired nodes. (This is absolutely the way to go if the attributes in question aren't ID-type attributes in the first place. See Bob DuCharme's "Declaring Keys and Performing Lookups" here on XML.com for more details.) Still, if you've got ID-type attributes you might as well take advantage of their uniqueness.
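For comparison, a key-based version of the bookshelf lookup might look something like the following. It's a sketch under the same assumptions as before (the stylesheet is applied to books_details.xml, and books_shelves.xml sits alongside it); the key name book-by-isbn is hypothetical. No DTD is needed in this case, because the key indexes the isbn attribute directly.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Index every book element by the value of its isbn attribute -->
  <xsl:key name="book-by-isbn" match="book" use="@isbn"/>
  <xsl:template match="/">
    <!-- Like id(), key() accepts a node-set and searches the context node's document -->
    <xsl:for-each
        select="key('book-by-isbn',
                    document('books_shelves.xml')//category[@shelf='fiction']/isbn)">
      <output>
        <xsl:value-of select="."/>
      </output>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```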
Follow-up: XML-based résumés
In last month's column, I reported on the XML Résumé Library for capturing curriculum vitae information. Shortly after that column appeared, I was contacted by Aaron Straup Cope, who has taken it upon himself to extend the XML Résumé Library with some (IMO) notable improvements.
At a minimum, Cope's extensions add to the XML Résumé Library's DTD a new element, activities, and several offspring elements. The activities element, says Cope, identifies "personal, or group, projects that are not directly 'work' related." For instance, you could include memberships in civic organizations under this category. Much more interesting is the set of stylesheets which Cope has prepared; these provide you with the ability to exclude certain information (address and phone number, for example) from the output, to define more than one CSS stylesheet depending on output device, and so on.
If you found the XML Résumé Library interesting, by all means head over to Cope's aaronlind.info XSLT tools page.