Of Grouping, Counting, and Context
July 31, 2002
John Simpson is the author of XPath and XPointer.
Q: How do I count the number of elements with a given attribute value?
My XML file looks like this:
<Match>
  <Date>21/3/2005</Date>
  <Stadium>Wembley</Stadium>
  <Team Name="Liverpool">
    <Goal Scorer="O'Reilly"/>
    <Goal Scorer="Smith"/>
    <Goal Scorer="O'Reilly"/>
  </Team>
  <Team Name="Real Madrid">
    <Goal Scorer="Charles"/>
    <Goal Scorer="Humble"/>
    <Goal Scorer="Humble"/>
    <Goal Scorer="Santana"/>
    <Goal Scorer="Humble"/>
  </Team>
</Match>
I want to get output like this (for the Liverpool team):
Player    Goals
O'Reilly  2
Smith     1
Any ideas?
A: Great question. The answer requires knowledge of a couple of XSLT techniques: grouping (with XSLT keys) and using the count() function.
Here's the relevant portion of an XSLT stylesheet to solve the problem; notes on the code follow, particularly on the portions which are in boldface:
<xsl:key name="player" match="@Scorer" use="."/>

<xsl:template match="Team">
  <table border="1">
    <tr><th colspan="3">Team: <xsl:value-of select="@Name"/></th></tr>
    <tr><th>Player</th><th>Goals</th><th>Gen'd ID</th></tr>
    <xsl:for-each select="Goal/@Scorer[generate-id()=generate-id(key('player', .))]">
      <tr>
        <td><xsl:value-of select="."/></td>
        <td><xsl:value-of select="count(../../Goal[@Scorer=current()])"/></td>
        <td><xsl:value-of select="generate-id(.)"/></td>
      </tr>
    </xsl:for-each>
  </table>
  <br />
</xsl:template>
This code fragment first sets up an XSLT key -- something like an index in DBMS terms. The name of the key is "player"; it matches on a pattern identified by the XPath expression @Scorer. The value of the key, given by the use attribute, is the string-value of each node that the pattern matches. Note that the xsl:key element is a top-level element, a child of the root xsl:stylesheet element.
Thus the match pattern isn't "relative" to anything at all in the source tree -- there's nothing like a context node to which it can be relative -- until the key is actually invoked, using the key() function, by some lower-level stylesheet template or instruction. Importantly, unlike a DBMS unique index or an XML ID-type attribute, an XSLT key may point to more than one "thing" at a time. Also unlike ID-type attributes, keys aren't restricted to identifying elements only. In this case, we're going to be grouping on the basis of attributes which share the same value: the Scorer attributes in the source tree.
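To see that a single key value really can point to more than one node, here's a minimal sketch of a complete stylesheet that simply counts the nodes the key returns for one value. It reuses the "player" key from the fragment above; the text output method and the hard-coded 'Humble' lookup are my own choices for illustration, not part of the original solution.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Same key as in the fragment above: indexes every Scorer attribute by its value. -->
  <xsl:key name="player" match="@Scorer" use="."/>

  <xsl:template match="/">
    <!-- key() returns a node-set: every Scorer attribute whose value is 'Humble'. -->
    <xsl:text>Humble: </xsl:text>
    <xsl:value-of select="count(key('player', 'Humble'))"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample document, this should print "Humble: 3", since three Scorer attributes share that value.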
There's one template rule in this stylesheet fragment. For every occurrence of a Team element in the source tree, as located by the xsl:template's match attribute, the template constructs a three-column table. Two columns display the names of the scoring players on each team and the number of goals scored; I've added the third column just to demonstrate how the keying works.
The first thing of interest in the template rule is the xsl:for-each element within it. Its select attribute starts clearly enough -- "for each Scorer attribute of a Goal child of the context Team element" -- but then seems to trail off into gibberish when it moves into the predicate.
What that predicate is doing is a form of the so-called Muenchian method for grouping data in ways not "built-in" to the structure of the source tree. The name of this technique, by the way, comes from Steve Muench of Oracle Corporation, who first popularized it on the XSL-List mailing list. It takes advantage of a couple of features of the XSLT generate-id() function:
- An XSLT processor isn't required to follow any particular algorithm in generating a unique identifier for a given node in the source tree.
- However, whatever algorithm it uses, an XSLT processor must generate the same identifier for the same node, and different identifiers for different nodes, in a given processing instance. (It need not generate the same identifier for a given node in every processing instance.)
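Essentially, this predicate says to restrict the node-set located by the select attribute to just those Scorer attributes whose generated ID is identical to the one generated for the nodes the key returns for the same value. When generate-id() is handed a node-set containing several nodes, it returns the ID of whichever node comes first in document order, so the test succeeds only for the first Scorer attribute with a given value. This is where the grouping occurs: instead of getting a separate table row for every single Scorer attribute (which would yield, say, two rows for O'Reilly and three for Humble), you get one row for each unique Scorer attribute value.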
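The grouping step can be tried on its own. Here's a minimal sketch, under the same "player" key, that lists each team's distinct scorers as plain text. The explicit [1] after the key() call is my addition: it only spells out what generate-id() already does with a multi-node argument. The root template and the text output method are likewise illustrative assumptions.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:key name="player" match="@Scorer" use="."/>

  <!-- Process only the Team elements, skipping Date and Stadium. -->
  <xsl:template match="/">
    <xsl:apply-templates select="//Team"/>
  </xsl:template>

  <xsl:template match="Team">
    <xsl:value-of select="@Name"/>
    <xsl:text>:</xsl:text>
    <!-- Keep a Scorer attribute only if it is the first node, in document
         order, that the key returns for its value. -->
    <xsl:for-each
        select="Goal/@Scorer[generate-id() = generate-id(key('player', .)[1])]">
      <xsl:text> </xsl:text>
      <xsl:value-of select="."/>
    </xsl:for-each>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

For the sample document this should print "Liverpool: O'Reilly Smith" and "Real Madrid: Charles Humble Santana". One caveat: the key spans the whole document, so a scorer name that turned up under two different teams would be listed only under the first of them.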
The other boldfaced portion of the template rule is where the goals are counted for each unique Scorer attribute value. The count() function takes one argument, a node-set -- given here by a relative location path. The location path tells the count() function to count all Goal element children of the current node's "grandparent" (that's the ../../ portion of the location path, which climbs from the Scorer attribute to its Goal element and then to the Team element), as long as those Goal elements' Scorer attributes have the same value as the current Scorer attribute.
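Since the key already gathers every Scorer attribute with a given value, the count can also be taken from the key itself. The following sketch prints both counts side by side; the key-based count(key('player', .)) is an alternative I'm offering for comparison, not something the original stylesheet uses, and the root template and text output are again assumptions made for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:key name="player" match="@Scorer" use="."/>

  <xsl:template match="/">
    <xsl:apply-templates select="//Team"/>
  </xsl:template>

  <xsl:template match="Team">
    <xsl:for-each
        select="Goal/@Scorer[generate-id() = generate-id(key('player', .))]">
      <xsl:value-of select="."/>
      <xsl:text>: </xsl:text>
      <!-- Path-based count: Goal children of this Team with a matching Scorer. -->
      <xsl:value-of select="count(../../Goal[@Scorer = current()])"/>
      <xsl:text> / </xsl:text>
      <!-- Key-based count: every Scorer attribute in the document with this value. -->
      <xsl:value-of select="count(key('player', .))"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

With the sample document the two numbers should agree on every line (for example, "Humble: 3 / 3"). They would differ only if the same scorer name appeared under more than one team, because the path-based count stays inside one Team while the key-based count does not.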
By the way, note the use of the current() function in the predicate. Under most circumstances, the current node is the same as the context node. Within a predicate inside an xsl:for-each block, though, the two will not necessarily be the same. At the point of the call to the current() function in that xsl:value-of element's select attribute, the context node has shifted to the Goal element selected by the portion of the count() function's argument which precedes the predicate. Had the predicate used "." in place of current(), it would compare each Goal element's Scorer attribute with that Goal element's own string-value. Of course this element's string-value (empty, in this document) is never equal to that of the Scorer attribute, so (a) the predicate would never be true, (b) no matching nodes would ever be selected, and (c) the number of goals would therefore always be zero. The current node, on the other hand, is unaffected by this shifting of the context node within the location path: the current node, in this case, is always the node established by the current pass through the xsl:for-each loop -- that is, the current (keyed) Scorer attribute.
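The difference is easy to check for yourself. This sketch prints three counts for each unique scorer: one with "." in the predicate, one with current(), and one with a variable bound before the count. The $scorer variable is a hypothetical name of mine, shown only as a common alternative to current(); the rest follows the fragment above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:key name="player" match="@Scorer" use="."/>

  <xsl:template match="/">
    <xsl:apply-templates select="//Team"/>
  </xsl:template>

  <xsl:template match="Team">
    <xsl:for-each
        select="Goal/@Scorer[generate-id() = generate-id(key('player', .))]">
      <!-- Hypothetical variable: captures the current Scorer attribute. -->
      <xsl:variable name="scorer" select="."/>
      <xsl:value-of select="."/>
      <!-- Inside the predicate, "." is the Goal element, whose string-value is empty. -->
      <xsl:text>: dot=</xsl:text>
      <xsl:value-of select="count(../../Goal[@Scorer = .])"/>
      <!-- current() still refers to the Scorer attribute chosen by xsl:for-each. -->
      <xsl:text> current=</xsl:text>
      <xsl:value-of select="count(../../Goal[@Scorer = current()])"/>
      <!-- The variable holds the same node, so it gives the same count. -->
      <xsl:text> variable=</xsl:text>
      <xsl:value-of select="count(../../Goal[@Scorer = $scorer])"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For O'Reilly in the sample document, the line should read "O'Reilly: dot=0 current=2 variable=2".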
The third column in the table simply shows what the generate-id() function returns for the first Scorer attribute with each unique value. Again, remember the two rules in the preceding list: an XSLT processor is free to use any ID-generation algorithm it wants, as long as it results in the same generated ID for the same node (and different IDs for different nodes) in a given processing instance. Here's how Microsoft's MSXML processor formats the table in a typical instance:
If you use the Saxon processor to transform the source tree with the above XSLT code, you might get something like the following instead (results obtained from Saxon 6.2):
Note (in the third column) that the two processors employ quite different rules for generating the IDs, but still produce the same results in the important columns: counts of goals, grouped by scorer. Remove that third column and the two processors (as should all other compliant processors) produce identical results.