Of Grouping, Counting, and Context

July 31, 2002

John Simpson is the author of XPath and XPointer.

Q: How do I count the number of elements with a given attribute value?

My XML file looks like this:

   <Match>
      <Date>21/3/2005</Date>
      <Stadium>Wembley</Stadium>
      <Team Name="Liverpool">
         <Goal Scorer="O'Reilly"/>
         <Goal Scorer="Smith"/>
         <Goal Scorer="O'Reilly"/>
      </Team>
      <Team Name="Real Madrid">
         <Goal Scorer="Charles"/>
         <Goal Scorer="Humble"/>
         <Goal Scorer="Humble"/>
         <Goal Scorer="Santana"/>
         <Goal Scorer="Humble"/>
      </Team>
   </Match>

I want to get output like this (for the Liverpool team):

   Player   Goals
   O'Reilly 2
   Smith    1

Any ideas?

A: Great question. The answer requires knowledge of a couple of XSLT techniques: grouping (with XSLT keys) and using the count() function.

Here's the relevant portion of an XSLT stylesheet to solve the problem; notes on the code follow, particularly on its two key expressions: the select attribute of the xsl:for-each element and the count() call in the second table cell:

   <xsl:key name="player" match="@Scorer" use="."/>

   <xsl:template match="Team">
      <table border="1">
         <tr><th colspan="3">Team: <xsl:value-of select="@Name"/></th></tr>
         <tr><th>Player</th><th>Goals</th><th>Gen'd ID</th></tr>
         <xsl:for-each select="Goal/@Scorer[generate-id()=generate-id(key('player', .))]">
            <tr>
               <td><xsl:value-of select="."/></td>
               <td><xsl:value-of select="count(../../Goal[@Scorer=current()])"/></td>
               <td><xsl:value-of select="generate-id(.)"/></td>
            </tr>
         </xsl:for-each>
      </table>
      <br />
   </xsl:template>

This code fragment first sets up an XSLT key -- something like an index in DBMS terms. The name of the key is "player"; it matches the nodes identified by the pattern @Scorer, that is, every Scorer attribute. The value of the key, given by the use attribute, is the string-value of each matched node. Note that the xsl:key element is a top-level element, a child of the root xsl:stylesheet element. Thus the match pattern isn't "relative" to anything at all in the source tree -- there's nothing like a context node to which it can be relative -- and the key covers matching nodes anywhere in the document. It comes into play only when it is actually invoked, using the key() function, by some lower-level stylesheet template or instruction. Importantly, unlike a DBMS unique index or an XML ID-type attribute, an XSLT key may point to more than one "thing" at a time. Also unlike ID-type attributes, keys aren't restricted to identifying elements only. In this case, we're going to be grouping on the basis of attributes which share the same value: the Scorer attributes in the source tree.
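As a minimal sketch of what the key provides once it is declared, the following stand-alone stylesheet looks up every Scorer attribute whose value is "Humble" and counts the matches. The template matching Match, the text output method, and the hard-coded player name are assumptions made purely for illustration; they are not part of the solution above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text"/>

   <!-- Index every Scorer attribute in the document by its own value -->
   <xsl:key name="player" match="@Scorer" use="."/>

   <xsl:template match="Match">
      <!-- key() returns a node-set: all Scorer attributes equal to 'Humble' -->
      <xsl:text>Humble: </xsl:text>
      <xsl:value-of select="count(key('player', 'Humble'))"/>
      <xsl:text>&#10;</xsl:text>
   </xsl:template>
</xsl:stylesheet>
```

For the sample Match element shown earlier, key('player', 'Humble') returns three attribute nodes, so the count is 3. That a single key value can return several nodes is exactly what makes keys useful for grouping.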

There's one template rule in this stylesheet fragment. For every occurrence of a Team element in the source tree, as located by the xsl:template's match attribute, the template constructs a three-column table. Two columns display the names of the scoring players on each team and the number of goals scored; I've added the third column just to demonstrate how the keying works.

The first thing of interest in the template rule is the xsl:for-each element within it. Its select attribute starts clearly enough -- "for each Scorer attribute of a Goal child of the context Team element" -- but then seems to trail off into gibberish when it moves into the predicate.

What that predicate is doing is a form of the so-called Muenchian method for grouping data in ways not "built-in" to the structure of the source tree. The name of this technique, by the way, comes from Steve Muench of Oracle Corporation, who first popularized it on the XSL-List mailing list. It takes advantage of a couple of features of the XSLT generate-id() function:

An XSLT processor isn't required to follow any particular algorithm in generating a unique identifier for a given node in the source tree.
However, whatever algorithm it uses, an XSLT processor must generate the same identifier for the same node, and different identifiers for different nodes, in a given processing instance. (It need not generate the same identifier for a given node in every processing instance.)

Essentially, this predicate says to restrict the node-set located by the select attribute to just those Scorer attributes whose generated ID is identical to that of the first node returned by key('player', .), that is, the first Scorer attribute in document order with the same value. (When generate-id() is passed a node-set, it uses the node that comes first in document order.) This is where the grouping occurs: instead of getting a separate table row for every single Scorer attribute (which would yield, say, two rows for O'Reilly and three for Humble), you get one row for each unique Scorer attribute value.
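The same "is this the first node in its group?" test is often written with count() and the union operator instead of generate-id(). The stylesheet below is a sketch of that variant, not the code discussed above; the text output and the empty text() template are assumptions added only to keep the example short and self-contained.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text"/>

   <xsl:key name="player" match="@Scorer" use="."/>

   <!-- Suppress stray text such as the Date and Stadium values -->
   <xsl:template match="text()"/>

   <xsl:template match="Team">
      <!-- The union of two nodes holds a single node only when both are the
           same node, so the predicate keeps a Scorer attribute only if it is
           the first node that key() returns for its value -->
      <xsl:for-each select="Goal/@Scorer[count(. | key('player', .)[1]) = 1]">
         <xsl:value-of select="."/>
         <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
   </xsl:template>
</xsl:stylesheet>
```

Either form selects one Scorer attribute per distinct value, so which to use is largely a matter of taste.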

The other portion of interest in the template rule is where the goals are counted for each unique Scorer attribute value. The count() function takes one argument, a node-set -- given here by a relative location path. The location path tells the count() function to count all Goal element children of the current node's "grandparent," the Team element (that's the ../../ portion of the location path), as long as those Goal elements' Scorer attributes have the same value as the current Scorer attribute.
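One caveat is worth noting. The player key indexes Scorer attributes across the whole document, so if two teams each had a scorer with the same name, the grouping predicate would keep that name only under the team in which it appears first. A hedged sketch of one common workaround is to build the key value from the team name plus the scorer name. The key name team-player and the "|" separator are hypothetical choices made for illustration, and the sketch assumes that no team or player name contains that separator.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text"/>

   <!-- Hypothetical composite key: team name, a separator, then scorer name.
        From a Scorer attribute, ../.. is the enclosing Team element -->
   <xsl:key name="team-player" match="@Scorer"
            use="concat(../../@Name, '|', .)"/>

   <!-- Suppress stray text such as the Date and Stadium values -->
   <xsl:template match="text()"/>

   <xsl:template match="Team">
      <xsl:for-each select="Goal/@Scorer[generate-id() =
            generate-id(key('team-player', concat(../../@Name, '|', .))[1])]">
         <xsl:value-of select="."/>
         <xsl:text> </xsl:text>
         <!-- The key already holds exactly this team's goals by this scorer -->
         <xsl:value-of
            select="count(key('team-player', concat(../../@Name, '|', .)))"/>
         <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
   </xsl:template>
</xsl:stylesheet>
```

With a key scoped this way, the count can come straight from the key lookup, so the ../../Goal location path is no longer needed.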

By the way, note the use of the current() function in the predicate. Under most circumstances, the current node is the same as the context node. Within a predicate, though, the two will not necessarily be the same. At the point of the call to the current() function in that xsl:value-of element's select attribute, the context node has shifted to the Goal element selected by the portion of the count() function's argument which precedes the predicate. Had the predicate been written as [@Scorer=.] instead, it would compare each Scorer attribute to the string-value of its own Goal element. Of course this element's string-value (an empty string, since the Goal elements are empty) is never equal to that of the Scorer attribute, so (a) the predicate would never be true, (b) no matching nodes would ever be selected, and (c) the number of goals would therefore always be zero. The current node, on the other hand, is unaffected by this shift of the context node within the location path: the current node, in this case, is always the node established by the current pass through the xsl:for-each loop -- that is, the current (keyed) Scorer attribute.
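If current() feels too subtle, an equivalent approach is to capture the loop's node in a variable before the location path changes the context. The following is a minimal sketch under that assumption; the variable name scorer, the text output, and the empty text() template are illustrative choices, not part of the stylesheet above.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="text"/>

   <xsl:key name="player" match="@Scorer" use="."/>

   <!-- Suppress stray text such as the Date and Stadium values -->
   <xsl:template match="text()"/>

   <xsl:template match="Team">
      <xsl:for-each
         select="Goal/@Scorer[generate-id() = generate-id(key('player', .))]">
         <!-- Capture the loop's Scorer attribute before any path step
              changes the context node -->
         <xsl:variable name="scorer" select="."/>
         <xsl:value-of select="$scorer"/>
         <xsl:text> </xsl:text>
         <!-- Inside the predicate "." would be a Goal element;
              $scorer is still the attribute from the loop -->
         <xsl:value-of select="count(../../Goal[@Scorer = $scorer])"/>
         <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
   </xsl:template>
</xsl:stylesheet>
```

For the Liverpool team in the sample document, this yields "O'Reilly 2" and "Smith 1", matching the output requested in the question.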

The third column in the table simply shows what the generate-id() function returns for the first Scorer attribute with each unique value. Again, remember the two rules in the preceding list: a generated ID must be unique to its node, and an XSLT processor is free to use any ID-generation algorithm it wants as long as it yields the same ID for the same node in a given processing instance. Here's how Microsoft's MSXML processor formats the table in a typical instance:

If you use the Saxon processor to transform the source tree with the above XSLT code, you might get something like the following instead (results obtained from Saxon 6.2):

[Figure: XSLT keys generated by Saxon 6.2 processor]

Note (in the third column) that the two processors employ quite different rules for generating the IDs, but still produce the same results in the important columns: counts of goals, grouped by scorer. Remove that third column and the two processors (as should all other compliant processors) produce identical results.