What's New in XSLT 2.0

April 10, 2002

In my previous article, I went through a brief overview of some of the features that XPath 2.0 offers over and above XPath 1.0. We saw that XPath 2.0 represents a significant increase in functionality for XSLT users. In this article, we'll take a look at some of the new features specific to XSLT 2.0, as outlined in the latest working draft. Again, this assumes that you are familiar with the basics of XSLT/XPath 1.0.

XSLT 2.0 and XPath 2.0

XSLT 2.0 goes hand in hand with XPath 2.0. The two languages are specified separately and have separate requirements documents only because XPath 2.0 is also meant to be used in contexts other than XSLT, such as XQuery 1.0. But for the purposes of XSLT users, the two are linked together. You can't use XPath 2.0 with XSLT 1.0 or XPath 1.0 with XSLT 2.0. (At least, the W3C is not currently proposing any such combination.)

Note

Ever wonder what happened to XSLT 1.1? It's been canceled. Officially as of August 2001, and actually as of a number of months before then, the XSL Working Group ceased work on XSLT 1.1, instead focusing its efforts on the development of XSLT and XPath 2.0, carrying forward the requirements for XSLT 1.1 into the requirements for XSLT 2.0.

A Welcome Arrival

A new version of XSLT has been eagerly anticipated in the XSLT user community for some time. As is true of the first versions of many languages, it did not become clear which extensions to the language would prove most important until there had been some real-world experience with it. Since November 16, 1999, when XSLT 1.0 became a recommendation, it has become quite apparent that certain areas of missing functionality are due for inclusion in the next version of the language. In this article, we'll show how XSLT 2.0 addresses four of these areas:

Conversion of result tree fragments to node-sets
Multiple output documents
Built-in support for grouping
User-defined functions (implemented in XSLT)

Death To the Result Tree Fragment!

In XSLT 1.0 the result tree fragment (RTF) type is like a node-set, but it is really a second-class citizen. An RTF is what you get whenever you use xsl:variable to construct a temporary tree. The problem is that you can't then use an XPath expression to access the innards of this tree, unless you use a vendor-specific extension function, usually called something like node-set(), to convert the RTF into a first-class node-set (consisting of one root node). The rationale for the RTF data type was that it would reduce implementation burden, but since almost all existing XSLT processors provide their own version of a node-set() extension function anyway, that consideration has become moot. In any case, the need to overcome this limitation has been clear for some time, as it is important to be able to break up complex transformations into sequences of simpler transformations.

If you haven't guessed already, XSLT 2.0 has shown RTFs the door. Now when you use xsl:variable to create a temporary tree, the value of that variable is a true node-set. Actually, in XPath 2.0 terms, it is a true node sequence, consisting of one document node, which is XPath 2.0's name for what XPath 1.0 called a "root node". With that sequence you can then use path expressions to drill down inside the tree, apply templates to it, and so on, just like you would with any other source document. With XSLT 2.0, there is no longer a need for the node-set() extension function.
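
To make this concrete, here is a minimal sketch of a two-pass transformation that needs no extension function. The variable name pass1 and the entry and sorted elements are invented for illustration; only the general mechanism (a temporary tree that can be queried with a path expression) comes from the description above.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- First pass: build a temporary tree.
         The element and attribute names are hypothetical. -->
    <xsl:variable name="pass1">
      <entry key="b">second</entry>
      <entry key="a">first</entry>
    </xsl:variable>

    <!-- Second pass: query the temporary tree directly.
         In XSLT 1.0, $pass1/entry would be an error unless
         $pass1 were first wrapped in a node-set() extension call. -->
    <sorted>
      <xsl:for-each select="$pass1/entry">
        <xsl:sort select="@key"/>
        <xsl:copy-of select="."/>
      </xsl:for-each>
    </sorted>
  </xsl:template>

</xsl:stylesheet>
```

The same pattern extends to longer pipelines: each stage writes its result into a variable, and the next stage applies templates to that variable.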

Enabling Multiple Output Documents

Another extension which many XSLT 1.0 processors provide is support for multiple output documents. This extension has proven very useful, especially for statically generating web sites containing multiple pages. The problem with extensions is that they aren't standard. Each XSLT processor has a different extension element for doing this, e.g. saxon:output, xt:document, etc.

XSLT 2.0 provides a standard way to output multiple documents, using the xsl:result-document element. The following example stylesheet constructs multiple output documents: one "principal result document" and a variable number of "secondary result documents". The principal result document will be serialized as XHTML, and the secondary result documents will be serialized as text.


<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns="http://www.w3.org/1999/xhtml">

  <xsl:output method="xhtml"/>
  <xsl:output method="text" name="textFormat"/>

  <xsl:template match="/">
    <html>
      <head>
        <title>Links to text documents</title>
      </head>
      <body>
        <p>Here is a list of links to text files:</p>
        <ul>
          <xsl:apply-templates select="//textBlob"/>
        </ul>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="textBlob">
    <xsl:variable name="uri" select="concat('text', position(), '.txt')"/>
    <li>
      <a href="{$uri}">
        <xsl:value-of select="$uri"/>
      </a>
    </li>
    <xsl:result-document href="{$uri}" format="textFormat">
      <xsl:value-of select="."/>
    </xsl:result-document>
  </xsl:template>

</xsl:stylesheet>

The href attribute of xsl:result-document is used to assign the URI of the corresponding output document. For many processors, this will mean writing the document to a file with that name. The format attribute refers to a named output definition. In this case, it points to the xsl:output element that we appropriately named textFormat.

Another thing worth noting from the example above is the use of the XHTML output method, newly introduced in XSLT 2.0.

Grouping Simplified

XSLT 1.0 did not include built-in support for grouping. Many grouping problems can be solved using techniques such as the Muenchian Method, but such solutions tend to be complex and verbose. One of the requirements for XSLT 2.0 was that it must simplify grouping. As we shall see from a simple example below, it is well on its way to meeting that goal.
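
For readers who haven't met it, the Muenchian Method indexes the nodes to be grouped with xsl:key and then uses generate-id() to pick out the first node in each key bucket. The following is a minimal XSLT 1.0 sketch, assuming a source document whose city elements carry country and pop attributes. The key name cities-by-country and the countries and country output elements are made up for this illustration.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Index every city element by its country attribute.
       The key name is arbitrary. -->
  <xsl:key name="cities-by-country" match="city" use="@country"/>

  <xsl:template match="/">
    <countries>
      <!-- Select only the first city in each key bucket,
           giving one iteration per distinct country -->
      <xsl:for-each select="//city[generate-id() =
                 generate-id(key('cities-by-country', @country)[1])]">
        <!-- The key lookup retrieves the whole group again -->
        <country name="{@country}"
                 pop="{sum(key('cities-by-country', @country)/@pop)}"/>
      </xsl:for-each>
    </countries>
  </xsl:template>

</xsl:stylesheet>
```

The technique works, but the grouping logic is spread across a top-level declaration, a generate-id() comparison, and repeated key() calls, which is the verbosity referred to above.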

An example that's used in both the Requirements document and the XSLT 2.0 working draft involves converting the list of cities in the following simple XML document,


<cities>
  <city name="milan"  country="italy"   pop="5"/>
  <city name="paris"  country="france"  pop="7"/>
  <city name="munich" country="germany" pop="4"/>
  <city name="lyon"   country="france"  pop="2"/>
  <city name="venice" country="italy"   pop="1"/>
</cities>

to an HTML table that groups the cities by the country they are in, as follows:


<table>
   <tr>
      <th>Country</th>
      <th>City List</th>
      <th>Population</th>
   </tr>
   <tr>
      <td>italy</td>
      <td>milan, venice</td>
      <td>6</td>
   </tr>
   <tr>
      <td>france</td>
      <td>paris, lyon</td>
      <td>9</td>
   </tr>
   <tr>
      <td>germany</td>
      <td>munich</td>
      <td>4</td>
   </tr>
</table>

The difficult part of this transformation is generating the last three rows, one for each country. An XSLT 1.0 solution can be seen below:


  <xsl:for-each select="cities/city[not(@country =
                           preceding::*/@country)]">
    <tr>
      <td><xsl:value-of select="@country"/></td>
      <td>
        <xsl:for-each select="../city[@country = current()/@country]">
          <xsl:value-of select="@name"/>
          <xsl:if test="position() != last()">, </xsl:if>
        </xsl:for-each>
      </td>
      <td><xsl:value-of select="sum(../city[@country =
                 current()/@country]/@pop)"/></td>
    </tr>
  </xsl:for-each>

In the above example, we first identify the first city for each unique country, which is selected by the following XPath expression:

cities/city[not(@country = preceding::*/@country)]

Then, for each group, we need to be able to refer back to all other members of the group, in order to get the list of city names for each country as well as the total population for each country. In each case, we have some redundancy because the only way to refer to the current group is with an expression such as the following:

../city[@country = current()/@country]

This is clearly not an ideal situation, since the redundancy makes the stylesheet error-prone. Enter xsl:for-each-group, XSLT 2.0's answer to many of your grouping problems. The following example shows the much simpler XSLT 2.0 solution to this problem. The new features to look for are the xsl:for-each-group element with its group-by attribute, the current-group() function, and the separator attribute on xsl:value-of:


  <xsl:for-each-group select="cities/city" group-by="@country">
    <tr>
      <td><xsl:value-of select="@country"/></td>
      <td>
        <xsl:value-of select="current-group()/@name" separator=", "/>
      </td>
      <td><xsl:value-of select="sum(current-group()/@pop)"/></td>
    </tr>
  </xsl:for-each-group>

In the above example, xsl:for-each-group initializes the "current group" as part of the XPath evaluation context. The current group is simply a sequence. Once we've set up our group using the group-by attribute, we can thereafter refer to the current group using the current-group() function. This completely eliminates the redundancy that was present in the XSLT 1.0 solution.

Note also the separator attribute on xsl:value-of. The mere presence of this attribute instructs the processor to output not just the string value of the first member of the sequence (XSLT 1.0's behavior), but the string values of all members of the sequence, in sequence order. The separator attribute is optional; its value is a string used as a delimiter between adjacent strings in the output. For the sake of backward compatibility with XSLT 1.0, only the string value of the sequence's first member is output when the separator attribute is not present.

Finally, xsl:for-each-group can solve different kinds of grouping problems depending on which of three attributes you choose: group-by (which we saw in action above), group-adjacent (which enables grouping based on adjacency of nodes in document order, e.g. transforming inline <para> elements into block <para> elements), and group-starting-with (which groups by patterns of elements in a sequence). Examples of each of these can be found in the latest XSLT 2.0 Working Draft in "13.3 Examples of Grouping".
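
As a rough illustration of group-starting-with, the sketch below assumes a flat body element whose children are h1 headings, each followed by the p paragraphs that belong to it, and wraps each heading with its paragraphs in a section. The input vocabulary and the doc, section, and title output names are assumptions made for this example, and the attribute syntax reflects my reading of the feature, so details may differ between drafts.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="body">
    <doc>
      <!-- A new group begins at every h1 child of body -->
      <xsl:for-each-group select="*" group-starting-with="h1">
        <section>
          <!-- Inside the loop, the context node is the first
               member of the current group, here the h1 -->
          <title><xsl:value-of select="."/></title>
          <!-- Copy only the paragraphs that follow this heading -->
          <xsl:copy-of select="current-group()[self::p]"/>
        </section>
      </xsl:for-each-group>
    </doc>
  </xsl:template>

</xsl:stylesheet>
```

This sketch assumes that body begins with an h1. If paragraphs come before the first heading, they form a leading group of their own, and the title would then be taken from a paragraph.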

User-defined Functions

XSLT 2.0 introduces the ability for users to define their own functions which can then be used in XPath expressions. This is an extremely powerful mechanism that should prove to be very useful. Stylesheet functions, as they are called, are defined using the xsl:function element. This element has one required attribute, the name attribute. It contains zero or more xsl:param elements, followed by zero or more xsl:variable elements, followed by exactly one xsl:result element. This restricted content model may sound limiting, but you will discover that the real power lies in the use of XPath 2.0 to define the result in the select attribute of the xsl:result element. As you may recall, XPath 2.0 includes the ability to do conditional expressions (if...then) and iterative expressions (for...return).

As the following example (taken straight from the latest working draft) shows, most of the work is done inside the select attribute of xsl:result. This stylesheet invokes the user's recursively-defined function, str:reverse(), to output the string "MAN BITES DOG".


<xsl:transform
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:str="http://user.com/namespace"
  version="2.0"
  exclude-result-prefixes="str">

<xsl:function name="str:reverse">
  <xsl:param name="sentence"/>
  <xsl:result
     select="if (contains($sentence, ' '))
             then concat(str:reverse(substring-after($sentence, ' ')),
                         ' ',
                         substring-before($sentence, ' '))
             else $sentence"/>
</xsl:function>

<xsl:template match="/">
<output>
  <xsl:value-of select="str:reverse('DOG BITES MAN')"/>
</output>
</xsl:template>

</xsl:transform>
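
The working draft's example exercises the conditional expression. To show the iterative (for...return) form as well, here is a second sketch in the same draft syntax. The function name my:total-pop is hypothetical, the stylesheet assumes the cities document from the grouping section as its input, and it has not been tested against any particular processor, so treat it as an illustration of the content model described above, not as a verified listing.

```xml
<xsl:transform
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:my="http://user.com/namespace"
  version="2.0"
  exclude-result-prefixes="my">

<!-- Hypothetical function: total population of a sequence
     of city elements, each with a pop attribute -->
<xsl:function name="my:total-pop">
  <xsl:param name="cities"/>
  <!-- The for expression maps each city to a number;
       sum() then adds up the resulting sequence -->
  <xsl:result
     select="sum(for $c in $cities return number($c/@pop))"/>
</xsl:function>

<xsl:template match="/">
<output>
  <!-- Pass the Italian cities to the function -->
  <xsl:value-of select="my:total-pop(cities/city[@country = 'italy'])"/>
</output>
</xsl:template>

</xsl:transform>
```

In this case sum($cities/@pop) would do the same job more briefly. The for expression earns its keep when each item needs some computation before the results are combined.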

Other Useful Stuff

XSLT 2.0 includes a number of other useful features that we won't go into in detail here. They include a mechanism for defining a default namespace for XPath expressions, the ability to use variables in match pattern predicates, named sort specifications, the ability to read external files as unparsed text, and so on.

In addition, a large part of the XSLT 2.0 specification remains to be written, particularly the material dealing with the construction and copying of W3C XML Schema-typed content. About this, the latest working draft says, "This is work in progress. Facilities for associating type information with constructed elements and attributes are likely to appear in future drafts of XSLT 2.0."

Getting Your Hands Dirty

For those of you who can't wait to start trying some of this stuff out, Michael Kay has released Saxon 7.0, which includes an "experimental implementation of XSLT 2.0 and XPath 2.0". It implements a number of features in the XSLT 2.0 and XPath 2.0 working drafts, with particular attention to those features that are likely the most stable. I've tested each of the examples in this article, and Saxon 7.0 executes them all as expected.

XSLT 2.0 is still very much a work in progress, so be forewarned that a number of things could change between now and the time it reaches Recommendation status. Until then, the public is encouraged to review the specification and send their comments to xsl-editors@w3.org.