What's New in XSLT 2.0
April 10, 2002
In my previous article, I went through a brief overview of some of the features that XPath 2.0 offers over and above XPath 1.0. We saw that XPath 2.0 represents a significant increase in functionality for XSLT users. In this article, we'll take a look at some of the new features specific to XSLT 2.0, as outlined in the latest working draft. Again, this assumes that you are familiar with the basics of XSLT/XPath 1.0.
XSLT 2.0 and XPath 2.0
XSLT 2.0 goes hand in hand with XPath 2.0. The two languages are specified separately and have separate requirements documents only because XPath 2.0 is also meant to be used in contexts other than XSLT, such as XQuery 1.0. But for the purposes of XSLT users, the two are linked together. You can't use XPath 2.0 with XSLT 1.0 or XPath 1.0 with XSLT 2.0. (At least, the W3C is not currently proposing any such combination.)
Note
Ever wonder what happened to XSLT 1.1? It's been canceled. Officially as of August 2001, and actually as of a number of months before then, the XSL Working Group ceased work on XSLT 1.1, instead focusing its efforts on the development of XSLT and XPath 2.0, carrying forward the requirements for XSLT 1.1 into the requirements for XSLT 2.0.
A Welcome Arrival
A new version of XSLT has been heavily anticipated in the XSLT user community for some time. As is true with the first versions of many languages, it did not become clear which extensions to the language would prove to be the most important until there had been some real-world experience with it. Since November 16, 1999, when XSLT 1.0 became a recommendation, it has become quite apparent that certain areas of missing functionality are due for inclusion in the next version of the language. In this article, we'll show how XSLT 2.0 addresses four of these areas:
- Conversion of result tree fragments to node-sets
- Multiple output documents
- Built-in support for grouping
- User-defined functions (implemented in XSLT)
Death To the Result Tree Fragment!
In XSLT 1.0 the result tree fragment (RTF) type is like a node-set, but it is really a second-class citizen. An RTF is what you get whenever you use xsl:variable to construct a temporary tree. The problem is that you can't then use an XPath expression to access the innards of this tree, unless you use a vendor-specific extension function, usually called something like node-set(), to convert the RTF into a first-class node-set (consisting of one root node). The rationale for the RTF data type was that it would reduce implementation burden, but since almost all existing XSLT processors provide their own version of a node-set() extension function anyway, that consideration has become moot. In any case, the need to overcome this limitation has been clear for some time, as it is important to be able to break up complex transformations into sequences of simpler transformations.
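To make the limitation concrete, here is a minimal sketch of the XSLT 1.0 workaround. It assumes a processor that supports the EXSLT common module, whose exsl:node-set() function is one widely available flavor of the extension described above. The colors variable and the color, list, and item element names are hypothetical and exist only for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">

  <!-- In XSLT 1.0 this variable holds a result tree fragment (RTF) -->
  <xsl:variable name="colors">
    <color>red</color>
    <color>green</color>
  </xsl:variable>

  <xsl:template match="/">
    <list>
      <!-- select="$colors/color" would be an error in XSLT 1.0,
           so the RTF is converted to a node-set first -->
      <xsl:for-each select="exsl:node-set($colors)/color">
        <item><xsl:value-of select="."/></item>
      </xsl:for-each>
    </list>
  </xsl:template>
</xsl:stylesheet>
```

A stylesheet written this way runs only on processors that implement that particular extension function, which is exactly the portability problem at issue.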
If you haven't guessed already, XSLT 2.0 has shown RTFs the door. Now when you use xsl:variable to create a temporary tree, the value of that variable is a true node-set. Actually, in XPath 2.0 terms, it is a true node sequence, consisting of one document node, which is XPath 2.0's name for what XPath 1.0 called a "root node". With that sequence you can then use path expressions to drill down inside the tree, apply templates to it, and so on, just like you would with any other source document. With XSLT 2.0, there is no longer a need for the node-set() extension function.
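As a minimal sketch of what this looks like in practice, the following XSLT 2.0 stylesheet builds a temporary tree in a variable and then navigates into it with an ordinary path expression. The colors variable and the color, list, and item element names are hypothetical, chosen only for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- In XSLT 2.0 this variable holds a document node, not an RTF -->
  <xsl:variable name="colors">
    <color>red</color>
    <color>green</color>
  </xsl:variable>

  <xsl:template match="/">
    <list>
      <!-- The path expression steps directly into the temporary tree -->
      <xsl:for-each select="$colors/color">
        <item><xsl:value-of select="."/></item>
      </xsl:for-each>
    </list>
  </xsl:template>
</xsl:stylesheet>
```

The same technique lets the output of one set of templates become the input of another, which is how a complex transformation can be split into simpler passes.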
Enabling Multiple Output Documents
Another extension which many XSLT 1.0 processors provide is support for multiple output documents. This extension has proven very useful, especially for statically generating web sites containing multiple pages. The problem with extensions is that they aren't standard. Each XSLT processor has a different extension element for doing this, e.g. saxon:output, xt:document, etc.
XSLT 2.0 provides a standard way to output multiple documents, using the xsl:result-document element. The following example stylesheet constructs multiple output documents, one "principal result document" and a variable number of "secondary result documents". The principal result document will be serialized as XHTML, and the secondary result documents will be serialized as text.
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml">

  <xsl:output method="xhtml"/>
  <xsl:output method="text" name="textFormat"/>

  <xsl:template match="/">
    <html>
      <head>
        <title>Links to text documents</title>
      </head>
      <body>
        <p>Here is a list of links to text files:</p>
        <ul>
          <xsl:apply-templates select="//textBlob"/>
        </ul>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="textBlob">
    <xsl:variable name="uri" select="concat('text', position(), '.txt')"/>
    <li>
      <a href="{$uri}">
        <xsl:value-of select="$uri"/>
      </a>
    </li>
    <xsl:result-document href="{$uri}" format="textFormat">
      <xsl:value-of select="."/>
    </xsl:result-document>
  </xsl:template>

</xsl:stylesheet>
The href attribute of xsl:result-document is used to assign the URI of the corresponding output document. For many processors, this will mean writing the document to a file with that name. The format attribute refers to a named output definition. In this case, it points to the xsl:output element that we appropriately named textFormat.
Another thing worth noting from the example above is the use of the XHTML output method, newly introduced in XSLT 2.0.
Grouping Simplified
XSLT 1.0 did not include built-in support for grouping. Certain grouping problems can be solved using various techniques, such as the Muenchian Method, but such solutions tend to be rather complex and verbose. One of XSLT 2.0's requirements was that it must simplify grouping. As we shall see from a simple example below, it is well on its way to meeting that goal.
An example that's used in both the Requirements document and the XSLT 2.0 working draft involves converting the list of cities in the following simple XML document,
<cities>
  <city name="milan" country="italy" pop="5"/>
  <city name="paris" country="france" pop="7"/>
  <city name="munich" country="germany" pop="4"/>
  <city name="lyon" country="france" pop="2"/>
  <city name="venice" country="italy" pop="1"/>
</cities>
to an HTML table that groups the cities by the country they are in, as follows:
<table>
  <tr>
    <th>Country</th>
    <th>City List</th>
    <th>Population</th>
  </tr>
  <tr>
    <td>italy</td>
    <td>milan, venice</td>
    <td>6</td>
  </tr>
  <tr>
    <td>france</td>
    <td>paris, lyon</td>
    <td>9</td>
  </tr>
  <tr>
    <td>germany</td>
    <td>munich</td>
    <td>4</td>
  </tr>
</table>
The difficult part of this transformation is generating the last three rows, one for each country. An XSLT 1.0 solution can be seen below:
<xsl:for-each select="cities/city[not(@country = preceding::*/@country)]">
  <tr>
    <td><xsl:value-of select="@country"/></td>
    <td>
      <xsl:for-each select="../city[@country = current()/@country]">
        <xsl:value-of select="@name"/>
        <xsl:if test="position() != last()">, </xsl:if>
      </xsl:for-each>
    </td>
    <td><xsl:value-of select="sum(../city[@country = current()/@country]/@pop)"/></td>
  </tr>
</xsl:for-each>
In the above example, we first identify the first city for each unique country, which is selected by the following XPath expression:
cities/city[not(@country = preceding::*/@country)]
Then, for each group, we need to be able to refer back to all other members of the group, in order to get the list of city names for each country as well as the total population for each country. In each case, we have some redundancy because the only way to refer to the current group is with an expression such as the following:
../city[@country = current()/@country]
This is clearly not an ideal situation, since the redundancy tends to make the stylesheet rather error-prone. Enter xsl:for-each-group, XSLT 2.0's answer to many of your grouping problems. The following example shows the much simpler XSLT 2.0 solution to this problem:
<xsl:for-each-group select="cities/city" group-by="@country">
  <tr>
    <td><xsl:value-of select="@country"/></td>
    <td>
      <xsl:value-of select="current-group()/@name" separator=", "/>
    </td>
    <td><xsl:value-of select="sum(current-group()/@pop)"/></td>
  </tr>
</xsl:for-each-group>
In the above example, xsl:for-each-group initializes the "current group" as part of the XPath evaluation context. The current group is simply a sequence. Once we've set up our group using the group-by attribute, we can thereafter refer to the current group using the current-group() function. This completely eliminates the redundancy that was present in the XSLT 1.0 solution.
Note also the separator attribute on xsl:value-of. The mere presence of this attribute instructs the processor to output not just the string value of the first member of the sequence (XSLT 1.0's behavior), but the string values of all members of the sequence, in sequence order. The value of the separator attribute is an optional string that is used as a delimiter between each string in the output. For the sake of backward compatibility with XSLT 1.0, only the sequence's first member's string value is output when the separator attribute is not present.
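The separator attribute is also useful on its own, outside of grouping. As a small sketch, assuming the cities document shown earlier as input, the following stylesheet writes every city name in document order with a delimiter between them. The " | " delimiter is an arbitrary choice for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Emits the string value of each selected name attribute,
         in document order, joined by the separator string -->
    <xsl:value-of select="cities/city/@name" separator=" | "/>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input this should produce the single line "milan | paris | munich | lyon | venice".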
Finally, xsl:for-each-group is able to solve different kinds of grouping problems depending on which of the three attributes you choose from: group-by (which we saw in action above), group-adjacent (which enables grouping based on adjacency of nodes in document order, e.g. transforming inline <para> elements into block <para> elements), and group-starting-with (which groups by patterns of elements in a sequence). Examples of each of these can be found in the latest XSLT 2.0 Working Draft in "13.3 Examples of Grouping".
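To give a flavor of group-starting-with, here is a minimal sketch that is not taken from the working draft. It assumes a hypothetical flat input in which a body element contains a run of h1 and p children, and it wraps each h1 together with the elements that follow it in a section element. The element names are illustrative assumptions only.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="body">
    <body>
      <!-- A new group begins at every child that matches the pattern h1 -->
      <xsl:for-each-group select="*" group-starting-with="h1">
        <section>
          <!-- current-group() holds the h1 and everything up to the next h1 -->
          <xsl:copy-of select="current-group()"/>
        </section>
      </xsl:for-each-group>
    </body>
  </xsl:template>
</xsl:stylesheet>
```

Given <body><h1>Intro</h1><p>One</p><p>Two</p><h1>Details</h1><p>Three</p></body>, this sketch should yield two section elements, the first holding the "Intro" heading with its two paragraphs and the second holding "Details" with its one paragraph.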
User-defined Functions
XSLT 2.0 introduces the ability for users to define their own functions which can then be used in XPath expressions. This is an extremely powerful mechanism that should prove to be very useful. Stylesheet functions, as they are called, are defined using the xsl:function element. This element has one required attribute, the name attribute. It contains zero or more xsl:param elements, followed by zero or more xsl:variable elements, followed by exactly one xsl:result element. This restricted content model may sound limiting, but you will discover that the real power lies in the use of XPath 2.0 to define the result in the select attribute of the xsl:result element. As you may recall, XPath 2.0 includes the ability to do conditional expressions (if...then) and iterative expressions (for...return).
As the following example (taken straight from the latest working draft) shows, most of the work is done inside the select attribute of xsl:result. This stylesheet invokes the user's recursively-defined function, str:reverse(), to output the string "MAN BITES DOG".
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:str="http://user.com/namespace"
    version="2.0"
    exclude-result-prefixes="str">

  <xsl:function name="str:reverse">
    <xsl:param name="sentence"/>
    <xsl:result select="if (contains($sentence, ' '))
                        then concat(str:reverse(substring-after($sentence, ' ')),
                                    ' ',
                                    substring-before($sentence, ' '))
                        else $sentence"/>
  </xsl:function>

  <xsl:template match="/">
    <output>
      <xsl:value-of select="str:reverse('DOG BITES MAN')"/>
    </output>
  </xsl:template>

</xsl:transform>
Other Useful Stuff
XSLT 2.0 includes a number of other useful features that we won't go into in detail here. They include a mechanism for defining a default namespace for XPath expressions, the ability to use variables in match pattern predicates, named sort specifications, the ability to read external files as unparsed text, and so on.
In addition, a large part of the XSLT 2.0 specification remains to be written, particularly the material dealing with the construction and copying of W3C XML Schema-typed content. About this, the latest working draft says, "This is work in progress. Facilities for associating type information with constructed elements and attributes are likely to appear in future drafts of XSLT 2.0."
Getting Your Hands Dirty
For those of you who can't wait to start trying some of this stuff out, Michael Kay has released Saxon 7.0, which includes an "experimental implementation of XSLT 2.0 and XPath 2.0". It implements a number of features in the XSLT 2.0 and XPath 2.0 working drafts, with particular attention to those features that are likely the most stable. I've tested each of the examples in this article, and Saxon 7.0 executes them all as expected.
XSLT 2.0 is still very much a work in progress, so be forewarned that a number of things could change between now and the time it reaches Recommendation status. Until then, the public is encouraged to review the specification and send their comments to xsl-editors@w3.org.