Writing Your Own Functions in XSLT 2.0
September 3, 2003
Most XSLT 1.0 processors, particularly the ones written in Java, let you write extension functions in the processor's host language, link them in, and then call those functions from stylesheets. The XSLT 1.0 spec spells out specific ways to check whether a particular extension function is available and how to recover gracefully if not. In the September 2001 "Transforming XML" column, I presented examples of extension elements and functions.
If you wanted to write your own functions within a stylesheet, there were ways to fake it with named templates, but faking it won't be necessary with XSLT 2.0, which lets you write your own functions using XSLT syntax. These functions return values that can be used all over your stylesheet, even in XPath expressions.
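For contrast, here is a minimal sketch of the XSLT 1.0 named-template workaround. The template name equalCI and its parameter names are hypothetical, chosen only for illustration, and because XSLT 1.0 has no upper-case() function the sketch assumes that translate() over the 26 unaccented Latin letters is good enough.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Hypothetical helper: outputs 'true' if the two strings match when
       case is ignored. Handles only the 26 unaccented Latin letters. -->
  <xsl:template name="equalCI">
    <xsl:param name="string1"/>
    <xsl:param name="string2"/>
    <xsl:variable name="lower" select="'abcdefghijklmnopqrstuvwxyz'"/>
    <xsl:variable name="upper" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'"/>
    <xsl:value-of select="translate($string1, $lower, $upper) =
                          translate($string2, $lower, $upper)"/>
  </xsl:template>

  <xsl:template match="/">
    <!-- The template's output has to be captured in a variable before
         it can be tested; it cannot be called from inside an XPath
         expression. -->
    <xsl:variable name="same">
      <xsl:call-template name="equalCI">
        <xsl:with-param name="string1" select="'red'"/>
        <xsl:with-param name="string2" select="'Red'"/>
      </xsl:call-template>
    </xsl:variable>
    <xsl:if test="$same = 'true'">red and Red match</xsl:if>
  </xsl:template>
</xsl:stylesheet>
```

The call takes six lines of markup and an intermediate variable, which is the overhead that a real function declaration removes.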
Let's look at a simple example. The following stylesheet creates a result tree upon seeing the root of any document, so you can run it with itself as input. It declares a function called foo:compareCI, which does a case-insensitive comparison of two strings and returns the same values as the XPath 2.0 compare() function described in last month's column.
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:foo="http://whatever">

  <!-- Compare two strings ignoring case, returning same
       values as compare(). -->
  <xsl:function name="foo:compareCI">
    <xsl:param name="string1"/>
    <xsl:param name="string2"/>
    <xsl:value-of select="compare(upper-case($string1),upper-case($string2))"/>
  </xsl:function>

  <xsl:template match="/">
compareCI red,blue: <xsl:value-of select="foo:compareCI('red','blue')"/>
compareCI red,red: <xsl:value-of select="foo:compareCI('red','red')"/>
compareCI red,Red: <xsl:value-of select="foo:compareCI('red','Red')"/>
compareCI red,Yellow: <xsl:value-of select="foo:compareCI('red','Yellow')"/>
  </xsl:template>

</xsl:stylesheet>
The first thing to notice is that the declared function must come from a namespace outside of the XSLT namespace. In the example I assigned a namespace prefix of foo to the http://whatever URL to make it clear that you can use any namespace, as long as it's not the XSLT namespace. The URL I specified wasn't serious, but works anyway. You'll probably want to pick a URL associated with your company or project.
The actual function declaration in the sample stylesheet is in an xsl:function element. Its structure is pretty straightforward: a name attribute stores the function's name, and optional xsl:param child elements name parameters that can be passed to the function, just like xsl:param elements do in XSLT 1.0's xsl:template elements. In the example above, the two parameters passed are the two strings to be compared.
The function's only remaining line is an xsl:value-of instruction, which uses XPath 2.0's compare() and upper-case() functions to perform its comparison and output the result. The return value of the function is the sequence of nodes that it outputs. If you want, you can add an as attribute to the xsl:function element to indicate a specific data type that the function returns. Because my foo:compareCI() function returns the integer returned by its call to the compare() function, I could have added an as="xs:integer" attribute value to the xsl:function element (which would have required declaration of the http://www.w3.org/2001/XMLSchema namespace to go with that "xs" prefix), but I wanted to keep my first example function as simple as possible.
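For comparison, here is a sketch of what the typed version might look like. It assumes a processor that follows the final XSLT 2.0 Recommendation, where xsl:sequence returns the integer itself instead of wrapping it in a text node. The as attributes on the two parameters are my own addition, and a processor built against an earlier working draft may not accept every detail.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:foo="http://whatever">

  <!-- Same comparison as before, with the return type and the
       parameter types declared. -->
  <xsl:function name="foo:compareCI" as="xs:integer">
    <xsl:param name="string1" as="xs:string"/>
    <xsl:param name="string2" as="xs:string"/>
    <!-- xsl:sequence passes the xs:integer from compare() straight
         through as the function's result. -->
    <xsl:sequence select="compare(upper-case($string1), upper-case($string2))"/>
  </xsl:function>

  <xsl:template match="/">
    <xsl:value-of select="foo:compareCI('red','Red')"/>
  </xsl:template>

</xsl:stylesheet>
```

Declaring the types costs one extra namespace declaration, and in exchange the processor can reject a call such as foo:compareCI('red', 3) as a type error instead of silently comparing the wrong things.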
When run with Saxon 7's experimental XSLT 2.0 support, this stylesheet creates the following output:
<?xml version="1.0" encoding="UTF-8"?>
compareCI red,blue: 1
compareCI red,red: 0
compareCI red,Red: 0
compareCI red,Yellow: -1
The third line is the most important here because it shows that the function considers "red" and "Red" to be equal. (See last month's column for the meaning of the various return values.)
XSLT 2.0 functions can be recursive. The following stylesheet includes a substring function that expects you to pass it a string (inString) and the length of a substring to pull from that string (length), starting at its first character. Instead of always breaking after length characters, though, this function only breaks there if it finds a word boundary character. Otherwise, it breaks at the last word boundary before that. It does this by calling itself with the same inString value and a length value of length - 1.
Before making each recursive call, the function's xsl:choose element's first xsl:when element checks whether $length is less than or equal to 0 and returns the entire string if so, because if $length was decremented that far, there's no point in continuing. The second xsl:when element checks whether the passed string is already no longer than the requested length, in which case it just returns the whole string. The third and last xsl:when element checks whether the character immediately after position $length in $inString is a member of the list of delimiter characters defined near the beginning of the stylesheet, and if so, returns the first $length characters of the string, because its job is done. If none of these conditions are true, the xsl:otherwise element makes the recursive call.
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:foo="http://whatever">

  <xsl:output method="text"/>

  <!-- Word boundary characters; the list begins with a space. -->
  <xsl:variable name="delimiters"> ,."!?()</xsl:variable>

  <xsl:function name="foo:substrWordBoundary">
    <xsl:param name="inString"/>
    <xsl:param name="length"/>
    <xsl:choose>
      <!-- Decremented all the way down: give up and return everything. -->
      <xsl:when test="$length &lt;= 0">
        <xsl:value-of select="$inString"/>
      </xsl:when>
      <!-- String already fits within the requested length. -->
      <xsl:when test="string-length($inString) &lt;= $length">
        <xsl:value-of select="$inString"/>
      </xsl:when>
      <!-- Character after position $length is a delimiter: break here. -->
      <xsl:when test="contains($delimiters,substring($inString,$length + 1,1))">
        <xsl:value-of select="substring($inString,1,$length)"/>
      </xsl:when>
      <!-- Otherwise try again one character earlier. -->
      <xsl:otherwise>
        <xsl:value-of select="foo:substrWordBoundary($inString,$length - 1)"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <xsl:template match="/">
20 chars: <xsl:value-of select="foo:substrWordBoundary('This is a test.Right? Yes.',20)"/>
10 chars: <xsl:value-of select="foo:substrWordBoundary('This is a test.Right? Yes.',19)"/>
already short enough: <xsl:value-of select="foo:substrWordBoundary('catatonic',15)"/>
no boundaries: <xsl:value-of select="foo:substrWordBoundary('catatonic',5)"/>
  </xsl:template>

</xsl:stylesheet>
The four strings passed to the function test several possible outcomes. With any source document, the stylesheet creates this result:
20 chars: This is a test.Right
10 chars: This is a test
already short enough: catatonic
no boundaries: catatonic
What happens if we pass a bad parameter to the function? For example, what if we added this new line after the "no boundaries" line, passing the string "five" instead of a numeric digit as the second parameter?
bad parameter: <xsl:value-of select="foo:substrWordBoundary('catatonic','five')"/>
Without executing the function on any of the legitimate input, Saxon 7 immediately tells us about the following problem:
Error at xsl:choose on line 13 of file:/C:/dat/writing/trxml/temp/sswb1.xsl:
  Cannot compare xs:string to xs:integer
Transformation failed: Run-time errors were reported
The stronger typing offered by XSLT 2.0 lets us plan for this a little better. By adding an as attribute to the function's declaration for the length parameter, like this,
<xsl:param name="length" as="xs:integer"/>
we tell the XSLT processor to check the types of the parameters when they're passed, instead of waiting for the bad data to blow up in some line of the stylesheet that doesn't know what to do with it. (Don't forget to add xmlns:xs="http://www.w3.org/2001/XMLSchema" to the other namespace declarations in the stylesheet's start-tag.) With length declared using this typing, Saxon 7 catches the error sooner and delivers a more informative error message:
Error at xsl:value-of on line 35 of sswb2.xsl:
  Required type of second argument of *** call to user function ***() is xs:integer; supplied value has type xs:string
Transformation failed: Failed to compile stylesheet. 1 error detected.
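Putting those pieces together, a fully typed version of the function might look like the following sketch. Typing the inString parameter, declaring a return type, and using xsl:sequence in place of xsl:value-of are my own assumptions about how you might extend the idea under the final XSLT 2.0 Recommendation; only the as attribute on length comes from the example above.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:foo="http://whatever">

  <xsl:output method="text"/>

  <!-- Word boundary characters; the list begins with a space. -->
  <xsl:variable name="delimiters"> ,."!?()</xsl:variable>

  <xsl:function name="foo:substrWordBoundary" as="xs:string">
    <xsl:param name="inString" as="xs:string"/>
    <xsl:param name="length" as="xs:integer"/>
    <xsl:choose>
      <xsl:when test="$length &lt;= 0">
        <xsl:sequence select="$inString"/>
      </xsl:when>
      <xsl:when test="string-length($inString) &lt;= $length">
        <xsl:sequence select="$inString"/>
      </xsl:when>
      <xsl:when test="contains($delimiters, substring($inString, $length + 1, 1))">
        <xsl:sequence select="substring($inString, 1, $length)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:sequence select="foo:substrWordBoundary($inString, $length - 1)"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:function>

  <xsl:template match="/">
    <xsl:value-of select="foo:substrWordBoundary('This is a test.Right? Yes.', 20)"/>
  </xsl:template>

</xsl:stylesheet>
```

With every parameter typed, a call that passes 'five' as the second argument should be rejected at the call site, not somewhere inside the xsl:choose.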
Nearly all serious programming languages offer the ability to declare and use your own functions, and most programmers have become accustomed to the modularity and scalability advantages that this gives them. Now XSLT 2.0 developers will have these advantages as well.
Although you can declare and use your own functions in popular programming languages ranging from C to JavaScript, they don't quite count as functional languages. (Warning: readers with no trace of LISP/Scheme geek in them may want to stop reading now.)
If DSSSL is XSLT's parent, that makes Scheme its grandparent and LISP its great-grandparent. Between XSLT's xsl:function element and its idea of node sequences, I realized that I could implement the classic car and cdr functions that return either the first item or the remainder of a list, respectively. LISP does stand for "LISt Processing," after all, and not "Lots of Irritating Silly Parentheses". These two functions don't do much by themselves, but as two of the basic building blocks of LISP and later Scheme, they've provided the foundation for useful applications for over 40 years. (The origin of the names "car" and "cdr," pronounced "could-er," is one of the classic twisted stories of computer science history.)
After the following stylesheet declares these two functions, it outputs the sample input list delimited by pipe characters. It then tests the functions individually and combines them into a more complex expression to extract the third member of the list sequence:
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:foo="http://whatever">

  <xsl:output method="text"/>

  <xsl:variable name="seq1" select="('a','b','c','d')"/>

  <xsl:function name="foo:cdr">
    <xsl:param name="seq"/>
    <xsl:for-each select="subsequence($seq,2)">
      <xsl:value-of select="."/>
    </xsl:for-each>
  </xsl:function>

  <xsl:function name="foo:car">
    <xsl:param name="seq"/>
    <xsl:value-of select="item-at($seq,1)"/>
  </xsl:function>

  <xsl:template match="/">
seq1: <xsl:value-of select="string-join($seq1,'|')"/>
car(seq1): <xsl:value-of select="string-join(foo:car($seq1),'|')"/>
cdr(seq1): <xsl:value-of select="string-join(foo:cdr($seq1),'|')"/>
car(cdr(cdr(seq1))): <xsl:value-of select=
    "string-join(foo:car(foo:cdr(foo:cdr($seq1))),'|')"/>
  </xsl:template>

</xsl:stylesheet>
The output shows that it works. It may not look particularly useful, but it should provoke a smirk from some of the grayer-haired developers out there:
seq1: a|b|c|d
car(seq1): a
cdr(seq1): b|c|d
car(cdr(cdr(seq1))): c
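One caveat: the item-at() function used above comes from the working drafts and, as far as I know, did not make it into the final XPath 2.0 function library. A sketch that should work on a processor implementing the final Recommendations would use a positional predicate and let xsl:sequence return the items themselves. The as attributes here are my own addition.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:foo="http://whatever">

  <xsl:output method="text"/>

  <xsl:variable name="seq1" select="('a','b','c','d')"/>

  <!-- First item of the sequence, or the empty sequence if there is none. -->
  <xsl:function name="foo:car" as="item()?">
    <xsl:param name="seq" as="item()*"/>
    <xsl:sequence select="$seq[1]"/>
  </xsl:function>

  <!-- Everything after the first item. -->
  <xsl:function name="foo:cdr" as="item()*">
    <xsl:param name="seq" as="item()*"/>
    <xsl:sequence select="subsequence($seq, 2)"/>
  </xsl:function>

  <xsl:template match="/">
    <xsl:value-of select="string-join(foo:car(foo:cdr(foo:cdr($seq1))), '|')"/>
  </xsl:template>

</xsl:stylesheet>
```

Because xsl:sequence hands back the original items and not newly built text nodes, this version of foo:cdr no longer needs the xsl:for-each loop.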