Comparing XSLT and XQuery
March 9, 2005
XSLT has been the main XML technology for transformations for some time now, but it’s not the only player in the game. Although XQuery is designed for retrieving and interpreting information, it is also, according to the specification, “flexible enough to query a broad spectrum of XML information sources, including both databases and documents.”
In this article, we’ll be transforming the following XML source information from Cathy Kost, a beginning XML student who helps with a pot-bellied pig rescue organization.
<animal>
  <species>pot belly pig</species>
  <name>Molly II</name>
  <birth>February, 1998</birth>
  <in-date>January, 2000</in-date>
  <from>Middle Ave.</from>
  <gender spay-neuter="yes">F</gender>
  <info>
    She is a sweet, friendly pig who likes to hang out
    on Cathy’s porch on the lounge pad.
  </info>
  <picture>
    <file>images/molly_th.jpg</file>
    <description>Black pig</description>
    <caption>Molly in the Pasture</caption>
  </picture>
</animal>
We will develop a transformation in both XSLT and XQuery. The transformations will change the XML into several HTML pages with four pigs per page, and an index page with links to the pig description pages. Both transformations will use built-in extensions to create multiple output files.
Each pig’s <picture> element will become an <img> element in the resulting file. It’s good practice to put width and height attributes into image elements, but it’s a lot of work to have to look up each image’s dimensions. This is a perfect place for a user-defined extension function that, given an image’s file name, returns the image’s width and height.
Which Tools to Use?
For the XSLT transformation, we use the Apache Xalan XSLT processor. For XQuery, we use Qizx/open, which implements all features of the language except Schema import and validation.
The Main Differences
XSLT has a “processing engine” that automatically goes through the document tree and applies templates as it finds nodes; with XQuery the programmer has to direct the process. It’s almost like the difference between RPG (the business programming language, not role playing games) and procedure-oriented programming languages like C. In RPG, there is an implicit processing cycle, and you just set up the actions that you want to occur when certain conditions are met; in C, you are responsible for directing the algorithm.
XSLT is to XQuery as JavaScript is to Java. XSLT is untyped; conversions between nodes and strings and numbers are handled pretty much transparently. XQuery is a typed language which uses the types defined by XML Schema. XQuery will complain when it gets input that isn't of the appropriate type.
Global Setup
We want the number of pigs per page to be a global, user-settable parameter with a default value of four. In XSLT, we define this outside of any templates:
<xsl:param name="perPage" select="'4'"/>
In XQuery the following declaration appears as the first line in our query file:
declare variable $perPage as xs:integer := 4;
Both of these can be overridden by options on the command line. However, here is our first difference between XSLT and XQuery: any XSLT template may contain an <xsl:param>; that is how information gets passed among templates. XQuery’s declare variable defines global variables only, and cannot appear within a user-defined function.
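The same override can also be done programmatically rather than from the command line. The following is a minimal sketch, assuming the standard JAXP API (javax.xml.transform) and a cut-down stylesheet that only lists the first animal of each page; the class name PerPageParamDemo and the single-letter animal names are hypothetical and are not part of the article’s download.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical demo class; not part of the article's download. */
class PerPageParamDemo {

    // Cut-down stylesheet: prints the name of the first animal on each page.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:param name='perPage' select=\"'4'\"/>"
      + "<xsl:template match='/pig-rescue'>"
      + "<xsl:for-each select='animal[position() mod $perPage = 1]'>"
      + "<xsl:value-of select='name'/><xsl:text>,</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Five placeholder animals named A through E.
    static final String XML =
        "<pig-rescue>"
      + "<animal><name>A</name></animal><animal><name>B</name></animal>"
      + "<animal><name>C</name></animal><animal><name>D</name></animal>"
      + "<animal><name>E</name></animal>"
      + "</pig-rescue>";

    /** Runs the stylesheet; a null perPage keeps the stylesheet default of 4. */
    static String firstOfEachPage(String perPage) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            if (perPage != null) {
                // Overrides the global <xsl:param name="perPage">.
                t.setParameter("perPage", perPage);
            }
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstOfEachPage(null)); // stylesheet default
        System.out.println(firstOfEachPage("2"));  // overridden value
    }
}
```

The parameter is passed as a string here, which matches what a command-line option would supply; XSLT’s untyped arithmetic converts it to a number when it is used with mod.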
We also want the output file to be XHTML transitional. In XSLT we accomplish this with the following element:
<xsl:output method="xml" indent="yes"
  omit-xml-declaration="yes"
  doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
  doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" />
In XQuery, we add these options to the UNIX shell script that will run Qizx/open:
-Xindent='yes' \
-Xmethod='XHTML' \
-X'doctype-public'='-//W3C//DTD XHTML 1.0 Transitional//EN' \
-X'doctype-system'=\
'http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd' \
Creating the Index Page
Here is what the index page looks like when we have four pigs per page:
Figure 1: Logo with list items
In XSLT, we provide a template to match the root <pig-rescue> element. (To save space, we’re not showing the code that generates the logo image on the index page.) Since we have to process the <animal> elements in two different ways (once for the index page and once for the display pages), we need to use a mode. The template will be applied only to the first entry in each group of four (perPage), that is, entries 1, 5, 9, and so on; this ensures that we get the correct number of list items in the unordered list.
<xsl:template match="pig-rescue">
  <html>
    <head>
      <link rel="stylesheet" type="text/css" href="bdr.css" />
      <title>The Pigs</title>
    </head>
    <body>
      <div align="center">
        <h1>The Pigs At Belly Draggers Ranch</h1>
      </div>
      <ul>
        <xsl:apply-templates
          select="animal[position() mod $perPage = 1]"
          mode="indexList" />
      </ul>
    </body>
  </html>
</xsl:template>
In XQuery, processing the document becomes our single XQuery statement; in this case, an XQuery FLWOR expression. This acronym stands for the clauses in the expression:
- for, which allows you to step through a sequence of items or nodes.
- let, which allows you to declare and initialize variables.
- where (optional), which allows you to specify under which conditions an item or node should be chosen.
- order by (optional), which sorts the selected items.
- return, which returns the specified values for each of the selected items.
A FLWOR expression must have at least one for or let; ours has just a let, which assigns the root element from the input document to the $doc variable. The return returns an <html> element. The parentheses aren’t really necessary as only one item is being returned, but we want to use them for the sake of consistency.
let $doc := fn:input()/pig-rescue
return (
  <html>
    <head>
      <link rel="stylesheet" type="text/css" href="bdr.css" />
      <title>The Pigs</title>
    </head>
    <body>
      <div align="center">
        <h1>The Pigs At Belly Draggers Ranch</h1>
      </div>
      <ul>
        { local:make-name-list( $doc/animal ) }
      </ul>
    </body>
  </html>
)
The fn:input() in the preceding code is a Qizx/open extension that takes the input file name from the command line.
The text starting with the <html> tag is called a Direct Element Constructor, and it must be well-formed. Within one of these constructors, you may embed XQuery expressions by enclosing them in braces. In this case, we switch back to XQuery to call the local:make-name-list function, passing it all the <animal> nodes within the document. If the function name looks like it has a namespace prefix, that’s because it does. XQuery predefines the namespace prefix local and reserves it for use in defining local functions.
Creating a List Item for the Index Page
Let us now turn our attention to the XSLT that provides the list items with four pig names per entry. The notes that follow the listing walk through it from top to bottom.
<xsl:template match="animal" mode="indexList">
  <xsl:variable name="start" select="(position()-1)*$perPage + 1"/>
  <xsl:variable name="end">
    <xsl:choose>
      <xsl:when test="$start + $perPage > count(/pig-rescue/animal)">
        <xsl:value-of select="count(/pig-rescue/animal)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$start + $perPage - 1"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
  <xsl:variable name="filename">animals<xsl:value-of select="position()"/>.html</xsl:variable>
  <li>
    <xsl:for-each select="/pig-rescue/animal[position() >= $start and position() &lt;= $end]">
      <xsl:variable name="url"><xsl:value-of select="$filename"/>#a<xsl:value-of select="$start+position()-1"/></xsl:variable>
      <a href="{$url}">
        <xsl:value-of select="name"/>
      </a>
      <xsl:call-template name="seriesSeparator">
        <xsl:with-param name="start" select="$start"/>
        <xsl:with-param name="end" select="$end"/>
      </xsl:call-template>
    </xsl:for-each>
  </li>
  <xsl:call-template name="makeSubfile">
    <xsl:with-param name="start" select="$start"/>
    <xsl:with-param name="end" select="$end"/>
    <xsl:with-param name="filename" select="$filename"/>
  </xsl:call-template>
</xsl:template>
- We can’t just use position() as the number of the starting animal; though the calling template selected every fourth item, the template sees the resulting nodes in that list as being numbered one, two, three, etc.
- The last animal to be processed is the last one that fits on the page (the starting animal plus the number per page, minus one) or the last animal in the document, whichever comes first.
- The pigs’ names have to link to the page where their full descriptions will be. For the first four pigs, this is animals1.html, for the next four it is animals2.html, etc.
- Construct the destination URL for each pig.
- Listing the names separated by commas in a grammatically correct manner is tricky business, so we hand that off to a named template.
- Finally, as long as we have figured out which pigs to process, we pass that information to a template that will construct the file we named above in $filename.
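The arithmetic in these notes is easy to get wrong by one, so here it is in isolation. This is only an illustrative sketch in Java (the language used for the extension functions later in the article); the class PageRanges and its method names are hypothetical and do not appear in the stylesheet or the download.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; it mirrors the start/end/filename arithmetic only. */
class PageRanges {

    /** 1-based index of the first animal on the given 1-based page. */
    static int startOf(int page, int perPage) {
        return (page - 1) * perPage + 1;
    }

    /** 1-based index of the last animal on the page (the last page may be short). */
    static int endOf(int page, int perPage, int total) {
        return Math.min(startOf(page, perPage) + perPage - 1, total);
    }

    /** Output file for a page, following the animalsN.html pattern. */
    static String fileNameOf(int page) {
        return "animals" + page + ".html";
    }

    /** Link target for one animal: the page file plus an "#aN" fragment. */
    static String urlOf(int animalIndex, int perPage) {
        int page = (animalIndex - 1) / perPage + 1;
        return fileNameOf(page) + "#a" + animalIndex;
    }

    /** One "file: start-end" line per page. */
    static List<String> describe(int total, int perPage) {
        List<String> lines = new ArrayList<>();
        for (int page = 1; startOf(page, perPage) <= total; page++) {
            lines.add(fileNameOf(page) + ": " + startOf(page, perPage)
                    + "-" + endOf(page, perPage, total));
        }
        return lines;
    }

    public static void main(String[] args) {
        describe(10, 4).forEach(System.out::println);
    }
}
```

With ten animals and four per page, the last page holds only animals 9 and 10, which is the case the <xsl:choose> in the template guards against.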
The XQuery equivalent for this is the local:make-name-list function. The logic is the same, so the notes will concentrate on the XQuery-specific aspects.
declare function local:make-name-list( $animalList as element()* ) as item()+
{
  for $pig at $pos in $animalList[position() mod $perPage = 1]
  let $n := count($animalList),
      $filename := fn:concat("animals", $pos, ".html"),
      $start := ($pos - 1) * $perPage + 1,
      $end := if ($start + $perPage > $n) then $n
              else $start + $perPage - 1
  return (
    <li>
    {
      for $animal at $pos in
        $animalList[position() >= $start and position() <= $end]
      return (
        <a href="{$filename}#a{$start + $pos - 1}">
          {$animal/name/text()}
        </a>,
        local:series-separator( $start, $pos, $end )
      )
    }
    </li>,
    local:make-subfile( $animalList, $start, $end, $filename)
  )
};
- In an XQuery function, you should always specify the types of function parameters and return values. In this case, we need to specify that the $animalList parameter will consist of element()*, which means zero or more elements. The function returns item()+, which means one or more items.

  If you do not specify a type for the parameter or return value, XQuery assigns item()*, meaning zero or more items, where an item() can be any node or atomic value, much like XML Schema’s xs:anyType. This is normally not what you want.
- Here is a for clause, stepping through every fourth animal in the list. The at $pos modifier has the same effect as <xsl:value-of select="position()"/>. In XQuery, you can use the position() function only inside a predicate of an XPath expression.
- You may do several different assignments within a let clause by separating them with commas. Notice the assignment to $end, which uses an if expression. Since this is an expression and not a statement, you must always have both a then and an else so that it always yields a value.
- The return’s first value is the list item. The <li> puts us into direct constructor mode, so we need braces to re-enter XQuery mode to create the contents.
- The {$animal/name/text()} line is the reason we needed to declare $animalList as element()*. You cannot use an anonymous item as a path step; you must have a node or an element.

  Also, we can’t just say $animal/name. Unlike <xsl:value-of select="animal/name"/>, which yields a text value, $animal/name puts a copy of the <name> element, tags and all, into the return value. If we want just the text, we have to explicitly add the extra text() step to the XPath expression.
- Making the page with the pigs’ description is a task that we hand off to another local function. Its output will be the second item in this function’s return value (note the comma on the preceding line), and that value will eventually make its way into the output, so the function will have to return the empty string as its value.
Notice that the return expression switches between direct element constructor mode and XQuery expression evaluation mode several times. In XSLT, the difference between commands to the XSLT processor and elements destined for output is fairly easy to distinguish due to the leading xsl: prefix. When you first start writing XQuery, it can be difficult to see (but always important to remember) which mode you are in.
Putting the correct separator after a pig’s name boils down to one of four cases:
- last pig in the series: no comma
- next to last pig in a group of two: “ and ”
- next to last pig in a group of three or more: “, and ”
- other pig in a series: a comma followed by a blank
In XSLT, this is a simple <xsl:choose>, and we won’t show it here. In XQuery, it is a simple nested if. The types in the following declaration are based on XML Schema’s predefined types, which means you also get all the quirks and non-extensibility of the XML Schema type list. The function doesn’t need a FLWOR expression; the result of the nested if is the function’s return value.
declare function local:series-separator( $start as xs:integer,
  $pos as xs:integer, $end as xs:integer) as xs:string
{
  if (($start + $pos < $end) and ($end - $start > 1)) then
    ", "
  else if (($start + $pos = $end) and ($end - $start >= 2)) then
    ", and "
  else if (($start + $pos = $end) and ($end - $start = 1)) then
    " and "
  else
    ""
};
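To see the four cases at work, here is the same decision logic transcribed into Java as a rough sketch. The class SeriesSeparator and its join helper are hypothetical, and the single-letter names are placeholders; as in the XQuery, $pos is the 1-based position within the group, so the next-to-last name is the one where start + pos equals end.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical transcription of local:series-separator for illustration. */
class SeriesSeparator {

    /**
     * start and end are the 1-based indexes of the first and last animal
     * in the group; pos is the 1-based position within the group.
     */
    static String separator(int start, int pos, int end) {
        if (start + pos < end && end - start > 1) {
            return ", ";       // other pig in a series
        } else if (start + pos == end && end - start >= 2) {
            return ", and ";   // next to last in a group of three or more
        } else if (start + pos == end && end - start == 1) {
            return " and ";    // next to last in a group of two
        } else {
            return "";         // last pig in the series
        }
    }

    /** Joins a group of names that starts at the given 1-based index. */
    static String join(List<String> names, int start) {
        int end = start + names.size() - 1;
        StringBuilder sb = new StringBuilder();
        for (int pos = 1; pos <= names.size(); pos++) {
            sb.append(names.get(pos - 1)).append(separator(start, pos, end));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(join(Arrays.asList("A", "B", "C", "D"), 1));
        System.out.println(join(Arrays.asList("A", "B"), 5));
    }
}
```

Because every branch returns a string, the Java version has the same shape as the XQuery one: each if has a matching else, and the final else supplies the empty string for the last name.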
Intermission
Before proceeding to the extensions for XSLT and XQuery, let’s pause for a brief summary that will help you translate from XSLT to XQuery.
XSLT | XQuery |
---|---|
<xsl:param name="x" select="10"/> (global parameter) | declare variable $x as xs:integer := 10; |
Parameters to <xsl:output/> | Command line parameters to Qizx/open |
<xsl:template match="p"> <!-- template body --> </xsl:template> invoked by: <xsl:apply-templates select="XPath/to/p"/> | declare function local:process-p( $pList as element()* ) { for $p in $pList return (: function body :) } with a call: local:process-p( XPath/to/p ) |
<xsl:call-template name="action"> <xsl:with-param name="p1" select="value"/> </xsl:call-template> | let $p1 := value return local:action($p1) |
<xsl:variable name="x" select="value"/> | let $x := value |
position() outside a predicate | for $item at $pos in $sequence |
<xsl:if> | No direct equivalent; every if expression must have an else (which may be the empty sequence, ()). |
<xsl:choose> <xsl:when test="cond 1"> <!-- value 1 --> </xsl:when> <xsl:when test="cond 2"> <!-- value 2 --> </xsl:when> <xsl:otherwise> <!-- value 3 --> </xsl:otherwise> </xsl:choose> | if (cond 1) then (: value 1 :) else if (cond 2) then (: value 2 :) else (: value 3 :) |
“Counting loops” implemented by recursion | for $i in 1 to n |
Built-in Extensions
We are now in a position to make the subfiles that display the information about each group of pigs. Using Xalan, we must add the namespace declaration xmlns:redirect="org.apache.xalan.xslt.extensions.Redirect" to the root <xsl:stylesheet> element. The template that makes the subfile follows. To save space, we do not show the code that shows the next/previous page links at the bottom of each page.
<xsl:template name="makeSubfile">
  <xsl:param name="start"/>
  <xsl:param name="end"/>
  <xsl:param name="filename"/>
  <!-- calculate this once, for use in next/back links -->
  <xsl:variable name="currentPage" select="(($start - 1) div $perPage) + 1"/>
  <redirect:write select="$filename">
    <html>
      <head>
        <link rel="stylesheet" type="text/css" href="bdr.css" />
        <title>Animals Page <xsl:value-of select="$currentPage"/> </title>
      </head>
      <body>
        <div align="center">
          <h1>Animals <xsl:value-of select="$start"/> - <xsl:value-of select="$end"/></h1>
        </div>
        <table border="0">
          <xsl:apply-templates
            select="self::animal | following-sibling::animal[position() &lt; $perPage]"
            mode="display">
            <xsl:with-param name="start" select="$start"/>
          </xsl:apply-templates>
        </table>
      </body>
    </html>
  </redirect:write>
</xsl:template>
- Everything between the <redirect:write> tag and the closing </redirect:write> will be output to the file named in $filename.
- The ugly select expression works out to the current animal and all the remaining ones on the page. It needs a mode because it is processing the <animal> elements again, and the mode tells XSLT which template to invoke.
Now, the equivalent XQuery. Our strategy is to create the output page in a variable, and then use Qizx/open’s x:serialize extension function to direct it to a file.
declare function local:make-subfile( $animalList as element()*,
  $start as xs:integer, $end as xs:integer,
  $filename as xs:string) as xs:string
{
  let $currentPage := (($start - 1) div $perPage) + 1
  let $htmlPage :=
    <html>
      <head>
        <link rel="stylesheet" type="text/css" href="bdr.css" />
        <title>Animals Page {$currentPage}</title>
      </head>
      <body>
        <div align="center">
          <h1>Animals {$start} - {$end}</h1>
        </div>
        <table border="0">
        {
          local:display-animals(
            $animalList[position() >= $start and position() <= $end],
            $start )
        }
        </table>
      </body>
    </html>
  let $outputFilename := x:serialize( $htmlPage,
    <options output="animals{$currentPage}.html"
      indent="yes"
      omit-xml-declaration="yes"
      doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"/>)
  return ( "" )
};
- In this function, we are using multiple let clauses rather than a series of variables separated by commas; it makes the code clearer to read.
- We can’t use the same XPath expression that we used in the XSLT to handle the individual animals. In XSLT, makeSubfile was called in the context of an <animal> element. XQuery does not automatically pass the context on to called functions, which is why we must pass the entire $animalList.
- Qizx/open’s x:serialize function takes two arguments. The first is an XML tree you want serialized. The second is an element patterned along the lines of XSLT’s <xsl:output> element.
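For readers who know the Java side better, the same <xsl:output>-style options also show up in JAXP as output properties. The sketch below is not Qizx/open code and says nothing about how x:serialize is implemented; it only illustrates, under the assumption of the standard javax.xml.transform API, how the option names used above map onto OutputKeys. The class SerializeDemo and its samplePage helper are hypothetical.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical demo: serializes a tiny page with xsl:output-style options. */
class SerializeDemo {

    /** Builds a minimal <html><head><title>...</title></head></html> tree. */
    static Document samplePage(int currentPage) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            Element head = doc.createElement("head");
            Element title = doc.createElement("title");
            title.setTextContent("Animals Page " + currentPage);
            head.appendChild(title);
            html.appendChild(head);
            doc.appendChild(html);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes the tree to a string using the article's output options. */
    static String serialize(Document doc) {
        try {
            // An identity transformer copies the tree straight to the output.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            t.setOutputProperty(OutputKeys.DOCTYPE_PUBLIC,
                "-//W3C//DTD XHTML 1.0 Transitional//EN");
            t.setOutputProperty(OutputKeys.DOCTYPE_SYSTEM,
                "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(samplePage(1)));
    }
}
```

To write to a file instead of a string, the StreamResult can wrap a java.io.File, which is roughly what the output attribute of the options element asks for.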
A Comment about Comments
In order to place a comment into XQuery, you enclose it in smiley faces (: and :), which works fine when you are in XQuery expression mode:
let $pi := 3.14159 (: just a quick approximation :)
Unfortunately, this doesn’t work well when you are in direct element constructor mode. The first of the three following examples will simply place text into the XML tree, smiley faces and all. Enclosing the comment in braces to enter XQuery expression mode gives a syntax error because an expression in braces must yield a value. The only way to get around this is to provide the null string as the value of the expression, as shown in the third example.
<a href="#">Main page</a> (: activate link later :)
<a href="#">Main page</a> { (: activate link later :) }
<a href="#">Main page</a> { (: activate link later :) ""}
Writing a Xalan Extension
In order to retrieve the width and height of an image given its file name, we will write a Xalan extension function in Java. It will be in a class named XImageSize (X for Xalan). This function, named getDimensions, will take a file name as string input and return an empty XML element with attributes containing the file name and the image’s width and height. The return value “cleans up” the file name by removing leading and trailing whitespace. The general model for this element is:
<imageSize fileName="fileName" width="width" height="height" />
In order to use this extension, we need to add some information to the XSL stylesheet. We need to establish a namespace for the extension and register that prefix as one belonging to an extension. We also want to make sure that this prefix never makes it into the output document.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0" xmlns:img="info.evccit.utils" extension-element-prefixes="img" exclude-result-prefixes="img">
Once this is set up, the XSL stylesheet can call the function and extract the width and height directly from the returned <imageSize> element as follows:
<xsl:template match="picture">
  <xsl:variable name="dimensions"
    select="img:XImageSize.getDimensions(string(file))"/>
  <img src="{$dimensions/@fileName}"
    width="{$dimensions/@width}"
    height="{$dimensions/@height}"
    alt="{description}" title="{caption}" />
</xsl:template>
In order to return an element, the function must have access to a Document and its createElement() method. We know that it is possible for an extension function to do this; the tokenize() extension in org.apache.xalan.lib.Extensions does it. Because Xalan is Open Source, we can look at the code and copy it wholesale into ours. We also need to put in the appropriate attribution and include a copy of the Apache License information along with the source code.
/**
 * This class is not loaded until first referenced
 * (see Java Language Specification by Gosling/Joy/Steele,
 * section 12.4.1)
 *
 * The static members are created when this class is
 * first referenced, as a lazy initialization not needing
 * checking against null or any synchronization.
 *
 * This function Copyright 1999-2004
 * The Apache Software Foundation.
 */
private static class DocumentHolder
{
  // Reuse the Document object to reduce memory usage.
  private static final Document m_doc;
  static {
    try
    {
      m_doc = DocumentBuilderFactory.newInstance().
        newDocumentBuilder().newDocument();
    }
    catch(ParserConfigurationException pce)
    {
      throw new org.apache.xml.utils.
        WrappedRuntimeException(pce);
    }
  }
}
This class will go into the main XImageSize class, which looks like this.
package info.evccit.utils;

import java.awt.Dimension;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class XImageSize {

  // The DocumentHolder class shown above goes here.

  static char fileSep;

  static {
    char[] carr =
      System.getProperty("file.separator").toCharArray();
    fileSep = carr[0];
  }

  public static Node getDimensions( String fileName )
  {
    Document doc = DocumentHolder.m_doc;
    Element result = null;
    fileName = fileName.trim();
    try
    {
      Dimension d =
        ImageFileDimensions.getFileDimensions( fileName );
      result = doc.createElement("imageSize");
      result.setAttribute( "fileName",
        fileName.replace( fileSep, '/' ) );
      result.setAttribute( "width",
        Integer.toString((int) d.getWidth() ));
      result.setAttribute( "height",
        Integer.toString((int) d.getHeight() ));
    }
    catch (Exception e)
    {
      // Any failure (missing file, unknown format) yields no element.
      result = null;
    }
    return result;
  }
}
- The static initialization of the class saves the system’s file separator character.
- The call to ImageFileDimensions.getFileDimensions() opens up the file and reads the first few bytes to determine whether it is a GIF or JPG file. Depending upon the file type, it does the appropriate work to extract the width and height and returns it in a Dimension object. The exact details aren’t relevant to this article, so the code isn’t shown here.
- We have to replace the file separator character with a slash, which is the standard separator for URLs.
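If you want something to experiment with before downloading the code, the following sketch shows one possible stand-in for that helper. It is not the author’s ImageFileDimensions class: it is an assumption-laden substitute that relies on the JDK’s javax.imageio package to read the image header instead of parsing the bytes by hand. The class name ImageDimensionsSketch and the measureSample method are hypothetical.

```java
import java.awt.Dimension;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

/** Hypothetical stand-in for ImageFileDimensions, built on javax.imageio. */
class ImageDimensionsSketch {

    /** Returns the pixel dimensions of the image stored in the named file. */
    static Dimension getFileDimensions(String fileName) throws IOException {
        try (ImageInputStream in =
                 ImageIO.createImageInputStream(new File(fileName))) {
            if (in == null) {
                throw new IOException("Cannot open " + fileName);
            }
            // Ask ImageIO which installed reader recognizes the file format.
            Iterator<ImageReader> readers = ImageIO.getImageReaders(in);
            if (!readers.hasNext()) {
                throw new IOException("Unrecognized image format: " + fileName);
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(in);
                // Index 0 is the first (usually the only) image in the file.
                return new Dimension(reader.getWidth(0), reader.getHeight(0));
            } finally {
                reader.dispose();
            }
        }
    }

    /** Writes a 3x2 PNG to a temporary file and measures it again. */
    static Dimension measureSample() {
        try {
            File f = File.createTempFile("sample", ".png");
            f.deleteOnExit();
            ImageIO.write(new BufferedImage(3, 2, BufferedImage.TYPE_INT_RGB),
                          "png", f);
            return getFileDimensions(f.getPath());
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Dimension d = measureSample();
        System.out.println(d.width + " x " + d.height);
    }
}
```

Because getFileDimensions throws an exception for a missing or unrecognized file, it fits the try/catch in getDimensions() above, which turns any failure into a null result.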
The source XML file sets the base path for all the images with the <image-base> element. Rather than do a complicated normalize-space() and concat() to join the base path to the image file name in the XSLT, we create a second version of getDimensions() that accepts two strings and does the heavy lifting:
public static Node getDimensions( String pathName,
  String fileName )
{
  String fileSeparator =
    System.getProperty("file.separator");
  String combinedName;

  pathName = pathName.trim();
  fileName = fileName.trim();
  if (pathName.endsWith( fileSeparator ))
  {
    combinedName = pathName + fileName;
  }
  else
  {
    combinedName = pathName + fileSeparator + fileName;
  }
  return getDimensions( combinedName );
}
If you download the code, you will see that we have heavily overloaded the getDimensions() function by allowing it to accept a Node or NodeList for either or both parameters, but that isn’t the point of this article. Onward to...
Writing an XQuery Extension
The code for this extension is almost identical to the Xalan extension. Instead of returning an <imageSize> element, however, we will return a vector of three items: the filename, the width, and the height. Qizx/open will interpret this as a sequence of items.
The XQuery file must connect the class, which is named QImageSize, with a namespace. This statement goes at the head of the XQuery file. Note carefully! This assignment uses a single equal sign, not the := used for a let clause. We will also have to pass the class name to Qizx/open on the command line when we run the query; this lets Qizx/open know that this is an authorized extension and no security exception needs to be raised.
declare namespace imgsize = "java:info.evccit.utils.QImageSize";
Once the namespace is established, XQuery can extract the information as part of the pig display code:
for $animal at $pos in $animalList
let $basePath := $animal/../image-base,
    $dimensions := imgsize:getDimensions($basePath,
      $animal/picture/file/text() )
return (
  <img src="{$dimensions[1]}"
    width="{$dimensions[2]}"
    height="{$dimensions[3]}"
    alt="{$animal/picture/description/text()}"
    title="{$animal/picture/caption/node()}"
    hspace="4" />
)
Here’s the code for the function that takes the entire filename as one string parameter:
package info.evccit.utils;

import net.xfra.qizxopen.xquery.dm.Node;
import java.awt.Dimension;
import java.util.Vector;

public class QImageSize {

  static char fileSeparator;

  static {
    char[] carr =
      System.getProperty("file.separator").toCharArray();
    fileSeparator = carr[0];
  }

  public static Vector getDimensions( String fileName )
  {
    Vector result = new Vector(3);
    fileName = fileName.trim();
    try
    {
      Dimension d =
        ImageFileDimensions.getFileDimensions( fileName );
      result.add( fileName.replace( fileSeparator, '/' ) );
      result.add( new Integer( (int) d.getWidth() ) );
      result.add( new Integer( (int) d.getHeight() ) );
    }
    catch (Exception e)
    {
      result = null;
    }
    return result;
  }
}
The two-string version of getDimensions() is exactly the same as the Xalan version, except that it returns a Vector instead of a Node. This function has also been heavily overloaded to accept a Qizx/open Node (which is not the same as a DOM node) so that the caller doesn’t have to dig down to the text() step in the path.
Getting the Code
You can download the sample pig rescue file, XSLT stylesheet, XQuery file, extension sources, and Apache License here.
The Java source files are in the info directory, and the API documentation is in the doc directory. Make sure you put the ImageSize.jar file in your classpath when invoking Xalan and/or Qizx/open.
The shell files xcompile.sh and qcompile.sh will compile the Xalan and Qizx/open extensions. Files make_javadoc.sh and make_jar.sh create the Javadoc and ImageSize.jar files. Files run_xalan.sh and run_qizx.sh run the transformation and XQuery.
Thanks to Xavier Franc, author of Qizx/open, for his advice and information on using XQuery.