Running Multiple XSLT Engines with Ant
December 11, 2002
What is Ant?
Ant is a build utility produced as part of the Apache Jakarta project. It's broadly equivalent to Unix's make or nmake under Windows. make-like tools work by comparing the date of an output file to the dates of the input files required to build it. If any of the input files is newer than the output file, the output file needs to be rebuilt. This is a simple rule, and one that generally produces the right results.
Unlike traditional make utilities, Ant is written in Java, so Ant is a good cross-platform solution for controlling automatic file building. That is good news for anyone developing cross-platform XSLT scripts because you only need to target one build environment. Anyone who has tried writing and maintaining equivalent Windows and Unix batch scripts knows how hard it is to get the same behavior across different platforms.
Ant and XSLT
So why would you use Ant and XSLT together? If all you are doing is applying a single XSLT stylesheet to a single XML input file, using a single XSLT engine, then there is probably nothing to be gained. However, if
- you need to apply one or more XSLT scripts to one or more XML input files in some sequence, in order to build your final output file(s);
- you need to run multiple XSLT engines on the same XML input file(s) as part of your regression or integration testing
then Ant is a good and quick way to implement the workflow you need to transform your input(s) into your output(s).
A Simple Example
Using Ant for a simple "1 input, 1 stylesheet, 1 output" transformation is overkill but also a good way to learn how to use Ant. Assume that the input is input.xml, the stylesheet is transform.xsl, and the output is output.html. A matching Ant 1.5 project file build.xml might look something like this:
    <project default="do-it">
      <target name="do-it">
        <xslt processor="trax" in="input.xml" style="transform.xsl" out="output.html"/>
      </target>
    </project>
The root element of an Ant build file is project. It can contain a number of target elements. Its default attribute contains the name of the target to build if no targets are given on the command line. Since the example project file defaults to building the target do-it, the output file could be built equally well using any of the following command lines:
    $ ant
    $ ant do-it
    $ ant -buildfile build.xml
    $ ant -buildfile build.xml do-it
Unlike Unix's make and its clones, which can use filenames for targets, Ant only uses target names defined in the build file. So every target must have a unique name. Within a target, any number of tasks can be performed. The xslt task is included with Ant 1.5. With the processor attribute set to trax, the xslt task uses the default JAXP/TraX XSLT engine to perform the transformation.
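To make the relationship between targets, tasks, and the command line concrete, here is a minimal sketch of a build file with two uniquely named targets. The target names html and text, the stylesheet to-text.xsl, and the output file output.txt are illustrative assumptions and are not part of the article's example files.

```xml
<project name="sketch" default="html">

  <!-- Default target: two tasks, run in the order they are written -->
  <target name="html">
    <echo message="Transforming input.xml to HTML"/>
    <xslt processor="trax" in="input.xml" style="transform.xsl" out="output.html"/>
  </target>

  <!-- Hypothetical second target: runs only when named on the command line -->
  <target name="text">
    <xslt processor="trax" in="input.xml" style="to-text.xsl" out="output.txt"/>
  </target>

</project>
```

With a file like this, running ant builds only the default html target, ant text builds only text, and ant html text builds both in the order given.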
A Complex Example
What about a more complicated XSLT workflow, in which there are three input files (in1.xml, in2.xml and in3.xml)? Each of these has the same kind of information, but the formats are different. So they are normalized to a common format by three separate stylesheets (norm1.xsl, norm2.xsl and norm3.xsl respectively). A standard merging stylesheet exists, merge.xsl, but it only merges two inputs (the usual input plus a filename passed as a parameter to the stylesheet). So it has to be used twice in order to merge the three normalized files. The merged sum of the three is sorted to produce the final output file, out.xml.
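The article's merge.xsl is not reproduced in the text, so the following is only a sketch of how a two-input merging stylesheet of this kind could be written in XSLT 1.0. It assumes that the second filename arrives in a parameter named source2 (the name used in the build file for this example) and that merging simply means appending the children of the second document's root element to those of the first. The real stylesheet may well do something more elaborate.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Filename of the second input, supplied by the caller (assumed name) -->
  <xsl:param name="source2"/>

  <!-- Copy the root element of the main input, then append the children
       of the root element of the second document -->
  <xsl:template match="/*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:copy-of select="node()"/>
      <xsl:copy-of select="document($source2)/*/node()"/>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

When document() is given a string, as here, a relative filename is resolved against the location of the stylesheet, so the sketch assumes that the second input and merge.xsl live in the same directory.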
The following Ant build file does the trick:
    <project default="sort">
      <target name="normalize">
        <xslt processor="trax" in="in1.xml" style="norm1.xsl" out="nm1.xml"/>
        <xslt processor="trax" in="in2.xml" style="norm2.xsl" out="nm2.xml"/>
        <xslt processor="trax" in="in3.xml" style="norm3.xsl" out="nm3.xml"/>
      </target>
      <target name="check12">
        <uptodate property="skip.merge12" targetfile="m12.xml">
          <srcfiles dir=".">
            <include name="nm1.xml"/>
            <include name="nm2.xml"/>
            <include name="merge.xsl"/>
          </srcfiles>
        </uptodate>
      </target>
      <target name="merge12" depends="normalize,check12" unless="skip.merge12">
        <xslt processor="trax" in="nm1.xml" style="merge.xsl" out="m12.xml" force="true">
          <param name="source2" expression="nm2.xml"/>
        </xslt>
      </target>
      <target name="check123">
        <uptodate property="skip.merge123" targetfile="123.xml">
          <srcfiles dir=".">
            <include name="m12.xml"/>
            <include name="nm3.xml"/>
            <include name="merge.xsl"/>
          </srcfiles>
        </uptodate>
      </target>
      <target name="merge123" depends="normalize,merge12,check123" unless="skip.merge123">
        <xslt processor="trax" in="m12.xml" style="merge.xsl" out="123.xml" force="true">
          <param name="source2" expression="nm3.xml"/>
        </xslt>
      </target>
      <target name="sort" depends="merge123">
        <xslt processor="trax" in="123.xml" style="sort.xsl" out="out.xml"/>
      </target>
      <target name="clean">
        <delete>
          <fileset dir=".">
            <include name="output.html"/>
            <include name="nm*.xml"/>
            <include name="m12.xml"/>
            <include name="123.xml"/>
            <include name="out.xml"/>
          </fileset>
        </delete>
      </target>
    </project>
Ant takes account of timestamps on files, just like make. It will not run the transformation unless either the input file or the stylesheet is newer than the output file (which usually means that the input file or the stylesheet has been modified since the last build). So if in1.xml is modified, nm2.xml and nm3.xml will not be rebuilt. Alternatively, if in3.xml is modified, m12.xml will not be rebuilt. This can save a lot of development time in situations where one of the transformations takes much longer than the others.
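One way to watch this behavior is to update a file's timestamp artificially and then rebuild. The sketch below uses Ant's touch task for that. The target name touch-in1 is a hypothetical addition to the build file above and is not part of the article's examples.

```xml
<!-- Hypothetical helper target: marks in1.xml as modified without editing it -->
<target name="touch-in1">
  <touch file="in1.xml"/>
</target>
```

Running ant touch-in1 followed by ant should then rerun norm1.xsl, both merges, and the sort, while leaving nm2.xml and nm3.xml alone.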
Some things to note about this Ant project file include:
- The default target is sort. Sorting is the last thing that needs to be done, so making sort the default target means that the whole build process is carried out by default.
- The normalize target is used to run the three normalization stylesheets. Although you could use three separate targets, there is no need, since each xslt task only runs when its output (nm1.xml, nm2.xml, or nm3.xml) needs to be rebuilt.
- The merge.xsl stylesheet is special because one of its input filenames is passed as a stylesheet parameter. There is no way that the standard xslt task can know this, so it is necessary to tell Ant explicitly when a rebuild is or is not required. The check12 target uses Ant's uptodate task to check whether m12.xml is newer than nm1.xml, nm2.xml, and merge.xsl. If it is, the Ant property skip.merge12 is set; otherwise the property is left unset.
  The merge12 target is used to merge nm1.xml and nm2.xml, but it only runs when skip.merge12 is not set; that is, when m12.xml is not up to date. The xslt task which runs merge.xsl has an extra attribute, force, which is set to true to override the default check for whether a rebuild is necessary.
  This complexity comes about purely because a filename has been passed to the stylesheet as a parameter. It is a special case, but one which is not too difficult to solve. The same logic applies to the targets check123 and merge123.
- Finally, the sort target, which is the default target for the project, applies sort.xsl to 123.xml to produce the end result, out.xml.
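The check described in the list above can also be written a little more compactly. The sketch below is intended as an equivalent formulation of check12, assuming the same filenames. It uses the includes attribute of the nested srcfiles fileset instead of separate include elements.

```xml
<target name="check12">
  <!-- Sets skip.merge12 only if m12.xml is newer than every listed source -->
  <uptodate property="skip.merge12" targetfile="m12.xml">
    <srcfiles dir="." includes="nm1.xml,nm2.xml,merge.xsl"/>
  </uptodate>
</target>
```

Because uptodate leaves the property unset when the target file is missing or stale, the unless attribute on merge12 lets the merge run in exactly those cases.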
The files for this project are provided with the zipped examples. You now know everything you need to start using the standard Ant xslt task in your own projects. However, you should also take the time to read the full description of this task in the Ant documentation.
The Difficulty With Multiple XSLT Engines
XSLT stylesheets can provide a good cross-platform solution for manipulating XML, but different platforms use different XSLT engines. Sites that are using the Apache Web server often use Apache Xalan. Sites that are using PHP are likely to use Sablotron. Oracle sites often use the Oracle XDK (as this may be the only XSLT engine that the operations people will allow). Some XML consultants use and recommend Saxon. Microsoft sites generally use MSXML. Although these XSLT engines behave similarly, there are still some differences, so you need to plan to test with all of the XSLT engines that are likely to be used with your XSLT stylesheets. For this article, we will focus on the Java XSLT engines, since they are the ones supported natively by the Ant xslt task.
When testing with multiple engines, it's useful to be able to run the same test using each XSLT engine from within one Ant build file. But there's a problem: the JAXP/TraX interface uses the Java system property javax.xml.transform.TransformerFactory to define which class should be instantiated as a factory for creating XSLT engines. In order to use the XSLT engine of your choice, this property needs to be set appropriately. However, there is no easy way to do that within Ant and, hence, no easy way to change XSLT engines within a single Ant build file. The best you can do is to launch a separate Java process and then call Ant from within that new process. To overcome this problem, the best solution is to create a new XSLT task for Ant, one which makes it easy to select the desired XSLT TransformerFactory.
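For comparison, the "separate Java process" workaround mentioned above could be sketched roughly as follows. This is an illustration under several assumptions rather than a recommended setup: the target name, the saxon.jar property and its path, and the nested build file name inner-build.xml are all hypothetical, and the classpath handling assumes that Ant's own JARs are already on java.class.path, which depends on how Ant was launched.

```xml
<project name="fork-sketch" default="run-with-saxon">

  <!-- Hypothetical location of the Saxon 6 JAR -->
  <property name="saxon.jar" value="lib/saxon.jar"/>

  <target name="run-with-saxon">
    <!-- Start a second JVM that runs Ant's main class on another build file -->
    <java classname="org.apache.tools.ant.Main" fork="true" failonerror="true">
      <!-- Select the TransformerFactory for the new JVM only -->
      <sysproperty key="javax.xml.transform.TransformerFactory"
                   value="com.icl.saxon.TransformerFactoryImpl"/>
      <classpath>
        <pathelement location="${saxon.jar}"/>
        <pathelement path="${java.class.path}"/>
      </classpath>
      <arg value="-buildfile"/>
      <arg value="inner-build.xml"/>
    </java>
  </target>

</project>
```

Each additional engine would need another forked JVM and another copy of this boilerplate, which is the overhead that a dedicated task avoids.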
mtxslt - The Solution
mtxslt (short for "multi-XSLT") is an Ant task that makes it easy to select several Java XSLT engines within an Ant build file. mtxslt extends the standard Ant xslt task while maintaining full compatibility with it. Anything that works with the xslt task also works with mtxslt.
With mtxslt, it is possible to ignore the value of the Java javax.xml.transform.TransformerFactory property and simply load a particular XSLT engine directly. mtxslt currently supports Xalan 2, Saxon 6/7, and Oracle XDK 9.
A Multiple XSLT Engine Example
This example uses a few new Ant elements. A taskdef is required to associate the task name mtxslt with the Java class which implements it. Actually, you can call mtxslt anything you want just by changing the name in the taskdef.
The property definitions are used to define values that can be retrieved by name throughout the build file, which is similar to defining a string variable in a programming language. Property definitions are used to define short names for qualified Java class names and for file paths, since both of these tend to be long and reduce the readability and maintainability of the build file if repeated.
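As a small illustration of these two elements, the sketch below registers the task under a different name and uses a property for a path. The task name multixslt, the property style.dir and its value, and the output name demo.html are made up for the illustration. Only the implementing class name is taken from this article, and the sketch assumes that the mtxslt classes are already on Ant's classpath.

```xml
<project name="taskdef-sketch" default="demo">

  <!-- Same implementing class, registered under a hypothetical task name -->
  <taskdef name="multixslt"
           classname="org.xmLP.ant.taskdefs.xslt.XSLTProcess"/>

  <!-- Hypothetical property: defined once, referenced as ${style.dir} -->
  <property name="style.dir" value="stylesheets"/>

  <target name="demo">
    <multixslt processor="trax" in="input.xml"
               style="${style.dir}/transform.xsl" out="demo.html"/>
  </target>

</project>
```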
In the multiple-engine build file that follows, different XSLT engines are used to apply the same stylesheet transform.xsl to the same input input.xml. The resulting HTML files can then be compared.
    <project name="test" default="all">
      <taskdef name="mtxslt" classname="org.xmLP.ant.taskdefs.xslt.XSLTProcess"/>
      <property name="trax" value="org.xmLP.ant.taskdefs.optional.TraXLiaison"/>
      <property name="xalan2" value="org.xmLP.ant.taskdefs.optional.Xalan2Liaison"/>
      <property name="xalan2.classpath" value="D:\home\tony\XSLT\xalan-j_2_4_0\bin\xalan.jar"/>
      <property name="saxon6" value="org.xmLP.ant.taskdefs.optional.Saxon6Liaison"/>
      <property name="saxon6.classpath" value="D:\home\tony\XSLT\Saxon-6.5.2\saxon.jar"/>
      <property name="saxon7" value="org.xmLP.ant.taskdefs.optional.Saxon7Liaison"/>
      <property name="saxon7.classpath" value="D:\home\tony\XSLT\Saxon-7.1\saxon7.jar"/>
      <property name="oracle9" value="org.xmLP.ant.taskdefs.optional.Oracle9Liaison"/>
      <property name="oracle9.classpath"
                value="D:\home\tony\XSLT\xdk_java_9_2_0_3_0\lib\xmlparserv2.jar"/>
      <target name="all" depends="trax1,trax2,trax3,trax4,xalan2,saxon6,saxon7,oracle9"/>
      <target name="trax1">
        <xslt processor="trax" in="input.xml" style="transform.xsl" out="trax1.html">
          <param name="target" expression="trax1"/>
        </xslt>
      </target>
      <target name="trax2">
        <mtxslt processor="trax" in="input.xml" style="transform.xsl" out="trax2.html">
          <param name="target" expression="trax2"/>
        </mtxslt>
      </target>
      <target name="trax3">
        <xslt processor="${trax}" in="input.xml" style="transform.xsl" out="trax3.html">
          <param name="target" expression="trax3"/>
        </xslt>
      </target>
      <target name="trax4">
        <mtxslt processor="${trax}" in="input.xml" style="transform.xsl" out="trax4.html">
          <param name="target" expression="trax4"/>
        </mtxslt>
      </target>
      <target name="xalan2">
        <mtxslt processor="${xalan2}" in="input.xml" style="transform.xsl" out="xalan2.html"
                classpath="${xalan2.classpath}">
          <param name="target" expression="xalan2"/>
        </mtxslt>
      </target>
      <target name="saxon6">
        <mtxslt processor="${saxon6}" in="input.xml" style="transform.xsl" out="saxon6.html"
                classpath="${saxon6.classpath}">
          <param name="target" expression="saxon6"/>
        </mtxslt>
      </target>
      <target name="saxon7">
        <mtxslt processor="${saxon7}" in="input.xml" style="transform.xsl" out="saxon7.html"
                classpath="${saxon7.classpath}">
          <param name="target" expression="saxon7"/>
        </mtxslt>
      </target>
      <target name="oracle9">
        <mtxslt processor="${oracle9}" in="input.xml" style="transform.xsl" out="oracle9.html"
                classpath="${oracle9.classpath}">
          <param name="target" expression="oracle9"/>
        </mtxslt>
      </target>
      <target name="clean">
        <delete>
          <fileset dir="." includes="*.html"/>
        </delete>
      </target>
    </project>
- The target trax1 simply uses the standard xslt task to transform the input file, as in the earlier examples.
- The target trax2 is identical to trax1, except that it uses mtxslt instead of xslt. This demonstrates that mtxslt implements the standard behavior of the xslt task.
- The target trax3 is similar to trax1, except that the value of the processor attribute is the value of the property trax (i.e., org.xmLP.ant.taskdefs.optional.TraXLiaison). This is a feature of the xslt task that only becomes apparent when you look at the Ant source code. The processor attribute can optionally be a qualified class name for an Ant XSLT liaison class. This is the mechanism that mtxslt exploits to support multiple XSLT engines.
  This particular XSLT liaison class connects with the default JAXP/TraX XSLT engine, so the result is identical to that produced by the target trax1.
- The target trax4 is identical to trax3, except that it uses mtxslt instead of xslt.
- The targets xalan2, saxon6, saxon7, and oracle9 use mtxslt to call Xalan 2, Saxon 6, Saxon 7, and Oracle XDK 9 respectively. Once the appropriate properties have been defined, mtxslt attributes look nearly identical to standard xslt attributes. Note, however, the addition of a classpath attribute, which is required so that Ant loads the correct JAR archive for each XSLT engine.
The target parameter that is passed to the stylesheet allows the Ant target name to be embedded in each HTML product file to make identification of the files easier. It serves no other purpose.
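The text does not show transform.xsl itself, so here is a hedged sketch of how a stylesheet might consume the target parameter. Everything apart from the parameter name is an assumption. The xsl:vendor line is an extra that uses the standard XSLT 1.0 system-property() function to report which engine actually ran.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Set by the param element in the build file; the default is an assumption -->
  <xsl:param name="target" select="'unknown'"/>

  <xsl:template match="/">
    <html>
      <head>
        <title>Output of target <xsl:value-of select="$target"/></title>
      </head>
      <body>
        <p>Ant target: <xsl:value-of select="$target"/></p>
        <p>XSLT engine: <xsl:value-of select="system-property('xsl:vendor')"/></p>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```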
That's all there is to it. You now not only know how to use Ant to control XSLT, you also know how to use mtxslt to control which XSLT engines are used within an Ant build. (All of the example files from this article can be downloaded as a ZIP archive.)
Conclusion
Ant is a powerful cross-platform tool for controlling build processes and is ideal for controlling multifile builds involving XSLT stylesheets. Using mtxslt, you can go further and invoke multiple Java XSLT engines during a single build, which is ideal for portability testing.
It may be worth mentioning that this article was written using an extended version of DocBook 4.2 and then converted to XHTML using an XSLT stylesheet -- a process controlled by an Ant build file. As well as building the article, Ant controlled the extraction of the Ant build file code out of the DocBook source and into the example build files, as well as the regression testing of the examples. It really works.
Resources
- Example files
- Articles
- Ant
  - Ant
  - Apache Jakarta project
  - Ant: The Definitive Guide (O'Reilly, 2002)
- XSLT Engines
- JAXP/TraX
  - JAXP/TraX API from JDK 1.4.1