XML Pipelining with Ant
January 28, 2003
Ant is an extensible, open-source build tool written in Java and sponsored by Apache's Jakarta project. Ant has developed into something more than just a build tool, however. It has gone beyond its predecessor make (and make's kin) to become a framework for performing an even larger variety of operations in a single step, not just compiling code or cleaning up after a build.
Ant's build files are written in XML, and Ant takes advantage of XML in a variety of ways. In my opinion, Ant is a suitable if not ideal framework for XML pipelining, that is, for performing a variety of XML processing steps in the desired order and in one fell swoop. I say ideal because Ant is open, somewhat mature, reasonably stable, readily available, widely known and used, easily extensible, and already amenable to XML processing. What else could you ask for?
In this article, I'll discuss the XML structures in an Ant build file (named build.xml by default), talk about some common XML-related tasks that Ant can perform, and then finish up with an example of XML pipelining.
I assume that you already know something of Ant and have probably used it. I plan to review the basics of the tool, but I also suggest that you read Tony Coates's recent XML.com article, "Running Multiple XSLT Engines with Ant." Along with an interesting approach to processing multiple XSLT stylesheets with multiple engines, Tony's article provides good introductory material on Ant.
To get the examples in this piece to work, you'll of course need a recent version of Java on your system. You'll also need to download and install Ant version 1.5.1 (or later) binaries. Because you'll be using a new task that validates with RELAX NG schemas, you'll also need to download and install James Clark's Jing. All the example files discussed in this article are available for download in a ZIP archive and have been tested on the Windows XP Professional platform running Java 2 v1.4.
You can refer to Ant's HTML manual either online or, after installing Ant locally, by opening docs/manual/index.html in a browser.
Where Is Ant's DTD?
One of the first things I noticed about Ant was that it didn't have an explicit DTD available in the archives I downloaded, either the binary or the source archive. I wanted to see Ant's DTD so I could figure out what went into a build file. Then I discovered the antstructure task. This task in essence generates a DTD from the tasks and types that Ant currently knows about.
The following snippet is a simple Ant build file that uses the antstructure task (build-dtd.xml in the example archive):
<?xml version="1.0"?>
<project default="dtd">
  <target name="dtd">
    <antstructure output="ant.dtd"/>
  </target>
</project>
Here's a quick review of some basics. The document begins with an optional XML declaration. The root element of an Ant build file is <project>. It has several possible attributes, but only one is required: default. This attribute names the default target for the project, in this case the only target, dtd. A target represents a way to achieve an expected outcome from an operation, such as a set of compiled Java classes or, in the case of antstructure, a DTD.
The <target> element is a child of <project> and must have a name attribute. Here, the value of name matches the value of the default attribute of <project>. When a build file contains more than one target, the value of default matches the name attribute of exactly one <target>. The <target> element also has several other attributes, such as depends (which will come to light in later examples).
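To make the relationship between default, name, and depends concrete, here is a minimal Java sketch that reads the <project>/<target> skeleton of a build file with the JDK's DOM parser. It is an illustration under stated assumptions, not how Ant parses build files; BuildFileOutline and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch: inspect a build file's structure with plain DOM (not Ant's parser).
class BuildFileOutline {
    /** Parses build-file text; throws IllegalArgumentException if it is not well-formed. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a well-formed build file", e);
        }
    }

    /** The default attribute of <project>; DOM returns "" when it is absent. */
    static String defaultTarget(Document doc) {
        return doc.getDocumentElement().getAttribute("default");
    }

    /** Maps each target name to its raw depends value ("" if none), in document order. */
    static Map<String, String> targets(Document doc) {
        Map<String, String> result = new LinkedHashMap<>();
        NodeList list = doc.getElementsByTagName("target");
        for (int i = 0; i < list.getLength(); i++) {
            Element target = (Element) list.item(i);
            result.put(target.getAttribute("name"), target.getAttribute("depends"));
        }
        return result;
    }
}
```

For build-dtd.xml, defaultTarget returns dtd and targets has the single key dtd, so default names an existing target, which is the relationship described above.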
The <antstructure> task element is empty. One of its four possible attributes is output, which gives the name of the file that will contain the DTD the task produces. A relative output path is resolved against the project's base directory, which by default is the directory containing the build file (here, the current directory). If you add a basedir attribute to <project>, you can specify a different base directory, such as:
<project default="dtd" basedir="c:/temp">
Now give it a try. The following command presupposes that Ant's bin directory is in the PATH environment variable, that your working directory is C:\Java\Ant, and that you have unzipped the example archive there:
C:\Java\Ant>ant -f build-dtd.xml
Ant assumes that the build file is named build.xml. If it isn't, you need to use the -f option (or its synonyms -file or -buildfile), followed by a filename. You should see output from this command like this:
Buildfile: build-dtd.xml

dtd:

BUILD SUCCESSFUL
Total time: 2 seconds
The output lists the build filename, the target name dtd, and whether the build was successful. The target produces the file ant.dtd in your current directory. This DTD is straightforward (only three parameter entities) but quite long (nearly 4,000 lines). With this DTD available, you can see for yourself how a build file is put together. For any element name in the DTD, you are likely to find a corresponding entry in the Ant manual.
At first I wondered how Ant validates build files. The answer lies in the source code, which makes clear that Ant validates build files in its own application-specific way rather than in a general-purpose way. (If you want to see how Ant does this, a good place to start looking is the Java source of the class org.apache.tools.ant.helper.ProjectHelperImpl.) Ant is in effect self-validating and avoids the use of namespaces.
Validating an XML Document
Ant has a task for validating XML documents called xmlvalidate. By default, Ant validates with Xerces version 2.2.0. Consider the small XML document date.xml:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE date SYSTEM "date.dtd">
<date>2003-01-31T00:00:01</date>
And its equally small DTD, date.dtd:
<!ELEMENT date (#PCDATA)>
You can validate date.xml with the build file build-valid.xml by using the xmlvalidate task:
<?xml version="1.0"?>
<project default="valid">
  <target name="valid">
    <xmlvalidate file="date.xml"/>
  </target>
</project>
The file attribute specifies the document to validate. Issuing the command
C:\Java\Ant>ant -f build-valid.xml
produces the following output, if successful:
Buildfile: build-valid.xml

valid:
[xmlvalidate] 1 file(s) have been successfully validated.

BUILD SUCCESSFUL
Total time: 2 seconds
In Ant, types are elements that help tasks do their work, for example by operating on groups of files. Using the fileset type as a child of xmlvalidate, you can validate a series of XML documents, as shown in build-fileset.xml:
<?xml version="1.0"?>
<project default="valid">
  <target name="valid">
    <xmlvalidate>
      <fileset file="date*.xml"/>
    </xmlvalidate>
  </target>
</project>
The file attribute of fileset allows you to specify a series of files with wildcards. If you run this build file, you will see that Ant validates six XML documents in one step (all XML documents in the current directory whose names begin with date).
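The wildcard in date*.xml selects every file whose name starts with date and ends with .xml. A minimal Java sketch of that selection, assuming the JDK's glob matcher as a stand-in for Ant's own pattern matching (the two syntaxes are similar but not identical), looks like this; WildcardSelect is a hypothetical name.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch of wildcard file selection using java.nio glob syntax (not Ant's matcher).
class WildcardSelect {
    /** Returns, in input order, the file names that match a glob such as "date*.xml". */
    static List<String> select(String glob, List<String> fileNames) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return fileNames.stream()
                .filter(name -> matcher.matches(Paths.get(name)))
                .collect(Collectors.toList());
    }
}
```

To apply the same pattern to a real directory, Files.newDirectoryStream(dir, "date*.xml") accepts a glob directly.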
The xmlvalidate task has several other features worth mentioning:
- An attribute of lenient="true" means that the task will only do well-formedness checking.
- The classname and classpathref attributes allow you to specify a different XML parser than the default and where to find it.
- The child element <dtd> lets you indicate a formal public identifier (publicId attribute) as well as the local whereabouts (location attribute) of a DTD.
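The difference between full validation and lenient="true" can be sketched with the JDK's JAXP parser. This is a minimal illustration, not Ant's code: DtdCheck and check are hypothetical names, and date.dtd is served from an in-memory string so the example needs no files. Validating mode reports both well-formedness and validity errors; lenient mode reports only well-formedness errors.

```java
import java.io.ByteArrayInputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical sketch of xmlvalidate's two modes using the JDK's JAXP parser.
class DtdCheck {
    // In-memory stand-in for date.dtd.
    static final String DATE_DTD = "<!ELEMENT date (#PCDATA)>";

    /** Returns true if the document passes; lenient means well-formedness checking only. */
    static boolean check(String xml, boolean lenient) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(!lenient); // DTD validation unless lenient
            DocumentBuilder builder = factory.newDocumentBuilder();
            // Serve any system identifier ending in "date.dtd" from DATE_DTD.
            builder.setEntityResolver((publicId, systemId) ->
                    systemId != null && systemId.endsWith("date.dtd")
                            ? new InputSource(new StringReader(DATE_DTD))
                            : null);
            // Without a handler, validity errors are only reported; make every error fatal.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) { // SAXException, IOException, ParserConfigurationException
            return false;
        }
    }
}
```

A document such as <date><time/></date> with the date.dtd DOCTYPE fails check(xml, false) because time is not declared, yet passes check(xml, true) because it is well-formed.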
Validating with Jing
As I mentioned earlier, Ant is extensible. One way that you can extend Ant is by writing your own task (instructions on how to do this are found in the Ant manual). James Clark has written a task for Jing that allows you to use Ant to validate XML documents against RELAX NG schemas, in both the XML and compact syntaxes. Jing's source code is available for download, but I have included a copy of JingTask.java in the example archive for easy inspection (along with a copy of Jing's license).
The document date.xml is valid with regard to the RELAX NG schema date.rng:
<?xml version="1.0"?>
<element name="date"
         xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes">
  <data type="dateTime"/>
</element>
RELAX NG supports externally defined datatype libraries, such as W3C XML Schema datatypes. The XML Schema datatype dateTime defines the valid content of <date> more precisely than #PCDATA does in a DTD. To validate date.xml against date.rng with Ant, use the build file build-jing.xml:
<?xml version="1.0"?>
<project default="rng">
  <taskdef name="jing" classname="com.thaiopensource.relaxng.util.JingTask"/>
  <target name="rng">
    <echo message="Validating RELAX NG schema with Jing..."/>
    <jing rngfile="date.rng" file="date.xml"/>
  </target>
</project>
The <taskdef> element defines the jing task, and its classname attribute identifies the class that executes the task. This class is stored in jing.jar, part of the Jing distribution. If you place jing.jar in Ant's lib directory, Ant will be able to find the Jing task.
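The idea behind <taskdef> is that an element name is bound to a Java class loaded by name, and each XML attribute is passed to a matching setter method (the Ant manual's chapter on writing your own task describes this setter convention). The sketch below imitates that idea with plain reflection. MiniTaskdef and EchoLikeTask are hypothetical; real Ant tasks extend org.apache.tools.ant.Task and their execute() returns void, whereas this sketch returns a string so the result is easy to check.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of <taskdef>; not Ant's IntrospectionHelper.
class MiniTaskdef {
    private final Map<String, String> definitions = new HashMap<>();

    /** Binds an element name such as "jing" to a fully qualified class name. */
    void taskdef(String name, String className) {
        definitions.put(name, className);
    }

    /** Instantiates the bound class and applies each attribute through its setter. */
    Object create(String elementName, Map<String, String> attributes) {
        String className = definitions.get(elementName);
        if (className == null) {
            throw new IllegalArgumentException("Unknown task: " + elementName);
        }
        try {
            Object task = Class.forName(className).getDeclaredConstructor().newInstance();
            for (Map.Entry<String, String> attr : attributes.entrySet()) {
                findSetter(task.getClass(), attr.getKey()).invoke(task, attr.getValue());
            }
            return task;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot configure " + elementName, e);
        }
    }

    // Attribute "rngfile" matches setRngfile(String) or setRngFile(String), ignoring case.
    private static Method findSetter(Class<?> type, String attribute) {
        for (Method m : type.getMethods()) {
            if (m.getName().equalsIgnoreCase("set" + attribute)
                    && m.getParameterCount() == 1
                    && m.getParameterTypes()[0] == String.class) {
                return m;
            }
        }
        throw new IllegalArgumentException("No setter for attribute " + attribute);
    }
}

// Hypothetical task class standing in for something like JingTask.
class EchoLikeTask {
    private String message = "";

    public void setMessage(String message) {
        this.message = message;
    }

    public String execute() {
        return "[echo] " + message;
    }
}
```

Binding "echo" to EchoLikeTask and creating it with message="Validating..." yields a task whose execute() returns "[echo] Validating...".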
The echo task echoes the text in its message attribute. Like many other tasks, Jing is silent upon success, so you can throw in an echo task to augment what is normally reported.
The jing task's rngfile attribute identifies a RELAX NG schema, and the file attribute names the instance document to validate against it. You can also use a fileset type as a child of <jing>, allowing you to validate more than one document at a time.
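What makes date.xml valid against date.rng is that the text of <date> is a lexically valid xsd:dateTime. The standard JDK ships no RELAX NG validator, so the sketch below only approximates that single datatype check with javax.xml.datatype; it is not what Jing does internally, and DateTimeCheck is a hypothetical name.

```java
import javax.xml.datatype.DatatypeConfigurationException;
import javax.xml.datatype.DatatypeConstants;
import javax.xml.datatype.DatatypeFactory;
import javax.xml.datatype.XMLGregorianCalendar;

// Hypothetical approximation of <data type="dateTime"/> using the JDK, not Jing.
class DateTimeCheck {
    static boolean isXsdDateTime(String text) {
        try {
            // dateTime collapses surrounding whitespace, so trim before parsing.
            XMLGregorianCalendar cal = DatatypeFactory.newInstance()
                    .newXMLGregorianCalendar(text.trim());
            // The parser also accepts date, time, gYear, and so on,
            // so require the parsed value to be specifically a dateTime.
            return DatatypeConstants.DATETIME.equals(cal.getXMLSchemaType());
        } catch (IllegalArgumentException | IllegalStateException
                 | DatatypeConfigurationException e) {
            return false;
        }
    }
}
```

The content of date.xml, 2003-01-31T00:00:01, passes; a bare date such as 2003-01-31 is a valid xsd:date but not a dateTime, so it fails, just as it would fail against date.rng.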
Jing can also validate against schemas in the compact syntax, RELAX NG's terse, non-XML format. The compact version reduces date.rng to one short line in date.rnc:
element date { xsd:dateTime }
Compact syntax processors automatically declare the XML Schema datatype library with the xsd prefix. The build file build-rnc.xml validates date.xml against date.rnc (note the addition of the compactsyntax attribute):
<?xml version="1.0"?>
<project default="rng">
  <taskdef name="jing" classname="com.thaiopensource.relaxng.util.JingTask"/>
  <target name="rng">
    <echo message="Validating RELAX NG compact syntax schema with Jing..."/>
    <jing compactsyntax="true" rngfile="date.rnc" file="date.xml"/>
  </target>
</project>
Kawaguchi Kohsuke is currently developing an Ant task for validators that support the Java API for RELAX Verifiers (JARV). This task will work with Sun's Multi-Schema Validator and other JARV validators.
An XML Pipeline Example
This example puts the targets discussed earlier together into a single build file and adds a few other targets as well. The resulting file, build.xml, is an example of a simple XML pipeline. The basic scenario is this: a property (the current directory) is set from a local XML document (properties.xml), and a remote, zipped file (date.zip) is downloaded via the get task. The archive, which contains a RELAX NG schema (date.rng), is unzipped, and a local document (date.xml) is validated against that schema. Then the same document is validated against a DTD (date.dtd) and transformed into an HTML document (date.html). Finally, an e-mail is sent, signaling the completion of the process. Granted, this is a rather uncomplicated example, and more complex operations are possible, but it gives you an idea of how you can put your own pipeline together.
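The unzip stage of this pipeline is easy to picture in plain Java. The following is a minimal sketch using java.util.zip, not the code behind Ant's unzip task; UnzipStage is a hypothetical name. It also rejects archive entries whose names would escape the destination directory, a sensible precaution whenever an archive arrives over the network, as date.zip does here.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical sketch of an "unzip" pipeline stage; not the code behind Ant's <unzip>.
class UnzipStage {
    /** Extracts all entries into destDir and returns the number of files written. */
    static int unzip(InputStream archive, Path destDir) throws IOException {
        Path root = destDir.toAbsolutePath().normalize();
        int files = 0;
        try (ZipInputStream zin = new ZipInputStream(archive)) {
            ZipEntry entry;
            while ((entry = zin.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse names such as "../x" that would land outside destDir.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes destination: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies only the current entry: the stream signals end-of-data at its end.
                    Files.copy(zin, out, StandardCopyOption.REPLACE_EXISTING);
                    files++;
                }
            }
        }
        return files;
    }
}
```

For a downloaded archive, a call such as UnzipStage.unzip(Files.newInputStream(Paths.get("date.zip")), Paths.get(".")) mirrors the unzip target.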
Here is the build file:
<?xml version="1.0"?>
<project default="mail">
  <taskdef name="jing" classname="com.thaiopensource.relaxng.util.JingTask"/>
  <target name="init">
    <echo message="Load XML properties..."/>
    <xmlproperty file="properties.xml"/>
  </target>
  <target name="get" depends="init">
    <get src="http://www.wyeast.net/date.zip" dest="date.zip"/>
  </target>
  <target name="unzip" depends="get">
    <unzip src="date.zip" dest="${build.dir}"/>
  </target>
  <target name="rng" depends="unzip">
    <echo message="Jing validating..."/>
    <jing rngfile="date.rng" file="date.xml"/>
  </target>
  <target name="val" depends="rng">
    <xmlvalidate file="date.xml">
      <xmlcatalog>
        <dtd publicId="-//Wy'east Communications//Date DTD//EN" location="date.dtd"/>
      </xmlcatalog>
    </xmlvalidate>
  </target>
  <target name="xform" depends="val">
    <xslt in="date.xml" out="date.html" style="date.xsl">
      <outputproperty name="method" value="xml"/>
      <outputproperty name="indent" value="yes"/>
    </xslt>
  </target>
  <target name="mail" depends="xform">
    <mail mailhost="mail.example.com" subject="Ant build">
      <to address="schlomo@example.com"/>
      <from address="hermes@example.com"/>
      <message>Complete!</message>
    </mail>
  </target>
</project>
Before running this example, you should change the value of mailhost and both the to and from addresses to something that will work with your own mail server. You will also need to install the JAR files from the JavaMail project in Ant's lib directory (though MIME mail may still not work). To run the build, all you have to do is type:
C:\Java\Ant>ant
Because the build file is named build.xml, Ant automatically picks it up and runs it. Provided you have a live Internet connection (for the get and mail targets) and all files from the example archive are still in place, the output will look like this:
Buildfile: build.xml

init:
     [echo] Load XML properties...

get:
      [get] Getting: http://www.wyeast.net/date.zip

unzip:
    [unzip] Expanding: C:\Java\Ant\date.zip into C:\Java\Ant

rng:
     [echo] Jing validating...

val:
[xmlvalidate] 1 file(s) have been successfully validated.

xform:
     [xslt] Processing C:\Java\Ant\date.xml to C:\Java\Ant\date.html
     [xslt] Loading stylesheet C:\Java\Ant\date.xsl

mail:
     [mail] Failed to initialise MIME mail
     [mail] Sending email: Ant build
     [mail] Sent email with 0 attachments

BUILD SUCCESSFUL
Total time: 7 seconds
Each of the targets except the one named init has a depends attribute. The value of this attribute establishes a chain of dependencies between the targets. The default or starting target is mail (identified in the <project> element). For it to execute, the xform target must first execute successfully; for xform to execute, val must execute, and so forth. So these dependencies are established not structurally, through parent-child relationships, but through attribute values.
You can put the targets in any order in the build file. They will still execute in the order determined by the values of the depends and name attributes. These dependencies make up the segments of the pipeline.
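The ordering rule just described amounts to a topological sort of the depends graph. Here is a minimal sketch of it as a depth-first search; TargetOrder is a hypothetical class, and Ant's own implementation differs in detail. Like Ant, the sketch runs each target at most once and refuses circular dependencies.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: order targets so each runs after everything it depends on.
class TargetOrder {
    static List<String> order(String start, Map<String, List<String>> depends) {
        List<String> result = new ArrayList<>();
        visit(start, depends, new HashSet<>(), new HashSet<>(), result);
        return result;
    }

    private static void visit(String target, Map<String, List<String>> depends,
                              Set<String> inProgress, Set<String> done, List<String> result) {
        if (done.contains(target)) {
            return; // already scheduled: each target runs at most once
        }
        if (!depends.containsKey(target)) {
            throw new IllegalArgumentException("No such target: " + target);
        }
        if (!inProgress.add(target)) {
            throw new IllegalStateException("Circular dependency involving " + target);
        }
        for (String dep : depends.get(target)) {
            visit(dep, depends, inProgress, done, result);
        }
        inProgress.remove(target);
        done.add(target);
        result.add(target); // all of its dependencies are already in result
    }
}
```

With the depends values from build.xml, order("mail", ...) yields init, get, unzip, rng, val, xform, mail, regardless of the order in which the targets appear in the file.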
The build file's xform target uses the xslt task to transform date.xml into date.html according to the XSLT stylesheet date.xsl. The <outputproperty> children contribute values that would normally be supplied by the xsl:output element of XSLT. (Tony Coates's article deals with the xslt task extensively, so I'll limit my comments here.)
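In JAXP terms, each <outputproperty> corresponds to a Transformer output property. The following minimal sketch shows that correspondence; it is not the xslt task's implementation, and XformStage is a hypothetical name.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical sketch of the xform step with JAXP; not the code behind Ant's <xslt>.
class XformStage {
    static String transform(String xml, String xsl) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            // Same effect as <outputproperty name="method" value="xml"/>
            // and <outputproperty name="indent" value="yes"/>.
            t.setOutputProperty(OutputKeys.METHOD, "xml");
            t.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalArgumentException("Transformation failed", e);
        }
    }
}
```

Output properties set this way override the corresponding xsl:output attributes in the stylesheet, which is why the build file can adjust them without editing date.xsl.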
In the val target, the xmlvalidate task uses the xmlcatalog type with a <dtd> child to specify a formal public identifier for a DTD and the location of a local copy of that DTD. This type is based on the XML Catalog specification, an entity and URI resolution initiative from OASIS.
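A public-identifier catalog boils down to an entity resolver: when the parser meets a DOCTYPE whose public ID is in the catalog, it reads the local copy instead of fetching the system identifier. The sketch below shows that idea with SAX's EntityResolver interface. It is a simplification of what xmlcatalog does: PublicIdCatalog is hypothetical, and it maps public IDs to DTD text held in memory rather than to file locations.

```java
import java.io.StringReader;
import java.util.Map;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

// Hypothetical sketch of public-ID lookup; not Ant's XMLCatalog implementation.
class PublicIdCatalog implements EntityResolver {
    private final Map<String, String> dtdsByPublicId; // public ID -> DTD text

    PublicIdCatalog(Map<String, String> dtdsByPublicId) {
        this.dtdsByPublicId = dtdsByPublicId;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) {
        String dtd = publicId == null ? null : dtdsByPublicId.get(publicId);
        // Returning null tells the parser to fall back to the system identifier.
        return dtd == null ? null : new InputSource(new StringReader(dtd));
    }
}
```

To use it, pass an instance to DocumentBuilder.setEntityResolver before parsing a document that declares the public ID -//Wy'east Communications//Date DTD//EN.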
The get task fetches a URL and downloads it to a specified location. The xmlproperty task in the init target reads the file properties.xml:
<?xml version="1.0"?>
<build>
  <dir>.</dir>
</build>
The arbitrary tags in the properties file determine the name or names of the properties that you can use elsewhere in the build file to reference values, such as ${build.dir}. The first part of the property name comes from the <build> tag and the second part from <dir>. The content of <dir> becomes the value of the property. You can also use attributes to create property names.
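The element-naming rule just described can be sketched in a few lines of Java: nested element names are joined with dots, and a leaf element's text becomes the value. This is a minimal sketch under stated assumptions, not Ant's xmlproperty task: XmlPropertySketch is hypothetical, it ignores attributes, and it trims leaf text. It also shows ${...} expansion, leaving undefined references untouched as Ant does.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical sketch of xmlproperty's element-naming rule; attributes are ignored.
class XmlPropertySketch {
    /** <build><dir>.</dir></build> gives the single property build.dir=".". */
    static Map<String, String> load(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot read properties file", e);
        }
        Map<String, String> props = new LinkedHashMap<>();
        walk(root, root.getTagName(), props);
        return props;
    }

    private static void walk(Element element, String name, Map<String, String> props) {
        boolean hasChildElements = false;
        for (Node n = element.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                hasChildElements = true;
                Element child = (Element) n;
                walk(child, name + "." + child.getTagName(), props);
            }
        }
        if (!hasChildElements) {
            props.put(name, element.getTextContent().trim()); // leaf text becomes the value
        }
    }

    /** Replaces ${name} references; undefined names are left as written. */
    static String expand(String text, Map<String, String> props) {
        Matcher m = Pattern.compile("\\$\\{([^}]+)\\}").matcher(text);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            String value = props.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

Loading properties.xml this way gives build.dir=".", so expand("${build.dir}/date.zip", props) returns "./date.zip".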
Running XmlLogger
Ant provides logging and event-listening facilities. One such logger-listener is defined in the class org.apache.tools.ant.XmlLogger, which produces XML output. The following command line puts the XML logger to work:
C:\Java\Ant>ant -logger org.apache.tools.ant.XmlLogger -v -l log.xml
The -v (or -verbose) option indicates verbose output, all of which is sent to the log file; the -l (or -logfile) option provides a name for the log file. You can find an XSLT stylesheet for log files, called log.xsl, in Ant's etc directory. The following figure shows how log.xml appears in a browser after it has been transformed by log.xsl.
log.xml after being transformed by log.xsl
Conclusion
I realize that Ant was not intended to be an XML pipeline tool, but it turns out to be a pretty good one anyway. Other tools, such as Sean McGrath's XPipes or Eric van der Vlist's XML Validation Interoperability Framework (XVIF), exist and may eventually do a better job. For now, though, Ant remains an attractive option. Like XML, Ant can do things that perhaps it was not originally intended to do. That's a good sign.