Using XInclude
July 31, 2002
Elliotte Rusty Harold is the coauthor of XML in a Nutshell, 2nd edition
It's often convenient to divide long documents into multiple files. The classic example is a book, which is customarily divided into chapters. Each chapter may be further subdivided into sections. Traditionally, XML external entity references have been used to support document division. For example, this book has three chapters, each stored in a separate file:
<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "book.dtd" [
  <!ENTITY chapter1 SYSTEM "malapropisms.xml">
  <!ENTITY chapter2 SYSTEM "mispronunciations.xml">
  <!ENTITY chapter3 SYSTEM "madeupwords.xml">
]>
<book>
  <title>The Wit and Wisdom of George W. Bush</title>
  &chapter1;
  &chapter2;
  &chapter3;
</book>
However, external entity references have a number of limitations. Among them:
- The individual component files cannot be used independently of the master document. They are not themselves complete, well-formed XML documents. For instance, they cannot have XML declarations or document type declarations and often do not have a single root element.
- If any of the pieces is missing, then the entire document is malformed. There's no option for error recovery.
- An entity reference cannot point to a plain text file such as an example Java program or HTML document. Only well-formed XML can be included.
XInclude is an emerging W3C specification for building large XML documents out of multiple well-formed XML documents, independently of validation. Each piece can be a complete XML document, a fragmentary XML document, or a non-XML text document like a Java program or an e-mail message.
Syntax
XInclude references external documents to be included with include elements in the http://www.w3.org/2001/XInclude namespace. The prefix xi is customary though not required. Each xi:include element has an href attribute that contains a URL pointing to the file to include. For example, the previous book example can be rewritten like this:
<?xml version="1.0"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>The Wit and Wisdom of George W. Bush</title>
  <xi:include href="malapropisms.xml"/>
  <xi:include href="mispronunciations.xml"/>
  <xi:include href="madeupwords.xml"/>
</book>
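As a rough illustration of what an XInclude processor does with a document like this, here is a minimal Python sketch built on the standard library's xml.etree.ElementInclude module. The chapter contents and the in-memory DOCUMENTS table are invented for the example; a real processor would fetch each href from disk or over the network.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

# Hypothetical chapter files, kept in memory so the sketch is self-contained.
DOCUMENTS = {
    "malapropisms.xml": "<chapter><title>Malapropisms</title></chapter>",
    "mispronunciations.xml": "<chapter><title>Mispronunciations</title></chapter>",
    "madeupwords.xml": "<chapter><title>Made-up Words</title></chapter>",
}

BOOK = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>The Wit and Wisdom of George W. Bush</title>
  <xi:include href="malapropisms.xml"/>
  <xi:include href="mispronunciations.xml"/>
  <xi:include href="madeupwords.xml"/>
</book>"""


def load(href, parse, encoding=None):
    """Loader callback: return a parsed element for the requested href."""
    return ET.fromstring(DOCUMENTS[href])


book = ET.fromstring(BOOK)
ElementInclude.include(book, loader=load)  # replaces each xi:include in place

chapter_titles = [chapter.findtext("title") for chapter in book.findall("chapter")]
remaining_includes = book.findall(XI_INCLUDE)
print(chapter_titles)
```

After the call, the book element holds three chapter children and no xi:include elements.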
Of course you can also use absolute URLs where appropriate:
<?xml version="1.0"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>The Wit and Wisdom of George W. Bush</title>
  <xi:include href="http://www.whitehouse.gov/malapropisms.xml"/>
  <xi:include href="http://www.whitehouse.gov/mispronunciations.xml"/>
  <xi:include href="http://www.whitehouse.gov/madeupwords.xml"/>
</book>
XInclude processing is recursive. That is, an included document can itself include another document. For example, a book might be divided into front matter, back matter, and several parts:
<?xml version="1.0"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="frontmatter.xml"/>
  <xi:include href="part1.xml"/>
  <xi:include href="part2.xml"/>
  <xi:include href="part3.xml"/>
  <xi:include href="backmatter.xml"/>
</book>
Each part might be further divided into a part intro and several chapters:
<?xml version="1.0"?>
<part xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="intro1.xml"/>
  <xi:include href="ch01.xml"/>
  <xi:include href="ch02.xml"/>
  <xi:include href="ch03.xml"/>
  <xi:include href="ch04.xml"/>
</part>
There's no limit to how deep this can go. Only circular inclusion (Document A includes Document B which includes, directly or indirectly, Document A) is forbidden. When an XInclude processor reads an XML document it resolves all references and returns a document that contains no XInclude elements.
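To make the recursion and the ban on circular inclusion concrete, here is a small hand-written resolver in Python. It is a sketch under simplifying assumptions, not a conforming XInclude processor: the documents live in a hypothetical in-memory table, only parse="xml" style inclusion is handled, and the names resolve and CircularInclusionError are invented for the example.

```python
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"
XI_DECL = 'xmlns:xi="http://www.w3.org/2001/XInclude"'

# Hypothetical documents keyed by href; loop-a and loop-b include each other.
DOCUMENTS = {
    "book.xml": f'<book {XI_DECL}><xi:include href="part1.xml"/></book>',
    "part1.xml": f'<part {XI_DECL}><xi:include href="ch01.xml"/></part>',
    "ch01.xml": "<chapter>Chapter 1</chapter>",
    "loop-a.xml": f'<a {XI_DECL}><xi:include href="loop-b.xml"/></a>',
    "loop-b.xml": f'<b {XI_DECL}><xi:include href="loop-a.xml"/></b>',
}


class CircularInclusionError(Exception):
    """Raised when a document includes itself, directly or indirectly."""


def resolve(href, ancestors=()):
    """Return the element tree for href with every xi:include expanded."""
    if href in ancestors:
        raise CircularInclusionError(" -> ".join(ancestors + (href,)))
    root = ET.fromstring(DOCUMENTS[href])
    _expand(root, ancestors + (href,))
    return root


def _expand(elem, ancestors):
    # Each include is replaced one-for-one, so child indexes stay stable.
    for index, child in enumerate(list(elem)):
        if child.tag == XI_INCLUDE:
            included = resolve(child.get("href"), ancestors)
            included.tail = child.tail  # keep text that followed the include
            elem[index] = included
        else:
            _expand(child, ancestors)


resolved_book = ET.tostring(resolve("book.xml"), encoding="unicode")
print(resolved_book)
```

Calling resolve("loop-a.xml") raises CircularInclusionError, because loop-a.xml is already among the ancestors when loop-b.xml tries to include it.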
Unparsed Text
Technical articles like this one often need to include example code: programs, XML and HTML documents, e-mail messages, and so on. Within these examples characters like < and & should be treated as raw text rather than parsed as markup. To include a document as plain text, you have to add a parse="text" attribute to the xi:include element. For example, this fragment loads the source code for the Java program SpellChecker.java from the examples directory into a code element:
<code> <xi:include parse="text" href="examples/SpellChecker.java" /> </code>
Processes that are downstream from the XInclusion will see the complete text of the file SpellChecker.java just as they would any other text. For instance, such data would be passed to a SAX ContentHandler object's characters() method. This is pretty much exactly the way a parser would treat the content if it were typed in a CDATA section.
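The following Python sketch shows the effect of parse="text" using the standard library's xml.etree.ElementInclude module. The one-line JAVA_SOURCE string is an invented stand-in for the real contents of SpellChecker.java, and the loader simply returns it rather than reading a file.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical stand-in for the contents of examples/SpellChecker.java.
JAVA_SOURCE = 'if (a < b && ok) { System.out.println("<ok>"); }'

FRAGMENT = (
    '<code xmlns:xi="http://www.w3.org/2001/XInclude">'
    '<xi:include parse="text" href="examples/SpellChecker.java"/>'
    "</code>"
)


def load(href, parse, encoding=None):
    """Loader callback: hand back raw text instead of a parsed element."""
    assert parse == "text"
    return JAVA_SOURCE


code = ET.fromstring(FRAGMENT)
ElementInclude.include(code, loader=load)

# The included file is now ordinary character data inside <code>.
serialized = ET.tostring(code, encoding="unicode")
print(serialized)
```

In memory, the < and & characters are plain text in code.text. They are escaped as &lt; and &amp; only when the tree is written back out as XML.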
Fallback
Servers crash. Network connections fail. The domain name system gets congested. For these reasons and many others, documents included from remote servers may be temporarily unavailable. The default action for an XInclude processor in such a case is simply to give up and report a fatal error. However, the xi:include element may contain an xi:fallback element which contains alternate content to be used if the requested resource cannot be found. For example, this xi:include element tries to load the file at http://www.whitehouse.gov/malapropisms.xml. However, if somebody deletes that file, then it provides some literal content instead:
<xi:include href="http://www.whitehouse.gov/malapropisms.xml">
  <xi:fallback>
    <para>
      This administration is doing everything we can to end the stalemate in
      an efficient way. We're making the right decisions to bring the solution
      to an end.
    </para>
  </xi:fallback>
</xi:include>
The xi:fallback element can even include another xi:include element. For example, this xi:include element begins by attempting to include the document at http://www.whitehouse.gov/malapropisms.xml. However, if somebody deletes that file, then it will try http://politics.slate.msn.com/default.aspx?id=76886 instead.
<xi:include href="http://www.whitehouse.gov/malapropisms.xml">
  <xi:fallback>
    <xi:include href="http://politics.slate.msn.com/default.aspx?id=76886" />
  </xi:fallback>
</xi:include>
The xi:fallback element is not used if the document can be located but is malformed. That is always a fatal error.
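The fallback rules can be sketched in Python as follows. This is a simplified, hand-written illustration rather than a conforming processor: the resource table and the names expand and _resolve_include are invented, a missing key stands in for a resource that cannot be found, and any text placed directly inside xi:fallback is dropped because only its child elements are kept.

```python
import xml.etree.ElementTree as ET

XI = "{http://www.w3.org/2001/XInclude}"
XI_DECL = 'xmlns:xi="http://www.w3.org/2001/XInclude"'

# Hypothetical resources; "missing.xml" is deliberately absent.
DOCUMENTS = {
    "backup.xml": "<para>Backup copy</para>",
    "broken.xml": "<para>never closed",
}


def expand(elem):
    """Replace each xi:include child of elem, honoring xi:fallback."""
    new_children = []
    for child in list(elem):
        if child.tag == XI + "include":
            new_children.extend(_resolve_include(child))
        else:
            expand(child)
            new_children.append(child)
    elem[:] = new_children


def _resolve_include(include):
    href = include.get("href")
    if href in DOCUMENTS:
        # Located but malformed raises ET.ParseError: fatal, no fallback.
        root = ET.fromstring(DOCUMENTS[href])
        expand(root)
        return [root]
    fallback = include.find(XI + "fallback")
    if fallback is None:
        raise LookupError(f"cannot load {href} and no xi:fallback given")
    expand(fallback)  # the fallback may itself contain xi:include
    return list(fallback)


doc = ET.fromstring(
    f"<book {XI_DECL}>"
    '<xi:include href="missing.xml">'
    '<xi:fallback><xi:include href="backup.xml"/></xi:fallback>'
    "</xi:include>"
    "</book>"
)
expand(doc)
result = ET.tostring(doc, encoding="unicode")
print(result)
```

Pointing the outer href at broken.xml instead raises ET.ParseError even though a fallback is present, which mirrors the rule that a malformed document is always fatal.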
Processing
XInclusion is not part of XML 1.0 or the XML Infoset. XML parsers do not perform inclusions automatically. To resolve XIncludes, a document must be passed through an XInclude processor that replaces the xi:include elements with the documents they point to. This may be done automatically by a server side process or it might be done on the client side by an XInclude-aware browser. It may be hooked into a custom SAX program using a SAX filter that resolves the XIncludes. However, if you want this to happen, you need to ask for it and install the necessary software to make it possible.
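As one possible shape for such a SAX filter, here is a minimal sketch using Python's xml.sax package, whose XMLFilterBase class plays the role of a SAX filter. The class names XIncludeFilter, _Forwarder, and Recorder, the resolve_events helper, and the in-memory DOCUMENTS table are all assumptions made for the example. The sketch handles only XML inclusion, with no fallback, text inclusion, or XPointer support.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_namespaces
from xml.sax.saxutils import XMLFilterBase

XI_INCLUDE = ("http://www.w3.org/2001/XInclude", "include")

# Hypothetical included document, kept in memory for the sketch.
DOCUMENTS = {"ch01.xml": b"<chapter><title>One</title></chapter>"}


class _Forwarder(ContentHandler):
    """Passes element and text events on, but drops document start/end."""

    def __init__(self, target):
        super().__init__()
        self._target = target

    def startElementNS(self, name, qname, attrs):
        self._target.startElementNS(name, qname, attrs)

    def endElementNS(self, name, qname):
        self._target.endElementNS(name, qname)

    def characters(self, content):
        self._target.characters(content)


class XIncludeFilter(XMLFilterBase):
    """Replaces each xi:include event with the events of the target document."""

    def startElementNS(self, name, qname, attrs):
        if name == XI_INCLUDE:
            href = attrs.getValue((None, "href"))
            parser = xml.sax.make_parser()
            parser.setFeature(feature_namespaces, True)
            # Forward to this filter so nested includes are resolved too.
            parser.setContentHandler(_Forwarder(self))
            parser.parse(io.BytesIO(DOCUMENTS[href]))
        else:
            super().startElementNS(name, qname, attrs)

    def endElementNS(self, name, qname):
        if name != XI_INCLUDE:
            super().endElementNS(name, qname)


class Recorder(ContentHandler):
    """Downstream handler that records the element events it sees."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startElementNS(self, name, qname, attrs):
        self.events.append(("start", name[1]))

    def endElementNS(self, name, qname):
        self.events.append(("end", name[1]))


def resolve_events(xml_bytes):
    parser = xml.sax.make_parser()
    parser.setFeature(feature_namespaces, True)
    xi_filter = XIncludeFilter(parser)
    recorder = Recorder()
    xi_filter.setContentHandler(recorder)
    xi_filter.parse(io.BytesIO(xml_bytes))
    return recorder.events


events = resolve_events(
    b'<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    b'<xi:include href="ch01.xml"/></book>'
)
print(events)
```

The downstream handler never sees an xi:include element. It receives the chapter's events in its place, which is what makes the filter transparent to the rest of a SAX pipeline.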
One of the most common questions about XInclude is how inclusion interacts with validation, XSL transformation, and other processes that may be applied to an XML document. The short answer is that it doesn't. XInclusion is not part of any other XML process. It is a separate step which you may or may not perform when and where it is useful to you.
For example, consider validation against a schema. A document can be validated before inclusion, after inclusion, or both. If you validate the document before the xi:include elements are replaced, then the schema has to declare the xi:include elements just like it would declare any other element. If you validate the document after the xi:include elements are replaced, then the schema has to declare the replacement elements. Inclusion and validation are separate, orthogonal processes that can be performed in any order which is convenient in the local environment.
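The difference between the two orderings can be illustrated with a deliberately toy stand-in for validation. The Python sketch below does not use a real schema language: the two sets of allowed child names, the is_valid helper, and the loader's chapter content are all invented to show that each ordering needs a different content model.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

# Two toy content models for <book>: one for the document as written,
# one for the document after inclusion. Neither is a real schema.
BEFORE_INCLUSION = {"title", XI_INCLUDE}
AFTER_INCLUSION = {"title", "chapter"}


def is_valid(book, allowed_children):
    """Toy check: every child of <book> must be in the allowed set."""
    return all(child.tag in allowed_children for child in book)


def load(href, parse, encoding=None):
    # Hypothetical chapter content; a real loader would read the href.
    return ET.fromstring("<chapter/>")


book = ET.fromstring(
    '<book xmlns:xi="http://www.w3.org/2001/XInclude">'
    "<title>Example</title>"
    '<xi:include href="ch01.xml"/>'
    "</book>"
)

valid_before = is_valid(book, BEFORE_INCLUSION)      # model declares xi:include
rejected_before = not is_valid(book, AFTER_INCLUSION)  # chapter-only model fails
ElementInclude.include(book, loader=load)
valid_after = is_valid(book, AFTER_INCLUSION)        # model declares chapter
```

The same document passes one check before inclusion and the other after it, so the order chosen determines which content model the schema has to describe.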
Software Support
Current support for XInclude is limited, though that is slowly changing. In particular:
- Libxml, the XML C library for Gnome <http://xmlsoft.org/>, includes fairly complete support for XInclude.
- The Apache Cocoon application server <http://xml.apache.org/cocoon/index.html> can resolve XIncludes in a document before sending it to a client. Processing instructions in the document's prolog control the exact operations performed and the order in which they're applied.
- The 4Suite XML library for Python <http://4suite.org/> has an option to resolve XIncludes when parsing.
- GNU JAXP <http://www.gnu.org/software/classpathx/jaxp/> includes a SAX filter that resolves XIncludes, provided no XPointers are used.