Using XInclude

July 31, 2002

Elliotte Rusty Harold is the coauthor of XML in a Nutshell, 2nd edition

It's often convenient to divide long documents into multiple files. The classic example is a book, which is customarily divided into chapters. Each chapter may be further subdivided into sections. Traditionally one has used XML external entity references to support document division. For example, this book has three chapters, each stored in a separate file:


<?xml version="1.0"?>
<!DOCTYPE book SYSTEM "book.dtd" [
  <!ENTITY chapter1 SYSTEM "malapropisms.xml">
  <!ENTITY chapter2 SYSTEM "mispronunciations.xml">
  <!ENTITY chapter3 SYSTEM "madeupwords.xml">
]>
<book>
  <title>The Wit and Wisdom of George W. Bush</title>
  &chapter1;
  &chapter2;
  &chapter3;
</book>

However, external entity references have a number of limitations. Among them:

The individual component files cannot be used independently of the master document. They are not themselves complete, well-formed XML documents. For instance, they cannot have XML declarations or document type declarations and often do not have a single root element.
If any of the pieces are missing, then the entire document is malformed. There's no option for error recovery.
An entity reference cannot point to a plain text file such as an example Java program or HTML document. Only well-formed XML can be included.

XInclude is an emerging W3C specification for building large XML documents out of multiple well-formed XML documents, independently of validation. Each piece can be a complete XML document, a fragmentary XML document, or a non-XML text document like a Java program or an e-mail message.

Syntax

XInclude references external documents to be included with include elements in the http://www.w3.org/2001/XInclude namespace. The prefix xi is customary though not required. Each xi:include element has an href attribute that contains a URL pointing to the file to include. For example, the previous book example can be rewritten like this:


<?xml version="1.0"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>The Wit and Wisdom of George W. Bush</title>
  <xi:include href="malapropisms.xml"/>
  <xi:include href="mispronunciations.xml"/>
  <xi:include href="madeupwords.xml"/>
</book>

Of course you can also use absolute URLs where appropriate:


<?xml version="1.0"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>The Wit and Wisdom of George W. Bush</title>
  <xi:include href="http://www.whitehouse.gov/malapropisms.xml"/>
  <xi:include href="http://www.whitehouse.gov/mispronunciations.xml"/>
  <xi:include href="http://www.whitehouse.gov/madeupwords.xml"/>
</book>
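
To see what an XInclude processor does with a document like the ones above, here is a minimal Python sketch that uses the standard library's xml.etree.ElementInclude module. The chapter contents and the memory_loader function are assumptions made for illustration; they stand in for the real files so the example needs nothing on disk.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical chapter contents, kept in memory so the sketch needs no files.
CHAPTERS = {
    "malapropisms.xml": "<chapter><title>Malapropisms</title></chapter>",
    "mispronunciations.xml": "<chapter><title>Mispronunciations</title></chapter>",
    "madeupwords.xml": "<chapter><title>Made-Up Words</title></chapter>",
}

BOOK = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>The Wit and Wisdom of George W. Bush</title>
  <xi:include href="malapropisms.xml"/>
  <xi:include href="mispronunciations.xml"/>
  <xi:include href="madeupwords.xml"/>
</book>"""


def memory_loader(href, parse, encoding=None):
    """Return the parsed root element for an href (XML inclusion only)."""
    return ET.fromstring(CHAPTERS[href])


book = ET.fromstring(BOOK)
# Replaces each xi:include child, in place, with the element the loader returns.
ElementInclude.include(book, loader=memory_loader)

chapter_titles = [c.findtext("title") for c in book.findall("chapter")]
remaining = book.findall("{http://www.w3.org/2001/XInclude}include")
print(chapter_titles)
print(len(remaining))
```

After the call, the book element holds three chapter children and no xi:include elements. A loader that fetched absolute URLs over HTTP could be swapped in without changing the rest of the sketch.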

XInclude processing is recursive. That is, an included document can itself include another document. For example, a book might be divided into front matter, back matter, and several parts:


<?xml version="1.0"?>
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="frontmatter.xml"/>
  <xi:include href="part1.xml"/>
  <xi:include href="part2.xml"/>
  <xi:include href="part3.xml"/>
  <xi:include href="backmatter.xml"/>
</book>

Each part might be further divided into a part intro and several chapters:


<?xml version="1.0"?>
<part xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="intro1.xml"/>
  <xi:include href="ch01.xml"/>
  <xi:include href="ch02.xml"/>
  <xi:include href="ch03.xml"/>
  <xi:include href="ch04.xml"/>
</part>

There's no limit to how deep this can go. Only circular inclusion (Document A includes Document B which includes, directly or indirectly, Document A) is forbidden. When an XInclude processor reads an XML document it resolves all references and returns a document that contains no XInclude elements.
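
The recursion and the ban on circular inclusion can be sketched in a few lines of Python. This is a hand-rolled illustration, not a conforming processor: the resolve and expand functions, the DOCS dictionary, and the file names a.xml and b.xml are all hypothetical, and the sketch looks hrefs up directly instead of resolving them relative to the including document.

```python
import xml.etree.ElementTree as ET

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def resolve(href, documents, ancestors=()):
    """Return the element for href with every xi:include expanded."""
    if href in ancestors:
        chain = " -> ".join(ancestors + (href,))
        raise ValueError("circular inclusion: " + chain)
    root = ET.fromstring(documents[href])
    expand(root, documents, ancestors + (href,))
    return root


def expand(parent, documents, ancestors):
    """Replace xi:include children of parent; descend into other children."""
    for index, child in enumerate(list(parent)):
        if child.tag == XI_INCLUDE:
            included = resolve(child.get("href"), documents, ancestors)
            included.tail = child.tail  # keep the surrounding whitespace
            parent[index] = included    # one-for-one swap, so indexes stay valid
        else:
            expand(child, documents, ancestors)


NS = 'xmlns:xi="http://www.w3.org/2001/XInclude"'
DOCS = {  # hypothetical documents standing in for files
    "book.xml": f'<book {NS}><xi:include href="part1.xml"/></book>',
    "part1.xml": f'<part {NS}><xi:include href="ch01.xml"/>'
                 f'<xi:include href="ch02.xml"/></part>',
    "ch01.xml": "<chapter>one</chapter>",
    "ch02.xml": "<chapter>two</chapter>",
    "a.xml": f'<a {NS}><xi:include href="b.xml"/></a>',
    "b.xml": f'<b {NS}><xi:include href="a.xml"/></b>',
}

# The book includes a part, which in turn includes two chapters.
book = resolve("book.xml", DOCS)
chapters = [c.text for c in book.iter("chapter")]
leftover = list(book.iter(XI_INCLUDE))

# Document A includes Document B, which includes Document A again.
try:
    resolve("a.xml", DOCS)
    cycle_error = None
except ValueError as exc:
    cycle_error = str(exc)
```

Tracking the chain of documents currently being expanded, rather than every document ever seen, means the same chapter can legitimately be included twice in different places while a true cycle is still rejected.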

Unparsed Text

Technical articles like this one often need to include example code: programs, XML and HTML documents, e-mail messages, and so on. Within these examples characters like < and & should be treated as raw text rather than parsed as markup. To include a document as plain text, you have to add a parse="text" attribute to the xi:include element. For example, this fragment loads the source code for the Java program SpellChecker.java from the examples directory into a code element:


<code>
  <xi:include parse="text"
              href="examples/SpellChecker.java" />
</code>

Processes that are downstream from the XInclusion will see the complete text of the file SpellChecker.java like they would any other text. For instance, such data would be passed to a SAX ContentHandler object's characters() method. This is pretty much exactly the way a parser would treat the content if it were typed in a CDATA section.
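
The following Python sketch illustrates that behavior with the standard library's xml.etree.ElementInclude module. The one-line Java source and the loader function are assumptions standing in for the real examples/SpellChecker.java file.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

# Hypothetical stand-in for the contents of examples/SpellChecker.java.
JAVA_SOURCE = 'if (a < b && words.isEmpty()) { System.out.println("ok"); }\n'

DOC = """<code xmlns:xi="http://www.w3.org/2001/XInclude"><xi:include parse="text"
    href="examples/SpellChecker.java"/></code>"""


def loader(href, parse, encoding=None):
    # For parse="text" the loader returns a string rather than an element.
    assert parse == "text"
    return JAVA_SOURCE


code = ET.fromstring(DOC)
ElementInclude.include(code, loader=loader)

# The included file is now ordinary character data inside <code>.
included_text = code.text
child_count = len(code)

# Serializing escapes < and &, just as for text typed into the document.
serialized = ET.tostring(code, encoding="unicode")
print(serialized)
```

The < and & characters in the Java source are never parsed as markup. They only turn into &lt; and &amp; when the tree is written back out as XML.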

Fallback

Servers crash. Network connections fail. The domain name system gets congested. For these reasons and many others, documents included from remote servers may be temporarily unavailable. The default action for an XInclude processor in such a case is simply to give up and report a fatal error. However, the xi:include element may contain an xi:fallback element which contains alternate content to be used if the requested resource cannot be found. For example, this xi:include element tries to load the file at http://www.whitehouse.gov/malapropisms.xml. However, if somebody deletes that file, then it provides some literal content instead:


<xi:include href="http://www.whitehouse.gov/malapropisms.xml">
  <xi:fallback>
    <para>
      This administration is doing everything we can to end the stalemate in
      an efficient way. We're making the right decisions to bring the solution
      to an end.
    </para>
  </xi:fallback>
</xi:include>

The xi:fallback element can even include another xi:include element. For example, this xi:include element begins by attempting to include the document at http://www.whitehouse.gov/malapropisms.xml. However, if somebody deletes that file, then it will try http://politics.slate.msn.com/default.aspx?id=76886 instead.


<xi:include href="http://www.whitehouse.gov/malapropisms.xml">
  <xi:fallback>
    <xi:include
      href="http://politics.slate.msn.com/default.aspx?id=76886" />
  </xi:fallback>
</xi:include>

The xi:fallback element is not used if the document can be located but is malformed. That is always a fatal error.
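
The fallback rules can be sketched in Python as follows. This is a hand-rolled illustration under stated assumptions, not a conforming processor: fetch, expand, the resources dictionary, and the names primary.xml and mirror.xml are hypothetical, a missing dictionary key stands in for an unreachable server, and text placed directly inside xi:fallback is ignored for brevity.

```python
import xml.etree.ElementTree as ET

XI = "{http://www.w3.org/2001/XInclude}"
URL = "http://www.whitehouse.gov/malapropisms.xml"


def fetch(href, resources):
    """Hypothetical lookup; a missing key stands in for a dead server."""
    if href not in resources:
        raise OSError("resource unavailable: " + href)
    return resources[href]


def expand(parent, resources):
    """Expand xi:include children in place, honoring xi:fallback."""
    index = 0
    while index < len(parent):
        child = parent[index]
        if child.tag != XI + "include":
            expand(child, resources)
            index += 1
            continue
        try:
            text = fetch(child.get("href"), resources)
        except OSError:
            fallback = child.find(XI + "fallback")
            if fallback is None:
                raise  # no fallback: unavailability is a fatal error
            replacement = list(fallback)
        else:
            # Malformed XML raises ET.ParseError here; no fallback is tried.
            replacement = [ET.fromstring(text)]
        parent[index:index + 1] = replacement
        # index is not advanced, so a fallback's own xi:include is processed next


DOC = f"""<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="{URL}">
    <xi:fallback><para>Alternate content.</para></xi:fallback>
  </xi:include>
</book>"""

# Case 1: the resource is missing, so the fallback content is used.
missing = ET.fromstring(DOC)
expand(missing, {})
fallback_text = missing.findtext("para")

# Case 2: the resource exists but is malformed, which stays fatal.
broken = ET.fromstring(DOC)
try:
    expand(broken, {URL: "<chapter>"})
    malformed_is_fatal = False
except ET.ParseError:
    malformed_is_fatal = True

# Case 3: the fallback itself contains another xi:include.
NESTED = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="primary.xml">
    <xi:fallback><xi:include href="mirror.xml"/></xi:fallback>
  </xi:include>
</book>"""
nested = ET.fromstring(NESTED)
expand(nested, {"mirror.xml": "<chapter>from mirror</chapter>"})
mirror_text = nested.findtext("chapter")
```

The distinction that matters is which exception is caught: only the failure to obtain the resource triggers the fallback, while a parse failure on a resource that was obtained is allowed to propagate.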

Processing

XInclusion is not part of XML 1.0 or the XML Infoset. XML parsers do not perform inclusions automatically. To resolve XIncludes, a document must be passed through an XInclude processor that replaces the xi:include elements with the documents they point to. This may be done automatically by a server-side process, or it might be done on the client side by an XInclude-aware browser. It may also be hooked into a custom SAX program using a SAX filter that resolves the XIncludes. However, if you want this to happen, you need to ask for it and install the necessary software to make it possible.

One of the most common questions about XInclude is how inclusion interacts with validation, XSL transformation, and other processes that may be applied to an XML document. The short answer is that it doesn't. XInclusion is not part of any other XML process. It is a separate step which you may or may not perform when and where it is useful to you.

For example, consider validation against a schema. A document can be validated before inclusion, after inclusion, or both. If you validate the document before the xi:include elements are replaced, then the schema has to declare the xi:include elements just like it would declare any other element. If you validate the document after the xi:include elements are replaced, then the schema has to declare the replacement elements. Inclusion and validation are separate, orthogonal processes that can be performed in any order which is convenient in the local environment.
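
A small Python sketch makes the two orderings concrete. The children_allowed function is only a toy stand-in for real schema validation, and the loader and chapter content are assumptions for illustration. The point is simply which element names each "schema" has to permit.

```python
import xml.etree.ElementTree as ET
from xml.etree import ElementInclude

XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"

BOOK = """<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <title>The Wit and Wisdom of George W. Bush</title>
  <xi:include href="malapropisms.xml"/>
</book>"""


def loader(href, parse, encoding=None):
    # Hypothetical chapter content standing in for malapropisms.xml.
    return ET.fromstring("<chapter><title>Malapropisms</title></chapter>")


def children_allowed(root, allowed):
    """Toy stand-in for validation: check the names of root's children."""
    return all(child.tag in allowed for child in root)


pre_schema = {"title", XI_INCLUDE}   # must declare xi:include itself
post_schema = {"title", "chapter"}   # must declare the replacement element

book = ET.fromstring(BOOK)  # parsing alone leaves xi:include untouched
before = (children_allowed(book, pre_schema),
          children_allowed(book, post_schema))

ElementInclude.include(book, loader=loader)  # inclusion is a separate step
after = (children_allowed(book, pre_schema),
         children_allowed(book, post_schema))

print(before, after)
```

Before inclusion only the first schema accepts the document; after inclusion only the second does. Neither order is wrong, but the schema has to match the stage at which validation runs.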

Software Support

Current support for XInclude is limited, though that is slowly changing. In particular,

Libxml, the XML C library for Gnome <http://xmlsoft.org/>, includes fairly complete support for XInclude.
The Apache Cocoon application server <http://xml.apache.org/cocoon/index.html> can resolve XIncludes in a document before sending it to a client. Processing instructions in the document's prolog control the exact operations performed and the order they're applied in.
The 4Suite XML library for Python <http://4suite.org/> has an option to resolve XIncludes when parsing.
GNU JAXP <http://www.gnu.org/software/classpathx/jaxp/> includes a SAX filter that resolves XIncludes, provided no XPointers are used.