Printing from XML: An Introduction to XSL-FO
October 9, 2002
Dave Pawson is the author of XSL-FO: Making XML Look Good in Print
One of the issues many users face when introduced to the production of print from XML is that of page layout. Without having the page layout right, it's unlikely that much progress will be made. By way of introducing the W3C XSL Formatting Objects recommendation, I want to present a simplified approach that will enable a new user to gain a foothold with page layout.
The aim of this article is to produce that first page of output -- call it the "Hello World" program -- with enough information to allow a user to move on to more useful things. I'll introduce the most straightforward of page layouts for XSL-FO, using as few of the elements needed as I can to obtain reasonable output.
One of the problems is that, unlike the production of an HTML document from an XML source using XSLT, the processing of the children of the root element is not a simple xsl:apply-templates from within a root element. Much more initial output is required in order to enable the formatter to generate the pages.
Let's look at the processing necessary to get from your XML document to a PDF printable document. First, the XML must be fed to an XSLT processor with an appropriate stylesheet (developed below) in order to produce another XML document which uses the XSL-FO namespace and is intended for an XSL-FO formatter. The second stage is to feed the output of the first stage to the XSL-FO formatter, which can then produce the end product: a printable document, styled for visual presentation.
XML      ->  XSLT    ->  XSL-FO    ->  XSL-FO     ->  printable
document     engine      document      formatter      document
               ^
               |
         XSLT stylesheet
This approach has the advantage that the XML source document is still format neutral and may be used with other XSLT stylesheets to produce other media.
The XSL-FO Document
We need to be aware of the initial target of the XSLT transformation, the XSL-FO document. The document you are producing, which is fed to the XSL-FO formatter, contains a small number of elements:
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set> <!-- [1] -->
    <fo:simple-page-master master-name="simple"> <!-- [2] -->
      <fo:region-body/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="simple"> <!-- [3] -->
    <fo:flow flow-name="xsl-region-body"> <!-- [4] -->
      content <!-- [5] -->
    </fo:flow>
  </fo:page-sequence>
</fo:root>
Let's look at each of the identified elements in turn.
[1] In order to lay out content on a page, the formatter needs to know what sizes it has to deal with. The layout-master-set contains the [2] simple-page-master, which holds this information, e.g. whether you use a European A4 page size or an American US-letter size. It also contains the region-body element, which may be seen as the main body of the page layout.

[3] In order to support complex pagination, the page-sequence element is used. For a simple page layout, very little content is required here, other than to refer back to a particular page definition (the simple-page-master).

Also within the page-sequence element is a flow element [4]. The idea of a flow may or may not be familiar to you. I came across it using desktop publishing packages, where I poured text into page areas to build up columns for a college magazine, hence the content flowed into page areas. Identifying which region of the page to pour the text into is the rationale for the xsl-region-body flow name. This differentiates the body of the page from the outer areas (margins, header, footer etc.) of the page. Finally, there is some content [5], which is a child of the main flow. Simple text cannot be inserted here, since the formatter would have to guess what you wanted to do with it, so the real content for the flow would take the form of <fo:block>content</fo:block>, which defines a block of text (rectangular in shape, big as you like, taking a full list of defaults for everything) which will be placed as the first item on the page.
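As a minimal sketch of the outline described above, a hand-written "Hello World" XSL-FO document might look like the following. The A4 dimensions and the 2.5cm region margin are illustrative assumptions, not values required by the recommendation.

```xml
<?xml version="1.0" encoding="utf-8"?>
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <!-- Page definition; the A4 size here is an assumption, adjust as needed -->
    <fo:simple-page-master master-name="simple"
        page-height="29.7cm" page-width="21cm">
      <!-- Main body region; the margin value is illustrative -->
      <fo:region-body margin="2.5cm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <!-- master-reference must match the master-name given above -->
  <fo:page-sequence master-reference="simple">
    <fo:flow flow-name="xsl-region-body">
      <!-- Text must sit inside a block, not directly inside the flow -->
      <fo:block>Hello World</fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

A document like this can be given directly to an XSL-FO formatter, skipping the XSLT stage, which makes it a convenient first check that the formatter is installed and working.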
In order to get a better grasp of all this, let's fill out, minimally, how it might fit into a stylesheet whose task is to take a simple XML document and produce another XML document, which is then fed to an XSL-FO formatter.
A basic XSLT stylesheet to produce XSL-FO is shown below.
<?xml version="1.0" encoding="utf-8"?>
<!-- [1] XSLT namespace, [2] XSL-FO namespace -->
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    version="1.0">

  <xsl:output method="xml"/> <!-- [3] -->

  <xsl:template match="/">
    <!-- [4] .... -->
  </xsl:template>

  <!-- [5] Other templates go here. -->

</xsl:stylesheet>
In [1] and [2] we see the namespaces, respectively, of the XSLT and FO content in this document, which differentiates transformation requests from output content.
If the XSLT engine sees content in the FO namespace, it simply writes it to the output, which is exactly what we want. [3] says that we want the output serialized as XML, which is just what an XSL-FO document is: an XML document. [4] is the root template, which fires first, hence this is the point at which we add the essential outline content mentioned above.
Finally, at [5], we can start to add useful processing. We can now combine the two snippets above to do something useful. What we have below is a complete XSLT stylesheet, which is used by the XSLT engine to produce a valid XSL-FO document.
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    version="1.0">

  <xsl:output method="xml"/>

  <xsl:template match="/">
    <fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
      <fo:layout-master-set>
        <!-- [1] page size -->
        <fo:simple-page-master master-name="simple"
            page-height="29.7cm"
            page-width="21cm"
            margin-left="2.5cm"
            margin-right="2.5cm">
          <fo:region-body margin-top="3cm"/> <!-- [2] -->
        </fo:simple-page-master>
      </fo:layout-master-set>
      <fo:page-sequence master-reference="simple"> <!-- [3] -->
        <fo:flow flow-name="xsl-region-body"> <!-- [4] -->
          <xsl:apply-templates/> <!-- [5] -->
        </fo:flow>
      </fo:page-sequence>
    </fo:root>
  </xsl:template>

  <!-- [6] -->
  <xsl:template match="document">
    <fo:block>
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <!-- [7] -->
  <xsl:template match="head">
    <fo:block>
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <!-- [8] -->
  <xsl:template match="para">
    <fo:block>
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <!-- [9] -->
  <xsl:template match="em">
    <fo:inline font-style="italic">
      <xsl:apply-templates/>
    </fo:inline>
  </xsl:template>

  <!-- [10] -->
  <xsl:template match="*">
    <fo:block background-color="red">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

</xsl:stylesheet>
The Source Document
Before explaining the structure, the source document for which we are designing this stylesheet should be mentioned. I'm assuming a feed from a document class which has four elements, with the structure as shown below. I've kept it simple because it represents the vast majority of XML content meant for an XSL-FO document. It contains only two block items (head and para) and a single inline item (em).

Our document consists of an outer document element containing a mix of head and para elements, which may in turn contain some emphasis:
<document>
  <head>My very first xsl-fo document</head>
  <para>has an <em>important</em> paragraph inside it</para>
</document>
A page size is specified at [1], using European sizes. Change these to your local paper size if it's different. I've added margins since content which extends to the edges of the page is unsightly.
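For readers using US Letter paper, the page master might be adapted as sketched below. The 8.5in by 11in dimensions are those of US Letter; the 1in side margins and the 1.2in top margin are illustrative assumptions chosen to be close to the metric values used in the stylesheet.

```xml
<!-- Fragment: replaces the fo:simple-page-master element inside
     fo:layout-master-set; the fo prefix is declared here only so that
     the fragment is well-formed on its own. -->
<fo:simple-page-master xmlns:fo="http://www.w3.org/1999/XSL/Format"
    master-name="simple"
    page-height="11in"
    page-width="8.5in"
    margin-left="1in"
    margin-right="1in">
  <!-- Top margin of the body region; the value is an assumption -->
  <fo:region-body margin-top="1.2in"/>
</fo:simple-page-master>
```

Because the master-name is unchanged, the page-sequence that refers to it needs no modification.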
At [2] I've added a top margin to the main region of the page. [3] and [4] are as before. At [5] we have a crucial difference: where previously I simply said "content", I now use the facilities of XSLT to instruct the XSLT engine to process the input document. At [6] the XSLT engine processes the document element of the input XML file by outputting an fo:block element, inside which all remaining content is placed. Since blocks can be nested quite happily in XSL-FO, this isn't a problem. What it does do is ensure that any content which leaks -- that is, isn't handled explicitly by the stylesheet -- is still in a block.
At [7], [8], and [9] I'm back in the normal world of XML and XSLT: matching a source document element and outputting an appropriate element from the XSL-FO vocabulary. The first two are identical and just need decorating; the last is slightly different in that it is an inline formatting object and produces italic output.
[10] is a catch-all to show (in the output) which elements, if any, are not styled. Once styling is applied to all elements nothing will be processed by this template. It's good as a debugging option during development.
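To make the walkthrough concrete, here is roughly what the stylesheet should produce from the sample source document. This is a sketch worked out by hand, not captured processor output: indentation has been added for readability and the XML declaration is omitted, and the exact whitespace and serialization depend on the XSLT processor.

```xml
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
  <fo:layout-master-set>
    <fo:simple-page-master master-name="simple"
        page-height="29.7cm" page-width="21cm"
        margin-left="2.5cm" margin-right="2.5cm">
      <fo:region-body margin-top="3cm"/>
    </fo:simple-page-master>
  </fo:layout-master-set>
  <fo:page-sequence master-reference="simple">
    <fo:flow flow-name="xsl-region-body">
      <!-- Outer block generated by the template matching document -->
      <fo:block>
        <!-- From head -->
        <fo:block>My very first xsl-fo document</fo:block>
        <!-- From para, with em turned into an inline -->
        <fo:block>has an <fo:inline font-style="italic">important</fo:inline> paragraph inside it</fo:block>
      </fo:block>
    </fo:flow>
  </fo:page-sequence>
</fo:root>
```

No red block appears, because every element in the sample document is matched by a more specific template than the catch-all.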
This stylesheet introduces two new elements. The first is the fo:block element, used for many elements in the stylesheet. This is the basic layout element which is used to wrap content; think of it as a p element in HTML. The second, the fo:inline element, is a container for inline content in XSL-FO. Each of these two elements has a whole range of properties, expressed syntactically as attributes, which are used to decorate the content that they wrap.
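As a hedged illustration of such decoration, the para and em templates could be given a few properties as attributes. The property names used here (font-family, font-size, text-align, space-after, font-style, font-weight) are defined by XSL-FO; the particular values are assumptions chosen only for the example.

```xml
<!-- Fragment: these templates would replace the para and em templates.
     The wrapper is included only so that the fragment is a well-formed,
     loadable stylesheet on its own. -->
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:fo="http://www.w3.org/1999/XSL/Format"
    version="1.0">

  <!-- Block-level decoration: typeface, size, justification, spacing -->
  <xsl:template match="para">
    <fo:block font-family="serif" font-size="11pt"
        text-align="justify" space-after="6pt">
      <xsl:apply-templates/>
    </fo:block>
  </xsl:template>

  <!-- Inline decoration: italic and bold within the running text -->
  <xsl:template match="em">
    <fo:inline font-style="italic" font-weight="bold">
      <xsl:apply-templates/>
    </fo:inline>
  </xsl:template>

</xsl:stylesheet>
```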
Starting New Pages
Let's extend the source document structure to include a section which should have a new page start point. So now the document might look like this:
<document>
  <section>
    <head>My very first xsl-fo document</head>
    <para>has an <em>important</em> paragraph inside it</para>
  </section>
  <section>
    <head>The second section, starting on a new page </head>
    <para>Some content in the second section</para>
  </section>
</document>
Now I need to style this addition, using one of the available properties of a block.
<xsl:template match="section">
  <fo:block break-before="page">
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
This tells the XSL-FO formatting engine to start a new page when it hits a section. All the content of that section is processed within that block. To make the head element stand out, I'll also improve its appearance by choosing a larger, bold font and by adding a little space after the content.
<xsl:template match="head">
  <fo:block font-size="14pt"
      font-weight="bold"
      space-after="1cm"
      space-after.conditionality="retain">
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
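With the section template added and the head template replaced as above, the second section of the extended source document should come out roughly as the following XSL-FO. This is again a hand-derived sketch with indentation added; the fo prefix is declared on the fragment only so that it is well-formed on its own.

```xml
<!-- Fragment: in the real output this sits inside fo:flow, within the
     block generated for the document element. -->
<fo:block xmlns:fo="http://www.w3.org/1999/XSL/Format"
    break-before="page">
  <!-- From head: larger bold text followed by 1cm of retained space -->
  <fo:block font-size="14pt" font-weight="bold"
      space-after="1cm"
      space-after.conditionality="retain">The second section, starting on a new page </fo:block>
  <!-- From para -->
  <fo:block>Some content in the second section</fo:block>
</fo:block>
```

The break-before property on the outer block is what the formatter acts on to begin the section on a fresh page.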
That's it. To review: at its simplest, processing is a two-stage process. Give your source document and the above XSLT stylesheet to an XSLT processor, and the output should be a valid XSL-FO document. This can then be fed to an XSL-FO engine: RenderX or Antenna House (both commercial, with trial options), or PassiveTeX or FOP (non-commercial offerings).
You can download the files developed in this article here: xsl-fo-assets.zip.