Printing XML: Why CSS Is Better than XSL
January 19, 2005
Håkon Wium Lie and Michael Day
Longtime readers of XML.com will remember the battles between XSL and CSS that took place in these columns in 1999 and that were memorialized in XSL and CSS: One Year Later. Since then, the two languages have coexisted in relative peace: CSS is now used to style most web sites, XSLT (the transformation part of XSL) is used in many server-side applications, and XSL-FO (the formatting part of XSL) has found a niche in the printing industry.
A recent entry in the blog of a web luminary may signal the start of a second round of hostilities. Norman Walsh, a member of the W3C's Technical Architecture Group and co-author of the W3C's Web Architecture document (WebArch), recently blogged:
... web browsers suck at printing. ... And CSS is never going to fix it. Did you hear me? CSS is never going to fix it.
It's unclear whether this statement is a prediction or a threat, or just blogging on a bad day. Either way, the pronouncement of CSS's printing ineptness gives us a splendid opportunity to explain why CSS is a better language than XSL for most printing needs. As we have just used CSS to style a 400-page book (Cascading Stylesheets, designing for the web by Håkon Lie and Bert Bos, 3rd ed., forthcoming from Addison-Wesley later this year), this is not purely an academic exercise in stylesheet linguistics. So, would-be authors should continue reading.
The Problem
Both camps agree that a printed document is, in many ways, more difficult to format than an on-screen presentation. A printed document must be split into numbered pages, with added headers and footers. Page margins must be specified, and they may be different on left and right pages. References that appear as hyperlinks on-screen often include page numbers on paper.
The disagreement starts with how best to express all this. Walsh's solution is to write a 1000-line XSL transformation that generates XSL-FO, which is subsequently turned into PDF. We will argue that it's much easier for most authors to express styling in CSS; in the case of the WebArch document, one can reuse the existing CSS stylesheets (200 lines or so) and add some print-specific lines. And, although browsers tend to focus on dynamic screens rather than on printing, products like Prince happily combine CSS with XML and produce beautiful PDF documents.
(Some disclosure at this point is appropriate. We, the authors, have been actively involved in shaping CSS and are now working hard to build software--Opera and Prince--that supports CSS.)
The Flavors
Before going into the print-specific features, let's compare the basic flavors of XSL and CSS. Consider this fragment from Walsh's XSL transform:
<xsl:template match="html:p[@class='copyright' and ancestor::html:div[@class='head']]"
              priority="100">
  <fo:block space-before="8pt" space-after="8pt" font-size="75%">
    <xsl:apply-templates/>
  </fo:block>
</xsl:template>
The purpose of this code is to select certain elements (specified in the match attribute) and to set certain formatting properties on these elements (e.g., font-size).
Using CSS, this can be written:
div.head p.copyright { margin-top: 8pt; margin-bottom: 8pt; font-size: 75% }
Compare the two fragments. Which do you find more readable? Which language would be easier to learn?
Explaining this XSL snippet to a non-programmer would also be awkward:
<xsl:template match="html:ol/html:li">
  <fo:list-item>
    <xsl:if test="not(preceding-sibling::html:li)">
      <xsl:attribute name="keep-with-next">always</xsl:attribute>
    </xsl:if>
The CSS equivalent, however, is more intuitive:
ol li:first-of-type { page-break-after: avoid }
Printing with CSS
As we all know, simple tools cannot always perform advanced tasks. Even if CSS were able to simplify some fragments, it wouldn't do much good if the language had inherent limitations that made it impossible to describe advanced features. The question becomes, then, whether there are any inherent limitations in CSS that could make it unfit for producing printed documents.
The answer is no. CSS2, which became a W3C Recommendation in 1998, introduced the concept of pages in CSS. By using it, one can set page breaks (even Internet Explorer supports this) and page margins. More recently, a W3C Candidate Recommendation (called CSS3 Paged Media Module) added functionality to describe headers, footers, and more. Let's start with a simple example:
@page { size: A4 portrait; }
This simple statement tells the formatter that the resulting PDF document should be of size A4 (which is common outside North America), and that the orientation should be portrait. To change the size of the generated PDF document, one simply changes A4 into another size. Peeking inside the XSL sheet again, we find two 40-line switch statements to enable similar functionality. One of the statements is reprinted in full below for entertainment purposes:
<xsl:param name="page.height.portrait">
  <xsl:choose>
    <xsl:when test="$paper.type = 'A4landscape'">210mm</xsl:when>
    <xsl:when test="$paper.type = 'USletter'">11in</xsl:when>
    <xsl:when test="$paper.type = 'USlandscape'">8.5in</xsl:when>
    <xsl:when test="$paper.type = '4A0'">2378mm</xsl:when>
    <xsl:when test="$paper.type = '2A0'">1682mm</xsl:when>
    <xsl:when test="$paper.type = 'A0'">1189mm</xsl:when>
    <xsl:when test="$paper.type = 'A1'">841mm</xsl:when>
    <xsl:when test="$paper.type = 'A2'">594mm</xsl:when>
    <xsl:when test="$paper.type = 'A3'">420mm</xsl:when>
    <xsl:when test="$paper.type = 'A4'">297mm</xsl:when>
    <xsl:when test="$paper.type = 'A5'">210mm</xsl:when>
    <xsl:when test="$paper.type = 'A6'">148mm</xsl:when>
    <xsl:when test="$paper.type = 'A7'">105mm</xsl:when>
    <xsl:when test="$paper.type = 'A8'">74mm</xsl:when>
    <xsl:when test="$paper.type = 'A9'">52mm</xsl:when>
    <xsl:when test="$paper.type = 'A10'">37mm</xsl:when>
    <xsl:when test="$paper.type = 'B0'">1414mm</xsl:when>
    <xsl:when test="$paper.type = 'B1'">1000mm</xsl:when>
    <xsl:when test="$paper.type = 'B2'">707mm</xsl:when>
    <xsl:when test="$paper.type = 'B3'">500mm</xsl:when>
    <xsl:when test="$paper.type = 'B4'">353mm</xsl:when>
    <xsl:when test="$paper.type = 'B5'">250mm</xsl:when>
    <xsl:when test="$paper.type = 'B6'">176mm</xsl:when>
    <xsl:when test="$paper.type = 'B7'">125mm</xsl:when>
    <xsl:when test="$paper.type = 'B8'">88mm</xsl:when>
    <xsl:when test="$paper.type = 'B9'">62mm</xsl:when>
    <xsl:when test="$paper.type = 'B10'">44mm</xsl:when>
    <xsl:when test="$paper.type = 'C0'">1297mm</xsl:when>
    <xsl:when test="$paper.type = 'C1'">917mm</xsl:when>
    <xsl:when test="$paper.type = 'C2'">648mm</xsl:when>
    <xsl:when test="$paper.type = 'C3'">458mm</xsl:when>
    <xsl:when test="$paper.type = 'C4'">324mm</xsl:when>
    <xsl:when test="$paper.type = 'C5'">229mm</xsl:when>
    <xsl:when test="$paper.type = 'C6'">162mm</xsl:when>
    <xsl:when test="$paper.type = 'C7'">114mm</xsl:when>
    <xsl:when test="$paper.type = 'C8'">81mm</xsl:when>
    <xsl:when test="$paper.type = 'C9'">57mm</xsl:when>
    <xsl:when test="$paper.type = 'C10'">40mm</xsl:when>
    <xsl:otherwise>11in</xsl:otherwise>
  </xsl:choose>
</xsl:param>
As the alert reader will already have inferred, the statement lists the heights of many different paper sizes. As such, it is interesting reading. However, we do not understand why this list belongs in a stylesheet. CSS provides a simple and elegant alternative by naming the different sizes in the specification rather than in each stylesheet.
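To make the contrast concrete, here is a minimal sketch of a page setup that switches to US Letter in landscape and uses different margins on left and right pages, as described in The Problem. The margin values are illustrative assumptions and are not taken from the WebArch stylesheets.

```css
/* Named page size plus orientation; swap "letter" for A4, A5, B5, etc. */
@page {
  size: letter landscape;
  margin: 25mm 20mm;            /* illustrative values */
}

/* Wider inner (binding-side) margin on facing pages */
@page :left  { margin-right: 30mm; }
@page :right { margin-left: 30mm; }
```

Changing paper means editing one keyword; the dimensions themselves stay in the specification and the formatter.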
Another example that shows the elegant simplicity of CSS is that of page numbering. Page numbers are commonly printed on the outside of a page so that they are easily visible when flipping through a book. So, on a right page the page number should be on the right side, and on a left page it should be on the left side. On the first page, there should be no page number. In CSS, you can express this with:
@page :left {
  @bottom-left { content: counter(page); }
}
@page :right {
  @bottom-right { content: counter(page); }
}
@page :first {
  @bottom-right { content: normal; }
}
The statements, while not pure English prose, are easily understandable for anyone who has read this far, and it would be a simple exercise for the reader to move the page number from the bottom of each page to the top.
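For readers who want to check their answer to that exercise, one possible solution is sketched below. It assumes a formatter that supports the margin boxes of the CSS3 Paged Media draft and only swaps the bottom boxes for their top counterparts.

```css
/* Page numbers in the outer top corner instead of the bottom */
@page :left  { @top-left  { content: counter(page); } }
@page :right { @top-right { content: counter(page); } }

/* Still no page number on the first page */
@page :first { @top-right { content: normal; } }
```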
Because of size constraints, we're not going to show you how page numbers are expressed in XSL. We challenge you to find it and then try explaining it to the first person you meet.
Reuse and Cascading
One reason why the web took off in the early 1990s was the manner in which HTML is authored. By looking at the source code of other documents, web authors could easily get started in web publishing. In a sense, HTML is the most successful open source movement. CSS also encourages reuse of code and has formalized how it works through the cascading rules. For authors, this means they can take an existing stylesheet and add their own rules to it instead of writing a new one themselves.
One case in point is how to express page breaks for printed documents. Typically, you want to avoid page breaks after headings, and this can be expressed by adding a simple rule:
h1, h2, h3, h4, h5, h6 { page-break-after: avoid; }
Here, the selector lists the elements to which the declaration applies. As a result, the formatter will avoid page breaks after these elements. XSL has no concept of cascading and cannot easily express the above example. Instead of grouping elements, one has to add a rule to each element's template. Here is what the template for h1 elements looks like:
<xsl:template match="html:h1">
  <fo:block space-before="0.25in"
            color="#00599C"
            font-size="16pt"
            font-family="{$title.font.family}"
            keep-with-next="always"
            id="{generate-id()}">
(XSL has chosen another name for the property, i.e., keep-with-next instead of page-break-after.)
Likewise, it is easy in CSS to remove text decorations (e.g. underlining) on all elements:
* { text-decoration: none }
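Put together, a print stylesheet that builds on an existing one can be very short. The sketch below is only an illustration: the file name webarch.css is hypothetical, and the rule that starts each h1 on a new page is our own assumption about what a print layout might want.

```css
/* Reuse the existing screen stylesheet (hypothetical file name) */
@import url("webarch.css");

/* Print-only additions; on equal specificity, later rules win in the cascade */
@media print {
  h1, h2, h3, h4, h5, h6 { page-break-after: avoid; }
  h1 { page-break-before: always; }   /* assumed: each chapter starts a page */
  a  { text-decoration: none; }       /* underlined links are pointless on paper */
}
```

Everything not mentioned in the @media print block, such as fonts and colors, is inherited from the imported stylesheet unchanged.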
Table of Contents
Many documents start with a table of contents (TOC). On-screen, the TOC is clickable and takes the user to the requested section. Paper, being more static in nature, needs references that can be followed manually. A TOC on paper, therefore, lists the number of the page where the section can be found.
Expressing this in CSS results in a slightly more complex rule than the examples you have seen so far. Consider this:
ul.toc a:after { content: target-counter(attr(href), page); }
In English, the rule would read as follows: inside ul elements of class toc, all a elements should be trailed (:after) by some generated content. The generated content is the page number where the target of the link is found. The link is expressed in the href attribute of the a element.
One reason for the added complexity is that CSS, contrary to a common misconception, has been designed to work with generic XML as well as HTML. In HTML, links are expressed in href attributes on a elements. In generic XML, however, links can be anywhere, and their position must be specified.
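As an illustration of the generic XML case, suppose a hypothetical vocabulary in which a toc element contains xref elements whose target attribute holds a fragment identifier such as #sec-intro. Neither name comes from a real schema. Under that assumption, and using the same draft syntax as above, the rule only needs a different selector and attribute name:

```css
/* Hypothetical markup: <xref target="#sec-intro">Introduction</xref> */
toc xref:after {
  content: target-counter(attr(target), page);
}
```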
Another common feature of TOCs on paper is a dotted line between section titles and the respective page numbers. This is called a leader in typesetting terminology and can be expressed in CSS as follows:
ul.toc a:after {
  content: leader('.') target-counter(attr(href), page);
}
Compared with this three-line CSS solution, expressing TOCs in the WebArch XSL stylesheet takes more than 50 lines. In fairness, the XSL code also expresses other properties for TOCs (for example, that page breaks should be avoided). The CSS syntax in the above examples is still at the draft stage.
By combining the print-specific CSS stylesheet described above with the WebArch document, a nicely formatted PDF document can be created.
Multi-Column Layouts
On paper, content is often laid out in multiple columns. Stylesheets must be able to express this. Using CSS, one can easily create multi-column layouts:
body { column-count: 2; column-gap: 8mm; }
The content of the body element will now be poured into two columns, between which there is an 8mm gap. Multi-column layouts are also available in XSL, but the obligatory verbosity/complexity warnings apply.
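A slightly fuller sketch is shown below. It adds a thin rule between the columns and lets top-level headings stretch across both of them. The column-rule and column-span properties come from the CSS3 multi-column module; the specific values are our assumptions, and support varies between formatters.

```css
body {
  column-count: 2;
  column-gap: 8mm;
  column-rule: 0.5pt solid #999;   /* thin line drawn in the middle of the gap */
}

/* Headings break out of the column flow and span the full width */
h1 { column-span: all; }
```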
Conclusions
So can CSS do everything better than XSL? Not quite. XSL is a Turing-complete language which, in principle, can be used for all programming tasks and is particularly suited for document transformations. Styling documents is only one of many things XSL can do. CSS, on the other hand, has been developed with only one task in mind: styling documents.
On the web, CSS is the style sheet language of choice. However, the usefulness of CSS is not limited to screens. If you want to transfer web content--be it XML or HTML--onto paper, there are good reasons to use CSS. The language is radically simpler than XSL, and it is suitable both on-screen and on paper. This means that you probably don't have to write a stylesheet from scratch but can reuse an existing one.
Finally, by using CSS you can preserve the semantics of your content all the way to the printer. That, however, is a different discussion.