On Display: XML Web Pages with Mozilla
March 29, 2000
Direct display of XML in a web browser is finally becoming a reality. This article is the first of a series in which we will examine XML support in the Mozilla, Opera, and Internet Explorer browsers. -- E.D.
Although Cascading Style Sheets Level 2 provides a solid set of tools for presenting XML documents in web browsers, web developers have been waiting a very long time for an implementation that lets them really use their CSS skills with XML. Internet Explorer 5.0 took some credible first steps toward XML+CSS (see Tim Bray's review of Windows IE5 for details), but the latest work from Mozilla goes beyond first steps to a usable set of tools. The solid XML+CSS core and the underlying DOM support suggest that Mozilla will be a useful platform for building applications, not just web pages. Add to that a dash of XLink support, and it looks like Mozilla may be leading the pack.
Mozilla's emphasis on standards-orientation makes its implementation of XML a real pleasure to work with. Developers used to working with CSS in an HTML context have a bit of extra learning to do, as a CSS property called display is critical to presenting XML documents. This property doesn't generally receive much use in HTML. Fortunately, finding information on display isn't difficult. For the most part, I've relied on the W3C specs as documentation for writing this article, a real change from my usual practice of combing through vendor documentation and creating test cases to see if they're accurate. (In particular, I used the CSS2 Recommendation.)
There are, of course, a few bugs yet to iron out -- this isn't even beta software yet -- though it should be quite soon. We'll start by exploring the XML+CSS support, building some test pages that will show off what's possible, and then connecting them together with some basic links. By the end of this article, we'll have a very capable set of tools for building simple web sites, and a solid foundation for building web applications.
Working with XML in Mozilla
The infrastructure Mozilla provides for handling XML is pretty simple. The XML support in Mozilla is built on James Clark's non-validating expat parser. The output from expat is then fed into a DOM-tree builder and document structures can be styled with CSS just like HTML documents. From a casual user's perspective, well-formed XML documents with style sheets look just like HTML documents, and work like them as well. Users can navigate and print XML documents just like they do HTML documents.
Mozilla's XML support lives up to the XML specification, but does have a few quirks developers should know about. Mozilla ignores external parsed entities and entities declared outside of the actual document. If you want to use entities in Mozilla, you'll need to declare them in the internal subset. Entity references don't appear on screen as part of the content -- they just disappear. This behavior is perfectly legal, though it may seem inconvenient. Mozilla does provide a bin/dtd folder where you can put additional entities. This is used to provide support for things like MathML, but isn't a readily available option for most developers.
Mozilla also supports namespaces -- the HTML namespace in particular can be very useful at times. Namespace support will also be critical for XLink when support arrives for more recent XLink drafts, giving Mozilla a way to find XLink's "global attributes" for proper link processing. Mozilla's XUL tools for building user interfaces with XML also rely on namespaces in a similar fashion. A W3C working draft describing relationships between namespaces and CSS is also implemented in Mozilla.
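As a rough illustration of what namespace-aware styling can look like, the sketch below assumes the @namespace rule and the prefix|element selector form described in the namespaces-and-CSS draft. The prefix chosen in the style sheet is local to the style sheet, and the URI is the HTML 4.0 namespace URI used later in this article. It has not been checked against a particular Mozilla build.

```css
/* Bind the prefix "html" to the HTML 4.0 namespace URI.
   The prefix is local to this style sheet. */
@namespace html url(http://www.w3.org/TR/REC-html40);

/* Matches only img elements that belong to that namespace. */
html|img { border: 1px solid black; }
```

An element named img in some other vocabulary would not be touched by this rule, which is the point of qualifying the selector.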
Getting Started with XML and CSS
To get started with XML and Cascading Style Sheets, we'll build a "sandbox" example, a fairly meaningless document. The document shows off Mozilla's tools for formatting arbitrary XML content with CSS, though it more or less replicates HTML's capabilities in a different vocabulary. "Laboratory documents" are very handy for figuring out what works and what doesn't, without the pressure of a particular format. However, they are a luxury that typically only writers and trainers get to indulge in!
We'll start out with a block element containing an inline element, along with two different kinds of lists and a small table:
<?xml version="1.0" ?>
<test>
<block>This is a block element that contains
<inline>inline</inline> elements.</block>
<bulletList>
<listItem>This is a list item, contained inside of a
bulleted list.</listItem>
<listItem>This is a second list item, also contained
inside of a bulleted list.</listItem>
</bulletList>
<numberList>
<listItem>This is a list item, contained inside
of a numbered list.</listItem>
<listItem>This is a second list item, also contained
inside of a numbered list.</listItem>
</numberList>
<hidden>This element shouldn't appear at all.</hidden>
<myTable>
<tableRow>
<tableData>Howdy!</tableData>
<tableData>Adios!</tableData>
</tableRow>
<tableRow>
<tableData>Howdy!</tableData>
<tableData>Adios!</tableData>
</tableRow>
</myTable>
</test>
It's probably reasonably obvious to a human how this information should be presented, but the browser will be lost until we give it more explicit directions: the style sheet.
Connecting Style Sheets to XML Documents
XML documents don't have the link or style elements that are used in HTML to connect style information to particular documents. Instead, the W3C has defined a processing instruction that provides that information, based on the model of the HTML link element. To connect a CSS style sheet to your XML document so that the browser can find it, use a processing instruction like
<?xml-stylesheet type="text/css" href="URI"?>
where URI is the address of the style sheet. We'll use a style sheet called display1.css for our first test document. The processing instruction can go right after the XML declaration.
<?xml version="1.0" ?>
<?xml-stylesheet type="text/css" href="display1.css"?>
<test>
...
You can connect to multiple style sheets when needed -- the style sheet processing instruction will work with the rules built into CSS describing the cascade. XML documents cannot use in-line styling: unlike in HTML, the style attribute has no particular meaning in XML. If you need to style the content of a particular element differently, your best bet is probably to assign it an ID attribute and then reference that ID value in the style sheet.
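A minimal sketch of the ID approach follows. It assumes one block element in the document has been written as a block element with id="intro"; the value intro is hypothetical and is not part of the sandbox document. Two selectors are shown because a browser can only apply the # form if it knows that the attribute is of type ID, which for XML generally means a declaration in the DTD; the attribute selector matches on the attribute name and value alone.

```css
/* ID selector: relies on the browser knowing that "id" is an ID-typed attribute. */
block#intro { font-style: italic; }

/* Attribute selector: matches the attribute by name and value, no DTD knowledge needed. */
block[id="intro"] { font-style: italic; }
```

Which of the two a given browser honors for XML documents is something to test rather than assume.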
Next step: building a backbone for the document's presentation using CSS2's display property.
The Document Backbone: CSS2 display
CSS1 was designed to work alongside the HTML vocabulary, which came with its own formatting semantics. Paragraphs were separated from other text by line breaks, while bold text would just flow with the surrounding text. An intricate set of rules could be used to construct all kinds of complex table structures. CSS2, designed at least partly for work outside of HTML, gives developers a chance to identify those structures for their own vocabularies. This means we can take the sample document above and create a foundation set of rules for it, on which we can then layer the rest of our formatting through CSS.
The display property defines how element content fits within a flow of text, based on a set of rules for creating (or not creating) boxes on the page. Effectively, it lets you map your elements to a set of types based on structures used by HTML, though some of the types are new. The structures you describe with the display property provide the base on which you can layer all your other formatting.
The two most commonly used values for the display property are block and inline. Setting the display property's value to block specifies that the element should be treated as its own block of text, not flowed together with the preceding and following content. The value inline means the opposite -- no block is created, and the textual content of the element is flowed with the text before and after. A third basic option, none, means that none of the content whatsoever is shown, and the element and its contents are invisible, leaving no trace on the document presentation.
Most of the other values are variations on block and inline. The list-item value, for instance, creates a block to present an item within a list, and an inline box within it that contains the content of the list item. The marker, run-in, and compact values may behave as block or inline, depending on context. The rest of the values describe tables (which may occupy blocks, or be inline within a containing block) and components within those tables. This relatively small set of descriptions includes enough flexibility to describe most document flows easily.
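To make those remaining values a little more concrete, here is a sketch that maps a handful of element types to them. The values come from the CSS2 Recommendation; the element names are hypothetical, chosen only for illustration, and how much of this a particular browser build honors needs to be tested.

```css
/* Hypothetical element names, used only to illustrate CSS2 display values. */
myTable    { display: table; }              /* a table that occupies its own block */
sideTable  { display: inline-table; }       /* a table that sits inside a line of text */
tableTitle { display: table-caption; }
tableHead  { display: table-header-group; }
tableBody  { display: table-row-group; }
tableRow   { display: table-row; }
tableData  { display: table-cell; }
lead       { display: run-in; }             /* block or inline, depending on what follows */
```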
To see what this looks like in practice, we'll build a style sheet that presents the XML document above in the format inferred by its elements' names. First, we'll make the element type named block behave as a block element, and make the inline element type behave as an inline element. So that we can see what we're doing in the results, we'll also make the inline element bold.
block {display:block}
inline {display:inline; font-weight:bold;}
The lists are next. Lists, as in HTML, are typically built out of a list container and then the items within the list. Most of the formatting for the list is done in the containing element. For the bulletList, we'll specify a display property of block, along with a left indent of forty pixels and a style of disc, which will produce round bullets to the left of the item text. For listItems within the bulletList element, we'll just set the display property to list-item.
bulletList {display:block; margin-left:40px;
list-style-type:disc;}
bulletList listItem {display:list-item; }
The numberList element and the listItem element types within it will get similar formatting, except that we'll change the list-style-type so that the list gets assigned numbers rather than bullets. We'll also need to add information to generate the counters, using the counter-reset and counter-increment properties along with the counter() function in the content property. (Leaving out the counter information produces zeros in your documents.)
numberList {display:block; margin-left:40px;
list-style-type:decimal; counter-reset: item;}
numberList listItem {display:list-item; }
numberList {
content: counter(item);
counter-increment: item;
}
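For comparison, the counter pattern given in the CSS2 Recommendation attaches the generated number to a :before pseudo-element on each item and increments the counter there, once per item. The sketch below applies that pattern to the sandbox element names. It is an alternative to the rules above, not a description of what Mozilla M14 requires, and it has not been tested against that build.

```css
/* Start a fresh counter named "item" for every numbered list. */
numberList { display: block; margin-left: 40px; counter-reset: item; }

/* Plain blocks here, so the only number shown is the generated one. */
numberList listItem { display: block; }

/* Bump the counter once per item and print it before the item's text. */
numberList listItem:before {
  content: counter(item) ". ";
  counter-increment: item;
}
```

Because the number is generated content, its punctuation and styling are entirely under the style sheet's control.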
The hidden element will live up to its name with a display property value of none. It will disappear entirely from the presentation.
hidden {display:none;}
Finally, we'll build an extremely simple table, using only the display property values of table, table-row, and table-cell.
myTable {display:table;}
tableRow {display:table-row;}
tableData {display:table-cell;}
The results aren't exactly gorgeous, but the CSS display property does what it's supposed to: build a backbone for presentation structure, as shown below.
Figure 1 - Mozilla understands the display property values used.
Unfortunately, Internet Explorer 5.01 doesn't understand nearly as much about the display property, picking up only on the block, inline, and none settings. Note, for instance, that the indents on the list items are inherited from their containing elements' style, but the list items themselves are treated as inline.
Figure 2 - Internet Explorer 5.01 is a little behind on support for the display property, though some things work.
Using this basic framework, you can build some pretty amazing document structures, though it's still basically the same set of things you could do with plain old HTML. That shouldn't be discounted, however. Even if all you want to do is create a localized vocabulary for marking up documents, this is handy, and the ability to map CSS presentation structures to arbitrary markup means that it's easy to build quick viewers for information.
A More Complex Example
The catalog listing below represents a simple table structure:
<?xml version="1.0"?>
<catalog>
<book>
<author>Simon St. Laurent</author>
<title>XML Elements of Style</title>
<pubyear>2000</pubyear>
<publisher>McGraw-Hill</publisher>
<isbn>0-07-212220-X</isbn>
<price>$29.99</price>
</book>
<book>
<author>Elliotte Rusty Harold</author>
<title>XML Bible</title>
<pubyear>1999</pubyear>
<publisher>IDG Books</publisher>
<isbn>0764532367</isbn>
<price>$49.99</price>
</book>
<book>
<author>Robert Eckstein</author>
<title>XML Pocket Reference</title>
<pubyear>1999</pubyear>
<publisher>O'Reilly and Associates</publisher>
<isbn>1-56592-709-5</isbn>
<price>$8.95</price>
</book>
</catalog>
We'll turn it into a table with a 3-line style sheet:
catalog {display:table;}
book {display:table-row;}
book *{display:table-cell; padding:5px;}
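The asterisk in the last line is the universal selector. Combined with book, it indicates that every element inside a book element instance should be treated as a table cell -- there's no need to list all the possible child elements. (Strictly speaking, this is a descendant selector, so it also matches elements nested more deeply than direct children; in this flat catalog the two amount to the same thing.) This is especially useful when you have different behaviors for elements both inside and outside of tables. The results of this tiny style sheet aren't exactly exquisite, but they make it much easier to work with the information.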
Figure 3 - Simple book catalog
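If the cells themselves later gain child elements, the descendant form book * would turn those nested elements into table cells too. A sketch of the stricter alternative uses CSS2's child combinator, assuming the browser supports it:

```css
catalog  { display: table; }
book     { display: table-row; }

/* Child combinator: only the direct children of book become cells. */
book > * { display: table-cell; padding: 5px; }
```

For the flat catalog shown above, both forms should select the same elements.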
Once you've built the foundation structures for your document flow, all of CSS's tools for formatting content -- margins, fonts, borders, generated content, and more -- are available. They work the same way with XML that they worked with HTML, though without the background information that HTML always provided. Every XML element -- except those explicitly placed in the HTML namespace -- is a blank slate.
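As a sketch of that layering, the rules below start from the three-line catalog style sheet and add borders, fonts, alignment, and a piece of generated content. The specific choices (gray borders, a sans-serif font, the "ISBN " label) are arbitrary illustrations rather than anything taken from the example files.

```css
catalog { display: table; border-collapse: collapse; font-family: sans-serif; }
book    { display: table-row; }
book *  { display: table-cell; padding: 5px; border: 1px solid gray; }

/* Ordinary CSS formatting layered on top of the table structure. */
title   { font-weight: bold; }
pubyear { text-align: center; }
price   { text-align: right; }

/* Generated content: label the ISBN cell without touching the XML. */
isbn:before { content: "ISBN "; }
```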
If document flows don't meet your needs, Mozilla also offers CSS2 positioning. By combining the position property with values for the top, left, height, and width properties, you can lay out your document content on a pixel-by-pixel basis. The float property lets you specify how the rest of the content should flow around these and other blocks in the text. Mozilla M14 seems to lock up its scroll bars as soon as position:fixed is used, but its support for the rest of positioning appears to work smoothly.
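A small sketch of both techniques follows. The element names sidebar and pullQuote are hypothetical and do not appear in the example documents, and the pixel values are arbitrary.

```css
/* Absolutely positioned box: placed by its top-left corner and sized in pixels. */
sidebar {
  display: block;
  position: absolute;
  top: 20px;
  left: 400px;
  width: 200px;
  height: 300px;
}

/* Floated box: pushed to the right edge, with following content flowing around it. */
pullQuote {
  display: block;
  float: right;
  width: 150px;
  margin-left: 10px;
}
```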
Adding Basic Links and Images
While Mozilla doesn't support the entire XLink vocabulary -- a reasonable decision given its uncertain status as a working draft -- it supports enough of it to let users create simple links in their documents. The syntax dates back to the March 1998 draft, now thoroughly outdated. Mozilla doesn't support enough of XLink to move beyond the HTML IMG element, so adding images to documents requires using the HTML namespace. We'll take a look at Mozilla's support for XLink by adding some pictures and links to the book example.
The new document looks much like its predecessor, but has one extra column containing linking and image information, along with some additional linking attributes on the title elements containing the book titles (for reasons of space, I've abbreviated the image URLs; see books2.xml for the full file):
<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="books2.css"?>
<catalog xmlns:html="http://www.w3.org/TR/REC-html40" >
<book>
<cover xml:link="simple" show="replace"
href="http://www.amazon.com/exec/obidos/ISBN=007212220X/">
<html:img src="http://images.amazon.com/..." />
</cover>
<author>Simon St. Laurent</author>
<title xml:link="simple" show="replace"
href="http://www.amazon.com/exec/obidos/ISBN=007212220X/">
XML Elements of Style
</title>
<pubyear>2000</pubyear>
<publisher>McGraw-Hill</publisher>
<isbn>0-07-212220-X</isbn>
<price>$29.99</price>
</book>
<book>
<cover xml:link="simple" show="replace"
href="http://www.amazon.com/exec/obidos/ISBN=0764532367/">
<html:img src="http://images.amazon.com/..." />
</cover>
<author>Elliotte Rusty Harold</author>
<title xml:link="simple" show="replace"
href="http://www.amazon.com/exec/obidos/ISBN=0764532367/">
XML Bible
</title>
<pubyear>1999</pubyear>
<publisher>IDG Books</publisher>
<isbn>0764532367</isbn>
<price>$49.99</price>
</book>
<book>
<cover xml:link="simple" show="replace"
href="http://www.amazon.com/exec/obidos/ISBN=1565927095/">
<html:img src="http://images.amazon.com/..." />
</cover>
<author>Robert Eckstein</author>
<title xml:link="simple" show="replace"
href="http://www.amazon.com/exec/obidos/ISBN=1565927095/">
XML Pocket Reference
</title>
<pubyear>1999</pubyear>
<publisher>O'Reilly and Associates</publisher>
<isbn>1-56592-709-5</isbn>
<price>$8.95</price>
</book>
</catalog>
The root catalog element now defines a namespace for HTML 4.0:
<catalog xmlns:html="http://www.w3.org/TR/REC-html40" >
A new cover element contains both a link to a place where the book can be purchased, and an HTML img element that lets us reference and display the image:
<cover xml:link="simple" show="replace"
href="http://www.amazon.com/exec/obidos/ISBN=007212220X/">
<html:img src="http://images.amazon.com/..." />
</cover>
The xml:link attribute identifies the cover element as a simple XLink, the show attribute indicates that the referenced document should replace the current page in the browser window, and the href identifies the document targeted by the link. In the latest draft of XLink, the XLink namespace would have to be declared, xlink:type would replace xml:link, and xlink:show and xlink:href would be used in place of show and href, respectively.
The html:img element behaves exactly like an img element in HTML, with a src attribute identifying the location of the image to be displayed. All the other features of the HTML img element, like the alt, height, and width attributes, are still available and work as they do within HTML.
The title element carries the same XLink information attributes as the cover element. While Mozilla will format HTML a elements as links, it doesn't presently do so for information linked using XLink. We'll update the style sheet to highlight the title in a rather traditional way:
catalog {display:table; }
book {display:table-row; }
book *{display:table-cell; padding:5px;}
title {color:blue; text-decoration:underline;}
The result is a table of books with book jackets displayed and links to a sales process established:
Figure 4 - Adding linked images into the catalog
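One further cue that could be layered on is the mouse pointer. The sketch below uses the CSS2 cursor property on the two linked element types; whether a given browser build honors it for XML elements is an assumption to verify, not something established above.

```css
/* Show the link-style pointer over the linked cover image and title. */
cover, title { cursor: pointer; }
```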
Next Steps
We haven't built any masterpieces here, but the foundation is sound. Mozilla is now capable of displaying XML on par with HTML, and the extra flexibility that style sheets provide makes it a genuine contender as a tool for creating web documents (or will someday when more users have the software!). There's a lot more CSS to explore in Mozilla, as well as tools that connect CSS to the DOM.
While Mozilla isn't yet polished, and some aspects of its development (XLink in particular) could use an update, it's a solid enough platform to be well worth exploring. By providing a tool for reading and exploring XML documents that works on a wide variety of platforms, and has an open source license to boot, the Mozilla project has definitely made a contribution to the XML community.