XML support in IE5
March 18, 1999
Editor's Note: In looking at Microsoft Internet Explorer 5's promised and actual implementation of XML, XML.com technical editor Tim Bray decided to write his review in XML. Since IE5 can display XML, but other browsers are a bit behind in this, we've provided two versions of this document. The version you are looking at is an HTML version. The XML version is here. Aside from this intro, the HTML version was generated from the XML source on the server side. Tim did some interesting things with the style sheets, which we've left mostly intact - which is why this page looks a bit different than our usual layout.
The plan was that this story would cover XML in IE5, including the base language, setting up the server, CSS, XSL, and the DOM. Unfortunately, we had a hard deadline (IE5 went public on March 18th), and when it arrived, I'd invested so much time in learning XSL, without getting anything based on the public drafts working in IE5, that it occupied all the time we'd budgeted for both that and the DOM. We'll keep struggling with XSL until we get something that actually works and plays by the rules. Following that, we'll go on and do some DOM coverage.
The event that motivates publishing this article at this time is, of course, the arrival of Microsoft Internet Explorer Release 5. While we will try out one or two things in the bleeding-edge pre-alpha Netscape "Gecko" code, the main focus of this article is publishing and browsing XML in a standards-compliant way using IE5. Of course, when there is actually some sort of real released product from Netscape, we'll publish an even more interesting article - how to publish and browse XML in an interoperable way.
This is, after all, XML.com, and it's about time that we started publishing in XML. The subject of the story, naturally enough, is how to go about publishing and browsing XML on the Web.
There are a bunch of different ways to deliver XML over the Web. The first, and most obvious, is to give up and not try; after all, it's going to be years, probably, before most people will have XML-capable browsers on their desktops. So you could just take the XML and turn it into HTML and deliver that. (And in fact, if you're not adventurous, that version of this article may be what you're now reading.)
A second option is to just go ahead and deliver XML, as-is, don't sweat stylesheets or anything, and just see what happens. With IE5, this turns out to be not as useless as you might think.
The right thing to do is to send the XML to a browser with a stylesheet, today in CSS and, before too long, in XSL. This has a bunch of advantages:
- it cuts down on the amount of data you have to transmit
- it offloads a lot of the formatting work to the browser that's coasting along on your under-worked Pentium desktop
- it allows you to do cool things with the DOM.
If you are running Apache 1.3.4, you are in luck, since the mime type for xml files is already configured. If you are running an earlier version of Apache and you have access to the server configuration:
Edit the mime.types file and add the mime type by adding the line:
text/xml xml
then restart the Apache server.
If you don't have access to your server configuration, or don't want to mess with the configuration files, ask your system administrator to add the mime type:
text/xml xml
Other servers have slightly different methods for adding mime types.
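As a rough illustration of the edit described above, here is a minimal Python sketch that adds the mapping to the text of a mime.types file only when the xml extension isn't mapped yet. The function name ensure_xml_mime_type is made up for this example; it is a sketch, not an official Apache tool, and it assumes the usual "type, then extensions" line layout with "#" starting a comment.

```python
def ensure_xml_mime_type(mime_types_text: str) -> str:
    """Return mime.types content that maps the xml extension to a MIME type.

    Hypothetical helper: if some line already maps "xml" (to any type),
    the text is returned unchanged; otherwise "text/xml xml" is appended.
    """
    for line in mime_types_text.splitlines():
        # Drop anything after "#", then split into type + extensions.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and "xml" in fields[1:]:
            return mime_types_text  # extension is already mapped
    needs_newline = bool(mime_types_text) and not mime_types_text.endswith("\n")
    return mime_types_text + ("\n" if needs_newline else "") + "text/xml xml\n"
```

You would still have to write the result back to the file and restart the server yourself; the sketch only covers the text edit.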
Before you can worry about browsing XML, you have to find a server (or set up your own) that serves it properly. We've enclosed a note from one of XML.com's webmasters that explains how you might go about doing that.
If you just want to read the XML out of a file (as I'm doing right now while writing this article) things are a lot easier; when IE5 opens a file whose name ends with ".xml", it assumes that it's going to contain XML and does the right things.
At this point, now that you're ready to serve, you need something to serve. For this document, just to keep things clear, I invented the tags as I went along, choosing reasonable-sounding names, and didn't bother with a DTD. In many cases, you can use someone else's pre-cooked DTD; one good candidate would be the HTML-in-XML currently under development at the W3C.
Whether you've got a DTD or whether you're just making it up as you go along, you're going to need something to type it in with. The really basic approach is your usual text editor; that's what I do, except for my usual text editor is GNU emacs, which isn't basic at all. Emacs is really a tool for the hard-core geeks; you'd probably be better off having a look through XML.com's list of authoring tools.
Of course, you wouldn't want to invent all your own tags. If you need a list, or a hypertext reference, HTML has those already built in, and with the magic of XML namespaces, you should be able to use those HTML elements.
Here's how it ought to work, in theory:
- Declare a namespace prefix and bind it to the official namespace for HTML.
- Use the names of HTML elements in your document, but attach that prefix you declared.
For example:
<start xmlns:H="http://official-namespace-of-HTML">
  <H:ol>
    <H:li>Declare a namespace prefix and bind it to the official namespace for HTML.</H:li>
    <H:li>Use the names of HTML elements in your document, but attach that prefix you declared.</H:li>
  </H:ol>
</start>
This kind of works, which is a little surprising. Since the W3C hasn't gotten around to declaring an official namespace name for HTML, the IE team has a tough problem in figuring out how to follow the rules. If it didn't work, you wouldn't see all the nice bullet-lists and hyperlinks in the XML version of this document.
What you have to do is declare a namespace prefix, and that namespace prefix has to be html - no other string will work! You have to declare it, but you don't have to map it to any namespace name in particular (do a "view source" on this page to see what I mean). This is a huge violation of the essence of the namespace spec, which would suggest that Microsoft somehow Just Doesn't Get It about namespaces, except for we know that they do. Puzzling.
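To make the "essence of the namespace spec" concrete, here is a small Python sketch using the standard library's ElementTree parser (not IE5's parser). It shows that a namespace-aware parser identifies elements by the namespace name the prefix is bound to, so "H" and "html" are interchangeable, while an undeclared prefix is an error. The namespace name is the placeholder from the example above, not a real W3C URI, and the helper names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace name taken from the article's example; not official.
NS = "http://official-namespace-of-HTML"

def expanded_names(xml_text):
    """List each element as {namespace-name}local-name, in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]

def parses(xml_text):
    """Return True if the namespace-aware parser accepts the document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

doc_h = f'<start xmlns:H="{NS}"><H:ol><H:li>one</H:li></H:ol></start>'
doc_html = f'<start xmlns:html="{NS}"><html:ol><html:li>one</html:li></html:ol></start>'
doc_undeclared = "<start><html:ol/></start>"

# Same namespace name, different prefixes: the element names come out identical.
print(expanded_names(doc_h) == expanded_names(doc_html))
# A prefix that is never declared is rejected outright.
print(parses(doc_undeclared))
```

In other words, by the book it is the binding that matters, and the choice of prefix string shouldn't.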
If you want to use stylesheets, you'll have to tell the browser. The way to do that is to put a "stylesheet linking PI" at the top of your document; here's an example:
<?xml-stylesheet href="first-x.css" type="text/css" ?>
Which leads us to the first nasty bug. That little fragment is supposed to begin with "<?xml-stylesheet...", but in all the IE5 examples I've seen, it begins with "<?xml:stylesheet...", a now-obsolete version that grossly violates the "Namespaces in XML" specification.
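For readers who want to check what a document's stylesheet link actually says, here is a hedged Python sketch that pulls the "xml-stylesheet" processing instruction out of a document's prolog with the standard library's minidom. The function name stylesheet_links and the simple regular expression for the pseudo-attributes are assumptions made for this example; a production tool would want a stricter reading of the PI's contents.

```python
import re
from xml.dom import minidom

def stylesheet_links(xml_text):
    """Return the pseudo-attributes of each xml-stylesheet PI before the root."""
    doc = minidom.parseString(xml_text)
    links = []
    for node in doc.childNodes:
        if (node.nodeType == node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            # The PI body is plain text that only looks like attributes,
            # so pick out name="value" pairs by hand.
            pairs = re.findall(r'([\w-]+)\s*=\s*["\']([^"\']*)["\']', node.data)
            links.append(dict(pairs))
    return links

sample = '<?xml-stylesheet href="first-x.css" type="text/css" ?><article/>'
print(stylesheet_links(sample))
```

Note that the sketch looks only for the hyphenated target, so a document using the older colon form would come back with an empty list.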
When I was preparing the example just above, I ran across another really ugly problem. I wanted to show the stylesheet linking PI, but I couldn't just cut-n-paste it in as-is, because it has a "<" character, which you can't have in XML text. So I "escaped" it using the standard built-in XML (and HTML) "&lt;" technique:
&lt;?xml-stylesheet href="first-x.css" type="text/css" ?>
Unfortunately, this made the whole example vanish! It seems that IE gets confused somehow and sees the "&lt;" as a "<" unless it's followed by a space, and starts parsing away. After pondering this one for quite a bit, I ended up with the following:
&lt;<no-op/>?xml-stylesheet href="first-x.css" type="text/css" ?>
The trick is that the empty "<no-op />" element keeps IE from getting confused.
Another trick would be to use "CDATA Sections" for examples like this. But they seem pretty well completely broken in IE5 as well; it complains about undeclared namespace prefixes and so on.
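For comparison, here is how the two techniques are supposed to behave, sketched in Python with the standard library's parser rather than IE5. Both the escaped form and the CDATA form should hand back exactly the text that went in. The element name "example" and the helper names are invented for this sketch, and the CDATA helper assumes the text never contains "]]>".

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

PI_TEXT = '<?xml-stylesheet href="first-x.css" type="text/css" ?>'

def as_escaped_element(text):
    """Wrap text in an element, escaping &, < and > as character entities."""
    return f"<example>{escape(text)}</example>"

def as_cdata_element(text):
    """Wrap text in a CDATA section (assumes text has no ']]>' in it)."""
    return f"<example><![CDATA[{text}]]></example>"

for source in (as_escaped_element(PI_TEXT), as_cdata_element(PI_TEXT)):
    # A conforming parser returns the original characters in both cases.
    print(ET.fromstring(source).text == PI_TEXT)
```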
Sigh, release 1.0 of anything is always exciting, even when it's called release 5.0.
If you load an XML document into IE5 with no stylesheet at all, you get a nice tree-structured display with little +/- icons that you can click to hide subtrees. I've actually started using this quite a bit to have a quick look at XML documents that people send me; it's a good way to get a feeling for the content and structure of some arbitrary XML.
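If you don't have IE5 handy, a few lines of script give a crude text-only version of the same idea. The following Python sketch prints an indented outline of a document's element structure; the function name outline and the sample tags are made up for illustration, and it makes no attempt at the +/- collapsing that IE5 offers.

```python
import xml.etree.ElementTree as ET

def outline(xml_text, indent="  "):
    """Return one line per element, indented by depth, with any leading text."""
    lines = []

    def walk(element, depth):
        text = (element.text or "").strip()
        label = f"{element.tag}: {text}" if text else element.tag
        lines.append(indent * depth + label)
        for child in element:
            walk(child, depth + 1)

    walk(ET.fromstring(xml_text), 0)
    return lines

sample = "<article><title>XML in IE5</title><section><para>Hello</para></section></article>"
print("\n".join(outline(sample)))
```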
At this point in history, there is only one official, approved, stable, production-quality standard for stylesheets, and it's named Cascading Style Sheets, or CSS for short. CSS 1 has been around since December 1996, and CSS 2 since May 1998.
I don't really have the time and space to do a full investigation on CSS compliance, but I don't need to, because my colleagues on the Web Standards Project are hard at work on this even as I write.
In general I found IE's CSS handling pretty good; i.e. everything I tried worked more or less first time. It's probably worth your while to grab the CSS stylesheet that's being used here and have a look at it; it's not rocket science, but it does illustrate a few tricks that I think will be useful for a lot of people.
In particular, I'm fond of the float technique; in the XML+CSS form of this article, the sidebar and examples and little good/bad/bug graphics are done with CSS floats; previously, you would have had to use <TABLE> kludges to achieve this kind of effect.
But I have to end on kind of a sour note. We may be moving toward the paperless office, but a lot of us still need to print quite a few of our documents. With XML+CSS, IE5 can't; that is to say, when you print, you get an unformatted dump. So near, and yet so far.
I wondered whether the XML+CSS display would work in the pre-pre-pre-release of Mozilla. Since I hadn't downloaded that in a few weeks, I went over there and pulled down a recent build, ignoring all the blood-curdling warnings about using this untested and pre-cooked software. Good news! It worked not too badly at all, first time out. Mozilla is a lot pickier about margins and so on, and does the floats a little bit differently from IE (I'm not enough of a CSS scholar to say which is right) - but we are looking at two pieces of software with strongly converging behavior. Maybe there's hope for standards yet.
It is perfectly crystal-clear that Microsoft is un-enthused about CSS. The Microsofties who helped us out with this article kept reminding us that we should show off formatting with XSL. And in fact, if you have links both to a CSS and an XSL stylesheet, IE picks the XSL version. Just to review, XSL - Extensible Stylesheet Language - is a W3C work-in-progress which is scheduled for completion sometime later in 1999. It comes in two parts - a "transformation language" used for preparing documents for display, and a "formatting object set" that is used for actual visual styling.
At this point in history, several groups (including Microsoft) have implemented a snapshot of the transformation language, but no-one has got the formatting part going. What Microsoft would like us to do, apparently, is to use the XSL transformation engine to turn XML into HTML before displaying it.
This is going to cause problems. For example, in order to write this article, I needed to teach myself XSL, so I went and looked at the XSL Working Draft - it's called a "Working Draft" because it's in-progress and might change; if I may quote from the introduction:
It quickly became obvious that the Microsoft XSL examples contain many things that aren't in the XSL Working Draft. Is this because they, as members of the XSL Working Group, know about things that will be in a soon-to-arrive draft of XSL? I don't know. Is it safe to use them? I don't know.
The bottom line was that when the deadline for this article rolled around, I didn't have XSL working. This is a pity, because XSL has a couple of tremendously attractive properties. Perhaps the most important is that it will run both in the browser and in the server; so you can send XML+XSL to XSL-capable browsers, and for the rest, run the same code on the server to generate HTML.
But in XML.com, we try hard to do things by the rules, so you won't see any XSL in production here until we can figure out how to use it by the book - we assume that IE5 will be able to do this.
After our little expeditionary force had washed up on the rocks of XSL, we were left with a document, in XML only, that couldn't display in the real-world (rev 4 and behind) browsers that real people have on their real desktops.
But we were undismayed, because we opened our grease-stained tool box, and whipped out the all-purpose tool. 127 lines of perl later (using, of course, the brand-new "XML::Parser" module), we had a less-pretty form of the article (probably what some of you are reading), in HTML generated automatically from the XML. It'll be interesting to note, after we get the job done with XSL, whether it turns out to be more or less code than in perl. Also it'll be interesting to make a judgement on which is more maintainable and flexible.
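The perl script itself isn't reproduced here, but the general shape of an event-driven XML-to-HTML converter is easy to sketch. The Python version below uses the standard library's expat bindings (the same underlying parser that XML::Parser wraps). The tag names in TAG_MAP are hypothetical stand-ins, not the tags actually used in this article, and the mapping is far simpler than what a real conversion would need.

```python
import xml.parsers.expat
from html import escape

# Hypothetical mapping from invented XML tags to HTML tags.
TAG_MAP = {"article": "div", "title": "h1", "para": "p", "example": "pre"}

def xml_to_html(xml_text):
    """Convert XML to HTML by reacting to start, end and text events."""
    out = []

    def start(name, attrs):
        out.append(f"<{TAG_MAP.get(name, 'span')}>")

    def end(name):
        out.append(f"</{TAG_MAP.get(name, 'span')}>")

    def chars(data):
        # Re-escape text so "<" and "&" survive the trip into HTML.
        out.append(escape(data, quote=False))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return "".join(out)

sample = "<article><title>XML in IE5</title><para>a &lt; b</para></article>"
print(xml_to_html(sample))
```

Unknown tags fall back to a plain span here, which is a design shortcut; a real script would probably want to complain about them instead.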
First, you have to install it. Microsoft kindly sent us a CD before the rest of the world got it (thanks, Dave); and the installation on my NT box was pretty pain-free. Following the advice of my local expert, I uninstalled the IE5 beta first, and rebooted just to be sure. Firing up the CD reveals that it contains not just IE but NT Service Pack 4, which I hadn't installed. So, installing the service pack and taking all the defaults, you're looking at 3 or so reboots, and a lot of waiting while watching polite messages from Windows about how it's optimizing your system.
I thought it was interesting that the CD-ROM contains 198 MB of data; you probably don't need to get that much to get yourself an IE5, but it's safe to say that IE5 is going to be a great big honkin' download. Microsoft has in recent times been making their browser updates available on CD-ROM for almost nothing, basically the cost of duplication and shipping; my experience suggests that this is probably the best way to go about getting IE5.
This article isn't a Web browser review, but IE5 seems like a pretty nice Web browser, except for the XML-related problems. It runs fast (faster than IE4, much faster than any version of Communicator, about the same as the Mozilla pre-releases) and looks good. As before, the smooth scrolling and cleaner screen are improvements over Netscape. As before, it insists, when you create a new browser window, on starting at the same page that was active, rather than your home page. It does one thing better than any previous version of IE - namely, when you use the "back" button, it does a fine job of coming back to a point in the page not too distant from what you left behind.
IE's error-handling seems exactly per the spec, which is delightful. I've been giving speeches for a couple of years now telling people that XML-style error handling in the browser wouldn't really change work patterns; you'd bash out your XML, and when it displayed in the browser, you could ship it. I'm glad to have discovered that I've been telling the truth. At least, that's how this article got created - whenever I stupidly put a tag in the wrong place or forgot a quotation mark, IE politely but firmly told me where the problem was.
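The same bash-it-out-and-check workflow can be had without a browser. Here is a minimal Python sketch that reports the first well-formedness error with its position, using the standard library's parser; it stops at the first error and goes no further, as the spec requires. The function name first_error is made up for this example, and the wording of the messages comes from Python's bundled parser, not from IE5.

```python
import xml.etree.ElementTree as ET

def first_error(xml_text):
    """Return (line, column, message) for the first error, or None if well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None

# A tag closed in the wrong place, the kind of slip described above.
print(first_error("<a>\n<b>\n</a>"))
# A well-formed document produces no complaint.
print(first_error("<a><b/></a>"))
```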
IE's error handling becomes very irritating, of course, when there's not really an error - for example, when the browser refuses to bypass escaped "<" characters. But we can assume that Microsoft will work on fixing that.
Is the glass half-empty or half-full? It's too early to call; rendering XML with CSS is nice (and will be even nicer once IE 5.x fixes a few more bugs), but the real value-add of XML in the browser isn't so much displaying it as processing it right there in the browser. For that, you need the DOM; if IE5 turns out to have a nice clean usable DOM, that will make up for a lot of little awkwardness in the parser. If not, this will look like a (huge amount of) wasted effort. Stay tuned.