XML at Five
February 12, 2003
The XML 1.0 Recommendation turned five years old this Monday, 10 February. Since February 1998 XML has grown and developed, vastly exceeding all initial expectations. During those five years, XML.com has followed the XML world tirelessly, publishing nearly a thousand articles and becoming a forum for the heart of the XML community itself.
To celebrate this auspicious anniversary, I asked some XML old hands and friends of XML.com to comment on their experience with XML over the last five years. Read on for their entertaining, illuminating and thought-provoking comments.
Life-changing
It's no exaggeration to say that XML has changed my life, professionally and personally. What about my respondents?
Rick Jelliffe, standards activist and Topologi CTO, still waves the flag of the SGML guard: "Not at all, actually. I started working in software for publishing using markup fifteen years ago, and still am." Others mused on the effect XML had on their professional life, and in particular that working with XML brought them into contact with some unique people. Eve Maler, a founding member of the W3C XML working group: "In addition to legitimizing my nervous habit of hierarchically structuring all data within my reach, it has allowed me to work with and learn from some of the most talented people in the world." Michael Sperberg-McQueen, Architecture Domain Leader at the W3C and co-editor of the XML 1.0 spec, added that XML "got me out of the university, out of Chicago, and into a job where others in the organization agree that working on open standards for information is a worthwhile way to spend time. Works for me."
Norbert Mikula, Director of Runtime Technology at Intel Labs and author of one of the very first XML parsers, recalls those early, halcyon days: "[XML] enabled me to work with some of the brightest minds in the computer industry. I fondly remember a time when the first day of an XML conference was usually a continuous group-hug exercise. It was really just a large family of weirdos." Of course, we've shaken that self-congratulatory thing these days...
It's good to know that XML experts walk the XML walk as well as they talk the talk. Henry Thompson, of the University of Edinburgh and the W3C, is completely absorbed with XML: "All three of my jobs are centered on XML, I started an XML company 18 months ago, my intellectual focus is on XML." Ken Holman, of Crane Softwrights and a noted XML trainer, explained the benefits of XML in his business:
"I get professional looking document publishing results and easily managed content synthesized into presentations for my web site. I have been able to create and sell home-grown information products (my electronic books, licensed training materials, paper books, and instructor-led training handouts) all from a single set of sources. My customer management is made easy by on-the-fly aggregation of customer-maintained data in XML files at arbitrary URL addresses."
O'Reilly editor and long-time enfant terrible of the XML world, Simon St. Laurent, described how his entanglement with XML emerged from an interest in hypertext: "XML has derailed me from my original hypertext interests and forced me to think a lot harder about information structures in general. Along the way it's gotten me a few jobs, sparked a few fights, and generally left me wanting both less (fewer specs) and more (well-thought-out applications)."
Favorites
Ken Holman already described how XML is used in his business, but what, I wondered, were the favorite XML applications of the experts?
Old habits die hard with Eve Maler: "My personal favorite is good old authoring and publishing. Using XSLT in this way still feels like a 'killer app' to me, because I remember the days when it wasn't trivial to get your information out of SGML form."
Rick Jelliffe and Henry Thompson, both involved in young XML-based companies, are intrigued by new processing models and possibilities. Jelliffe said "I like things you can see and play with. XML on the desktop is great, especially because it allows lots more client-server and, especially, peer-to-peer interaction." Thompson offers an alternative to conventional web services as his favorite: "Auto-updating of transcluding documents as an alternative to messaging."
Michael Sperberg-McQueen offered two applications, and echoed Eve Maler's approval of XSLT.
"First, I like using XML to encode historically important manuscripts, with rich detail concerning the text and its inscription in the manuscript: what words have been deleted, what words have been added later, and then using XSLT to offer the user a choice of many different views of the material, created from the same single source: a new-spelling text with annotation suitable for high-school students or undergraduates, an old-spelling text with annotation suitable for graduate students and scholars, a diplomatic transcription with paleographic and codicological details; annotation for textual critics, annotation for historians, annotation for literary scholars; the text of Hamlet in the Folio, in this or that Quarto, or as emended by this or that critic of the past four hundred years.
Second, I get a particular charge out of seeing XML used in applications for which SGML was proposed, in the late 1980s and early 1990s, but usually rejected out of hand as hopelessly impractical. XML for configuration files (in Gnome, or Cocoon, or a zillion other packages); XML for graphics (SVG rules!); XML for programming (XSLT has become one of my favorite programming tools)."
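The single-source idea in the first application can be sketched in a few lines. Sperberg-McQueen names XSLT as his tool; the sketch below swaps in Python's standard-library ElementTree purely for brevity. The `<del>` and `<add>` element names, the view names and the sample line are all invented for illustration and are not taken from any real transcription or encoding scheme.

```python
import xml.etree.ElementTree as ET

# Invented sample: <del> marks words struck out, <add> marks later additions.
SOURCE = (
    "<line>To be, or not to be, <del>ay there's</del><add>that is</add>"
    " the <del>point</del><add>question</add>.</line>"
)


def render(elem, view):
    """Flatten one element to text for the chosen view.

    view is one of (all three names are assumptions of this sketch):
      "first"      - the text before correction (keep <del>, drop <add>)
      "final"      - the text after correction (drop <del>, keep <add>)
      "diplomatic" - both layers, marked with brackets
    """
    parts = [elem.text or ""]
    for child in elem:
        inner = render(child, view)
        if child.tag == "del":
            if view == "first":
                parts.append(inner)
            elif view == "diplomatic":
                parts.append("[-" + inner + "-]")
        elif child.tag == "add":
            if view == "final":
                parts.append(inner)
            elif view == "diplomatic":
                parts.append("[+" + inner + "+]")
        else:
            parts.append(inner)
        # Text following the child belongs to every view.
        parts.append(child.tail or "")
    return "".join(parts)


if __name__ == "__main__":
    root = ET.fromstring(SOURCE)
    for view in ("first", "final", "diplomatic"):
        print(view, "->", render(root, view))
```

All three readings come from the same encoded line; only the rendering rule changes, which is the point of keeping the editorial detail in the markup rather than in separate copies of the text.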
Simon St. Laurent enjoys the combination of independence and interoperability that XML offers him: "I like that I can concoct my own vocabularies for particular documents depending on what the document looks like, yet still have pathways to more standard vocabularies. It works well for writing documents and converting them to DocBook or XHTML, but it's also pretty nice for mixing and matching rule-sets for programs."
Norbert Mikula revels in the joy tags bring to everyday correspondence, using XML "to add semantics to my e-mails such as: My <value currency="USD">0.02</value> <rant>...</rant> or <IMHO>...</IMHO>."
Crazies
A while back I enjoyed myself by poking fun at various parts of the XML world, picking out some of the more amusing applications of XML. What's the craziest XML application our XML experts have seen?
Eve Maler agrees with Norbert Mikula, "I would have to say spontaneous well-formed markup in human-written email, such as <snip/> and <sarcasm>...</sarcasm>. It works great with wetware parsers, and you don't have to worry about performance issues."
You can have too much of a good thing, however, and Norbert Mikula does have limits to where he'll put those nutritious angle brackets. His too-crazy scenario for XML is "an XML-syntax based programming language." Rick Jelliffe took the sentiment further, and criticized XHTML: "HTML thrived on being more forgiving of missing tags than even SGML allowed. Throwing that away makes HTML a lot less attractive. Which is true of markup in general, actually."
Michael Sperberg-McQueen asserts that "there are no crazy applications, only crazy application designers," leading nicely on to St. Laurent's craziest use of XML: "I haven't written it yet."
Regrets, We've Had A Few
No retrospective is complete without wishing we'd done something differently. Simon St. Laurent worried about the increase in complexity of XML, a point also made by Norbert Mikula.
"When Jon Bosak approached me at SGML '96 with the challenge to write an XML parser in three weeks (because so it was written as a requirement in one of the first XML drafts) I was actually able to do it. Now, that we have dozens of layers added to it, you need an engineering department for the same task. Sometimes, to the innocent bystander, XML's complexity has just gotten out of control. The good news is, however, that tools and infrastructure increasingly absorb some of this complexity which allows us to keep going."
Ken Holman also finds this a concern, "I would make the layers of processing far more distinct than how they have evolved...e.g. separating the representation of information in text markup from the application interpretation of information found in text markup. I'm really worried W3C Schema is going to ruin text processing and disenfranchise document writers, and I think RELAX-NG is probably the way of the future because of its clean lines."
This encouragement toward simplicity is also echoed by Eve Maler, "...get some of those communities to keep the words 'What Is The Minimum Required To Declare Victory?' uppermost in their minds. I suspect this is what Tim Bray attempts to do on a daily basis!"
Michael Sperberg-McQueen offered two recipes for reducing confusion:
"(a) I'd provide a more convenient way for people to distinguish between parsers which can support all of XML and parsers which require standalone XML to work right. Lots of problems result from people wanting the functionality of validating XML processors, but insisting on deploying non-validating parsers which don't want to read external DTDs. One reason is that we don't have a good, concise way to describe the different possible kinds of processors.
(b) I'd take everyone who says XML is 'semi-structured data' (in contrast with relational data, which they believe to be 'structured') out back and engage in what Steve Zilles calls 'severe counseling'. The difference between data in third normal form and XML is not the difference between structured and semi-structured data: it's the difference between simple, regular structures that are easy to treat with the mathematical theory of relations and natural structures that are not so regular, not so simple, but very, very real, and very, very structured."
Looking Forward
As XML marches on, what would you most like to see?
Ken Holman wants us to be one happy family, wishing for "community harmony." Michael Sperberg-McQueen also sees this as important.
"I'd like the community of users to stay together and avoid forking. That means the minimum defined standard has to be powerful enough to support what most people need to get done, even if it means having some things there that seem unnecessary for some specialized applications. It also means those with more stringent demands being willing for some functionality to be supplied by standardized applications, rather than by the core language. For a language intended for open data exchange and interoperability, it is more important that there be one language with as few optional features as possible than that the details of the language turn out in any particular way."
Rick Jelliffe had some markup-oriented wishes to share.
"I'd also like to see so-called microparsing capabilities increase, so that structures can be written using their natural notation where they have one. W3C XSLT2 and ISO DSDL look like they will improve life for this.
Oh, and I would like a thorough review of XML for robustness and reliability, and soon. The idea that the class of parser should determine the information to be presented to an application is incoherent. The question should not be 'how do we simplify XML?' but 'how do we make it more robust and reliable?' Ends before means; Viagra not circumcision!"
Henry Thompson would like to give XML users better access to their data, wanting to see "declarative data-binding, so that the choice between manipulating XML as XML and working directly with the data it (sometimes) encodes is not constrained by irrelevant overheads."
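One possible reading of "declarative data-binding" is sketched below, as an illustration rather than anything Thompson proposed. Each field of a plain Python class declares where its value lives in the XML, and a single generic function does the extraction. The `bound` and `bind` helpers, the `Invoice` class, the `elem/@attr` path convention and the sample document are all hypothetical names made up for this sketch.

```python
from dataclasses import dataclass, field, fields
import xml.etree.ElementTree as ET


def bound(path, convert=str):
    """Declare the XML location of a field and how to convert its text.

    path is a child-element path, optionally ending in "@name" to read
    an attribute (a convention invented for this sketch).
    """
    return field(metadata={"path": path, "convert": convert})


@dataclass
class Invoice:
    customer: str = bound("customer")
    total: float = bound("total", float)
    currency: str = bound("total/@currency")


def bind(cls, elem):
    """Build an instance of cls from elem using the declared paths.

    No error handling: a missing element raises AttributeError.
    """
    values = {}
    for f in fields(cls):
        path, _, attr = f.metadata["path"].partition("@")
        node = elem.find(path.rstrip("/")) if path else elem
        raw = node.get(attr) if attr else node.text
        values[f.name] = f.metadata["convert"](raw)
    return cls(**values)


DOC = """
<invoice>
  <customer>Example Co</customer>
  <total currency="USD">12.50</total>
</invoice>
"""

if __name__ == "__main__":
    invoice = bind(Invoice, ET.fromstring(DOC))
    print(invoice)
```

The application code works with `invoice.total` as a number and never touches the tree, while the original element remains available for anyone who prefers to manipulate the XML as XML.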
Ever optimistic, Simon St. Laurent wishes for a period of calm reflection. "I'd actually like to see a period of quiet -- less development -- on the specification side, and more effort put into figuring out what it is we already have. Computing isn't exactly a contemplative field, but too many people seem to be racing forward painting 'XML' on as much as possible without even wondering what that might mean."
Closing Comments
Finally, I asked my respondents if there was anything particularly relevant to this five-year anniversary they would like to add.
Ken Holman expressed the sentiments of many, observing that "James Clark hasn't received enough kudos yet for all his contributions to our community."
Henry Thompson banged the drum for interoperability. "Interoperability is the key to XML's success -- let's keep it that way: say 'no' to subsets, profiles and alternative formats."
Rick Jelliffe reminded us to continue with the emphasis on internationalization brought about by XML. "XML is not a prescription for peace with the Muslim world. But respectful attention to allowing other countries and individuals to develop and flourish in the ways they choose surely is necessary and, God willing, sufficient. Standards for the WWW need to be made with this respect and care built-in, and everyone involved in XML's specification should feel proud of its internationalization."
I will, however, leave the closing comments to the eloquent Michael Sperberg-McQueen (who, with Dave Hollander, can also be found reflecting on five years of XML on the W3C web site).
My thanks to all of my interviewees.
"XML and related specs are a great achievement of the community. Without community support, involvement, and hard work, the original XML spec would not have been finished so fast, argued out so thoroughly, or adopted so widely. Later specs have built upon, and helped build, an even larger and more vigorous community. It's important that we remember to give credit where it is due: to the foundational work of the ISO SGML working group and to the hard work of the scores of experts who served on the original SGML-on-the-Web Working Group (later renamed the XML Interest Group).
It's worth noting that XML by itself does not do nearly as much as some people (mostly in marketing, I hope) have said. It doesn't solve all the problems of data exchange and data reusability: you still have to make the data rich enough to be worth exchanging and reusing. To tag data richly, you still have to know the data. Tagging 'everything that could be of interest' is still impossible (not just impractical, but impossible). All XML does is get some extraneous difficulties out of the way of solving those problems. It allows you to avoid tying the data tightly to a specific hardware or software platform. It allows you to tag what you think is interesting, rather than what seemed interesting to the programmers who work for your word-processor vendor. Compared with other approaches, which add to the difficulties instead of reducing them, that's a win.
Now that 'database' data and 'word-processing' data can both be processed in the same format, we have opportunities which we never have had before. I have heard too many document-oriented people whining about the horrible things the database people have done to XML -- imagine! They have forced the development of datatypes other than NAME, NUMBER, NMTOKEN, and CDATA for attributes! And I've heard too many database people complaining that the document people don't know enough about database theory -- imagine, the XML spec doesn't say whether an empty element has no content, or has the empty string as its content! How am I supposed to map that to a relational column with NULLs?
Be cool, guys. It's all information, and it's all information your users want to have access to in the same place. Document designers have been wanting better datatypes for years, and with every revision the SQL spec has migrated further from the purely relational model and closer to the directed-graph model of XML. Don't try to split them apart again, just to avoid having to deal with 'those people.' Learn to live together; you might come to enjoy it."