Doing it Simpler
August 1, 2001
This week the Deviant gives a quick update from the SML-DEV mailing list.
A Retrospective
The SML-DEV list was born in November of 1999, following a fairly heated XML-DEV debate about simplifying XML. Since then it's been involved in a number of debates about the complexity of XML specifications, including ideas about how the complexity might be mitigated with judicious subsetting and other forms of simplification.
The list's first deliverable was Common XML ("All the XML you need"), a selection of the useful aspects of the XML 1.0 specification. Common XML was generally well received, even in quarters where the idea of XML subsetting was previously anathema, no doubt largely due to its presentation as distilled best practice rather than overt simplification.
SML-DEV then moved swiftly on to MinML, a true XML subset that threw out everything except elements and simple text content. The MinML specification was debated long and hard, acquiring a checkered history even on SML-DEV. A particularly contested issue was whether attributes should or, indeed, could be usefully removed. This particular debate has still not gone away: Sjoerd Visscher summarized some issues in a recent paper, and James Clark recently commented that XML applications should minimize distinctions between the two styles of markup.
TREX tries to treat attributes and elements as uniformly as possible. If you're designing an XML or SGML markup language, it's often pretty much arbitrary whether you represent some bit of information as an attribute or as a child element. In my view, XML processing tools and languages should try to minimize the differences between elements and attributes and should try to treat them as uniformly as possible. You can see that in XSLT and XPath. I wanted to apply that idea to schema languages.
The MinML specification was successfully completed, however, and a number of MinML parsers have since appeared. One of the most interesting findings from the MinML experiment is that adding more features -- attributes, DTDs, and so on -- increases neither parser implementation complexity nor, with care, the associated data model.
SML-DEV, relatively quiet for a time, eventually snowballed into the definition of YAML ("Yet Another Markup Language") which "broke free" from XML by throwing out pointy bracketed syntax entirely. YAML is aimed at a much smaller problem space, data serialization, particularly for Perl and Python applications.
SML-DEV has tried three different approaches to date: profiling XML through collation of best practice, subsetting XML by paring it to the bone, and providing an XML competitor for a specific application area. What next?
SML-DEV has begun to show signs of activity again, perhaps prompted by the recent debates calling for XML to be refactored so that its specifications are layered and their dependencies exposed more cleanly, and for developers to be guided toward the core XML technologies they really need. "Daring to Do Less With XML", an article by long-time SML-DEV contributor Michael Champion, contains further sound advice on these issues.
Doing It Differently
Joe Lapp posted to SML-DEV this week wondering if anyone would be interested in creating "what XML should have been". Lapp listed various reasons for not wishing to build directly on either MinML or YAML, suggesting instead that work begin on another alternative.
If anybody out there is interested in creating an alternative to XML that is incompatible with XML, I think it would be wise for us to start by tackling the niches in which XML does poorly...
Of course, I'd still want to be able to use the new language where I'd otherwise be inclined to use XML, since a big motivation for me is to have an excuse to use something other than XML. Another big motivation for me is the prospect that the little guys could ultimately generate enough momentum to overthrow much of XML. (This last sentence only makes sense if you understand that the complexity of XML and its ties to the W3C necessarily make it primarily a game for deep pockets and long horizons.)
This is a slightly surprising tack, given that Common XML is likely to be SML-DEV's most successful venture. Is creating another markup syntax really necessary? Michael Champion, who has previously commented that the Common XML/Best Practice approach to simplification is the correct one, seems to be convinced that a real alternative is the only option.
Interesting timing ... I'm having a "why oh why oh why did I ever get mixed up in this XML $#!+" kind of day. I can't talk about the details because of the W3C creed of Omerta, but suffice it to say that the little inconsistencies between the data models of the extended XML specifications (DOM, XPath, XSL, XQuery, the PSVI, the InfoSet, ad nauseum) are slowing W3C progress to a crawl. The solution of breaking a few things, radically simplifying, and starting over is not even politely listened to in W3C circles. Godfather Darwin is going to be taking XML (broadly defined) out to a landfill in 'Jersey before long. The only question in my mind is whether some other reasonably open markup language takes its place (SGML-lite? an ISO or OASIS-defined XML subset? An ad-hoc semi-standardized XML subset that everyone embraces and extends?) or whether we go back to the Bad Ol' Days of proprietary "post-XML" formats and tools.
So, I'm interested, but utterly stymied as to the politics/business model of some XML simplification. The W3C is beyond hope as a venue for simplification, and no other existing organization shows any interest either (well, maybe OASIS and RELAX-NG ...)
In a subsequent post Joe Lapp noted that success depends on defining the particular problem area at which an alternative might be targeted.
If we decide that we need a good serialization language, well, then we would probably end up with something close to YAML. If we decide we need a good way to mark up human verbiage, well, then we would probably end up with something like Paul's PXML. If we decided that we need a language that does a little something for everybody, well, then we would probably end up with XML.
I think our success hinges critically on us picking a problem or a set of problems to solve and then finding a darn good solution.
The message continues by outlining Lapp's current thinking, which is actually a slightly tweaked version of MinML. Taking an alternate tack, Tom Bradford believed that Common XML would actually be a better starting point.
I think a subset of XML and related technologies that refused to acknowledge external dependencies, used a non brain-dead namespacing mechanism, and reduced the amount of interdependencies, as well as the number of overlapping and duplicated features would be a good start. There's Common XML, which I'm inclined to think is probably the best foundation with which to define such a beast.
Bradford has recently published two papers which add further fuel to the simplification debate. The first, "Clean Namespaces", suggests naming patterns to qualify XML elements. The second, "The Future of XML", is a brief rant on the state of XML development, concluding that the "grand vision" of XML will crumble because the
rate of adoption of XML by entities who actually need it to solve problems is inversely proportional to the complexity of XML as applied to those problems. The more specification interdependencies, and the more complex the specifications being released by the W3C, the worse the rate of adoption for XML will become. We're basically heading for SGML all over again.
Don Park, the founder of SML-DEV, also admitted bad feelings about the direction in which XML is going.
I still like XML, but have bad feelings about where it is going. XML's relationship with W3C is both a blessing and a curse. They took our acceptance of XML as open invitation to shove whatever standard they approve down our throat. Whatever W3C does is tinted with politics and compromises between document vs. data use, verbosity vs. brevity, etc. W3C is like Washington D.C.
...I do see that there is a rising sea of frustrations. Whether there are enough fuel to reach escape velocity out of this global gravity-well called XML, your guess is as good as mine.
The problem with simplification is deciding which bits to leave out. Once you begin to move away from the core specifications, people's requirements begin to rapidly diverge. Cut out too much and you may well disenfranchise a large part of your potential user base. A well-targeted utility language may well be successful, but it's likely to remain in a niche area. Investments already made in XML and related technologies aren't going to be readily squandered either. The rapid success of XML may be inadvertently perpetuating an "if you build it, they will come" illusion. Reality is far harsher.
Sharing best practices is the only viable option. The real concern over simplification is that the temptation to start over from first principles could well limit progress.