
How Do I Hate Thee?
by Edd Dumbill, November 03, 2004
Let Me Count the Ways
Two of XML-DEV's stalwarts, Mike Champion and Len Bullard, started the trouble. Champion posted to the list observing that five years on from the XML simplification effort in 1999, it might be time to revisit the topic.
So, five years later ... is it NOW time to think seriously about cleaning up the core XML specs to address the challenges that real-world non-XML geeks have with them (hopefully without throwing out the interoperability baby with the bathwater), is it time to redouble efforts to educate non-XML geeks on why they should eat their XML 1.0 veggies and stop whining, will better tools and best practice guidelines solve the problems, or what?
As usual, Champion's mail sparked a lengthy series of responses, not all of which can be covered in this article. What I will follow is the entertaining thread about XML's faults provoked by Len Bullard. In keeping with the season, Bullard somewhat impishly suggested that everybody list their top five problems with XML. Everybody loves lists, and the results should turn out to be interesting. Bullard forecast that the real problem would be XML namespaces.
Of the cases presented, isn't the really gnarly one namespaces? In other words, if the edges of that were tidied, how much pain would go away?
Robin Berjon was the first to oblige, picking on DTDs as his bête noire.
- DTDs
- other legacy cruft
- DTDs
- more legacy cruft you always forget is there
- DTDs
Bill de hÓra again mentioned DTDs, but had a broader mix in his top five:
- Default namespaces
- DOM
- No Clark notation in XPath (or XML)--see 1 for details.
- Whitespace
- DTDs
While Robin Berjon agreed that the namespace notation was a pain in XPath, he didn't want to see "Clark notation" (where the namespace URI is written in full). He preferred the xmlns() XPointer scheme instead, which binds a prefix to a namespace URI inside the pointer itself.
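For readers who haven't met it, "Clark notation" (named after James Clark) writes an expanded name as `{namespace-URI}local-name`. Python's standard-library ElementTree happens to use exactly this form for element and attribute names, which makes it a convenient way to see that a defaulted element and a prefixed element end up as the same name once parsed. A minimal sketch; the namespace URI `http://example.org/ns/feed` is a made-up placeholder.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only for illustration.
NS = "http://example.org/ns/feed"

# The same element, written once with a default namespace and once with a prefix.
defaulted = ET.fromstring('<entry xmlns="%s"/>' % NS)
prefixed = ET.fromstring('<f:entry xmlns:f="%s"/>' % NS)

# ElementTree reports both names in Clark notation: {namespace-URI}local-name.
print(defaulted.tag)  # {http://example.org/ns/feed}entry
print(prefixed.tag)   # {http://example.org/ns/feed}entry

# The prefix "f" is gone; only the URI/local-name pair survives parsing.
assert defaulted.tag == prefixed.tag == "{%s}entry" % NS
```

Because the expanded name is a single string, it can be compared or used as a dictionary key directly, with no prefix bookkeeping.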
Eric Hanson picked up on one of my own wishlist items, XML packaging, though I'm not sure I'd characterize it as a major flaw:
There is no way to look up, discover and retrieve the library of resources that support a namespace-qualified element. If you come across a piece of data, there may be hundreds of supporting resources like XSL transformations, schemas, xforms, text documentation, etc. We need a way to link the resources to the data.
Sean McGrath has had more opportunity to polemicize over the faults of XML than most, and unsurprisingly, his list of faults has a broader outlook. Now, if you asked XML-DEV for a list of five things and actually expected to get replies with five unique items, you'd be crazy. So McGrath gave six, and a neologism or two to boot.
- The lack of sane, simple roundtrippability. I read in some XML, I write it straight back out again. I [lose] stuff on the way ...
- Namespaces--specifically defaulting and the "declare 'em anywhere you like buddy" aspects.
- No sane, simple pull based XPath 1.0 subset.
- W3C XML Schema--pretty much everything about it.
- Doctype. We should have left assertions about schema compliance (and consequently the entire idea of an embedded document type declaration subset) on the clipping room floor...
- Fuzziness over the use of terms like "XML parser" and "XML Editor" and "XML aware" and "XML compliant" ... Interop problems are the inevitable result.
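McGrath's first complaint is easy to reproduce with almost any tree-based API. Here is a minimal sketch using Python's ElementTree (chosen only because it ships with Python): a document is parsed and immediately written back out. With the default parser settings the comment and the DOCTYPE are dropped, and the namespace prefix is replaced by a generated one. The namespace URI and `note.dtd` are placeholders.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URI and DTD name, for illustration only.
SOURCE = (
    '<!DOCTYPE n:note SYSTEM "note.dtd">'
    '<n:note xmlns:n="http://example.org/ns/note">'
    "<!-- reviewed -->"
    "<n:body>Hi</n:body>"
    "</n:note>"
)

# Read it in, write it straight back out.
root = ET.fromstring(SOURCE)
output = ET.tostring(root, encoding="unicode")
print(output)

# What went missing on the way:
assert "<!DOCTYPE" not in output  # the document type declaration
assert "<!--" not in output       # the comment (not kept by the default parser)
assert "<n:note" not in output    # the original prefix "n" ...
assert "<ns0:note" in output      # ... replaced by a generated one
```

Some of the loss can be patched: `ET.register_namespace("n", "http://example.org/ns/note")` before serializing keeps the `n` prefix, and on Python 3.8+ a parser built as `ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))` keeps comments. ElementTree has no way to carry the DOCTYPE through.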
Rick Jelliffe also picks up on the conformance issue and the general pain around DTDs. Jelliffe comes from a document-oriented XML processing background, so his list of XML faults brings a different perspective to the debate.
- Needs adjusted conformance levels: no-DTD, or DTD+validating ...
- Needs to reserve ISO standard entity names with ISO meanings, so that no-DTD processors can be used in the publishing industry ...
- Need to have namespace-aware DTDs. Even just to allow that @xmlns and @xmlns:* do not need to be declared in the DTD would be a giant step forward ...
- Needs xml:space="strip" for use with no DTD.
- W3C needs to endorse ISO Schematron ...
- Whingers who dissipate real opportunities for change ... I certainly think it is time for XML 2.0, but to remove specific problems with the existing syntax, not to reduce the infoset or adopt some different syntax or disenfranchise publishing people further.
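Jelliffe's second point bites any processor that ignores DTDs: a familiar named entity such as `&eacute;` is only defined if some DTD declares it. A minimal sketch with Python's ElementTree shows the same content failing without a declaration and parsing once an internal subset supplies one. The declaration is written out by hand here; it is not pulled from the ISO entity sets.

```python
import xml.etree.ElementTree as ET

def text_or_error(document):
    """Return the root element's text, or the parser's error message."""
    try:
        return ET.fromstring(document).text
    except ET.ParseError as err:
        return "error: %s" % err

# No DTD: &eacute; is an undefined entity, which is a well-formedness error.
print(text_or_error("<p>caf&eacute;</p>"))

# Declare the entity in an internal subset and the same content parses.
print(text_or_error('<!DOCTYPE p [<!ENTITY eacute "&#233;">]><p>caf&eacute;</p>'))  # café
```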
Jelliffe's list will certainly strike a chord with anyone who's ever written more than a handful of documents in XML. His point of view is a welcome reminder of XML's role in the publishing world, and of the W3C working groups' seeming blindness to its applications there.
So far, the sword of simplicity dangles dangerously over both namespaces and DTDs. But what else is on the chopping block? A recent weblog post from Derek Denny-Brown, a Microsoft developer working on XML products, attempted to document where "XML goes astray." In a fascinating post, Denny-Brown explains the difficulties that arise when XML, designed as a document format, is applied to data scenarios.
Dare Obasanjo helpfully pulled out the main points from Denny-Brown's article.
- XML's treatment of whitespace confuses developers.
- The limitation in the range of allowed characters in XML is a hassle which the Microsoft XML team sees customers complain about on a weekly basis.
- Namespaces are close to a disaster [but not quite, that dubious honor goes to W3C XML Schema]
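The whitespace complaint is easiest to see in a pretty-printed document. In this minimal sketch (Python standard library; the `order`/`item` markup is invented for illustration), indentation that a human added for readability shows up as data: ElementTree stores it in `text` and `tail`, and the DOM reports it as extra text-node children.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

PRETTY = """<order>
  <item>pen</item>
</order>"""

# ElementTree: one child element, but the indentation is kept as text.
root = ET.fromstring(PRETTY)
print(len(root))           # 1
print(repr(root.text))     # '\n  '  (whitespace before <item>)
print(repr(root[0].tail))  # '\n'    (whitespace after </item>)

# DOM: the same whitespace becomes separate text nodes around the element.
dom_root = minidom.parseString(PRETTY).documentElement
print([node.nodeType == node.TEXT_NODE for node in dom_root.childNodes])
# [True, False, True]
```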
Elliotte Rusty Harold, however, was unequivocal in his disagreement with Denny-Brown.
This article is absolute crap, and a typical example of Microsoft think. It blames XML for the very problems Microsoft created and which don't exist in other tools and on other platforms.
He goes on, in a similar vein, to assert that many of the problems Microsoft's customers face are due to a misunderstanding of XML as implemented in Microsoft's APIs, not problems with XML itself. Read the full post if you wish to steep yourself in vitriol. One point Harold picked up on is worth mentioning: the second item in Obasanjo's summary, the restriction on character ranges in XML 1.0, would seem to be solved by XML 1.1. That is, assuming the complaint wasn't a confusion between the characters permitted in XML text content and those permitted in XML names.
As the W3C's Liam Quin noted, we can no doubt expect rapid deployment of XML 1.1 from Microsoft.
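The distinction Harold draws, between characters allowed in content and characters allowed in names, can be probed with any XML 1.0 parser. A minimal sketch using Python's expat-based ElementTree, which follows XML 1.0 rules (so it says nothing about what an XML 1.1 parser would accept):

```python
import xml.etree.ElementTree as ET

def well_formed(document):
    """True if the parser accepts the document, False on a well-formedness error."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

# Content: most of Unicode is fine, but C0 control characters are not,
# whether typed literally or written as a character reference.
print(well_formed("<a>&#233;</a>"))  # True
print(well_formed("<a>&#1;</a>"))    # False
print(well_formed("<a>\x01</a>"))    # False

# Names: a letter such as "é" may start a name; a digit may not.
print(well_formed("<caf\u00e9/>"))   # True
print(well_formed("<1st/>"))         # False
```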
Now, onto the most oft-cited XML fault: namespaces.
What Exactly Is the Problem with Namespaces?
Adding his bugbears to the list, Robert Koberg mentioned that he doesn't see what the problem is with namespaces. Peter Hunsberger clearly doesn't agree, citing namespaces five times over as his favorite problem with XML. So, what exactly is the issue?
Joe English writes that his complaint is the hassle of carrying around a namespace URI and a local name:
When I complain about namespaces, it's just the opposite: I don't want to have to use URI/localname pairs everywhere. I'd rather treat element type names and attribute names as simple, atomic strings. This is possible with a sane API, but most XML APIs aren't sane.
Robin Berjon highlighted another common problem, the expectation generated by the use of URIs as namespace names.
People [think] the URIs resolve to something magical (which they should, but usually don't). Then they think that they inherit to descendant elements or to attributes. This is usually dealt with by repeating ten times over that namespaces are dumb.
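Berjon's "namespaces are dumb" is easy to demonstrate. In this minimal sketch (Python's ElementTree, with a made-up namespace URI), a prefixed namespace applies to the prefixed element and the prefixed attribute, but not to an unprefixed attribute on the same element, nor to an unprefixed child. The namespace URI is only ever compared as a string; the parser never tries to fetch it.

```python
import xml.etree.ElementTree as ET

NS = "http://example.org/ns/p"  # hypothetical namespace name; never dereferenced

root = ET.fromstring('<p:a xmlns:p="%s" b="1" p:c="2"><child/></p:a>' % NS)

print(root.tag)     # {http://example.org/ns/p}a
print(root.attrib)  # {'b': '1', '{http://example.org/ns/p}c': '2'}
print(root[0].tag)  # child  (no namespace: the prefix does not "inherit")
```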
Michael Kay explained that having something as fundamental as naming as an added extra to XML 1.0 was a bad idea.
Naming is architecturally fundamental: changing the naming architecture of XML by means of a bolt-on to the core standard was an incorrect layering that was bound to lead to many practical problems.
Kay continues, identifying more issues:
It has always been ambiguous whether prefixes are significant or not.
The indirection between prefixes and URIs makes the interpretation of many textual fragments (XML entities, XPath expressions, XQueries, even schema documents) context-dependent.
The use of URIs as namespace names has always been fuzzy around the edges, as exemplified by the "relative URI" debacle.
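Kay's point about context-dependence shows up whenever a prefix appears inside an attribute value rather than a tag, as in W3C XML Schema's `type="xs:string"`. A namespace-aware parser expands element and attribute names but leaves attribute values alone, so the application has to track the prefix bindings itself. A minimal sketch in Python; the helper names are hypothetical, and the bindings are collected into one flat dictionary, which ignores scoping and is only safe when no prefix is redeclared.

```python
import io
import xml.etree.ElementTree as ET

XSD = "http://www.w3.org/2001/XMLSchema"
FRAGMENT = ('<xs:element xmlns:xs="%s" name="title" type="xs:string"/>' % XSD).encode()

def parse_with_bindings(data):
    """Hypothetical helper: return (root element, {prefix: uri}) using start-ns events.

    Simplification: a later declaration of the same prefix overwrites an earlier one.
    """
    bindings = {}
    root = None
    for event, item in ET.iterparse(io.BytesIO(data), events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item
            bindings[prefix] = uri
        elif root is None:
            root = item  # the first start event is the document element
    return root, bindings

def expand_qname(lexical, bindings):
    """Hypothetical helper: turn a prefixed 'p:local' string into Clark notation."""
    prefix, sep, local = lexical.partition(":")
    if not sep:
        raise ValueError("unprefixed QName; resolution rules vary by vocabulary")
    return "{%s}%s" % (bindings[prefix], local)

root, bindings = parse_with_bindings(FRAGMENT)
print(root.tag)                                  # {http://www.w3.org/2001/XMLSchema}element
print(root.get("type"))                          # xs:string  (value left untouched)
print(expand_qname(root.get("type"), bindings))  # {http://www.w3.org/2001/XMLSchema}string
```

Take the fragment out of its document, or rebind `xs` to a different URI, and the string `xs:string` means something else or nothing at all.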
If namespaces are so bad, can we live without them? Gavin Thomas Nicol thinks so, and said so in his list of XML bugbears:
- Namespaces (who *needs* them?)
- DTD's (should be broken out of the core)
- External Entities (not really necessary)
"But what about XSLT?" asked Robin Berjon. Nicol expands:
XSLT does not need namespaces as such, and could have got along fine with just alpha-renaming (i.e. like elisp packages), and even that wasn't strictly necessary.
History shows this isn't the first time Nicol has suggested this idea. Alpha-renaming is the process of rewriting names to scope them locally, but it's not entirely clear what Nicol is proposing. Any explanations will be gratefully received.
Michael Kay presented a more tangible solution, echoing Bill de hÓra's earlier wish for a "Clark notation" for namespaces.
I have advocated one change which I believe would alleviate the problems: there should be a lexical representation of expanded names that uses the URI and local name, rather than prefix and local name, and this representation should be permitted in any context where a lexical QName is permitted, including in element names and attribute names in source XML, in QName-valued attributes, and in path expressions. This would mean that any XML fragment, XPath expression, etc, could be "namespace-normalized" to make it context free.
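Something close to Kay's "namespace-normalized" form already exists in some APIs. Python's ElementTree accepts Clark notation inside its limited XPath-like path syntax, so a query can either depend on a prefix map supplied from outside or carry its URIs inline and stand on its own. A minimal sketch; the music namespace URI is hypothetical, and ElementTree's path language is only a small subset of XPath.

```python
import xml.etree.ElementTree as ET

MUSIC = "http://example.org/ns/music"  # hypothetical namespace URI
DOC = (
    '<catalog xmlns:m="http://example.org/ns/music">'
    "<m:record><m:remark>gut</m:remark></m:record>"
    "</catalog>"
)
root = ET.fromstring(DOC)

# Prefixed path: meaningless without the external prefix-to-URI map.
with_context = root.find("m:record/m:remark", namespaces={"m": MUSIC})

# Clark-notation path: the URIs are inline, so no context is needed.
context_free = root.find(
    "{http://example.org/ns/music}record/{http://example.org/ns/music}remark"
)

print(with_context.text)  # gut
assert with_context is context_free
```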
Births, Deaths, and Marriages
The latest announcements from the XML-DEV mailing list.
- JAXP 1.3 RI is public on java.net: the JAXP 1.3 Reference Implementation showcases a variety of new features in the Java API for XML processing, including DOM L3 Core, DOM L3 LS, SAX 2, XML 1.1, XInclude, and a new schema-independent validation framework.
- freebXML CC: freebXML CC is "a set of tools developed to facilitate the work of domain experts managing data dictionaries," designed to work with ebXML Core Components and to interoperate with ebXML implementations.
- W3C updated XQuery/XPath working drafts: the new working drafts include a number of changes made in response to comments received during the Last Call period that ended on Feb. 15, 2004.
Scrapings
In return, can I get my name spelt correctly? ... can I ... please? ... XML pedants in a swing state ... a firestorm of 297 messages to XML-DEV last week, Len rating 8% (firestarter!) ... more enforcement of schema philosophy in XML editors ... and more talk of namespaces not to everybody's taste ... the LMNL meme will never die.
Add your 5 hates about XML here!
- irrational hate
2004-11-15 01:48:30 jim fuller
I can't resist...though I will split into 2 categories: irrational and rational hatred....
Top five irrational things I hate about XML, but won't fight too much:
1) ampersands...yes we need a few special characters but why could we not just state that an ampersand with whitespace around it is just an ampersand...I know, I know...
2) W3C XML Schema...I don't like its structure, I don't like its complexity...though I think this is more related to 'the right tool for the right scope'....W3C XML Schema says 'enterprise' to me...and along with SOAP and WS-* I would probably embark on using these 'heavy battleship' technologies and MLs when scope demanded it.
3) which leads to 'no simple schema or typing': is it so frightening to have
<somexml ss:type="integer"></somexml>
which says the value enclosed is an integer, string or whatever the top 5-10 types are?...w/o inheriting from W3C Schema...just a starting point...anything....along these lines, why not an example XML document which implicitly defines a structure (e.g. yes, Examplotron).
4) linking/selection: what a missed opportunity, why did we not just inherit simple linking in XML?
<x href="http://example.org/test.xml" >
with default behavior being a simple include...of course we could expand our definition....by adding xpath|xpointer|xquery
<x href="http://example.org/test.xml#xpointer(someelementtype)">
why not some xpath
<x href="http://example.org/test.xml#xpath(//someelementtype[@test='5'])">
....finally, couldn't we use the effective XQuery syntax (sorry, don't know the compact form) whereby something like this
doc("auction.xml")//music:record[music:remark/@xml:lang = "de"]
turns into something like;
http://www.example.org/auction.xml#xquery({//music:record[music:remark/@xml:lang=de]})
let's try a more advanced XQuery linking/selection example:
{
let $doc := doc("prices.xml")
for $t in distinct-values($doc//book/title)
let $p := $doc//book[title = $t]/price
return
<minprice title="{ $t }">
<price>{ min($p) }</price>
</minprice>
}
ok I am hacking up syntax all over the place e.g. note use of curly brackets, not to mention that url encoding might complain but these are simple challenges.
http://www.example.com/prices.xml#xquery(<minprice><price>min(//book[title={distinct-values(//book/title)}]/price])</price></minprice>)
5) namespace distraction: I like namespaces...they are a simple method of avoiding name collision, which is very useful if you are working at the nexus of many ML vocabularies...to me they are benign and do not cause any consternation other than the various processors which misinterpret the spec on their usage...though this is true of any technology.
I don't understand why a namespace can't link to a meta definition which contains all possible metadata which describes the XML...whatever you may imagine....for example (for me)...a mixture of namespace def à la NRL (Namespace Routing Language), Dublin Core for authoring/versioning information and a bit of RDF:
<namespace>
<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" ID="date">
<dc:subject>XHTML</dc:subject>
<dc:subject></dc:subject>
<dc:subject></dc:subject>
<rdf:Description xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/" ID="date.1">
<dc:creator>James Fuller</dc:creator>
<dc:date>2004-05-12</dc:date>
<dc:description>An extensible html superset blah blah blah</dc:description>
</rdf:Description>
</rdf:Description>
<rules xmlns="http://www.thaiopensource.com/validate/nrl">
<anyNamespace>
<validate schema="xhtml.rng"
useMode="#attach"/>
<validate schema="xhtml.sch"
useMode="#attach"/>
</anyNamespace>
</rules>
<!---put whatever you want here -->
</namespace>
though I could have added an RDDL document...a namespace should be a 'container'...a dense, rich, multi-layered XML document which describes all aspects of an XML document....neat. One last practical thing: namespaces make textual merging processes simple, such as those encountered with most SCMs.
I will leave the rational things I hate for my own brooding!
Jim Fuller
- Defaulted Values
2004-11-09 06:02:27 AndrewWelch
I dislike DTDs and I hate the idea of defaulted values...
One of XML's strengths is that it is human readable and only requires a simple text editor to modify it. Yet with defaulted values you need to parse the source XML against the DTD to see the real picture. It's a mix of content and validation that just doesn't sit right when content should be separate from everything else. It also means that to process the XML you must also validate it every time, even though you may have already validated it yesterday, last week, last year.
Namespaces seem quite practical in comparison...
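Welch's objection to defaulted attribute values can be seen directly at the parser level. In this minimal sketch, Python's low-level `xml.parsers.expat` module reads a document whose internal DTD subset declares a default: the element as written has no attributes, yet the parser reports one unless it is told to report only the attributes specified in the instance. The `memo`/`status` names are invented for illustration.

```python
from xml.parsers import expat

# Invented example: the internal subset gives <memo> a default status="draft".
DOC = b'<!DOCTYPE memo [<!ATTLIST memo status CDATA "draft">]><memo/>'

def memo_attributes(data, specified_only):
    """Parse data and return the attributes expat reports for <memo>."""
    seen = {}
    parser = expat.ParserCreate()
    # When true, expat omits attributes that come only from DTD defaults.
    parser.specified_attributes = specified_only

    def start(name, attrs):
        seen[name] = dict(attrs)

    parser.StartElementHandler = start
    parser.Parse(data, True)
    return seen["memo"]

print(memo_attributes(DOC, specified_only=False))  # {'status': 'draft'}
print(memo_attributes(DOC, specified_only=True))   # {}
```

The same bytes yield different data depending on whether the DTD's declarations are applied, which is exactly the "real picture" problem the comment describes.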
- Name Spelling
2004-11-04 06:22:53 Len Bullard
Apologies, Edd. Too much typing (sort of like XML Schemas).
len
