Ken, your comments on "self-describability" are right on. I've been writing about precisely this for the past few days, and it has kept bothering me. I too have studied philosophy and using such a term does make me cringe every time.
Any suggestion of a name that describes the ability to retrieve the node names without recourse to external data, *and* is understood immediately by most, would be much welcome.
But on to the meat of it. I wish to pick at your claims regarding the "subset of a subset" and the "Web per se".
How many mobile units shipping with support for such technologies as SVG, XHTML, or SOAP does it take to make you consider it large enough for consideration? I've heard that the US was a bit behind in adopting those, but surely you wouldn't be that culture-centric? How many homes need to have interactive TV set-top boxes to make you happy? How many people need to be using SOAP before it counts?
And what's that "Web per se" business? Is it only the Web if I'm browsing porn from a beefy desktop box? Do other devices not count?
We're talking millions of users already. And their content is webbish, or being webized, when it isn't the Web already. And I won't get into the other uses, timed text, X3D, NewsML, GML...
Should each of those technologies be inventing its own solution? Don't you think they've tried gzip? If they all go their own routes, how will I create content that works for multiple platforms? What are the chances that it'll be royalty-free? How does it deal with language updates? If OMA and 3GPP come up with their own standards, one for XHTML and the other for SVG (as was only narrowly avoided), how can I mix them?
I agree that the workshop announcement has some confusing terminology. Well, that's life, it's not a document that needs to stay in the annals of history.
So we've got a set of varied technologies, all of them using XML, all of them finding issues, working on and with the Web, and having millions of users, with every indication that there are many more to come. Hmmmm. To me, it smells like a good area to produce solutions that span the XML spectrum properly. Besides, for the pleasure of pushing it a little further... audio-video, SVG, X3D, mobile, P2P, nomadic Web Services, etc.: that's a bunch of areas where interesting stuff is going on, probably more interesting than the quasi-dead Web-as-just-a-desktop-browser space. And then there are more specific needs, such as those of mapping or CAD, for instance. They still add their numbers to the lot.
Creating solutions, whether ad hoc or not, has a cost. Do you think they'd all be asking for binary infosets if gzip worked for them? You touch only on speed and size, both of which are well-solved using gzip for good-bandwidth-fair-power situations, neither of which gzip addresses well enough for those people. And you don't mention things like dynamic update or random access, which solve important problems not addressed by gzip.
Oh, and since you're the first one to ask for proofs, could you please point me to data that sustains the claims made by ERH that you repeat here? The fact that there is no technical advantage needs benchmarks to be sustained, just as does the opposite claim. "The only motive for pushing a binary variant is proprietary vendor lock-in"? That's a pretty strong claim to be relayed unqualified. Is there proof? If that's the case, what's the point of going to the W3C? In my book, that's called FUD. As for the quality of potential resulting specs, well, I tend to give WGs the benefit of the doubt, especially when they don't exist yet... Coming from a heavy Java advocate, I do find that statement somewhat ironic to be honest.
--Ken, your comments on "self-describability" are right on...Any suggestion of a name that describes the ability to retrieve the node names without recourse to external data, *and* is understood immediately by most, would be much welcome.--
Hi Robin. First, if it's all the same to you, my name is "Kendall". Thanks. I think "self-documenting" is pretty good, though that's still not great; maybe "self-naming"? Ick. I personally just let this entire line of "XML advantage" drop, since I don't think it's worth that much, no matter what you call it. I mean, surely, that's not *the* major XML benefit?
--How many mobile units shipping with support for such technologies as SVG, XHTML, or SOAP does it take to make you consider it large enough for consideration?--
I don't have a number in mind; but these small devices keep getting more and more powerful. They do all sorts of processing tasks which seem to me way more intensive than processing XML. And they will continue to get more and more powerful, or so it seems safe to conclude.
A binary variant of XML strikes me as a bad thing to have, all other things being equal.
--And what's that "Web per se" business? Is it only the Web if I'm browsing porn from a beefy desktop box? Do other devices not count?--
That's a rather tendentious way of making a point (what point are you making, actually?).
--Should each of those technologies be inventing its own solution?--
Yes, perhaps they should, actually. I mean, I keep being told by "people who know" that they need domain-specific compression schemes.
--Don't you think they've tried gzip?--
I don't know what they've tried. My point about gzip was that if we want one general standard for compressing XML content -- given the profile of deployed XML in the world and other considerations -- gzip makes the most sense to me. One general standard for compression obviously can't optimize for every domain-specific data pattern.
--If they all go their own routes, how will I create content that works for multiple platforms? What are the chances that it'll be royalty-free? How does it deal with language updates?--
These are some of the concerns I have for *any* binary variant of XML, so I certainly share these concerns. My answer to all of them is "don't do that".
--And you don't mention things like dynamic update or random access, which solve important problems not addressed by gzip.--
Yes, the number of things I didn't mention in this column expands about as quickly as the universe itself -- doesn't that make you wonder, not about the quality (or lack thereof) of the column, but rather about the Pandora's Box which you're begging to have opened? At the very least, gimme a break! It's a 1000 word column, and I clearly couldn't have mentioned *everything*.
--Oh, and since you're the first one to ask for proofs,--
Bzzzt! Wrong. You can't be said to be doing science or research w/out empirical data to back up your claims. I'm so NOT the first person to come up with that idea.
--could you please point me to data that sustains the claims made by ERH that you repeat here? The fact that there is no technical advantage needs benchmarks to be sustained, just as does the opposite claim. "The only motive for pushing a binary variant is proprietary vendor lock-in"?--
I should supply proof of claims that ERH makes, because I repeat them? Is this the *first* XML-Deviant column you've ever read? Do you realize that what you're asking would make writing a column like XML-Deviant impossible?
And his claim about motivations is probably a different kind of claim, not one which is amenable to empirical warrant anyway, so let's not mix apples and oranges.
I'm less certain about vendor lock-in than ERH, but no less worried about it.
--Coming from a heavy Java advocate, I do find that statement somewhat ironic to be honest.--
Fair enough; why not take it up with him? If I had to defend the claims of every person I quote in an XML-Deviant column, there wouldn't be any XML-Deviant column.
I certainly have no issue with using your full name; sorry if it bothered you that I didn't do that at first.
*self-naming*
That's fine by me. I agree it's not the biggest benefit, but it is one benefit, and one people do not want to lose. Data outlives applications, and those little labels can be terribly useful in such cases.
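As a minimal sketch of what that property means in practice: the element names can be read back from the document alone, with no schema or external dictionary at hand. The sample document and the helper name `node_names` are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; no schema or DTD accompanies it.
SAMPLE = "<order><customer>Ada</customer><item qty='2'>widget</item></order>"


def node_names(xml_text):
    """Return element names in document order, using only the document itself."""
    root = ET.fromstring(xml_text)
    # iter() walks the root element first, then its descendants in document order.
    return [element.tag for element in root.iter()]


print(node_names(SAMPLE))  # ['order', 'customer', 'item']
```

A binary encoding that replaces names with indexes into an external table would need that table to produce the same list.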
*mobile units*
The number of mobile units in the world is several times the number of desktop computers. Yes, they do keep getting more powerful, but no, that is not sufficient to solve performance issues. Other factors, such as batteries for instance, don't follow Moore's law by any margin. The more CPU you use, the more battery you burn.
We'd all like to see those problems go away, but wishing them gone doesn't do much.
*the Web per se*
The point I'm making is that you're dismissing a large number of terminals -- again, more than there are desktops -- with a wave of the hand as "a subset of a subset" and not being the real Web.
Well, they're on the real Web, and there are lots of them. Dismissing them is simply factually inaccurate.
*ad hoc approaches*
They have been tested for several years now. They work. But they cause no end of interoperability problems, and they've already kept some technologies from being integrated with one another. It is high time that all those who have been doing that got together for a chat.
I've been working on a way to make the need for domain-specific encoders (often done as codecs in a generic format) pretty much disappear. That's the sort of issue that can be solved in a single place, with all interested parties, much better than vertically where one knows it will fail to be reused by others.
*gzip*
Gzip does not solve the issues, full stop. Where it does, it's used. For instance, SVG mandates its support and people use it when it works. However, when you get mobile, mapping, broadcasting, e-learning people coming to you saying that it isn't enough for SVG -- even though gzip'd SVG documents are on average smaller than SWF files implementing the same functionality -- well, maybe at some point it's worth paying attention. They've tried gzip; it doesn't cut it for them.
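A rough sketch of one reason gzip can disappoint, offered as an assumption rather than as what those groups actually measured: the gzip container adds a fixed header and trailer, so very small messages can come out larger than they went in, while large repetitive documents shrink dramatically. The sample payloads and the helper name `gzip_ratio` are made up for illustration.

```python
import gzip


def gzip_ratio(data: bytes) -> float:
    """Compressed size divided by original size (below 1.0 means gzip helped)."""
    return len(gzip.compress(data)) / len(data)


# Hypothetical payloads: a tiny message and a large, highly repetitive document.
TINY = b"<ping/>"
LARGE = b"<item><name>x</name></item>" * 1000

print(f"tiny:  {gzip_ratio(TINY):.2f}")
print(f"large: {gzip_ratio(LARGE):.4f}")
```

This only looks at size. It says nothing about CPU or battery cost, and nothing about dynamic update or random access.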
*lock-in concerns*
Well, surely, in that case one should rejoice to see that sort of activity happen within the W3C rather than a variety of other places!
*dynamic update and random access*
What I point out there is that those are oft-cited requirements, which, put together with size and speed, make a total of four. I believe that they've been mentioned a sufficient number of times on lists you subscribe to that I'd have hoped you'd have thought about them. It's not very pleasant to see someone take a subset of the requirements you have, point at another solution, and declare victory.
I don't think four requirements qualify as opening a Pandora's box. In my position paper, in order to encompass as much of the field as possible, I've listed two or three others, but they're more marginal.
*first one to ask for proof*
In this discussion. On this page. You posted first :) I have vaguely heard of that "science" thing which you mention.
*ERH's claims*
There's a difference between just repeating someone's claims and making them almost the sole meat of a section called "What do XML Developers Say?" when those comments come from a single developer, half of them unfounded, the other half blatant FUD. It's your fault if I've come to expect more fairness and even-handedness from the Deviant.
I've taken those claims up with him, on xml-dev, two weeks ago. I have yet to receive an answer.