Composition
July 20, 2005
"If you use inheritance where composition will work, your designs will become needlessly complicated." —Bruce Eckel, Thinking in Java
This week's column takes a look at two new specifications that are winding through the W3C Recommendation Track, xml:id as a Proposed Recommendation and XLink 1.1 as a Last Call Working Draft. These two specifications share an important common trait: neither is a standalone vocabulary, but rather they are intended to be combined into larger vocabularies. The formal name for "combining stuff" this way is "composition," this week's topic. First, a thirty-second review of some basic terms for any nonprogrammers.
Code written by someone who has just learned object-oriented techniques is usually pretty easy to spot. One of the main tells is enthusiastic overuse of derivation, that is, writing new code (called a "class") explicitly based on existing code, modulo specific changes. Derivation is an important tool for expressing clear is-a relationships. It would make sense to derive a new class, say Circle or Polygon, from a more fundamental Shape class, because a polygon or circle really is a kind of shape. More to the point, a sign that derivation makes sense is when circles and polygons can be treated more generally as shapes; for example, the code might say, "Attention all shapes: render yourselves to SVG." Other cases, however, are not so clear-cut. For example, is an XMLDom class really a derivation of String? Probably not. In designing software, many such situations arise; experts such as Bruce Eckel recommend sticking with composition in those cases.
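For readers who prefer to see the distinction in code, here is a minimal Java sketch. The class names Shape, Circle, Polygon, and XMLDom come from the discussion above; the method names (toSvg, renderAll, serialize) and the Shapes helper class are made up for illustration and are not taken from any real library.

```java
import java.util.List;

// Derivation (is-a): every Circle or Polygon can be treated as a Shape.
abstract class Shape {
    abstract String toSvg();
}

final class Circle extends Shape {
    private final int cx, cy, r;

    Circle(int cx, int cy, int r) {
        this.cx = cx;
        this.cy = cy;
        this.r = r;
    }

    @Override
    String toSvg() {
        return "<circle cx=\"" + cx + "\" cy=\"" + cy + "\" r=\"" + r + "\"/>";
    }
}

final class Polygon extends Shape {
    private final String points; // for example "0,0 10,0 5,8"

    Polygon(String points) {
        this.points = points;
    }

    @Override
    String toSvg() {
        return "<polygon points=\"" + points + "\"/>";
    }
}

// Composition (has-a): an XMLDom holds its source text rather than being a String.
// (In Java, String is final, so deriving from it is not possible anyway.)
final class XMLDom {
    private final String source;

    XMLDom(String source) {
        this.source = source;
    }

    String serialize() {
        return source;
    }
}

final class Shapes {
    // "Attention all shapes: render yourselves to SVG."
    static String renderAll(List<Shape> shapes) {
        StringBuilder svg = new StringBuilder("<svg xmlns=\"http://www.w3.org/2000/svg\">");
        for (Shape shape : shapes) {
            svg.append(shape.toSvg());
        }
        return svg.append("</svg>").toString();
    }
}
```

The renderAll method never needs to know which kind of shape it is holding, which is the sign that derivation fits. XMLDom, by contrast, exposes only the operations that make sense for a document and keeps its string as a private detail.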
In the realm of standards, both derivation and composition take place. Derivation might occur when one specification forms an extended subset of another—almost always a sign that something has gone horribly wrong. At a basic level, though, composition happens all the time in the normal course of producing specifications. A look at the XML 1.1 specification shows eight "normative" references, including the Unicode specification, which forms the basis for much of XML's interoperability.
Designing an XML vocabulary is a special case of producing a specification. Although XML Namespaces was finalized more than five years ago, architects are still uncovering trouble in applying composition in XML, a problem more commonly discussed under the banner of "compound documents." The first specification to look at this week is a revision to XLink.
XLink
XLink 1.0 became a W3C Recommendation in June 2001, amid a fair amount of controversy. Support for the standard has been slow to come; in a March 2002 article titled "XLink: Who Cares?" Bob DuCharme noted that only seven partial implementations were available. In fact, the death of XLink has become one of several permathreads on the xml-dev mailing list.
The actual changes in 1.1 are modest and have been documented previously. On the xml-dev list, Len Bullard pondered:
After the thread a while back about the death of XLink, I keep finding examples of it being cited, particularly in the OGC [Open GIS Consortium] literature...I wonder if there is a term for this phenomenon: polite inclusion of other works that are not successful in terms of ground support
One of the most prominent examples of politely including XLink has been inside SVG (scalable vector graphics), but even there, DTD (document type definition) tricks were used to remove the need for some of the explicit markup. XLink 1.1 legitimizes the use of sole xlink:href attributes without need of any further tricks, which is, of course, a pleasant development.
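To make the "sole xlink:href" point concrete, here is a rough Java sketch of how a processor might decide what kind of link an element represents. It rests on an assumption about the 1.1 draft: that an element carrying xlink:href but no xlink:type is treated as a simple link. The class XLinkSketch and its methods are hypothetical names invented for this example; only the standard JAXP and DOM calls are real.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

final class XLinkSketch {
    static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    // Parse a small XML string (namespace-aware) and return its root element.
    static Element root(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Hypothetical helper: returns the link type the element should be treated as,
    // or null if the element is not a link at all.
    static String effectiveType(Element element) {
        if (element.hasAttributeNS(XLINK_NS, "type")) {
            return element.getAttributeNS(XLINK_NS, "type"); // explicit markup wins
        }
        if (element.hasAttributeNS(XLINK_NS, "href")) {
            return "simple"; // assumed 1.1 behavior: a bare href implies a simple link
        }
        return null;
    }
}
```

Under this reading, markup such as `<a xmlns:xlink="http://www.w3.org/1999/xlink" xlink:href="#target"/>` would count as a simple link with no DTD involved, while an element with neither attribute would not be a link at all.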
Even so, the changes in version 1.1 don't address the more fundamental complaints about XLink. For example, XLink 1.1 is still incompatible with every version of XHTML, and especially with XHTML 2.0, which uses two distinct attribute names for different kinds of links.
However, the remainder of the thread on xml-dev consisted largely of folks pointing out increasing usage of XLink: Geoffrey Shuetrim pointed out use in XBRL, Alexander Johannesen pointed out Topic Maps XTM, and Rick Jelliffe highlighted his company's Schema Management Tool, which uses XLink extensively under the hood.
Does this indicate a resurgence in XLink's fortunes? Bullard concludes by saying that "the usefulness of architectural forms is apparent," a sensible position.
Prediction: XLink 1.1 isn't going to do much beyond encouraging uses where it has historically been politely included.
xml:id
Proposals for something like xml:id go back a long way. As early as November 2001, Leigh Dodds, in this very column, outlined several proposals, including one that is substantially what we see today as a Proposed Recommendation. Tim Bray (famously) wrote:
This is in danger of tripping over what is maybe the #1 gaping architectural hole as regards XML & the Web. The problem is that at the moment, given some arbitrary XML, there is no good way to determine what's an ID without recourse to some external resource like a DTD or schema, and that, to use a technical term, sucks.
Who's to say that long-standing problems with XML never get solved? But the tricky thing about the way specifications interact under composition is that solving one problem often uncovers another. In this case, the reserved xml prefix, whose use was justified by this architectural hole, has learned a new trick, or at least unlearned an old one.
The attributes xml:lang and xml:space, both defined as part of XML itself, as well as the later addition xml:base, function over a scope that includes child elements. For example, defining any of these attributes once on the root element would have an effect on every element in the document. But xml:id, in contrast, doesn't follow this assumption: it applies to a single element. Specifications that made assumptions about the scoping of xml-namespaced attributes will run into problems if documents with xml:id become common. Canonical XML (including its influence on XML Signature) is the most often cited example here.
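The scoping difference is easy to see with a few lines of Java and the standard DOM API. The sketch below is illustrative only: the class XmlAttrScope and its methods are hypothetical names, and the "inherited" lookup is a simplified model of how xml:lang is scoped, not an excerpt from any specification or parser.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class XmlAttrScope {
    // Namespace that the reserved "xml" prefix is always bound to.
    static final String XML_NS = "http://www.w3.org/XML/1998/namespace";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // xml:lang-style lookup: the nearest declaration on the element or an ancestor applies.
    static String inherited(Element element, String localName) {
        for (Node node = element; node instanceof Element; node = node.getParentNode()) {
            Element current = (Element) node;
            if (current.hasAttributeNS(XML_NS, localName)) {
                return current.getAttributeNS(XML_NS, localName);
            }
        }
        return null;
    }

    // xml:id-style lookup: only the element itself counts.
    static String own(Element element, String localName) {
        return element.hasAttributeNS(XML_NS, localName)
                ? element.getAttributeNS(XML_NS, localName)
                : null;
    }
}
```

Given `<doc xml:lang="en" xml:id="top"><p>hi</p></doc>`, calling inherited(p, "lang") yields "en", while own(p, "id") yields null, because the ID belongs to doc alone. A processor that applied the inherited rule to every xml-prefixed attribute would instead hand the p element its parent's ID, which is the kind of mismatch described above.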
Prediction: xml:id will rapidly find its way into vocabularies that need well-known IDs, especially without DTD processing. Other specifications that assumed too much about xml-attribute scoping will quickly come into line.
Conclusion
In Java programming, composition is straightforward and almost foolproof. With specifications, however, it's still possible for all kinds of unexpected interactions to take place. Further, the number of technical specifications seems to be steadily increasing with time, so the chances of conflict continue to rise. The xml:id example showed that sometimes even a nearly unrelated spec can come along and throw chaos into your world by dismantling fundamental assumptions.
Prediction: more problems are yet to be uncovered in the various combinations of existing XML specifications, to say nothing of the new ones. Somehow, we'll keep muddling through.
Births, Deaths, and Marriages
New versions of nxslt
Versions 1.6 and 2.0 Beta1 of nxslt, a free .NET XSLT command-line utility from Oleg Tkachenko, are available.
Arbortext Bought
Atom 1.0 Baked
Documents and Data
An identifier by any other name...
Rick Jelliffe on untangling the Schema spaghetti
If you're into the podcast thing, this one looks interesting.