Comments in a "No comment" World
July 30, 2003
Q: How do I control the formatting of comments?
I have the following piece of XML:
<?xml version="1.0" encoding="UTF-8" ?>
<TopLevel> <!-- TopLevel comment -->
  <SubLevel /> <!-- SubLevel comment -->
</TopLevel>
When viewed in the browser it looks like this:
<?xml version="1.0" encoding="UTF-8" ?>
<TopLevel>
  <!-- TopLevel comment -->
  <SubLevel />
  <!-- SubLevel comment -->
</TopLevel>
This makes it hard to determine which element the comment refers to. Is there a way to have the comments output on the same line or is this dependent on the browser? Should comments generally be placed before or after the elements they refer to?
A: Commenting XML documents is not often given much attention in XML books and articles, for at least two reasons:
- Technology: XML parsers generally pass comments on to whatever downstream applications use the parsers' output. But the XML Recommendation explicitly says they needn't do so. Thus, planning for special treatment of comments (as in your case) implies that you know your own parser's behavior.
- Human nature: XML is no different from other areas of geek focus in at least one respect: its practitioners love (in some cases the word is no exaggeration) the nuts-and-bolts of character and parameter entities, Unicode, XSLT template rules, RDF and XML Schema, to say nothing of the software to process it all. In contrast to the near-geometric beauty of document content, comments -- potentially messy, free-form, undisciplined clots of text -- are dull and of little interest. And like white space added for readability, they can seem just so much noise clogging up the signal.
A vague whiff of disrepute also lingers over markup comments, dating back to (X)HTML's accepted practice of (ironically) requiring that they be "structured" in some cases: when their contents embed scripting language code, in order to hide the code from the browser's user. (I don't think of myself as a purist, but this hack has always seemed to me more sneaky than elegant, like painting your driveway black when it really needs fresh asphalt.)
At the same time, commenting code has a long and honorable history among software developers who recognize the importance of letting other developers know what's going on, especially in particularly thorny, inscrutable passages of code. So my heart leapt up when I read your question: Here, I thought, is somebody who wants to do something good, and wants to do it the right way.
If you want to tie the display of your code to the browser medium, you've got an uphill battle before you. You probably know that XML parsers must pass to downstream applications all non-markup characters, including white space (blanks, newlines, tabs) included for readability. Furthermore, you can explicitly force this behavior for white space by adding an xml:space="preserve" attribute to your document's root element. The problems begin once the parser hands off the data to the requesting application, such as a browser.
Unconstrained by the XML Recommendation (which affects the behavior only of parsers, or "processors" as the spec calls them) or, really, by much of anything except perhaps their developers' nobility of purpose, browsers can do pretty much whatever they want with comments or whitespace.
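To see how much depends on the software downstream of the raw markup, here is a minimal sketch in Python (standing in for whatever toolkit you actually use; it is not part of the original question). It assumes Python 3.8 or later and the sample document laid out on four lines. The standard library's ElementTree drops comments by default and keeps them only when asked to:

```python
import xml.etree.ElementTree as ET

# The sample document from the question (line layout assumed).
DOC = b"""<?xml version="1.0" encoding="UTF-8" ?>
<TopLevel> <!-- TopLevel comment -->
  <SubLevel /> <!-- SubLevel comment -->
</TopLevel>"""


def child_kinds(root):
    """Label each child of root as 'comment' or by its element name."""
    return ["comment" if child.tag is ET.Comment else child.tag
            for child in root]


# Default parser: the comments never reach the application.
without_comments = child_kinds(ET.fromstring(DOC))

# Same library, explicitly asked to keep comments (Python 3.8+).
keeping = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
with_comments = child_kinds(ET.fromstring(DOC, parser=keeping))

print(without_comments)  # ['SubLevel']
print(with_comments)     # ['comment', 'SubLevel', 'comment']
```

Both results are legitimate readings of the same file, which is why any plan for displaying comments has to start with knowing what your own toolchain does with them.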
But if you're determined to do this right, and do it in a browser to boot, then you can always rely on XSLT to force the formatting for you. This can effect almost any kind of look you want, from a bare-bones "plain old text" to a more elaborate JavaDoc-like appearance. As you've already seen, the challenge is to display not just the document's contents, but its structure.
One simple approach is suggested by G. Ken Holman's SHOWTREE stylesheet, available in both non-Microsoft and Microsoft-specific versions. Holman's stylesheet uses a numbering system to indicate "how far down" a particular node exists in the document tree. For example, applied to the document you supplied in your question, the SHOWTREE output looks like this:
SHOWTREE Stylesheet - http://www.CraneSoftwrights.com/resources/
Processor: SAXON 6.2.2 from Michael Kay
1 Proc. Inst. 'xml-stylesheet': {type="text/xsl" href="showtree-20000610.xsl" }
2 Element 'TopLevel':
2.1 Text (TopLevel): { }
2.2 Comment (TopLevel): { TopLevel comment }
2.3 Text (TopLevel): { }
2.4 Element 'SubLevel' (TopLevel):
2.5 Text (TopLevel): { }
2.6 Comment (TopLevel): { SubLevel comment }
2.7 Text (TopLevel): { }
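The numbering idea is easy to imitate outside XSLT. The following is a rough Python sketch of the same scheme, not Holman's stylesheet: each node gets its parent's number plus its own position among its siblings. The helper name number_nodes and the line layout of the sample are my assumptions; the xml-stylesheet processing instruction is taken from the output above.

```python
from xml.dom import minidom, Node

# Sample document plus the stylesheet PI seen in the SHOWTREE output
# (line layout assumed).
DOC = b"""<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="showtree-20000610.xsl"?>
<TopLevel> <!-- TopLevel comment -->
  <SubLevel /> <!-- SubLevel comment -->
</TopLevel>"""

KINDS = {
    Node.ELEMENT_NODE: "Element",
    Node.TEXT_NODE: "Text",
    Node.COMMENT_NODE: "Comment",
    Node.PROCESSING_INSTRUCTION_NODE: "Proc. Inst.",
}


def number_nodes(parent, prefix=""):
    """Yield (number, kind, name) for every node below parent, depth first."""
    for position, node in enumerate(parent.childNodes, start=1):
        number = f"{prefix}{position}"
        kind = KINDS.get(node.nodeType, "Other")
        yield number, kind, node.nodeName
        # Children inherit the parent's number as their prefix.
        yield from number_nodes(node, prefix=number + ".")


rows = list(number_nodes(minidom.parseString(DOC)))
for number, kind, name in rows:
    print(number, kind, name)
```

Run against the sample, this numbers the nodes 1, 2, and 2.1 through 2.7, the same sequence SHOWTREE reports.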
A fancier approach is the Pretty XML Tree Viewer developed by Mike Brown (with help from Jeni Tennison). This actually consists of not just an XSLT stylesheet, but a CSS stylesheet as well. The CSS stylesheet augments the display of the XHTML code to which your XML is (via the XSLT stylesheet) transformed. Brown's stylesheets together render your sample document like this:
By the way, both Holman's and Brown's stylesheets highlight one potentially troublesome aspect of the code fragment you supplied. This has to do with the placement of comments and white space not physically, but structurally, relative to their corresponding elements. To wit: Everything in the document is a child of the TopLevel element. Even the comment which (to a human reader's eye) "belongs with" the SubLevel element actually is a child of TopLevel -- because SubLevel, being an empty element, contains nothing at all.
In addition, the whitespace (sometimes consisting of a single newline) can actually hamper the output if you're going the XSLT route. For instance, you can see four text nodes you probably weren't even thinking of as present in your document. In this case, you might want to strip out all "insignificant" whitespace at the time you do the transformation.
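In XSLT itself, the usual tool for this is the xsl:strip-space element. To show the effect without a stylesheet, here is a small Python sketch using the standard library's minidom; the function names and the sample's line layout are assumptions of mine. It counts the text nodes, removes the ones that hold nothing but white space, and counts again:

```python
from xml.dom import minidom, Node

# The sample document from the question (line layout assumed).
DOC = b"""<?xml version="1.0" encoding="UTF-8" ?>
<TopLevel> <!-- TopLevel comment -->
  <SubLevel /> <!-- SubLevel comment -->
</TopLevel>"""


def count_text_nodes(node):
    """Count text nodes anywhere below node."""
    total = 0
    for child in node.childNodes:
        if child.nodeType == Node.TEXT_NODE:
            total += 1
        total += count_text_nodes(child)
    return total


def strip_whitespace_nodes(node):
    """Remove, in place, every text node that contains only white space."""
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)


root = minidom.parseString(DOC).documentElement
before = count_text_nodes(root)
strip_whitespace_nodes(root)
after = count_text_nodes(root)
remaining = [child.nodeName for child in root.childNodes]

print(before, after)  # 4 0
print(remaining)      # ['#comment', 'SubLevel', '#comment']
```

Note that the comments survive, and that both are still children of TopLevel; stripping white space changes nothing about where the comments sit in the tree.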
On the issue of where to place the comments, it depends on whether the comments are brief and in-line, as in your example, or set up as larger blocks. For the former, at the end of the line seems to make better sense. As for the latter, my own preference is to read the comment before the corresponding bit of code. (In everyday terms, this translates to a distaste for "What the heck was that?" experiences.) While there's no official rulebook for such things, I'd bet that this is the de facto standard for most block-style comments in XML documents as in programming source code.
Q: Any tools for rendering an XML schema's annotations?
I am looking for a way (in XSLT, I guess) to extract the documentation in an XML schema contained inside (standard) xsd:annotation/xsd:documentation elements to a human-readable document, ideally a DocBook or HTML.
A: Start with Chris Maden's xsd2html ("XSD-documenting XSLT stylesheet"), tweaking it to your own needs. (Note especially that the XML Schema namespace prefix used in this stylesheet is xs:, not xsd:. If you prefer the latter prefix, be sure at least to change all occurrences of xs: to xsd: in your copy of Maden's stylesheet.) The result tree vocabulary in this case is HTML.
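One point about prefixes is worth a quick demonstration. Namespace-aware tools match element names by namespace URI, not by prefix, so a schema written with xsd: is still found by a query written with xs:. The sketch below uses Python's ElementTree rather than XSLT, and the schema fragment and the function name documentation_texts are invented for illustration:

```python
import xml.etree.ElementTree as ET

XS_NS = "http://www.w3.org/2001/XMLSchema"

# A made-up schema fragment that uses the xsd: prefix.
SCHEMA = b"""<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:annotation>
    <xsd:documentation>Purchase order schema.</xsd:documentation>
  </xsd:annotation>
  <xsd:element name="order" type="xsd:string">
    <xsd:annotation>
      <xsd:documentation>One customer order.</xsd:documentation>
    </xsd:annotation>
  </xsd:element>
</xsd:schema>"""


def documentation_texts(schema_root):
    """Return the text of every documentation element, in document order."""
    # The xs: prefix here is local to the query; only the URI must match.
    found = schema_root.findall(
        ".//xs:annotation/xs:documentation", namespaces={"xs": XS_NS}
    )
    # Join nested text and collapse runs of white space.
    return [" ".join("".join(doc.itertext()).split()) for doc in found]


texts = documentation_texts(ET.fromstring(SCHEMA))
print(texts)  # ['Purchase order schema.', 'One customer order.']
```

So renaming prefixes in your copy of a stylesheet is a matter of taste and readability, not something the schema's own prefix forces on you.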
Ironically, this schema-documenting tool is itself undocumented, but the simple structure makes tweaking easy. Each XML Schema element and attribute has at least one template rule. In some cases, specific occurrences of a given element within the node tree are elevated to special status. For instance, xs:documentation elements are generally transformed to simple HTML p elements, except in the case of the first xs:documentation child of the first xs:annotation child of the root xs:schema element. This first (and presumably, "most important") xs:documentation element's content is promoted to full-blown h1 status for large display.
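The promotion rule just described can be sketched in a few lines. What follows is my own imitation of the rule in Python, not Maden's XSLT; the schema fragment and the function name render_documentation are hypothetical:

```python
import xml.etree.ElementTree as ET
from html import escape

XS = "{http://www.w3.org/2001/XMLSchema}"

# A made-up schema with two top-level annotations.
SCHEMA = b"""<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:annotation>
    <xs:documentation>Schema overview.</xs:documentation>
    <xs:documentation>More detail.</xs:documentation>
  </xs:annotation>
  <xs:annotation>
    <xs:documentation>A later note.</xs:documentation>
  </xs:annotation>
</xs:schema>"""


def render_documentation(schema_root):
    """Render documentation elements as HTML, promoting only the first one."""
    # First documentation child of the first annotation child of the root.
    promoted = None
    first_annotation = schema_root.find(XS + "annotation")
    if first_annotation is not None:
        promoted = first_annotation.find(XS + "documentation")

    lines = []
    for doc in schema_root.iter(XS + "documentation"):
        text = escape(" ".join("".join(doc.itertext()).split()))
        tag = "h1" if doc is promoted else "p"
        lines.append(f"<{tag}>{text}</{tag}>")
    return lines


html_lines = render_documentation(ET.fromstring(SCHEMA))
print("\n".join(html_lines))
```

Only "Schema overview." comes out as an h1; the other two documentation elements become plain paragraphs.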
What does the output of Maden's stylesheet look like? Well, we can use it to view (for example) the XML Schema 1.0 "Schema for Schema Structures." This is a heavily annotated schema (as well it should be). One fragment from the beginning of this schema looks like the following:
<xs:schema targetNamespace="http://www.w3.org/2001/XMLSchema"
           blockDefault="#all"
           elementFormDefault="qualified"
           version="Id: XMLSchema.xsd,v 1.48 2001/04/24 18:56:39 ht Exp "
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xml:lang="EN">
  <xs:annotation>
    <xs:documentation source="http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/structures.html">
      The schema corresponding to this document is normative, with respect to the syntactic constraints it expresses in the XML Schema language. The documentation (within &lt;documentation&gt; elements) below, is not normative...</xs:documentation>
  </xs:annotation>

  <xs:annotation>
    <xs:documentation>
      The simpleType element and all of its members are defined in datatypes.xsd</xs:documentation>
  </xs:annotation>
  ...[etc.]...
</xs:schema>
As you can see, there's a lengthy xs:documentation element at the outset, followed by a shorter one which precedes the (imported) declaration of the simpleType element. The first xs:documentation element renders as follows in a browser:
And the second like this:
One of the first things you might want to tweak, obviously, is the set of assumptions about the first xs:documentation element in the schema. In the first place, its transformation to an HTML h1 element looks freakishly large; at least it does when the element's content is lengthy. Also, the element's content all gets dumped into the browser title bar.
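One possible tweak, offered only as an illustration: cut the text down before using it as a title. The sketch below is Python rather than XSLT, and the function name page_title and the 60-character limit are arbitrary choices of mine, not anything in Maden's stylesheet.

```python
import textwrap


def page_title(documentation_text, limit=60):
    """Collapse white space and cut the text to at most `limit` characters."""
    return textwrap.shorten(documentation_text, width=limit, placeholder="...")


# Opening of the first xs:documentation element quoted above.
LONG_TEXT = """The schema corresponding to this document is normative,
    with respect to the syntactic constraints it expresses in the
    XML Schema language."""

title = page_title(LONG_TEXT)
print(title)
print(page_title("Short note"))  # short text passes through unchanged
```

The full text can still appear in the body of the page; only the title is trimmed.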
I asked Maden what his real-world experience had been with using the xsd2html stylesheet. He said he's used it principally in work for clients, but added: "Be sure to mention that this is very beta -- or even alpha -- but it's definitely open source, and I'd encourage folks to share improvements they make." He assured me any such improvements "will be credited to [their developers] in future releases."
I also checked with Eric van der Vlist, author of O'Reilly & Associates' XML Schema, to see if he knew of any other tools to perform this task.
Van der Vlist says he applied a series of home-grown XSLT transformations ("nothing I would dare to show, at least nothing generic enough") to the W3C XML Schema schema (referenced above) in order to produce chapters 15 and 16 of his book. The main task: "to simplify [the schema] into something usable." He added, "That's probably one of [the] trickiest jobs I have done with XSLT."
I can believe it. Luckily, if all you are really interested in are the xs:annotation and xs:documentation elements, your task should be much, much simpler.