XSLT Surgery
April 25, 2001
Q: How do I transform an XML document that I can't edit?
I have a source XML file which I don't control. I can't edit it. But I want to display that XML file on my site using XSLT. I know I need to add an xml-stylesheet PI to a document to associate it with an XSLT stylesheet, like
<?xml-stylesheet type="text/xsl" href="test.xsl"?>
But since this XML file is maintained by somebody else, how do I link my stylesheet to it?
A: First -- this has nothing to do with XML per se and everything to do
with common courtesy -- get permission to use someone else's content verbatim. If
courtesy doesn't appeal to you as a motive, then think of liability. Like it or not,
the Web
nowadays is a place where you can get in trouble simply for linking to someone else's
content, let alone cribbing it. I'd be only half surprised to learn that someone was
preparing to sue someone else for just looking at the plaintiff's Web content.
Once you've gotten permission, here's one approach to solving your problem. You want to create an XML document which simply includes the targeted content. In addition, this XML document will contain your xml-stylesheet PI. So it would look something like
<?xml version="1.0"?>
<!DOCTYPE wrapper [
  <!ENTITY incl_content SYSTEM "uri of included content">
]>
<?xml-stylesheet type="text/xsl" href="text.xsl"?>
<wrapper>
  &incl_content;
</wrapper>
Replace uri of included content with the included document's URI.
The wrapper element is necessary because parsers will reject any document that doesn't have its own root element. (You can call this element type by some other name if you want, although wrapper seems descriptive enough.) Just remember that the stylesheet may need to take into account the fact that the document it's processing has such an element as its root element, within which appears all the content of the included document.
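As a rough illustration of what "taking the wrapper into account" might mean, here is a minimal stylesheet sketch. It rests on assumptions: the included document's root element is given the hypothetical name doc, and the HTML skeleton is only a placeholder for whatever your real templates produce.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- The document element is now wrapper, not the included
       document's own root element. -->
  <xsl:template match="/">
    <html>
      <body>
        <!-- Step over the artificial wrapper and process its children. -->
        <xsl:apply-templates select="wrapper/*"/>
      </body>
    </html>
  </xsl:template>

  <!-- Hypothetical root element of the included document. An absolute
       pattern such as "/doc" would no longer match, because doc is now
       a child of wrapper; the relative pattern "doc" still does. -->
  <xsl:template match="doc">
    <div>
      <xsl:apply-templates/>
    </div>
  </xsl:template>

</xsl:stylesheet>
```

The main point is that any pattern or path anchored at the document root (one beginning with a slash) probably needs wrapper added as its first step, while relative patterns can usually stay as they are.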
This will work, as long as the following conditions are true:
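- The document to be included must, at a minimum, be well-formed XML. It also must not carry its own DOCTYPE declaration, since a DOCTYPE declaration isn't allowed inside another document's element content.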
- If the document to be included comes with its own xml-stylesheet PI, it may override your "text.xsl" stylesheet (depending on your XSLT processor).
As a final note, remember to include in "text.xsl" a template which instantiates a credit to the included page's author. Something like
<xsl:template match="/">
  <html>
    ...other bits of templates, calls to other template rules, etc....
    <body>
      ...other bits of templates, calls to other template rules, etc....
      <h5>Above material included by permission from its author, Joe Blow. Copyright 2001 by Joe Blow.</h5>
    </body>
  </html>
</xsl:template>
Q: How can I use two different XSLT stylesheets for the same XML document?
I have an XML file with data both in Portuguese and in English, and I want a link to the English version in the index.xml file.
A: This is a variation of the first question. The trick is not to associate
that Portuguese/English XML document itself with any stylesheet. Instead, relegate
that association to XML documents which include the Portuguese/English document and
link to the appropriate stylesheet. So your index.xml file might look something
like the following (of course, substitute the correct element and attribute names
for the
bogus ones I'm using here):
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE index_root [
  <!ENTITY atilde "&#227;">
]>
<index_root>
  <link lang_page="portuguese.xml">Vers&atilde;o portuguese</link>
  <link lang_page="english.xml">English version</link>
</index_root>
When a user opens index.xml, they'll see two links. Selecting the one which displays as "Versão portuguese" will open a document named portuguese.xml; the one displaying "English version," a document named english.xml. (I'm greatly oversimplifying the potential problem of how to do linking in XML. If you're using XHTML in the index.xml document, replace the link elements above with a elements, and the lang_page attributes with href attributes.)
The portuguese.xml and english.xml documents would be nearly identical, differing only in their xml-stylesheet PIs:
<?xml version="1.0"?>
<!DOCTYPE wrapper [
  <!ENTITY incl_content SYSTEM "uri of Portuguese/English document">
]>
<?xml-stylesheet type="text/xsl" href="uri of language-specific stylesheet"?>
<wrapper>
  &incl_content;
</wrapper>
As in the last question, replace uri of Portuguese/English document with your Portuguese/English data document's URI. In portuguese.xml, replace uri of language-specific stylesheet with the URI of the stylesheet for processing Portuguese-language data; in english.xml, with that for processing English-language data. Also as in the last question, don't forget that the stylesheets may need to take into account the presence of the artificial wrapper element.
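What a language-specific stylesheet looks like depends entirely on how the Portuguese/English document marks its two languages, which the question doesn't say. Purely as a sketch, suppose each piece of text lives in a hypothetical item element carrying an xml:lang attribute ("en" or "pt"). The English stylesheet might then look something like the following; the Portuguese one would differ only in the language code.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="html"/>

  <!-- Account for the artificial wrapper element around the data. -->
  <xsl:template match="/wrapper">
    <html>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <!-- Hypothetical item elements: show only those in English.
       lang() tests the xml:lang value in effect for the node. -->
  <xsl:template match="item[lang('en')]">
    <p><xsl:value-of select="."/></p>
  </xsl:template>

  <!-- Suppress items in any other language. This rule has a lower
       default priority than the one above, so English items never
       reach it. -->
  <xsl:template match="item"/>

</xsl:stylesheet>
```

If your data separates the languages some other way (different element names, say, or separate subtrees), the same two-rule idea should carry over: one rule that renders the wanted language and one that swallows the rest.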
Q: How do I easily define many values for variables in a multi-language lexicon?
I'm implementing a four-language lexicon with XSLT variables. For example,
<td><xsl:value-of select="$account-number"/></td>
where the value of the account-number variable depends on some other variables, as follows:
<xsl:variable name="account-number">
  <xsl:choose>
    <xsl:when test="$is-en">Account Number</xsl:when>
    <xsl:when test="$is-fr">Numero de Compte</xsl:when>
    <xsl:when test="$is-he">Mispar Heshbon</xsl:when>
    <xsl:when test="$is-ru">Choter</xsl:when>
  </xsl:choose>
</xsl:variable>
As you can see, referencing the variable is convenient, but defining it is verbose. Is there an easier way?
A: If verbosity is undesirable, XSLT is definitely not the language for you. Seriously, though, you're probably stuck with what you've already settled on.
The possible good news -- depending on exactly what your source tree looks like and how much freedom you have with it -- is that the obstacle may be surmountable, with a simple change to your source tree's structure. Specifically, the obstacle seems to me to be all those secondary variables with a value of true or false: is-en, is-fr, is-he, and is-ru.
They bother me for two reasons. First, presumably they're mutually exclusive values; only one will be true at any given time. And second, if that first assumption is correct, this seems like an ideal situation in which to use the built-in xml:lang attribute. (It's "built-in" in the sense that XML 1.0 defines it; and, since all other XML-related specs build on that one, xml:lang should be respected by any compliant software.)
You can't use just any values for the xml:lang attribute. The spec refers to a number of other standards for information about allowable values, especially IETF RFC 1766, ISO 639, and ISO 3166. The allowable values are internationally accepted two- or three-character codes unique to each country or language. You're already well on your way to using this standard, since "en", "fr", "he", and "ru" -- which map to the names of your variables -- are all legitimate language codes; hence, allowable values for the xml:lang attribute. (As a general rule, country codes are uppercase; language codes, lowercase.)
So you might have a source tree structure like this, using xml:lang instead of multiple two-valued attributes (one per language):
<terms>
  <term id="num">
    <term-lang xml:lang="en">Account Number</term-lang>
    <term-lang xml:lang="fr">Numero de Compte</term-lang>
    <term-lang xml:lang="he">Mispar Heshbon</term-lang>
    <term-lang xml:lang="ru">Choter</term-lang>
  </term>
  ...etc. - other terms as needed...
</terms>
Now you can assign the value of the account-number variable without using an xsl:choose block at all, like
<xsl:variable name="account-number"
  select="//term-lang[@xml:lang=$page-lang and ../@id='num']"/>
Then your example result tree would look like this, given a value of "he" for the page-lang variable (the comparison is case-sensitive, so the value must match the xml:lang attribute exactly):
<td>Mispar Heshbon</td>
(Of course, you could then proceed to turn the searched-for id value into a variable as well, so you didn't need to hard-code it.)
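One way to avoid hard-coding the id might be a small named template that takes the id as a parameter. The sketch below is only that, a sketch: the names lookup-term, term-id, and term-by-id are invented for illustration, page-lang is shown as a top-level parameter with an arbitrary default, and the terms are assumed to live in the source document being transformed.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Page language; the default here is arbitrary and can be
       overridden when the transformation is invoked. -->
  <xsl:param name="page-lang" select="'he'"/>

  <!-- Index term elements by their id attribute. -->
  <xsl:key name="term-by-id" match="term" use="@id"/>

  <!-- Hypothetical named template: outputs the text of the term with
       the given id, in the current page language. -->
  <xsl:template name="lookup-term">
    <xsl:param name="term-id"/>
    <xsl:value-of
        select="key('term-by-id', $term-id)/term-lang[@xml:lang = $page-lang]"/>
  </xsl:template>

  <!-- Example call, producing a table cell for the "num" term. -->
  <xsl:template match="/">
    <td>
      <xsl:call-template name="lookup-term">
        <xsl:with-param name="term-id" select="'num'"/>
      </xsl:call-template>
    </td>
  </xsl:template>

</xsl:stylesheet>
```

If you'd rather not depend on exact case, the predicate could use XPath's lang() function instead, as in term-lang[lang($page-lang)]; it ignores case and also matches sublanguages such as "en-US" when asked for "en".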
Q: I'm searching for software which automatically generates XSL. Do you know of any?
A: Assuming by "XSL" you mean XSLT, I recently looked at a product which does just that. Sort of.
It's called XSLWiz and is published by EBProvider. The general idea is that you feed it descriptions of the source document (input) and the destination (output), and then connect a point from the former to a corresponding point on the latter, using a simple drag-and-drop method. The descriptions you provide to XSLWiz are in the form of DTDs or XML Schemas; if you have neither, you can supply a document instance in its place, and XSLWiz will infer the schema from it.
(XSLWiz actually works against XML Schema files only. If you don't provide your descriptions in that form, then XSLWiz builds them from what you do provide. One implication: You can use XSLWiz just to create schema files from DTDs or document instances, without ever generating a line of XSLT.)
After going through the product tutorial to get an idea of how it worked, I tested XSLWiz using my own FlixML vocabulary as the source and the XHTML 1.0 Transitional DTD as the destination. This didn't work; even when I mapped only a few connections between the former and the latter, XSLWiz complained that there were too many.
So I scaled down my expectations a bit. I stripped both vocabularies to about a dozen elements and attributes apiece. This worked fine. So somewhere between the two extremes is where XSLWiz's limits lie. (I couldn't find a mention of any such limit on the EBProvider site.)
The XSLT code that XSLWiz generated was rather idiosyncratic. For instance, the entire result tree was instantiated by a single template rule (xsl:template element). I'm reasonably certain that you'd need to hand-tweak the generated XSLT, if for no other reason than performance.
Although apparently written in Java, XSLWiz runs only on Microsoft Windows platforms. The product's stated requirements do not include the Internet Explorer browser, nor even the MSXML XML/XSLT processor, so I'm not sure what the Windows dependencies are. A free, seven-day timed evaluation version is available for download; the purchase price is $995 per license, which is rather hefty given the product's apparent limitations.