Using XSLT to Fix Swing
August 2, 2006
Swing is a Java framework for building cross-platform graphical user interfaces. Many areas of the framework have recently enjoyed some compelling improvements. The portion that renders HTML hasn't been so lucky. When Swing debuted, HTML was at a youthful version 3.2, and practices such as nestling paragraphs within tags could still be considered newfangled. Today, they are familiar if not mandatory, and it's easy to consider it a bug when Swing inserts an unrequested line break between a formally declared paragraph and its preceding headline.
One solution is to abandon Swing's HTML renderer and JEditorPane in favor of another renderer, such as the javadesktop Flying Saucer project or a native HTML browser. This approach won't help if you are working with frameworks such as JavaHelp, which is hardwired to use Swing's renderer.
Luckily, there's a straightforward solution involving XSLT.
Screenshot of a JEditorPane incorrectly displaying HTML with an extra line break
Extensible Stylesheet Language Transformations
XSLT (eXtensible Stylesheet Language Transformations) is origami for XML. XSLT can easily fold one kind of XML document into any other kind you can dream up. It accomplishes this by matching a set of templates to patterns of varying specificity and position in a source XML document. These templates introduce new patterns into an output document, constructed from information found in the source. This lends itself to very subtle and recursive use.
Conveniently for us, Java has had XSLT built-in since J2SE 1.4.
We'll need to format our HTML data as XHTML. XHTML is a flavor of XML and accordingly, is susceptible to the wiles of XSLT.
source.html | source.xhtml
<html> | <?xml version="1.0" encoding="UTF-8"?>
As you can see, the only significant differences here are the XML header and the self-closing tags. If your source HTML has validity problems or you're just lazy, TagSoup is an excellent tool for automating the process.
Using XSLT, we can convert our original XHTML document into an HTML 3.2 representation that will look good in Swing's HTML renderer.
The Identity Transform
We'll start with an identity transform that will more or less faithfully reproduce the structure of our original document.
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    version="1.0">

  <xsl:output method="html" version="3.2" />
  <xsl:strip-space elements="*" />

  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="@*">
    <xsl:attribute name="{local-name()}">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>
The html method attribute in the xsl:output tag will prompt empty output tags to refrain from closing themselves. That's important, because the practice confounds Swing's HTML renderer.
The * wildcard character in the match attribute of the first template will pull in a source element. Initially, this will be the source document root. The stylesheet will copy the matched element to the output document. Finally, the template will start over with another element or will release control to the next template for processing child attributes.
The @* character sequence in the second template's match attribute will target a source attribute. The stylesheet will copy the attribute into the output document.
In this manner, all elements and attributes will find their way over to the output document.
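The same identity stylesheet can also be exercised from Java rather than the command line. The sketch below is only an illustration: the class name IdentityTransformDemo, the helper transform, and the sample XHTML string are assumptions made for this example, and the stylesheet is inlined as a string so the block stands alone. It uses the standard javax.xml.transform classes and whatever XSLT processor the JDK supplies.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class IdentityTransformDemo {

    // The identity stylesheet from this section, inlined as a string.
    static final String IDENTITY_XSL =
          "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>"
        + "  <xsl:output method='html' version='3.2'/>"
        + "  <xsl:strip-space elements='*'/>"
        + "  <xsl:template match='*'>"
        + "    <xsl:element name='{local-name()}'>"
        + "      <xsl:apply-templates select='@*|node()'/>"
        + "    </xsl:element>"
        + "  </xsl:template>"
        + "  <xsl:template match='@*'>"
        + "    <xsl:attribute name='{local-name()}'><xsl:value-of select='.'/></xsl:attribute>"
        + "  </xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical sample input: a tiny XHTML document with a self-closing tag.
    static final String SAMPLE_XHTML =
          "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
        + "<h3 class='t'>Title</h3><p>One<br/>two</p>"
        + "</body></html>";

    // Applies the given stylesheet to the given XML and returns the serialized result.
    // Checked exceptions are wrapped to keep the sketch short.
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        // The result should keep the element structure, drop the XHTML
        // namespace declaration, and write <br> rather than <br/>.
        System.out.println(transform(IDENTITY_XSL, SAMPLE_XHTML));
    }
}
```

Exact whitespace in the result depends on the processor's serializer, so it is safer to compare structure than to compare strings byte for byte.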
Save the stylesheet above to a file named identity.xsl. Pull up a command shell and type
java org.apache.xalan.xslt.Process -IN source.xhtml -XSL identity.xsl -OUT output.html
This should invoke a handy main method in Xalan, Java's default XSLT implementation. If you're using a JDK more recent than 1.4, you may need to download a fresh copy of Xalan and put it into your class path, because the more recent bundled versions seem to lack this interface on at least one platform.
Review the output file and verify that your basic document structure is unchanged. In fact, it should closely resemble the pre-XHTML version.
Changing the Document Structure
Next, we'll tweak the stylesheet to target paragraph tags that are immediately preceded by headline tags, and liberate their content.
<?xml version="1.0" encoding="UTF-8" ?>
<!--
  Convert XHTML to HTML that will look good when rendered by
  Java Swing's JEditorPane component.

  The result should not contain the pattern
    <h3>my headline</h3> <p>my paragraph</p>
  Instead, it will have
    <h3>my headline</h3> my paragraph
-->
<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    version="1.0">

  <xsl:output method="html" version="3.2" />
  <xsl:strip-space elements="*" />

  <xsl:template match="xhtml:p">
    <xsl:variable name="previous-node-name" select="name(preceding-sibling::*[1])" />
    <xsl:choose>
      <!-- To get Swing's JEditorPane to render correctly, we need to remove the paragraph -->
      <xsl:when test="$previous-node-name='table'">
        <xsl:apply-templates />
      </xsl:when>
      <xsl:when test="string-length($previous-node-name)=2 and substring($previous-node-name,1,1)='h'">
        <xsl:apply-templates />
      </xsl:when>
      <xsl:otherwise>
        <xsl:element name="p">
          <xsl:apply-templates />
        </xsl:element>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <!-- go process attributes and children -->
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="@*">
    <xsl:attribute name="{local-name()}">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>
Save this to a new stylesheet file and try it out. The result should render in Swing without the extraneous line break.
If the prospect of calling this output format "HTML" seems a little perverse to you, feel free to give it a custom file extension such as .swing-html. Swing won't care.
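One way to check the paragraph rule without opening a Swing window is to run it against a small document and inspect the result. The sketch below is an assumption-laden illustration, not the article's stylesheet file: the class ParagraphUnwrapDemo, its transform helper, and the sample markup are invented for the example, and the stylesheet is a trimmed copy that keeps only the headline test (the table test is omitted for brevity).

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ParagraphUnwrapDemo {

    // Trimmed copy of the stylesheet above: paragraph rule plus the two identity templates.
    static final String XSL =
          "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + "    xmlns:xhtml='http://www.w3.org/1999/xhtml' version='1.0'>"
        + "  <xsl:output method='html' version='3.2'/>"
        + "  <xsl:template match='xhtml:p'>"
        + "    <xsl:variable name='prev' select='name(preceding-sibling::*[1])'/>"
        + "    <xsl:choose>"
        + "      <xsl:when test=\"string-length($prev)=2 and substring($prev,1,1)='h'\">"
        + "        <xsl:apply-templates/>"
        + "      </xsl:when>"
        + "      <xsl:otherwise>"
        + "        <xsl:element name='p'><xsl:apply-templates/></xsl:element>"
        + "      </xsl:otherwise>"
        + "    </xsl:choose>"
        + "  </xsl:template>"
        + "  <xsl:template match='*'>"
        + "    <xsl:element name='{local-name()}'>"
        + "      <xsl:apply-templates select='@*|node()'/>"
        + "    </xsl:element>"
        + "  </xsl:template>"
        + "  <xsl:template match='@*'>"
        + "    <xsl:attribute name='{local-name()}'><xsl:value-of select='.'/></xsl:attribute>"
        + "  </xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical input: one paragraph right after a headline, one after a paragraph.
    static final String SAMPLE =
          "<body xmlns='http://www.w3.org/1999/xhtml'>"
        + "<h3>Headline</h3><p>First</p><p>Second</p>"
        + "</body>";

    // Applies the stylesheet and returns the serialized result.
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        // "First" should appear without its <p> wrapper; "Second" should keep it.
        System.out.println(transform(XSL, SAMPLE));
    }
}
```

Because the test only looks at name length and first letter, any two-character element name starting with "h" counts as a headline here, which is worth keeping in mind if your documents use hr.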
Namespaces
You may have noticed the xhtml: prefix in front of matched element names in our stylesheet. This refers to a namespace defined in the attributes of the stylesheet's root document element. Without this prefix, our stylesheet wouldn't recognize any of the source hypertext markup. The reason for this is that the elements of XHTML, body and table and so on, are considered to reside in a special and lexically distinct land that is unsentimentally called http://www.w3.org/1999/xhtml.
The local-name() function transcribes base element names, but withholds their XHTML namespace from the HTML 3.2 output document. It was the need to filter this namespace that precluded our use of a more common and implicit implementation of the identity transform involving the xsl:copy element. But it's just as well; I find our implementation easier to follow, if a little verbose.
The proper handling of namespaces is probably the trickiest part of XSLT.
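XSLT match patterns follow XPath's name-test rules, so the effect of the prefix can be seen with the JDK's XPath API alone. The following sketch is illustrative only: the class NamespaceMatchDemo, its count helper, and the one-paragraph sample document are assumptions made for this example. It counts matches for an unprefixed name, a prefixed name bound to the XHTML namespace, and a local-name() test.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceMatchDemo {

    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    // Hypothetical sample: a single paragraph in the XHTML namespace.
    static final String SAMPLE =
        "<html xmlns='" + XHTML_NS + "'><body><p>Hello</p></body></html>";

    // Evaluates a numeric XPath expression against the sample document.
    static double count(String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for namespace-sensitive matching
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            // Bind the "xhtml" prefix, as the stylesheet's root element does.
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override public String getNamespaceURI(String prefix) {
                    return "xhtml".equals(prefix) ? XHTML_NS : XMLConstants.NULL_NS_URI;
                }
                @Override public String getPrefix(String uri) {
                    return XHTML_NS.equals(uri) ? "xhtml" : null;
                }
                @Override public Iterator<String> getPrefixes(String uri) {
                    return XHTML_NS.equals(uri)
                            ? Collections.singletonList("xhtml").iterator()
                            : Collections.<String>emptyIterator();
                }
            });
            return (Double) xpath.evaluate(expression, doc, XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count("count(//p)"));                   // 0.0: no namespace, no match
        System.out.println(count("count(//xhtml:p)"));             // 1.0: prefix bound to XHTML
        System.out.println(count("count(//*[local-name()='p'])")); // 1.0: namespace ignored
    }
}
```

The first result is the reason an unprefixed match="p" template would silently never fire on an XHTML source document.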
Further Changes to the Document Structure
Let's add one more workaround. This one will do some juggling to prevent Swing from rendering extraneous line breaks before named anchors that immediately follow other elements.
source2.xhtml
<?xml version="1.0" encoding="UTF-8"?>
The new templates below should be added to the existing stylesheet file after the template that matches paragraphs, and before the ones that match everything.
<!--
  workaround for JEditorPane's tendency to issue an extraneous line break
  upon encountering a named anchor

  According to the suggestion at
  http://docs.sun.com/source/819-0913/release/limitations.html

  The best way to work around this problem is to nest the text of the
  target within the anchor tag. For example:
    <H2><a name="widgets">Working With Widgets</a></H2>
-->
<xsl:template match="xhtml:a">
  <xsl:variable name="next-node-name" select="name(following-sibling::*[1])" />
  <xsl:choose>
    <!-- If this is named and href-less, and the next element is a headline,
         we want to shift that headline inside this -->
    <xsl:when test="not(@href) and @name and string-length($next-node-name)=2 and substring($next-node-name,1,1)='h'">
      <xsl:element name="a">
        <xsl:apply-templates select="@*|node()|following-sibling::*[1]">
          <xsl:with-param name="shifted">yes</xsl:with-param>
        </xsl:apply-templates>
      </xsl:element>
    </xsl:when>
    <xsl:otherwise>
      <xsl:element name="a">
        <xsl:apply-templates select="@*|node()" />
      </xsl:element>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<xsl:template match="xhtml:h1|xhtml:h2|xhtml:h3|xhtml:h4|xhtml:h5|xhtml:h6">
  <xsl:param name="shifted">no</xsl:param>
  <xsl:variable name="previous-node-name" select="name(preceding-sibling::*[1])" />
  <xsl:if test="$shifted='yes' or $previous-node-name!='a' or preceding-sibling::*[1]/@href or not(preceding-sibling::*[1]/@name)">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates />
    </xsl:element>
  </xsl:if>
</xsl:template>
The first new template will shift headlines inside named anchors. The second template will transcribe any unshifted headlines that are encountered. But it will refrain from re-outputting shifted headlines when iterating past their original locations in the source document.
The completed stylesheet will produce documents that are much better able to weather the transition from your favorite web browser to Swing.
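To see the anchor workaround in isolation, the two templates can be run against a three-element fragment. This is a sketch under stated assumptions: the class AnchorShiftDemo, its transform helper, and the sample fragment are hypothetical, variable names are shortened, and the paragraph template from the previous section is deliberately left out so that only the anchor and headline behavior is visible.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AnchorShiftDemo {

    // Anchor and headline templates from this section, plus the identity templates.
    static final String XSL =
          "<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + "    xmlns:xhtml='http://www.w3.org/1999/xhtml' version='1.0'>"
        + "  <xsl:output method='html' version='3.2'/>"
        + "  <xsl:template match='xhtml:a'>"
        + "    <xsl:variable name='next' select='name(following-sibling::*[1])'/>"
        + "    <xsl:choose>"
        + "      <xsl:when test=\"not(@href) and @name and string-length($next)=2"
        + "                       and substring($next,1,1)='h'\">"
        + "        <xsl:element name='a'>"
        + "          <xsl:apply-templates select='@*|node()|following-sibling::*[1]'>"
        + "            <xsl:with-param name='shifted'>yes</xsl:with-param>"
        + "          </xsl:apply-templates>"
        + "        </xsl:element>"
        + "      </xsl:when>"
        + "      <xsl:otherwise>"
        + "        <xsl:element name='a'><xsl:apply-templates select='@*|node()'/></xsl:element>"
        + "      </xsl:otherwise>"
        + "    </xsl:choose>"
        + "  </xsl:template>"
        + "  <xsl:template match='xhtml:h1|xhtml:h2|xhtml:h3|xhtml:h4|xhtml:h5|xhtml:h6'>"
        + "    <xsl:param name='shifted'>no</xsl:param>"
        + "    <xsl:variable name='prev' select='name(preceding-sibling::*[1])'/>"
        + "    <xsl:if test=\"$shifted='yes' or $prev!='a' or preceding-sibling::*[1]/@href"
        + "                   or not(preceding-sibling::*[1]/@name)\">"
        + "      <xsl:element name='{local-name()}'><xsl:apply-templates/></xsl:element>"
        + "    </xsl:if>"
        + "  </xsl:template>"
        + "  <xsl:template match='*'>"
        + "    <xsl:element name='{local-name()}'>"
        + "      <xsl:apply-templates select='@*|node()'/>"
        + "    </xsl:element>"
        + "  </xsl:template>"
        + "  <xsl:template match='@*'>"
        + "    <xsl:attribute name='{local-name()}'><xsl:value-of select='.'/></xsl:attribute>"
        + "  </xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical input: a named anchor immediately followed by a headline.
    static final String SAMPLE =
          "<body xmlns='http://www.w3.org/1999/xhtml'>"
        + "<a name='widgets'/><h2>Working With Widgets</h2><p>Text</p>"
        + "</body>";

    // Applies the stylesheet and returns the serialized result.
    static String transform(String xsl, String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transform failed", e);
        }
    }

    public static void main(String[] args) {
        // The headline should appear exactly once, nested inside the named anchor.
        System.out.println(transform(XSL, SAMPLE));
    }
}
```

Note that these templates place the whole headline element inside the anchor, which is a slightly different nesting from the H2-around-anchor form quoted in the stylesheet comment.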
Coordination
These XSLT revisions can be applied at runtime via the javax.xml.transform.Transformer class and what is sometimes referred to as the TrAX API.
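import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Compile the stylesheet from the class path.
Transformer autobot = TransformerFactory.newInstance().newTransformer(
    new StreamSource(
        ClassLoader.getSystemResourceAsStream("xsl/xhtml2swing.xsl")));

// Transform the XHTML resource and write the result to outputStream.
autobot.transform(
    new StreamSource(
        ClassLoader.getSystemResourceAsStream("resources/source.xhtml")),
    new StreamResult(outputStream));
outputStream.flush();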
Alternatively, you could generate HTML at compile time, and package it up with your distribution. The Ant target for this might look like
<target name="dist" depends="compile">
  <style in="${dist.docs.dir}/source.xhtml"
         style="${scripts.dir}/xhtml2swinghtml.xsl"
         processor="trax"
         out="${resources.dir}/content.html"
         force="true" />
  <jar jarfile="${dist.purejava.dir}/${ant.project.name}">
    <fileset dir="${classes.dir}" />
    <fileset dir="${resources.dir}" />
  </jar>
</target>
Then, you would read the modified document from a resource stream with the following Java code
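import java.net.URL;
import javax.swing.JEditorPane;

JEditorPane jEditorPane = new JEditorPane();
URL docURL = ClassLoader.getSystemResource("resources/content.html");
jEditorPane.setPage(docURL); // throws IOException if the page cannot be loaded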
If you enlist TagSoup's parser (org.ccil.cowan.tagsoup.Parser), you could even cut out the step of generating XHTML files and process raw HTML directly.
Future Directions
Using these techniques, you should be able to accommodate the rendering limitations of any HTML renderer, not just Swing's.
Once you get comfortable with XSLT, you may want to use it upstream from HTML to spruce up your content model. If you start with a more conceptual document format such as DocBook XML (not to be confused with DocBook SGML) or the poorly named Extensible Stylesheet Language Formatting Objects (XSL-FO), then XHTML can become just another output format alongside PDF, RTF, and JavaHelp.
You can write the stylesheets to accomplish this yourself, or appeal to DocBook XSL (not to be confused with DocBook DSSSL) and the Apache Formatting Objects Processor (FOP).
You might also want to experiment with other XSLT implementations. In particular, Saxon supports the XSLT 2.0 specification, which includes many improvements such as more flexible grouping and standardized data-type conversions.