Using XSLT to Fix Swing

August 2, 2006

Swing is a Java framework for building cross-platform graphical user interfaces. Many areas of the framework have recently enjoyed some compelling improvements. The portion that renders HTML hasn't been so lucky. When Swing debuted, HTML was at a youthful version 3.2, and practices such as nestling paragraphs within paired <p> tags could still be considered newfangled. Today, they are familiar if not mandatory, and it's easy to consider it a bug when Swing inserts an unrequested line break between a formally declared paragraph and its preceding headline.

One solution is to abandon Swing's HTML Renderer and JEditorPane in favor of another renderer, such as the javadesktop Flying Saucer project or a native HTML browser. This approach won't help if you are working with frameworks such as JavaHelp, which is hardwired to use Swing's renderer.

Luckily, there's a straightforward solution involving XSLT.

Screenshot of a JEditorPane incorrectly displaying HTML with an extra line break

Extensible Stylesheet Language Transformations

XSLT (Extensible Stylesheet Language Transformations) is origami for XML. XSLT can easily fold one kind of XML document into any other kind you can dream up. It accomplishes this by matching a set of templates to patterns of varying specificity and position in a source XML document. These templates introduce new patterns into an output document, constructed from information found in the source. This lends itself to very subtle and recursive use.
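To make the template-matching idea concrete, here is a minimal sketch that runs a two-template stylesheet through the JDK's built-in XSLT processor. The class name TemplateMatchDemo, the stylesheet, and the <notes> vocabulary are all invented for illustration and appear nowhere else in this article.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class TemplateMatchDemo {
    // Hypothetical stylesheet: the first template matches the <notes> root and
    // emits a <ul>; the second matches each <note> child and emits an <li>.
    static final String XSL =
          "<xsl:stylesheet version='1.0'"
        + "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "  <xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "  <xsl:template match='/notes'>"
        + "    <ul><xsl:apply-templates select='note'/></ul>"
        + "  </xsl:template>"
        + "  <xsl:template match='note'>"
        + "    <li><xsl:value-of select='.'/></li>"
        + "  </xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet text to the XML text and returns the result.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<notes><note>fold</note><note>crease</note></notes>";
        System.out.println(transform(XSL, xml));
    }
}
```

Each <note> in the source should come out as an <li> inside a single <ul>: the source is one kind of XML, the output another, and the templates are the folds.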

Conveniently for us, Java has had XSLT built-in since J2SE 1.4.

We'll need to format our HTML data as XHTML. XHTML is a flavor of XML and accordingly, is susceptible to the wiles of XSLT.

<html>
<body>
line 1<br>
line 2<br>
line 3
<h3>my headline</h3>
<p>my paragraph</p>
</body>
</html>

source.html

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
line 1<br/>
line 2<br/>
line 3
<h3>my headline</h3>
<p>my paragraph</p>
</body>
</html>

source.xhtml

As you can see, the only significant differences here are the XML declaration, the XHTML namespace declaration on the root element, and the self-closing tags. If your source HTML has validity problems or you're just lazy, TagSoup is an excellent tool for automating the process.
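One quick way to confirm that a document really has crossed over from HTML to XHTML is to hand it to an XML parser. The sketch below uses the JDK's DOM parser; the class name XhtmlCheck and its helper method are invented for this example, and the two strings are shortened versions of the documents above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XhtmlCheck {
    /** Parses markup as namespace-aware XML; returns null if it is not well-formed. */
    static Document parseOrNull(String markup) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            return builder.parse(new InputSource(new StringReader(markup)));
        } catch (Exception e) { // SAXException, IOException, ParserConfigurationException
            return null;
        }
    }

    public static void main(String[] args) {
        // The unclosed <br> makes this invalid as XML.
        String html = "<html><body>line 1<br>line 2</body></html>";
        String xhtml = "<html xmlns='http://www.w3.org/1999/xhtml'>"
                     + "<body>line 1<br/>line 2</body></html>";

        System.out.println("HTML well-formed:  " + (parseOrNull(html) != null));
        Document doc = parseOrNull(xhtml);
        System.out.println("XHTML well-formed: " + (doc != null));
        if (doc != null) {
            System.out.println("Root namespace:    "
                + doc.getDocumentElement().getNamespaceURI());
        }
    }
}
```

The parser may grumble on standard error about the HTML version; that complaint is exactly what XSLT would run into if we fed it the same text.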

Using XSLT, we can convert our original XHTML document into an HTML 3.2 representation that will look good in Swing's HTML renderer.

The Identity Transform

We'll start with an identity transform that will more or less faithfully reproduce the structure of our original document.

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    version="1.0">

  <xsl:output method="html" version="3.2" />

  <xsl:strip-space elements="*" />

  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="@*">
    <xsl:attribute name="{local-name()}">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>

The html method attribute in the xsl:output tag will prompt empty output tags to refrain from closing themselves. That's important, because the practice confounds Swing's HTML Renderer.

The * wildcard character in the match attribute of the first template will pull in a source element. Initially, this will be the document's root element. The stylesheet will copy the matched element to the output document under its local name. The xsl:apply-templates instruction then hands the element's attributes and child nodes to whichever template matches them: child elements come back to this same template, attributes go to the next one, and text falls through to XSLT's built-in rule, which copies it.

The @* character sequence in the second template's match attribute will target a source attribute. The stylesheet will copy the attribute into the output document.

In this manner, all elements and attributes will find their way over to the output document.
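The behavior described above can be checked in memory before any files are involved. The following is a sketch only: the class name IdentityDemo is invented, the stylesheet string is a compacted copy of the identity transform, and the sample input (including its class attribute) is made up for the demonstration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class IdentityDemo {
    // Compacted copy of the identity stylesheet.
    static final String IDENTITY_XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:xhtml='http://www.w3.org/1999/xhtml'>"
        + "<xsl:output method='html' version='3.2'/>"
        + "<xsl:strip-space elements='*'/>"
        + "<xsl:template match='*'>"
        + "<xsl:element name='{local-name()}'>"
        + "<xsl:apply-templates select='@*|node()'/>"
        + "</xsl:element>"
        + "</xsl:template>"
        + "<xsl:template match='@*'>"
        + "<xsl:attribute name='{local-name()}'><xsl:value-of select='.'/></xsl:attribute>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical sample input with a self-closing tag and one attribute.
    static final String XHTML =
          "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
        + "line 1<br/>line 2<p class='note'>my paragraph</p>"
        + "</body></html>";

    // Applies the stylesheet text to the XML text and returns the result.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(IDENTITY_XSL, XHTML));
    }
}
```

In the printed result, the line break should appear as <br> rather than <br/>, the class attribute should survive, and the XHTML namespace declaration should be gone.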

Save the stylesheet above to a file named identity.xsl. Pull up a command shell and type

java org.apache.xalan.xslt.Process -IN source.xhtml -XSL identity.xsl -OUT output.html

This should invoke a handy main method in Xalan, Java's default XSLT implementation. If you're using a JDK more recent than 1.4, you may need to download a fresh copy of Xalan and put it into your class path, because the more recent bundled versions seem to lack this interface on at least one platform.
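If the Process class isn't on your class path and you'd rather not go hunting for Xalan, a few lines of standard TrAX code can stand in for it. This is a sketch under that assumption; the class name XsltRunner and its argument order are our own invention, not part of Xalan or the JDK.

```java
import java.io.File;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltRunner {
    // Compiles the stylesheet and applies it to the input, writing to output.
    static void transform(Source input, Source stylesheet, Result output) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer(stylesheet);
            t.transform(input, output);
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 3) {
            System.err.println("usage: java XsltRunner <in.xhtml> <stylesheet.xsl> <out.html>");
            return;
        }
        transform(new StreamSource(new File(args[0])),
                  new StreamSource(new File(args[1])),
                  new StreamResult(new File(args[2])));
    }
}
```

After compiling, something like java XsltRunner source.xhtml identity.xsl output.html should produce the same file as the Xalan command line, using whichever XSLT implementation the JDK selects by default.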

Review the output file and verify that your basic document structure is unchanged. In fact, it should closely resemble the pre-XHTML version.

Changing the Document Structure

Next, we'll tweak the stylesheet to target paragraph tags that are immediately preceded by headline tags, and liberate their content.

<?xml version="1.0" encoding="UTF-8" ?>

<!--
Convert XHTML to HTML that will look good when rendered by Java Swing's
JEditorPane component.

The result should not contain the pattern
  <h3>my headline</h3>
  <p>my paragraph</p>

Instead, it will have
  <h3>my headline</h3>
  my paragraph
-->

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xhtml="http://www.w3.org/1999/xhtml"
    version="1.0">

  <xsl:output method="html" version="3.2" />

  <xsl:strip-space elements="*" />

  <xsl:template match="xhtml:p">
    <xsl:variable name="previous-node-name" select="name(preceding-sibling::*[1])" />
    <xsl:choose> <!-- To get Swing's JEditorPane to render correctly, we need to
                      remove the paragraph -->
      <xsl:when test="$previous-node-name='table'">
        <xsl:apply-templates />
      </xsl:when>
      <xsl:when test="string-length($previous-node-name)=2 and
          substring($previous-node-name,1,1)='h'">
        <xsl:apply-templates />
      </xsl:when>
      <xsl:otherwise>
        <xsl:element name="p">
          <xsl:apply-templates />
        </xsl:element>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="*">
    <xsl:element name="{local-name()}">
      <!-- go process attributes and children -->
      <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
  </xsl:template>

  <xsl:template match="@*">
    <xsl:attribute name="{local-name()}">
      <xsl:value-of select="."/>
    </xsl:attribute>
  </xsl:template>

</xsl:stylesheet>

Save this to a new stylesheet file and try it out. The result should render in Swing without the extraneous line break.
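For a quick check that doesn't involve Swing at all, the paragraph rule can be exercised in memory. In this sketch the class name ParagraphUnwrapDemo and the sample input are invented, and the two xsl:when branches are folded into a single test to keep the string short. Note the $ in front of every variable reference; without it, XPath would go looking for a child element of that name instead.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class ParagraphUnwrapDemo {
    // Condensed form of the paragraph template plus the identity templates.
    static final String XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:xhtml='http://www.w3.org/1999/xhtml'>"
        + "<xsl:output method='html' version='3.2'/>"
        + "<xsl:strip-space elements='*'/>"
        + "<xsl:template match='xhtml:p'>"
        + "<xsl:variable name='previous-node-name'"
        + " select='name(preceding-sibling::*[1])'/>"
        + "<xsl:choose>"
        // After a table or a headline: emit the content without the <p>.
        + "<xsl:when test=\"$previous-node-name='table'"
        + " or (string-length($previous-node-name)=2"
        + " and substring($previous-node-name,1,1)='h')\">"
        + "<xsl:apply-templates/>"
        + "</xsl:when>"
        + "<xsl:otherwise>"
        + "<xsl:element name='p'><xsl:apply-templates/></xsl:element>"
        + "</xsl:otherwise>"
        + "</xsl:choose>"
        + "</xsl:template>"
        + "<xsl:template match='*'>"
        + "<xsl:element name='{local-name()}'>"
        + "<xsl:apply-templates select='@*|node()'/>"
        + "</xsl:element>"
        + "</xsl:template>"
        + "<xsl:template match='@*'>"
        + "<xsl:attribute name='{local-name()}'><xsl:value-of select='.'/></xsl:attribute>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Hypothetical input: one paragraph after a headline, one after a paragraph.
    static final String XHTML =
          "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
        + "<h3>my headline</h3><p>my paragraph</p><p>second paragraph</p>"
        + "</body></html>";

    // Applies the stylesheet text to the XML text and returns the result.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XHTML));
    }
}
```

The first paragraph should come out as bare text directly after the headline, while the second keeps its <p> wrapper because its preceding sibling is another paragraph.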

If the prospect of calling this output format "HTML" seems a little perverse to you, feel free to give it a custom file extension such as .swing-html. Swing won't care.

Namespaces

You may have noticed the xhtml: prefix in front of matched element names in our stylesheet. This refers to a namespace defined in the attributes of the stylesheet's root document element. Without this prefix, our stylesheet wouldn't recognize any of the source hypertext markup. The reason for this is that the elements of XHTML, body and table and so on, are considered to reside in a special and lexically distinct land that is unsentimentally called http://www.w3.org/1999/xhtml.

The local-name() function transcribes base element names, but withholds their XHTML namespace from the HTML 3.2 output document. It was the need to filter this namespace that precluded our use of a more common and implicit implementation of the identity transform involving the xsl:copy element. But it's just as well; I find our implementation easier to follow, if a little verbose.

The proper handling of namespaces is probably the trickiest part of XSLT.
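A small experiment shows what is at stake. The sketch below runs the same XHTML through two tiny stylesheets that differ only in their match pattern; the class name NamespaceMatchDemo, its helper methods, and the "[paragraph]" marker are all invented for the demonstration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class NamespaceMatchDemo {
    static final String XHTML =
          "<html xmlns='http://www.w3.org/1999/xhtml'>"
        + "<body><p>my paragraph</p></body></html>";

    // Builds a stylesheet that prints a marker whenever the pattern matches
    // and suppresses all other text.
    static String stylesheetFor(String pattern) {
        return "<xsl:stylesheet version='1.0'"
             + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
             + " xmlns:xhtml='http://www.w3.org/1999/xhtml'>"
             + "<xsl:output method='text'/>"
             + "<xsl:template match='" + pattern + "'>[paragraph]</xsl:template>"
             + "<xsl:template match='text()'/>"
             + "</xsl:stylesheet>";
    }

    // Runs the sample document through a stylesheet built for the pattern.
    static String run(String pattern) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer(
                new StreamSource(new StringReader(stylesheetFor(pattern))));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XHTML)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("match='xhtml:p' -> '" + run("xhtml:p") + "'");
        System.out.println("match='p'       -> '" + run("p") + "'");
    }
}
```

Only the prefixed pattern should produce the marker. The unprefixed p names an element in no namespace at all, and our document doesn't contain one.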

Further Changes to the Document Structure

Let's add one more workaround. This one will do some juggling to prevent Swing from rendering extraneous line breaks before named anchors that immediately follow other elements.

<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<body>
<p>my paragraph</p>
<a name="chester" />
<h3>my headline</h3>
</body>
</html>

source2.xhtml

The new templates below should be added to the existing stylesheet file after the template that matches paragraphs, and before the ones that match everything.

<!-- workaround for JEditorPane's tendency to issue an
extraneous line break upon encountering a named anchor

According to the suggestion at
http://docs.sun.com/source/819-0913/release/limitations.html

   The best way to work around this problem is to nest the text of the target
   within the anchor tag. For example:
   <H2><a name="widgets">Working With Widgets</a></H2>
-->

<xsl:template match="xhtml:a">
  <xsl:variable name="next-node-name" select="name(following-sibling::*[1])" />
  <xsl:choose>
    <!-- If this is named and href-less, and the next element is a headline,
         we want to shift that headline inside this -->
    <xsl:when test="not(@href) and @name and string-length($next-node-name)=2
        and substring($next-node-name,1,1)='h'">
      <xsl:element name="a">
        <xsl:apply-templates select="@*|node()|following-sibling::*[1]">
          <xsl:with-param name="shifted">yes</xsl:with-param>
        </xsl:apply-templates>
      </xsl:element>
    </xsl:when>
    <xsl:otherwise>
      <xsl:element name="a">
        <xsl:apply-templates select="@*|node()" />
      </xsl:element>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>

<xsl:template match="xhtml:h1|xhtml:h2|xhtml:h3|xhtml:h4|xhtml:h5|xhtml:h6">
  <xsl:param name="shifted">no</xsl:param>
  <xsl:variable name="previous-node-name" select="name(preceding-sibling::*[1])" />
  <xsl:if test="$shifted='yes' or $previous-node-name!='a'
      or preceding-sibling::*[1]/@href or not(preceding-sibling::*[1]/@name)">
    <xsl:element name="{local-name()}">
      <xsl:apply-templates />
    </xsl:element>
  </xsl:if>
</xsl:template>

The first new template will shift headlines inside named anchors. The second template will transcribe any unshifted headlines that are encountered. But it will refrain from re-outputting shifted headlines when iterating past their original locations in the source document.
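The interplay between the two templates is easier to trust once you've watched it run. The sketch below feeds a one-line version of source2.xhtml through a condensed copy of the anchor, headline, and identity templates; the class name AnchorShiftDemo is invented, and the variable names are shortened to keep the strings manageable.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class AnchorShiftDemo {
    static final String XSL =
          "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:xhtml='http://www.w3.org/1999/xhtml'>"
        + "<xsl:output method='html' version='3.2'/>"
        + "<xsl:strip-space elements='*'/>"
        // Named, href-less anchor followed by a headline: pull the headline in.
        + "<xsl:template match='xhtml:a'>"
        + "<xsl:variable name='next' select='name(following-sibling::*[1])'/>"
        + "<xsl:choose>"
        + "<xsl:when test=\"not(@href) and @name and string-length($next)=2"
        + " and substring($next,1,1)='h'\">"
        + "<xsl:element name='a'>"
        + "<xsl:apply-templates select='@*|node()|following-sibling::*[1]'>"
        + "<xsl:with-param name='shifted'>yes</xsl:with-param>"
        + "</xsl:apply-templates>"
        + "</xsl:element>"
        + "</xsl:when>"
        + "<xsl:otherwise>"
        + "<xsl:element name='a'><xsl:apply-templates select='@*|node()'/></xsl:element>"
        + "</xsl:otherwise>"
        + "</xsl:choose>"
        + "</xsl:template>"
        // Headlines: skip the copy at the original position if it was shifted.
        + "<xsl:template match='xhtml:h1|xhtml:h2|xhtml:h3|xhtml:h4|xhtml:h5|xhtml:h6'>"
        + "<xsl:param name='shifted'>no</xsl:param>"
        + "<xsl:variable name='prev' select='name(preceding-sibling::*[1])'/>"
        + "<xsl:if test=\"$shifted='yes' or $prev!='a'"
        + " or preceding-sibling::*[1]/@href or not(preceding-sibling::*[1]/@name)\">"
        + "<xsl:element name='{local-name()}'><xsl:apply-templates/></xsl:element>"
        + "</xsl:if>"
        + "</xsl:template>"
        // Identity templates for everything else.
        + "<xsl:template match='*'>"
        + "<xsl:element name='{local-name()}'>"
        + "<xsl:apply-templates select='@*|node()'/>"
        + "</xsl:element>"
        + "</xsl:template>"
        + "<xsl:template match='@*'>"
        + "<xsl:attribute name='{local-name()}'><xsl:value-of select='.'/></xsl:attribute>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // One-line version of source2.xhtml.
    static final String XHTML =
          "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
        + "<p>my paragraph</p><a name='chester'/><h3>my headline</h3>"
        + "</body></html>";

    // Applies the stylesheet text to the XML text and returns the result.
    static String transform(String xsl, String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(xsl)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform(XSL, XHTML));
    }
}
```

The headline should appear exactly once in the output, tucked inside the chester anchor. The shifted parameter is what lets the headline template tell a deliberate visit from the anchor apart from the ordinary pass through the body's children.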

The completed stylesheet will produce documents that are much better able to weather the transition from your favorite web browser to Swing.

Coordination

These XSLT revisions can be applied at runtime via the javax.xml.transform.Transformer class and what is sometimes referred to as the TrAX API.

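import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Compile the stylesheet from the class path
Transformer autobot
    = TransformerFactory.newInstance().newTransformer(
        new StreamSource(
            ClassLoader.getSystemResourceAsStream("xsl/xhtml2swing.xsl")));

// Transform the XHTML resource and write the result to outputStream
autobot.transform(
    new StreamSource(
        ClassLoader.getSystemResourceAsStream("resources/source.xhtml")),
    new StreamResult(outputStream));
outputStream.flush();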

Alternatively, you could generate HTML at build time and package it up with your distribution. The Ant target for this might look like:

<target name="dist" depends="compile">
  <style in="${dist.docs.dir}/source.xhtml" style="${scripts.dir}/xhtml2swinghtml.xsl"
      processor="trax" out="${resources.dir}/content.html" force="true" />
  <jar jarfile="${dist.purejava.dir}/${ant.project.name}">
    <fileset dir="${classes.dir}" />
    <fileset dir="${resources.dir}" />
  </jar>
</target>

Then, you would read the modified document from a resource stream with the following Java code:

JEditorPane jEditorPane = new JEditorPane();
URL docURL = ClassLoader.getSystemResource("resources/content.html");
jEditorPane.setPage(docURL);

If you enlist TagSoup's parser (org.ccil.cowan.tagsoup.Parser), you could even cut out the step of generating XHTML files and process raw HTML directly.

Future Directions

Using these techniques, you should be able to accommodate the rendering limitations of any HTML renderer, not just Swing's.

Once you get comfortable with XSLT, you may want to use it upstream from HTML to spruce up your content model. If you start with a more conceptual document format such as DocBook XML (not to be confused with DocBook SGML) or the poorly named Extensible Stylesheet Language Formatting Objects (XSL-FO), then XHTML can become just another output format alongside PDF, RTF, and JavaHelp.

You can write the stylesheets to accomplish this yourself, or appeal to DocBook XSL (not to be confused with DocBook DSSSL) and Apache Formatting Objects Processor (FOP).

You might also want to experiment with other XSLT implementations. In particular, Saxon supports the XSLT 2.0 specification, which includes many improvements such as more flexible grouping and standardized data-type conversions.