Controlling Whitespace, Part Two
December 5, 2001
In part 1 of this three-part series, we looked at how the xsl:strip-space and xsl:preserve-space instructions let you control extra whitespace from your source tree by element type. This month we'll look at how xsl:text can not only add whitespace where you want it, but also make it easier to prevent unwanted whitespace characters from showing up in your result tree. We'll also look at a built-in XSLT function that lets you make the use of whitespace in your result tree more consistent.
Adding and Removing Whitespace with xsl:text
The xsl:text instruction adds a text node to the result tree. When result tree whitespace characters -- in particular, carriage returns -- aren't coming out the way you want them, this element is handy for both adding and preventing whitespace in your result document.
For example, let's say you want to print out the children of this employee element with spaces or carriage returns between them. (Sample data and complete stylesheets demonstrating these examples can be found in this zip file.)
<employee hireDate="09/01/1998">
  <last>Herbert</last>
  <first>Johnny</first>
  <salary>95000</salary>
</employee>
In this template rule, the comment shows that there's a space after the xsl:apply-templates element that adds the hireDate attribute value, and there's obviously a carriage return after that comment and after the second and third xsl:apply-templates elements.
<!-- xq529.xsl: converts xq528.xml into xq530.txt -->
<xsl:template match="employee">
  <xsl:apply-templates select="@hireDate"/> <!-- note space -->
  <xsl:apply-templates select="first"/>
  <xsl:apply-templates select="last"/>
</xsl:template>
An XSLT processor strips any text node in the stylesheet that consists only of whitespace (unless it is inside an xsl:text element) before it applies the template rules, so that space and those carriage returns never reach the result tree, and the template creates this result from the source document:
09/01/1998JohnnyHerbert
The xsl:text element is a great way to say "don't throw this whitespace out." As we saw in last month's column, element types in your source document can be designated as whitespace-stripping or whitespace-preserving elements; XSLT stylesheets are XML documents too, and xsl:text elements are the only whitespace-preserving elements in those documents.
This revision of the template above shows how xsl:text elements with a single space as content ensure that those spaces end up in the result.
<!-- xq531.xsl: converts xq528.xml into xq532.txt -->
<xsl:template match="employee">
  <xsl:apply-templates select="@hireDate"/><xsl:text> </xsl:text>
  <xsl:apply-templates select="first"/><xsl:text> </xsl:text>
  <xsl:apply-templates select="last"/>
</xsl:template>
When applied to the same source document, the revised stylesheet creates a result with spaces separating the values.
09/01/1998 Johnny Herbert
The xsl:text elements in this next version of the template each have a single carriage return as their contents instead of a single space.
<!-- xq533.xsl: converts xq528.xml into xq534.txt -->
<xsl:template match="employee">
  <xsl:apply-templates select="@hireDate"/><xsl:text>
</xsl:text>
  <xsl:apply-templates select="first"/><xsl:text>
</xsl:text>
  <xsl:apply-templates select="last"/>
</xsl:template>
With the same source document used again, the result of this template has each value separated by a carriage return.
09/01/1998
Johnny
Herbert
This last template isn't indented very nicely. For those two xsl:text elements to each have a single carriage return and only a single carriage return as their contents, their end-tags must be right at the beginning of the line after the start-tag. If they were indented with the rest of the child elements of the xsl:template element, like this,
<!-- xq535.xsl: converts xq528.xml into xq536.txt -->
<xsl:template match="employee">
  <xsl:apply-templates select="@hireDate"/><xsl:text>
  </xsl:text>
  <xsl:apply-templates select="first"/><xsl:text>
  </xsl:text>
  <xsl:apply-templates select="last"/>
</xsl:template>
the XSLT processor would add the carriage return and also the two spaces used to indent those end-tags before each value:
09/01/1998
  Johnny
  Herbert
One handy trick that gets around this indenting problem and makes stylesheets more readable is to declare a general entity that has an xsl:text element with a space or carriage return as its contents and then to reference that entity in the document. This next version of the stylesheet does this for both characters and references these entities to put a carriage return after the hireDate value and a space after the first value.
<!-- xq537.xsl: converts xq528.xml into xq538.txt -->
<!DOCTYPE stylesheet [
<!ENTITY space "<xsl:text> </xsl:text>">
<!ENTITY cr "<xsl:text>
</xsl:text>">
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" omit-xml-declaration="yes"/>
<xsl:template match="employee">
  <xsl:apply-templates select="@hireDate"/>&cr;
  <xsl:apply-templates select="first"/>&space;
  <xsl:apply-templates select="last"/>
</xsl:template>
</xsl:stylesheet>
The result has the carriage return and space right where the entity references put them.
09/01/1998
Johnny Herbert
Usually, stylesheets declare entities like this when they need to be used repeatedly in a document. If your stylesheet needs to have many carriage returns or single spaces inserted, declaring entities for them like this is very often worth it because &cr; and &space; are easier to write over and over than the text strings they represent. See the earlier column Entities and XSLT for more on the use of entities in stylesheets.
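Another way around the indenting problem, sketched here as an alternative and not taken from the sample files, is to write the line break as a numeric character reference. The stylesheet below assumes the same employee source document; using method="text" on xsl:output is also a choice made just for this sketch.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>

<xsl:template match="employee">
  <!-- The character reference 10 is a line feed. Because it sits inside
       xsl:text, it is kept when the stylesheet's whitespace is stripped. -->
  <xsl:apply-templates select="@hireDate"/><xsl:text>&#10;</xsl:text>
  <xsl:apply-templates select="first"/><xsl:text> </xsl:text>
  <xsl:apply-templates select="last"/>
</xsl:template>

</xsl:stylesheet>
```

Because the reference stands for the line feed character itself, the xsl:text element can open and close on one indented line without any indentation spaces reaching the result. The output should again be the hire date on one line and the first and last names on the next.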
Whitespace is only stripped from the stylesheet when it makes up an entire text node, so a carriage return that shares a text node with other character data is kept, but sometimes you don't want that carriage return. The xsl:text element can help here, too, as easily as it can help to add carriage returns. For example, if we want to add the contents of the source document above to the result tree with the labels "Hire Date:" and "Name:" preceding each line, we might try it like this:
<!-- xq539.xsl: converts xq528.xml into xq540.txt -->
<!DOCTYPE stylesheet [
<!ENTITY space "<xsl:text> </xsl:text>">
<!ENTITY cr "<xsl:text>
</xsl:text>">
]>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" omit-xml-declaration="yes"/>
<xsl:template match="employee">
  Hire Date:
  <xsl:apply-templates select="@hireDate"/>&cr;
  Name:
  <xsl:apply-templates select="first"/>&space;
  <xsl:apply-templates select="last"/>
</xsl:template>
</xsl:stylesheet>
The result shows a carriage return after each label.
  Hire Date:
  09/01/1998

  Name:
  Johnny Herbert
If we don't want those carriage returns, we can wrap those labels in xsl:text elements. This splits the carriage returns after those labels off into text nodes of their own, where they're no longer next to non-whitespace characters and will therefore be stripped from the stylesheet before the XSLT processor applies it.
<!-- xq541.xsl: converts xq528.xml into xq542.txt -->
<xsl:template match="employee">
  <xsl:text>Hire Date: </xsl:text>
  <xsl:apply-templates select="@hireDate"/>&cr;
  <xsl:text>Name: </xsl:text>
  <xsl:apply-templates select="first"/>&space;
  <xsl:apply-templates select="last"/>
</xsl:template>
The result has the labels on the same line as the data they go with.
Hire Date: 09/01/1998
Name: Johnny Herbert
Whether you're trying to add carriage returns or delete them, the xsl:text instruction is great for controlling how carriage returns are added to your result tree.
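As a further sketch, and an alternative that the sample files do not use, the labels and values can be assembled with the XPath concat() function inside xsl:value-of. No literal text then sits in the template body, so no stray carriage returns can come along with it. The stylesheet assumes the same employee source document, and method="text" is again a choice made only for this example.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>

<xsl:template match="employee">
  <!-- The character reference in the first select attribute puts a line
       feed inside the XPath string literal, ending the first line. -->
  <xsl:value-of select="concat('Hire Date: ', @hireDate, '&#10;')"/>
  <xsl:value-of select="concat('Name: ', first, ' ', last)"/>
</xsl:template>

</xsl:stylesheet>
```

The carriage return and indentation between the two xsl:value-of elements form a whitespace-only text node, so they are stripped, and each label should come out on the same line as its data. The trade-off is that concat() uses the string value of each node directly, so any template rules written for first or last would be bypassed.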
Normalizing Space
Imagine that your source document has extra whitespace in places, but not consistently, and you want to get rid of this whitespace to make it consistent. For example, the first employee element in the following has no extra spaces or carriage returns within its child elements, but the second one has plenty.
<employees>
  <employee hireDate="09/01/1998">
    <last>Herbert</last>
    <first>Johnny</first>
    <salary>95000</salary>
  </employee>
  <employee hireDate=" 04/23/1999">
    <last> Hill </last>
    <first> Phil </first>
    <salary>100000 </salary>
  </employee>
</employees>
A simple stylesheet to create comma-delimited versions of each employee's data, like this,
<!-- xq546.xsl: converts xq543.xml into xq548.txt -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" omit-xml-declaration="yes"/>
<xsl:template match="employee">
  <xsl:apply-templates select="@hireDate"/>
  <xsl:text>,</xsl:text>
  <xsl:apply-templates select="first"/>
  <xsl:text>,</xsl:text>
  <xsl:apply-templates select="last"/>
</xsl:template>
</xsl:stylesheet>
creates output that includes all that extra whitespace:
09/01/1998,Johnny,Herbert
04/23/1999, Phil , Hill
The normalize-space() function, in addition to converting each run of whitespace characters (spaces, tabs, carriage returns, and line feeds) into a single space, deletes any leading and trailing whitespace from the string passed to it as an argument. Using it can solve the problem with the stylesheet above:
<!-- xq544.xsl: converts xq543.xml into xq547.txt -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="xml" omit-xml-declaration="yes"/>
<xsl:template match="employee">
  <xsl:value-of select="normalize-space(@hireDate)"/>
  <xsl:text>,</xsl:text>
  <xsl:value-of select="normalize-space(first)"/>
  <xsl:text>,</xsl:text>
  <xsl:value-of select="normalize-space(last)"/>
  <!-- Following alternative won't work:
  <xsl:apply-templates select="normalize-space(@hireDate)"/>
  <xsl:text>,</xsl:text>
  <xsl:apply-templates select="normalize-space(first)"/>
  <xsl:text>,</xsl:text>
  <xsl:apply-templates select="normalize-space(last)"/>
  -->
</xsl:template>
</xsl:stylesheet>
Note the comment in the second half of the "employee" template rule. We can't just insert the normalize-space() function inside the select attributes of the previous stylesheet's xsl:apply-templates instructions, because this function returns a string and xsl:apply-templates expects to see a node set expression as the value of its select attribute. So, the template uses xsl:value-of instructions instead, the normalize-space() function works, and the result is formatted consistently:
09/01/1998,Johnny,Herbert
04/23/1999,Phil,Hill
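If you would rather keep the xsl:apply-templates instructions, one possible approach, shown here as a sketch and not as one of the column's sample files, is to override the built-in template rule for text and attribute nodes so that normalize-space() runs wherever a value is finally copied to the result. The line feed added after each record and the use of method="text" are assumptions of this sketch.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>

<!-- One comma-delimited record per employee, ended by a line feed. -->
<xsl:template match="employee">
  <xsl:apply-templates select="@hireDate"/>
  <xsl:text>,</xsl:text>
  <xsl:apply-templates select="first"/>
  <xsl:text>,</xsl:text>
  <xsl:apply-templates select="last"/>
  <xsl:text>&#10;</xsl:text>
</xsl:template>

<!-- Replaces the built-in rule that copies text and attribute values
     through unchanged: each value is normalized before it is output. -->
<xsl:template match="text()|@*">
  <xsl:value-of select="normalize-space(.)"/>
</xsl:template>

</xsl:stylesheet>
```

Here each select attribute still names nodes, so xsl:apply-templates is satisfied, and the whitespace-only text nodes between the employee elements normalize to empty strings and add nothing to the result. One caution: this rule normalizes every text node separately, so in an element with mixed content it would also remove spaces on either side of a child element.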
In next month's final installment of this series on controlling whitespace, we'll look at how an XSLT stylesheet can add tabs to your result document and a built-in feature that automates intelligent indenting of your result documents.