Getting started with XSLT and XPath (III)
August 23, 2000
2.2 Syntax basics: Stylesheets, Templates, Instructions
Next we'll look at some basic terminology that is helpful both in understanding the principles of writing an XSLT stylesheet and in recognizing the constructs used therein. This section is not meant as tutelage for writing stylesheets, but only as background information, nomenclature, and practice guidelines.
Note:
I use two pairs of diametric terms not used as such in the XSLT Recommendation itself: explicit/implicit stylesheets and push/pull design approaches. Students of my instructor-led courses have found these distinctions helpful even though they are not official terms. Though these terms are documented here with apparent official status, such status is not meant to be conferred.
2.2.1 Explicitly declared stylesheets
An explicitly declared XSLT stylesheet is composed of a distinct wrapper element containing the stylesheet specification. This wrapper element must be an XSLT instruction named either stylesheet or transform, thus it must be qualified by the prefix associated with the XSLT namespace URI. This wrapper element is the document element in a standalone stylesheet, but may in other cases be embedded inside an XML document.
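As a minimal sketch of the explicitly declared form, a standalone stylesheet might look like the following. The content is an illustration built around the template rule discussed later in this section; the choice of xsl:output method="html" is an assumption made for the example.

```xml
<?xml version="1.0"?>
<!-- Illustrative explicit stylesheet: the wrapper is the document element -->
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Top-level element: serialization information (assumed for this sketch) -->
  <xsl:output method="html"/>

  <!-- Top-level element: a single template rule, for the root of the document -->
  <xsl:template match="/">
    <b><i><u><xsl:value-of select="greeting"/></u></i></b>
  </xsl:template>

</xsl:stylesheet>
```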
The XML declaration is consumed by the XML processor embedded within the XSLT processor, thus the XSLT processor never sees it. The wrapper element must include the XSLT namespace and version declarations for the element to be recognized as an instruction.
The children of the wrapper element are the top-level elements, comprising global constructs, serialization information, and certain maintenance instructions. Template rules supply the stylesheet behavior for matching source tree conditions. The content of a template rule is a result tree template containing both literal result elements and XSLT instructions.
The example above has only a single template rule, that being for the root of the document.
2.2.2 Implicitly declared stylesheets
The simplest kind of XSLT stylesheet is an XML file implicitly representing the entire outcome of transformation. The result vocabulary is arbitrary, and the stylesheet tree forms the template used by the XSLT processor to build the result tree. If no XSLT or extension instructions are found therein, the stylesheet tree becomes the result tree. If instructions are present, the processor replaces the instructions with the outcomes of their execution.
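A minimal sketch of the implicitly declared form follows. The HTML content (title, paragraph text) is assumed for illustration; the only XSLT instruction in the file is xsl:value-of.

```xml
<?xml version="1.0"?>
<!-- Illustrative implicit stylesheet: the whole file is the result tree template -->
<html xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
      xsl:version="1.0">
  <head><title>Greeting</title></head>
  <body>
    <p>Words of greeting:<br/>
      <!-- The only instruction; replaced by the outcome of its execution -->
      <b><i><u><xsl:value-of select="greeting"/></u></i></b>
    </p>
  </body>
</html>
```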
The XML declaration is consumed by the XML processor embedded within the XSLT processor, thus the XSLT processor never sees it. The remainder of the file is considered the result tree template for an implicit rule for the root of the document, describing the shape of the entire outcome of the transformation.
The document element is named "html" and contains the namespace and version declarations of the XSLT language. Any element type within the result tree template that is qualified by the prefix assigned to the XSLT namespace URI is recognized as an XSLT instruction. No extension instruction namespaces are declared, thus all other element types in the instance are literal result elements. Indeed, the document element is a literal result element as it, too, is not an instruction.
2.2.3 Stylesheet requirements
Every XSLT stylesheet must identify the namespace prefix used therein for XSLT instructions. The default namespace cannot be used for this purpose. The namespace URI associated with the prefix must be the value http://www.w3.org/1999/XSL/Transform. It is a common practice to use the prefix xsl to identify the XSLT vocabulary, though this is only convention and any valid prefix can be used.
XSLT processor extensions are outside the scope of the XSLT vocabulary, so other URI values must be used to identify extensions.
The stylesheet must also declare the version of XSLT required by the instructions used therein. The attribute is named version and must accompany the namespace declaration in the wrapper element instruction as version="version-number". In an implicit stylesheet where the XSLT namespace is declared in an element that is not an XSLT instruction, the namespace-qualified attribute declaration must be used as prefix:version="version-number".
The version number is a numeric floating-point value representing the latest version of XSLT defining the instructions used in the stylesheet. It need not declare the most capable version supported by the XSLT processor.
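To illustrate that the prefix is only a convention, here is a sketch of an explicit stylesheet that uses the arbitrary prefix t and the transform wrapper with an unqualified version attribute. The result element name greeting-text is made up for the example.

```xml
<?xml version="1.0"?>
<!-- The prefix "t" is arbitrary; only the namespace URI identifies XSLT -->
<t:transform xmlns:t="http://www.w3.org/1999/XSL/Transform"
             version="1.0">
  <t:template match="/">
    <!-- "greeting-text" is a hypothetical literal result element -->
    <greeting-text><t:value-of select="greeting"/></greeting-text>
  </t:template>
</t:transform>
```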
2.2.4 Instructions and literal result elements
XSLT instructions are only detected in the stylesheet tree and are not detected in the source tree. Instructions are specified using the namespace prefix associated with the XSLT namespace URI. The XSLT Recommendation describes the behavior of the XSLT processor for each of the instructions defined based on the instruction's element type (name).
Top-level instructions are considered and/or executed by the XSLT processor before processing begins on the source information. For performance reasons, a processor may choose not to consider a top-level instruction until there is need within the stylesheet to use it. All other instructions are found somewhere in a result tree template and are not executed until that point at which the processor is asked to add the instruction to the result tree. Instructions themselves are never added to the result tree.
Some XSLT instructions are control constructs used by the processor to manage our stylesheets. The wrapper and top-level elements declare our globally scoped constructs. Procedural and process-control constructs give us the ability to selectively add only portions of templates to the result, rather than always adding an entire template. Logically-oriented constructs give us facilities to share the use of values and declarations within our own stylesheet files. Physically-oriented constructs give us the power to share entire stylesheet fragments.
Other XSLT instructions are result tree value placeholders. We declare how a value is calculated by the processor, obtained from a source tree, or calculated by the processor from a value obtained from a source tree. The value calculation is triggered when the XSLT processor is about to add the instruction to the result tree. The outcome of the calculation (which may be nothing) is added to the result tree.
All other instructions engage customized non-standard behaviors and are specified using extension elements in a standardized fashion. These elements use namespace prefixes declared by our stylesheets to be instruction prefixes. Extension instructions may be either control constructs or result tree value placeholders.
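As a sketch of how an extension instruction is declared, the stylesheet below designates a prefix as an instruction prefix with extension-element-prefixes and supplies xsl:fallback for processors that do not implement the extension. The namespace URI and the ext:timestamp element are invented for illustration and do not correspond to any real processor extension.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:ext="http://example.com/hypothetical-extension"
                extension-element-prefixes="ext"
                version="1.0">
  <xsl:template match="/">
    <result>
      <!-- ext:timestamp is a made-up extension instruction -->
      <ext:timestamp>
        <!-- Executed only when the processor does not implement ext:timestamp -->
        <xsl:fallback>
          <xsl:text>timestamp not available</xsl:text>
        </xsl:fallback>
      </ext:timestamp>
    </result>
  </xsl:template>
</xsl:stylesheet>
```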
Consider the simple example in our stylesheets used earlier in this chapter where the following instruction is used:
<xsl:value-of select="greeting"/>
This instruction uses the select= attribute to specify the XPath expression of some value to be calculated and added to the result tree. When the expression is a location in the source tree, as in this example, the value returned is the value of the first location identified using the criteria. When that location is an element, the value returned is the concatenation of all of the #PCDATA text contained therein.
This example instruction is executed in the context of the root of the source document being the focus. The child of the root of the document is the document element. The expression requests the value of the child named "greeting" of the root of the document, hence, the value of the document element named "greeting". For any source document where "greeting" is not the document element, the value returned is the empty string. For any source document where it is the document element, as in our example, the value returned is the concatenation of all #PCDATA text in the entire instance.
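To make the concatenation concrete, consider this assumed source document (the nested emph element is hypothetical and is there only to show text at more than one level). With the root as the focus, select="greeting" yields the string "Hello world." because the text inside emph is concatenated with the text around it.

```xml
<?xml version="1.0"?>
<!-- Assumed source document: "greeting" is the document element -->
<greeting>Hello <emph>world</emph>.</greeting>
```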
A literal result element is any element in a stylesheet that is not a top-level element and is not either an XSLT instruction or an extension instruction. A literal result element can use the default namespace or any namespace not declared in the stylesheet to be an instruction namespace.
When the XSLT processor reads the stylesheet and creates the abstract nodes in the stylesheet tree, those nodes that are literal result elements represent the nodes that are added to the result tree. Though the definition of those nodes is dictated by the XML syntax in the stylesheet entity, the syntax used does not necessarily represent the syntax that is serialized from the result tree nodes created from the stylesheet nodes.
Literal result elements marked up in the stylesheet entity may have attributes that are targeted for the XML processor used by the XSLT processor, targeted for the XSLT processor, or targeted for use in the result tree. Some attributes are consumed and acted upon as the stylesheet file is processed to build the stylesheet tree, while the others remain in the stylesheet tree for later use. Those literal result attributes remaining in the stylesheet tree that are qualified with an instruction namespace are acted on when they are asked to be added to the result tree.
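A small sketch of the distinction between attributes targeted for the XSLT processor and attributes targeted for the result tree: below, xsl:use-attribute-sets is qualified with the XSLT namespace and is acted on by the processor, while id is an ordinary attribute that is copied to the result. The attribute-set name and the attribute values are made up for the example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- A named set of attributes, declared at the top level -->
  <xsl:attribute-set name="emphasis">
    <xsl:attribute name="class">loud</xsl:attribute>
  </xsl:attribute-set>

  <xsl:template match="/">
    <!-- xsl:use-attribute-sets is consumed by the XSLT processor;
         id="g1" is copied to the result tree as written -->
    <p id="g1" xsl:use-attribute-sets="emphasis">
      <xsl:value-of select="greeting"/>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

The p element added to the result tree carries both the id attribute and the class attribute from the set, but not the xsl:use-attribute-sets attribute itself.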
2.2.5 Templates and template rules
Many XSLT instructions are container elements. The collection of literal result elements and other instructions being contained therein comprises the XSLT template for that instruction. A template can contain only literal result elements, only instruction elements, or a mixture of both. The behavior of the stylesheet can ask that a template be added to the result tree, at which point the nodes for literal result elements are added and the nodes for instructions are executed.
Consider again the simple example in our stylesheets used earlier in this chapter where the following template is used:
<b><i><u><xsl:value-of select="greeting"/></u></i></b>
This template contains a mixture of literal result elements and an instruction element. When the XSLT processor adds this template to the result tree, the nodes for the <b>, <i> and <u> elements are simply added to the tree, while the node for the xsl:value-of instruction triggers the processor to add the outcome of instruction execution to the tree.
A template rule is a declaration to the XSLT processor of a template to be added to the result tree when certain conditions are met by source locations visited by the processor. Template rules are either top-level elements explicitly written in the stylesheet or built-in templates assumed by the processor and implicitly available in all stylesheets.
The criteria for adding a written template rule's template to the result tree are specified in a number of attributes, one of which must be the match= attribute. This attribute is an XPath pattern expression, which is a subset of XPath expressions in general. The pattern expression describes preconditions of source tree nodes. The stylesheet writer is responsible for writing the preconditions and other attribute values in such a way as to unambiguously provide a single written or built-in template for each of the anticipated source tree conditions.
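As a sketch of writing preconditions unambiguously, the two rules below both apply to some para elements, yet only one is chosen for any given node because the more specific pattern has a higher default priority in XSLT 1.0. The element names para and note are assumed for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Matches any "para" element (default priority 0) -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Matches only a "para" whose parent is "note" (default priority 0.5),
       so it wins over the rule above for those nodes -->
  <xsl:template match="note/para">
    <p><i><xsl:apply-templates/></i></p>
  </xsl:template>

</xsl:stylesheet>
```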
In an implicitly declared stylesheet, the entire file is considered the template for the template rule for the root of the document. This template rule overrides the built-in rule implicitly available in the XSLT processor.
Back to the simple example in our explicitly declared stylesheet used earlier in this chapter, the following template rule is declared:
<xsl:template match="/">
  <b><i><u><xsl:value-of select="greeting"/></u></i></b>
</xsl:template>
This template rule defines the template to be added to the result tree when the root of the document is visited. This written rule overrides the built-in rule implicitly available in the XSLT processor. The template is the same template we were discussing earlier: a set of result tree nodes and an instruction.
The XSLT processor begins processing by visiting the root of the document. This gives control to the stylesheet writer. Either the supplied template rule or built-in template rule for the root of the document is processed, based on what the writer has declared in the stylesheet. The writer is in complete control at this early stage and all XSLT processor behavior is dictated by what the writer asks to be calculated and where the writer asks the XSLT processor to visit.
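For reference, the built-in template rules for the default mode can be written out as ordinary template rules. The sketch below follows the rules described in the XSLT 1.0 Recommendation; written explicitly like this they behave the same way as the built-in rules, though as written rules they take precedence over the truly built-in ones.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Root and elements: add nothing, but keep visiting the children -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text and attribute nodes: add the node's value to the result -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Processing instructions and comments: add nothing -->
  <xsl:template match="processing-instruction()|comment()"/>

</xsl:stylesheet>
```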
2.2.6 Approaches to stylesheet design
The last discussion in this two-chapter introduction regards how to approach using templates and instructions when writing a stylesheet. Two distinct approaches can be characterized. Choosing which approach to use when depends on your own preferences, the nature of the source information, and the nature of the desired result.
Note:
I refer to these two approaches as either stylesheet-driven or data-driven, though the former might be misconstrued. Of course all results are stylesheet-driven because the stylesheet dictates what to do, so the use of the term involves some nuance. By stylesheet-driven I mean that the order of the result is a consequence of the stylesheet tree having explicitly instructed the adding of information to the result tree. By data-driven I mean that the order of the result is a consequence of the source tree ordering having dictated the adding of information to the result tree.
2.2.6.1 Pulling the input data
When the stylesheet writer knows the location of and order of data found in the source tree, and the writer wants to add to the result a value from or collection of that data, then information can be pulled from the source tree on demand. Two instructions are provided for this purpose: one for obtaining or calculating a single string value to add to the result; and one for adding rich markup to the result based on obtaining as many values as may exist in the tree.
The writer uses the <xsl:value-of select="XPath-expression"/> instruction in a stylesheet's element content to calculate a single value to be added to the result tree. The instruction is always empty and therefore does not contain a template. The value calculated can be the result of function execution, the value of a variable, or the value of a node selected from the source tree. When used in the template of various XSLT instructions the outcome becomes part of the value of a result element, attribute, comment, or processing instruction.
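The three kinds of calculated value can be sketched in one template. The variable name label and the result element names are assumptions for illustration; string-length() is a standard XPath 1.0 function.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- A global variable holding a string -->
  <xsl:variable name="label" select="'Greeting'"/>

  <xsl:template match="/">
    <summary>
      <!-- Value of a variable -->
      <title><xsl:value-of select="$label"/></title>
      <!-- Value of a node selected from the source tree -->
      <text><xsl:value-of select="greeting"/></text>
      <!-- Result of function execution -->
      <length><xsl:value-of select="string-length(greeting)"/></length>
      <!-- Here the outcome becomes the value of a result comment -->
      <xsl:comment><xsl:value-of select="$label"/></xsl:comment>
    </summary>
  </xsl:template>

</xsl:stylesheet>
```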
Note there is also a shorthand notation called an "attribute value template" that allows the equivalent to <xsl:value-of> to be used in a stylesheet's attribute content.
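A brief sketch of an attribute value template next to its longhand equivalent; the div and p elements and the title attribute are chosen only for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:template match="/">
    <div>
      <!-- Attribute value template: the expression in braces is evaluated
           and its string value becomes part of the attribute value -->
      <p title="{greeting}">See the title attribute.</p>

      <!-- Longhand equivalent using xsl:attribute and xsl:value-of -->
      <p>
        <xsl:attribute name="title">
          <xsl:value-of select="greeting"/>
        </xsl:attribute>
        <xsl:text>See the title attribute.</xsl:text>
      </p>
    </div>
  </xsl:template>
</xsl:stylesheet>
```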
To iterate over locations in the source tree, the <xsl:for-each select="XPath-node-set-expression"> instruction defines a template to be processed for each instance, possibly repeated, of the selected locations. This template can contain literal result elements or any instruction to be executed. When processing the given template, the focus of the processor's view of the source tree shifts to the location being visited, thus providing for relative addressing while moving through the information.
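A sketch of iteration with a shifting focus, assuming a hypothetical source document whose document element greetings contains several greeting children, each with a lang attribute:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">
  <xsl:template match="/">
    <ul>
      <!-- Visit every "greeting" child of the "greetings" document element -->
      <xsl:for-each select="greetings/greeting">
        <!-- Inside this template the focus is the current "greeting",
             so "@lang" and "." are addressed relative to it -->
        <li>
          <xsl:value-of select="@lang"/>
          <xsl:text>: </xsl:text>
          <xsl:value-of select="."/>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>
```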
These instructions give the writer control over the order of information in the result. The data is being pulled from the source on demand and added to the result tree in the stylesheet-determined order. When collections of nodes are iterated, the nodes are visited in document order. This implements a stylesheet-driven approach to creating the result.
An implicitly-declared stylesheet is obliged to use only these "pull" instructions and must dictate the order of the result with the above instructions in the lone template.
2.2.6.2 Pushing the input data
The stylesheet writer may not know the order of the data found in the source tree, or may want to have the source tree dictate the ordering of content of the result tree. In these situations, the writer instructs the XSLT processor to visit source tree nodes and to apply to the result the templates associated with the nodes that are visited.
The <xsl:apply-templates select="XPath-node-expression"> instruction visits the source tree nodes described by the node expression in the select= attribute. The writer can choose any relative, absolute, or arbitrary location or locations to be visited.
Each node visited is pushed through the stylesheet to be caught by template rules. Template rules specify the template to be processed and added to the result tree. The template added is dictated by the template rule matched for the node being pushed, not by a template supplied by the instruction when a node is being pulled. This distinguishes the behavior as being a data-driven approach to creating the result, in that the source determines the ultimate order of the result.
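The push approach can be sketched as follows, assuming a hypothetical source document whose document element contains greeting and farewell children in an order the stylesheet writer does not know in advance:

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:template match="/">
    <ul>
      <!-- Push every child element of the document element, in document order -->
      <xsl:apply-templates select="*/*"/>
    </ul>
  </xsl:template>

  <!-- Each pushed node is caught by whichever template rule matches it -->
  <xsl:template match="greeting">
    <li><b><xsl:value-of select="."/></b></li>
  </xsl:template>

  <xsl:template match="farewell">
    <li><i><xsl:value-of select="."/></i></li>
  </xsl:template>

</xsl:stylesheet>
```

The list items appear in whatever order the source supplies the elements; the stylesheet only says what to add for each kind of node.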
An implicitly-declared stylesheet can only push information through built-in template rules, which is of limited value. As well, the built-in rules can be mimicked entirely by using pull constructs, thus they need never be used. There is no room in the stylesheet to declare template rules in an implicitly-declared stylesheet since there is no wrapper stylesheet instruction.
An explicitly-declared stylesheet can either push or pull information because there is room in the stylesheet to define the top-level elements, including any number of template rules required for the transformation.
Putting it all together
We are not obliged to use only one approach when we write our stylesheets. It is very appropriate to push where the order is dictated by the source information and to pull when responding to a push where the order is known by the stylesheet. The most common use of this combination in a template is localized pull access to values that are relative to the focus being matched by nodes being pushed.
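A sketch of the combination: person elements are pushed so that the source dictates the rows, while the values within each row are pulled in an order the stylesheet knows. The source vocabulary (people, person, name, email) is assumed for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:template match="/">
    <table>
      <!-- Push: the source decides how many rows appear and in what order -->
      <xsl:apply-templates select="people/person"/>
    </table>
  </xsl:template>

  <xsl:template match="person">
    <!-- Pull: the stylesheet decides the column order within each row,
         using paths relative to the matched "person" -->
    <tr>
      <td><xsl:value-of select="name"/></td>
      <td><xsl:value-of select="email"/></td>
    </tr>
  </xsl:template>

</xsl:stylesheet>
```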
Note that push-oriented stylesheets more easily accommodate changes to the data and are more easily exploited by others who wish to reuse the stylesheets we write. The more granularity we have in our template rules, the more flexibly our stylesheets can respond to changes in the order of data. The more we pull data from our source tree, the more dependent we are on how we have coded the access to the information. The more we push data through our stylesheet, the less that changes in our data impact our stylesheet code.
Look again at the examples discussed earlier in this article and analyze the use of the above pull and push constructs to meet the objectives of the transformations.
The introductions and samples in this article set the context, and only scratch the surface of the power of XSLT to effect the transformations we need when working with our structured information.
XML.com has continuing coverage and tutorials about XPath and XSLT in its regular column, Transforming XML.
This is a prose version of an excerpt from the book "Practical Transformation Using XSLT and XPath" (Eighth Edition ISBN 1-894049-05-5 at the time of this writing) published by Crane Softwrights Ltd., written by G. Ken Holman; this excerpt was edited by Stan Swaren, and reviewed by Dave Pawson.