Push, Pull, Next!

by Bob DuCharme
July 06, 2005

In a recent weblog post, XML.com's "Python and XML" columnist Uche Ogbuji provided a nice collection of links to discussions about the push vs. pull styles of XSLT stylesheet development. What do we mean by "push" and "pull"? As a short example of each, let's look at two approaches to converting the following DocBook document to XHTML:



<book>
  <title>Beneath the Underdog</title>
  <para>In other words, I am three.</para>
  <para>"Which one is real?"</para>
  <para>"They're all real."</para>
</book>

The first stylesheet below takes a push approach. The XSLT processor "pushes" the source tree nodes through the stylesheet, which has template rules to handle various kinds of nodes as they come through:


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="http://www.w3.org/1999/xhtml"
                version="1.0">

  <xsl:template match="book">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title><xsl:value-of select="title"/></title>
      </head>
      <body>
        <xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <xsl:template match="title">
    <h1><xsl:apply-templates/></h1>
  </xsl:template>

</xsl:stylesheet>

Each xsl:apply-templates instruction is the stylesheet's way of telling the XSLT processor to send along the context node's child nodes to the stylesheet's relevant template rules. (Or, to quote Curtis Mayfield, "Keep On Pushing.")

A pull-style stylesheet minimizes the use of xsl:apply-templates instructions. It uses instructions such as xsl:value-of and xsl:for-each to retrieve the nodes it wants and then puts them where it needs them, like this:


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:template match="/">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <title><xsl:value-of select="book/title"/></title>
      </head>
      <body>
        <h1><xsl:value-of select="book/title"/></h1>
        <xsl:for-each select="book/para">
          <p><xsl:value-of select="."/></p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>

Few stylesheets rely strictly on push or pull processing. For example, while my first stylesheet above includes a template rule to convert the title element into an h1 element when it comes along, it needs to explicitly go get the title value to plug it into the XHTML head element's title element. The use of such a simple source document also made the pull example a little too clean and simple; in the real third paragraph of the book Beneath the Underdog, the word "all" is emphasized, and a pure pull style approach to handling in-line content would require contortions that doubled the length of the stylesheet.
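
As a minimal sketch of how the push approach absorbs that in-line markup, assume the third paragraph were marked up with DocBook's emphasis element as <para>"They're <emphasis>all</emphasis> real."</para> (that markup is an assumption, not part of the sample document above). One more short template rule handles it, and the para rule stays as it was:


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="http://www.w3.org/1999/xhtml"
                version="1.0">

  <!-- Unchanged from the push stylesheet: pass the children along. -->
  <xsl:template match="para">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- In-line content gets its own rule (assumed markup: emphasis). -->
  <xsl:template match="emphasis">
    <em><xsl:apply-templates/></em>
  </xsl:template>

</xsl:stylesheet>

Because the para rule uses xsl:apply-templates rather than xsl:value-of, an emphasis child gets pushed to its own rule wherever it shows up. The pull stylesheet's xsl:value-of select="." would flatten the paragraph to plain text and lose the emphasis.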

The pull style can feel more natural to developers intimidated by XSLT's roots in the functional programming style used by its ancestors DSSSL, Scheme, and LISP. A pure pull stylesheet like the one above has one template rule that tells the XSLT processor, "When you find the root node of the source document, do this, then do this, then do this, then do this..." It's a series of steps to perform, as with a typical procedural programming language. Other template rules in such a stylesheet are usually named template rules: that is, template rules with name attributes that get explicitly called with xsl:call-template instructions instead of being called when the XSLT processor finds a node matching the condition described in the template rule's match attribute. (An xsl:template element can have both a match attribute and a name attribute, but typical template rules have one or the other.) These named template rules play the role of the subroutines or procedures of a procedural programming language, adding modularity to a growing program the old-fashioned way.
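
A rough sketch of that subroutine style, reworking the pull stylesheet above so that the heading comes from a named template. The template name "heading" is a hypothetical choice for illustration:


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns="http://www.w3.org/1999/xhtml"
                version="1.0">

  <xsl:template match="/">
    <html>
      <head>
        <title><xsl:value-of select="book/title"/></title>
      </head>
      <body>
        <!-- Explicit call by name; the context node is still the root node. -->
        <xsl:call-template name="heading"/>
        <xsl:for-each select="book/para">
          <p><xsl:value-of select="."/></p>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>

  <!-- Named template ("heading" is a made-up name): it has no match
       attribute, so it runs only when something calls it. -->
  <xsl:template name="heading">
    <h1><xsl:value-of select="book/title"/></h1>
  </xsl:template>

</xsl:stylesheet>

Note that xsl:call-template does not change the context node, which is why the named template still has to reach down to book/title from the root.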

A Matter of Style

Considering XSLT's functional roots, though, I find the pull approach to be unnatural, and it scales up badly. I have minimal experience with functional languages: I never got beyond toy examples with DSSSL, and I struggled a bit with Scheme and LISP in school. I do have a theory why I never had such problems with the structure of XSLT stylesheets: those of us who began our document processing careers in the SGML days don't see XSLT as a successor to DSSSL (which few people used in production applications) but as a successor to Omnimark, which is what most developers used to turn SGML into something else. Omnimark is a pattern matching language that uses a streaming model, like today's SAX interfaces, and an Omnimark script is structured almost like a series of event handlers: when a title element comes along, do this with it; when a para element comes along, do that with it, and so forth.

Thinking of XSLT as an event-driven environment has served me pretty well if I consider the XSLT processor's discovery of various kinds of nodes as the events to write handlers for. I won't push the analogy to event-driven development too far, but I will say that it works much better than attempts to shoehorn XSLT into the procedural, "do this, then this, then this," style of a purist pull approach.

Stylesheets that use the push approach also make debugging easier. Usually, when I see people ask for help with a stylesheet, they're hoping that a one- or two-line change will fix their problem. I often look at one of these stylesheets, which typically have a minimal number of template rules each trying to execute too much program logic, and I think, "If they just rewrote it with more template rules to handle the different source node types, this would be easy to fix." Of course, telling people to revise the whole architecture of their stylesheet is not what they want to hear, so I'll rewrite part of their stylesheet using a push approach to demonstrate "one approach to the problem."

In a panel discussion on XSLT, I once asked Michael Kay what aspect of XSLT was most underused and underappreciated. I expected him to name some little-known instruction, function, or xsl:output attribute, and he surprised me with his reply that template rules, the most fundamental unit of an XSLT stylesheet, weren't used enough. A comparison of my two stylesheets above, though, demonstrates his point: a set of template rules can usually express the logic necessary to handle a source document's elements and attributes better than a single template rule with lots of xsl:if and xsl:choose instructions inside of it to express the processing logic for that application. This is especially true with publishing-oriented (or "document-oriented") XML documents, with their irregular structure and in-line elements, because a pull stylesheet can have a difficult time finding the specific pieces of information it needs in such documents.

Pull Advantages?

Keeping the program logic for multiple classes of nodes in one template rule can be an advantage if you want to perform some specific steps on each node type, as well as some other steps on all those nodes. For example, let's say I want to wrap every member element from the following sample document in a p element.


<members>
  <member joinDate="2003-10-03">Jimmy Osterberg</member>
  <member joinDate="2005-03-07">Declan McManus</member>
  <member joinDate="2003-10-03">Richard Starkey</member>
  <member joinDate="2004-08-23">Vincent Furnier</member>
</members>

I want to precede each with a p element that says "(founding member)" if the joinDate attribute equals "2003-10-03", and with a p element of "(new member)" if the joinDate attribute begins with "2005". The following does this easily in a single template rule:


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <xsl:template match="member">

    <xsl:if test="@joinDate='2003-10-03'">
      <p>(founding member)</p>
    </xsl:if>

    <xsl:if test="substring(@joinDate,1,4) = '2005'">
      <p>(new member)</p>
    </xsl:if>

    <p><xsl:apply-templates/></p>

  </xsl:template>

</xsl:stylesheet>

XSLT 2.0: New Options

XSLT 2.0 offers another approach. The xsl:next-match element tells the XSLT processor to find the next most applicable template rule for the context node being processed and apply it, letting you apply multiple template rules to a node while still using a push approach. Normally, when multiple template rules all have match conditions that can describe the same element (for example, if one template rule has a match condition of "*", another has one of "member," and another has one of "member[@joinDate='2003-10-03']," they can all apply to the first member element shown above), the XSLT processor applies the one with the most specific description to the node—in this case, the one with a match condition of "member[@joinDate='2003-10-03']." (The choice is actually made based on a priority number to help judge how specific the description is. You can override this by explicitly setting a priority attribute value in the template rule.)
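
As a rough sketch of how those priorities play out, here are the three match conditions just described, applied to the members document. The default priority values in the comments come from XSLT's conflict resolution rules; the bodies of the rules are only illustrative:


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="1.0">

  <!-- Default priority -0.5: a bare wildcard is the least specific. -->
  <xsl:template match="*">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Default priority 0: a plain element name. -->
  <xsl:template match="member">
    <p><xsl:apply-templates/></p>
  </xsl:template>

  <!-- Default priority 0.5: a name with a predicate is the most specific. -->
  <xsl:template match="member[@joinDate='2003-10-03']">
    <p>(founding member)</p>
  </xsl:template>

</xsl:stylesheet>

With these rules, only the winning rule fires for each node, so the two founding members get their "(founding member)" paragraph but never reach the plain member rule that outputs their names. Adding priority="2" to the plain member rule would reverse that, and the founding member rule would never fire.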

While an XSLT processor processes a particular node in a template rule, the xsl:next-match instruction tells it, "Go find the next most appropriate template rule after this one, execute all of its instructions, and then resume in this template rule." This lets you rewrite the stylesheet above like this, with the same effect:


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
     version="2.0">

    <xsl:template match="member[@joinDate='2003-10-03']">
      <p>(founding member)</p>
      <xsl:next-match/>
    </xsl:template>

    <xsl:template match="member[substring(@joinDate,1,4) = '2005']">
      <p>(new member)</p>
      <xsl:next-match/>
    </xsl:template>

    <xsl:template match="member">
      <p><xsl:apply-templates/></p>
    </xsl:template>

</xsl:stylesheet>

When either the first or second template rule here is triggered, it outputs the p element shown and then triggers the third template rule.

It's best to use short examples in this kind of article, and the examples above are so short that the difference between the last two stylesheets seems trivial. You'll find that the usefulness of the xsl:next-match instruction becomes clearer as the amount of program logic to execute scales up. When you have different combinations of large blocks of instructions to execute on a set of nodes, putting these blocks inside of xsl:if instructions or the xsl:when children of xsl:choose elements makes a stylesheet increasingly difficult to read. When you combine the conditional processing made possible by carefully chosen match conditions with the template rule chaining allowed by xsl:next-match, you can have a much more elegant, readable solution. For even greater control over the relationship between the calling and the called templates, you can add xsl:with-param children to the xsl:next-match element to pass parameters, just like you can with named templates. (See my earlier column Setting and Using Variables and Parameters for an introduction to this.)
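
A minimal sketch of that parameter passing, assuming a hypothetical status parameter and a class attribute on the result that are not part of the example above:


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <!-- The more specific rule adds nothing itself; it hands a
       value (hypothetical parameter "status") to the next rule. -->
  <xsl:template match="member[@joinDate='2003-10-03']">
    <xsl:next-match>
      <xsl:with-param name="status" select="'founding'"/>
    </xsl:next-match>
  </xsl:template>

  <!-- The general rule declares the parameter with a default, which
       is used when the rule is reached without xsl:next-match. -->
  <xsl:template match="member">
    <xsl:param name="status" select="'regular'"/>
    <p class="{$status}"><xsl:apply-templates/></p>
  </xsl:template>

</xsl:stylesheet>

Here the founding members would come out as p elements with a class of "founding" and everyone else with a class of "regular", with the wrapping logic still written only once.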

The same section of the XSLT 2.0 specification that covers xsl:next-match covers a related instruction: xsl:apply-imports. To understand its value, let's first review xsl:include and xsl:import instructions: both tell an XSLT processor to treat the identified file as part of the stylesheet with the xsl:include or xsl:import instruction. The latter lets you override template rules from the imported stylesheet, making it great for creating personal customizations of large, complex stylesheets. For example, you can import Norm Walsh's DocBook stylesheets and then, after the xsl:import instruction, add revised versions of the template rules that you've customized for yourself. (See my earlier column Combining Stylesheets with Include and Import for further review with examples.)

If the template rule that you overrode was long and complex and you just wanted to override one or two details, copying the whole thing into your importing stylesheet and then changing those details is a clumsy option. The xsl:apply-imports instruction gives you a better one: it lets the overriding template rule call the imported stylesheet's overridden one. If your overriding template rule only needs to add a few things to the result of the overridden one, you can add them before and after an xsl:apply-imports instruction and let the imported template rule do the rest of the work. The basic instruction was already available in XSLT 1.0; what XSLT 2.0 adds is the ability to give it xsl:with-param children, giving the overriding template rule even greater control over the behavior of the overridden one.
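
A minimal sketch of such an override, assuming a hypothetical imported stylesheet named base.xsl that has its own template rule for para elements. The div wrapper and its class value are also just illustrative:


<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                version="2.0">

  <!-- base.xsl is a made-up file name; xsl:import must come
       before any other top-level elements. -->
  <xsl:import href="base.xsl"/>

  <!-- Overrides the imported para rule, but delegates the real
       work back to it instead of copying its contents here. -->
  <xsl:template match="para">
    <div class="customized">
      <xsl:apply-imports/>
    </div>
  </xsl:template>

</xsl:stylesheet>

Whatever the imported rule produces for a para element ends up inside the wrapper, so later changes to base.xsl still flow through to the customized output.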

Controlling the Flow

The pull approach to XSLT stylesheet development may give the illusion of greater control because of its resemblance to a procedural programming style, but it often results in some quirky surprises that frustrate many stylesheet developers. The push approach offers several tools to navigate the natural flow of an XSLT processor's handling of a source tree, and XSLT 2.0's xsl:next-match and xsl:apply-imports instructions are two tools that should make the push approach more attractive.


  • Design Choice
    2005-09-30 14:50:04 JackParker

    I went through all the proposed push/pull/hybrid variations and it seems that one needs to choose a method based on the degree of control required. With pull you specify the output exactly. On the other hand, if you want the source XML to "drive" the output you can use the push method. If you need "complete control" use pull. This seems to match the declarative/imperative analogy.


    The reason is that with the proper push method, using only <xsl:apply-templates/>, you have to leave the order of output elements completely up to the source XML doc. If the source XML doc has a <para> element before the <title> then the output is:

    First paragraph


    <h1>Beneath the underdog</h1>

    Next paragraph




    • Design Choice
      2005-09-30 17:34:51 Bob DuCharme

      Greater control if you're more comfortable with imperative languages. After that your analogy breaks down. I get all the control I need with a push approach. If the source document may have a para element before the title and I really want the title before any paras in the result, I just have to put this:


      <xsl:apply-templates select="title"/>
      <xsl:apply-templates select="para"/>


      Some people feel that they have more control with declarative languages. LISP geeks have been insisting on this for decades.


      Bob




  • other benefits from push style
    2005-07-07 03:09:49 Avander_be

    I agree with you guys (Bob and M. Kay) that the power of a template-based approach is often underestimated.


    Another benefit of the push style is that your stylesheet is far more resilient to changes: the XML source drives your stylesheet, and subtle changes in the XML won't break your stylesheet, whereas using a pull style will break it for sure...




    • other benefits from push style
      2005-07-07 09:32:08 afettes

      Hi Bob,


      First of all, this is an excellent article. For me personally this article would have been very helpful about 5 years ago! Unfortunately I've had to come across these rules by trial and error and some tutelage.


      Another great benefit of the push style of XSLT programming is on debugging the final application. You mentioned this briefly in your article ("If they just rewrote it with more template rules to handle the different source node types, this would be easy to fix."). However, I do not feel that you stressed this importance enough!


      Pull-style XSLT programming is an absolute horror to debug. Tracing the paths through a (pull) XSLT program can be a difficult task, and knowing where the problem occurs is a skill unto itself. Context problems can also occur when performing pull XSLT programming. Often it can be difficult to figure out what the context node is at a particular point in processing. Finally, in pull-style XSLT, any changes that you make could have serious side effects throughout the system. This is not what a good software engineer looks for in a programming practice.


      Push-style XSLT programming is quite the opposite. With numerous short, simple templates you gain a number of advantages: the program becomes easy to understand, finding the location of bugs becomes trivial, and knowing the context node is also trivial. Shorter templates that perform one (or few) tasks are immediately much easier to understand and conceptualize in one's head. With well-crafted match attributes, knowing the exact context node at the time of processing should be a trivial matter. Finally, changes to the system have a very limited scope. Side effects are not common in systems developed in this fashion. If an error is occurring, it is generally a simple matter to find the offending template and make the appropriate adjustments. This solution of course scales very well.


      In my personal opinion, a mixture of the two styles is appropriate to most situations. A mixture of ~75% push and ~25% pull should suit any development. An example of this from your docbook example could look like this:


      <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                      version="1.0">

        <xsl:template match="book">
          <html xmlns="http://www.w3.org/1999/xhtml">
            <head>
              <title><xsl:value-of select="title"/></title>
            </head>
            <body>
              <xsl:apply-templates select="title"/>
              <xsl:apply-templates select="para"/>
            </body>
          </html>
        </xsl:template>

        <xsl:template match="para">
          <p><xsl:apply-templates/></p>
        </xsl:template>

        <xsl:template match="title">
          <h1><xsl:apply-templates/></h1>
        </xsl:template>

      </xsl:stylesheet>


      This example would not work with multiple title elements followed by paragraphs, but it gives a good indication of the flexibility in program execution. Using the programming style (as in the match="book" template) I have found to be very beneficial at high levels in the XML document. Once you have delved down a level or two in the tree the pure push style tends to be more appropriate.


      One last comment: it is my understanding that XSLT processors (such as Saxon) have been optimized for template matching. I.e. <xsl:apply-templates select="para"/> would perform faster than <xsl:for-each select="para"></xsl:for-each>. Can anyone comment on this?


      Cheers,
      Alastair Fettes

      • other benefits from push style
        2005-07-08 04:12:03 ajwelch


        Remember that using a select attribute on the xsl:apply-templates element (eg <xsl:apply-templates select="..."/>) is in fact the pull style of processing - the only way to instigate push processing is with a no-select apply-templates (eg <xsl:apply-templates/>). The select attribute is driving the context node, not the document order of the source.


        Therefore, your example uses more pull processing than push.


        cheers
        andrew

        • other benefits from push style
          2005-07-09 00:16:53 afettes

          Hi Andrew,


          The point of the example was a combination of push/pull styles. Both are visible in it. Also, it was just a simple example at best.


          Cheers,
          Alastair