Schematron 1.5: Looking Under the Hood
October 6, 2004
Whenever I explain to someone how the evolving ISO standard Schematron can help to address their particular XML data quality issues, I always enjoy the part when they ask, "What software do I need?" Chances are, they already have the necessary software installed: an XSLT processor.
Schematron's reference implementation is written using XSLT stylesheets, and while no knowledge of XSLT is necessary to use Schematron (and implementations in other languages are available), XSLT developers can learn a lot by studying these stylesheets — in fact, I was tempted to call this column "My Favorite XSLT Application."
I like to think of a Schematron schema as a collection of rules about conditions that must be true in the data being checked (stored in assert statements) and conditions that, if true, generate an error message (report elements). Each assert and report statement is expressed in two parts: first, a Boolean expression using XPath to identify the data in question, and second, a natural language description of the condition that was found or not found. An example of the latter is "Invoices for more than $1,000 must be approved by an employee with a job grade higher than 5." This example highlights Schematron's ease at checking that data complies with real business rules, as opposed to the type-checking and structural conditions that most schema languages check for.
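As a rough sketch of what such a schema can look like, the fragment below expresses the invoice rule quoted above as an assert and adds one report. The element and attribute names (invoice, @total, approver, @jobGrade) and the report's wording are invented for illustration and do not come from any real schema. The namespace URI is the one used by Schematron 1.5.

```xml
<?xml version="1.0"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
  <sch:pattern name="Invoice approval">
    <!-- The context selects the nodes against which the tests below are evaluated. -->
    <sch:rule context="invoice[@total &gt; 1000]">
      <!-- assert: the message is output when the test is false. -->
      <sch:assert test="approver/@jobGrade &gt; 5">Invoices for more than $1,000 must be approved by an employee with a job grade higher than 5.</sch:assert>
      <!-- report: the message is output when the test is true. -->
      <sch:report test="count(approver) &gt; 1">This invoice lists more than one approver.</sch:report>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```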
From an XSLT point of view, the basic Schematron workflow runs like this: an XSLT stylesheet available on the Schematron web site reads your rules file (a "schema" expressed in Schematron's own simple, straightforward syntax) and uses it to generate another stylesheet. Whenever you want to check that your data conforms to the rules in your rules file, you run the generated stylesheet against that data. If you update your rules file, you need to rerun the generating stylesheet against it to keep your rules-checking stylesheet up to date. (As we look at more details of how this all works, I'll refer to these stylesheets as the "generating" stylesheet and the "rules-checking" stylesheet.)
Stylesheets Generating Stylesheets
The generating stylesheet is actually a driver stylesheet that includes one or more other stylesheets. In particular, it probably includes the skeleton stylesheet Oliver Becker and Rick Jelliffe wrote (latest release: skeleton1-5) to handle all the important logic. As we'll see, separating out the stylesheet that performs the basic Schematron logic lets you design a customized interface in the "including stylesheet." The Schematron home page links to several different "including stylesheets" that demonstrate the possibilities.
Before looking at these stylesheets, though, make sure you're comfortable with XSLT's concept of namespace-aliasing, because otherwise, stylesheets that generate stylesheets can be difficult to read. (See this earlier column for an introduction.) To summarize, a stylesheet uses xsl:template elements to look for certain things in a source document and to then add certain things to the result document. But if that stylesheet must add xsl:template elements to the result tree, you need a way to distinguish the xsl:template elements that tell the generating stylesheet what to do from the xsl:template elements that the generating stylesheet adds to the generated stylesheet in the result tree.
You do this by declaring another namespace prefix and a dummy URL in addition to the xmlns:xsl="http://www.w3.org/1999/XSL/Transform" namespace declaration found in most stylesheets. (In Becker and Jelliffe's stylesheets, this additional declaration is xmlns:axsl="http://www.w3.org/1999/XSL/TransformAlias", but you can use any prefix and URL you want.) Your generating stylesheet also includes an xsl:namespace-alias element to show that your new namespace prefix is an alias for the URL assigned to the xsl prefix, and that in the result document this new prefix should actually stand for the xsl prefix's URL. Look at a rules-checking stylesheet generated by Schematron, and you'll see that the XSLT elements all have a namespace prefix of axsl. This is declared at the top of the stylesheet to represent the http://www.w3.org/1999/XSL/Transform namespace and not the http://www.w3.org/1999/XSL/TransformAlias one, making it a legal, functioning XSLT stylesheet.
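The following minimal generating stylesheet is a sketch of the aliasing mechanism on its own, not an excerpt from the Schematron distribution. It reuses the axsl prefix and alias URL mentioned above; the text it writes into the generated stylesheet is made up for the example.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:axsl="http://www.w3.org/1999/XSL/TransformAlias">

  <!-- In the result tree, elements written with the axsl prefix
       are placed in the real XSLT namespace. -->
  <xsl:namespace-alias stylesheet-prefix="axsl" result-prefix="xsl"/>

  <!-- xsl: elements are instructions for this stylesheet;
       axsl: elements are literal output, i.e. the generated stylesheet. -->
  <xsl:template match="/">
    <axsl:stylesheet version="1.0">
      <axsl:template match="/">
        <axsl:text>The generated stylesheet ran.</axsl:text>
      </axsl:template>
    </axsl:stylesheet>
  </xsl:template>

</xsl:stylesheet>
```

Run against any XML document, this should produce a small stylesheet whose elements are all in the http://www.w3.org/1999/XSL/Transform namespace, so the output can itself be handed to an XSLT processor.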
skeleton1-5.xsl
Becker and Jelliffe's skeleton1-5.xsl stylesheet is the likeliest candidate for inclusion by a stylesheet that drives the generation of a rules-checking stylesheet, because it implements the basic rules-checking logic. Its important template rules follow the same general pattern as this one:
<xsl:template match="sch:assert | assert">
  <!-- Error check on the rules file: every assert needs a test attribute. -->
  <xsl:if test="not(@test)">
    <xsl:message>Markup Error: no test attribute in &lt;assert&gt;</xsl:message>
  </xsl:if>
  <!-- axsl: elements are written to the generated stylesheet. -->
  <axsl:choose>
    <axsl:when test="{@test}"/>
    <axsl:otherwise>
      <!-- xsl: elements run now, in the generating stylesheet. -->
      <xsl:call-template name="process-assert">
        <xsl:with-param name="role" select="@role"/>
        <xsl:with-param name="id" select="@id"/>
        <xsl:with-param name="test" select="normalize-space(@test)" />
        <xsl:with-param name="icon" select="@icon"/>
        <xsl:with-param name="subject" select="@subject"/>
        <xsl:with-param name="diagnostics" select="@diagnostics"/>
      </xsl:call-template>
    </axsl:otherwise>
  </axsl:choose>
</xsl:template>
This template rule looks in the Schematron schema in the source tree for assert elements from either the Schematron namespace or from no namespace at all. In keeping with the Schematron philosophy, the template rule starts by doing a little error checking on your rules file, displaying an error message if it doesn't find the test attribute that is required in an assert element.
Note how the second child of this template rule is an axsl:choose element and not an xsl:choose element. This element doesn't specify logic for skeleton1-5.xsl to execute; it's something from outside of the XSLT namespace, because in skeleton1-5.xsl, the axsl prefix represents the dummy http://www.w3.org/1999/XSL/TransformAlias namespace and not the XSLT http://www.w3.org/1999/XSL/Transform one.
Being from outside of the XSLT namespace, it's something for the generating stylesheet to add to the generated stylesheet in the result tree. Once it's added to the generated stylesheet, it will be in the XSLT namespace for that stylesheet because of the namespace-alias trick described earlier.
This axsl:choose element has a single axsl:when child followed by an axsl:otherwise child to handle a true or false assert statement in the source tree's Schematron rules file. (Because XSLT offers no else element to use with its if element, stylesheets often use a choose element with a single when element followed by an otherwise element to represent if-else logic.) A Schematron assert statement in a Schematron rule describes something that should be true in the data, and it includes an error message to output if the condition is not true.
For example, imagine an element with an orderedQuantity attribute and an amountInStock attribute, and an assert element with a test value of "@orderedQuantity &lt;= @amountInStock". (Because this condition is stored in the assert element's test attribute value, the less-than sign must be written as the entity reference &lt; to keep the rules file well-formed. Writing a greater-than sign as &gt; is optional but common.)
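Written out in a rules file, that assert might look like the sketch below. The item element name, the pattern name, and the message text are assumptions made for the example. Only the two attribute names come from the description above.

```xml
<?xml version="1.0"?>
<sch:schema xmlns:sch="http://www.ascc.net/xml/schematron">
  <sch:pattern name="Stock check">
    <sch:rule context="item">
      <!-- The XPath processor sees: @orderedQuantity <= @amountInStock -->
      <sch:assert test="@orderedQuantity &lt;= @amountInStock">Ordered quantity must not exceed the amount in stock.</sch:assert>
    </sch:rule>
  </sch:pattern>
</sch:schema>
```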
When the "assert" template rule shown above processes this test condition, it takes this test attribute value and plugs it in where you see "{@test}" in the when element that is destined for the generated stylesheet. When this when element's test condition is true, nothing happens, which is what we want: when the generated stylesheet finds an orderedQuantity value that is less than or equal to the same element's amountInStock attribute value, there's no need to generate an error message. (Compare this logic with the template rule for report elements, which follows the one shown above in skeleton1-5.xsl.)
If the test condition isn't true, the generated stylesheet's axsl:otherwise condition is triggered. What happens then? That depends. You can see an xsl:call-template element inside the axsl:otherwise element, but remember that its xsl: prefix means that it won't end up in the generated stylesheet — it's part of the generating stylesheet's logic, calling a named template that will determine what goes inside of the otherwise statement in the generated stylesheet.
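To make the result concrete, here is a hand-written, heavily simplified approximation of what the generated stylesheet could contain for the orderedQuantity test. It is not actual Schematron output: the match pattern, the message text, and the use of axsl:message inside axsl:otherwise are assumptions, since the real content depends on the named template that the driver supplies.

```xml
<?xml version="1.0"?>
<!-- In the generated stylesheet, axsl is bound to the real XSLT namespace. -->
<axsl:stylesheet version="1.0"
    xmlns:axsl="http://www.w3.org/1999/XSL/Transform">
  <axsl:template match="item">
    <axsl:choose>
      <!-- The {@test} placeholder has been replaced by the assert's test value. -->
      <axsl:when test="@orderedQuantity &lt;= @amountInStock"/>
      <axsl:otherwise>
        <!-- Whatever process-assert emitted at generation time lands here. -->
        <axsl:message>Ordered quantity must not exceed the amount in stock.</axsl:message>
      </axsl:otherwise>
    </axsl:choose>
  </axsl:template>
</axsl:stylesheet>
```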
This is a recurring pattern in the generating stylesheet: when outputting a message about lack of conformance to a Schematron rule, it calls a separate, named-template rule dedicated to processing a message about this Schematron component. The template rule above calls the "process-assert" named template; the one after it in the stylesheet calls the "process-report" named template; the one after that calls "process-diagnostic," and so on.
Driver Stylesheets and Interfaces to the Core Logic
The "process-assert" named template doesn't do much. It just adds a simple message to the result tree describing what happened. So why bother breaking this out into a named template, separate from the template rule shown above for the assert element? The answer is the key to Schematron's API, which allows a callback-style approach to the interaction between a new user interface that you write and skeleton1-5.xsl. You can create a new user interface by writing a stylesheet that imports the skeleton stylesheet and overrides the imported stylesheet's process-assert, process-report, and other named template rules with its own versions.
This way, the new versions get called by template rules like the one shown above. The simplistic versions of these named template rules in the skeleton stylesheet are little more than placeholders, although they can be used as they are. (For information on the parameters that these calls pass to the named templates, see the Schematron API documentation.)
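A driver along these lines might look like the sketch below. It is a hypothetical example, not one of the stylesheets distributed with Schematron: it assumes that skeleton1-5.xsl sits in the same directory, declares only the test parameter seen in the call shown earlier, and invents its own message format.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:axsl="http://www.w3.org/1999/XSL/TransformAlias">

  <!-- xsl:import must come before other top-level elements; templates
       defined in this file take precedence over the imported ones. -->
  <xsl:import href="skeleton1-5.xsl"/>

  <!-- Repeated here so that this file's own axsl: elements are aliased;
       assumed to be harmless alongside the skeleton's declaration. -->
  <xsl:namespace-alias stylesheet-prefix="axsl" result-prefix="xsl"/>

  <!-- Hypothetical override. xsl:call-template keeps the current node,
       so "." is still the assert element from the rules file. -->
  <xsl:template name="process-assert">
    <xsl:param name="test"/>
    <axsl:message>
      <xsl:text>ASSERT FAILED [</xsl:text>
      <xsl:value-of select="$test"/>
      <xsl:text>]: </xsl:text>
      <xsl:value-of select="normalize-space(.)"/>
    </axsl:message>
  </xsl:template>

</xsl:stylesheet>
```

Because the xsl:value-of instructions run in the generating stylesheet, the test expression and the assertion text are fixed into the generated axsl:message at generation time. Any markup inside the assertion text is flattened to plain text by this simple version.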
For two examples of the use of this interface, see the schematron-basic.xsl stylesheet (documented here) and the schematron-report.xsl stylesheet (documented here).
The former doesn't actually redefine process-assert, but it does redefine the process-message named template called by process-assert to specify the message output format. The latter stylesheet does redefine process-assert and the other named template rules, adding several HTML tricks to create an error message file that takes part in a more interactive, frame-based interface to the error report that it generates. This includes links to offending elements as demonstrated here.
This separation of the rules-checking logic from the interface makes new interfaces easy to develop, which in turn makes it easy to integrate Schematron with a variety of applications. Schematron's home page mentions API implementations in Java, Perl, Python, and .NET C#. I've seen it integrated into the graphical user interface of the XMetaL editor, so that dialog boxes drive the triggering of rules-checking and the display of error messages. The combination of Schematron's power with the simplicity of its API makes it worth serious consideration for integration into any ambitious XML application, especially one using XSLT.