Combining RELAX NG and Schematron
February 11, 2004
Embedding Schematron Rules in RELAX NG
This article explains how to integrate two powerful XML schema languages, RELAX NG and Schematron. Embedding Schematron rules in RELAX NG is very simple because a RELAX NG validator ignores all elements not in the RELAX NG namespace (http://relaxng.org/ns/structure/1.0). This means that Schematron rules can be embedded in any element and on any level in a RELAX NG schema.
Here is a very simple RELAX NG schema that only defines one element, Root:
<?xml version="1.0" encoding="UTF-8"?> <element name="Root" xmlns="http://relaxng.org/ns/structure/1.0"> <text/> </element>
Now if a Schematron rule should have the Root element as its context, this rule could be added as an embedded Schematron rule within the element element that defines the pattern for Root:
<?xml version="1.0" encoding="UTF-8"?> <element name="Root" xmlns="http://relaxng.org/ns/structure/1.0"> <sch:pattern name="Test constraints on the Root element" xmlns:sch="http://www.ascc.net/xml/schematron"> <sch:rule context="Root"> <sch:assert test="test-condition">Error message when the assertion condition is broken...</sch:assert> </sch:rule> </sch:pattern> <text/> </element>
The Schematron rules embedded in a RELAX NG schema are inserted on the pattern level and must be declared in the Schematron namespace (http://www.ascc.net/xml/schematron).
Co-occurrence constraints
Although RELAX NG has better support for co-occurrence constraints than W3C XML Schema (WXS), there are still many types of co-occurrence constraints that it cannot express. An example is a constraint in which the relationship between two (or more) element or attribute values is a mathematical expression.
As an example, we use a schema that defines a very simple international purchase order. This purchase order specifies the following:
- The date of the order
- An address to which the purchased products will be delivered
- The items being purchased (including an id, a name, a quantity, and a price with currency information)
- Payment details, including the type of payment and the total amount payable with currency information
Here is an example of an XML representation of such a purchase order:
<?xml version="1.0" encoding="UTF-8"?> <purchaseOrder date="2002-10-22"> <deliveryDetails> <name>John Doe</name> <address>123 Morgue Street, Death Valley</address> <phone>+61 2 9546 4146</phone> </deliveryDetails> <items> <item id="123-XY"> <productName>Coffin</productName> <quantity>1</quantity> <price currency="AUD">2300</price> <totalAmount currency="AUD">2300</totalAmount> </item> <item id="112-AA"> <productName>Shovel</productName> <quantity>2</quantity> <price currency="AUD">75</price> <totalAmount currency="AUD">150</totalAmount> </item> </items> <payment type="Prepaid"> <amount currency="AUD">2450</amount> </payment> </purchaseOrder>
A real life purchase order would be much more complex, but for the purpose of this article, this example is sufficient. A RELAX NG schema for the purchase order could look like this:
<?xml version="1.0" encoding="UTF-8"?> <grammar xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"> <start> <ref name="purchaseOrder"/> </start> <define name="purchaseOrder"> <element name="purchaseOrder"> <attribute name="date"> <data type="date"/> </attribute> <ref name="deliveryDetails"/> <element name="items"> <oneOrMore> <ref name="item"/> </oneOrMore> </element> <ref name="payment"/> </element> </define> <define name="deliveryDetails"> <element name="deliveryDetails"> <element name="name"><text/></element> <element name="address"><text/></element> <element name="phone"><text/></element> </element> </define> <define name="item"> <element name="item"> <attribute name="id"> <data type="string"> <param name="pattern">\d{3}-[A-Z]{2}</param> </data> </attribute> <element name="productName"><text/></element> <element name="quantity"> <data type="int"/> </element> <element name="price"> <ref name="currency"/> </element> <element name="totalAmount"> <ref name="currency"/> </element> </element> </define> <define name="payment"> <element name="payment"> <attribute name="type"> <choice> <value>Prepaid</value> <value>OnArrival</value> </choice> </attribute> <element name="amount"> <ref name="currency"/> </element> </element> </define> <define name="currency"> <attribute name="currency"> <choice> <value>AUD</value> <value>USD</value> <value>SEK</value> </choice> </attribute> <data type="int"/> </define> </grammar>
This RELAX NG schema makes sure that all the required elements and attributes are present, and that some of these have the correct datatype. For example, all price information must have an integer value; the id of an item must be three digits, followed by a hyphen, followed by two uppercase letters; and the currency value must be one of AUD, USD or SEK. However, in a real world scenario it is more likely that you need to check more than the structure and the datatypes to make sure the purchase order is valid.
For the purchase order, the following constraints cannot be checked by RELAX NG, but they would all be very useful for complete validation of the data:
- Each item specifies quantity, price and the totalAmount for that item. To make sure that the data is valid, the value of the totalAmount element must be equal to quantity * price.
- Both the price element and the totalAmount element specify a currency, and for this data to be valid, the price and totalAmount elements must have the same currency value.
- The payment section of the purchase order specifies an amount element whose value must equal the sum of all the items' totalAmount values.
- Every item's currency value must equal the currency value of the amount element in the payment section.
Schematron can easily check all of these constraints, and the context definition in the language provides a logical grouping of the constraints. The first two constraints apply to each item element in the purchase order, and hence this element is the context. Here is an example of how you can specify the Schematron rule needed to express these constraints:
<sch:pattern name="Check that the pricing and currency of an item is correct." xmlns:sch="http://www.ascc.net/xml/schematron"> <sch:rule context="purchaseOrder/items/item"> <sch:assert test="number(price) * number(quantity) = number(totalAmount)"> The total amount for the item doesn't add up to (quantity * price).</sch:assert> <sch:assert test="price/@currency = totalAmount/@currency"> The currency in price doesn't match the currency in totalAmount. </sch:assert> </sch:rule> </sch:pattern>
The Schematron rule specifies its context as all item elements with a parent items element and a grandparent purchaseOrder element. For each of the item elements that match this criterion, the first assertion checks that the value of the price child element multiplied by the value of the quantity child element matches the value of the totalAmount child element. The second assertion makes sure that the currency value of the price child element matches the currency value of the totalAmount child element.
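To see the two assertions at work, consider the hypothetical item below (the values are invented for illustration and do not come from the sample order). It is valid against the RELAX NG schema, yet both Schematron assertions would be expected to fail for it:

```xml
<!-- Hypothetical item: structurally valid, but it violates both assertions -->
<item id="112-AA">
  <productName>Shovel</productName>
  <quantity>2</quantity>
  <price currency="AUD">75</price>
  <!-- 2 * 75 = 150, not 160, so the first assertion fails -->
  <!-- USD differs from the AUD used in price, so the second assertion fails -->
  <totalAmount currency="USD">160</totalAmount>
</item>
```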
The last two constraints both apply to the amount element in the payment section. This element is also the context for the Schematron rule that will check these two constraints. Here is an example of how the rule can be specified:
<sch:pattern name="Check that the total amount is correct and that the currencies match" xmlns:sch="http://www.ascc.net/xml/schematron"> <sch:rule context="purchaseOrder/payment/amount"> <sch:assert test="number(.) = sum(/purchaseOrder/items/item/totalAmount)"> The total purchase amount doesn't match the cost of all items. </sch:assert> <sch:assert test = "not(/purchaseOrder/items/item/totalAmount/@currency != @currency)"> The currency in at least one of the items doesn't match the currency for the total amount. </sch:assert> </sch:rule> </sch:pattern>
The first assertion uses XPath's sum() function to check that the sum of all the item elements' totalAmount values is equal to the value of the context node (which is the amount element). The second assertion makes sure that every item's currency value matches the currency value of the amount element. Note that the following (similar) assertion does not perform the same check:
<sch:assert test = "/purchaseOrder/items/item/totalAmount/@currency = @currency" >...</sch:assert>
This assertion checks that at least one of the items' currency values matches the currency in the amount element. However, in this case we want to make sure that all the items' currency values match, and hence we negate both the assertion expression (using XPath's not() function) and the operator used inside the assertion ('=' becomes '!='). This technique is often used when writing Schematron rules to express a constraint that must hold for every node in a set.
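The difference between the two forms follows from how XPath 1.0 compares a node-set with a value: the comparison is true if it holds for at least one node in the set. A small sketch, using a hypothetical order fragment with mixed currencies (delivery details and most item content are omitted for brevity, so the fragment is not a complete valid order):

```xml
<!-- Hypothetical fragment: one item total in AUD, one in USD, payment in AUD -->
<purchaseOrder date="2002-10-22">
  <items>
    <item id="123-XY">
      <totalAmount currency="AUD">2300</totalAmount>
    </item>
    <item id="112-AA">
      <totalAmount currency="USD">150</totalAmount>
    </item>
  </items>
  <payment type="Prepaid">
    <amount currency="AUD">2450</amount>
  </payment>
</purchaseOrder>
<!--
  With the amount element as the context node:
    /purchaseOrder/items/item/totalAmount/@currency = @currency        is true  (the AUD item matches)
    /purchaseOrder/items/item/totalAmount/@currency != @currency       is true  (the USD item differs)
    not(/purchaseOrder/items/item/totalAmount/@currency != @currency)  is false (the assertion fires)
-->
```

Only the negated form reports the mismatch; the plain '=' test would let this order pass.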
Now that all the Schematron rules are defined, the only remaining task is to insert them into the main RELAX NG schema. As already mentioned, a RELAX NG schema allows any element not in the RELAX NG namespace to appear anywhere in the schema where markup is allowed. However, to keep the RELAX NG schema well organized and easy to read, it is recommended that you embed the Schematron rules in one of two places:
- Insert all the embedded Schematron rules at the beginning of the RELAX NG schema as a child of the top-level element. Then you always know that if you have embedded rules, they will be specified together and in the same place.
- Specify each Schematron rule on the element pattern that specifies the context of the embedded rule. In the previous example this means that one of the Schematron rules would be embedded on the element pattern for the item element and the other on the element pattern for the amount element in the payment section.
I prefer to embed each Schematron rule in the element that defines the context, but it is really up to the developer which method to use. Another good rule to follow is to always declare the Schematron namespace on the top-level element in the RELAX NG schema. That way you know that if the top-level element contains a declaration for the Schematron namespace, then the schema contains embedded Schematron rules. The complete RELAX NG schema for the purchase order with embedded Schematron rules might look like this:
<?xml version="1.0" encoding="UTF-8"?> <grammar xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" xmlns:sch="http://www.ascc.net/xml/schematron"> <start> <ref name="purchaseOrder"/> </start> <define name="purchaseOrder"> <element name="purchaseOrder"> <attribute name="date"> <data type="date"/> </attribute> <ref name="deliveryDetails"/> <element name="items"> <oneOrMore> <ref name="item"/> </oneOrMore> </element> <ref name="payment"/> </element> </define> <define name="deliveryDetails"> <element name="deliveryDetails"> <element name="name"><text/></element> <element name="address"><text/></element> <element name="phone"><text/></element> </element> </define> <define name="item"> <element name="item"> <sch:pattern name="Check that the pricing and currency of an item is correct."> <sch:rule context="purchaseOrder/items/item"> <sch:assert test="number(price) * number(quantity) = number(totalAmount)"> The total amount for the item doesn't add up to (quantity * price). </sch:assert> <sch:assert test="price/@currency = totalAmount/@currency"> The currency in price doesn't match the currency in totalAmount. </sch:assert> </sch:rule> </sch:pattern> <attribute name="id"> <data type="string"> <param name="pattern">\d{3}-[A-Z]{2}</param> </data> </attribute> <element name="productName"><text/></element> <element name="quantity"> <data type="int"/> </element> <element name="price"> <ref name="currency"/> </element> <element name="totalAmount"> <ref name="currency"/> </element> </element> </define> <define name="payment"> <element name="payment"> <attribute name="type"> <choice> <value>Prepaid</value> <value>OnArrival</value> </choice> </attribute> <element name="amount"> <sch:pattern name="Check that the total amount is correct and that the currencies match"> <sch:rule context="purchaseOrder/payment/amount"> <sch:assert test="number(.) 
= sum(/purchaseOrder/items/item/totalAmount)"> The total purchase amount doesn't match the cost of all items. </sch:assert> <sch:assert test="not(/purchaseOrder/items/item/totalAmount/@currency != @currency)"> The currency in at least one of the items doesn't match the currency for the total amount. </sch:assert> </sch:rule> </sch:pattern> <ref name="currency"/> </element> </element> </define> <define name="currency"> <attribute name="currency"> <choice> <value>AUD</value> <value>USD</value> <value>SEK</value> </choice> </attribute> <data type="int"/> </define> </grammar>
Dependency between XML documents
Like most other XML schema languages, RELAX NG lacks the ability to specify constraints between XML instance documents. In many XML applications, this is a very useful functionality. A typical example would be to check if a certain ID reference has a corresponding ID in a different document. For the purchase order example in the preceding section, this could be a simple database file where all the available products are listed. Typically a simple database would contain the following information:
- The date when the database was updated
- One or more products
- Each product has an id, a name, a description, a price and the number of items in stock
A sample XML instance document for the database would look like this:
<?xml version="1.0" encoding="UTF-8"?> <products lastUpdated="2002-10-22"> <product id="123-XY"> <productName>Coffin</productName> <description>Standard coffin, Size 200x80x50cm</description> <numberInStock>4</numberInStock> <price currency="AUD">2300</price> </product> <product id="112-AA"> <productName>Shovel</productName> <description>Plastic grip shovel</description> <numberInStock>2</numberInStock> <price currency="AUD">75</price> </product> </products>
With the corresponding RELAX NG schema:
<?xml version="1.0" encoding="UTF-8"?> <grammar xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"> <start> <ref name="products"/> </start> <define name="products"> <element name="products"> <attribute name="lastUpdated"> <data type="date"/> </attribute> <oneOrMore> <ref name="product"/> </oneOrMore> </element> </define> <define name="product"> <element name="product"> <attribute name="id"> <data type="string"> <param name="pattern">\d{3}-[A-Z]{2}</param> </data> </attribute> <element name="productName"><text/></element> <element name="description"><text/></element> <element name="numberInStock"> <data type="int"/> </element> <element name="price"> <ref name="currency"/> </element> </element> </define> <define name="currency"> <attribute name="currency"> <choice> <value>AUD</value> <value>USD</value> <value>SEK</value> </choice> </attribute> <data type="int"/> </define> </grammar>
Looking back at the purchase order in the preceding section, each item purchased was specified as:
<item id="123-XY"> <productName>Coffin</productName> <quantity>1</quantity> <price currency="AUD">2300</price> <totalAmount currency="AUD">2300</totalAmount> </item>
Since there also exists a database for each product available for purchase, there are now at least two more constraints that can be checked for each purchase order:
- Make sure that each item's id exists as a product id in the database
- Make sure that the quantity ordered is less than or equal to the total number of products in stock for each item in the purchase order
Since these constraints require checks between XML documents, they can only be checked by Schematron processors that support XSLT's document() function (or similar functionality). If a Schematron processor based on XSLT is used, this is not a problem; but most XPath implementations of Schematron do not have this type of functionality. If you use an XSLT implementation, the Schematron rule for the first constraint can be specified like this:
<sch:pattern name="Check that the item exists in the database." xmlns:sch="http://www.ascc.net/xml/schematron"> <sch:rule context="purchaseOrder/items/item"> <sch:assert test = "document('Products.xml')/products/product/@id = @id" >The item doesn't exist in the database.</sch:assert> </sch:rule> </sch:pattern>
Here the document() function is used to access the XML instance document that contains the available products. Once the document() function has retrieved the external document, you can use normal XPath expressions to select the nodes of interest. In this example, the id of each product element with a parent products element is compared to the id of the item that is currently being checked. If an item element's id value does not exist in the database (Products.xml), the assertion will fail.
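For instance, the hypothetical item below (invented for illustration) has an id in the correct format, so it passes RELAX NG validation, but no product with that id is listed in the sample Products.xml, so the assertion would be expected to fail:

```xml
<!-- Hypothetical item: the id is well-formed but not listed in the sample Products.xml -->
<item id="999-ZZ">
  <productName>Lantern</productName>
  <quantity>1</quantity>
  <price currency="AUD">40</price>
  <totalAmount currency="AUD">40</totalAmount>
</item>
```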
The easiest way to check the second constraint is to use a different rule where the context is restricted using predicates. Here is an example of how this can be specified:
<sch:pattern name="Check that there are enough items in stock for the purchase." xmlns:sch="http://www.ascc.net/xml/schematron"> <sch:rule context="purchaseOrder/items/item[@id = document('Products.xml')/products/product/@id]"> <sch:assert test="number(document('Products.xml')/products/product[@id = current()/@id]/numberInStock) >= number(quantity)"> There are not enough items of this type in stock for this quantity. </sch:assert> </sch:rule> </sch:pattern>
This rule is a bit more complicated than the previous ones. The first thing that is different is that the context specification for this rule uses a predicate to limit the number of elements checked. In this case, the predicate is used because instead of selecting all the item elements in the document, only the item elements with an id that exists in the database should be selected. This ensures that when the processor checks the assertion, it is certain that the item being validated exists in the database.
The assertion test itself also specifies a predicate, in conjunction with the document() function. Here the predicate is used to select the product element that has an id that matches the id of the item element that is currently being checked. The assertion then checks that the numberInStock child element (of product) has a value that is greater than or equal to the value of the quantity child element (of item).
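As an illustration, consider a hypothetical item (not part of the sample order) that requests three shovels, while the sample Products.xml lists only two in stock:

```xml
<!-- Hypothetical item: the id exists in Products.xml, but numberInStock there is 2 -->
<item id="112-AA">
  <productName>Shovel</productName>
  <quantity>3</quantity>
  <price currency="AUD">75</price>
  <totalAmount currency="AUD">225</totalAmount>
</item>
<!-- number(2) >= number(3) is false, so the assertion fails and its message is reported -->
```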
Now we know how the rule selects the context node, and how the assertion performs the validation, but what is the reason for the added restriction on which item elements are selected? Why can't the context simply be all the item elements in the document, with the assertions for both of the above constraints included in the same rule?
The answer has its roots in the fact that a Schematron assertion will fire if its test condition evaluates to false. Part of the assertion expression looks like this:
document('Products.xml')/products/product[@id = current()/@id]
This part of the assertion selects the product element from the database that has the same id as the item currently being checked. If no such product exists, the expression selects no element at all; number() of an empty node-set is NaN, and any comparison with NaN is false, so the whole assertion fails. This is not the desired result, since this assertion should only check that there are enough products in stock to make the purchase. However, by specifying a rule that only selects the item elements that do exist in the database, this situation will never occur.
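To make the problem concrete, here is a sketch of the single combined rule that the question above has in mind. It is shown only to illustrate the pitfall and is not the recommended form. For an item whose id is missing from Products.xml, both assertions would fail, so the report would contain the misleading stock message in addition to the correct one:

```xml
<!-- Sketch only: both constraints in one rule, with every item as the context -->
<sch:pattern name="Combined checks in a single rule (not recommended)"
             xmlns:sch="http://www.ascc.net/xml/schematron">
  <sch:rule context="purchaseOrder/items/item">
    <sch:assert test="document('Products.xml')/products/product/@id = @id">
      The item doesn't exist in the database.
    </sch:assert>
    <!-- For an unknown id the product path is empty, number() yields NaN,
         and NaN >= number(quantity) is false, so this assertion fails too -->
    <sch:assert test="number(document('Products.xml')/products/product[@id = current()/@id]/numberInStock) >= number(quantity)">
      There are not enough items of this type in stock for this quantity.
    </sch:assert>
  </sch:rule>
</sch:pattern>
```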
Another important issue when defining the context of a rule is that an element can only be used once as the context within each pattern. This means that if more than one rule in the same pattern matches the same element, only the first matching rule is used. If a pattern defines multiple rules with the same context element, the most restrictive rule must be specified first, followed by the other rules in order of decreasing restrictiveness. For programmers, this is analogous to how a long if-else chain is specified: you start with the most restrictive condition and finish with the most general condition. If done in reverse order, the first statement will always be true and the others will never execute. To illustrate, we will take a look at how to specify the above two rules in one pattern, since both rules use the same context (the item element).
<sch:pattern name="Combined pattern." xmlns:sch="http://www.ascc.net/xml/schematron"> <sch:rule context="purchaseOrder/items/item"> ... </sch:rule> <sch:rule context="purchaseOrder/items/item[@id = document('Products.xml')/products/product/@id]"> ... </sch:rule> </sch:pattern>
If the rules were specified in the above order (which is the order in which they were defined in the examples), validation would not be performed correctly. The reason is that both rules specify the same context element and in this case the most general rule (context="purchaseOrder/items/item") is specified first. Since this rule will match all the item elements, there will not be any item elements left to match the second rule. To make this work as expected, the rules must be specified in the reverse order (the most restrictive rule first):
<sch:pattern name="Combined pattern." xmlns:sch="http://www.ascc.net/xml/schematron"> <sch:rule context="purchaseOrder/items/item[@id = document('Products.xml')/products/product/@id]"> ... </sch:rule> <sch:rule context="purchaseOrder/items/item"> ... </sch:rule> </sch:pattern>
Now validation will be performed as expected. Since the most restrictive rule (which selects only the item elements that do exist in the database) is specified first, the second rule will only be applied to the item elements that do not exist in the database. This means that the assertion in the second rule can be simplified to always fail (test="false()") because if the assertion is ever checked, it is certain that it is an invalid item that does not exist in the database.
Here is the complete specification of the pattern for the two constraints after the appropriate changes have been made:
<sch:pattern name="Check each item against the database." xmlns:sch="http://www.ascc.net/xml/schematron"> <sch:rule context="purchaseOrder/items/item[@id = document('Products.xml')/products/product/@id]"> <sch:assert test="number(document('Products.xml')/products/product[@id = current()/@id]/numberInStock) >= number(quantity)"> There are not enough items of this type in stock for this quantity. </sch:assert> </sch:rule> <sch:rule context="purchaseOrder/items/item"> <sch:assert test="false()" >The item doesn't exist in the database.</sch:assert> </sch:rule> </sch:pattern>
The complete RELAX NG schema with embedded Schematron rules for both co-occurrence constraints and the database checks will look like this:
<?xml version="1.0" encoding="UTF-8"?> <grammar xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" xmlns:sch="http://www.ascc.net/xml/schematron"> <start> <ref name="purchaseOrder"/> </start> <define name="purchaseOrder"> <element name="purchaseOrder"> <attribute name="date"> <data type="date"/> </attribute> <ref name="deliveryDetails"/> <element name="items"> <oneOrMore> <ref name="item"/> </oneOrMore> </element> <ref name="payment"/> </element> </define> <define name="deliveryDetails"> <element name="deliveryDetails"> <element name="name"><text/></element> <element name="address"><text/></element> <element name="phone"><text/></element> </element> </define> <define name="item"> <element name="item"> <sch:pattern name="Validate each item."> <sch:rule context="purchaseOrder/items/item[@id = document( 'Products.xml')/products/product/@id]"> <sch:assert test="number(document('Products.xml') /products/product[@id = current()/@id]/numberInStock) >= number(quantity)"> There are not enough items of this type in stock for this quantity. </sch:assert> <sch:assert test="number(price) * number(quantity) = number(totalAmount)"> The total amount for the item doesn't add up to (quantity * price). </sch:assert> <sch:assert test="price/@currency = totalAmount/@currency" >The currency in price doesn't match the currency in totalAmount. 
</sch:assert> </sch:rule> <sch:rule context="purchaseOrder/items/item"> <sch:assert test="false()" >The item doesn't exist in the database.</sch:assert> </sch:rule> </sch:pattern> <attribute name="id"> <data type="string"> <param name="pattern">\d{3}-[A-Z]{2}</param> </data> </attribute> <element name="productName"><text/></element> <element name="quantity"> <data type="int"/> </element> <element name="price"> <ref name="currency"/> </element> <element name="totalAmount"> <ref name="currency"/> </element> </element> </define> <define name="payment"> <element name="payment"> <attribute name="type"> <choice> <value>Prepaid</value> <value>OnArrival</value> </choice> </attribute> <element name="amount"> <sch:pattern name="Check that the total amount is correct and that the currencies match"> <sch:rule context="purchaseOrder/payment/amount"> <sch:assert test="number(.) = sum(/purchaseOrder/items/item/totalAmount)"> The total purchase amount doesn't match the cost of all items. </sch:assert> <sch:assert test="not(/purchaseOrder/items/item/totalAmount/@currency != @currency)"> The currency in at least one of the items doesn't match the currency for the total amount. </sch:assert> </sch:rule> </sch:pattern> <ref name="currency"/> </element> </element> </define> <define name="currency"> <attribute name="currency"> <choice> <value>AUD</value> <value>USD</value> <value>SEK</value> </choice> </attribute> <data type="int"/> </define> </grammar>
Control over mixed text content
One of WXS's major advantages over previous schema languages is the ability to specify an extensive selection of datatypes, not only for attributes but also for elements with text content. In RELAX NG it is possible to use all the datatypes from WXS by specifying them as the datatype library. Unfortunately, this ability to control the text content of an element disappears if the element is defined to have mixed content (child elements mixed with text content). With the help of embedded Schematron rules it is possible to apply basic text validation even to mixed content elements.
An example of this could be when you have source XML data that should be transformed into high quality PDF documents. A very simple paragraph in the final document can in XML be represented like this:
<p>This is <b>ok</b> but this is<b> not</b> ok</p>
In this case it is very important where the space characters around the b elements are situated. If the space character is situated inside the b element, then the bold font will make the space character bigger than it is supposed to be. For this reason it is important that the text content inside the b element does not start or end with a space character. For the same reason the text preceding the b element should always end with a space character and the text following the b element should always start with a space character. In the above example the spaces around the first b element are correctly located, while they are wrong around the second b element.
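For comparison, a version of the paragraph that satisfies all of these requirements would move the space out of the second b element (this corrected line is an illustration and not part of the original data):

```xml
<!-- The sample paragraph with the space moved outside the second b element -->
<p>This is <b>ok</b> but this is <b>not</b> ok</p>
```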
The RELAX NG schema for the above example is very simple:
<?xml version="1.0" encoding="UTF-8"?> <grammar xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"> <start> <element name="p"> <mixed> <zeroOrMore> <element name="b"> <text/> </element> </zeroOrMore> </mixed> </element> </start> </grammar>
The Schematron rules that are needed to check the extra constraints on the text content can be implemented like this:
<sch:pattern name="Check spaces around b tags"> <sch:rule context="p/text()[preceding-sibling::node()[1][self::b]][following-sibling::node()[1][self::b]]"> <sch:assert test="substring(., string-length(.)) = ' '"> A space must be present before the b tag. </sch:assert> <sch:assert test="starts-with(., ' ')"> A space must be present after the b tag. </sch:assert> </sch:rule> <sch:rule context="p/text()[following-sibling::node()[1][self::b]]"> <sch:assert test="substring(., string-length(.)) = ' '"> A space must be present before the b tag. </sch:assert> </sch:rule> <sch:rule context="p/text()[preceding-sibling::node()[1][self::b]]"> <sch:assert test="starts-with(., ' ')"> A space must be present after the b tag. </sch:assert> </sch:rule> <sch:rule context="p/b"> <sch:assert test="not(starts-with(., ' '))"> The text in the b tag cannot start with a space. </sch:assert> <sch:assert test="substring(., string-length(.)) != ' '"> The text in the b tag cannot end with a space. </sch:assert> </sch:rule> </sch:pattern>
The Schematron rules that check this constraint are divided into four parts (each part is one rule with a separate context), which are explained in the order they are declared:
- For all child nodes of the p element where the nearest preceding sibling and the nearest following sibling are both b elements, check that a space character is present immediately after the preceding b element and that a space character is present immediately before the following b element.
- For all child nodes of the p element where the nearest following sibling is a b element, check that a space character is present immediately before the b element.
- For all child nodes of the p element where the nearest preceding sibling is a b element, check that a space character is present immediately after the b element.
- For all child b elements, check that the text content does not begin or end with a space character.
The complete RELAX NG schema with embedded Schematron rules looks like this:
<?xml version="1.0" encoding="UTF-8"?> <grammar xmlns="http://relaxng.org/ns/structure/1.0" datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes" xmlns:sch="http://www.ascc.net/xml/schematron"> <start> <element name="p"> <sch:pattern name="Check spaces around b tags"> <sch:rule context="p/text()[preceding-sibling::node()[1][self::b]][following-sibling::node()[1][self::b]]"> <sch:assert test="substring(., string-length(.)) = ' '"> A space must be present before the b tag. </sch:assert> <sch:assert test="starts-with(., ' ')"> A space must be present after the b tag. </sch:assert> </sch:rule> <sch:rule context="p/text()[following-sibling::node()[1][self::b]]"> <sch:assert test="substring(., string-length(.)) = ' '"> A space must be present before the b tag. </sch:assert> </sch:rule> <sch:rule context="p/text()[preceding-sibling::node()[1][self::b]]"> <sch:assert test="starts-with(., ' ')"> A space must be present after the b tag. </sch:assert> </sch:rule> <sch:rule context="p/b"> <sch:assert test="not(starts-with(., ' '))"> The text in the b tag cannot start with a space. </sch:assert> <sch:assert test="substring(., string-length(.)) != ' '"> The text in the b tag cannot end with a space. </sch:assert> </sch:rule> </sch:pattern> <mixed> <zeroOrMore> <element name="b"> <text/> </element> </zeroOrMore> </mixed> </element> </start> </grammar>
This is of course a very simple example in which you only check for space characters. In a more advanced example you also need to check for other whitespace characters (like tabs), and to allow for the fact that the last b element should not be followed by a space if the immediately following character is a punctuation character. However, the example still gives you an idea of the things you can do with Schematron and mixed content.
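As a rough sketch of the first of these extensions, the rule for the b element could treat tabs and line breaks like spaces. The variant below is an assumption rather than part of the original schema: it uses XPath's translate() function to map tab, line feed and carriage return characters to a space before comparing. The character references are needed because literal tabs and line breaks in an attribute value are normalized to spaces by the XML parser.

```xml
<!-- Hypothetical variant of the p/b rule: tabs and line breaks count as whitespace too -->
<sch:pattern name="Check whitespace inside b tags"
             xmlns:sch="http://www.ascc.net/xml/schematron">
  <sch:rule context="p/b">
    <!-- translate() maps tab, line feed and carriage return to a space -->
    <sch:assert test="translate(substring(., 1, 1), '&#9;&#10;&#13;', '   ') != ' '">
      The text in the b tag cannot start with whitespace.
    </sch:assert>
    <!-- substring(., string-length(.)) selects the last character of the text -->
    <sch:assert test="translate(substring(., string-length(.)), '&#9;&#10;&#13;', '   ') != ' '">
      The text in the b tag cannot end with whitespace.
    </sch:assert>
  </sch:rule>
</sch:pattern>
```

The punctuation case would need a further assertion on the text node that follows the b element and is not covered by this sketch.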
Embedded Schematron using namespaces
Since Schematron is namespace-aware, as is RELAX NG, it is no problem to embed Schematron rules in a RELAX NG schema that defines one or more namespaces for the document. In the preceding section, it was shown how Schematron schemas should be set up to use namespaces by using the ns element. For embedded Schematron rules, this works exactly the same. Instead of only embedding the Schematron rule that defines the extra constraint, you also need to embed the ns elements that define the namespaces used. The same example that was used in Namespaces and Schematron is used, but now RELAX NG is used to define the structure, while Schematron checks the co-occurrence constraint. The instance example used was:
<ex:Person Title="Mr" xmlns:ex="http://www.topologi.com/example">
  <ex:Name>Eddie</ex:Name>
  <ex:Gender>Male</ex:Gender>
</ex:Person>
A RELAX NG schema for the above would look like this:
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
         ns="http://www.topologi.com/example">
  <start>
    <element name="Person">
      <element name="Name"><text/></element>
      <element name="Gender">
        <choice>
          <value>Male</value>
          <value>Female</value>
        </choice>
      </element>
      <attribute name="Title"/>
    </element>
  </start>
</grammar>
The Schematron rule that needs to be embedded to check the co-occurrence constraint (if the Title is "Mr" then the value of the Gender element must be "Male") looks like this (note the use of the ex prefix):
<sch:pattern name="Check co-occurrence constraint">
  <sch:rule context="ex:Person[@Title='Mr']">
    <sch:assert test="ex:Gender = 'Male'">
      If the Title is "Mr" then the gender of the person must be "Male".
    </sch:assert>
  </sch:rule>
</sch:pattern>
If this rule were embedded on its own, the Schematron validation would fail because the prefix ex is not mapped to a namespace URI. For this to work, the ns element that defines this mapping must also be embedded:
<sch:ns prefix="ex" uri="http://www.topologi.com/example" xmlns:sch="http://www.ascc.net/xml/schematron"/>
I always insert these Schematron namespace mappings at the start of the host schema. This means that they are always declared in the same place and it is easy to see which mappings are included without having to search through the entire schema. The complete RELAX NG schema with the embedded rules would then look like this:
<?xml version="1.0" encoding="UTF-8"?>
<grammar xmlns="http://relaxng.org/ns/structure/1.0"
         datatypeLibrary="http://www.w3.org/2001/XMLSchema-datatypes"
         ns="http://www.topologi.com/example"
         xmlns:sch="http://www.ascc.net/xml/schematron">
  <!-- Include all the Schematron namespace mappings at the top -->
  <sch:ns prefix="ex" uri="http://www.topologi.com/example"/>
  <start>
    <element name="Person">
      <sch:pattern name="Check co-occurrence constraint">
        <sch:rule context="ex:Person[@Title='Mr']">
          <sch:assert test="ex:Gender = 'Male'">
            If the Title is "Mr" then the gender of the person must be "Male".
          </sch:assert>
        </sch:rule>
      </sch:pattern>
      <element name="Name"><text/></element>
      <element name="Gender">
        <choice>
          <value>Male</value>
          <value>Female</value>
        </choice>
      </element>
      <attribute name="Title"/>
    </element>
  </start>
</grammar>
Processing
Since embedded Schematron rules are not part of the RELAX NG specification, most RELAX NG processors will not recognize or enforce the validation constraints expressed by the rules. In fact, the embedded Schematron rules will be completely ignored by the processor, since they are declared in a different namespace than RELAX NG's. This means that in order to use the Schematron rules for validation, this functionality must be added. Currently there are two options for achieving this:
- The embedded rules are extracted from the RELAX NG schema and concatenated into a Schematron schema. This schema can then be used for normal Schematron validation of the XML instance document. Since both RELAX NG and Schematron use XML syntax, it is fairly easy to perform this extraction using XSLT. This technique will be described in detail in the following section.
- The RELAX NG processor can be modified to allow embedded Schematron-like rules and perform the validation as part of the normal RELAX NG validation. This technique is used in Sun's MSV, which has an add-on that will validate XML instance documents against RELAX NG schemas annotated with rules and assertions. However, the way the rules are embedded in the RELAX NG schema is slightly different with this option compared to the method described in this chapter. Some of these differences include:
  - The rules can only be embedded within a RELAX NG element
  - The context for each rule or assertion is determined by the element in which it is declared in the RELAX NG schema
  More information and details about this are provided in the documentation included in the download of the MSV add-on.
  It should be noted that the rules and assertions specified using this method don't really have anything to do with Schematron, other than using the same names for the elements.
Validation using Extraction
To extract the embedded Schematron rules from the RELAX NG schema, the RNG2Schtrn.xsl stylesheet can be used. This stylesheet will also extract Schematron rules that have been declared in RELAX NG modules that are included in or referenced from the base schema.
The result from the script is a complete Schematron schema that can be used to validate the XML instance document using a Schematron processor as described in the section Introduction to Schematron. The XML instance document is then validated against the RELAX NG schema using a normal RELAX NG processor that will ignore all the embedded rules. This means that validation results are available from both Schematron validation and RELAX NG validation and if needed the results can be merged into one report. The whole process is described in the following figure:
As shown in the figure, there are two distinct paths in the validation process. If timing requirements are important, each path can therefore be implemented as a separate process and the two can be executed in parallel.
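The extraction step itself can be sketched with a very small XSLT 1.0 stylesheet. This is not the RNG2Schtrn.xsl stylesheet, only a simplified illustration of the idea. It assumes a single schema file (it does not follow RELAX NG include or externalRef), and it copies only the ns and pattern elements of the Schematron namespace used in this article.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Minimal sketch of the extraction idea; NOT the RNG2Schtrn.xsl stylesheet.
     Assumptions: one schema file, no include/externalRef handling,
     and only sch:ns and sch:pattern elements are of interest. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:sch="http://www.ascc.net/xml/schematron">

  <xsl:output method="xml" indent="yes"/>

  <xsl:template match="/">
    <!-- Wrap everything that is found in a single Schematron schema element -->
    <sch:schema>
      <!-- Namespace mappings first, wherever they were embedded -->
      <xsl:copy-of select="//sch:ns"/>
      <!-- Then every embedded pattern, in document order -->
      <xsl:copy-of select="//sch:pattern"/>
    </sch:schema>
  </xsl:template>

</xsl:stylesheet>
```

Note that xsl:copy-of also copies the namespace declarations that are in scope in the host schema (for example the default RELAX NG namespace). They are harmless here because the copied elements stay in the Schematron namespace. Applied to the Person schema shown earlier, the result would be a schema element containing the ns mapping for the ex prefix followed by the co-occurrence pattern.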
A batch file that validates an XML instance document against both a RELAX NG schema and its embedded Schematron rules (using the Win32 executables of Jing and Saxon) can look like this:
echo Running Jing validation on Sample.xml...
jing PurchaseOrder.rng Sample.xml
echo Creating Schematron schema from PurchaseOrder.rng...
saxon -o PurchaseOrder.sch PurchaseOrder.rng RNG2Schtrn.xsl
echo Running Basic Schematron validation on file Sample.xml...
saxon -o validate.xsl PurchaseOrder.sch schematron-basic.xsl
saxon Sample.xml validate.xsl
So, first, the XML instance document is validated against the RELAX NG schema using Jing, and then it is validated with the embedded Schematron rules using Saxon. An output example could look like this:
Running Jing validation on Sample.xml...
Error at URL "file:/C:/Sample.xml", line number 7: unknown element "BogusElement"
Creating Schematron schema from PurchaseOrder.rng...
Running Basic Schematron validation on file Sample.xml...
From pattern "Check that each team is registered in the tournament":
  Assertion fails: "The item doesn't exist in the database." at
    /purchaseOrder[1]/items[1]/item[2]
    <item id="112-AX">...</>
Done.
The Topologi Schematron Validator is a free graphical validator that can validate an XML instance document against a RELAX NG schema with embedded Schematron rules.
Summary
Schematron is a very good complement to RELAX NG, and there is little that cannot be validated by the combination of the two. This article has shown how to embed Schematron rules in a RELAX NG schema and has provided guidelines for how to perform validation. A Java implementation of Schematron that works as a wrapper around Xalan can be downloaded from Topologi. This implementation also contains classes to perform RELAX NG validation (using Jing) with embedded Schematron rules.
It is up to each project and use-case to evaluate if embedding Schematron rules in RELAX NG schemas is a suitable technique to achieve more powerful validation. Following is a list of some advantages to take into account:
- By combining the power of RELAX NG and Schematron, the limit for what can be performed in terms of validation is raised to a new level.
- Many of the constraints that previously had to be checked in the application can now be moved out of the application and into the schema.
- Since Schematron lets you provide your own error messages (the content of the assertion elements), you can ensure that each message is as explanatory as needed.
And some disadvantages:
- In time-critical applications, the time overhead of processing the embedded Schematron rules may be too great. This is especially true if XSLT implementations of Schematron are used in conjunction with the extraction method in the preceding section. Extensive use of XSLT's document() function is also very resource demanding and time consuming.
- Since the extraction of Schematron rules from a RELAX NG schema is performed with XSLT, embedded Schematron rules are only supported in RELAX NG schemas that use the full XML syntax.
The ability to combine embedded Schematron rules with a different schema language is not unique to RELAX NG and should be possible in all XML schema languages that use XML syntax and have an extensibility mechanism. The only thing needed is to modify the XSLT extractor stylesheet to accommodate the extension mechanism in the host XML schema language used.
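As an illustration, in W3C XML Schema (WXS) the extension mechanism is the appinfo element inside annotation. The following is a minimal sketch under that assumption; the Root element and the assertion are illustrative only and are not taken from any schema discussed in this article.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: a Schematron pattern embedded in a WXS schema.
     The Root element and its assertion are hypothetical examples. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           xmlns:sch="http://www.ascc.net/xml/schematron">
  <xs:element name="Root" type="xs:string">
    <xs:annotation>
      <!-- appinfo is the WXS extension point that holds the foreign elements -->
      <xs:appinfo>
        <sch:pattern name="Test constraints on the Root element">
          <sch:rule context="Root">
            <sch:assert test="string-length(normalize-space(.)) &gt; 0">
              The Root element must not be empty.
            </sch:assert>
          </sch:rule>
        </sch:pattern>
      </xs:appinfo>
    </xs:annotation>
  </xs:element>
</xs:schema>
```

An extractor for this host language would look for Schematron elements under xs:appinfo rather than directly inside RELAX NG patterns.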
Acknowledgements
I would like to thank Rick Jelliffe and Mike Fitzgerald for comments and suggestions on this article.
Resources
- An Introduction to Schematron (Eddie Robertsson, XML.com, November 2003).
- RELAX NG, Compared (Eric van der Vlist, XML.com, January 2002).