Learning to RELAX

October 16, 2000

Table of Contents

•Overview
•A RELAX document's outer parts
•The Basics: Specifying Elements
•Re-using Specifications
•Attributes
•Context-sensitive Elements
•Controlling Element Models with Attributes
•Summary
•Resources

Overview

In this article, we'll explore some of the more advanced features of the RELAX schema language by using it to create a schema for the XMLNews-Story Markup Language. Although the XMLNews-Story markup language has been superseded by the News Industry Text Format, I've chosen it because it's simple, quite widely used, looks a great deal like HTML, and its RELAX specification will use most of the features we want to focus on.

A RELAX Document's Outer Parts

A RELAX document is enclosed within a <module> element. The opening tag looks like


   <module
      moduleVersion="1.2"
      relaxCoreVersion="1.0"
      targetNamespace=""
      xmlns="http://www.xml.gr.jp/xmlns/relaxCore">

The module tag is followed by the <interface> element, which specifies the root element of the documents that this RELAX schema is intended to validate. In the case of a news story, this is


   <interface>
      <export label="nitf"/>
   </interface>

If you have two types of documents that are the same except for their root element, you can have one RELAX document that will validate either kind of document.


   <interface>
      <export label="account-receivable"/>
      <export label="account-payable"/>
   </interface>

After the interface is specified, you specify the types of elements to be validated.
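
Putting these outer parts together, a complete module has roughly the following shape. This is only a skeleton assembled from the fragments above; the placeholder comment stands for the element rules and tags discussed in the rest of this article.

```xml
<module
   moduleVersion="1.2"
   relaxCoreVersion="1.0"
   targetNamespace=""
   xmlns="http://www.xml.gr.jp/xmlns/relaxCore">

   <!-- the root element(s) this schema may validate -->
   <interface>
      <export label="nitf"/>
   </interface>

   <!-- element rules, tags, hedge rules, and attribute pools go here -->

</module>
```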

The Basics: Specifying Elements

Every element in the target markup language is described by an <elementRule>, which describes its content, and a corresponding <tag>. Each <elementRule> plays a role in the document's structure and has a label to which other elements can refer.

Empty Elements

An empty element, such as <br/>, is specified thus:


   <elementRule role="br" label="br">
      <empty/>
   </elementRule>
   <tag name="br"/>

If the label is omitted, it is presumed to be the same as the role.
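
Given that default, the same specification could presumably be written more briefly by leaving the label out. A minimal sketch:

```xml
<!-- label omitted: it defaults to the role, "br" -->
<elementRule role="br">
   <empty/>
</elementRule>
<tag name="br"/>
```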

Sub-Elements and Element Types

A news story has a <byline>, which in turn includes a <bytag> element that declares who wrote the story. The <bytag> element consists of string data. RELAX specifies this as follows.


   <elementRule role="byline">
      <ref label="bytag"/>
   </elementRule>
   <tag name="byline"/>

   <elementRule role="bytag" type="string"/>
   <tag name="bytag"/>

The type attribute specifies an element's datatype. The valid values for a datatype include those taken from the XML Document Type Definition, such as NMTOKEN and ID, as well as those introduced by XML Schema Part 2: Datatypes, such as string, float, nonPositiveInteger, etc.
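
A rule that uses one of the numeric datatypes follows the same pattern. The <wordcount> element in this sketch is hypothetical and is not part of XMLNews-Story; it is here only to show a datatype other than string.

```xml
<!-- hypothetical element, not part of XMLNews-Story:
     its content must be a whole number, zero or greater -->
<elementRule role="wordcount" type="nonNegativeInteger"/>
<tag name="wordcount"/>

<!-- a matching instance would be: -->
<!-- <wordcount>850</wordcount> -->
```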

Multiple Sub-Elements

As in HTML documents, an <nitf> news story element contains a <head> element followed by a <body> element. In a news story, both are required.


   <elementRule role="nitf" label="nitf">
      <sequence>
         <ref label="head"/>
         <ref label="body"/>
      </sequence>
   </elementRule>
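
To make the effect of the sequence concrete, here are two instance fragments. The comments stand in for whatever the head and body rules allow, which this sketch simply assumes is satisfied.

```xml
<!-- accepted: head first, then body -->
<nitf>
   <head><!-- head content --></head>
   <body><!-- body content --></body>
</nitf>

<!-- rejected: body is missing -->
<nitf>
   <head><!-- head content --></head>
</nitf>
```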

On the other hand, some elements may contain sub-elements in any order. News stories, like HTML, can have tables, and their table rows (<tr>) can contain cells (either <td> or <th>) in any order:


   <elementRule role="tr">
      <choice occurs="*">
         <ref label="td"/>
         <ref label="th"/>
      </choice>
   </elementRule>
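
Assuming, for illustration, that <td> and <th> accept plain text, each of the following rows should satisfy this rule. The last row is empty, which occurs="*" also permits.

```xml
<tr><th>Team</th><td>Cubs</td><td>3</td></tr>
<tr><td>3</td><th>Team</th></tr>
<tr/>
```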

The occurs attribute can be used with a <choice>, <sequence>, or <ref> tag. It has three possible values.

*	occurs zero or more times
+	occurs one or more times
?	occurs zero or one times
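
As a small sketch of occurs on a <ref>, the rule below requires at least one list item. It is a hypothetical rule written for illustration and is not necessarily how XMLNews-Story defines <ul>.

```xml
<!-- hypothetical rule: a list must hold at least one item -->
<elementRule role="ul">
   <ref label="li" occurs="+"/>
</elementRule>
<tag name="ul"/>
```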

You write a news story description list (<dl>), which contains an optional list header followed by one or more groups, each made up of an optional description title and a required description data entry, thus:


   <elementRule role="dl">
      <sequence>
         <ref label="lh" occurs="?"/>
         <sequence occurs="+">
            <ref label="dt" occurs="?"/>
            <ref label="dd"/>
         </sequence>
      </sequence>
   </elementRule>
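
The nested occurs values are easier to see against sample data. In this sketch the text content is invented, and <lh>, <dt>, and <dd> are assumed to accept plain text.

```xml
<!-- accepted: header, then a dt/dd pair, then a dd with no dt -->
<dl>
   <lh>Box office</lh>
   <dt>Opening weekend</dt>
   <dd>12 million dollars</dd>
   <dd>Second place overall</dd>
</dl>

<!-- rejected: a dt must be followed by a dd -->
<dl>
   <dt>Opening weekend</dt>
</dl>
```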

Mixed Content

Some information requires text that isn't between tags. For example, a news story <location> can look like


   The movie was filmed in
   <location>
    <city>Pekin</city>, a
    small city in
    <state>Illinois</state>
   </location>.

The text ", a small city in" above is inside the <location> element, but it isn't part of any sub-element. That makes <location> a mixed element, specified (in part) like


   <elementRule role="location">
      <mixed>
         <choice occurs="*">
            <ref label="city"/>
            <ref label="state"/>
            <ref label="region"/>
            <!-- etc. -->
         </choice>
      </mixed>
   </elementRule>

Re-using Specifications

If you look at the specification for a news story, you'll find that a paragraph (<p>) is a mixed element that can contain, among others, <br>, <person>, <location>, and quoted phrase (<q>) sub-elements.

Likewise a quoted phrase can contain exactly the same sub-elements. Rather than specify the common sub-elements twice, RELAX allows you to specify a hedge rule:


   <hedgeRule label="common.elements">
      <choice>
         <ref label="br"/>
         <ref label="person"/>
         <ref label="location"/>
         <ref label="q"/>
      </choice>
   </hedgeRule>

Once a hedge rule is established, other specifications can refer to it by using the <hedgeRef> tag.


   <elementRule role="p">
      <mixed>
         <hedgeRef label="common.elements" occurs="*"/>
      </mixed>
   </elementRule>

   <elementRule role="q">
      <mixed>
         <hedgeRef label="common.elements" occurs="*"/>
      </mixed>
   </elementRule>

Note that the <hedgeRule> can only describe elements; it cannot be <mixed>. That's why each <elementRule> has its own <mixed> tag in the example above.

You may have noticed that the specification says that a quoted phrase can contain another quoted phrase. While this may be unusual in a news story, it's technically possible, and RELAX is not bothered by this at all.
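
For instance, a paragraph like the following (the wording is invented) should be accepted by the rules above, nested quotation included:

```xml
<p>The mayor said, <q>My deputy told me <q>not yet</q>, so we waited.</q><br/>
No date has been set.</p>
```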

Of course, XML does not consist of elements alone, as we'll see in the next section.

Attributes

XML tags can have attributes, and RELAX allows you to specify them in great detail. A news story, like HTML, can include an <img> tag which has a required src and optional width and height attributes. RELAX treats attributes as part of the tag, so the full specification of an image is as follows:


   <elementRule role="img">
      <sequence>
         <!-- its sub-elements -->
      </sequence>
   </elementRule>

   <tag name="img">
      <attribute name="src" required="true" type="string"/>
      <attribute name="width" type="positiveInteger"/>
      <attribute name="height" type="positiveInteger"/>
   </tag>

Notice that RELAX lets you specify that an image's width and height must be positive integers.
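
Here is how those attribute declarations should play out on sample tags. The file name is made up, and the sketch assumes that the sub-elements elided from the element rule are all optional, so that an empty <img> is acceptable.

```xml
<!-- accepted: src present, width and height are positive integers -->
<img src="skyline.jpg" width="320" height="240"/>

<!-- accepted: width and height are optional -->
<img src="skyline.jpg"/>

<!-- rejected: src is missing, and 0 is not a positive integer -->
<img width="0" height="240"/>
```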

Just as it was possible to create a re-usable element specification, so it is possible to create a set of attributes that can be reused by many tags. For example, both the table body (<tbody>) and table header (<th>) elements have identical attributes for determining their horizontal and vertical alignment. This makes those attributes a perfect candidate for an attribute pool.


   <attPool role="alignment">
      <attribute name="align" type="string">
         <enumeration value="left"/>
         <enumeration value="center"/>
         <enumeration value="right"/>
         <enumeration value="justify"/>
      </attribute>
      <attribute name="valign" type="string">
         <enumeration value="top"/>
         <enumeration value="middle"/>
         <enumeration value="bottom"/>
         <enumeration value="baseline"/>
      </attribute>
   </attPool>

And now this attribute pool may be used in multiple tags.


   <tag name="tbody">
      <ref role="alignment"/>
   </tag>

   <tag name="th">
      <ref role="alignment"/>
      <attribute name="rowspan" type="integer">
         <minInclusive value="1"/>
      </attribute>
      <attribute name="colspan" type="integer">
         <minInclusive value="1"/>
      </attribute>
   </tag>

There are two things to note about this example:

•The <th> tag has attributes in addition to those included via the reference to the attribute pool;
•It's possible to set a range of valid values for numeric attributes.
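
A few sample header cells show both points at work. This sketch assumes that <th> accepts plain text; the cell content is invented.

```xml
<!-- accepted: pool attributes and rowspan are all within bounds -->
<th align="center" valign="top" rowspan="2">Region</th>

<!-- rejected: "centre" is not among the enumerated align values -->
<th align="centre">Region</th>

<!-- rejected: colspan must be at least 1 -->
<th colspan="0">Region</th>
```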

Context-sensitive Elements

Practically everything we've done up to this point is possible with DTD specifications. Now let's examine something RELAX can do that DTDs cannot.

As the definition of a news story currently stands, both a list item (<li>) and the <body.content> tag may contain an information block (<block>) element. An information block may contain, among other things, <p>, <ul>, <ol> and <img> elements.

Let's say that we would like to set up news stories so that a block in the main body content can contain all these elements, but a block inside of list items may not contain images. To do this, we first set up the element rule for a <block> in the body content and the corresponding tag. Note that we have a label that is different from the role.


   <elementRule role="block" label="block-in-content">
      <mixed>
         <choice occurs="*">
            <ref label="p"/>
            <ref label="ul"/>
            <ref label="ol"/>
            <ref label="img"/>
         </choice>
      </mixed>
   </elementRule>

   <tag name="block"/>

We then add another rule for the element that plays the role of a block, but this rule is labeled for use in a list.


   <elementRule role="block" label="block-in-list">
      <mixed>
         <choice occurs="*">
            <ref label="p"/>
            <ref label="ul"/>
            <ref label="ol"/>
         </choice>
      </mixed>
   </elementRule>

Both of these rules are for elements that play the role of a “block.” In the definition of the <body.content> element, we refer to the appropriate <block> rule.


   <elementRule role="body.content">
      <ref label="block-in-content"/>
   </elementRule>

And in the definition of a list item, we refer to the other rule.


   <elementRule role="li">
      <ref label="block-in-list"/>
   </elementRule>

The <block> tag will now be validated differently, depending on the context in which it appears.
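
To see the difference, compare the same <block> in the two contexts. The content is invented, and the sketch assumes an empty <img> with only a src attribute is itself valid.

```xml
<body.content>
   <!-- accepted: matches the block-in-content rule -->
   <block>
      <p>The skyline at dusk.</p>
      <img src="skyline.jpg"/>
   </block>
</body.content>

<li>
   <!-- rejected: block-in-list has no reference to img -->
   <block>
      <p>The skyline at dusk.</p>
      <img src="skyline.jpg"/>
   </block>
</li>
```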

Controlling Element Models with Attributes

A news story can use the <money> tag to indicate that a number is a monetary amount. Its RELAX definition looks like


   <elementRule role="money" type="decimal"/>
   <tag name="money">
      <attribute name="unit" type="string"/>
   </tag>

Let's say we'd like to extend this definition so that there are two types of <money> elements, one for costs and another for balances. An item's cost is never negative; a company's balance can be either positive or negative. In other words, we'd like to be able to specify that


   After paying
   <money unit="dollar" usage="cost">5</money> dollars,
   the account had a balance of
   <money unit="dollar" usage="balance">-10</money> dollars.

In this case, rather than using the context to choose the content of a tag, we want to use one of the tag's attributes to determine what content that tag may contain. To accomplish this, we'll need two element rules.


   <!-- costs are greater than or equal to zero -->
   <elementRule role="money-cost" label="money" type="decimal">
      <minInclusive value="0"/>
   </elementRule>

   <!-- balances are unrestricted -->
   <elementRule role="money-balance" label="money" type="decimal"/>

In the previous section, we had element rules with the same role and different labels; here we have element rules with the same label and different roles. Now we define tags that refer to the different roles:


   <tag name="money" role="money-cost">
      <attribute name="unit" type="string"/>
      <attribute name="usage" type="string">
         <enumeration value="cost"/>
      </attribute>
   </tag>

   <tag name="money" role="money-balance">
      <attribute name="unit" type="string"/>
      <attribute name="usage" type="string">
         <enumeration value="balance"/>
      </attribute>
   </tag>

Thus, the <money> tag plays the money-cost role when the usage attribute is cost; and it plays the money-balance role when its usage is balance.
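
Under these definitions, sample <money> elements should be treated as follows (a sketch; the amounts are invented):

```xml
<!-- accepted: plays the money-cost role, and 5 is at least 0 -->
<money unit="dollar" usage="cost">5</money>

<!-- accepted: plays the money-balance role, any decimal is allowed -->
<money unit="dollar" usage="balance">-10</money>

<!-- rejected: plays the money-cost role, but the value is below 0 -->
<money unit="dollar" usage="cost">-5</money>
```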

Summary

RELAX is a powerful schema language that permits you to specify how other XML documents are to be validated. You may, as with other specification methods,

•specify an element with an ordered sequence of sub-elements;
•specify an element with a choice of sub-elements;
•permit mixed content (text outside of tags); and
•specify attributes for tags.

Additionally, RELAX gives you the power to

•specify in great detail the type and range of data that an element may contain;
•specify the type and range of values that an attribute may have;
•permit a single tag to have different content depending upon the context in which it is used; and
•permit a tag to have different content depending upon the value of that tag's attributes.

Resources

•RELAX Home Page
•Detailed RELAX documentation: "How to RELAX"
•RELAX FAQ
•RELAX Mailing List
•RELAX news from XMLhack
•Relaxer, Java class generator for RELAX
•RELAX verifier for XSLT