Learning to RELAX
October 16, 2000
Overview
In this article, we'll explore some of the more advanced features of the RELAX schema language by using it to create a schema for the XMLNews-Story Markup Language. Although the XMLNews-Story markup language has been superseded by the News Industry Text Format, I've chosen it because it's simple, quite widely used, looks a great deal like HTML, and its RELAX specification will use most of the features we want to focus on.
A RELAX Document's Outer Parts
A RELAX document is enclosed within a <module> element. The opening tag looks like this:
<module moduleVersion="1.2" relaxCoreVersion="1.0" targetNamespace="" xmlns="http://www.xml.gr.jp/xmlns/relaxCore">
The <module> start tag is followed by the <interface> element, which specifies the root element of the documents that this RELAX schema is intended to validate. In the case of a news story, this is:
<interface> <export label="nitf"/> </interface>
If you have two types of documents that are the same except for their root element, you can have one RELAX document that will validate either kind of document.
<interface> <export label="account-receivable"/> <export label="account-payable"/> </interface>
After the interface is specified, you specify the types of elements to be validated.
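To show how these outer parts fit together, here is a minimal sketch of a complete module. The single rule for <nitf> is a deliberately simplified placeholder that only allows string content. It is not the real news-story definition, which the following sections build up piece by piece.

```xml
<module moduleVersion="1.2" relaxCoreVersion="1.0" targetNamespace=""
        xmlns="http://www.xml.gr.jp/xmlns/relaxCore">
  <!-- the label that a document's root element must match -->
  <interface>
    <export label="nitf"/>
  </interface>

  <!-- placeholder content rule; a real news story needs <head> and <body> -->
  <elementRule role="nitf" label="nitf" type="string"/>

  <!-- the tag itself: its name (and, later, its attributes) -->
  <tag name="nitf"/>
</module>
```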
The Basics: Specifying Elements
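Every element in the target markup language is described by an <elementRule>, which describes its content, and a corresponding <tag>, which describes its name and attributes. Each <elementRule> names the role it plays in the document's structure, which links it to the <tag> with the same role (a <tag>'s role defaults to its name), and has a label to which other rules can refer.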
Empty Elements
An empty element, such as <br/>, is specified thus:
<elementRule role="br" label="br"> <empty/> </elementRule> <tag name="br"/>
If the label is omitted, it is presumed to be the same as the role.
Sub-Elements and Element Types
A news story has a <byline>, which in turn includes a <bytag> element that declares who wrote the story. The <bytag> element consists of string data. RELAX specifies this as follows:
<elementRule role="byline"> <ref label="bytag"/> </elementRule> <tag name="byline"/> <elementRule role="bytag" type="string"/> <tag name="bytag"/>
The type attribute specifies an element's datatype. Valid datatype names include those taken from XML Document Type Definitions, such as NMTOKEN and ID, as well as those introduced by XML Schema Part 2: Datatypes, such as string, float, and nonPositiveInteger.
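For example, given the byline rules above, the first fragment below should validate and the second should not. The name "Jane Doe" is made up for illustration.

```xml
<!-- Valid: <byline> contains exactly one <bytag>, whose content is a string -->
<byline><bytag>By Jane Doe</bytag></byline>

<!-- Invalid: <byline> has element content (a single <ref label="bytag"/>),
     so text directly inside it is not allowed -->
<byline>By Jane Doe</byline>
```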
Multiple Sub-Elements
As in HTML documents, an <nitf> news story element contains a <head> element followed by a <body> element. In a news story, both are required, so they form a <sequence>:
<elementRule role="nitf" label="nitf"> <sequence> <ref label="head"/> <ref label="body"/> </sequence> </elementRule>
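The fragments below sketch what this rule accepts and rejects. Only the order and presence of the two children is at issue here, so the contents of <head> and <body> are elided as comments; their own rules are not shown in this article.

```xml
<!-- Matches the nitf rule: <head> followed by <body> -->
<nitf>
  <head><!-- head content elided --></head>
  <body><!-- body content elided --></body>
</nitf>

<!-- Rejected: the <sequence> requires <head> before <body> -->
<nitf>
  <body><!-- body content elided --></body>
  <head><!-- head content elided --></head>
</nitf>

<!-- Rejected: both children are required -->
<nitf>
  <head><!-- head content elided --></head>
</nitf>
```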
On the other hand, some elements may contain sub-elements in any order. News stories, like HTML, can have tables, and their table rows (<tr>) can contain cells (either <td> or <th>) in any order:
<elementRule role="tr"> <choice occurs="*"> <ref label="td"/> <ref label="th"/> </choice> </elementRule>
The occurs attribute can be used with a <choice>, <sequence>, or <ref> tag. It has three possible values:
- * means the item occurs zero or more times;
- + means the item occurs one or more times;
- ? means the item occurs zero or one time.
If occurs is omitted, the item must appear exactly once.
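Applied to the table-row rule shown earlier, occurs="*" on the <choice> lets header and data cells appear any number of times, in any order. The fragments below assume, for illustration, that <td> and <th> accept plain text; their own rules are not shown here.

```xml
<!-- Matches the tr rule: cells in any order and any number -->
<tr>
  <th>Wins</th>
  <td>5</td>
  <td>3</td>
  <th>Losses</th>
</tr>

<!-- Also matches: "*" allows zero occurrences, so an empty row is accepted -->
<tr/>
```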
A news story's description list (<dl>) contains an optional list header (<lh>), followed by one or more groups, each made up of an optional description title (<dt>) and a required description entry (<dd>). You write it thus:
<elementRule role="dl"> <sequence> <ref label="lh" occurs="?"/> <sequence occurs="+"> <ref label="dt" occurs="?"/> <ref label="dd"/> </sequence> </sequence> </elementRule>
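Because the <dl> rule nests one <sequence> inside another, a couple of instances may help. These fragments assume, for illustration, that <lh>, <dt>, and <dd> accept plain text.

```xml
<!-- Matches: optional <lh>, then one or more groups of (<dt>?, <dd>).
     The second group has no <dt>, which occurs="?" permits -->
<dl>
  <lh>Terms</lh>
  <dt>RELAX</dt>
  <dd>A schema language for XML.</dd>
  <dd>Its specifications are themselves XML documents.</dd>
</dl>

<!-- Rejected: each group must end with a <dd> -->
<dl>
  <dt>RELAX</dt>
</dl>
```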
Mixed Content
Some elements contain text that isn't inside any of their sub-elements. For example, a news story <location> can look like this:
The movie was filmed in <location> <city>Pekin</city>, a small city in <state>Illinois</state> </location>.
The text ", a small city in " is inside the <location> element, but it isn't part of any sub-element. That makes <location> a mixed element, specified (in part) like this:
<elementRule role="location"> <mixed> <choice occurs="*"> <ref label="city"/> <ref label="state"/> <ref label="region"/> <!-- etc. --> </choice> </mixed> </elementRule>
Re-using Specifications
If you look at the specification for a news story, you'll find that a paragraph (<p>) is a mixed element that can contain, among others, <br>, <person>, <location>, and quoted phrase (<q>) sub-elements. Likewise, a quoted phrase can contain exactly the same sub-elements. Rather than specify the common sub-elements twice, RELAX allows you to specify them once in a hedge rule (in RELAX terminology, a hedge is a sequence of elements):
<hedgeRule label="common.elements"> <choice> <ref label="br"/> <ref label="person"/> <ref label="location"/> <ref label="q"/> </choice> </hedgeRule>
Once a hedge rule is established, other rules can refer to it by using the <hedgeRef> tag:
<elementRule role="p"> <mixed> <hedgeRef label="common.elements" occurs="*"/> </mixed> </elementRule> <elementRule role="q"> <mixed> <hedgeRef label="common.elements" occurs="*"/> </mixed> </elementRule>
Note that a <hedgeRule> can only describe elements; it cannot be <mixed>. That's why each <elementRule> has its own <mixed> tag in the example above.
You may have noticed that the specification says that a quoted phrase can contain another quoted phrase. While this may be unusual in a news story, it's technically possible, and RELAX handles such recursive content models without difficulty.
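For example, the following paragraph should validate against the p and q rules above: it holds text, a <location>, and a quoted phrase, and that quoted phrase holds another one. The wording is invented for illustration.

```xml
<!-- <p> and both <q> elements draw on the common.elements hedge;
     <location> and <city> follow the location rule shown earlier -->
<p>A resident of <location><city>Pekin</city></location> recalled,
  <q>My neighbor always said <q>this town is quiet</q>, and he was right.</q></p>
```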
Of course, XML does not consist of elements alone, as we'll see in the next section.
Attributes
XML tags can have attributes, and RELAX allows you to specify them in great detail. A news story, like HTML, can include an <img> tag, which has a required src attribute and optional width and height attributes. RELAX treats attributes as part of the tag, so the full specification of an image is as follows:
<elementRule role="img"> <sequence> <!-- its sub-elements --> </sequence> </elementRule> <tag name="img"> <attribute name="src" required="true" type="string"/> <attribute name="width" type="positiveInteger"/> <attribute name="height" type="positiveInteger"/> </tag>
Notice that RELAX lets you specify that an image's width and height must be positive integers.
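For illustration, here is how the <img> specification above treats a few instances. The file name is hypothetical.

```xml
<!-- Valid: src is present; width and height are positive integers -->
<img src="pekin.jpg" width="320" height="240"/>

<!-- Valid: width and height are optional -->
<img src="pekin.jpg"/>

<!-- Invalid: the required src attribute is missing -->
<img width="320" height="240"/>

<!-- Invalid: 0 is not a positiveInteger, and "wide" is not an integer at all -->
<img src="pekin.jpg" width="0" height="wide"/>
```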
Just as it was possible to create a reusable element specification, it is possible to create a set of attributes that can be reused by many tags. For example, both the table body (<tbody>) and table header cell (<th>) elements have identical attributes for determining their horizontal and vertical alignment. This makes those attributes a perfect candidate for an attribute pool:
<attPool role="alignment"> <attribute name="align" type="string"> <enumeration value="left"/> <enumeration value="center"/> <enumeration value="right"/> <enumeration value="justify"/> </attribute> <attribute name="valign" type="string"> <enumeration value="top"/> <enumeration value="middle"/> <enumeration value="bottom"/> <enumeration value="baseline"/> </attribute> </attPool>
This attribute pool can now be used in multiple tags by referring to its role:
<tag name="tbody"> <ref role="alignment"/> </tag> <tag name="th"> <ref role="alignment"/> <attribute name="rowspan" type="integer"> <minInclusive value="1"/> </attribute> <attribute name="colspan" type="integer"> <minInclusive value="1"/> </attribute> </tag>
There are two things to note about this example:
- The <th> tag has attributes in addition to those included via the reference to the attribute pool.
- It's possible to set a range of valid values for numeric attributes.
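To see both points at work, consider these <th> instances. They assume, for illustration, that <th> may contain plain text; its content rule is not shown here.

```xml
<!-- Valid: align and valign come from the "alignment" pool; rowspan is >= 1 -->
<th align="center" valign="top" rowspan="2">Score</th>

<!-- Invalid: "middle" is one of the valign values, not an align value -->
<th align="middle">Score</th>

<!-- Invalid: rowspan is below its minInclusive value of 1 -->
<th rowspan="0">Score</th>
```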
Context-sensitive Elements
Apart from the datatypes, practically everything we've done up to this point is possible with a DTD. Now let's examine something that RELAX can do and a DTD cannot.
As the definition of a news story currently stands, both a list item (<li>) and the <body.content> tag may contain an information block (<block>) element. An information block may contain, among other things, <p>, <ul>, <ol>, and <img> elements.
Let's say that we would like to set up news stories so that a block in the main body content can contain all these elements, but a block inside a list item may not contain images. To do this, we first set up the element rule for a <block> in the body content and the corresponding tag. Note that this rule has a label that is different from its role.
<elementRule role="block" label="block-in-content"> <mixed> <choice occurs="*"> <ref label="p"/> <ref label="ul"/> <ref label="ol"/> <ref label="img"/> </choice> </mixed> </elementRule> <tag name="block"/>
We then add another rule for the element that plays the role of a block, but this rule is labeled for use in a list.
<elementRule role="block" label="block-in-list"> <mixed> <choice occurs="*"> <ref label="p"/> <ref label="ul"/> <ref label="ol"/> </choice> </mixed> </elementRule>
Both of these rules are for elements that play the role of a “block.” In the definition of the <body.content> element, we refer to the appropriate <block> rule by its label:
<elementRule role="body.content"> <ref label="block-in-content"/> </elementRule>
And in the definition of a list item, we refer to the other rule.
<elementRule role="li"> <ref label="block-in-list"/> </elementRule>
The <block> tag will now be validated differently, depending on the context in which it appears.
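For example, with the rules above the first fragment should validate and the second should not. Both use an <img> with its required src attribute; the file name is hypothetical.

```xml
<!-- Valid: inside <body.content>, <block> must match the rule labeled
     block-in-content, which allows <img> -->
<body.content>
  <block>
    <p>The town square at dawn.</p>
    <img src="square.jpg"/>
  </block>
</body.content>

<!-- Invalid: inside <li>, <block> must match block-in-list, which has no <img> -->
<li>
  <block>
    <p>The town square at dawn.</p>
    <img src="square.jpg"/>
  </block>
</li>
```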
Controlling Element Models with Attributes
A news story can use the <money> tag to indicate that a number is a monetary amount. Its RELAX definition looks like this:
<elementRule role="money" type="decimal"/> <tag name="money"> <attribute name="unit" type="string"/> </tag>
Let's say we'd like to extend this definition so that there are two types of <money> elements, one for costs and another for balances. An item's cost is never negative; a company's balance can be either positive or negative. In other words, we'd like to be able to write:
After paying <money unit="dollar" usage="cost">5</money> dollars, the account had a balance of <money unit="dollar" usage="balance">-10</money> dollars.
In this case, rather than using the context to choose the content of a tag, we want to use one of the tag's attributes to determine what content that tag may contain. To accomplish this, we'll need two element rules.
<!-- costs are greater than or equal to zero --> <elementRule role="money-cost" label="money" type="decimal"> <minInclusive value="0"/> </elementRule> <!-- balances are unrestricted --> <elementRule role="money-balance" label="money" type="decimal"/>
In the previous section, we had element rules with the same role and different labels; here we have element rules with the same label and different roles. Now we define elements that refer to the different roles:
<tag name="money" role="money-cost"> <attribute name="unit" type="string"/> <attribute name="usage" type="string"> <enumeration value="cost"/> </attribute> </tag> <tag name="money" role="money-balance"> <attribute name="unit" type="string"/> <attribute name="usage" type="string"> <enumeration value="balance"/> </attribute> </tag>
Thus, the <money> tag plays the money-cost role when its usage attribute is cost, and it plays the money-balance role when its usage is balance.
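The difference shows up when a value is out of range. In the fragments below, the usage attribute selects the role, and the element rule for that role then checks the content.

```xml
<!-- Valid: usage="cost" selects money-cost, and 5 is >= 0 -->
<money unit="dollar" usage="cost">5</money>

<!-- Valid: usage="balance" selects money-balance, which allows any decimal -->
<money unit="dollar" usage="balance">-10</money>

<!-- Invalid: usage="cost" selects money-cost, and -5 is below minInclusive 0 -->
<money unit="dollar" usage="cost">-5</money>
```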
Summary
RELAX is a powerful markup language that permits you to specify how other XML documents are to be validated. As with other schema languages such as DTDs, you may:
- specify an element with an ordered sequence of sub-elements;
- specify an element with a choice of sub-elements;
- permit mixed content (text outside of tags); and
- specify attributes for tags.
Additionally, RELAX gives you the power to:
- specify in great detail the type and range of data that an element may contain;
- specify the type and range of values that an attribute may have;
- permit a single tag to have different content depending upon the context in which it is used; and
- permit a tag to have different content depending upon the value of that tag's attributes.