Valid Frustrations
September 26, 2001
Q: I need help with two DTD questions...
I am having trouble with a DTD not enforcing some rules I'm trying to create.
Rule 1: Enforce a range of occurrences of one element inside another
For example, a fruit_basket element must contain between 9 and 11 banana elements. I think the following works:
<!ELEMENT fruit_basket (
    (banana, banana, banana, banana, banana,
     banana, banana, banana, banana) |
    (banana, banana, banana, banana, banana,
     banana, banana, banana, banana, banana) |
    (banana, banana, banana, banana, banana, banana,
     banana, banana, banana, banana, banana) )>
Is there a better way to do this?
A: No. Frustrating, isn't it?
What you're working with here is called a content model for (in this case) the fruit_basket element type. This is a very simple example; constructing a content model is even worse for, say, a hypothetical month element in even a simple calendar application: some months may legally contain 31 days, some 30, and one either 28 or 29, depending on the year.
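The month case shows where the difficulty lies. The number of day children a hypothetical month element should hold depends on which month and year it represents, and a DTD has no way to consult that data. The following Python sketch illustrates the point using the standard library's `calendar.monthrange`. The name `days_required` is illustrative only.

```python
import calendar


def days_required(year: int, month: int) -> int:
    """Number of day children a hypothetical <month> element should contain.

    calendar.monthrange returns (weekday of the 1st, number of days in the month),
    so the answer depends on the year, which is data a DTD cannot inspect.
    """
    return calendar.monthrange(year, month)[1]


for year, month in [(2001, 9), (2001, 10), (2001, 2), (2000, 2)]:
    print(year, month, days_required(year, month))  # 30, 31, 28, 29 days respectively
```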
As you probably know (or can guess from the name), a content model specifies which child elements a given element may contain, in what sequence, and how many of each. It's the "how many" specification that's giving you fits here. The only shortcuts available are the following special characters, which may be appended to a child element name (or to a parenthesized group of children) in the content model:
Character | Meaning |
---|---|
(none) | This child must occur exactly once |
+ | This child must occur one or more times |
? | This child may occur once, or not at all |
* | Any number of occurrences of this child is legitimate (the "0 or more" option) |
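One way to read the table is that each indicator is shorthand for a fixed (minimum, maximum) pair, and those four pairs are all a DTD offers. The Python sketch below models that idea. The names `OCCURRENCE`, `count_allowed`, and `expresses_range` are illustrative and are not part of any XML API.

```python
# Each DTD occurrence indicator as a (minimum, maximum) pair;
# None stands for "no upper bound".
OCCURRENCE = {
    "": (1, 1),      # no indicator: exactly once
    "?": (0, 1),     # once, or not at all
    "+": (1, None),  # one or more
    "*": (0, None),  # zero or more
}


def count_allowed(indicator: str, count: int) -> bool:
    """Return True if `count` occurrences of a child satisfy `indicator`."""
    low, high = OCCURRENCE[indicator]
    return count >= low and (high is None or count <= high)


def expresses_range(indicator: str, low: int, high: int) -> bool:
    """True if `indicator` accepts exactly the counts low..high (checked up to high + 1)."""
    return all(count_allowed(indicator, n) == (low <= n <= high)
               for n in range(high + 2))


# No single indicator accepts exactly 9 to 11 bananas.
print([ind for ind in OCCURRENCE if expresses_range(ind, 9, 11)])  # []
```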
For instance, you can require that a fruit_basket element must have at least one banana element like this:
<!ELEMENT fruit_basket (banana+)>
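Within those limits, the only way to say "9 to 11" in a DTD is to spell the bananas out, as the question does. There is a wrinkle, though. In the question's three-way choice, every branch begins with banana, so a parser reading the first banana cannot tell which branch it is in. XML 1.0 calls such a non-deterministic content model an error "for compatibility", and some validating parsers reject it. A deterministic alternative writes out the required bananas and nests the optional extras. The Python sketch below generates that form. `range_content_model` is a hypothetical helper; the DTD still enumerates every banana, and the script only does the typing for you.

```python
def range_content_model(child: str, low: int, high: int) -> str:
    """Build a deterministic DTD content model allowing low..high `child` elements.

    Required occurrences are written out; the optional extras are nested as
    (child, (child, child?)?)? so each child in a document can be matched to
    a single position in the model without looking ahead.
    """
    if low < 0 or high < 1 or low > high:
        raise ValueError("need 0 <= low <= high and high >= 1")
    optional = None
    for _ in range(high - low):
        # Wrap the previous optional tail: b? -> (b, b?)? -> (b, (b, b?)?)?
        optional = f"{child}?" if optional is None else f"({child}, {optional})?"
    parts = [child] * low + ([optional] if optional else [])
    return "(" + ", ".join(parts) + ")"


print(f"<!ELEMENT fruit_basket {range_content_model('banana', 9, 11)}>")
```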
This limitation of DTD content models is one that XML Schema was designed to fix. I don't have space to provide details of that spec here, but, in general, an element type's content model is defined with an xsd:complexType element. Inside it, a model group such as xsd:sequence or xsd:choice contains various xsd:element elements, each of which may have a minOccurs and a maxOccurs attribute. The values of these attributes are non-negative integers (maxOccurs may also be "unbounded"), representing respectively the minimum and maximum number of times that child element type may appear within that parent. The default value for both is 1, which is consistent with DTD syntax.
Thus, a simple XML Schema declaration of the fruit_basket element type, with your desired number of banana children, might look like this:
<xsd:element name="fruit_basket">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="banana" minOccurs="9" maxOccurs="11"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>
Using XML Schema may not solve all your problems: the spec is still so new that it's not as widely supported as DTDs. But it at least gets you in the right ballpark.
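To show what minOccurs and maxOccurs mean in practice, here is a toy Python sketch using the standard `xml.etree.ElementTree` module. It reads those two attributes from a small schema and checks a document against them. It is not a schema validator: it ignores everything else in the schema. The function names are illustrative, and the schema text is assumed to wrap the declaration in an xsd:schema element.

```python
import xml.etree.ElementTree as ET

XSD = "{http://www.w3.org/2001/XMLSchema}"

SCHEMA = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="fruit_basket">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="banana" minOccurs="9" maxOccurs="11"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""


def occurrence_limits(schema_text, child):
    """Return (minOccurs, maxOccurs) for a child declaration; both default to 1.

    maxOccurs="unbounded" is returned as None.
    """
    for decl in ET.fromstring(schema_text).iter(XSD + "element"):
        if decl.get("name") == child:
            low = int(decl.get("minOccurs", "1"))
            high = decl.get("maxOccurs", "1")
            return low, (None if high == "unbounded" else int(high))
    raise KeyError(child)


def count_ok(parent, child, limits):
    """Check how many direct `child` elements `parent` has against the limits."""
    low, high = limits
    n = len(parent.findall(child))
    return low <= n and (high is None or n <= high)


limits = occurrence_limits(SCHEMA, "banana")
basket = ET.fromstring("<fruit_basket>" + "<banana/>" * 10 + "</fruit_basket>")
print(limits, count_ok(basket, "banana", limits))  # (9, 11) True
```

For real validation, use a schema-aware processor; the point here is only that the range lives in two attributes rather than in an enumerated content model.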
Rule 2: Enforce a numbering scheme on child elements to distinguish them from their siblings
Example: Suppose a fruit_basket contains three bananas. Each banana should be numbered 1 through 3; this can be done as an attribute or an element. But I don't know how to do this using either method. If there were only one fruit_basket, I could use ID-type attributes. But those IDs need to be unique over the entire document, and my document will contain multiple fruit_baskets. What if each needs to have a "banana #1"?
A: The answer to this question is the same as the answer to your first: you're asking DTDs to do something they can't do.
What you're after here is some way to constrain the document's content, not its structure.
DTDs absolutely cannot constrain an element's text (#PCDATA) content. (The XML spec itself loosely constrains that content: it must fall within certain specified ranges of Unicode values, and it may not include unescaped markup-significant characters like < and &.) That leaves you with the "constrain via attribute values" approach.
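The "specified ranges" come from the Char production in the XML 1.0 Recommendation. Rendering that production as a Python predicate shows how little the spec says about text content: every digit passes, so nothing stops a "banana #25". The name `is_xml_char` is illustrative.

```python
def is_xml_char(ch: str) -> bool:
    """True if `ch` matches the XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Character-level legality says nothing about meaning: both strings pass.
print(all(is_xml_char(c) for c in "banana #1"), all(is_xml_char(c) for c in "banana #25"))
```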
You can approximate, yet still be frustratingly far away from, an answer using an ATTLIST declaration for the banana element which restricts a banana_number attribute to the values 1, 2, or 3. For instance, your DTD might look something like this:
<!ELEMENT fruit_basket (banana*)>
<!ELEMENT banana EMPTY>
<!ATTLIST banana banana_number (1 | 2 | 3) "1" >
Again, though, this isn't a complete (or even very satisfying) solution:
- You still can't limit the number of bananas in a fruit_basket in a useful way.
- You can't guarantee that there's a relationship between the actual number of banana children and their banana_number attribute values. (This DTD allows you to have 25 banana children in a fruit_basket, for instance, each with a banana_number whose value is 1.)
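To make the second point concrete, here is a Python sketch that applies only the attribute constraint the DTD above expresses, ignoring its element declarations. It uses the standard `xml.etree.ElementTree` module rather than a validating parser. `dtd_style_check` is an illustrative name, and the missing-attribute default of "1" mirrors the ATTLIST default.

```python
import xml.etree.ElementTree as ET


def dtd_style_check(basket):
    """Apply only the ATTLIST rule: each banana_number (default "1")
    must be one of the enumerated values 1, 2, or 3."""
    return all(b.get("banana_number", "1") in {"1", "2", "3"}
               for b in basket.findall("banana"))


# 25 bananas, every one numbered 1: the enumeration check still passes.
doc = "<fruit_basket>" + '<banana banana_number="1"/>' * 25 + "</fruit_basket>"
print(dtd_style_check(ET.fromstring(doc)))  # True
```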
The kinds of problems you're struggling to solve here might be amenable to using XML Schema. But there's another, often overlooked approach to validating document content (of both elements and attributes) which stands completely outside the normal DTD-vs.-XML Schema axis: validate with an XSLT stylesheet.
But I don't want to minimize how much work may be involved, especially if you aren't already comfortable with XSLT. Still, here's a stylesheet which tests for both the number of banana elements and the correspondence between the banana_number attribute's value and that banana element's ordinal position within its fruit_basket parent:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="http://www.w3.org/1999/xhtml" version="1.0">
  <!-- Process fruit_basket element(s) -->
  <xsl:template match="fruit_basket">
    <html>
      <body>
        <!-- Validate number of banana children in fruit_basket -->
        <xsl:choose>
          <!-- Note escaped form of boolean > and < operators -->
          <xsl:when test="count(banana) &gt; 8 and count(banana) &lt; 12">
            <h3># of banana children OK</h3>
          </xsl:when>
          <xsl:otherwise>
            <h3>Whoops! # of banana children is <xsl:value-of select="count(banana)"/></h3>
          </xsl:otherwise>
        </xsl:choose>
        <!-- Set up table of info about banana children -->
        <table border="1">
          <tr>
            <th>banana #</th>
            <th>banana_number</th>
          </tr>
          <!-- Process all banana children of fruit_basket -->
          <xsl:apply-templates select="banana"/>
        </table>
      </body>
    </html>
  </xsl:template>
  <!-- Process banana element(s) -->
  <xsl:template match="banana">
    <!-- Each banana element goes in its own table row -->
    <tr>
      <th><xsl:value-of select="position()"/></th>
      <td>
        <!-- Test for banana's position matching banana_number attribute value -->
        <xsl:choose>
          <xsl:when test="position() = @banana_number">
            OK
          </xsl:when>
          <xsl:otherwise>
            <strong>Whoops!</strong>... <xsl:value-of select="@banana_number"/>
          </xsl:otherwise>
        </xsl:choose>
      </td>
    </tr>
  </xsl:template>
</xsl:stylesheet>
This stylesheet "transforms" the source document into an XHTML document displaying the result of the validation process. (Alternatively, you can use the xsl:message element to notify the source document's author of the document's validity instead of transforming to XHTML; how those messages are displayed depends on your XSLT processor.)
Assume the following simple document:
<fruit_basket>
<banana banana_number="8"/>
<banana banana_number="2"/>
</fruit_basket>
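With this document as its source tree, the stylesheet produces XHTML which looks like the figure at right when viewed in a browser. The heading reports that the number of banana children is 2, the first banana's row is flagged (its banana_number is 8), and the second banana's row reads OK.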
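If XSLT isn't your comfort zone, the same two checks can be written in a general-purpose language. The sketch below uses Python's standard `xml.etree.ElementTree` in place of XSLT. It is a swapped-in technique, not the stylesheet itself, and `check_basket` is an illustrative name. Unlike XPath's numeric comparison, it treats a missing or non-decimal banana_number as a mismatch.

```python
import xml.etree.ElementTree as ET


def check_basket(basket, low=9, high=11):
    """Mirror the stylesheet's two checks; return one report line per result."""
    bananas = basket.findall("banana")
    if low <= len(bananas) <= high:
        report = ["# of banana children OK"]
    else:
        report = [f"Whoops! # of banana children is {len(bananas)}"]
    # Position counts only banana children, as position() does after
    # <xsl:apply-templates select="banana"/>.
    for position, banana in enumerate(bananas, start=1):
        number = banana.get("banana_number", "")
        ok = number.isdecimal() and int(number) == position
        report.append(f"{position}: OK" if ok else f"{position}: Whoops! {number}")
    return report


sample = """<fruit_basket>
  <banana banana_number="8"/>
  <banana banana_number="2"/>
</fruit_basket>"""

for line in check_basket(ET.fromstring(sample)):
    print(line)
```

Run on the sample document, it reports the same three findings as the stylesheet: only 2 bananas, a mismatch for the first one, and OK for the second.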
Note that this approach to validation is codified in the Schematron project. It's an extremely powerful (and cool) way to perform almost any "validation" you can think of, without the limitations of either DTDs or XML Schema.
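Schematron expresses such checks as declarative rules: a context, an assertion to test there, and a message to report when the assertion fails. The sketch below imitates that shape in Python. It is not Schematron, and it uses none of Schematron's syntax or tooling. Python functions stand in for its XPath tests, and `RULES` and `validate` are illustrative names.

```python
import xml.etree.ElementTree as ET

# Each rule: (context element name, assertion about that element, message).
# Plain Python functions stand in for the XPath tests Schematron would use;
# banana_number is compared as a string here, for simplicity.
RULES = [
    ("fruit_basket",
     lambda basket: 9 <= len(basket.findall("banana")) <= 11,
     "a fruit_basket must contain 9 to 11 banana elements"),
    ("fruit_basket",
     lambda basket: all(b.get("banana_number") == str(i)
                        for i, b in enumerate(basket.findall("banana"), start=1)),
     "each banana_number must equal the banana's position"),
]


def validate(root):
    """Return the message of every failed assertion, in rule order."""
    return [message
            for context, test, message in RULES
            for node in root.iter(context)
            if not test(node)]


good = ET.fromstring("<fruit_basket>"
                     + "".join(f'<banana banana_number="{i}"/>' for i in range(1, 10))
                     + "</fruit_basket>")
print(validate(good))  # []
```

Keeping the rules as data, separate from the code that applies them, is the design idea borrowed from Schematron. Adding a constraint means adding a rule, not another branch in a stylesheet.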