Using W3C XML Schema - Part 2
December 13, 2000
Advanced W3C XML Schema
This is the second part of our comprehensive tutorial and reference on W3C XML Schema. If you have not already read the first installment of Using XML Schemas, we advise you to do so before reading this article.
Content Types
In the first part of this series we examined the default content type behavior, modeled after data-oriented documents, where complex type elements are element and attribute only, and simple type elements are character data without attributes.
The W3C XML Schema Definition Language also supports the definition of empty content elements, and simple content elements (those that contain only character data) with attributes.
Empty content elements are defined using a regular xsd:complexType construct and by purposefully omitting the definition of a child element. The following construct defines an empty book element accepting an isbn attribute:
<xsd:element name="book">
  <xsd:complexType>
    <xsd:attribute name="isbn" type="isbnType"/>
  </xsd:complexType>
</xsd:element>
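As a minimal illustration, an instance conforming to this declaration could look like the following. The ISBN value is borrowed from the examples later in this article, and isbnType is assumed to accept it.

```xml
<!-- An empty book element: no child elements, no character data, only the isbn attribute. -->
<book isbn="0836217462"/>
```

The equivalent form with separate tags, <book isbn="0836217462"></book>, should be accepted as well, provided nothing appears between the start and end tags.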
Simple content elements, i.e. character data elements with attributes, can be derived from simple types using xsd:simpleContent. The book element defined above can thus be extended to accept a text value using:
<xsd:element name="book">
  <xsd:complexType>
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute name="isbn" type="isbnType"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>
Note the location of the attribute definition, showing that the extension is achieved through the addition of the attribute. This definition will accept the following XML element:
<book isbn="0836217462">
  Funny book by Charles M. Schulz.
  Its title (Being a Dog Is a Full-Time Job) says it all !
</book>
W3C XML Schema supports mixed content through the mixed attribute in the xsd:complexType element. Consider
<xsd:element name="book">
  <xsd:complexType mixed="true">
    <xsd:all>
      <xsd:element name="title" type="xsd:string"/>
      <xsd:element name="author" type="xsd:string"/>
    </xsd:all>
    <xsd:attribute name="isbn" type="xsd:string"/>
  </xsd:complexType>
</xsd:element>
which will validate an XML element such as
<book isbn="0836217462">
  Funny book by <author>Charles M. Schulz</author>.
  Its title (<title>Being a Dog Is a Full-Time Job</title>) says it all !
</book>
Unlike DTDs, W3C XML Schema mixed content doesn't modify the constraints on the sub-elements, which can be expressed in the same way as in element-only content models. While this is a significant improvement over XML 1.0 DTDs, note that the values of the character data, and its location relative to the child elements, cannot be constrained.
Constraints
W3C XML Schema provides several flexible XPath-based features for describing uniqueness constraints and corresponding reference constraints. The first of these, a simple uniqueness declaration, is declared with the xsd:unique element. The following declaration, within the context of our book document, indicates that the character name must be unique.
<xsd:unique name="charNameMustBeUnique">
  <xsd:selector xpath="character"/>
  <xsd:field xpath="name"/>
</xsd:unique>
The location of the xsd:unique element in the schema gives the context node in which the constraint holds. By inserting xsd:unique under our book element, we specify that the character has to be unique in the context of a book only.
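As a sketch of this placement, the fragment below nests the constraint inside the book element declaration, after its type definition. The content model shown for book (a title followed by repeated character elements with a name child) is an assumption made for illustration; only the xsd:unique element comes from the text above. The fragment is meant to sit inside an xsd:schema element that declares the xsd prefix.

```xml
<xsd:element name="book">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="title" type="xsd:string"/>
      <!-- Assumed, simplified character declaration. -->
      <xsd:element name="character" minOccurs="0" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="name" type="xsd:string"/>
          </xsd:sequence>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
    <xsd:attribute name="isbn" type="xsd:string"/>
  </xsd:complexType>
  <!-- The constraint follows the type definition; book is its context node. -->
  <xsd:unique name="charNameMustBeUnique">
    <xsd:selector xpath="character"/>
    <xsd:field xpath="name"/>
  </xsd:unique>
</xsd:element>
```

With the constraint placed here, two different book elements may each contain a character named Snoopy, but a single book may not contain two.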
The two XPaths defined in the uniqueness constraint are evaluated relative to the context node. The first of these paths is defined by the selector element. The purpose is to define the element which has the uniqueness constraint -- the node to which the selector points must be an element node.
The second path, specified in the xsd:field element, is evaluated relative to the element identified by the xsd:selector and can be an element or an attribute node. This is the node whose value will be checked for uniqueness. Uniqueness over a combination of several values can be specified by adding other xsd:field elements within xsd:unique.
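A combined constraint might be sketched as follows. The species attribute and the constraint name are hypothetical, introduced only to show one element field and one attribute field side by side.

```xml
<!-- Hypothetical: a character is identified by its name together with a species attribute. -->
<xsd:unique name="charNameAndSpeciesMustBeUnique">
  <xsd:selector xpath="character"/>
  <!-- Element child of the selected character. -->
  <xsd:field xpath="name"/>
  <!-- Attribute of the selected character. -->
  <xsd:field xpath="@species"/>
</xsd:unique>
```

Under this declaration, two characters may share a name as long as their species values differ; only the combination has to be unique.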
Keys
The second constraint construct, xsd:key, is similar to xsd:unique, except that the value specified as unique can be used as a key. This means that it has to be non-null, and that it can be referenced. To use the character name as a key, we can replace the xsd:unique by xsd:key.
<xsd:key name="charNameIsKey">
  <xsd:selector xpath="character"/>
  <xsd:field xpath="name"/>
</xsd:key>
The third construct, xsd:keyref, allows us to define a reference to a key. To show its usage, we introduce the friend-of element, to be used against characters.
<character>
  <name>Snoopy</name>
  <friend-of>Peppermint Patty</friend-of>
  <since>1950-10-04</since>
  <qualification> extroverted beagle </qualification>
</character>
To indicate that friend-of needs to refer to a character from the same book, we write, at the same level as we defined our key constraint, the following:
<xsd:keyref name="friendOfIsCharRef" refer="charNameIsKey">
  <xsd:selector xpath="character"/>
  <xsd:field xpath="friend-of"/>
</xsd:keyref>
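To make the effect of the key and its reference concrete, here is an abbreviated instance sketch. The other children of book and character are left out for brevity, so this is not a complete document for the schema built in this series.

```xml
<book isbn="0836217462">
  <character>
    <name>Snoopy</name>
    <!-- Satisfied: a character with the key "Peppermint Patty" exists in this book. -->
    <friend-of>Peppermint Patty</friend-of>
  </character>
  <character>
    <name>Peppermint Patty</name>
    <!-- Satisfied: "Snoopy" is a key in this book. A value such as "Woodstock"
         would violate friendOfIsCharRef, since no character here has that name. -->
    <friend-of>Snoopy</friend-of>
  </character>
</book>
```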
These capabilities are nearly independent of the other features in a schema. They are disconnected from the definition of the datatypes. The only point anchoring them to the schema is the place where they are defined, which establishes the scope of the uniqueness constraints.
Building Usable -- and Reusable -- Schemas
Perhaps the first step in writing reusable schemas is to document them. W3C XML Schema provides an alternative to XML comments and processing instructions that might be easier to handle for supporting tools.
Human readable documentation can be defined by xsd:documentation elements, while information targeted at applications should be included in xsd:appinfo elements. Both elements must be included in an xsd:annotation element. They accept optional xml:lang and source attributes. The source attribute is a URI reference that can be used to indicate the purpose of the appinfo to the processing application.
The xsd:annotation elements can be added at the beginning of most schema constructs, as shown in the example below. The appinfo section demonstrates how custom namespaces and schemes might allow the binding of an element to a Java class from within the schema.
<xsd:element name="book">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      Top level element.
    </xsd:documentation>
    <xsd:documentation xml:lang="fr">
      Element racine.
    </xsd:documentation>
    <xsd:appinfo source="http://example.com/foo/">
      <bind xmlns="http://example.com/bar/">
        <class name="Book"/>
      </bind>
    </xsd:appinfo>
  </xsd:annotation>
  ...
Composing schemas from multiple files
For those who want to define a schema using several XML documents -- either to split up a large schema or to use libraries of schema snippets -- W3C XML Schema provides two mechanisms for including external schemas.
The first, xsd:include, is similar to a copy and paste of the definitions of the included schema: it's an inclusion, and as such it doesn't allow any overriding of definitions of the included schema. It can be used in this way:
<xsd:include schemaLocation="character.xsd"/>
The second inclusion mechanism, xsd:redefine, is similar to xsd:include, except that it lets you redefine the declarations from the included schema.
<xsd:redefine schemaLocation="character12.xsd">
  <xsd:simpleType name="nameType">
    <xsd:restriction base="xsd:string">
      <xsd:maxLength value="40"/>
    </xsd:restriction>
  </xsd:simpleType>
</xsd:redefine>
Note that the declarations that are redefined must be placed in the xsd:redefine element.
We've already seen many features that can be used together with xsd:include and xsd:redefine to create libraries of schemas. We've seen how we can reference previously defined elements; how we can define datatypes by derivation and use them; and how we can define and use groups of attributes. We've also seen the parallel between elements and objects and datatypes and classes. There are other features borrowed from object oriented design that can be used to create reusable schemas.
Abstract types
The first feature derived from object oriented design is the substitution group. Unlike the features we've seen so far, a substitution group isn't defined explicitly through a W3C XML Schema element but through referencing a common element (called the head), using a substitutionGroup attribute. The head element doesn't hold any specific declaration but must be global. All the elements within a substitution group need to have a type that is either the same type as the head element, or can be derived from it. Then they can all be used in place of the head element. In the following example the element "surname" can be used anywhere an element "name" has been defined.
<xsd:element name="name" type="xsd:string"/>
<xsd:element name="surname" type="xsd:string" substitutionGroup="name"/>
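The following fragment sketches how the substitution would show up in practice. The character declaration is a simplified assumption (the real character type in this series has more children); what matters is that it refers to the head element name by reference.

```xml
<!-- Assumed, simplified declaration: a character refers to the global head element "name". -->
<xsd:element name="character">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="name"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

<!-- Instance documents may then use either form:
     <character><name>Snoopy</name></character>
     <character><surname>Snoopy</surname></character>
-->
```

Note that the substitution applies only where name is used by reference; a local element that merely happens to be called name is not affected.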
Now we can also define a generic "name-elt" element, head of a substitution group, that couldn't be used directly but should be used in one of its derived forms. This is done through declaring the element as abstract, analogously to abstract classes in object oriented languages. The following example defines name-elt as an abstract element that should be replaced by either name or surname everywhere it is referenced.
<xsd:element name="name-elt" type="xsd:string" abstract="true"/>
<xsd:element name="name" type="xsd:string" substitutionGroup="name-elt"/>
<xsd:element name="surname" type="xsd:string" substitutionGroup="name-elt"/>
Final types
We could, on the other hand, wish to control derivation performed on a datatype. W3C XML Schema supports this through the final attribute in an xsd:complexType or xsd:element element. This attribute can take the values restriction, extension and #all to block derivation by restriction, extension or any derivation. The following snippet would, for instance, forbid any derivation of the characterType complex type.
<xsd:complexType name="characterType" final="#all">
The final attribute can operate only on elements and complex types. W3C XML Schema provides a fine-grained mechanism that operates on each facet to control the derivation of simple types. This attribute is called fixed, and when its value is set to true, the facet cannot be further modified (but other facets can still be added or modified). The following prevents the size of our nameType simple type from being redefined.
<xsd:simpleType name="nameType">
  <xsd:restriction base="xsd:string">
    <xsd:maxLength value="32" fixed="true"/>
  </xsd:restriction>
</xsd:simpleType>
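To illustrate what the fixed facet does and does not prevent, consider a derived type along these lines. The name shortNameType and the facet values are hypothetical.

```xml
<!-- Hypothetical type derived from the nameType defined above. -->
<xsd:simpleType name="shortNameType">
  <xsd:restriction base="nameType">
    <!-- Allowed: minLength was not fixed in nameType, so it can still be added. -->
    <xsd:minLength value="1"/>
    <!-- Not allowed: changing the fixed facet, for example
         <xsd:maxLength value="16"/>
         should be reported as an error by a conforming processor. -->
  </xsd:restriction>
</xsd:simpleType>
```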
Namespaces
Namespace support in W3C XML Schema is flexible yet straightforward. It not only allows the use of any prefix in instance documents (unlike DTDs), but also lets you open your schemas to accept unknown elements and attributes from known or unknown namespaces.
Each W3C XML Schema document is bound to a specific namespace through the targetNamespace attribute or to the absence of namespace through the lack of such an attribute. We need at least one schema document per namespace we want to define (elements and attributes without namespaces can be defined in any schema, though).
Until now we have omitted the targetNamespace attribute, which means that we were working without namespaces. To get into namespaces, let's imagine that our example belongs to a single namespace.
<book isbn="0836217462" xmlns="http://example.org/ns/books/">
The least intrusive way to adapt our schema is to add more attributes to our xsd:schema element.
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
            xmlns="http://example.org/ns/books/"
            targetNamespace="http://example.org/ns/books/"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">
The namespace declarations play an important role. The first (xmlns:xsd="http://www.w3.org/2000/10/XMLSchema") says not only that we've chosen to use the prefix xsd to identify the elements that will be W3C XML Schema instructions, but also that we will prefix the W3C XML Schema predefined datatypes with xsd, as we have done in all our examples thus far. Understand that we could have chosen any prefix instead of xsd. We could even make http://www.w3.org/2000/10/XMLSchema our default namespace. In this case, we would not have prefixed the W3C XML Schema elements.
Since we are working with the http://example.org/ns/books/ namespace, we define it as our default namespace. This means that we won't prefix the references to objects (datatypes, elements, attributes, etc.) belonging to this namespace. Again we could have chosen any prefix to identify this namespace.
The targetNamespace attribute lets you define, independently of the namespace declarations, which namespace is described in this schema. If you need to reference objects belonging to this namespace, which is usually the case except when using a pure Russian Doll design, you need to provide a namespace declaration in addition to the targetNamespace.
The final two attributes in the example, elementFormDefault and attributeFormDefault, are a facility provided by W3C XML Schema to control, within a single schema, whether attributes and elements are considered by default to be qualified (in a namespace). This differentiation between qualified and unqualified can be indicated by specifying the default values, as above, but also when defining the element or attribute, by adding a form attribute of value qualified or unqualified.
It is important to note that only local elements and attributes can be specified as unqualified. All globally defined elements and attributes must always be qualified.
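The interplay of the defaults and the form attribute can be sketched with a small schema. The note element and the bk prefix are hypothetical; the namespace URIs are the ones used in this article.

```xml
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
            xmlns="http://example.org/ns/books/"
            targetNamespace="http://example.org/ns/books/"
            elementFormDefault="qualified"
            attributeFormDefault="unqualified">
  <!-- Global element: always in the target namespace. -->
  <xsd:element name="book">
    <xsd:complexType>
      <xsd:sequence>
        <!-- Local element, qualified by the schema-wide default. -->
        <xsd:element name="title" type="xsd:string"/>
        <!-- Hypothetical local element that overrides the default. -->
        <xsd:element name="note" type="xsd:string" form="unqualified"/>
      </xsd:sequence>
      <!-- Local attribute, unqualified by the schema-wide default. -->
      <xsd:attribute name="isbn" type="xsd:string"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

<!-- A matching instance, using a prefix so that note can stay outside any namespace:
<bk:book xmlns:bk="http://example.org/ns/books/" isbn="0836217462">
  <bk:title>Being a Dog Is a Full-Time Job</bk:title>
  <note>Unqualified local element.</note>
</bk:book>
-->
```

The instance uses an explicit prefix rather than a default namespace declaration; with xmlns="http://example.org/ns/books/" on book, an unprefixed note would fall into the books namespace and no longer match the unqualified declaration.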
Importing definitions from external namespaces
W3C XML Schema, not unlike XSLT and XPath, uses namespace prefixes within the value of some attributes to identify the namespace of data types, elements, attributes, etc. For instance, we've used this feature all along in our examples to identify the W3C XML Schema predefined datatypes. This mechanism can be extended to import definitions from any other namespace and so reuse them in our schemas.
Reusing definitions from other namespaces is done through a three-step process. This process needs to be done even for the XML 1.0 namespace in order to declare attributes such as xml:lang. First, the namespace must be defined as usual.
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
            targetNamespace="http://example.org/ns/books/"
            xmlns:xml="http://www.w3.org/XML/1998/namespace"
            elementFormDefault="qualified">
Then W3C XML Schema needs to be informed of the location at which it can find the schema corresponding to the namespace. This is done using an xsd:import element.
<xsd:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="myxml.xsd"/>
W3C XML Schema now knows that it should attempt to find any reference belonging to the XML namespace in a schema located at myxml.xsd. We can now use the external definition.
<xsd:element name="title">
  <xsd:complexType>
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute ref="xml:lang"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>
You may wonder why we've chosen to reference the xml:lang attribute from the XML namespace rather than creating an attribute with a type xml:lang. We've done so because there is an important difference between referencing an attribute (or an element) and referencing a datatype when namespaces are concerned.
- Referencing an element or an attribute imports the whole thing with its name and namespace.
- Referencing a datatype imports only its definition, leaving you with the task of giving a name to the element or attribute you're defining, and places your definition in the target namespace (or no namespace if your attribute or element is unqualified).
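The two cases can be contrasted in a short fragment. Here the predefined xsd:language datatype stands in for the imported type, and the attribute name lang is our own choice; both are assumptions made for illustration.

```xml
<!-- Reference: the attribute keeps its name and namespace,
     so instance documents write xml:lang="en". -->
<xsd:attribute ref="xml:lang"/>

<!-- Type only: we pick the name ourselves. With attributeFormDefault="unqualified",
     instance documents write lang="en", an attribute in no namespace. -->
<xsd:attribute name="lang" type="xsd:language"/>
```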
Including unknown elements
To finish this section about namespaces, we need to see how, as promised in the introduction, we can open our schema to unknown elements, attributes and namespaces. This is done using xsd:any and xsd:anyAttribute, allowing, respectively, the inclusion of any element or attribute.
For instance, if we want to extend the definition of our description type to any XHTML tag, we could declare
<xsd:complexType name="descType" mixed="true">
  <xsd:sequence>
    <xsd:any namespace="http://www.w3.org/1999/xhtml"
             minOccurs="0" maxOccurs="unbounded"
             processContents="skip"/>
  </xsd:sequence>
</xsd:complexType>
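An element of this type might then look like the sketch below. The description element name and the h prefix are assumptions, and the element is taken to be declared without a target namespace.

```xml
<description xmlns:h="http://www.w3.org/1999/xhtml">
  Funny book by <h:em>Charles M. Schulz</h:em>.
  Its title (<h:cite>Being a Dog Is a Full-Time Job</h:cite>) says it all!
</description>
```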
The xsd:anyAttribute gives the same functionality for attribute definitions.
The type descType is now mixed content and accepts an unbounded number of any elements from the http://www.w3.org/1999/xhtml namespace. The processContents attribute is set to skip, telling a W3C XML Schema processor that no validation of these elements should be attempted. The other permissible values for this attribute are strict, asking to validate these elements, or lax, asking the processor to validate them when possible. The namespace attribute accepts a whitespace-separated list of URIs, as well as the special values ##any (any namespace), ##local (non-qualified elements), ##targetNamespace (the target namespace) or ##other (any namespace other than the target).
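As a sketch of the attribute wildcard combined with these values, descType could be opened to foreign attributes as follows. The choice of ##other and lax is ours, picked only to show the special values in use.

```xml
<xsd:complexType name="descType" mixed="true">
  <xsd:sequence>
    <xsd:any namespace="http://www.w3.org/1999/xhtml"
             minOccurs="0" maxOccurs="unbounded"
             processContents="skip"/>
  </xsd:sequence>
  <!-- Accept any attribute from a namespace other than the target namespace,
       validating it only when a declaration is available. -->
  <xsd:anyAttribute namespace="##other" processContents="lax"/>
</xsd:complexType>
```

The attribute wildcard is placed after the content model, in the same position as ordinary attribute declarations.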
W3C XML Schema and Instance Documents
We've now covered most of the features of W3C XML Schema, but we still need to have a glance at some extensions that you can use within your instance documents. In order to differentiate these other features, a separate namespace, http://www.w3.org/2000/10/XMLSchema-instance, is used, usually associated with the prefix xsi.
The xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes allow you to tie a document to its W3C XML Schema. This link is not mandatory, and other indications can be given using application-dependent mechanisms (such as a parameter on a command line), but it does help W3C XML Schema aware tools to locate a schema.
Depending on whether namespaces are used, the link will be either
<book isbn="0836217462"
      xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="file:library.xsd">
or, as below (note the syntax: the URI of the namespace and the URI of the schema are separated by whitespace within the same attribute)
<book isbn="0836217462"
      xmlns="http://example.org/ns/books/"
      xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
      xsi:schemaLocation="http://example.org/ns/books/ file:library.xsd">
The other use of xsi attributes is to provide information about how an element corresponds to a schema. These attributes are xsi:type, which lets you define the simple or complex type of an element, and xsi:null, which lets you specify a null value for an element (that has to be defined as nullable="true" in the schema). You don't need to declare these attributes in your schema to be able to use them in an instance document.
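A sketch of both attributes in an instance document follows. It assumes that the schema declares since with a type compatible with xsd:date and declares qualification with nullable="true"; neither declaration is shown in this article, and the namespace URIs are the ones used throughout it.

```xml
<character xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
           xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
  <name>Snoopy</name>
  <friend-of>Peppermint Patty</friend-of>
  <!-- xsi:type names the type this element should be validated against;
       the xsd prefix must be declared in the instance for the value to resolve. -->
  <since xsi:type="xsd:date">1950-10-04</since>
  <!-- xsi:null marks the element as carrying a null value; it must then be empty. -->
  <qualification xsi:null="true"/>
</character>
```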