Using W3C XML Schema
October 17, 2001
The W3C XML Schema Definition Language is an XML language for describing and constraining the content of XML documents. W3C XML Schema is a W3C Recommendation.
This article is an introduction to using W3C XML Schemas, and also includes a comprehensive reference to the Schema datatypes and structures.
(Editor's note: this tutorial has been updated since its first publication in 2000, to reflect the finalization of W3C XML Schema as a Recommendation.)
Introducing our First Schema
Let's start by having a look at this simple document which describes a book:
<?xml version="1.0" encoding="UTF-8"?> <book isbn="0836217462"> <title> Being a Dog Is a Full-Time Job </title> <author>Charles M. Schulz</author> <character> <name>Snoopy</name> <friend-of>Peppermint Patty</friend-of> <since>1950-10-04</since> <qualification> extroverted beagle </qualification> </character> <character> <name>Peppermint Patty</name> <since>1966-08-22</since> <qualification>bold, brash and tomboyish</qualification> </character> </book>
Get a copy of library1.xml for reference.
To write a schema for this document, we could simply follow its structure and define each element as we find it. To start, we open an xs:schema element:
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> .../... </xs:schema>
The schema element opens our schema. It can also hold the definition of the target namespace and several default options, some of which we will see in the following sections.
To match the start tag for the book element, we define an element named book. This element has attributes and non-text children, so we treat it as a complexType (the other kind of datatype, simpleType, is reserved for datatypes holding only values, with no element or attribute sub-nodes). The list of children of the book element is described by a sequence element:
<xs:element name="book"> <xs:complexType> <xs:sequence> .../... </xs:sequence> .../... </xs:complexType> </xs:element>
The sequence is a "compositor" that defines an ordered sequence of sub-elements. We will see the two other compositors, choice and all, in the following sections.
Now we can define the title and author elements as simple types -- they don't have attributes or non-text children and can each be described directly within a single, empty xs:element element. The type (xs:string) is prefixed by the namespace prefix associated with XML Schema, indicating a predefined XML Schema datatype:
<xs:element name="title" type="xs:string"/> <xs:element name="author" type="xs:string"/>
Now, we must deal with the character element, a complex type. Note how its cardinality is defined:
<xs:element name="character" minOccurs="0" maxOccurs="unbounded"> <xs:complexType> <xs:sequence> .../... </xs:sequence> </xs:complexType> </xs:element>
Unlike other schema definition languages, W3C XML Schema lets us define the cardinality of an element (i.e. the number of its possible occurrences) with some precision. We can specify both minOccurs (the minimum number of occurrences) and maxOccurs (the maximum number of occurrences). Here maxOccurs is set to unbounded, which means that there can be as many occurrences of the character element as the author wishes. Both attributes have a default value of one.
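As a minimal sketch of how the two attributes combine, the schema below declares four children with different cardinalities. The playlist vocabulary is hypothetical and is not part of the book example; only minOccurs, maxOccurs and their defaults come from the text above.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- hypothetical vocabulary, used only to illustrate cardinality -->
  <xs:element name="playlist">
    <xs:complexType>
      <xs:sequence>
        <!-- defaults apply (minOccurs=1, maxOccurs=1): exactly one -->
        <xs:element name="title" type="xs:string"/>
        <!-- optional: zero or one -->
        <xs:element name="subtitle" type="xs:string" minOccurs="0"/>
        <!-- bounded range: between two and five -->
        <xs:element name="track" type="xs:string" minOccurs="2" maxOccurs="5"/>
        <!-- zero or more -->
        <xs:element name="note" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```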
We then specify the list of all its children in the same way:
<xs:element name="name" type="xs:string"/> <xs:element name="friend-of" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> <xs:element name="since" type="xs:date"/> <xs:element name="qualification" type="xs:string"/>
And we terminate its description by closing the sequence, complexType and element elements.
We can now declare the attributes of the document elements, which must always come last. There appears to be no special reason for this, but the W3C XML Schema Working Group has considered that it was simpler to impose a relative order to the definitions of the list of elements and attributes within a complex type, and that it was more natural to define the attributes after the elements.
<xs:attribute name="isbn" type="xs:string"/>
And close all the remaining elements.
That's it! This first design, sometimes known as "Russian Doll Design" tightly follows the structure of our example document.
One of the key features of such a design is that each element and attribute is defined within its context, so that multiple occurrences of the same element name can carry different definitions.
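To make this concrete, here is a small sketch in which the element name "name" is declared twice, locally, with two different definitions. The library and owner elements are hypothetical and do not appear in the example document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- "library" and "owner" are hypothetical, for illustration only -->
  <xs:element name="library">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="book">
          <xs:complexType>
            <xs:sequence>
              <!-- inside book, "name" is a plain string -->
              <xs:element name="name" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
        <xs:element name="owner">
          <xs:complexType>
            <xs:sequence>
              <!-- inside owner, "name" has its own structure -->
              <xs:element name="name">
                <xs:complexType>
                  <xs:sequence>
                    <xs:element name="first" type="xs:string"/>
                    <xs:element name="last" type="xs:string"/>
                  </xs:sequence>
                </xs:complexType>
              </xs:element>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Because both declarations are local to different content models, they do not conflict with each other.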
Complete listing of this first example:
<?xml version="1.0" encoding="utf-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <xs:element name="book"> <xs:complexType> <xs:sequence> <xs:element name="title" type="xs:string"/> <xs:element name="author" type="xs:string"/> <xs:element name="character" minOccurs="0" maxOccurs="unbounded"> <xs:complexType> <xs:sequence> <xs:element name="name" type="xs:string"/> <xs:element name="friend-of" type="xs:string" minOccurs="0" maxOccurs="unbounded"/> <xs:element name="since" type="xs:date"/> <xs:element name="qualification" type="xs:string"/> </xs:sequence> </xs:complexType> </xs:element> </xs:sequence> <xs:attribute name="isbn" type="xs:string"/> </xs:complexType> </xs:element> </xs:schema>
Download this schema: library1.xsd
The next section explores how to subdivide schema designs to make them more readable and maintainable.
Slicing the Schema
While the previous design method is very simple, it can lead to deeply nested definitions that are hard to read and difficult to maintain when documents are complex. It also has the drawback of being very different from a DTD structure, an obstacle for human or machine agents wishing to transform DTDs into XML Schemas, or even just use the same design guides for both technologies.
The second design is based on a flat catalog of all the elements available in the instance document and, for each of them, lists of child elements and attributes. This effect is achieved through using references to element and attribute definitions that need to be within the scope of the referencer, leading to a flat design:
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <!-- definition of simple type elements --> <xs:element name="title" type="xs:string"/> <xs:element name="author" type="xs:string"/> <xs:element name="name" type="xs:string"/> <xs:element name="friend-of" type="xs:string"/> <xs:element name="since" type="xs:date"/> <xs:element name="qualification" type="xs:string"/> <!-- definition of attributes --> <xs:attribute name="isbn" type="xs:string"/> <!-- definition of complex type elements --> <xs:element name="character"> <xs:complexType> <xs:sequence> <xs:element ref="name"/> <xs:element ref="friend-of" minOccurs="0" maxOccurs="unbounded"/> <xs:element ref="since"/> <xs:element ref="qualification"/> <!-- the simple type elements are referenced using the "ref" attribute --> <!-- the definition of the cardinality is done when the elements are referenced --> </xs:sequence> </xs:complexType> </xs:element> <xs:element name="book"> <xs:complexType> <xs:sequence> <xs:element ref="title"/> <xs:element ref="author"/> <xs:element ref="character" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> <xs:attribute ref="isbn"/> </xs:complexType> </xs:element> </xs:schema>
Download this schema: library2.xsd
Using a reference to an element or an attribute is somewhat comparable to cloning an object. The element or attribute is defined first, and it can be duplicated at another place in the document structure by the reference mechanism, in the same way an object can be cloned. The two elements (or attributes) are then two instances of the same class.
The next section shows how we can define such classes, called "types," which enable us to re-use element definitions.
Defining Named Types
We have seen that we can define elements and attributes as we need them (Russian doll design), or create them first and reference them (flat catalog). W3C XML Schema gives us a third mechanism, which is to define data types (either simple types that will be used for PCDATA elements or attributes or complex types that will be used only for elements) and to use these types to define our attributes and elements.
This is achieved by giving a name to the simpleType and complexType elements, and locating them outside of the definition of elements or attributes. We will also take the opportunity to show how we can derive a datatype from another one by defining a restriction over the values of this datatype.
For instance, to define a datatype named nameType, which is a string with a maximum of 32 characters, we will write:
<xs:simpleType name="nameType"> <xs:restriction base="xs:string"> <xs:maxLength value="32"/> </xs:restriction> </xs:simpleType>
The simpleType element holds the name of the new datatype. The restriction element expresses the fact that the datatype is derived from the string datatype of the W3C XML Schema namespace (attribute base) by applying a restriction, i.e. by limiting the number of possible values. The maxLength element, called a facet, says that this restriction limits the length of the value to a maximum of 32 characters.
Another powerful facet is the pattern element, which defines a regular expression that must be matched. For instance, if we do not care about the "-" signs, we can define an ISBN datatype as 10 digits thus:
<xs:simpleType name="isbnType"> <xs:restriction base="xs:string"> <xs:pattern value="[0-9]{10}"/> </xs:restriction> </xs:simpleType>
Facets, and the two other ways to derive a datatype (list and union) are covered in the next sections.
Complex types are defined as we've seen before, but given a name.
Full listing:
<?xml version="1.0" encoding="UTF-8"?> <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"> <!-- definition of simple types --> <xs:simpleType name="nameType"> <xs:restriction base="xs:string"> <xs:maxLength value="32"/> </xs:restriction> </xs:simpleType> <xs:simpleType name="sinceType"> <xs:restriction base="xs:date"/> </xs:simpleType> <xs:simpleType name="descType"> <xs:restriction base="xs:string"/> </xs:simpleType> <xs:simpleType name="isbnType"> <xs:restriction base="xs:string"> <xs:pattern value="[0-9]{10}"/> </xs:restriction> </xs:simpleType> <!-- definition of complex types --> <xs:complexType name="characterType"> <xs:sequence> <xs:element name="name" type="nameType"/> <xs:element name="friend-of" type="nameType" minOccurs="0" maxOccurs="unbounded"/> <xs:element name="since" type="sinceType"/> <xs:element name="qualification" type="descType"/> </xs:sequence> </xs:complexType> <xs:complexType name="bookType"> <xs:sequence> <xs:element name="title" type="nameType"/> <xs:element name="author" type="nameType"/> <xs:element name="character" type="characterType" minOccurs="0" maxOccurs="unbounded"/> <!-- the definition of the "character" element is using the "characterType" complex type --> </xs:sequence> <xs:attribute name="isbn" type="isbnType" use="required"/> </xs:complexType> <!-- Reference to "bookType" to define the "book" element --> <xs:element name="book" type="bookType"/> </xs:schema>
Download this schema: library3.xsd
The next page shows how grouping, compositors and derivation can be used to further promote re-use and structure in schemas.
Groups, Compositors and Derivation
Groups
W3C XML Schema also allows the definition of groups of elements and attributes.
<!-- definition of an element group --> <xs:group name="mainBookElements"> <xs:sequence> <xs:element name="title" type="nameType"/> <xs:element name="author" type="nameType"/> </xs:sequence> </xs:group> <!-- definition of an attribute group --> <xs:attributeGroup name="bookAttributes"> <xs:attribute name="isbn" type="isbnType" use="required"/> <xs:attribute name="available" type="xs:string"/> </xs:attributeGroup>
These groups can be used in the definition of complex types, as shown below.
<xs:complexType name="bookType"> <xs:sequence> <xs:group ref="mainBookElements"/> <xs:element name="character" type="characterType" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> <xs:attributeGroup ref="bookAttributes"/> </xs:complexType>
These groups are not datatypes but containers holding a set of elements or attributes that can be used to describe complex types.
Compositors
So far, we have seen the xs:sequence compositor, which defines ordered groups of elements (in fact, it defines ordered groups of particles, which can also be groups or other compositors). W3C XML Schema supports two additional compositors that can be mixed to allow various combinations. Each of these compositors can have minOccurs and maxOccurs attributes to define its cardinality.
The xs:choice compositor describes a choice between several possible elements or groups of elements. The following group (compositors can appear within groups, complex types or other compositors) will accept either a single name element or a sequence of firstName, an optional middleName and a lastName:
<xs:group name="nameTypes"> <xs:choice> <xs:element name="name" type="xs:string"/> <xs:sequence> <xs:element name="firstName" type="xs:string"/> <xs:element name="middleName" type="xs:string" minOccurs="0"/> <xs:element name="lastName" type="xs:string"/> </xs:sequence> </xs:choice> </xs:group>
The xs:all compositor defines an unordered set of elements. The following complex type definition allows its contained elements to appear in any order:
<xs:complexType name="bookType"> <xs:all> <xs:element name="title" type="xs:string"/> <xs:element name="author" type="xs:string"/> <xs:element name="character" type="characterType" minOccurs="0"/> </xs:all> <xs:attribute name="isbn" type="isbnType" use="required"/> </xs:complexType>
In order to avoid combinations that could become ambiguous or too complex to be solved by W3C XML Schema tools, a set of restrictions has been placed on the xs:all particle:
- it can appear only as the sole child at the top of a content model,
- and its children can be only xs:element definitions or references, none of which can have a cardinality greater than one.
Derivation of simple types
Simple datatypes are defined by derivation of other datatypes, either predefined and identified by the W3C XML Schema namespace or defined elsewhere in your schema.
We have already seen examples of simple types derived by restriction (using xs:restriction elements). The different kinds of restriction that can be applied to a datatype are called facets. Beyond the xs:pattern (using a regular expression syntax) and xs:maxLength facets shown already, many facets allow constraints on the length of a value, an enumeration of the possible values, the minimal and maximal values, its precision and scale (the xs:totalDigits and xs:fractionDigits facets), etc.
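As an illustration of a few of these facets, the following sketch defines three simple types. The type names (availabilityType, ratingType, priceType) and their limits are hypothetical; the facets themselves (xs:enumeration, xs:minInclusive, xs:maxInclusive, xs:totalDigits, xs:fractionDigits) are part of W3C XML Schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- enumeration: only the listed values are accepted -->
  <xs:simpleType name="availabilityType">
    <xs:restriction base="xs:string">
      <xs:enumeration value="yes"/>
      <xs:enumeration value="no"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- minimal and maximal values: an integer from 1 to 5 -->
  <xs:simpleType name="ratingType">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="1"/>
      <xs:maxInclusive value="5"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- at most 6 digits in total, at most 2 of them after the decimal point -->
  <xs:simpleType name="priceType">
    <xs:restriction base="xs:decimal">
      <xs:totalDigits value="6"/>
      <xs:fractionDigits value="2"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```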
Two other derivation methods are available, which allow us to define whitespace-separated lists and unions of datatypes. The following definition uses xs:union to extend the definition of our type for isbn to accept the values TBD and NA:
<xs:simpleType name="isbnType"> <xs:union> <xs:simpleType> <xs:restriction base="xs:string"> <xs:pattern value="[0-9]{10}"/> </xs:restriction> </xs:simpleType> <xs:simpleType> <xs:restriction base="xs:NMTOKEN"> <xs:enumeration value="TBD"/> <xs:enumeration value="NA"/> </xs:restriction> </xs:simpleType> </xs:union> </xs:simpleType>
The union has been applied to the two embedded simple types to allow values from both datatypes: our new datatype will now accept either a 10-digit value or one of the two enumerated values (TBD and NA).
The following example type (isbnTypes) uses xs:list to define a whitespace-separated list of ISBN values. It also derives a type (isbnTypes10) using xs:restriction that accepts between 1 and 10 values, separated by whitespace:
<xs:simpleType name="isbnTypes"> <xs:list itemType="isbnType"/> </xs:simpleType> <xs:simpleType name="isbnTypes10"> <xs:restriction base="isbnTypes"> <xs:minLength value="1"/> <xs:maxLength value="10"/> </xs:restriction> </xs:simpleType>
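For a list type, the minLength and maxLength facets count list items rather than characters. As a small sketch, an instance element of type isbnTypes10 could look like the following; the element name isbns and the second value are placeholders invented for this illustration.

```xml
<!-- assumes a hypothetical declaration: <xs:element name="isbns" type="isbnTypes10"/> -->
<!-- two items, each matching the 10-digit pattern; the second value is a placeholder -->
<isbns>0836217462 0000000001</isbns>
```

An empty isbns element, or one holding eleven values, would be rejected by the restriction.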
Content Types
In the first part of this article, we examined the default content type behavior, modeled after data-oriented documents, where complex type elements are element and attribute only and simple type elements are character data without attributes.
The W3C XML Schema Definition Language also supports defining empty content elements and simple content (those that contain only character data) with attributes.
Empty content elements are defined using a regular xs:complexType construct in which we deliberately define no child elements. The following construct defines an empty book element accepting an isbn attribute:
<xs:element name="book"> <xs:complexType> <xs:attribute name="isbn" type="isbnType"/> </xs:complexType> </xs:element>
Simple content elements, i.e. character data elements with attributes, can be derived from simple types using xs:simpleContent. The book element defined above can thus be extended to accept a text value by:
<xs:element name="book"> <xs:complexType> <xs:simpleContent> <xs:extension base="xs:string"> <xs:attribute name="isbn" type="isbnType"/> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element>
Note the location of the attribute definition, showing that the extension is done through the addition of the attribute. This definition will accept the following XML element:
<book isbn="0836217462"> Funny book by Charles M. Schulz. Its title (Being a Dog Is a Full-Time Job) says it all ! </book>
W3C XML Schema supports mixed content through the mixed attribute of the xs:complexType element. Consider
<xs:element name="book"> <xs:complexType mixed="true"> <xs:all> <xs:element name="title" type="xs:string"/> <xs:element name="author" type="xs:string"/> </xs:all> <xs:attribute name="isbn" type="xs:string"/> </xs:complexType> </xs:element>
which will validate an XML element such as:
<book isbn="0836217462"> Funny book by <author>Charles M. Schulz</author>. Its title (<title>Being a Dog Is a Full-Time Job</title>) says it all ! </book>
Unlike DTDs, W3C XML Schema mixed content doesn't modify the constraints on the sub-elements, which can be expressed in the same way as for element-only content models. While this is a significant improvement over XML 1.0 DTDs, note that the values of the character data, and its location relative to the child elements, cannot be constrained.
Constraints
Unique
W3C XML Schema provides several flexible XPath-based features for describing uniqueness constraints and corresponding reference constraints. The first of these, a simple uniqueness declaration, is declared with the xs:unique element. The following declaration, placed under the declaration of our book element, indicates that the character name must be unique:
<xs:unique name="charName"> <xs:selector xpath="character"/> <xs:field xpath="name"/> </xs:unique>
The location of the xs:unique element in the schema gives the context node in which the constraint holds. By inserting xs:unique under our book element, we specify that the character name has to be unique within the context of this book only.
The two XPaths defined in the uniqueness constraint are evaluated relative to the context node. The first of these paths is defined by the selector element. Its purpose is to define the element which has the uniqueness constraint -- the node to which the selector points must be an element node.
The second path, specified in the xs:field element, is evaluated relative to the element identified by the xs:selector, and can be an element or an attribute node. This is the node whose value will be checked for uniqueness. Combinations of values can be specified by adding other xs:field elements within xs:unique.
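As a sketch of a combined constraint, the schema below requires the pair (name, since) to be unique among the characters of a book, and shows where the constraint sits: inside the xs:element for book, after its type definition. The constraint name charNameSince is hypothetical, and the book content model is trimmed to what the constraint needs.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="character" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="since" type="xs:date"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
    <!-- identity constraints follow the type definition inside xs:element -->
    <xs:unique name="charNameSince">
      <xs:selector xpath="character"/>
      <!-- two fields: the combination of both values must be unique -->
      <xs:field xpath="name"/>
      <xs:field xpath="since"/>
    </xs:unique>
  </xs:element>
</xs:schema>
```

With this declaration, two characters may share a name as long as their since dates differ.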
Keys
The second construct, xs:key, is similar to xs:unique except that the value has to be non-null (note that xs:unique and xs:key can both be referenced). To use the character name as a key, we can just replace the xs:unique by xs:key:
<xs:key name="charName"> <xs:selector xpath="character"/> <xs:field xpath="name"/> </xs:key>
Keyref
The third construct, xs:keyref, allows us to define a reference to an xs:key or an xs:unique. To show its usage, we will use the friend-of element, whose value must match the name of a character:
<character> <name>Snoopy</name> <friend-of>Peppermint Patty</friend-of> <since>1950-10-04</since> <qualification> extroverted beagle </qualification> </character>
To indicate that friend-of needs to refer to a character from this same book, we will write, at the same level as we defined our key constraint, the following:
<xs:keyref name="charNameRef" refer="charName"> <xs:selector xpath="character"/> <xs:field xpath="friend-of"/> </xs:keyref>
These capabilities are almost independent of the other features in a schema. They are disconnected from the definition of the datatypes. The only point anchoring them to the schema is the place where they are defined, which establishes the scope of the uniqueness constraints.
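Putting the key and its reference together, a complete schema might look like the sketch below. It reuses the charName and charNameRef names from the fragments above. One deliberate difference is an assumption of this sketch rather than the article's form: the keyref selects each friend-of element directly (selector character/friend-of, field "."), because a field may match at most one node per selected element and friend-of is allowed to repeat.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:string"/>
        <xs:element name="author" type="xs:string"/>
        <xs:element name="character" minOccurs="0" maxOccurs="unbounded">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="name" type="xs:string"/>
              <xs:element name="friend-of" type="xs:string" minOccurs="0" maxOccurs="unbounded"/>
              <xs:element name="since" type="xs:date"/>
              <xs:element name="qualification" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
      <xs:attribute name="isbn" type="xs:string"/>
    </xs:complexType>
    <!-- every character must have a name, unique within this book -->
    <xs:key name="charName">
      <xs:selector xpath="character"/>
      <xs:field xpath="name"/>
    </xs:key>
    <!-- every friend-of value must match the name of a character of this book -->
    <xs:keyref name="charNameRef" refer="charName">
      <xs:selector xpath="character/friend-of"/>
      <xs:field xpath="."/>
    </xs:keyref>
  </xs:element>
</xs:schema>
```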
Building Usable -- and Reusable -- Schemas
Perhaps the first step in writing reusable schemas is to document them. W3C XML Schema provides an alternative to XML comments (for humans) and processing instructions (for machines) that might be easier to handle for supporting tools.
Human-readable documentation can be defined by xs:documentation elements, while information targeted at applications should be included in xs:appinfo elements. Both elements need to be included in an xs:annotation element and accept any content type. Both accept an optional source attribute, and xs:documentation also accepts an optional xml:lang attribute. The source attribute is a URI reference that can be used to indicate the purpose of the documentation or application information.
The xs:annotation elements can be added at the beginning of most schema constructions, as shown in the example below. The appinfo section demonstrates how custom namespaces and schemes might allow the binding of an element to a Java class from within the schema.
<xs:element name="book"> <xs:annotation> <xs:documentation xml:lang="en"> Top level element. </xs:documentation> <xs:documentation xml:lang="fr"> Element racine. </xs:documentation> <xs:appinfo source="http://example.com/foo/"> <bind xmlns="http://example.com/bar/"> <class name="Book"/> </bind> </xs:appinfo> </xs:annotation> .../... </xs:element>
Composing schemas from multiple files
For those who want to define a schema using several XML documents -- either to split a large schema, or to use libraries of schema snippets -- W3C XML Schema provides two mechanisms for including external schemas.
The first one, xs:include, is similar to a copy and paste of the definitions of the included schema: it is an inclusion and, as such, does not allow the definitions of the included schema to be overridden. It can be used this way:
<xs:include schemaLocation="character.xsd"/>
The second inclusion mechanism, xs:redefine
, is similar to
xs:include
, except that it lets you redefine declarations from the included
schema.
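<xs:redefine schemaLocation="character12.xsd"> <xs:simpleType name="nameType"> <xs:restriction base="nameType"> <xs:maxLength value="40"/> </xs:restriction> </xs:simpleType> </xs:redefine>
Note that the declarations that are redefined must be placed in the xs:redefine element. A redefined simple type must be a restriction of itself (its base is its own name), so it can only narrow the original definition: this example assumes that nameType in character12.xsd allows at least 40 characters.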
We've already seen many features that can be used together with xs:include and xs:redefine to create libraries of schemas. We've seen how we can reference previously defined elements, how we can define datatypes by derivation and use them, and how we can define and use groups of attributes. We've also seen the parallel between elements and objects, and datatypes and classes. There are other features borrowed from object-oriented designs that can be used to create reusable schemas.
Abstract types
The first of these features derived from object-oriented design is the substitution group. Unlike the features we've seen so far, a substitution group is not defined explicitly through a W3C XML Schema element, but through referencing a common element (called the head) using a substitutionGroup attribute.
The head element doesn't hold any specific declaration but must be global. All the elements within a substitution group need to have a type that is either the same type as the head element or can be derived from it. Then they can all be used in place of the head element.
In the following example, the element surname can be used anywhere an element name has been defined.
<xs:element name="name" type="xs:string"/> <xs:element name="surname" type="xs:string" substitutionGroup="name" />
Now, we can also define a generic name-elt element, head of a substitution group, that shouldn't be used directly, but in one of its derived forms. This is done through declaring the element as abstract, analogous to abstract classes in object-oriented languages. The following example defines name-elt as an abstract element that should be replaced either by name or surname everywhere it is referenced.
<xs:element name="name-elt" type="xs:string" abstract="true"/> <xs:element name="name" type="xs:string" substitutionGroup="name-elt"/> <xs:element name="surname" type="xs:string" substitutionGroup="name-elt"/>
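To show the substitution in use, the sketch below adds a character element whose content model references the abstract head. This character declaration is a simplified, hypothetical one written only for this illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="name-elt" type="xs:string" abstract="true"/>
  <xs:element name="name" type="xs:string" substitutionGroup="name-elt"/>
  <xs:element name="surname" type="xs:string" substitutionGroup="name-elt"/>
  <!-- simplified, hypothetical declaration for illustration -->
  <xs:element name="character">
    <xs:complexType>
      <xs:sequence>
        <!-- the abstract head is referenced; an instance must supply a member of the group -->
        <xs:element ref="name-elt"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

A character element containing either a name or a surname child validates against this schema, while one containing name-elt itself does not, because the head is abstract.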
Final types
We could, on the other hand, wish to control derivation performed on a datatype. W3C XML Schema supports this through the final attribute in an xs:complexType, xs:simpleType or xs:element element. This attribute can take the values restriction, extension and #all to block derivation by restriction, extension or any derivation. The following snippet would, for instance, forbid any derivation of the characterType complex type.
<xs:complexType name="characterType" final="#all"> <xs:sequence> <xs:element name="name" type="nameType"/> <xs:element name="since" type="sinceType"/> <xs:element name="qualification" type="descType"/> </xs:sequence> </xs:complexType>
In addition to final, a more fine-grained mechanism is provided to control the derivation of simple types, operating on each facet. Here, the attribute is called fixed, and when its value is set to true, the facet cannot be further modified (but other facets can still be added or modified). The following example prevents the maximum length of our nameType simple type from being redefined:
<xs:simpleType name="nameType"> <xs:restriction base="xs:string"> <xs:maxLength value="32" fixed="true"/> </xs:restriction> </xs:simpleType>
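As a sketch of what this allows and forbids, the schema below derives a further type from nameType. The name shortNameType is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="nameType">
    <xs:restriction base="xs:string">
      <xs:maxLength value="32" fixed="true"/>
    </xs:restriction>
  </xs:simpleType>
  <!-- shortNameType is a hypothetical derived type -->
  <xs:simpleType name="shortNameType">
    <xs:restriction base="nameType">
      <!-- allowed: minLength is not fixed in nameType -->
      <xs:minLength value="1"/>
      <!-- a maxLength facet with any value other than 32 would be an error here -->
    </xs:restriction>
  </xs:simpleType>
</xs:schema>
```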
Namespaces
Namespaces support in W3C XML Schema is flexible yet straightforward. It not only allows the use of any prefix in instance documents (unlike DTDs) but also lets you open your schemas to accept unknown elements and attributes from known or unknown namespaces.
Each W3C XML Schema document is bound to a specific namespace through the targetNamespace attribute, or to the absence of namespace through the lack of such an attribute. We need at least one schema document per namespace we want to define (elements and attributes without namespaces can be defined in any schema, though).
Until now we have omitted the targetNamespace attribute, which means that we were working without namespaces. To get into namespaces, let's first imagine that our example belongs to a single namespace:
<book isbn="0836217462" xmlns="http://example.org/ns/books/"> .../... </book>
The least intrusive way to adapt our schema is to add some more attributes to our
xs:schema
element.
<xs:schema targetNamespace="http://example.org/ns/books/" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:bk="http://example.org/ns/books/" elementFormDefault="qualified" attributeFormDefault="unqualified"> .../... </xs:schema>
The namespace declarations play an important role. The first one (xmlns:xs="http://www.w3.org/2001/XMLSchema") says not only that we've chosen to use the prefix xs to identify the elements that will be W3C XML Schema instructions, but also that we will prefix the W3C XML Schema predefined datatypes with xs, as we have done throughout the examples thus far. Understand that we could have chosen any prefix instead of xs. We could even make http://www.w3.org/2001/XMLSchema our default namespace, and in this case we wouldn't have prefixed the W3C XML Schema elements or its datatypes.
Since we are working with the http://example.org/ns/books/ namespace, we define it (with a bk prefix). This means that we will now prefix the references to "objects" (datatypes, elements, attributes, ...) belonging to this namespace with bk:. Again, we could have chosen any prefix to identify this namespace or even have made it our default namespace (note that the XPath expressions used in xs:unique, xs:key and xs:keyref do not use a default namespace, though).
The targetNamespace attribute lets you define, independently of the namespace declarations, which namespace is described in this schema. If you need to reference objects belonging to this namespace, which is usually the case except when using a pure "Russian doll" design, you need to provide a namespace declaration in addition to the targetNamespace.
The final two attributes (elementFormDefault and attributeFormDefault) are a facility provided by W3C XML Schema to control, within a single schema, whether attributes and elements are considered by default to be qualified (in a namespace). This differentiation between qualified and unqualified can be indicated by specifying the default values, as above, but also when defining the elements and attributes, by adding a form attribute of value qualified or unqualified.
It is important to note that only local elements and attributes can be specified as unqualified. All globally defined elements and attributes must always be qualified.
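As a minimal sketch of the form attribute overriding the schema-wide default, the schema below keeps local elements qualified by default but declares one local element as unqualified. The note element is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema targetNamespace="http://example.org/ns/books/"
           xmlns:xs="http://www.w3.org/2001/XMLSchema"
           elementFormDefault="qualified"
           attributeFormDefault="unqualified">
  <!-- global element: always in the target namespace -->
  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <!-- local element, qualified because of elementFormDefault -->
        <xs:element name="title" type="xs:string"/>
        <!-- hypothetical local element, explicitly left out of the namespace -->
        <xs:element name="note" type="xs:string" form="unqualified"/>
      </xs:sequence>
      <!-- local attribute, unqualified because of attributeFormDefault -->
      <xs:attribute name="isbn" type="xs:string"/>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

In a matching instance that binds the prefix bk to http://example.org/ns/books/, the elements would be written bk:book and bk:title, while note and the isbn attribute carry no prefix.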
Importing definitions from external namespaces
W3C XML Schema, not unlike XSLT and XPath, uses namespace prefixes within the value of some attributes to identify the namespace of data types, elements, attributes, etc. For instance, we've used this feature throughout our examples to identify the W3C XML Schema predefined datatypes. This mechanism can be extended to import definitions from any other namespace and so reuse them in our schemas.
Reusing definitions from other namespaces is done through a three-step process. This process needs to be done even for the XML 1.0 namespace, in order to declare attributes such as xml:lang. First, the namespace must be declared as usual.
<xs:schema targetNamespace="http://example.org/ns/books/" xmlns:xml="http://www.w3.org/XML/1998/namespace" xmlns:bk="http://example.org/ns/books/" xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified" attributeFormDefault="qualified"> .../... </xs:schema>
Then W3C XML Schema needs to be informed of the location at which it can find the schema corresponding to the namespace. This is done using an xs:import element.
<xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="myxml.xsd"/>
W3C XML Schema now knows that it should attempt to find any reference belonging to the XML namespace in a schema located at myxml.xsd. We can now use the external definition.
<xs:element name="title"> <xs:complexType> <xs:simpleContent> <xs:extension base="xs:string"> <xs:attribute ref="xml:lang"/> </xs:extension> </xs:simpleContent> </xs:complexType> </xs:element>
You may wonder why we have chosen to reference the xml:lang attribute from the XML namespace, rather than creating an attribute with a type xml:lang. We've done so because there is an important difference between referencing an attribute (or an element) and referencing a datatype when namespaces are concerned:
- Referencing an element or an attribute imports the whole thing with its name and namespace,
- Referencing a datatype imports only its definition, leaving you with the task of giving a name to the element and attribute you're defining and using the target namespace (or no namespace if your attribute or element is unqualified).
Including unknown elements
To finish this section about namespaces, we need to see how, as promised in our introduction, we can open our schema to unknown elements, attributes and namespaces. This is done using xs:any and xs:anyAttribute, which allow, respectively, any elements or any attributes to be included.
For instance, if we want to extend the definition of our description type to any XHTML tag, we could declare:
<xs:complexType name="descType" mixed="true"> <xs:sequence> <xs:any namespace="http://www.w3.org/1999/xhtml" processContents="skip" minOccurs="0" maxOccurs="unbounded"/> </xs:sequence> </xs:complexType>
The xs:anyAttribute
gives the same functionality for attributes.
The type descType is now mixed content and accepts an unbounded number of elements from the http://www.w3.org/1999/xhtml namespace. The processContents attribute is set to skip, telling a W3C XML Schema processor that no validation of these elements should be attempted. The other permissible values are strict, asking to validate these elements, and lax, asking to validate them when possible. The namespace attribute accepts a whitespace-separated list of URIs and the special values ##local (non-qualified elements) and ##targetNamespace (the target namespace), which can be included in the list, and ##other (any namespace other than the target) or ##any (any namespace), which can replace the list. It is not possible to specify "any namespace except those from a list".
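As a sketch combining both wildcards, the complex type below accepts extra elements and extra attributes from any namespace other than the target namespace, validating them only when a declaration is available. The type name openBookType is hypothetical.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <!-- openBookType is a hypothetical name used for illustration -->
  <xs:complexType name="openBookType">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <!-- any number of elements from other namespaces, validated when possible -->
      <xs:any namespace="##other" processContents="lax"
              minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
    <xs:attribute name="isbn" type="xs:string"/>
    <!-- any attributes from other namespaces; the wildcard comes after the attribute declarations -->
    <xs:anyAttribute namespace="##other" processContents="lax"/>
  </xs:complexType>
</xs:schema>
```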
W3C XML Schema and Instance Documents
We've now covered most of the features of W3C XML Schema, but we still need to glance at some extensions that you can use within your instance documents. To set these features apart, they are defined in a separate namespace, http://www.w3.org/2001/XMLSchema-instance, usually associated with the prefix xsi.
The xsi:noNamespaceSchemaLocation and xsi:schemaLocation attributes allow you to tie a document to its W3C XML Schema. This link is not mandatory, and other indications can be given at validation time, but it does help W3C XML Schema-aware tools to locate a schema.
Depending on whether namespaces are used, the link will be either
<book isbn="0836217462" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="file:library.xsd">
Or, as below (noting the syntax with a URI for the namespace and the URI of the schema, separated by whitespace in the same attribute):
<book isbn="0836217462" xmlns="http://example.org/ns/books/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation= "http://example.org/ns/books/ file:library.xsd">
The other use of xsi attributes is to provide information about how an element corresponds to a schema. These attributes are xsi:type, which lets you define the simple or complex type of an element, and xsi:nil, which lets you specify a nil (null) value for an element (that has to be defined as nillable in the schema using a nillable="true" attribute). You don't need to declare these attributes in your W3C XML Schema to be able to use them in an instance document.
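As a brief sketch of xsi:nil in an instance document, the character below has no known since date. It assumes that the schema declares since as nillable, which is not the case in the listings above.

```xml
<!-- assumes the schema declares: <xs:element name="since" type="xs:date" nillable="true"/> -->
<character xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <name>Snoopy</name>
  <!-- a nil element must be empty, even though its type is xs:date -->
  <since xsi:nil="true"/>
  <qualification>extroverted beagle</qualification>
</character>
```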