Using W3C XML Schema

Using W3C XML Schema - Part 2

December 13, 2000

Advanced W3C XML Schema

This is the second part of our comprehensive tutorial and reference on W3C XML Schemas. If you have not already read the first installment of Using XML Schemas, we advise you to do so before reading this article.

Content Types

Table of Contents

•Content Types
•Constraints
•Building Usable and Reusable Schemas
•Namespaces
•W3C XML Schema and Instance Documents

In the first part of this series we examined the default content type behavior, modeled after data-oriented documents, where complex type elements are element and attribute only, and simple type elements are character data without attributes.

The W3C XML Schema Definition Language also supports the definition of empty content elements, and simple content elements (those that contain only character data) with attributes.

Empty content elements are defined using a regular xsd:complexType construct and by purposefully omitting the definition of a child element. The following construct defines an empty book element accepting an isbn attribute:


<xsd:element name="book">
 <xsd:complexType>
  <xsd:attribute name="isbn" type="isbnType"/>
 </xsd:complexType>
</xsd:element>
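
As a minimal illustration, and assuming that isbnType (defined in the first part of this series) accepts the ISBN used in our other examples, an instance matching this declaration could look like the fragment below. The second form is shown only inside a comment, since character data is not allowed in an empty content element.

```xml
<!-- Accepted: the element is empty and carries only its attribute -->
<book isbn="0836217462"/>

<!-- Rejected, because of the character data between the tags:
     <book isbn="0836217462">Some text</book> -->
```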

Simple content elements, i.e. character data elements with attributes, can be derived from simple types using xsd:simpleContent. The book element defined above can thus be extended to accept a text value using:


<xsd:element name="book">
 <xsd:complexType>
  <xsd:simpleContent>
   <xsd:extension base="xsd:string">
    <xsd:attribute name="isbn" type="isbnType"/>
   </xsd:extension>
  </xsd:simpleContent>
 </xsd:complexType>
</xsd:element>

Note the location of the attribute definition, showing that the extension is achieved through the addition of the attribute. This definition will accept the following XML element:

<book isbn="0836217462">
	Funny book by Charles M. Schulz.
	Its title (Being a Dog Is a Full-Time Job) says it all !
</book>

W3C XML Schema supports mixed content through the mixed attribute of the xsd:complexType element. Consider

<xsd:element name="book">
 <xsd:complexType mixed="true">
  <xsd:all>
   <xsd:element name="title" type="xsd:string"/>
   <xsd:element name="author" type="xsd:string"/>
  </xsd:all>
  <xsd:attribute name="isbn" type="xsd:string"/>
 </xsd:complexType>
</xsd:element>

which will validate an XML element such as

<book isbn="0836217462">
	Funny book by <author>Charles M. Schulz</author>.
	Its title (<title>Being a Dog Is a Full-Time Job</title>) says it all !
</book>

Unlike DTDs, W3C XML Schema mixed content doesn't modify the constraints on the sub-elements, which are expressed in the same way as in element-only content models. While this is a significant improvement over XML 1.0 DTDs, note that the values of the character data, and its location relative to the child elements, cannot be constrained.

Constraints

W3C XML Schema provides several flexible XPath-based features for describing uniqueness constraints and the corresponding reference constraints. The first of these, a simple uniqueness declaration, is made with the xsd:unique element. The following declaration, within the context of our book document, indicates that the character name must be unique.


<xsd:unique name="charNameMustBeUnique">
  <xsd:selector xpath="character"/>
  <xsd:field xpath="name"/>
</xsd:unique>

The location of the xsd:unique element in the schema gives the context node in which the constraint holds. By inserting xsd:unique under our book element, we specify that the character name has to be unique within the context of a single book only.

The two XPaths defined in the uniqueness constraint are evaluated relative to the context node. The first of these paths is defined by the selector element. The purpose is to define the element which has the uniqueness constraint -- the node to which the selector points must be an element node.

The second path, specified in the xsd:field element, is evaluated relative to the element identified by the xsd:selector and can be an element or an attribute node. This is the node whose value will be checked for uniqueness. Uniqueness over a combination of several values can be specified by adding other xsd:field elements within xsd:unique.
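
To make the placement concrete, here is a minimal sketch under a few assumptions: the content model of book is reduced to its character children, and the series attribute and the constraint name are hypothetical, introduced only to show a second xsd:field pointing at an attribute. The xsd prefix is assumed to be bound by the enclosing xsd:schema element.

```xml
<xsd:element name="book">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="character" maxOccurs="unbounded">
        <xsd:complexType>
          <xsd:sequence>
            <xsd:element name="name" type="xsd:string"/>
          </xsd:sequence>
          <!-- hypothetical attribute, used only for this illustration -->
          <xsd:attribute name="series" type="xsd:string"/>
        </xsd:complexType>
      </xsd:element>
    </xsd:sequence>
  </xsd:complexType>
  <!-- The constraint sits under book, which is therefore its context node -->
  <xsd:unique name="charNameAndSeriesMustBeUnique">
    <!-- selector: the elements that must be unique, relative to book -->
    <xsd:selector xpath="character"/>
    <!-- fields: the combination of values checked, relative to character -->
    <xsd:field xpath="name"/>
    <xsd:field xpath="@series"/>
  </xsd:unique>
</xsd:element>
```

With this sketch, two characters may share a name as long as their series attributes differ.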

Keys

The second constraint construct, xsd:key, is similar to xsd:unique, except that the value specified as unique can be used as a key. This means that it has to be non-null, and that it can be referenced. To use the character name as a key, we can replace xsd:unique with xsd:key.


<xsd:key name="charNameIsKey">
  <xsd:selector xpath="character"/>
  <xsd:field xpath="name"/>
</xsd:key>

The third construct, xsd:keyref, allows us to define a reference to a key. To show its usage, we introduce the friend-of element, to be used within character elements.

<character>
  <name>Snoopy</name>
  <friend-of>Peppermint Patty</friend-of>
  <since>1950-10-04</since>
  <qualification>
    extroverted beagle
  </qualification>
</character>

To indicate that friend-of needs to refer to a character from the same book, we write, at the same level as we defined our key constraint, the following:


<xsd:keyref name="friendOfIsCharRef" refer="charNameIsKey">
  <xsd:selector xpath="character"/>
  <xsd:field xpath="friend-of"/>
</xsd:keyref>
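
As an illustration, the trimmed-down instance below should satisfy both the key and the keyref. It is only a sketch: the other children of book and character are omitted for brevity, and the second character element is an assumption added so that the reference has a target.

```xml
<book isbn="0836217462">
  <character>
    <name>Snoopy</name>
    <!-- matches the name of the second character, so the keyref is satisfied -->
    <friend-of>Peppermint Patty</friend-of>
  </character>
  <character>
    <name>Peppermint Patty</name>
  </character>
  <!-- A friend-of value matching no character name in this book
       would be reported as an error. -->
</book>
```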

These capabilities are nearly independent of the other features in a schema. They are disconnected from the definition of the datatypes. The only point anchoring them to the schema is the place where they are defined, which establishes the scope of the uniqueness constraints.

Building Usable -- and Reusable -- Schemas

Perhaps the first step in writing reusable schemas is to document them. W3C XML Schema provides an alternative to XML comments and processing instructions that might be easier to handle for supporting tools.

Human readable documentation can be defined by xsd:documentation elements, while information targeted at applications should be included in xsd:appinfo elements. Both elements must be included in an xsd:annotation element. They accept optional xml:lang and source attributes. The source attribute is a URI reference that can be used to indicate the purpose of the appinfo to the processing application.

The xsd:annotation elements can be added at the beginning of most schema constructs, as shown in the example below. The appinfo section demonstrates how custom namespaces and schemes might allow the binding of an element to a Java class from within the schema.

<xsd:element name="book">
  <xsd:annotation>
    <xsd:documentation xml:lang="en">
      Top level element.
    </xsd:documentation>
    <xsd:documentation xml:lang="fr">
      Element racine.
    </xsd:documentation>
    <xsd:appinfo source="http://example.com/foo/">
      <bind xmlns="http://example.com/bar/">
        <class name="Book"/>
      </bind>
    </xsd:appinfo>
  </xsd:annotation>
  ...

Composing schemas from multiple files

For those who want to define a schema using several XML documents -- either to split up a large schema or to use libraries of schema snippets -- W3C XML Schema provides two mechanisms for including external schemas.

The first, xsd:include, is similar to a copy and paste of the definitions of the included schema: it's an inclusion, and as such it doesn't allow any overriding of definitions of the included schema. It can be used in this way:

<xsd:include schemaLocation="character.xsd"/>
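
To show where the inclusion fits, here is a minimal sketch of an including schema. The file name book.xsd is hypothetical, and we assume that character.xsd declares a global character element and, like this schema, has no target namespace.

```xml
<!-- book.xsd: hypothetical including schema -->
<xsd:schema xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">

  <!-- Pulls in every definition found in character.xsd -->
  <xsd:include schemaLocation="character.xsd"/>

  <xsd:element name="book">
    <xsd:complexType>
      <xsd:sequence>
        <!-- "character" is assumed to be declared globally in character.xsd -->
        <xsd:element ref="character" maxOccurs="unbounded"/>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```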

The second inclusion mechanism, xsd:redefine, is similar to xsd:include, except that it lets you redefine the declarations from the included schema.

<xsd:redefine schemaLocation="character12.xsd">
 <xsd:simpleType name="nameType">
  <xsd:restriction base="xsd:string">
   <xsd:maxLength value="40"/>
  </xsd:restriction>
 </xsd:simpleType>
</xsd:redefine>

Note that the declarations that are redefined must be placed in the xsd:redefine element.

We've already seen many features that can be used together with xsd:include and xsd:redefine to create libraries of schemas. We've seen how we can reference previously defined elements; how we can define datatypes by derivation and use them; and how we can define and use groups of attributes. We've also seen the parallel between elements and objects and datatypes and classes. There are other features borrowed from object oriented design that can be used to create reusable schemas.

Abstract types

The first feature derived from object oriented design is the substitution group. Unlike the features we've seen so far, a substitution group isn't defined explicitly through a W3C XML Schema element but through referencing a common element (called the head), using a substitutionGroup attribute. The head element doesn't hold any specific declaration but must be global. All the elements within a substitution group need to have a type that is either the same type as the head element, or can be derived from it. Then they can all be used in place of the head element. In the following example the element "surname" can be used anywhere an element "name" has been defined.

<xsd:element name="name" type="xsd:string"/>
<xsd:element name="surname" type="xsd:string" substitutionGroup="name" />
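
The effect is easier to see with a content model that refers to the head. The sketch below assumes a hypothetical person element, which is not part of our book schema, and shows two instance fragments that should both be accepted.

```xml
<!-- Schema side: the content model only mentions the head element -->
<xsd:element name="person">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element ref="name"/>
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

<!-- Instance side: the head itself ... -->
<person>
  <name>Charles M. Schulz</name>
</person>

<!-- ... or a member of its substitution group -->
<person>
  <surname>Schulz</surname>
</person>
```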

Now we can also define a generic "name-elt" element, head of a substitution group, that cannot be used directly but must be used in one of its derived forms. This is done by declaring the element as abstract, analogously to abstract classes in object oriented languages. The following example defines name-elt as an abstract element that should be replaced by either name or surname everywhere it is referenced.

<xsd:element name="name-elt" type="xsd:string" abstract="true"/>
<xsd:element name="name" type="xsd:string" substitutionGroup="name-elt" />
<xsd:element name="surname" type="xsd:string" substitutionGroup="name-elt" />

Final types

We could, on the other hand, wish to control the derivations performed on a datatype. W3C XML Schema supports this through the final attribute of an xsd:complexType or xsd:element element. This attribute can take the values restriction, extension and #all to block derivation by restriction, by extension or any derivation. The following snippet would, for instance, forbid any derivation of the characterType complex type.

<xsd:complexType name="characterType" final="#all">

The final attribute can operate only on elements and complex types. W3C XML Schema provides a fine-grained mechanism that operates on each facet to control the derivation of simple types. This attribute is called fixed, and when its value is set to true, the facet cannot be further modified (but other facets can still be added or modified). The following prevents the size of our nameType simple type from being redefined.

<xsd:simpleType name="nameType">
 <xsd:restriction base="xsd:string">
  <xsd:maxLength value="32" fixed="true"/>
 </xsd:restriction>
</xsd:simpleType>
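
As a sketch of what this allows and forbids, consider two derivations from nameType. Both type names are hypothetical, chosen only for this illustration.

```xml
<!-- Accepted: maxLength is left alone and a different facet is added -->
<xsd:simpleType name="nonEmptyNameType">
  <xsd:restriction base="nameType">
    <xsd:minLength value="1"/>
  </xsd:restriction>
</xsd:simpleType>

<!-- Rejected: maxLength is fixed in nameType and cannot be given a new value -->
<xsd:simpleType name="shortNameType">
  <xsd:restriction base="nameType">
    <xsd:maxLength value="16"/>
  </xsd:restriction>
</xsd:simpleType>
```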

Namespaces

Namespace support in W3C XML Schema is flexible yet straightforward. It not only allows the use of any prefix in instance documents (unlike DTDs), but also lets you open your schemas to accept unknown elements and attributes from known or unknown namespaces.

Each W3C XML Schema document is bound to a specific namespace through the targetNamespace attribute or to the absence of namespace through the lack of such an attribute. We need at least one schema document per namespace we want to define (elements and attributes without namespaces can be defined in any schema, though).

Until now we have omitted the targetNamespace attribute, which means that we were working without namespaces. To get into namespaces, let's imagine that our example belongs to a single namespace.

<book isbn="0836217462" xmlns="http://example.org/ns/books/">

The least intrusive way to adapt our schema is to add more attributes to our xsd:schema element.

<xsd:schema
     xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
     xmlns="http://example.org/ns/books/"
     targetNamespace="http://example.org/ns/books/"
     elementFormDefault="qualified"
     attributeFormDefault="unqualified" >

The namespace declarations play an important role. The first (xmlns:xsd="http://www.w3.org/2000/10/XMLSchema") says not only that we've chosen to use the prefix xsd to identify the elements that will be W3C XML Schema instructions, but also that we will prefix the W3C XML Schema predefined datatypes with xsd, as we have done in all our examples thus far. Understand that we could have chosen any prefix instead of xsd. We could even make http://www.w3.org/2000/10/XMLSchema our default namespace. In this case, we would not have prefixed the W3C XML Schema elements.

Since we are working with the http://example.org/ns/books/ namespace, we define it as our default namespace. This means that we won't prefix the references to objects (datatypes, elements, attributes, etc.) belonging to this namespace. Again we could have chosen any prefix to identify this namespace.

The targetNamespace attribute lets you define, independently of the namespace declarations, which namespace is described in this schema. If you need to reference objects belonging to this namespace, which is usually the case except when using a pure Russian Doll design, you need to provide a namespace declaration in addition to the targetNamespace.
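
For comparison, here is a sketch of the same schema element using a prefix rather than the default namespace for our target namespace. The prefix bk is an arbitrary choice, and isbnType is assumed to be defined elsewhere in this schema document (its definition is omitted here).

```xml
<xsd:schema
     xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
     xmlns:bk="http://example.org/ns/books/"
     targetNamespace="http://example.org/ns/books/"
     elementFormDefault="qualified"
     attributeFormDefault="unqualified">

  <xsd:element name="book">
    <xsd:complexType>
      <!-- the reference to our own datatype now needs the bk prefix -->
      <xsd:attribute name="isbn" type="bk:isbnType"/>
    </xsd:complexType>
  </xsd:element>

</xsd:schema>
```

Names given in name attributes are never prefixed, since they always belong to the target namespace; only references (type, ref, base and so on) rely on the namespace declarations.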

The final two attributes in the example (elementFormDefault and attributeFormDefault) are a facility provided by W3C XML Schema to control, within a single schema, whether elements and attributes are considered by default to be qualified (in a namespace). This differentiation between qualified and unqualified can be indicated by specifying the default values, as above, but also when defining an individual element or attribute, by adding a form attribute with the value qualified or unqualified.

It is important to note that only local elements and attributes can be specified as unqualified. All globally defined elements and attributes must always be qualified.
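
The sketch below overrides both defaults on local declarations. It assumes the xsd:schema element shown earlier, and the bk prefix used in the instance fragment is an arbitrary choice.

```xml
<!-- Schema side: book is global and therefore always qualified -->
<xsd:element name="book">
  <xsd:complexType>
    <xsd:sequence>
      <!-- local element overriding elementFormDefault="qualified" -->
      <xsd:element name="title" type="xsd:string" form="unqualified"/>
    </xsd:sequence>
    <!-- local attribute overriding attributeFormDefault="unqualified" -->
    <xsd:attribute name="isbn" type="xsd:string" form="qualified"/>
  </xsd:complexType>
</xsd:element>

<!-- Instance side: title is in no namespace, isbn is in the books namespace -->
<bk:book xmlns:bk="http://example.org/ns/books/" bk:isbn="0836217462">
  <title>Being a Dog Is a Full-Time Job</title>
</bk:book>
```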

Importing definitions from external namespaces

W3C XML Schema, not unlike XSLT and XPath, uses namespace prefixes within the value of some attributes to identify the namespace of data types, elements, attributes, etc. For instance, we've used this feature all along in our examples to identify the W3C XML Schema predefined datatypes. This mechanism can be extended to import definitions from any other namespace and so reuse them in our schemas.

Reusing definitions from other namespaces is done through a three-step process. This process needs to be done even for the XML 1.0 namespace in order to declare attributes such as xml:lang. First, the namespace must be defined as usual.

<xsd:schema
  xmlns:xsd="http://www.w3.org/2000/10/XMLSchema"
  targetNamespace="http://example.org/ns/books/"
  xmlns:xml="http://www.w3.org/XML/1998/namespace"
  elementFormDefault="qualified" >

Then W3C XML Schema needs to be informed of the location at which it can find the schema corresponding to the namespace. This is done using an xsd:import element.

<xsd:import namespace="http://www.w3.org/XML/1998/namespace"
  schemaLocation="myxml.xsd"/>

W3C XML Schema now knows that it should attempt to find any reference belonging to the XML namespace in a schema located at myxml.xsd. We can now use the external definition.


<xsd:element name="title">
  <xsd:complexType>
    <xsd:simpleContent>
      <xsd:extension base="xsd:string">
        <xsd:attribute ref="xml:lang"/>
      </xsd:extension>
    </xsd:simpleContent>
  </xsd:complexType>
</xsd:element>

You may wonder why we've chosen to reference the xml:lang attribute from the XML namespace rather than creating an attribute with a type xml:lang. We've done so because there is an important difference between referencing an attribute (or an element) and referencing a datatype when namespaces are concerned.

•Referencing an element or an attribute imports the whole thing with its name and namespace.
•Referencing a datatype imports only its definition, leaving you with the task of giving a name to the element or attribute you're defining, and places your definition in the target namespace (or no namespace if your attribute or element is unqualified).
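
The difference can be sketched with two attribute declarations. The second one is an assumption made for the sake of comparison: it uses the predefined xsd:language datatype and a name, lang, that we pick ourselves.

```xml
<!-- Reference: the attribute keeps its name and its namespace,
     so instance documents write xml:lang="en" -->
<xsd:attribute ref="xml:lang"/>

<!-- Datatype only: we choose the name, and the attribute is unqualified
     under attributeFormDefault="unqualified",
     so instance documents write lang="en" -->
<xsd:attribute name="lang" type="xsd:language"/>
```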

Including unknown elements

To finish this section about namespaces, we need to see how, as promised in the introduction, we can open our schema to unknown elements, attributes and namespaces. This is done using xsd:any and xsd:anyAttribute, allowing, respectively, the inclusion of any element or attribute.

For instance, if we want to extend the definition of our description type to any XHTML tag, we could declare

<xsd:complexType name="descType" mixed="true">
  <xsd:sequence>
    <xsd:any namespace="http://www.w3.org/1999/xhtml"
      minOccurs="0" maxOccurs="unbounded"
      processContents="skip"/>
  </xsd:sequence>
</xsd:complexType>

The xsd:anyAttribute gives the same functionality for attribute definitions.

The type descType is now mixed content and accepts an unbounded number of any elements from the http://www.w3.org/1999/xhtml namespace. The processContents attribute is set to skip, telling a W3C XML Schema processor that no validation of these elements should be attempted. The other permissible values for this attribute are strict, asking to validate these elements, or lax, asking the processor to validate them when possible. The namespace attribute accepts a whitespace-separated list of URIs, as well as the special values ##any (any namespace), ##local (non-qualified elements), ##targetNamespace (the target namespace) or ##other (any namespace other than the target).
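
As a sketch of the attribute counterpart, descType could also be opened to foreign attributes. The choice of ##other and lax here is only an example of the values described above.

```xml
<xsd:complexType name="descType" mixed="true">
  <xsd:sequence>
    <xsd:any namespace="http://www.w3.org/1999/xhtml"
      minOccurs="0" maxOccurs="unbounded"
      processContents="skip"/>
  </xsd:sequence>
  <!-- Accept any attribute from a namespace other than the target one,
       validating it only when a declaration can be found -->
  <xsd:anyAttribute namespace="##other" processContents="lax"/>
</xsd:complexType>
```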

W3C XML Schema and Instance Documents

We've now covered most of the features of W3C XML Schema, but we still need to have a glance at some extensions that you can use within your instance documents. In order to differentiate these other features, a separate namespace, http://www.w3.org/2000/10/XMLSchema-instance, is used, usually associated with the prefix xsi.

The xsi:schemaLocation and xsi:noNamespaceSchemaLocation attributes allow you to tie a document to its W3C XML Schema. This link is not mandatory, and other indications can be given using application-dependent mechanisms (such as a parameter on a command line), but it does help W3C XML Schema aware tools to locate a schema.

Depending on whether namespaces are used, the link will be either

<book isbn="0836217462"
	xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
	xsi:noNamespaceSchemaLocation="file:library.xsd">

or, as below (note the syntax: the URI of the namespace and the URI of the schema are separated by whitespace within the same attribute)

<book isbn="0836217462" xmlns="http://example.org/ns/books/"
	xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
	xsi:schemaLocation="http://example.org/ns/books/ file:library.xsd">

The other use of xsi attributes is to provide information about how an element corresponds to a schema. These attributes are xsi:type, which lets you define the simple or complex type of an element, and xsi:null, which lets you specify a null value for an element (that has to be defined as nullable="true" in the schema). You don't need to declare these attributes in your schema to be able to use them in an instance document.
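
The fragment below sketches both attributes in an instance document. It assumes a schema without a target namespace in which author is declared with nullable="true"; the xsd prefix is declared in the instance only so that xsi:type can name a predefined datatype.

```xml
<book isbn="0836217462"
    xmlns:xsi="http://www.w3.org/2000/10/XMLSchema-instance"
    xmlns:xsd="http://www.w3.org/2000/10/XMLSchema">
  <!-- xsi:type names the type against which this element is validated -->
  <title xsi:type="xsd:string">Being a Dog Is a Full-Time Job</title>
  <!-- xsi:null marks the element as having no value, so it is left empty -->
  <author xsi:null="true"/>
</book>
```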