Enforcing Association Cardinality
June 26, 2002
If you're like me, XML document design brings out your darker side. We like a schema whose heart is stone -- a schema that's just itching to call a candidate document on the carpet for the slightest nonconformity. None of this any-element-will-do namespace validation for us. We enjoy the dirty work: schemas, we think, are best built to be aggressive in ferreting out the little mistakes in the information set that could later confuse the more sheltered and constructive-minded XML application.
In this article, we'll practice a bit of that merciless science. Specifically, we'll look at ways to control the cardinality of associations between XML elements. The basic implementation, which we'll review momentarily, of an association between two types is simple enough but is only sufficient for many-to-one relationships. What if multiple references, or zero references, to an item are unacceptable? What if a referenced item may or may not be present? These variations will require other techniques, and these are essential for the truly draconian schema author.
This article will use a simple UML notation to illustrate patterns and examples. Knowledge of both XML Schema 1.0 (Part 1 in particular) and UML is assumed, although in developing our notation we'll have a chance to review a little of both.
A Simple UML Notation
The Unified Modeling Language (UML) provides a basis for a simple notation which will serve our needs in identifying rudimentary design patterns and in illustrating specific examples. (Note that many UML-to-Schema mappings are possible; see the "XMI Production for XML Schema" specification from the OMG for one much more formal option.)
First, let's say that a single complex type is what UML calls a class, with XML
attributes and simple-type child elements represented by UML attributes -- we're
also borrowing the XPath
@
shorthand to distinguish an XML attribute from a simple-type child
element.
<xs:complexType name="Part"> <xs:sequence> <xs:element name="Name" /> <xs:element name="Price" /> </xs:sequence> <xs:attribute name="PartID" /> <xs:attribute name="InStock" /> </xs:complexType> |
Expressing Compositions: Content Models
The above example also represents the most basic sort of composition, in which all the components are of simple type. Composition of one complex type in terms of another in UML is most literally denoted as an aggregation by value; note the use of a name on the composing type and the depiction of cardinality:
<xs:complexType name="Dealership"> <xs:sequence> <xs:element name="InventoryItem" type="Part" minOccurs="0" maxOccurs="unbounded" /> <!-- Other content omitted. --> </xs:sequence> </xs:complexType> |
One brief note on composition: it is so basic to the XML information model that it's easy to overuse. It's an excellent habit to challenge the assumption that composition is the best characterization of a given relationship, and with XML Schema, as we're about to see, there's no practical reason to avoid implementing a looser association where it's appropriate to the design.
Expressing Associations: Key Relationships
For types that are related other than by composition the UML association is a useful concept. It declares that between two types there is a clear relationship, which may be named on either side or on both, and usually that there is a clear understanding of cardinality as well. The association also may be navigable in either or both directions; to navigate the association is to find an associated instance based on knowledge of an associating instance.
Although association is an OO concept, its implementation in XML Schema relies on
a classic
relational database feature, the foreign-key relationship. A key must be defined to allow unique instances of the associated type
(Salesman
above) to be identified; and the instances of the associating
type (Car
) rely on a key reference to point to referenced items. These two constraints are very close in
spirit to RDBMS primary and foreign keys, respectively.
<xs:element name="Dealership"> <xs:complexType> <xs:sequence> <!-- Referenced types Car and Salesman not shown. --> <xs:element ref="Car" minOccurs="0" maxOccurs="unbounded" /> <xs:element ref="Salesman" minOccurs="0" maxOccurs="unbounded" /> </xs:sequence> </xs:complexType> <xs:key name="SalesmanKey"> <xs:selector xpath="Salesman" /> <xs:field xpath="@SellerID" /> </xs:key> <xs:keyref name="CarToSalesman" refer="SalesmanKey"> <xs:selector xpath="Car" /> <xs:field xpath="@SoldBy" /> </xs:keyref> </xs:element> |
The precise validation rules that are realized by these two components are as follows:
- Each element of type
Salesman
must be found to be unique among its direct siblings (not in the whole document), as evaluated by the fields listed in the governing key. - Each element of type
Car
that has the fields identified in the governing key reference (it might not, depending on composition cardinality!), the values in those fields must be found to be equal to exactly one corresponding set of values in the referenced key.
The basic implementation of an association, shown above, expresses many-to-one cardinality:
multiple A to singular B. Let's speak generally now of associations "from A to B".
By nature, the xs:keyref
simply assures that a B exists wherever an A refers to
it; it guarantees that there are no dangling references, so to speak. Validation of
a
document against a key reference does not make any checks as to the number of elements
that
might reference the same item. The following sections lay out techniques for deviating
from
the default association cardinality on each side of the relationship.
Controlling Association Cardinality: Optional or Multiple Associated Type
The cardinality of the associated type is the easier of the two: this is really a
matter of
controlling the composition of referencing fields. Think of a referencing field
(such as @SoldBy
in the previous example) as a proxy for the associated object.
This proxy can be made a part of the associating element by composition, and thus
all the
rules thereof apply exactly. So, for singular cardinality on type B ("for each A there
must
be exactly one B"), the referencing fields will be required attributes or child elements;
in
fact in this case it may be the value of the associating element itself. For optional
cardinality ("for each A there may be one B"), use an optional attribute or child
element.
For multiple cardinality ("for each A there may be zero, one or many Bs"), use child
elements and an "unbounded" maximum occurrence constraint.
For example, consider a model a newspaper might use for printing movie listings:
Here, a Theatre
makes reference to various Movie
s it might be
showing. The Movies schema shows how to do
this by combining an xs:keyref
with occurrence constraints. The key reference
is in no way special, and by itself describes many-to-one cardinality, which is also
desirable for the Showing
type. See also a valid instance document.
But, wait, where's the "multiple" B type? The diagram above maps the XML schema
construction exactly, and in doing so lays out the Showing
type as what in an
RDB would be a link table, and what can be represented in UML as a link
attribute. The alternate notation below uses the link attribute to clarify the role
of Showing
, and also states the cardinality of the fundamental relationship
between Theatre
and Movie
more accurately:
Controlling Association Cardinality: Optional Associating Type
On the associating side, we can implement optional cardinality by insisting on the
uniqueness of the referencing fields: in other words, multiple A elements referencing
the
same B are disallowed. For this purpose a third XML Schema construct can be used:
the uniqueness constraint, implemented using the xs:unique
component. So
we'll assert a key on B, a key reference on A, and an additional uniqueness constraint
on A,
using the same fields found in the key reference.
Let's say a restaurant keeps its reservation book as an XML document. It lists its tables, organizes each evening into planned seating times, and then captures each reservation as a name and number of people related to table and time:
Clearly, though each reservation references a table and seating time, we don't want
double-bookings. The Restaurant schema
enforces this constraint by laying an xs:unique
assertion over the
xs:keyref
, so that each combination of table number and time must be unique.
A valid instance document is Book.xml.
A more practical version of this design would include dates, rather than repeating all that table and seating information in multiple documents. There are a few approaches to this:
- We might make the date another attribute of
Reservation
, and add it to the uniqueness constraint. See RestaurantWithDates.xsd and BookWithDates.xml. - A more natural content model might call for an
Evening
element composed ofReservation
s. XML Schema identity constraints really shine here, because they can leverage the natural partitions created via element composition. The constraint can be placed at the appropriate scope, forbidding unique table-time pairs within a givenEvening
while allowing them within the broader document. See RestaurantByDates.xsd and BookByDates.xml. Note that thexs:key
andxs:keyref
components are unchanged, and that thexs:unique
has simply moved to theEvening
scope and lost itsDate
field.
Controlling Association Cardinality: Singular Associating Type
To achieve singular cardinality on A is to insist that for every B there is some referencing A. This is the exact inverse of the key reference that's already in place, and the way to implement this cardinality is to create a symmetrical key and key reference. So, for a one-to-one association, there would be two keys and two key references, one each on types A and B.
Notice that this will work even when the cardinality of B is not singular. The idea of "exactly one A optionally referencing a B" may fit into the human brain only with some pushing and twisting, but it is both conceptually sound and technically feasible. This sort of relationship often will make more sense at every level if it is restated as "for every B there is exactly one A", and certainly this is simpler to implement. However, there may be tension between ease of implementation and the correct naming and navigability as designed: "inversion" to the implementer may be "perversion" to the designer. Keep in mind that the effort here is all in the schema; there is no burden on the instance documents, and if anything they may be most readable and intuitive according to the original design.
Summary
The following table summarizes the techniques for various multiplicity requirements for each side of the relationship:
Multiplicity | Technique for A | Technique for B |
---|---|---|
Multiple (0-n) | xs:keyref alone |
Multiple child elements |
Optional (0-1) | Add xs:unique |
Optional attribute or child element |
Singular (1) | 2nd xs:key and xs:keyref in the other direction |
Required attribute or child element, or the value of B itself |
Will Provost is an independent consultant in software architecture and design. He has been focused for the last few years on distributed systems using J2EE, CORBA, and XML. He is also the coordinator and primary author of Object Innovations' courseware on Java and XML topics.