A Smoother Change to Version 2.0
April 11, 2007
"Ch-ch-ch-changes" sang David Bowie, "Just gonna have to be a different man. Time may change me but I can't trace time." It's a great idea for a song, but when moving to the next version of an XML-based exchange, we would like a transition with less stutter. However, this is not easy: on one hand, we want old version processors to accept new messages, the way older browsers can display newer HTML by ignoring unknown tags. On the other hand, those "mustIgnoreUnknown" semantics are often unwanted. In financial transactions, medical care, justice, etc., certain parts must always be understood -- we'd rather have a physician grab the phone and call us than have him ignore those parts about lethal allergies that his client software did not understand. SOAP provides a mechanism for SOAP headers: a "mustUnderstand" flag, which indicates which parts may not be ignored. Unfortunately, this mechanism is rather inflexible.
This article will outline a design pattern that makes a version transition much easier, and that is both more powerful and simpler than SOAP-style "mustUnderstand" semantics. In a nutshell, the next version of an XML vocabulary can be backward- or forward-compatible with the previous. We know the language version we use when we create a message, and all previous language versions that are forward-compatible with this message. So we put the list of those versions in the message. We cannot know which message versions the receiver supports, but if it supports only earlier versions, it can decide from the list whether it will process our message or not. If the receiver supports later versions, it can decide whether the message is backward-compatible with the receiver's version. In all cases, the receiver can make the best decision possible. Let's study this "Capability Compatibility Design Pattern" in more detail.
On Compatibility
Figure 1. (In)compatible language
First, some background. Read David Orchard's "A Theory of Compatible Versions" or the W3C TAG's Versioning Finding for more. With traditional software -- say, a word processor -- things are simple: if your V2 word processor can read documents created with your V1 word processor, the V2 processor is backward-compatible with the V1 processor. If the V1 processor can read V2 documents, it is forward-compatible with V2. Of course no one will buy a V2 word processor, write documents, and then buy a V1 word processor and try to access the V2 documents, so this is uncommon in traditional software applications. It does happen when we email documents and the receiver has an older word processor than we do. It also happens a lot on the Web, where older web browsers may encounter markup that wasn't known when the browser was built. HTML tackles this problem by requiring software to ignore all unknown tags, and just display the content (if possible).
For two language versions -- L1 and L2 -- for message exchange, we can summarize:
- If L2 applications can read all L1 documents, L2 is backward-compatible with L1
- If L1 applications accept all L2 documents, L1 is forward-compatible with L2
Figure 2. Forward and backward compatibility
IgnoreUnknown and MustUnderstand Semantics
If two language versions, L1 and L2, are forward-compatible, we do not expect L1 to process all L2 syntax. Like HTML, we just expect the earlier application to accept documents in later versions of the language, and show what can be shown. This is what we call "IgnoreUnknown" semantics, and this is where "MustUnderstand" comes in. Some information simply may not be ignored. This is frequently the case with information related to security. SOAP provides a mechanism for SOAP headers to achieve this:
<my:security-header soap:mustUnderstand = "1">
If the mustUnderstand attribute is set to "1", an application may only process the message if it understands the semantics of this header. MustUnderstand overrides IgnoreUnknown.
IgnoreUnknown works well for browsers, but sometimes understanding is simply mandatory. Again, this is true for nearly everything related to security, and much of reliable messaging and transactioning as well. It is also often true in environments such as health care or finance: if you do not understand the information I sent you, I'd rather have you reject the message and call me than ignore dosage in the medical prescription I sent, or the maximum on the stock order I submitted. Some things need to be understood. SOAP mustUnderstand semantics are not very flexible, however: mustUnderstand works only for SOAP headers. It could be extended to cover elements in SOAP:Body as well, but this potentially adds an attribute to every element in the tree -- yuck! It also only works on the level of an entire element. There must be a better way.
The Capability Compatibility Design Pattern
One of the principles that follows from the discussion of compatibility is that a sender knows which language version was used to create a message, and the capabilities of that language and of earlier versions. So the sender can put this information in the message itself. Of course the language version L4 that was used to produce a message is suitable for understanding it, so any receiver that understands L4 may process it. The sender can also know whether this particular message uses any new items introduced in version L4. Maybe it uses only items already in the previous language version, L3. So the sender could indicate in the message that any L3 receiver may process it. Ditto for L2 and L1.
If the message does contain new L4 items, and those items can be safely ignored, the sender can also list L3 as sufficient for processing. If the message contains items from L4 that must be understood, the sender will list only the L4 capability as sufficient for processing the message. The receiver knows which version was used to build the receiving software, its capabilities, and the capabilities of earlier versions. So if the receiver is built using language version L5, it will know whether it can process L4 messages (it usually will -- but sometimes language changes will not be backward-compatible). If it can, L5 receivers will simply know they can safely process L4 messages. So if we put the version information into the message itself, the receiver can calculate whether it may process the message or not -- in the latter case, the receiver can return an error message.
Figure 3. L3 and L4 compatibility
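The receiver-side decision described above boils down to a set intersection. Here is a minimal Python sketch of it; the function name `may_process` and the two receiver sets are my own illustration, and the L5 receiver assumes that every step from L1 to L5 was backward-compatible.

```python
def may_process(required_versions, supported_versions):
    """Return True if the receiver reads at least one version the sender listed."""
    return not set(required_versions).isdisjoint(supported_versions)


# Hypothetical receivers: the versions each one is able to read.
L3_RECEIVER = {1, 2, 3}
L5_RECEIVER = {1, 2, 3, 4, 5}

# An L4 message whose new items may be ignored lists L3 and L4.
ignorable_l4_message = [3, 4]
# An L4 message with items that must be understood lists only L4.
strict_l4_message = [4]

print(may_process(ignorable_l4_message, L3_RECEIVER))  # L3 may process it
print(may_process(strict_l4_message, L3_RECEIVER))     # L3 must return an error
print(may_process(strict_l4_message, L5_RECEIVER))     # L5 reads L4, so it may
```

The receiver never has to guess what the sender meant: either one of its own versions is on the list, or it returns an error.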
This "Capability Compatibility Design Pattern" extends well beyond elements. Of course any attribute in a particular language version can be handled in exactly the same way. More than this, the Pattern easily handles element content as well. If we have an L4 code list with values "Standard" and "Handle with Care," and then L5 introduces a code "Unknown," it can be ignored, and L4 receivers may process it. If L5 contains a new code "Hazardous," this may not be ignored -- only L5 receivers may use such a message for subsequent transport of associated goods. In fact, the Capability Compatibility Design Pattern can handle any type of change in the language. And instead of requiring mustUnderstand attributes sprinkled throughout the entire document, a single list with a couple of language versions required for processing is sufficient.
Let's do a walkthrough of the Capability Compatibility Design Pattern. In each example a new type of version change is shown, as well as the way it is handled by the Capability Compatibility Design Pattern.
The Medication Example
We'll start with a language used by physicians to send medication prescriptions to apothecaries. Here is version 1:
<?xml version="1.0" encoding="UTF-8"?>
<message version="1">
  <require>
    <version>1</version>
  </require>
  <prescription>
    <medication>aspirin</medication>
    <amount>24</amount>
  </prescription>
</message>
We'll ignore all details such as patient IDs, namespaces, etc., and focus on the medication and the versioning information. In a <require> element we list the versions that may accept our message -- just version 1 for version 1 of the language. Normally, using a URI to identify the version would be the thing to do, but for brevity, I've used just integers in the examples. L1 processors will also need the capability to ignore unknown tags. I've supplied an XSLT script that transforms an Lx document to L1 by removing all unknown elements in the <prescription> element. There are other mechanisms -- using NVDL, authoring XML Schemas with wildcards, doing this in Java or C on the server -- but this will do here. The important thing is that any language that uses the Capability Compatibility Design Pattern must have a mechanism for ignoring unknown content. The processing model for the language is:
- Check the required versions.
- If not available, return an error.
- Strip unknown content with stylesheet.
- Validate against schema.
- Dispatch for further processing.
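The five steps above can be sketched for an L1 receiver in a few lines of Python using the standard library's ElementTree. This is an illustration under assumptions, not the stylesheet-and-schema setup from the samples: the names `process`, `VersionNotSupported`, `SUPPORTED_VERSIONS`, and `KNOWN_CHILDREN` are hypothetical, stripping is done in code rather than with XSLT, and the validation step is a simple stand-in for real schema validation.

```python
import xml.etree.ElementTree as ET

SUPPORTED_VERSIONS = {1}                    # assumption: this receiver reads L1 only
KNOWN_CHILDREN = ("medication", "amount")   # L1 content of <prescription>


class VersionNotSupported(Exception):
    """Raised when none of the required versions is available."""


def process(xml_text, dispatch):
    root = ET.fromstring(xml_text)

    # Steps 1 and 2: check the required versions, error out if none is available.
    required = {int(v.text) for v in root.findall("./require/version")}
    if required.isdisjoint(SUPPORTED_VERSIONS):
        raise VersionNotSupported(sorted(required))

    prescription = root.find("prescription")
    if prescription is None:
        raise ValueError("message has no prescription")

    # Step 3: strip unknown content (done here in code instead of a stylesheet).
    for child in list(prescription):
        if child.tag not in KNOWN_CHILDREN:
            prescription.remove(child)

    # Step 4: validate. A stand-in for schema validation: exactly the L1 children.
    if [child.tag for child in prescription] != list(KNOWN_CHILDREN):
        raise ValueError("prescription does not match the L1 schema")

    # Step 5: dispatch for further processing.
    return dispatch(prescription)
```

Note the order: the version check comes before stripping, so a message whose new content must be understood is rejected before anything is thrown away.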
Here's a zip file with all sample XML, all "ignore unknown" stylesheets, and schemas for the examples in the article.
Adding an IgnoreUnknown Element
For version 2 of the language, we'll add a <packaging> element. This is the advised packaging of the medicine. Understanding it is not required; apothecaries are specialized enough to select the best packaging, and the element contains merely a recommendation, not a prescription:
# Language version 2
# added packaging, mustUnderstand = false
element message {
  attribute version { xsd:integer },
  element require { element version { xsd:integer }+ },
  element prescription {
    element medication { xsd:string },
    element amount { xsd:integer },
    element packaging { xsd:string }?
  }
}
The change in language 2 -- L2 for short -- is backward-compatible: since the <packaging> element is optional, any L1 document (such as the one above) will be valid in L2. Because the language is backward-compatible, L2 receivers have the capability to understand L1 and L2 messages. L2 also gets its own stylesheet for ignoring unknown tags; this one will also retain the new packaging element.
L2 capabilities:
- read: L1, L2
- write: L2
An L2 instance may contain the new element:
<?xml version="1.0" encoding="UTF-8"?>
<message version="2">
  <require>
    <version>1</version>
    <version>2</version>
  </require>
  <prescription>
    <medication>aspirin</medication>
    <amount>24</amount>
    <packaging>box</packaging>
  </prescription>
</message>
The L2 instance lists the receivers that may process this message: version 1 and version 2 receivers. If an L1 receiver gets this message, it will conclude that it is safe to process this message. It will then do its "ignore unknown" magic and remove unknown elements, which will yield message version 1 above. We thus have the desired forward-compatibility.
Applying MustUnderstand Semantics
Next we'll go for an element that must be understood. We'll expand the language and enable the physician to instruct the apothecary to send the medication by mail to the patient's home address. We'll introduce a new element, declared as element delivery { "mail" | "standard" }? in the schema. The receiver must understand this element; otherwise the medication would never be sent to the patient. However, the element is optional, so understanding is obviously only necessary when the element is present. We now get two flavors of instances:
<?xml version="1.0" encoding="UTF-8"?>
<message version="3">
  <require>
    <version>3</version>
  </require>
  <prescription>
    <medication>aspirin</medication>
    <amount>24</amount>
    <delivery>mail</delivery>
  </prescription>
</message>
The first flavor does have the <delivery> element. The only receivers that may process it are version 3 receivers. Version 1 and 2 receivers will recognize that they are not allowed to process it, and must return an error. It amounts to the same as a SOAP-style "mustUnderstand" flag on the <delivery> element, but without the need for such flags on every element that must be understood.
The second flavor does not have the delivery element:
<?xml version="1.0" encoding="UTF-8"?>
<message version="3">
  <require>
    <version>1</version>
    <version>2</version>
    <version>3</version>
  </require>
  <prescription>
    <medication>aspirin</medication>
    <amount>24</amount>
  </prescription>
</message>
It is basically message 1 again, relabeled as version 3. Receivers that support L1, L2, or L3 may process it. This highlights a principle that every writer application should adhere to: maintain a list of versions that may consume the produced instance. For L3, the default list is L1, L2, L3. But whenever a <delivery> element is inserted, the list should be restricted to L3 receivers only.
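On the writer side, this principle can be as small as one function. The sketch below is hypothetical (the name `required_versions_l3` and the dict representation of a prescription are my own choices for illustration); it starts from the default L3 list and narrows it when a must-understand element is present.

```python
def required_versions_l3(prescription):
    """Return the <require> list for an L3 prescription.

    `prescription` is a dict mapping element names to values,
    e.g. {"medication": "aspirin", "amount": "24"}.
    """
    # Default for an L3 writer: nothing in the message forces a newer reader.
    required = [1, 2, 3]
    # <delivery> must be understood, so its presence restricts the list to L3.
    if "delivery" in prescription:
        required = [3]
    return required


plain = {"medication": "aspirin", "amount": "24"}
with_packaging = dict(plain, packaging="box")   # ignorable L2 element
by_mail = dict(plain, delivery="mail")          # must-understand L3 element

print(required_versions_l3(plain))
print(required_versions_l3(with_packaging))
print(required_versions_l3(by_mail))
```

The ignorable <packaging> element leaves the list untouched; only <delivery> narrows it.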
Removing Obsolete Features
Sometimes backward-incompatible changes are made. A common case is when an ill-conceived part is replaced by a better alternative; the original is marked as "obsolete" for some versions, then removed. Such a removal is not backward-compatible. Let's take a detailed look:
<?xml version="1.0" encoding="UTF-8"?>
<message version="4">
  <require>
    <version>4</version>
  </require>
  <prescription>
    <medication>aspirin</medication>
    <quantity unit="pcs">24</quantity>
  </prescription>
</message>
Now of course the idea of an <amount> element was ill-conceived. Not all medication comes in countable pieces. Sometimes prescriptions are in milliliters or milligrams. So we decided to remove amount and introduce <quantity>, with a unit attribute. We won't wait several versions before removing the obsolete <amount> element -- we'll do it right away. Receivers now must support L4: if older processors tried to process the message, amount would be missing and quantity would be stripped, making the prescription incomplete. Furthermore, version 4 of the language could refuse to accept documents with the <amount> element:
L4 capabilities:
- read: L4
- write: L4
In this case, messages from older senders would be rejected with an "Obsolete version, please upgrade" error. This seems a bit harsh for the <amount> example, but if a security leak were discovered in the older versions, such a policy would be advisable for sensitive messages. And even for simple features, after enough time it makes sense to require all parties in a professional environment to support a specific minimum level of a language specification.
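A receiver with these L4 capabilities could tell "too old" from "too new" by comparing the required list with what it reads. The following Python sketch rests on assumptions: versions are integers as in the examples, the <require> list is never empty (the schema demands at least one <version>), and the names `check_message` and `READ_CAPABILITIES` as well as the "Unsupported version" text are invented for illustration. Only the "Obsolete version, please upgrade" message comes from the text above.

```python
READ_CAPABILITIES = {4}   # L4 capabilities: read L4, write L4


def check_message(required_versions):
    """Return "ok" or an error text for a non-empty list of required versions."""
    required = set(required_versions)
    if required & READ_CAPABILITIES:
        return "ok"
    # Every version the sender listed is older than anything we still read.
    if max(required) < min(READ_CAPABILITIES):
        return "Obsolete version, please upgrade"
    # Otherwise the message needs something newer than we understand.
    return "Unsupported version"


print(check_message([1, 2, 3]))   # message from an L3 sender
print(check_message([4]))         # message from an L4 sender
print(check_message([5]))         # message that insists on L5
```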
Attributes and Code Lists
The Capability Compatibility Design Pattern supports not only new and removed elements, but attributes as well. It depends a bit on the "ignore unknown" implementation, but suppose we remove not only unknown elements but also unknown attributes in known elements. The mechanics for attributes are then no different from those sketched above. What's more, we can require support for some version of the language based on the code value in an enumeration. Above, we introduced the <delivery> tag:
element delivery { "mail" | "standard" }?
We can change it to:
element delivery { "mail" | "standard" | "personal" | "any" }?
Now "any" could mean it's up to the apothecary to decide how to deliver the medication. The value can safely be ignored by older processors.
<?xml version="1.0" encoding="UTF-8"?>
<message version="5">
  <require>
    <version>4</version>
    <version>5</version>
  </require>
  <prescription>
    <medication>aspirin</medication>
    <quantity unit="pcs">24</quantity>
    <delivery>any</delivery>
  </prescription>
</message>
The stylesheet for version 4 will remove the <delivery> element with the unknown value. But the value "personal" means the physician insists that the medication may only be given to the patient in person, not to anybody else. This value may not be ignored, so version 5 processors should require a minimum of version 5 whenever they insert the personal value:
<?xml version="1.0" encoding="UTF-8"?>
<message version="5">
  <require>
    <version>5</version>
  </require>
  <prescription>
    <medication>aspirin</medication>
    <quantity unit="pcs">24</quantity>
    <delivery>personal</delivery>
  </prescription>
</message>
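For code lists, a version 5 writer can derive the <require> list from the value it inserts. The table and names in this Python sketch (`MIN_VERSION_FOR_DELIVERY`, `required_versions_l5`, `OLDEST_COMPATIBLE`) are hypothetical; the sketch assumes integer versions and that L1 through L3 are always excluded because L4 replaced <amount> with <quantity>.

```python
WRITER_VERSION = 5
OLDEST_COMPATIBLE = 4   # L4 dropped <amount>, so L1-L3 can never process L5 output

# Lowest version that must understand each <delivery> value.
# "any" is new in L5 but may be ignored, so L4 remains sufficient.
MIN_VERSION_FOR_DELIVERY = {
    "mail": 4,
    "standard": 4,
    "any": 4,
    "personal": 5,   # must be understood: L5 only
}


def required_versions_l5(delivery=None):
    """Return the <require> list for an L5 message with an optional delivery code."""
    minimum = OLDEST_COMPATIBLE
    if delivery is not None:
        minimum = max(minimum, MIN_VERSION_FOR_DELIVERY[delivery])
    return list(range(minimum, WRITER_VERSION + 1))


print(required_versions_l5("any"))
print(required_versions_l5("personal"))
```

Keeping the rule in a table means a later code value only needs one new entry, not new logic.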
Namespaces and multiple languages in a single document make things more complicated than can be shown here, but the same principles apply.
Conclusions
The Capability Compatibility Design Pattern is a very flexible and powerful way to control changes in versions of languages for exchanges over the Internet. It goes beyond SOAP-style mustUnderstand headers and easily supports IgnoreUnknown and mustUnderstand semantics for elements, attributes, and enumerations. It does this all by adhering to two simple principles:
- List all versions, including older ones, that you know may process your message inside the message you make.
- Know all versions you support, and check whether they fit requirements in incoming messages.