Integrating Services with XSLT
September 30, 2003
For all the magic that XML, SOAP, and WSDL seem to offer in allowing businesses to interoperate, they do not solve the more traditional problems of integrating data models and message formats. Analysts and developers must still plod through the traditional process of resolving differences between models before the promise of XML-based interoperability is even relevant.
Happily, there's more magic out there: having committed to XML, companies can take great advantage of XSLT to address integration problems. With XSLT one can adapt one model to another, which is a tried-and-true integration strategy, implemented in a language optimized for this precise purpose. In this article I'll discuss issues and techniques involved in bringing XSLT into web service scenarios and show how to combine it with application code to build SOAP intermediaries that reduce or eliminate the stress between cooperating data structures.
Integration Problem Patterns
Model mismatches come in many fun colors and flavors. I won't try to enumerate them, but let's consider a few common patterns for our purposes. Different patterns pose different challenges, not all of which, in fact, can be met by adaptation and transformation alone.
Translating Content
Often, two types will carry the same information but will organize that information differently. This can appear in both simple and complex types:
- Simple types often encode multiple values -- this is most common with strings, where delimiters such as dots and slashes separate multiple tokens:
<Name>William W. Provost</Name> <Class>com.oi.step.Application</Class> versus <Name>Provost, William W.</Name> <Class>com/oi/step/Application</Class>
- Complex types may be designed with flat or multilevel structures to carry exactly the same fields in the same order. The SOAP array in the second form below is just one example of a two-level hierarchy:
<Stuff> <Junk>A</Junk> <Thing>One</Thing> <Thing>Two</Thing> </Stuff> versus <Stuff> <Junk>A</Junk> <ManyThings soap:arrayType="Thing[2]"> <Thing>One</Thing> <Thing>Two</Thing> </ManyThings> </Stuff>
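Both translations above are well within reach of XSLT 1.0. The following is a minimal sketch, not the article's own code: the class name `ContentTranslation` and the stylesheet are illustrative, and it assumes the `soap` prefix is bound to the SOAP 1.1 encoding namespace. It nests the flat `Thing` elements inside a `ManyThings` array and rewrites a dotted class name with slashes.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ContentTranslation {
    // Illustrative stylesheet (an assumption, not taken from the article).
    static final String XSLT =
          "<xsl:transform version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:soap='http://schemas.xmlsoap.org/soap/encoding/'>"
        + "<xsl:output method='xml'/>"
        // Flat to two-level: keep the other children, then wrap every Thing.
        + "<xsl:template match='Stuff'>"
        + "<Stuff>"
        + "<xsl:copy-of select='*[not(self::Thing)]'/>"
        + "<ManyThings soap:arrayType='Thing[{count(Thing)}]'>"
        + "<xsl:copy-of select='Thing'/>"
        + "</ManyThings>"
        + "</Stuff>"
        + "</xsl:template>"
        // Simple-type translation: dots become slashes.
        + "<xsl:template match='Class'>"
        + "<Class><xsl:value-of select=\"translate(., '.', '/')\"/></Class>"
        + "</xsl:template>"
        + "</xsl:transform>";

    /** Runs the stylesheet over the given XML text and returns the result as a string. */
    static String translate(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(translate(
            "<Stuff><Junk>A</Junk><Thing>One</Thing><Thing>Two</Thing></Stuff>"));
        System.out.println(translate("<Class>com.oi.step.Application</Class>"));
    }
}
```

Going the other way, from the nested form back to the flat one, is equally mechanical, which is why this pattern rarely loses information.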
Narrowing and Widening
Often the fundamental problem is translating between two fields or structures, both of which hold the same data, but one of which is wider than the other. Again, simple as well as complex types can exhibit this disparity:
- Two fields, both of which model a serial number, but whose value spaces differ:
wider: <xs:element name="HowMany" type="xs:long" /> <xs:element name="Sex" type="xs:string" /> narrower: <xs:element name="HowMany" type="xs:int" /> <xs:element name="Sex" > <xs:simpleType> <xs:restriction base="xs:string" > <xs:enumeration value="Male" /> <xs:enumeration value="Female" /> </xs:restriction> </xs:simpleType> </xs:element>
- Two record types, one of which carries all of the information of the other but adds a few fields of its own -- a supertype/subtype relationship, if it were designed as such:
wider: <Song> <Title>Weeping Statues</Title> <Author>Graham Parker</Author> </Song> narrower: <Song> <Title>Weeping Statues</Title> <Author>Graham Parker</Author> <AlbumTitle>Struck By Lightning</AlbumTitle> <Published>1991</Published> </Song>
Solution Patterns at the Business Level
In a recent article, I described three broad categories of solutions to the general service integration problem. Two of these work purely at the business level and have no interesting technical component. The third poses the challenge for this discussion: in this scenario the parties agree that both models must persist and that adaptation between them will be necessary at runtime.
However it may be wrapped in SOAP, WSDL, and other web services garb, the problem of adapting one XML model to another is fundamentally a problem of transformation. As such, XSLT is just about the perfect tool; we ought to be able to derive behavior like this:
Adapter Design
How should a component be shaped to allow data and messages from two different models to be interchanged cleanly? A few high-level questions quickly arise: Who builds it? How vertical or horizontal is it? How widely available? How static or dynamic?
The first is more a business question than a technical one; practically speaking, message transformation performed either request-side or service-side is workable. The answer to this question will imply some lower-level choices. We'll mostly consider the service-side approach, which will be more common.
The other questions involve quantities that will usually be correlated, so that there's almost a single scale representing typical solutions: from more private, vertical, and static to more public, horizontal and dynamic. A very horizontal solution might be heavily parameterized to support a broad range of adaptations at runtime. XML and XSLT both support dynamic binding and decision-making quite well.
Generally, the best solutions will be standalone SOAP intermediaries. Message handlers and other service-proximate components are not sufficiently independent, even for more service-specific designs. (There is also the practical problem for RPC-style services that most mapping tools and runtimes will assume that the adapter and the service itself expect the same vocabulary of messages, which can cause problems if the runtime is validating messages en route against the same schema it uses to generate service code.)
However vertical or horizontal, static or dynamic, the adapter will need to know a few things to do its job:
- The incoming and outgoing service semantics, expressed in WSDL and W3C XML Schema (WXS)
- The mapping between them, which means transformations in both directions
- Some message-routing information, which at its simplest will be a single URI for forwarding messages to a service endpoint
As with most intermediaries, it's going to be best to build this one using low-level APIs that allow the programmer direct access to the XML stream. High-level APIs and tools that perform data binding will fail to expose the raw XML, and without that it's quite hard to get the full value of XSLT. Also, where these transformations are being performed, it may be a good idea to validate both incoming and outgoing messages, so the component may invoke a WXS-validating parser as well as an XSLT engine.
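As a rough illustration of the validation step, the sketch below checks a message against a schema using the standard `javax.xml.validation` API. The class name `MessageValidation` and the one-element schema are assumptions made for the example. A real adapter would load the schemas referenced by the service's WSDL and would validate on both sides of the transformation.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

final class MessageValidation {
    // Hypothetical schema for illustration only: a single xs:int element.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='HowMany' type='xs:int'/>"
        + "</xs:schema>";

    /** Returns true if the XML text is well-formed and valid against XSD. */
    static boolean isValid(String xml) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new StringReader(XSD)));
            // With no ErrorHandler set, validation errors surface as SAXException.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // malformed XML or a schema violation
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isValid("<HowMany>5</HowMany>"));
        System.out.println(isValid("<HowMany>3000000000</HowMany>")); // exceeds xs:int
    }
}
```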
Transformation Design
In adapting a web service using XSLT, one can encounter just as broad a range of problems as when building any XML-to-XML transformation. We can identify a few patterns that are specific to the web service case, and these are considered below.
Selective Replacement
Adapters are most often called upon to fix up what's different about messages that are mostly the same. If the two models are completely different, then what would motivate anyone to try to get them to collaborate in the first place? Usually the schemas in question will have a great deal in common, even to fine type details, but will require adaptation over some small part of the total model.
All this indicates that the transform will usually spend more of its time copying nodes than it will making any changes. As such, a good framework for the transform will rely on generic templates that match patterns such as "*" and "@*|text ()", copying and recursing as appropriate but checking for certain element and attribute names as they do. Specific content will be exempted from the generic rules and will truly be transformed to effect the adapter's mapping. Note that it is not a good idea to rely on XSLT's built-in rules for this, as they will copy only the character data and lose the XML markup. So a good starting point for any service adapter might be something like this:

    <?xml version="1.0" encoding="UTF-8"?>
    <xsl:transform version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform" >
      <xsl:output method="xml" />
      <xsl:template match="*" >
        <xsl:choose>
          <!-- <xsl:when>s will go here to override for certain elements. -->
          <xsl:otherwise>
            <xsl:copy>
              <xsl:apply-templates select="*|@*|text ()" />
            </xsl:copy>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:template>
      <xsl:template match="@*|text ()" >
        <xsl:copy-of select="." />
      </xsl:template>
    </xsl:transform>

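To see the copy-through framework in action, here is a minimal sketch that runs it through JAXP with one override filled in. The override (renaming `Junk` to `Rubbish`) and the class name `CopyThrough` are made up for the example; everything that isn't overridden should come out unchanged, markup and attributes included.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class CopyThrough {
    // The generic framework plus one hypothetical <xsl:when> override.
    static final String XSLT =
          "<xsl:transform version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml'/>"
        + "<xsl:template match='*'>"
        + "<xsl:choose>"
        + "<xsl:when test=\"local-name() = 'Junk'\">"
        + "<Rubbish><xsl:apply-templates select='*|@*|text()'/></Rubbish>"
        + "</xsl:when>"
        + "<xsl:otherwise>"
        + "<xsl:copy><xsl:apply-templates select='*|@*|text()'/></xsl:copy>"
        + "</xsl:otherwise>"
        + "</xsl:choose>"
        + "</xsl:template>"
        + "<xsl:template match='@*|text()'>"
        + "<xsl:copy-of select='.'/>"
        + "</xsl:template>"
        + "</xsl:transform>";

    /** Applies the stylesheet to the given XML text. */
    static String adapt(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(adapt(
            "<Stuff id='1'><Junk>A</Junk><Thing>One</Thing></Stuff>"));
    }
}
```

Note that an `<xsl:choose>` needs at least one `<xsl:when>` to be valid, so the framework only becomes a working stylesheet once the first override is in place.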
Porting to a New Namespace
The two models in question will almost invariably occupy different namespaces. Adaptation between services in the same namespace is not impossible, but it is quite uncommon: even within a corporation there will be different namespaces for different applications, projects, etc. So most transforms will need to port messages whole from one namespace to another.
One simple technique combines a namespace declaration for the target namespace in the <xsl:transform> element with a trap for the source namespace URI in the element-matching template. When namespace-uri () matches the source namespace, the element is rewritten with the root-declared prefix for the target namespace and the same local name:

    <xsl:transform version="1.0"
        xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
        xmlns:in="NewNamespaceURI" >
      <xsl:output method="xml" />
      <xsl:template match="*" >
        <xsl:choose>
          <xsl:when test="namespace-uri () = 'OldNamespaceURI' " >
            <xsl:element name="in:{local-name ()}" >
              <xsl:apply-templates select="*|@*|text ()" />
            </xsl:element>
          </xsl:when>
          ...

This will ensure that all elements from the old namespace are now named within the new namespace. It does not deal with attributes, nor does it clean up old namespace prefixes; these are both solvable problems which aren't solved here.
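The namespace port can be tried end to end with a few lines of JAXP. This is a sketch under assumptions: the URIs `urn:example:old` and `urn:example:new` stand in for the real namespaces, the class name `NamespacePort` is illustrative, and, like the fragment above, it leaves attributes alone.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class NamespacePort {
    // Placeholder namespace URIs; substitute the real source and target.
    static final String XSLT =
          "<xsl:transform version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns:in='urn:example:new'>"
        + "<xsl:output method='xml'/>"
        + "<xsl:template match='*'>"
        + "<xsl:choose>"
        // Elements from the old namespace are rebuilt in the new one.
        + "<xsl:when test=\"namespace-uri() = 'urn:example:old'\">"
        + "<xsl:element name='in:{local-name()}'>"
        + "<xsl:apply-templates select='*|@*|text()'/>"
        + "</xsl:element>"
        + "</xsl:when>"
        // Everything else is copied through unchanged.
        + "<xsl:otherwise>"
        + "<xsl:copy><xsl:apply-templates select='*|@*|text()'/></xsl:copy>"
        + "</xsl:otherwise>"
        + "</xsl:choose>"
        + "</xsl:template>"
        + "<xsl:template match='@*|text()'>"
        + "<xsl:copy-of select='.'/>"
        + "</xsl:template>"
        + "</xsl:transform>";

    /** Ports elements from the old namespace to the new one. */
    static String port(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(port(
            "<t:track xmlns:t='urn:example:old'><t:id>7</t:id></t:track>"));
    }
}
```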
Preserving Information
One might be excused for being a bit giddy at this point. Used as directed, XSLT can make many annoying integration problems go away, and with relatively low effort at that. To sober up, just remember that almost all integration issues will require bidirectional transformation. That is, data that's transformed on its way in, and perhaps stored somewhere, will eventually be requested and sent back out, and it will have to look right to the requester.
Form is not the only problem here. It is important to avoid the trap of inbound transformations that produce the same result for different inputs, since such results cannot be mapped back unambiguously on the way out. In other words, there must be a one-to-one mapping between the external and internal value spaces. Precisely preserving information is key to service adaptation, and this is not always so simple. Specifically, recall the problem patterns from the beginning of this article. The first, content translation, usually occurs between two naturally analogous value spaces -- for instance plain text and compressed or encrypted text -- and thus isn't usually a problem.
Widening and narrowing, by their nature, present a real challenge, as the whole point of this pattern is that the value spaces are of different sizes or even dimensions. Programming-language compilers complain about "narrowing conversions" for exactly this reason: translating to a smaller value space introduces the risk of lost information. These sorts of problems often require some sort of business deal to solve; someone has to agree to a loss of precision. The example shown at the top might seem to belie this statement: couldn't we find a more efficient encoding of the printable-ASCII values and preserve 8 characters in 6? Yes and no. The real trick there is in finding unused values in the 8- and 6-byte value spaces.
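The narrowing risk is easy to make concrete with the HowMany and Sex fields from the earlier schema fragments. The sketch below is illustrative only (the class and method names are invented): it refuses to narrow a value that the smaller value space cannot represent, rather than silently losing information.

```java
import java.util.OptionalInt;
import java.util.Set;

final class NarrowingCheck {
    // The narrower model's enumeration, as in the schema fragment shown earlier.
    static final Set<String> ALLOWED_SEX = Set.of("Male", "Female");

    /** Narrows an xs:long-style value to xs:int, or reports that it does not fit. */
    static OptionalInt narrowHowMany(long wide) {
        try {
            return OptionalInt.of(Math.toIntExact(wide));
        } catch (ArithmeticException overflow) {
            return OptionalInt.empty(); // value lies outside the int value space
        }
    }

    /** True only if the free-form string is one of the enumerated values. */
    static boolean fitsEnumeration(String wide) {
        return ALLOWED_SEX.contains(wide);
    }

    public static void main(String[] args) {
        System.out.println(narrowHowMany(42L));            // fits
        System.out.println(narrowHowMany(3_000_000_000L)); // does not fit
        System.out.println(fitsEnumeration("Unspecified"));
    }
}
```

What to do when a value does not fit (reject the message, substitute a default, or carry the original alongside) is exactly the kind of decision that has to be settled at the business level.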
When to Stop
Finally, as wonderful as XSLT is, it's not designed to solve all possible transformation problems. Generally, it's strong on structural work using node sets and progressively weaker working with single values and their components. String arithmetic, algorithms, and math are notable weak points.
As such, adapter implementations often blend XSLT with the programming language at hand. In the following example, Java code uses JAXP to trigger an XSLT transformation in each direction, in and out, but for the outgoing messages the XSLT can only do part of the job; direct coding using SAAJ (Java's low-level SOAP API) is required to complete the transformation.
Example
Additional design and implementation issues abound, and a complete doctrine for creating service adapters is impossible here. I'll conclude with a simple example that adapts a hypothetical standard for tracking package shipments to company X's private model, which for legacy reasons let's say they must keep in place. The standard model is
Company X has a very similar system and, perhaps, has even bent some of its model to the standard already, but they encode tracking numbers in single tokens, keep histories as whitespace-separated lists, and include the current tracking number in the history instead of keeping it separate:
Company X already has a tracking service in place, but of course it expects messages in the internal vocabulary. It implements an adapter that accepts the standard vocabulary, transforms the request using Incoming.xsl, forwards it to the existing service, and transforms the response before returning it. (The outgoing XSLT transform Outgoing.xsl is combined with Java code, as mentioned earlier.)
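The incoming half is the easy direction, since joining two fields into one token is simple string concatenation. What follows is a guess at what a stylesheet like Incoming.xsl might contain, not the downloadable code itself: the element names come from the article, while the class name `IncomingSketch`, the absence of namespaces, and the use of a priority-based identity template are assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class IncomingSketch {
    // Hypothetical stand-in for Incoming.xsl (namespaces omitted for brevity).
    static final String XSLT =
          "<xsl:transform version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
        + "<xsl:output method='xml'/>"
        // Copy everything that has no more specific template.
        + "<xsl:template match='@*|node()'>"
        + "<xsl:copy><xsl:apply-templates select='@*|node()'/></xsl:copy>"
        + "</xsl:template>"
        // Collapse the two-field tracking number into a single token.
        + "<xsl:template match='trackingNumber'>"
        + "<trackingNumber>"
        + "<xsl:value-of select=\"concat(normalize-space(shipperID), ':',"
        + " normalize-space(shipmentNumber))\"/>"
        + "</trackingNumber>"
        + "</xsl:template>"
        + "</xsl:transform>";

    /** Transforms a standard-vocabulary fragment into the internal form. */
    static String toInternal(String externalXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(externalXml)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("Transformation failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toInternal(
            "<trackingNumber><shipperID>Speedy</shipperID>"
          + "<shipmentNumber> ZX5-20030294-731 </shipmentNumber></trackingNumber>"));
    }
}
```

This mapping stays reversible only as long as shipper IDs never contain a colon, which is the kind of one-to-one condition discussed under Preserving Information.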
An incoming track request would travel as follows -- showing just the transformed content here, and fudging the whitespace for presentation:
External XML | Adapter | Internal XML | Service |
---|---|---|---|
Request: <trackingNumber> <shipperID>Speedy</shipperID> <shipmentNumber> ZX5-20030294-731 </shipmentNumber> </trackingNumber> | Incoming.xsl | <trackingNumber> Speedy:ZX5-20030294-731 </trackingNumber> | |
Response: <trackingNumbers> <trackingNumber> <shipperID>Speedy</shipperID> <shipmentNumber> ZX5-20030294-731 </shipmentNumber> </trackingNumber> <trackingNumber> <shipperID>Acme</shipperID> <shipmentNumber> Previous0987 </shipmentNumber> </trackingNumber> </trackingNumbers> | Outgoing.xsl plus Java code | <history> Speedy:ZX5-20030294-731 Acme:Previous0987 </history> | |
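The response direction is where XSLT 1.0 runs out of steam: splitting a whitespace-separated list and then splitting each token at the colon is awkward in a stylesheet but trivial in Java. The sketch below shows only that string-handling step, using plain string building as a stand-in for the SAAJ calls; the class name `HistorySplitter` is invented, and separating out the current tracking number from the rest of the history is not handled here.

```java
final class HistorySplitter {
    /** Escapes the characters that are unsafe in XML element content. */
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /**
     * Expands an internal history value such as "Speedy:123 Acme:456"
     * into the standard model's list of trackingNumber elements.
     */
    static String toExternalXml(String history) {
        StringBuilder xml = new StringBuilder("<trackingNumbers>");
        String trimmed = history.trim();
        if (!trimmed.isEmpty()) {
            for (String token : trimmed.split("\\s+")) {
                int colon = token.indexOf(':');
                if (colon < 0) {
                    throw new IllegalArgumentException("No shipper prefix in: " + token);
                }
                xml.append("<trackingNumber><shipperID>")
                   .append(escape(token.substring(0, colon)))
                   .append("</shipperID><shipmentNumber>")
                   .append(escape(token.substring(colon + 1)))
                   .append("</shipmentNumber></trackingNumber>");
            }
        }
        return xml.append("</trackingNumbers>").toString();
    }

    public static void main(String[] args) {
        System.out.println(toExternalXml(" Speedy:ZX5-20030294-731 Acme:Previous0987 "));
    }
}
```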
The example happens to be synchronous over HTTP, implemented in Java, and hosted as a standalone service on the same host as the legacy service; these are all choices of convenience for this context and not necessary in general. You can delve into details by downloading the working code.