SOAP Encodings, WSDL, and XML Schema Types
February 20, 2002
Martin Gudgin and Timothy Ewald
Using a web service involves a sender and a receiver exchanging at least one XML message. The format of that message must be defined so that the sender can construct it and the receiver can process it. The format of a message includes the overall structure of the tree, the local name and namespace name of the elements and attributes used in the tree, and the types of those elements and attributes.
The names and types of the elements and attributes contained in the message can be defined in a schema. The Web Services Description Language (WSDL) can use a schema in this way. If a WSDL description of the web service is the starting point, then the message format is known before a line of code is written. However, in many cases, the code that is to be exposed as a web service already exists. In other cases, developers are reluctant to start with WSDL, preferring to start with some programmatic data structure. Even in these cases, some description of the web service is needed in order for clients to correctly construct request messages and destructure responses. Ideally that description would still be WSDL; otherwise, clients will have to learn to read and understand multiple description languages.
So in cases where a schema and associated WSDL are not the starting point, how is the WSDL to be generated and what format do the XML messages have? Many of the SOAP implementations that exist today will happily take a programmatic data type, typically a class definition of some sort, and serialize that type into XML. But in the absence of a schema, how do these implementations decide whether to use elements or attributes? How do they decide what names to give to those constructs and what the overall structure of the tree should be? The answer can be found in the SOAP Encoding section of Part 2 of the SOAP 1.2 specification.
SOAP Encoding
The SOAP encoding defines a set of rules for mapping programmatic types to XML. This includes rules for mapping compound data structures, array types, and reference types. With respect to compound data structures, the approach taken is reasonably straightforward; all data is serialized as elements, and the name of any given element matches the name of the data field in the programmatic type. For example, given the following Java class,
    class Person {
        String name;
        float age;
    }
the name and age fields would be serialized using elements whose local names were name and age, respectively. Both elements would be unqualified; that is, their namespace name would be empty. In cases where the name of a field would not be a legal XML name, Appendix A of the spec provides a mapping algorithm.
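The field-to-element rule can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not how any particular SOAP toolkit works: the FieldSerializer class and its methods are hypothetical names, the Person constructor is added for convenience, only simple values are handled, and the Appendix A name mapping is not implemented.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical helper: maps each instance field to an unqualified element.
class FieldSerializer {
    // Escape the characters that are significant in XML character data.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Write each instance field as a child element named after the field.
    static String serialize(String elementName, Object value) {
        StringBuilder xml = new StringBuilder("<" + elementName + ">");
        for (Field field : value.getClass().getDeclaredFields()) {
            if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
                continue; // only instance data is part of the serialized form
            }
            try {
                field.setAccessible(true);
                String text = String.valueOf(field.get(value));
                xml.append('<').append(field.getName()).append('>')
                   .append(escape(text))
                   .append("</").append(field.getName()).append('>');
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("cannot read field " + field.getName(), e);
            }
        }
        return xml.append("</").append(elementName).append('>').toString();
    }

    public static void main(String[] args) {
        System.out.println(serialize("person", new Person("Hayley", 30f)));
    }
}

// The Person class from the text, with a constructor added for convenience.
class Person {
    String name;
    float age;

    Person(String name, float age) {
        this.name = name;
        this.age = age;
    }
}
```

Reflection does not guarantee the order in which fields are returned, so a real serializer would need some other way to fix the element order. Java also prints the float 30 as 30.0; both spellings are valid lexical forms of the XML Schema float type.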
The mapping of reference types is more complicated. It involves serializing the instance and marking it with an unqualified attribute whose local name is id. All other references to that instance are then serialized as empty elements with an unqualified attribute whose local name is href. The value of the href attribute is a URI that references the relevant serialized instance via its id attribute. This mechanism provides a way to serialize graphs, including cyclic graphs, in XML.
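The id/href mechanism can be sketched as follows, assuming a hypothetical Driver type whose rival field may point back at an instance that has already been written. The GraphSerializer class, the generated id values, and the choice to nest the first occurrence inline are all assumptions made for illustration; for brevity every instance receives an id, and character escaping is omitted.

```java
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical serializer that writes an object graph using id and href.
class GraphSerializer {
    // Hypothetical node type; rival may create a cycle.
    static class Driver {
        String name;
        Driver rival;

        Driver(String name) {
            this.name = name;
        }
    }

    // Tracks instances by identity (not equals) so each is written once.
    private final Map<Object, String> ids = new IdentityHashMap<>();
    private final StringBuilder out = new StringBuilder();

    void write(String elementName, Driver driver) {
        if (driver == null) {
            return; // absent references are simply omitted in this sketch
        }
        String id = ids.get(driver);
        if (id != null) {
            // Already serialized: emit an empty element that points at it.
            out.append('<').append(elementName)
               .append(" href=\"#").append(id).append("\"/>");
            return;
        }
        id = "id" + (ids.size() + 1);
        ids.put(driver, id);
        out.append('<').append(elementName).append(" id=\"").append(id).append("\">");
        out.append("<name>").append(driver.name).append("</name>");
        write("rival", driver.rival);
        out.append("</").append(elementName).append('>');
    }

    String result() {
        return out.toString();
    }

    public static void main(String[] args) {
        Driver a = new Driver("Martin");
        Driver b = new Driver("Tim");
        a.rival = b;
        b.rival = a; // cycle: a -> b -> a
        GraphSerializer serializer = new GraphSerializer();
        serializer.write("driver", a);
        System.out.println(serializer.result());
    }
}
```

Without the identity map, the cycle between the two drivers would make a naive recursive serializer loop forever; the href element is what breaks the recursion.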
Mapping Types
The SOAP Encoding also provides mappings from programmatic data types to the data types found in XML Schema Part 2: Datatypes. Thus given a programmatic data structure, the name and type of each element in the serialized XML can be determined. People have observed that the SOAP Encoding rules are as much about mapping between type systems as they are about mapping between instance formats.
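As a rough sketch of the type-system side of this mapping, the following Java fragment pairs a handful of Java types with built-in XML Schema datatypes and reports the schema type for each field of a class. The table is illustrative rather than copied from the specification, and TypeMapping, xsdTypeOf, and describe are hypothetical names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical lookup table from Java types to XML Schema built-in datatypes.
class TypeMapping {
    static final Map<Class<?>, String> XSD = new LinkedHashMap<>();
    static {
        XSD.put(String.class, "string");
        XSD.put(boolean.class, "boolean");
        XSD.put(byte.class, "byte");
        XSD.put(short.class, "short");
        XSD.put(int.class, "int");
        XSD.put(long.class, "long");
        XSD.put(float.class, "float");
        XSD.put(double.class, "double");
        XSD.put(BigDecimal.class, "decimal");
        XSD.put(BigInteger.class, "integer");
    }

    // Return the prefixed schema type name, or fail for unmapped types.
    static String xsdTypeOf(Class<?> javaType) {
        String name = XSD.get(javaType);
        if (name == null) {
            throw new IllegalArgumentException("no mapping for " + javaType.getName());
        }
        return "xsd:" + name;
    }

    // Map each instance field name to the schema type of its element.
    static Map<String, String> describe(Class<?> type) {
        Map<String, String> elements = new LinkedHashMap<>();
        for (Field field : type.getDeclaredFields()) {
            if (field.isSynthetic() || Modifier.isStatic(field.getModifiers())) {
                continue;
            }
            elements.put(field.getName(), xsdTypeOf(field.getType()));
        }
        return elements;
    }

    public static void main(String[] args) {
        System.out.println(describe(Person.class));
    }
}

// The Person class from the text.
class Person {
    String name;
    float age;
}
```

For Person this yields xsd:string for name and xsd:float for age, which is exactly the information a schema for the message would record.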
Given a web service that accepts Person data structures as input, perhaps to add them to some list of people that it maintains, a SOAP message to that web service might look like this:
    <soap:Envelope xmlns:soap="http://www.w3.org/2001/12/soap-envelope">
      <soap:Body>
        <pre:Add xmlns:pre="http://example.org/lists"
                 soap:encodingStyle="http://www.w3.org/2001/12/soap-encoding">
          <person>
            <name>Hayley</name>
            <age>30</age>
          </person>
        </pre:Add>
      </soap:Body>
    </soap:Envelope>
The value of the encodingStyle attribute states that the SOAP Encoding rules were followed when serializing the data. This enables the deserializer at the other end of the pipe to deserialize the message correctly. Other encoding styles can be used with SOAP, in which case the encodingStyle attribute would have a different URI value.
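A receiver might inspect the encodingStyle attribute before choosing a deserializer. The sketch below uses the standard Java DOM parser to read the attribute from the Add element of the message above. EncodingStyleCheck and encodingStyleOf are hypothetical names, and the sketch looks only at the named element itself, ignoring the fact that the attribute's scope also covers that element's descendants.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper that reads soap:encodingStyle from a message.
class EncodingStyleCheck {
    static final String SOAP_ENV = "http://www.w3.org/2001/12/soap-envelope";
    static final String SOAP_ENC = "http://www.w3.org/2001/12/soap-encoding";

    // The message from the text, written as a Java string.
    static final String MESSAGE =
        "<soap:Envelope xmlns:soap=\"" + SOAP_ENV + "\">"
        + "<soap:Body>"
        + "<pre:Add xmlns:pre=\"http://example.org/lists\""
        + " soap:encodingStyle=\"" + SOAP_ENC + "\">"
        + "<person><name>Hayley</name><age>30</age></person>"
        + "</pre:Add></soap:Body></soap:Envelope>";

    // Return the encodingStyle URI on the first matching element, or "" if none.
    static String encodingStyleOf(String xml, String namespace, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the *NS lookups below
            Document doc = factory.newDocumentBuilder()
                                  .parse(new InputSource(new StringReader(xml)));
            Element element = (Element) doc.getElementsByTagNameNS(namespace, localName).item(0);
            return element == null ? "" : element.getAttributeNS(SOAP_ENV, "encodingStyle");
        } catch (Exception e) {
            throw new IllegalArgumentException("not a parsable XML message", e);
        }
    }

    public static void main(String[] args) {
        String style = encodingStyleOf(MESSAGE, "http://example.org/lists", "Add");
        System.out.println(SOAP_ENC.equals(style)
            ? "SOAP Encoding rules apply"
            : "other or no encoding style: " + style);
    }
}
```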
Type Casting
It is worth noting that the foregoing message carries enough information for the receiver to figure out the type of all the elements. This is because the type is tied to the element name. Both sender and receiver know what the names of the elements are. And they also know the names and types of the fields in the programmatic data type. Given that the element name is so closely tied to the field name, the type of the element can also be determined.
There are some cases where exact type information may not be known until runtime. One is the case of a web service which accepts data in a similar fashion to the COM VARIANT, CORBA any, or Java Object. Such a service specifies nothing about the type of the data at design time. Rather, type information must be provided at runtime. In reality, such services do not accept absolutely any type but work with a reasonably small subset of types, generating errors when unknown types are encountered. Another case is where further classes are derived from the Person class, for example, RacingDriver and FootballPlayer. Assuming the web service understands these classes, they could be submitted in request messages.
In both cases, the "totally" polymorphic element and the more specific case of explicitly derived types, the element name is no longer enough to fully identify the type of the element. Something more is needed.
The SOAP Encoding rules allow the type attribute from the http://www.w3.org/2001/XMLSchema-instance namespace to be used to specify that a particular type is being passed at runtime. The person element in the message would then look like this:
    <person xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:type="RacingDriver">
      <name>Martin</name>
      <age>34</age>
      <bestplace>6th</bestplace>
    </person>
It is worth noting that the xsi:type attribute is of type QName. Thus, strictly speaking, the above example refers to an unqualified RacingDriver type. In reality, a namespace should probably be assigned to types to avoid name clashes. Also, xsi:type is only needed when the exact type is not known until runtime. In cases where both sides know the types in advance, which is the most common case, xsi:type is redundant.
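Because xsi:type holds a QName, a receiver has to resolve any prefix against the namespace declarations in scope before it can look up the type. The following sketch shows one way to do that with the standard DOM API. XsiTypeResolver and its methods are hypothetical names, and the http://example.org/types namespace used in main is invented for illustration.

```java
import java.io.StringReader;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper that resolves the xsi:type attribute to a QName.
class XsiTypeResolver {
    static final String XSI = "http://www.w3.org/2001/XMLSchema-instance";

    // Return the runtime type named by xsi:type, or null if the attribute is absent.
    static QName runtimeType(Element element) {
        String value = element.getAttributeNS(XSI, "type").trim();
        if (value.isEmpty()) {
            return null; // the type is implied by the element name
        }
        int colon = value.indexOf(':');
        String prefix = colon < 0 ? null : value.substring(0, colon);
        String local = value.substring(colon + 1);
        // A null prefix looks up the default namespace, if one is declared.
        String namespace = element.lookupNamespaceURI(prefix);
        if (prefix != null && namespace == null) {
            throw new IllegalArgumentException("undeclared prefix: " + prefix);
        }
        return new QName(namespace == null ? "" : namespace, local);
    }

    // Parse a standalone XML fragment and return its root element.
    static Element parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder()
                          .parse(new InputSource(new StringReader(xml)))
                          .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("not parsable XML", e);
        }
    }

    public static void main(String[] args) {
        // Unqualified type name, as in the example in the text.
        Element plain = parse("<person xmlns:xsi=\"" + XSI + "\" xsi:type=\"RacingDriver\"/>");
        System.out.println(runtimeType(plain));

        // Namespace-qualified type name (the namespace URI is an assumption).
        Element qualified = parse("<person xmlns:xsi=\"" + XSI + "\""
            + " xmlns:t=\"http://example.org/types\" xsi:type=\"t:RacingDriver\"/>");
        System.out.println(runtimeType(qualified));
    }
}
```

A deserializer could then use the resulting QName as a key into a registry of known classes and report an error for names it does not recognize.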
Conclusions
Whenever messages are sent, some type information is known in advance. In some cases all types are completely known, and further information beyond the element names is not needed. In other cases, more specific type information may be communicated at runtime. In such cases, the xsi:type attribute is used and the types really need to be assigned to namespaces.
It would seem that whenever and however we define message formats for a given web service exchange we are really defining a schema for those messages. Thus the SOAP encoding is really about mapping from programmatic type systems to an XML type system, that of XML Schema. Some aspects of that mapping work very well; other aspects, such as references, do not map particularly well, due to the tree nature of XML. Given that the serialization format is XML, and XML is a tree, serious thought should be given to whether more esoteric programmatic constructs such as references need to be directly modeled in SOAP. If such constructs really are needed, an XML Schema friendly approach should be taken.