Generating SOAP
June 12, 2002
Introduction
Last month we used the Google web services API to point out some warts in WSDL. This month we'll use the same API to walk through the steps involved in building an application which uses Google.
We'll do the implementation in Python. Python is open source and runs on all the popular platforms. It is well suited to SOAP and XML processing: it's object-oriented, so you can build large-scale programs; it allows rapid development cycles; and it has powerful text manipulation primitives and libraries, including comprehensive Unicode support. It also provides automatic memory management and good support for introspection (i.e., a program can examine its code and datatypes), and it has an active XML community.
We have a couple of choices for the SOAP stack, each choice bringing its own set of features:
- SOAP.py -- a small, streaming (SAX-based) parser
- SOAPy -- includes basic WSDL (and Schema) support
- ZSI -- emphasis on native datatype support; DOM based
We'll use ZSI because of the emphasis it places on typing, and because I wrote it. ZSI is a pure-Python open source SOAP implementation, available at pywebsvcs, a meta-project which serves as an umbrella for several Python web services projects.
Implementing a Google and SOAP Application with Python
Our approach will be to create a local object with fields (Python calls them attributes) which map to the Google Search request. Recall that the message definition from GoogleSearch.wsdl looks like this:
    <message name="doGoogleSearch">
      <part name="key" type="xsd:string"/>
      <part name="q" type="xsd:string"/>
      <part name="start" type="xsd:int"/>
      <part name="maxResults" type="xsd:int"/>
      <part name="filter" type="xsd:boolean"/>
      <part name="restrict" type="xsd:string"/>
      <part name="safeSearch" type="xsd:boolean"/>
      <part name="lr" type="xsd:string"/>
      <part name="ie" type="xsd:string"/>
      <part name="oe" type="xsd:string"/>
    </message>
The key is a Google-provided authentication token. It serves several purposes, and we'll return to it below. The q is the query string, which is basically the usual URL-encoded query. The search only returns a subset of the results; thus, start and maxResults can be used as a cursor to walk through the results a section at a time. The default is to return the first ten results. The filter, restrict, safeSearch, and lr (language restriction) fields are used to specify whether and how results should be filtered; the ie and oe fields specify the input and output character set encodings, respectively.
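The cursor arithmetic is simple enough to sketch in a few lines. This is only an illustration: page_starts is a hypothetical helper, not part of the Google API or ZSI, and it assumes the total number of results is already known (in practice that number would come from an earlier response).

```python
def page_starts(total_results, page_size=10):
    """Yield the `start` value for each page of a result set.

    Hypothetical helper: `total_results` is assumed to be known already.
    """
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    start = 0
    while start < total_results:
        yield start
        start += page_size


# Each value would be sent as `start`, with `maxResults` set to page_size.
starts = list(page_starts(25, page_size=10))
```

Each yielded value goes into the start field of one request, so walking a result set is one request per page.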
Defining a Python object which has a constructor that sets the defaults is fairly straightforward.
    ## Pound sign introduces a comment.
    ## Blocks are identified by indentation
    class Search:
        typecode = tcGoogleSearch('g:doGoogleSearch', typed=0)

        ## __init__ is the constructor; self is like C++'s this
        def __init__(self, query, key):
            self.key = key
            self.q = query
            self.start = 0
            self.maxResults = 10
            self.filter = 1
            self.restrict = ''
            self.safeSearch = 0
            self.lr = ''
            self.ie = 'latin1'
            self.oe = 'latin1'
Once we have a search object, we'll use ZSI to make a SOAP message and serialize that into a string.
    import StringIO
    import ZSI

    s = Search('rich+salz', 'No,I.am.not.going.to.give.my.key')
    buff = StringIO.StringIO()
    sw = ZSI.SoapWriter(buff, nsdict={'g': 'urn:GoogleSearch'})
    sw.serialize(s, oname='doGoogleSearch')
    request = buff.getvalue()
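To get a feel for what such a serialized request might contain, here is a rough hand-built equivalent that uses only the standard library. It is a sketch and not ZSI's actual output: the build_request name, the namespace prefixes, and the decision to leave out encodingStyle and type attributes are all assumptions, and it is written for current Python 3 rather than the Python 2 of the listings in this article.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
GOOGLE_NS = "urn:GoogleSearch"

# Part names in the order the WSDL message lists them.
PART_NAMES = ["key", "q", "start", "maxResults", "filter",
              "restrict", "safeSearch", "lr", "ie", "oe"]


def build_request(params):
    """Return a SOAP 1.1 envelope (bytes) wrapping a doGoogleSearch call.

    Hypothetical helper; a real toolkit also emits encoding information.
    """
    ET.register_namespace("SOAP-ENV", SOAP_ENV)
    ET.register_namespace("g", GOOGLE_NS)
    envelope = ET.Element("{%s}Envelope" % SOAP_ENV)
    body = ET.SubElement(envelope, "{%s}Body" % SOAP_ENV)
    call = ET.SubElement(body, "{%s}doGoogleSearch" % GOOGLE_NS)
    for name in PART_NAMES:
        part = ET.SubElement(call, name)
        value = params[name]
        # xsd:boolean uses the lexical forms "true" and "false".
        if isinstance(value, bool):
            value = "true" if value else "false"
        part.text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


request = build_request({
    "key": "placeholder-key", "q": "rich+salz", "start": 0,
    "maxResults": 10, "filter": True, "restrict": "",
    "safeSearch": False, "lr": "", "ie": "latin1", "oe": "latin1",
})
```

The point of the sketch is that nothing here is specific to one query: the part names and their order come straight from the WSDL message definition.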
Making an HTTP request out of the SOAP message is straightforward. We get the target host and URL from the WSDL service element.
    <!-- Endpoint for Google Web APIs -->
    <service name="GoogleSearchService">
      <port name="GoogleSearchPort" binding="typens:GoogleSearchBinding">
        <soap:address location="http://api.google.com/search/beta2"/>
      </port>
    </service>
The value of the SOAPAction header comes from the definition for the doGoogleSearch operation. While the WSDL file specifies a value, it appears that Google doesn't check it, which is a good thing, since SOAP 1.2 deprecates the use of the SOAPAction header.
Next, we need to put those items together and make an HTTP POST. The only nuisance is that we had to create the SOAP message first so that we could compute the Content-Length header:
    import httplib

    conn = httplib.HTTPConnection('api.google.com', 80)
    conn.connect()
    conn.putrequest('POST', '/search/beta2')
    conn.putheader('Content-Length', str(len(request)))
    conn.putheader('Content-type', 'text/xml; charset="utf-8"')
    conn.putheader('SOAPAction', 'urn:GoogleSearchAction')
    conn.endheaders()
    conn.send(request)
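Sending the request is only half of the exchange; the reply still has to be read back. Here is a minimal sketch of the full round trip, assuming current Python 3, where httplib has become http.client. The soap_headers and post_soap names are hypothetical, and post_soap is defined but deliberately not called here.

```python
import http.client


def soap_headers(body, action="urn:GoogleSearchAction"):
    """Return the HTTP headers for a SOAP POST of `body` (bytes)."""
    return {
        # The length can only be computed once the message exists.
        "Content-Length": str(len(body)),
        "Content-Type": 'text/xml; charset="utf-8"',
        "SOAPAction": action,
    }


def post_soap(host, path, body, timeout=30):
    """POST `body` to host/path and return (status, response_bytes)."""
    conn = http.client.HTTPConnection(host, 80, timeout=timeout)
    try:
        conn.request("POST", path, body=body, headers=soap_headers(body))
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()


body = b"<placeholder/>"  # stands in for the serialized SOAP message
headers = soap_headers(body)
```

A call would look like post_soap('api.google.com', '/search/beta2', request), with the returned bytes handed to the SOAP parser.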
It's not hard to see how almost everything is boilerplate; almost everything can be generated from a single WSDL file, from the local datatypes, up to and out onto the network.
The careful reader may realize that we've glossed over how the serialize function works. Most SOAP toolkits require access to the data definition -- in this case, the XML Schema defined in the WSDL -- in order to generate serialization code. How do we get from the Python Search object to the SOAP message shown in listing 1? While we don't want to get bogged down in the details of a particular SOAP implementation, we'll take a brief look at ZSI's mechanism, in order to get an understanding of some of the issues involved.
SOAP and Serialization
ZSI uses typecodes to describe the data. There are primitives for all the standard XML Schema primitive types, including dates, integers, strings, and so on, as well as constructors to build aggregated types such as complexTypes, which can often map directly into something like a classic C struct.
Let's look at an individual search result, which has the following schema definition:
    <xsd:complexType name="GoogleSearchResult">
      <xsd:all>
        <xsd:element name="documentFiltering" type="xsd:boolean"/>
        <xsd:element name="searchComments" type="xsd:string"/>
        <xsd:element name="estimatedTotalResultsCount" type="xsd:int"/>
        <xsd:element name="estimateIsExact" type="xsd:boolean"/>
        <xsd:element name="resultElements" type="typens:ResultElementArray"/>
        <xsd:element name="searchQuery" type="xsd:string"/>
        <xsd:element name="startIndex" type="xsd:int"/>
        <xsd:element name="endIndex" type="xsd:int"/>
        <xsd:element name="searchTips" type="xsd:string"/>
        <xsd:element name="directoryCategories" type="typens:DirectoryCategoryArray"/>
        <xsd:element name="searchTime" type="xsd:double"/>
      </xsd:all>
    </xsd:complexType>
The all says that the sub-elements can be in any order; except for resultElements and directoryCategories, they are all basic primitive types. In ZSI we define a new class, tcSearchResult, which is derived from ZSI's Struct class. Generic is the name of the local class that ZSI will instantiate when it parses a search result message. This class lets you set arbitrary attributes. More complicated uses would likely need special classes which set defaults, enforce additional validity constraints, and so on.
    class tcSearchResult(ZSI.TC.Struct):
        def __init__(self, pname=None, **kw):
            ZSI.TC.Struct.__init__(self, Generic, [
                ZSI.TC.String('summary', unique=1),
                ZSI.TC.String('URL', unique=1),
                ZSI.TC.String('snippet', unique=1),
                ZSI.TC.String('title', unique=1),
                ZSI.TC.String('cachedSize', unique=1),
                ZSI.TC.Boolean('relatedInformationPresent'),
                ZSI.TC.String('hostName', unique=1),
                tcDirCat('directoryCategory'),
                ZSI.TC.String('directoryTitle', unique=1),
            ], pname, inorder=0, **kw)
The pname is used to specify the parameter name, which is basically the name the element will have. As you can see, the bulk of the code is creating a list -- indicated by the square brackets -- which defines the items appearing within the search results element. The inorder=0 parameter specifies that the ZSI parser should not require the elements to appear in any specific order, analogous to the XML Schema all element.
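To make the idea of typecode-driven parsing concrete, here is a toy version built on the standard library. It is emphatically not how ZSI is implemented: the StringTC, BooleanTC, and StructTC classes are hypothetical stand-ins written for this sketch, and the error handling is minimal. It does show how a list of member typecodes plus an inorder flag is enough to turn an element into a local object.

```python
import xml.etree.ElementTree as ET


class Generic:
    """Empty class; parsed values are attached as attributes."""


class StringTC:
    def __init__(self, pname):
        self.pname = pname

    def parse(self, elem):
        return elem.text or ""


class BooleanTC(StringTC):
    def parse(self, elem):
        return (elem.text or "").strip() in ("1", "true")


class StructTC:
    """Parse the children of an element using a list of member typecodes."""

    def __init__(self, pyclass, members, pname=None, inorder=True):
        self.pyclass = pyclass
        self.members = members
        self.pname = pname
        self.inorder = inorder

    def parse(self, elem):
        children = list(elem)
        names = [child.tag for child in children]
        expected = [tc.pname for tc in self.members]
        if sorted(names) != sorted(expected):
            raise ValueError("unexpected or missing elements: %r" % names)
        if self.inorder and names != expected:
            raise ValueError("elements out of order: %r" % names)
        by_name = {tc.pname: tc for tc in self.members}
        result = self.pyclass()
        for child in children:
            setattr(result, child.tag, by_name[child.tag].parse(child))
        return result


tc = StructTC(Generic, [StringTC("title"), StringTC("URL"),
                        BooleanTC("relatedInformationPresent")],
              pname="item", inorder=False)
doc = ET.fromstring(
    "<item><URL>http://example.com/</URL>"
    "<relatedInformationPresent>true</relatedInformationPresent>"
    "<title>Example</title></item>")
item = tc.parse(doc)
```

With inorder=False the children are accepted in whatever order they arrive; the same document would be rejected by a typecode built with inorder=True.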
But what about those unique=1 parameters? They are additional metadata which tell ZSI that pointer aliasing is not important. As part of its support for local datatypes and legacy RPC systems (DCE/DCOM in particular), SOAP RPC encoding defines mechanisms used to preserve aliased pointers -- those pointing to the same block of memory, as opposed to having the same value.
For example, if p and q are C character pointers, then the following fragments all have different semantics:
    /* Different pointers, same value. */
    p = strdup("hello");
    q = strdup("hello");

    /* Different pointers, different value. */
    p = strdup("hello");
    q = strdup("hangup");

    /* Aliased pointers. */
    p = strdup("hello");
    q = p;
Suppose we now invoke the following subroutine on the different values of p -- what would the value of q be?
    void up1(char* s)
    {
        s[0] = 'H';
    }
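The same experiment can be run in Python, which makes the answer easy to check. Python strings are immutable, so this sketch uses bytearray as a stand-in for a C character buffer; that substitution, and the case_* variable names, are ours.

```python
def up1(s):
    """Overwrite the first byte with 'H', in place, like the C routine."""
    s[0] = ord("H")


# Different objects, same value: q is untouched.
p = bytearray(b"hello")
q = bytearray(b"hello")
up1(p)
case_equal = bytes(q)

# Different objects, different values: q is untouched.
p = bytearray(b"hello")
q = bytearray(b"hangup")
up1(p)
case_different = bytes(q)

# Aliased: p and q name the same buffer, so q changes too.
p = bytearray(b"hello")
q = p
up1(p)
case_aliased = bytes(q)
```

Only in the aliased case does q change: it ends up as "Hello", because p and q refer to one buffer.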
Using SOAP RPC encoding, it's possible to preserve this behavior even if up1 is invoked on a remote machine. To do this, you tag an instance of the data with an XML id attribute, and aliased instances use the href attribute to point to that instance.
Normally, the values must appear after the proper body of the SOAP message, that is, as succeeding elements in the SOAP body:
    <soap-env:Body>
      <tns:pandq>
        <p href="#pval"/>
        <q href="#pval"/>
      </tns:pandq>
      <tns:node1 id="pval">
        hello
      </tns:node1>
    </soap-env:Body>
All of which opens a can of worms known as "serialization roots", which we happily ignore.
By special dispensation, however, strings can be inlined at one of their uses and referenced from the others. This gives us the following more common way of encoding p and q:
    <tns:pandq>
      <p id="pval">hello</p>
      <q href="#pval"/>
    </tns:pandq>
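On the receiving side, a parser has to follow those references back to a single value. Here is a minimal sketch of that step using the standard library. The resolve function is hypothetical, namespaces are left out to keep the example short, and only local references of the form "#name" are handled.

```python
import xml.etree.ElementTree as ET

BODY = """
<Body>
  <pandq>
    <p href="#pval"/>
    <q href="#pval"/>
  </pandq>
  <node1 id="pval">hello</node1>
</Body>
"""


def resolve(elem, by_id):
    """Follow a local href ("#name") to the element carrying that id."""
    href = elem.get("href")
    if href is None:
        return elem
    if not href.startswith("#"):
        raise ValueError("only local references are handled: %r" % href)
    return by_id[href[1:]]


body = ET.fromstring(BODY)
# Index every element that carries an id attribute.
by_id = {e.get("id"): e for e in body.iter() if e.get("id") is not None}

p = resolve(body.find("pandq/p"), by_id)
q = resolve(body.find("pandq/q"), by_id)
```

Both p and q resolve to the very same element, which is what lets a toolkit rebuild them as aliases rather than as two equal copies. The inlined form works the same way, since the id then sits on the p element itself.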
As you might expect, Google doesn't care if any of the strings are aliased, if only because they are input parameters. We direct ZSI to avoid the aliasing by indicating that each string is unique. That should probably be the default; if not a bug, it's at least a misfeature.
Having taken a brief tour through all the automation possible with WSDL-defined SOAP RPC messages, next month we'll show how to use SOAP headers to build our own value-added services, moving from "wizards generating code" to "interesting distributed application design."