Implementing XPath for Wireless Devices, Part II
July 17, 2002
In the first part of this article, we introduced XPath and discussed various XPath queries ranging from simple to complex. By applying XPath queries to sample XML files, we elaborated upon various important definitions of XPath such as location step, context node, location path, axes, and node-test. We then discussed complex XPath queries that combine more than one simple query. We also discussed the abstract structure of Wireless Binary XML (WBXML), which is the wireless counterpart of XML. Finally we presented the design of a simple XPath processing engine.
In this part, we will discuss the features of XPath which allow for complex search operations on an XML file. We will discuss predicates or filtered queries and the use of functions in XPath. We will present various XPath queries for the processing of WSDL and WML. We will also enhance the simple design of our XPath engine to include support for predicates, functions, and different data types.
Filtered Queries and Predicates
Let's start with a simple query which, when evaluated with the document root as the context node, returns the root element of any XML file:
./node()
We can take this further with another simple query, which selects all the immediate children of the root node:
./node()/*
What if you want to find all the nodes that are the immediate children of the root node and have a type attribute? The following query will help:
./node()/*[attribute::type]
This query will return the binding element from Listing 1. This shows that the code attribute::type written within square brackets acts as a filter. Filters in XPath are called predicates and are written inside square brackets. A predicate acts on a node-set (in this example, all immediate children of the root node) and applies a filtering condition to it (here, that the node must have a type attribute). The result is a reduced, that is, filtered, node-set.
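As a rough illustration of a predicate acting as a filter, the sketch below uses Python's standard xml.etree.ElementTree module, which supports only a subset of XPath. The document is a simplified, namespace-free stand-in for Listing 1 (an assumption, since the listing is not reproduced here), and *[@type] is the abbreviated form of *[attribute::type], evaluated with the root element as the context node.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the WSDL file of Listing 1.
WSDL = """<definitions name="BillingService">
  <message name="MonthlyBill"><part name="month"/></message>
  <message name="TotalBill"><part name="bill"/></message>
  <portType name="BillingPortType"/>
  <binding name="BillingBinding" type="BillingPortType"/>
  <service name="BillingService"/>
</definitions>"""

root = ET.fromstring(WSDL)

# Unfiltered node-set: every immediate child of the root element.
all_children = [child.tag for child in root]

# Filtered node-set: only the children that carry a type attribute.
with_type = [child.tag for child in root.findall("*[@type]")]

print(all_children)
print(with_type)
```

With this stand-in document, the filtered list contains only the binding element, which mirrors the result described above.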
Predicates can range from simple to very complex. Perhaps the simplest form of XPath predicate is just a number, as shown in the following query, which returns the second child (message element) of the root element:
./node()/*[2]
The following query looks for the message child of the root element whose name attribute has the value TotalBill and returns all text nodes of that element:
./node()/message[attribute::name="TotalBill"]/text()
In Listing 1, the matching element is the second of the two message elements.
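The two predicate forms just described, a number and an attribute comparison, can be tried with a small sketch in Python's xml.etree.ElementTree. The document is again a hypothetical stand-in for Listing 1. The sketch uses message[2] rather than *[2] because ElementTree documents position predicates only after a tag name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the WSDL file of Listing 1.
WSDL = """<definitions name="BillingService">
  <message name="MonthlyBill"><part name="month"/></message>
  <message name="TotalBill"><part name="bill"/></message>
</definitions>"""

root = ET.fromstring(WSDL)

# Numeric predicate: the second message child of the root element.
second = root.find("message[2]")

# Attribute-value predicate: the message whose name is TotalBill.
total_bill = root.find("message[@name='TotalBill']")

print(second.get("name"))
print(second is total_bill)
```

Both predicates select the same element here only because TotalBill happens to be the second message in the stand-in document.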
XPath Functions
Suppose you want to answer the following questions about the WSDL file in Listing 1:
1. What is the value of the name attribute of the last operation element?
2. How many message child elements does the definitions element have?
3. What is the name of the first child element of the root element?
The last() Function
The last() function will always point to the last node in the node-set. The following query, when applied to the WSDL file in Listing 1, will return the second message element (i.e., the message element whose name is TotalBill):
./node()/message[last()]
Note that the following query also returns the same message element:
./node()/message[2]
The only difference between the two queries is that we have replaced the last() function with the number 2. It is correct to conclude that the last() function in this case is actually returning the number 2 (the number of nodes in the node-set of the particular location step). Apply the same two queries to the WSDL file of Listing 2 (you may use the XPath Tester application mentioned in the resources) and you will see that this time the two queries do not return the same result. There are three message elements in Listing 2, so the last() function now returns the number 3.
Notice from this discussion that the last() function always returns a number.
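The difference between [last()] and [2] can be reproduced with a short sketch in Python's xml.etree.ElementTree, which supports both predicate forms after a tag name. The two documents are built in code as stand-ins for Listings 1 and 2, and the message names, in particular YearlyBill, are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def make_wsdl(names):
    """Build a tiny stand-in WSDL document with one message per name."""
    root = ET.Element("definitions")
    for name in names:
        ET.SubElement(root, "message", name=name)
    return root


two = make_wsdl(["MonthlyBill", "TotalBill"])                   # like Listing 1
three = make_wsdl(["MonthlyBill", "TotalBill", "YearlyBill"])   # like Listing 2

results = []
for doc in (two, three):
    count = len(doc.findall("message"))
    by_last = doc.find("message[last()]").get("name")
    by_number = doc.find("message[2]").get("name")
    results.append((count, by_last, by_number))
    print(count, by_last, by_number)
```

With two messages the two queries agree. With three, [last()] moves on to the third message while [2] stays on the second.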
The position() Function
If you apply the following queries to the WSDL file in Listing 2,
./node()/message[1]/part
./node()/message[2]/part
./node()/message[3]/part
they will return the part children of the first, second, and third message elements respectively. This shows that each node in the node-set has a proximity position. The proximity position of the first node is one, that of the second node is two, and so on.
What if you want to find all the message elements except the second? You can use the position() function, which works on the proximity position of a context node. The following query will return the first and third message elements of Listing 2:
./node()/message[position()!=2]
The position() function simply returns the proximity position of the context node being evaluated. The predicate [position()!=2] compares the proximity position with the number 2 and includes the context node in the node-set only if its proximity position is not equal to two.
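One way to picture how an engine might evaluate a position-based predicate is sketched below in plain Python. The helper filter_by_position and the message names are hypothetical. The node-set is represented as a simple list of names, and each node is paired with its 1-based proximity position and with the size of the node-set (the value last() would return).

```python
def filter_by_position(node_set, keep):
    """Apply a position-based predicate to a node-set.

    keep receives the 1-based proximity position and the size of the
    node-set and decides whether the node at that position is retained.
    """
    size = len(node_set)
    return [node
            for position, node in enumerate(node_set, start=1)
            if keep(position, size)]


# Stand-in for the three message elements of Listing 2 (names assumed).
messages = ["MonthlyBill", "TotalBill", "YearlyBill"]

# Equivalent of message[position()!=2]
not_second = filter_by_position(messages, lambda position, last: position != 2)

# Equivalent of message[last()]
only_last = filter_by_position(messages, lambda position, last: position == last)

print(not_second)
print(only_last)
```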
The count() Function
How many message children does the definitions (root) element in Listing 1 have? Count them and you will find two message elements. Specifying a "how many" question in XPath is a two-step procedure. First, write an XPath query that finds all the elements you wish to count. Then pass the XPath query to the count() function, as shown below:
Step 1: ./node()/message
Step 2: count(./node()/message)
The count() function calculates and returns the number of nodes in the resulting node-set of the XPath query.
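The same two-step procedure can be sketched in Python. ElementTree has no count() function, so the sketch selects the node-set first and then takes its length. The document is a hypothetical stand-in for Listing 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the WSDL file of Listing 1.
WSDL = """<definitions name="BillingService">
  <message name="MonthlyBill"/>
  <message name="TotalBill"/>
  <portType name="BillingPortType"/>
</definitions>"""

root = ET.fromstring(WSDL)

# Step 1: select the node-set to be counted (the message children of the root).
messages = root.findall("message")

# Step 2: count the nodes in that node-set.
message_count = len(messages)

print(message_count)
```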
The name(), local-name(), and namespace-uri() Functions
What does the following query return when applied to the WSDL file of Listing 1?
./node()/*[5]
It returns the fifth child (the service element) of the root element. The service element itself is a complete structure and contains child elements. Therefore, the returned value of this XPath query is actually an XML node and not just the name of an element.
The name() function returns the name of the XML node in question. For example, the following query will return the string "service" when applied to Listing 1:
name(./node()/*[5])
Similarly, the following query will return the string "wsd:definitions" (the qualified name of the root element, including its namespace prefix):
name(./node())
The local-name() and namespace-uri() functions are similar to the name() function, except that the local-name() function returns only the local name of the element without the namespace prefix, and the namespace-uri() function returns only the namespace URI. For example, try the following queries on Listing 1:
local-name(./node())
namespace-uri(./node())
The first query returns the string "definitions", while the second returns "http://schemas.xmlsoap.org/wsdl/".
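The split between local name and namespace URI can also be seen in Python's xml.etree.ElementTree, which stores a namespaced element's tag as {uri}local. The helper split_tag below is hypothetical, and the one-element document is only a stand-in for the root of Listing 1. ElementTree does not retain the prefix, so the sketch cannot reproduce name().

```python
import xml.etree.ElementTree as ET

# Stand-in for the root element of Listing 1 (prefix and URI taken from the text).
WSDL = ('<wsd:definitions xmlns:wsd="http://schemas.xmlsoap.org/wsdl/" '
        'name="BillingService"/>')


def split_tag(element):
    """Return (namespace URI, local name) for an ElementTree element."""
    tag = element.tag
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return "", tag  # element is not in any namespace


root = ET.fromstring(WSDL)
namespace_uri, local_name = split_tag(root)

print(local_name)      # counterpart of local-name(./node())
print(namespace_uri)   # counterpart of namespace-uri(./node())
```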
String Functions
We have seen that the name(), local-name(), and namespace-uri() functions return strings. XPath offers several functions for the processing of strings, such as string(), substring(), substring-before(), substring-after(), concat(), starts-with(), etc. For example, the following query demonstrates how to use the string() function:
string(./node()/*[2]/part/attribute::name)
The above query looks for the second child of the root element and then finds all the part child elements of that second child. It then looks for the name attribute of those part elements and, as a last step, converts the value of the name attribute to a string. When applied to Listing 1, it yields bill.
XPath also provides several functions that return true or false (Boolean data type). Consider the following query:
boolean(./node()/message)
It returns true when applied to Listing 1.
That's because the boolean() function checks whether the node-set resulting from an XPath query is empty or not (in our case, it contains the two message children of the root element). If the node-set is empty, the boolean() function returns false; otherwise it returns true.
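The conversions performed by string() and boolean() on a node-set can be approximated in a few lines of Python. In XPath 1.0, string() applied to a node-set yields the string-value of its first node in document order, and boolean() is true for any non-empty node-set. The helpers xpath_string and xpath_boolean are hypothetical names, and the document is a stand-in for Listing 1.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the WSDL file of Listing 1.
WSDL = """<definitions name="BillingService">
  <message name="MonthlyBill"><part name="month"/></message>
  <message name="TotalBill"><part name="bill"/></message>
</definitions>"""


def xpath_string(values):
    """string() on a node-set: the first node's string-value, or '' if empty."""
    return values[0] if values else ""


def xpath_boolean(node_set):
    """boolean() on a node-set: true when the node-set is not empty."""
    return len(node_set) > 0


root = ET.fromstring(WSDL)

# Counterpart of string(./node()/*[2]/part/attribute::name)
second_child = list(root)[1]
part_names = [part.get("name")
              for part in second_child.findall("part")
              if "name" in part.attrib]
part_name = xpath_string(part_names)

# Counterpart of boolean(./node()/message), plus an empty node-set for contrast
has_messages = xpath_boolean(root.findall("message"))
has_types = xpath_boolean(root.findall("types"))

print(part_name, has_messages, has_types)
```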
A Comprehensive WSDL Processing Example
The following WSDL processing scenario uses all the XPath concepts which we've discussed so far. The search requirement for the scenario is as follows:
Find a service element which is a direct child of the definitions (root) element and whose name attribute matches the name attribute of the definitions element. Then look into that service element and find a port element whose binding attribute matches the name attribute of a binding element, which is a direct child of the definitions (root) element.
This WSDL processing can be fulfilled in four steps:
1. Find the value of the name attribute of the definitions (root) element. The following XPath query (which returns the string BillingService from Listing 1) performs this job:
string(//node()[1]/@name)
2. Then find the service element whose name attribute matches the name of the definitions element. The following query contains the query of point 1 in a predicate and will return the required service element:
./node()[1]/service[@name=string(//node()[1]/@name)]
3. Then find the value of the name attribute of the binding element:
string(//node()[1]/binding/@name)
4. Finally, look for the required port element (whose binding attribute matches the name of the binding element of point 3) inside the service element of point 2:
./node()[1]/service[@name=string(//node()[1]/@name)]/port[@binding=string(//node()[1]/binding/@name)]
This example demonstrates that XPath predicates can contain simple logical conditions, function calls or even complete XPath queries.
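The four steps above can be followed one at a time in a Python sketch. ElementTree cannot evaluate nested queries inside predicates, so each nested query is computed first and then used in an ordinary comparison. The document is a hypothetical stand-in for Listing 1, and the port and binding names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the WSDL file of Listing 1.
WSDL = """<definitions name="BillingService">
  <message name="MonthlyBill"/>
  <message name="TotalBill"/>
  <portType name="BillingPortType"/>
  <binding name="BillingBinding" type="BillingPortType"/>
  <service name="BillingService">
    <port name="BillingPort" binding="BillingBinding"/>
    <port name="OtherPort" binding="OtherBinding"/>
  </service>
</definitions>"""

root = ET.fromstring(WSDL)

# Step 1: value of the name attribute of the definitions (root) element.
root_name = root.get("name")

# Step 2: service children whose name matches the root's name.
services = [s for s in root.findall("service") if s.get("name") == root_name]

# Step 3: name of the binding element (string() uses the first match).
binding = root.find("binding")
binding_name = binding.get("name") if binding is not None else ""

# Step 4: port elements inside those services whose binding attribute matches.
ports = [p for s in services
         for p in s.findall("port")
         if p.get("binding") == binding_name]
port_names = [p.get("name") for p in ports]

print(root_name, binding_name, port_names)
```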
WML Processing with XPath
WML is an XML language defined by the WAP Forum. WML provides a presentation format for small-device displays. WML is to a small-device display what HTML is to a personal computer.
Imagine a WML file consisting of a deck of cards, where each card is wrapped by a card element. Listing 3 is a simple WML file that contains two card elements.
The following XPath query will return all p (paragraph) elements contained within the first card (the card element whose id is "first") of Listing 3:
./node()/card[string(@id)="first"]/p
The next query returns the textual contents of the first paragraph of the second card:
string(./node()/card[string(@id)="second"]/p[1]/text())
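Both WML queries have close equivalents in the XPath subset of Python's xml.etree.ElementTree. The deck below is a hypothetical stand-in for Listing 3. Only the two card ids come from the text, and the titles and paragraph contents are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the WML deck of Listing 3.
WML = """<wml>
  <card id="first" title="First Card">
    <p>Welcome to the first card.</p>
    <p>Second paragraph of the first card.</p>
  </card>
  <card id="second" title="Second Card">
    <p>This is the second card.</p>
  </card>
</wml>"""

deck = ET.fromstring(WML)

# Counterpart of ./node()/card[string(@id)="first"]/p
first_card_paragraphs = deck.findall("card[@id='first']/p")

# Counterpart of string(./node()/card[string(@id)="second"]/p[1]/text())
second_card_text = deck.find("card[@id='second']/p[1]").text

print(len(first_card_paragraphs))
print(second_card_text)
```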
Implementing XPath Predicates and Functions
We will now see how to add support for predicates and functions to the simple design of our XPath engine.
The four pseudo-code classes XPathExpression (Listing 4), XPathLocationStep (Listing 5), XPathResult (Listing 6), and Predicate (Listing 7) form the updated design that includes support for predicates and functions. We have introduced the following enhancements to the classes presented in part 1:
1. XPath can return various types of data. Examples of data types XPath may return include nodes, strings, numbers, and Booleans. Our XPath engine design from part 1 supported only XML nodes as return data types. We have now provided a generic class named XPathResult (Listing 6) to support the different data types. Implementations based on our design will need to extend XPathResult for each data type separately.
2. The updated design now includes an architecture to support functions. A function call may occur at the beginning of an XPath query or inside any XPath location step. Therefore, both the XPathExpression (Listing 4) and XPathLocationStep (Listing 5) classes now support function calls.
3. We have provided a separate class for predicates (Listing 7). A predicate may consist of only a logical condition or an entire XPath query. Therefore, the Predicate class constructor will check whether the predicate is a complete query or just a condition. If it is a complete XPath query, the Predicate object will instantiate a new XPathExpression object; otherwise it will just evaluate the logical condition to produce the filtered results.
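To make the first and third enhancements more concrete, here is a minimal Python sketch. It is not the code of Listings 4 to 7, which are not reproduced here. Only the class names XPathResult and Predicate come from the design above. The subclass names, the kind tags, and the apply method are assumptions, and only a handful of simple conditions are recognized. A fuller implementation would hand anything else to a nested XPathExpression.

```python
import re
import xml.etree.ElementTree as ET


class XPathResult:
    """Generic result wrapper, extended once per XPath data type."""
    def __init__(self, value):
        self.value = value


class NodeSetResult(XPathResult): pass
class StringResult(XPathResult): pass
class NumberResult(XPathResult): pass
class BooleanResult(XPathResult): pass


class Predicate:
    """Filters a node-set using the text found between [ and ]."""

    def __init__(self, text):
        text = text.strip()
        attr_equals = re.fullmatch(r"""@([\w.-]+)\s*=\s*(["'])(.*)\2""", text)
        if text.isdigit():
            self.kind, self.arg = "position", int(text)
        elif text == "last()":
            self.kind, self.arg = "last", None
        elif attr_equals:
            self.kind = "attr-equals"
            self.arg = (attr_equals.group(1), attr_equals.group(3))
        elif re.fullmatch(r"@[\w.-]+", text):
            self.kind, self.arg = "attr-exists", text[1:]
        else:
            # A complete query would be delegated to a new XPathExpression.
            raise NotImplementedError("unsupported predicate: " + text)

    def apply(self, nodes):
        """Return the filtered node-set wrapped in a NodeSetResult."""
        size = len(nodes)
        kept = []
        for position, node in enumerate(nodes, start=1):
            if self.kind == "position":
                keep = position == self.arg
            elif self.kind == "last":
                keep = position == size
            elif self.kind == "attr-exists":
                keep = self.arg in node.attrib
            else:  # attr-equals
                name, value = self.arg
                keep = node.get(name) == value
            if keep:
                kept.append(node)
        return NodeSetResult(kept)


root = ET.fromstring('<definitions><message name="MonthlyBill"/>'
                     '<message name="TotalBill"/></definitions>')
messages = list(root)
by_number = Predicate("2").apply(messages)
by_name = Predicate('@name="TotalBill"').apply(messages)
by_type = Predicate("@type").apply(messages)
print(by_number.value[0].get("name"), len(by_name.value), len(by_type.value))
```

Wrapping every outcome in an XPathResult subclass lets the caller tell a node-set from a string, number, or Boolean without inspecting the raw value, which is the motivation given for the generic result class.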
Summary
In this article, we discussed the syntax and use of predicates and functions in XPath. We presented various WSDL and WML processing examples and demonstrated how to form complex XPath queries. Finally, we enhanced the design of the XPath engine introduced in the first article.
Resources
- Check out the first part of this series: Implementing XPath for Wireless Devices.
- Read the official XPath specification at W3.org.
- Download the XPath Tester and try XPath queries on XML files.
- On XML.com, visit Bob DuCharme's Transforming XML column for more articles on XPath.
- Check the XPath tutorial at ZVON.org.
- Consult the book Definitive XSLT and XPath by G. Ken Holman (Prentice Hall).
- WAP Site Authoring will help you learn WML.