XQuery, the Server Language
June 6, 2007
Over the years, I've had the chance to program in a lot of different server-side scripting languages--C, Perl, ASP, JSP, PHP, ASP.NET, Python, Ruby, among others. It seems every few years that we programmers decide that we're bored with the current language and would like to try something new just to try something new. Admittedly, the sensibilities that come with the successes and limitations of preceding languages also tend to shape the new, so that each iteration tends to become, if not more successful, then certainly at least less unwieldy than its predecessors.
As an XML developer, one of the problems that I come across almost invariably within these languages is that they are shaped by people who view XML as something of an afterthought, a small subset of the overall language that's intended to satisfy those strange people who think in angle brackets. One side effect of this viewpoint is that a rather disturbing amount of server code still emits HTML content (and often badly formed HTML at that) inline, as successive lines of composed strings. For instance, it's not at all unusual to see inline PHP that looks something like:
$buf = "<html><head><title>" . $myTitle;
$buf .= "</title><body>";
$buf .= "<h1>This is a test.</h1>";
$buf .= "<p>If this were an actual emergency, we'd be out of here by now.";
echo $buf;
Not surprisingly, with this particular approach, your ability to create modular code is virtually nil. The likelihood that you as the developer of this particular page will spend many late hours trying to figure out why your table fails to render properly after the twelfth row (and causes the browser to crash after the 200th) is correspondingly high, and maintaining the code after three months is well nigh impossible.
This is perhaps one of the biggest benefits of working with XML, even though it's not necessarily sold as such--an XML (or XHTML) document has its own internal integrity. You deal with XML at the level of the document, not the level of a string, and as such you can effectively abstract away the complications of creating markup content. Indeed, in a number of successful XML pipelines (such as the Apache Cocoon project), most if not all of the processing can take place through one XML operation (such as an XSLT transformation) or another, so that, in essence, your need to drop out of the XML world back into procedural languages drops off dramatically.
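For contrast with the string-buffer approach, here is a minimal sketch of the same page written as a single XQuery element constructor. The $myTitle name is carried over from the PHP fragment and its value is only a placeholder. Because the markup is part of the query itself, an unclosed tag is reported as a syntax error before anything reaches the browser.

```xquery
(: The page is one element constructor rather than a string buffer. :)
let $myTitle := "Test Page"  (: placeholder value, for illustration only :)
return
    <html>
        <head>
            <title>{$myTitle}</title>
        </head>
        <body>
            <h1>This is a test.</h1>
            <p>If this were an actual emergency, we'd be out of here by now.</p>
        </body>
    </html>
```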
In January 2007, the XQuery specification became a formal W3C Recommendation, after nearly six years of development. As a language, XQuery can best be thought of as a way to turn the integrated language used to retrieve sets of nodes from an XML document, XPath, into a standalone language. To do so, XQuery adds a number of features--command and control structures (such as for expressions), the ability to create intermediate data variables (the let keyword), conditional handling (if/then/else), and the like--to the XPath 2.0 language. Perhaps more significantly, however, XQuery also adds the ability to create modules consisting of collections of XQuery functions, and provides a way to subscribe to external functions within their own respective namespaces.
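As a rough illustration of those additions, the sketch below combines a user-defined function, a for expression, a let binding, and an if/then/else test. The function name local:describe and the number element are invented for this example; local is the namespace prefix that XQuery predeclares for functions defined in the main query.

```xquery
(: A user-defined function in the predeclared "local" namespace. :)
declare function local:describe($n as xs:integer) as xs:string {
    if ($n mod 2 = 0) then "even" else "odd"
};

(: for iterates over a sequence; let binds an intermediate variable. :)
for $n in (1 to 5)
let $kind := local:describe($n)
return <number value="{$n}">{$kind}</number>
```

The query returns a sequence of five number elements, one for each integer in the range.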
The original intent of the developers of XQuery was to use it, not surprisingly, as an XML-oriented query language. XQuery is not itself XML based (nor for that matter is XPath), but all of its operations are designed to work with XML documents or XML databases to provide a way of filtering or manipulating that XML to produce some form of output, most typically as XML or HTML.
Intriguingly, as a filter on XML, XQuery has seen only limited success. Part of this has to do with the fact that a significant number of the databases currently in use are SQL based, not XML based, so the benefits to be gained by using an XML query filter are offset by the need to convert relational data into XML in the first place.
However, a few recent XML databases have taken XQuery to heart, and use it as the primary mechanism for accessing the XML database content. One in particular, the open source eXist project, has gone somewhat further, by inverting the normal sequence of working with XML where the XML object or data store is passed in as an object to the XQuery filter within the context of a server session. In the case of eXist, the various session objects--request, response, server, and so forth--are instead brought into the XQuery engine as externally defined XQuery methods.
In other words, in this situation, the server-side scripting language is not PHP or ASP.NET or JSP, it's XQuery. In the particular case of eXist, this all occurs in the context of a servlet hosted by Jetty or Tomcat or some similar Java Servlet engine, but from the standpoint of web development, this fact is immaterial. Instead, from a development standpoint, your XML is all around you: you retrieve it with document() or collection() methods, you manipulate the XML directly within the XQuery, then you return the resulting content (which may or may not be XML), remaining blissfully ignorant of whether you are working with JSP or ASP.NET.
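A minimal sketch of that retrieve, manipulate, and return cycle might look like the following. The collection path /db/home/articles and the article and title element names are hypothetical, chosen only to show the shape of such a query.

```xquery
(: Hypothetical collection path and element names, for illustration only. :)
let $articles := collection("/db/home/articles")/article
return
    <ul>{
        (: Build one list item per stored article. :)
        for $article in $articles
        return <li>{string($article/title)}</li>
    }</ul>
```

Nothing in the query mentions the servlet container; the database content is simply in scope.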
For instance, we can create the traditional "Hello, World!" application as a web service, as shown in Listing 1 (hello_world.xq).
let $user := request:get-parameter("user","World")
(: Prompt for a name by default; otherwise greet the named user :)
let $message :=
    if ($user = "World") then
        <p>Greetings! Please enter your name:
            <input type="text" name="user" value=""/>
            <input type="submit" value="Go!"/></p>
    else
        <p>Hi, {$user}! Welcome to the Hello, World Example!</p>
(: Assemble the XHTML page around the message :)
let $page :=
    <html>
        <head>
            <title>Hello, {$user}!</title>
        </head>
        <body>
            <h1>Hello, {$user}!</h1>
            <form method="get" action="hello_world.xq">
                {$message}
            </form>
        </body>
    </html>
return $page
Listing 1: Hello, World, rendered in eXist's XQuery language
The eXist engine runs this XQuery as a REST based service, invocable through a URL. For instance, the document above might be addressed as http://localhost:8080/exist/rest//db/home/services/xq/hello_world.xq, where this particular file is actually stored within the database itself.
The (: :) characters serve as comment delimiters. The let keyword indicates the declaration and definition of a variable, with assignment being made explicitly using the := notation (with the bare equal sign serving to act as a Boolean comparison operator). Where things get a little strange is in the notion of containment. A single XML node (with or without children) also carries with it its own sense of "blockness," so that expressions such as if/then/else require either static values (such as numbers or strings), single XML elements (with or without children) or sequences of nodes and values delimited by parentheses. Thus, in the $message declaration, both the then and the else clauses return single elements.
The curly bracket notation within elements and attributes serves the same purpose as bracket notation within XSLT: it evaluates the XPath expressions and returns the results in the appropriate context, though unlike XSLT, bracketed expressions can return elements or attributes, not just strings (meaning that you have to be more careful when writing bracketed XQuery that you're not inadvertently attempting to test a string against an element or attribute). This can be seen in the insertion of the $message element within the larger XHTML template.
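The difference between a bracketed string and a bracketed element can be seen in a small sketch like this one. The variable names and content are invented for illustration; string() is the standard function that reduces a node to its text value.

```xquery
let $user := "World"                            (: a plain string :)
let $note := <em>an element, not a string</em>  (: an element node :)
return
    <p class="{$user}">
        Hello, {$user}! This is {$note}.
        String value only: {string($note)}
    </p>
```

The first use of $note copies the whole em element into the paragraph, while string($note) contributes only its text.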
XQuery makes use of what has become known as FLWOR (flower) notation, where the term is an acronym for the five primary keywords of XQuery notation: For, Let, Where, Order by and Return. Typically, an XQuery statement has at least one for or let expression, followed by a final return statement indicating what gets passed back out of the overall filter. Similarly, assignment statements can contain secondary for/let/if/then/else expressions, with the return keyword indicating the returned value or expression to be passed back to the assigned variable. Thus, in Listing 1, the line:
return $page
at the very end of the XQuery returns the element defined in the variable $page. In an open query like the one above, this final return is used by eXist to pass the information to the servlet's output, in essence writing the buffers and sending the contents to the client.
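Listing 1 uses only let and return, so here is a small sketch that exercises all five FLWOR clauses. The book data is made up and defined inline so that the query stands on its own.

```xquery
(: Inline sample data so the query is self-contained. :)
let $books :=
    <books>
        <book year="2005"><title>XQuery Basics</title></book>
        <book year="2001"><title>XPath Notes</title></book>
        <book year="2007"><title>Server-Side XML</title></book>
    </books>
for $book in $books/book                  (: iterate over each book :)
let $year := xs:integer($book/@year)      (: bind an intermediate value :)
where $year ge 2005                       (: keep only recent books :)
order by $year descending                 (: newest first :)
return <recent year="{$year}">{string($book/title)}</recent>
```

The result is two recent elements, the 2007 title followed by the 2005 title; the 2001 book is dropped by the where clause.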
I've deliberately held off discussing the first line of Listing 1. The expression
let $user := request:get-parameter("user","World")
assigns to the variable $user the result of the get-parameter() function in the request namespace. Put another way, this looks at either the incoming query string (if the HTML form in question used the GET method) or the posted name/value data (if the method was POST) for the parameter user. If the parameter exists, its value is used; otherwise, the default value "World" is used.
This call is a staple of just about any server language. The ability to pull parameters from user input was one of the first reasons for building server-side scripting languages, but this is an eXist feature, not an XQuery one. However, the benefits of this particular feature should be obvious: if you can access information from the client, modify your outgoing streams (something that can be accomplished with the corresponding response: namespace) and maintain session and authentication information, then you have all of the functions necessary for a server language.
One of the more important features of XPath 2, upon which XQuery is based, comes from the realization that extensions are inevitable. There will always be things that fall beyond the immediate scope of the language but that are important to you as the developer. For this reason, XPath 2 (and hence XQuery) includes very clear conventions for defining additional functionality to the language...a fact which implies that other XML database vendors may very well want to look at this functionality and see whether it enhances their own products.
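The same namespace convention is open to your own code. As a minimal sketch, the query below declares a function in a namespace of its own and then calls it; the greet prefix, the http://example.org/greet URI, and the function name are all hypothetical. A vendor extension such as eXist's request functions follows the same prefix:name calling pattern, although it is implemented inside the engine rather than in XQuery.

```xquery
(: Hypothetical namespace URI and prefix, for illustration only. :)
declare namespace greet = "http://example.org/greet";

(: A function defined in that namespace. :)
declare function greet:hello($name as xs:string) as element(p) {
    <p>Hi, {$name}!</p>
};

greet:hello("World")
```

The same declaration could be moved into a library module and imported by any query that needs it.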
The eXist database defines a number of these namespaces out of the box. From the standpoint of servlet development, perhaps the most important namespaces are as follows:
- request: provides access to information sent from the client. Functions include get-cookie-names, get-cookie-value, get-data, get-header, get-header-names, get-method, get-parameter, get-parameter-names, get-server-name, get-uploaded-file, get-uploaded-file-name, and get-url.
- response: lets the developer control the stream of data being sent back to the client. Functions include redirect-to, set-cookie, set-header, and stream-binary.
- session: provides control over the user's HTTP session. Functions include create, encode-url, get-attribute, get-attribute-names, get-id, invalidate, set-attribute, and set-current-user.
- transform: lets the developer transform an XML node using XSLT from within the XQuery. Functions include transform and stream-transform.
- update: The update commands (distinct from a namespace) let you perform live updates of the data in the eXist XML database, either at the granular level of changing a value in the database or at the level of inserting or removing whole documents. This addresses one of the big shortcomings of XQuery, in that it provides for an effective read-write solution that can be invoked from within an XQuery.
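To give a feel for how these namespaces combine, here is a hedged sketch that reads the request, sets a response header, stores a value in the session, and echoes every incoming parameter. It assumes, as Listing 1 does, that eXist makes the request, response, and session prefixes available without declaration; the argument lists shown are my assumption of the usual forms, so check them against the function documentation for your eXist version.

```xquery
(: Assumes eXist's request, response, and session prefixes are available
   without declaration, as in Listing 1. Argument lists are assumptions. :)
let $user := request:get-parameter("user", "World")
let $header := response:set-header("Cache-Control", "no-cache")
let $session := session:create()
let $stored := session:set-attribute("user", $user)
return
    <ul>{
        (: Echo each incoming parameter name with its value. :)
        for $name in request:get-parameter-names()
        return <li>{$name} = {request:get-parameter($name, "")}</li>
    }</ul>
```

The let bindings for $header, $session, and $stored exist only to trigger the side effects; their values are never used in the output.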
Other extensions can be compiled in by rebuilding the Java JAR (a shell or batch script automates this process) for doing such things as writing SQL queries and updates designed to work with any JDBC compliant SQL database, such as Oracle, MySQL, PostgreSQL, or SQL Server. This capability is especially important because it provides a bridge between the SQL and XML worlds, letting you perform complex queries (or updates) on your SQL database and then pass this information to the XQuery to be additionally processed, filtered, sorted, or transformed.
Additionally, other extensions give access to a full range of math functions (including the oh-so-useful math:random function), let you send mail through an SMTP server, retrieve (and to a certain extent modify) images (which can also be stored in the database, by the way), and other functions that provide functionality more associated with a full bore server-side scripting language than an XML query language.
This article serves as a very basic introduction to XQuery as a server language. I will be addressing this topic in more detail in subsequent articles in this series, examining some of the more sophisticated capabilities and the gotchas inherent in working with XQuery and eXist, and showing what explosive power you can release when you combine eXist or other REST based XQuery engines with XForms and Ajax.
My prediction is that REST based XML databases like eXist will seriously challenge the existing raft of server languages, from ASP to Ruby, within the next couple of years. Right now, it's something of a closed secret among a few developers, but the power, sophistication and ease of use inherent in working with the XML as if it were a natural part of the server landscape can only be understood by trying it.