An XQuery Update
September 10, 2003
The XQuery/XSLT working group released another set of Working Drafts on August 22, 2003. This article is my attempt to summarize the significant changes in the new drafts. Note that there is no new version of either the Data Model or Functions and Operators specifications, which were released to Last Call in May.
Some text and examples in this summary are quoted from the working drafts. The full drafts can be found at the W3C's XML Query home page.
Full Axis Feature
XQuery is mostly a superset of XPath 1, but one significant difference was in path expressions; XQuery left out support for some of the less common axes: ancestor, ancestor-or-self, following, following-sibling, preceding, preceding-sibling, and namespace.
The newest draft still leaves out the namespace axis. The argument against the namespace axis is that it complicates the implementation of nodes, and it may be difficult to avoid overhead even for XQuery code that does not use it. This is a high cost for a rarely used feature, so the axis is not included in XQuery; it is deprecated and optional in XPath 2.0. The standard functions fn:get-namespace-uri-for-prefix and fn:get-in-scope-namespaces provide alternatives to the namespace axis.
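The following Python sketch shows what those replacement functions compute. Python's xml.dom.minidom stands in for the XQuery data model. The helper names in_scope_namespaces and namespace_uri_for_prefix are hypothetical and are not the XQuery functions themselves. The sketch derives an element's in-scope namespaces from the xmlns declarations on the element and its ancestors, which is roughly the information the namespace axis would have exposed.

```python
from xml.dom import minidom

XML_NS = "http://www.w3.org/XML/1998/namespace"


def in_scope_namespaces(element):
    """Hypothetical helper: map each in-scope prefix to its namespace URI.

    The empty string "" stands for the default element namespace.
    """
    # The xml prefix is always bound, without any declaration.
    bindings = {"xml": XML_NS}
    # Collect the element and its element ancestors, innermost first.
    chain = []
    node = element
    while node is not None and node.nodeType == node.ELEMENT_NODE:
        chain.append(node)
        node = node.parentNode
    # Apply declarations from the outermost element inward, so that a
    # declaration on an inner element overrides one further out.
    for el in reversed(chain):
        for i in range(el.attributes.length):
            attr = el.attributes.item(i)
            if attr.name == "xmlns":
                if attr.value:
                    bindings[""] = attr.value
                else:
                    # xmlns="" undeclares the default namespace.
                    bindings.pop("", None)
            elif attr.name.startswith("xmlns:"):
                bindings[attr.name[len("xmlns:"):]] = attr.value
    return bindings


def namespace_uri_for_prefix(element, prefix):
    """Hypothetical helper: the URI bound to prefix, or None if unbound."""
    return in_scope_namespaces(element).get(prefix)


doc = minidom.parseString(
    '<a xmlns="urn:default" xmlns:p="urn:p"><b xmlns:p="urn:p2"/></a>'
)
b = doc.getElementsByTagName("b")[0]
print(namespace_uri_for_prefix(b, "p"))  # urn:p2
print(namespace_uri_for_prefix(b, ""))   # urn:default
```

Resolving prefixes through explicit declarations means a node needs no per-node namespace structure. That is the implementation cost the committee wanted to avoid.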
The arguments against the other axes are not as strong. One argument is that they are redundant, since they can be expressed using other axes. For example, following-sibling::NodeTest is equivalent to
let $e := . return parent::node()/child::NodeTest[. >> $e]
In fact, the XQuery/XPath formal semantics defines following-sibling this way.
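The redundancy argument can be checked with a small Python sketch using xml.etree.ElementTree, which has no following-sibling axis of its own. The helper following_sibling is hypothetical. It handles only element nodes and name tests, and it takes the parent explicitly because ElementTree elements carry no parent pointer.

```python
import xml.etree.ElementTree as ET


def following_sibling(parent, context, tag):
    """Hypothetical helper: following-sibling::tag computed the long way.

    Take the parent's children that match the name test, keeping only
    those after the context node in document order. The context node
    must be a child of parent.
    """
    children = list(parent)
    # Locate the context node by identity, not by equality.
    pos = next(i for i, child in enumerate(children) if child is context)
    # Among siblings, document order is simply child order.
    return [child for child in children[pos + 1:] if child.tag == tag]


root = ET.fromstring("<list><item id='1'/><note/><item id='2'/><item id='3'/></list>")
first = root[0]
print([e.get("id") for e in following_sibling(root, first, "item")])  # ['2', '3']
```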
But these alternative formulations are both inconvenient for users and harder for an implementation to optimize, which suggests that these axes should be standard. On the other hand, these axes may be very inefficient in some reasonable implementations, and programmers may not realize this if the axes are standard. (See Issue 114.)
So the new draft makes ancestor, ancestor-or-self, following, following-sibling, preceding, and preceding-sibling optional, tying them to the Full Axis Feature. An implementation is free to support the Full Axis Feature, in which case it must implement all of these axes.
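The efficiency concern is easy to see in a data model that stores no parent pointers, as Python's xml.etree.ElementTree does not. In this sketch (helper names hypothetical), answering ancestor::* first requires a pass over the whole tree to build a child-to-parent map. An implementation that keeps parent links could skip that step.

```python
import xml.etree.ElementTree as ET


def build_parent_map(root):
    """Hypothetical helper: map each element to its parent (one full pass)."""
    return {child: parent for parent in root.iter() for child in parent}


def ancestors(element, parent_map):
    """Hypothetical helper: ancestor::*, nearest first (reverse document order)."""
    result = []
    node = parent_map.get(element)
    while node is not None:
        result.append(node)
        node = parent_map.get(node)
    return result


root = ET.fromstring("<a><b><c/></b><d/></a>")
c = root.find("b/c")
parents = build_parent_map(root)  # cost proportional to the whole tree
print([e.tag for e in ancestors(c, parents)])  # ['b', 'a']
```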
Node Constructors
There are new computed constructors for processing instructions, comments, and namespaces. In earlier drafts you could only write an XML comment directly:
<!-- This is an XML comment.-->
This is convenient, but it doesn't let you calculate the comment text at run time: the text is a fixed literal. With a computed comment constructor you can calculate the text using an expression:
let $r := "XQuery" return ( comment {"The next section relates to", $r}, element section { whatever() } )
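The following Python sketch models the computed comment constructor by building the comment text at run time with xml.dom.minidom. It assumes that adjacent values are joined with a single space, and that the text may not contain "--" or end with "-". The final XQuery 1.0 specification adopts both rules. The helper name computed_comment is hypothetical.

```python
from xml.dom import minidom


def computed_comment(doc, *values):
    """Hypothetical model of comment { v1, v2, ... }.

    Assumption: values are joined with single spaces, and the result may
    not contain "--" or end with "-".
    """
    text = " ".join(str(v) for v in values)
    if "--" in text or text.endswith("-"):
        raise ValueError("invalid comment text: %r" % text)
    return doc.createComment(text)


doc = minidom.Document()
r = "XQuery"
print(computed_comment(doc, "The next section relates to", r).toxml())
# <!--The next section relates to XQuery-->
```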
An alternative approach would be to allow enclosed expressions in direct comment constructors, as in <!--The next section relates to {$r}-->. I'm not sure why the committee didn't go this route. Perhaps no one suggested it. Perhaps they felt it was more consistent to have a "complete" set of computed constructors.
There are also new computed processing instruction constructors:
let $target := "audio-output", $content := "beep" return pi {$target} {$content}
This is equivalent to
<?audio-output beep?>
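Here is the same idea for processing instructions, again as a minidom sketch with a hypothetical helper. Target and content are ordinary run-time values. The checks mirror XML's own rules: a target may not be "xml" in any mix of case, and the content may not contain "?>".

```python
from xml.dom import minidom


def computed_pi(doc, target, content):
    """Hypothetical model of pi {$target} {$content}."""
    # XML reserves the target name "xml" (case-insensitively).
    if target.lower() == "xml":
        raise ValueError("reserved processing-instruction target: %r" % target)
    # "?>" would end the processing instruction early.
    if "?>" in content:
        raise ValueError("content may not contain '?>'")
    return doc.createProcessingInstruction(target, content)


doc = minidom.Document()
target, content = "audio-output", "beep"
print(computed_pi(doc, target, content).toxml())  # <?audio-output beep?>
```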
This example uses a computed namespace constructor:
let $nsURI := "http://example.org/metric-system", $attrname := "metric:unit", $attrvalue := "meter" return element {"altitude"} { namespace metric {$nsURI}, attribute {$attrname} {$attrvalue}, "10000" }
This is equivalent to
<altitude xmlns:metric = "http://example.org/metric-system" metric:unit = "meter">10000</altitude>
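For comparison, here is roughly how the same element could be built with Python's xml.etree.ElementTree. This is an analogy, not XQuery semantics. ElementTree has no namespace nodes: the prefix binding is registered globally with register_namespace and appears only when the tree is serialized.

```python
import xml.etree.ElementTree as ET

ns_uri = "http://example.org/metric-system"
attr_name = "unit"     # local part of metric:unit
attr_value = "meter"

# Bind the prefix "metric" for serialization. ElementTree keeps this
# mapping globally, not per element.
ET.register_namespace("metric", ns_uri)

# ElementTree names namespaced attributes in "{uri}local" notation.
altitude = ET.Element("altitude", {"{%s}%s" % (ns_uri, attr_name): attr_value})
altitude.text = "10000"

print(ET.tostring(altitude, encoding="unicode"))
# <altitude xmlns:metric="http://example.org/metric-system" metric:unit="meter">10000</altitude>
```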
The new section 2.7.4, Namespace Nodes on Constructed Elements, describes how namespace nodes are created for the result of element constructors. But note that since there is no namespace axis in XQuery, there is no way you can actually observe the namespace nodes of an element. So what this section really specifies is which namespace prefixes may be used when elements are written in text form (serialized) and in the result of the fn:name function.
The base URI of a constructed element node, and of any descendant nodes copied into it, is taken from the static context, even if the original nodes had some other base URI.
Query Prolog and Modules
Each declaration in the module prolog must now be followed by a semicolon. Default namespace declarations now require the keyword declare, in addition to default, to be consistent with other declarations. Thus, you can write
declare default element namespace "http://example.org/names";
rather than
default element namespace "http://example.org/names" (: OLD :)
Similarly, define function has become declare function, and define variable has become declare variable. Also, a validation declaration must start with declare, and default collation = "namespace" becomes declare default collation "namespace". An xmlspace declaration no longer includes the = token:
declare xmlspace preserve;
There is a new Base URI declaration, which is used when resolving relative URIs in the module:
declare base-uri "http://example.org";
The standard fn:doc function resolves a relative URI using the base URI of the calling module. This means that the call fn:doc($uri) isn't a function call in the C programmer's sense; it is more like a "macro invocation", since it depends on the current module's static base URI. In other words, given the base URI declaration above, it really means
fn:doc(fn:resolve-uri($uri, "http://example.org"))
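The resolution step can be modeled with Python's urllib.parse.urljoin, which implements RFC 3986 reference resolution. Treating it as a stand-in for fn:resolve-uri is an assumption. The hypothetical helper below shows why the same fn:doc($uri) call can fetch different documents from modules with different base URIs.

```python
from urllib.parse import urljoin

# Static base URI of the calling module, from:
#   declare base-uri "http://example.org";
MODULE_BASE_URI = "http://example.org"


def resolve_doc_uri(uri, base=MODULE_BASE_URI):
    """Hypothetical helper: what fn:doc($uri) effectively dereferences,
    i.e. the argument resolved against the calling module's base URI."""
    return urljoin(base, uri)


print(resolve_doc_uri("books.xml"))                           # http://example.org/books.xml
print(resolve_doc_uri("http://example.com/a.xml"))            # http://example.com/a.xml (already absolute)
print(resolve_doc_uri("books.xml", "http://example.net/lib/"))  # http://example.net/lib/books.xml
```

The last call stands for a module with a different base URI: the argument is identical, but the resolved document is not.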
There is a new predefined namespace prefix local, bound to http://www.w3.org/2003/08/xquery-local-functions.
One major change is that the QName of the function being defined in a function definition must have an explicit namespace prefix. You can use the predefined local prefix, but only in main modules.
The syntax of the module declaration at the start of a library module has changed from
module "http://example.org/math-functions"
to
module math = "http://example.org/math-functions";
Both variables and functions declared in a library module must be explicitly qualified by the target namespace prefix of the module. So following the above declaration, you could write
declare function math:acos ($x as xs:double) as xs:double external; declare variable $math:PI as xs:double := math:acos(-1);
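The naming rules described in this section can be summarized as a small check. This Python sketch is hypothetical; no XQuery processor exposes such a function. It takes each declaration as a (kind, QName) pair, plus the library module's prefix, or None for a main module.

```python
def naming_errors(declarations, module_prefix=None):
    """Hypothetical checker for the August draft's naming rules.

    declarations: (kind, qname) pairs, kind being "function" or "variable".
    module_prefix: the prefix from 'module math = "..."', or None for a
    main module. Returns the QNames that break the rules.
    """
    errors = []
    for kind, qname in declarations:
        prefix, sep, _local = qname.partition(":")
        if module_prefix is not None:
            # Library module: functions and variables alike must use the
            # module's prefix (so local: is not allowed here).
            if not sep or prefix != module_prefix:
                errors.append(qname)
        elif kind == "function" and not sep:
            # Main module: a function name needs an explicit prefix,
            # for which the predefined local: prefix is acceptable.
            errors.append(qname)
    return errors


print(naming_errors([("function", "math:acos"), ("variable", "math:PI")], "math"))  # []
print(naming_errors([("function", "local:helper")], "math"))  # ['local:helper']
print(naming_errors([("function", "local:helper"), ("function", "helper")]))  # ['helper']
```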
Errors and error codes
For each error that an implementation is required to detect, there is now an error code. For example:
err:XP0020: It is a type error if, in an axis expression, the context item is not a node.
These are listed in Appendix F. It is not clear how these are supposed to be used. (See Issue 340.)
The May draft said:
If an implementation can determine by static analysis that an expression will necessarily raise a dynamic error...the implementation is allowed to report this error during the analysis phase
The August draft restricts this to the case of constant folding:
If any expression (at any level) can be evaluated during the analysis phase (because all its explicit operands are known and it has no dependencies on the dynamic context), then any error in performing this evaluation may be reported as a static error.
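Constant folding of the kind the August wording permits can be sketched over a tiny hypothetical expression tree in Python. Subexpressions whose operands are all constants are evaluated during analysis, and a failure there becomes a static error. Anything that depends on a variable is left for run time, so any error it raises stays dynamic. The node classes and StaticError are illustrative assumptions, not part of any XQuery API.

```python
from dataclasses import dataclass
from typing import Union


class StaticError(Exception):
    """Hypothetical: an error reported during the analysis phase."""


@dataclass
class Const:
    value: float


@dataclass
class Var:
    name: str  # depends on the dynamic context, so never folded


@dataclass
class Div:
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Var, Div]


def fold(expr):
    """Analysis phase: evaluate every subexpression whose operands are all
    known; an error during that evaluation is reported as a static error."""
    if isinstance(expr, Div):
        left, right = fold(expr.left), fold(expr.right)
        if isinstance(left, Const) and isinstance(right, Const):
            try:
                return Const(left.value / right.value)
            except ZeroDivisionError as exc:
                raise StaticError("division by zero in a constant expression") from exc
        # An operand is not constant: keep the expression for run time.
        return Div(left, right)
    return expr


print(fold(Div(Const(6), Const(3))))  # Const(value=2.0)
print(fold(Div(Const(1), Var("x"))))  # left as is: any error is dynamic
try:
    fold(Div(Const(1), Const(0)))
except StaticError as err:
    print("static error:", err)
```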
Presentation changes
Some of the changes don't affect the XQuery language itself, but clarify or improve the documents. Many terms are now explicitly defined and summarized in a new Glossary section. For example, section 2.5 now defines the terms "static error", "dynamic error", "type error", and "error value".
Section 2 has been reorganized and a new subsection "Processing Model" has been introduced. This is useful reading for understanding how XQuery works, though it uses a lot of terms and concepts. A new Appendix "Context Components" summarizes how the static and dynamic context are initialized.
Other smaller changes
A / or // at the start of a path expression sets the context to the root of the tree containing the original context node. In the new draft, there is a type check (using treat) that forces the root to be a document node; otherwise, an error is raised, for example if the context node is in a tree whose root is a standalone element node.
In the treat expression X treat as T, there is no longer a requirement that the static type of X be "derived by restriction" from T.
The input() function has been deleted. I assume the reason is that a variable declaration with an external value provides similar functionality, without the extra concept of an implementation-defined input sequence.
There are also various minor changes to the grammar and the formal semantics. For example, the context item expression "." is now classified as a Primary Expression rather than as an Abbreviated Forward Step.