BumbleBee, the XQuery Test Harness
March 10, 2004
Will XQuery be the key that unlocks a new generation of data and content? My money, and the vendor money, says yes. Nearly every vendor, from the well-known old guard (IBM, Oracle, and Microsoft) to the plucky upstarts (Cerisent, X-Hive, and Qizx), has expressed support for XQuery and is actively collaborating in its standardization. Under development by the W3C and in Last Call, XQuery looks poised to become the standard query language by which companies access and manipulate semi-structured data and merge disparate data and content repositories.
If you're unfamiliar with XQuery, see the Resources section below for some introductory articles to get you started.
Using XQuery, however, can be quite frustrating, as you're faced with choosing from a variety of XQuery vendors that support different versions and interpretations of the XQuery specification. Then once you've selected the XQuery engine that's best for you, it can be hard to know if the queries you write today will produce reliable results tomorrow after you upgrade your engine or make changes to your queries.
The BumbleBee XQuery test harness (available at XQuery.com) addresses these frustrations and takes the pain and uncertainty out of learning and using XQuery. Named because it buzzes around FLWORs, BumbleBee provides a cross-platform, vendor-neutral automated testing environment for XQuery development. In other words, BumbleBee is to XQuery what JUnit is to Java. Write your query, define the expected result, and let the tool do the rest. With BumbleBee you can automate regression testing, quantify vendor compatibility, and solidify your understanding of XQuery by running the same query across numerous vendors with the push of a button.
I'm one of the folks behind BumbleBee, and probably its biggest fan since it's helped me so much with my own XQuery authoring.
Automated Testing, Vendor Selection, and Test-First Learning
Before we discuss the details of BumbleBee, let's put its role in context. Regular testing, especially automated testing, has proven essential to producing high-quality software. One can, of course, write XQuery code without testing, but that carries extra risk. Unlike GUI programs where buggy code usually produces obvious errors, a miswritten query can execute fine and produce results that appear correct, while in reality the results may be incomplete or slightly erroneous.
Using BumbleBee as your testing tool, you can craft automated tests to run your queries against fixed data sets and verify the results of the queries, so you can trust the results of production queries run against less controlled input. Then when you change your query or upgrade your XQuery engine, the automated tests let you perform a quick verification before deployment. What if a query bug escapes initial testing? Add a new BumbleBee test case to the test suite ensuring the bug never reappears.
To take an example from my own life, I teach a weeklong XQuery course. I've found it valuable to load all the course examples and homework into a BumbleBee test suite. That way, before teaching the course, I can test everything against the specific XQuery engine we'll be using in the course. Any bugs pop out immediately with little effort. No longer must students act as guinea pigs.
Testing can also help the process of selecting an XQuery vendor. BumbleBee includes a set of several hundred tests based on the W3C Use Cases and NIST conformance test suite. These tests can exercise an XQuery engine and quantify its conformance. You can also extend these tests with your own conformance tests to more exhaustively cover the specific areas of importance to you. User contributions to BumbleBee are welcome, so if you write such tests, please consider submitting them for others to use.
Some have also found BumbleBee useful in learning XQuery, employing it to support "test-first learning." In this mode of learning, you present yourself a challenge: "I want to take this input and produce this output." Then you work to code the query that produces the desired result. When you succeed, the test passes. Over time your suite of learning tests becomes a knowledge base you can draw from later when writing production code. As new XQuery specification versions come out, you can rerun the tests and examine any new failures to see if anything you learned is antiquated. This style of learning can also be useful to a teacher or course instructor for handing out and grading homework. Student grades are almost printed for you. You just judge code for style and take the afternoon off. If you're looking for new challenges, the XQuery.com Wiki has a Challenges area.
Using BumbleBee
Now that we understand the advantages of automated XQuery testing, let's put BumbleBee to work buzzing around some tests. BumbleBee executes from a command-line script (bumblebee.bat or bumblebee.sh depending on platform). By default, if you don't specify any tests on the command line, BumbleBee runs a user-configurable default suite of tests. As each test is run, its name and pass or fail status are printed to the console. For example:
Passed -> Test (Vendor 1): Test 1 in 0.015 sec
Failed! -> Test (Vendor 1): Test 2
Expected attribute value '1992' but was '1994' - comparing <book year="1992"...> at /BumbleBee_Result[1]/bib[1]/book[1]/@year to <book year="1994"...> at /BumbleBee_Result[1]/bib[1]/book[1]/@year
When all the tests have run to completion, you see a summary of test results printed to the console for each XQuery engine that was tested. For example:
Time: 39.598 seconds
FAILURES!!!
Vendor 1: Tests Run: 72, Failures: 22, Disabled: 0 (69.4% passed)
Vendor 2: Tests Run: 72, Failures: 1, Disabled: 0 (98.6% passed)
Total : Tests Run: 144, Failures: 23, Disabled: 0 (84% passed)
(See log/bumblebee.log for failure details.)
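The percentages in the summary are simply the share of executed tests that did not fail. As a minimal Python sketch (the function name pass_rate is hypothetical, and this is not BumbleBee's own code), the figures in the sample summary can be reproduced like this:

```python
def pass_rate(tests_run: int, failures: int) -> float:
    """Return the percentage of tests that passed, rounded to one decimal."""
    if tests_run <= 0:
        raise ValueError("tests_run must be positive")
    return round(100.0 * (tests_run - failures) / tests_run, 1)


# Counts taken from the sample summary: two vendors, 72 tests each.
vendor_1 = pass_rate(72, 22)
vendor_2 = pass_rate(72, 1)
total = pass_rate(72 + 72, 22 + 1)
print(vendor_1, vendor_2, total)
```

The total is computed from the summed counts rather than by averaging the two vendor percentages, which is the only reading consistent with the 144 runs and 23 failures reported on the Total line.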
To run BumbleBee against specific directories containing BumbleBee test files, list the directories on the command line:
bumblebee directory1 [directory2 [...]]
For example, to run the November 2003 Use Case tests distributed with BumbleBee, type:
bumblebee tests/2003-11/usecases
Depending on your server's compliance level, you may want to use the August 2003 or May 2003 tests instead.
After you run BumbleBee, the log/bumblebee.log file contains a comprehensive report of all tests run. For each failed test, the test report includes the XQuery expression that was run, the actual query result returned by the XQuery engine under test, the query result that was expected by the test, and the failure message. The console output is always fairly short; the log output is always comprehensive.
Writing BumbleBee Tests
To learn how to write a BumbleBee test, let's write a query against the following XML file, named tunes.xml, representing a collection of songs:
<Tunes>
  <Tracks>
    <Track>
      <Name>Ready, Steady, Go</Name>
      <Artist>Paul Oakenfold</Artist>
      <Album>Bunkka</Album>
      <Genre>Electronic</Genre>
      <MyRating>10</MyRating>
      <Time>254</Time>
    </Track>
    <Track>
      <Name>Battle</Name>
      <Artist>Hans Zimmer and Lisa Gerrard</Artist>
      <Album>Gladiator Soundtrack</Album>
      <Genre>Instrumental</Genre>
      <MyRating>8</MyRating>
      <Time>193</Time>
    </Track>
    <Track>
      <Name>Orange Wedge</Name>
      <Artist>The Chemical Brothers</Artist>
      <Album>Surrender</Album>
      <Genre>Electronic</Genre>
      <MyRating>7</MyRating>
      <Time>254</Time>
    </Track>
  </Tracks>
</Tunes>
We place this XML file in the tests/2003-05/examples directory, assuming our chosen vendor supports the May 2003 draft. It's good practice to organize tests by XQuery specification version.
We want the query to generate a new XML document representing a playlist of our favorite songs sorted by song name. Using any text editor, we create the following BumbleBee test file named MyFirstTest.bee in the tests/2003-05/examples directory:
!name My First Test
# What is my favorite music?
!load tests/2003-05/examples/tunes.xml
!query
<Playlist> {
  for $t in doc("tests/2003-05/examples/tunes.xml")//Track
  where $t/Genre = "Electronic" and $t/MyRating > 5
  order by $t/Name
  return
    <Track> {
      $t/Name, $t/Artist, $t/Genre, $t/MyRating
    } </Track>
} </Playlist>
!end
!result
<Playlist>
  <Track>
    <Name>Orange Wedge</Name>
    <Artist>The Chemical Brothers</Artist>
    <Genre>Electronic</Genre>
    <MyRating>7</MyRating>
  </Track>
  <Track>
    <Name>Ready, Steady, Go</Name>
    <Artist>Paul Oakenfold</Artist>
    <Genre>Electronic</Genre>
    <MyRating>10</MyRating>
  </Track>
</Playlist>
!end
Test directives begin with an exclamation point (!). Comments begin with a hash (#).
The first test directive is the !name directive. This directive specifies an arbitrary name for the test. You'll see the test name used in the console and file output to uniquely identify the test. The second test directive is !load. This requests that the server load the file at the given path under the same URI as the path. On some engines that drive off the file system directly, this isn't technically necessary, but it's always good to have. The path should be relative to the directory from which we run BumbleBee.
Next we find the !query directive. This directive contains the text of the XQuery expression to be run. The !query directive must end with a single line containing the !end directive. The last test directive is the !result directive. This directive contains the text of the result we expect to be produced by the XQuery expression specified in the !query directive. The !result directive must end with a single line containing the !end directive. If desired, you can add additional test cases within the same .bee file; just start the additional tests with a new !name directive.
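To make the file layout concrete, here is a minimal Python sketch of a reader for the directives described above. It is an illustration only, not BumbleBee's parser: the function name parse_bee and the dictionary layout are hypothetical, and it assumes that !query and !result each sit on a line of their own and that comments appear only between directives.

```python
def parse_bee(text: str) -> list:
    """Parse .bee-style text into a list of test dictionaries.

    Handles only the directives described in the article:
    !name, !load, !query ... !end and !result ... !end.
    Assumes !query and !result each occupy their own line.
    """
    tests = []
    current = None   # test currently being assembled
    block = None     # "query" or "result" while inside a block
    body = []        # lines collected for the open block

    for line in text.splitlines():
        stripped = line.strip()
        if block is not None:
            # Inside a block, everything up to !end is content.
            if stripped == "!end":
                content = "\n".join(body).strip()
                if block == "query":
                    current["query"] = content
                else:
                    current["results"].append(content)
                block, body = None, []
            else:
                body.append(line)
        elif not stripped or stripped.startswith("#"):
            continue  # blank line or comment between directives
        elif stripped.startswith("!name"):
            # A new !name starts a new test case in the same file.
            current = {"name": stripped[len("!name"):].strip(),
                       "load": [], "query": None, "results": []}
            tests.append(current)
        elif current is None:
            raise ValueError("directive before !name: " + stripped)
        elif stripped.startswith("!load"):
            current["load"].append(stripped[len("!load"):].strip())
        elif stripped in ("!query", "!result"):
            block = stripped[1:]
        else:
            raise ValueError("unrecognized line: " + stripped)

    if block is not None:
        raise ValueError("missing !end for !" + block)
    return tests


# A tiny made-up test file, used only to exercise the sketch.
sample = """\
!name Simple Sum
# Does the engine add?
!query
1 + 1
!end
!result
2
!end
"""
parsed = parse_bee(sample)
print(parsed[0]["name"], parsed[0]["query"], parsed[0]["results"])
```

Keeping the results in a list means a test case can carry more than one !result block without changing the data structure.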
As an aside, some may wonder why the hierarchical .bee files aren't written using XML. It turns out it's especially difficult for humans to author tests in XML, not just because XML is more verbose, but because the query and result contents themselves use XML. Thus the tests have to be escaped or placed in CDATA sections, and neither of those solutions is helpful for test authoring. What's more, when the query or result uses CDATA sections itself, it gets extremely difficult to decipher what's what.
After placing the MyFirstTest.bee file in the tests/2003-05/examples directory, you can run the test individually by typing:
bumblebee tests/2003-05/examples/MyFirstTest.bee
Alternatively, if there were multiple test files in the examples directory, you could run them all in one fell swoop using:
bumblebee tests/2003-05/examples
On running the test, you'll see the following console output:
BumbleBee: The XQuery Test Harness
Test script: bumblebee/tests/2003-05/examples/MyFirstTest.bee
Passed -> Test (Qizx): My First Test in 2.902 sec
Test script: bumblebee/tests/2003-05/examples/MyFirstTest.bee
Passed -> Test (Saxon): My First Test in 1.285 sec
Time: 4.394 seconds
OK!
Qizx : Tests Run: 1, Failures: 0, Disabled: 0 (100% passed)
Saxon: Tests Run: 1, Failures: 0, Disabled: 0 (100% passed)
Total: Tests Run: 2, Failures: 0, Disabled: 0 (100% passed)
Notice that in this example BumbleBee ran the test against two XQuery engines: Qizx and Saxon. The test passed in both cases. That is, as a result of running our test through BumbleBee we know that our XQuery expression produces the expected result when run against either of these engines. The XQuery engines used by default are selectable in the external configuration of BumbleBee.
Negative and Compound Tests
Our first BumbleBee test was a positive test. It passed only when the XQuery engine under test produced the expected result and not an error condition. Sometimes you want to test that an XQuery engine produces an error when an error is appropriate. The following example ensures the engine reports a divide by zero error:
!name A Negative Test
!query
3 idiv 0 = 1
!end
!result
ERROR
!end
The !result directive uses a special ERROR keyword to indicate that any error reported by the XQuery engine is the expected result. Any non-error condition is a failure.
BumbleBee also allows you to specify multiple results for a single query. Why would you want this? Because sometimes two answers are possible. For example, in XQuery an expression can be evaluated in any order and some orderings may short circuit to success while others may legitimately return errors. The following BumbleBee test demonstrates.
!name A Compound Test
!query
1 eq 2 and 3 idiv 0 = 1
!end
!result
false
!end
!result
ERROR
!end
Notice the use of two !result directives. The first !result directive indicates that false may be returned (if the expression is evaluated left to right and it short circuits). The second !result directive allows an error as another legal possibility (if the expression is evaluated right to left). If either false or an error condition is returned by the XQuery engine, then the test will pass. An arbitrary number of possible results can be declared in any BumbleBee test using this format.
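The pass rule for negative and compound tests can be sketched in a few lines of Python. This is an illustration under stated assumptions, not BumbleBee's implementation: the names normalize and outcome_passes are hypothetical, and results are compared as whitespace-normalized strings, whereas the failure messages shown earlier suggest the real tool compares XML structure.

```python
def normalize(text: str) -> str:
    """Collapse runs of whitespace so layout differences are ignored."""
    return " ".join(text.split())


def outcome_passes(expected_results, actual="", errored=False):
    """Return True if the engine's outcome matches any expected result.

    expected_results: strings taken from the test's !result blocks.
    actual: text returned by the engine (ignored when errored is True).
    errored: True if the engine reported an error instead of a result.
    """
    for expected in expected_results:
        if normalize(expected) == "ERROR":
            # The ERROR keyword accepts any error, and only an error.
            if errored:
                return True
        elif not errored and normalize(expected) == normalize(actual):
            return True
    return False


compound = ["false", "ERROR"]   # either outcome is legal
negative = ["ERROR"]            # only an error is legal

short_circuit_ok = outcome_passes(compound, actual="false")
error_ok = outcome_passes(compound, errored=True)
wrong_value = outcome_passes(compound, actual="true")
negative_without_error = outcome_passes(negative, actual="false")
print(short_circuit_ok, error_ok, wrong_value, negative_without_error)
```

A positive test is just the special case with a single non-ERROR entry in the list, so one rule covers all three kinds of test.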
XQuery Vendor Options
BumbleBee supports any XQuery engine accessible from Java. Currently that list stands at seven, listed in alphabetical order: Cerisent, Ipedo, IPSI-XQ, Qexo, Qizx/open, Saxon, and X-Hive. BumbleBee has adapter code for each of these vendors that maps from a standard interface to each vendor-specific implementation.
When you run a BumbleBee test, or suite of tests, the tests run against all enabled XQuery engines. If you run two tests with three XQuery engines enabled, then you will see six test results, a pair of results for each XQuery engine. You can control which engines BumbleBee uses by editing a list in the bumblebee.properties external configuration file.
Power users can enable and disable tests for specific engines using the !enable enginename and !disable enginename directives. These directives let you author a vendor-specific test that won't be run against other vendors, or skip tests on a particular server that doesn't support the optional XQuery feature under test.
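The relationship between tests, enabled engines, and per-test exclusions can be sketched as follows. This is a rough Python illustration, not BumbleBee code: the function plan_runs and the disabled mapping are hypothetical, and the sketch assumes that disabling an engine for a test simply removes that one engine/test pair from the run.

```python
def plan_runs(test_names, enabled_engines, disabled=None):
    """Return the (engine, test) pairs to execute, grouped by engine.

    disabled: optional dict mapping a test name to the set of engines
    that test should skip (a stand-in for per-test !disable directives).
    """
    disabled = disabled or {}
    runs = []
    for engine in enabled_engines:
        for name in test_names:
            if engine not in disabled.get(name, set()):
                runs.append((engine, name))
    return runs


engines = ["Qizx", "Saxon", "X-Hive"]
tests = ["Test 1", "Test 2"]

# Two tests against three engines: six results, a pair per engine.
all_runs = plan_runs(tests, engines)

# Skipping "Test 2" on one engine leaves five results.
with_skip = plan_runs(tests, engines, {"Test 2": {"Saxon"}})
print(len(all_runs), len(with_skip))
```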
Conclusion
BumbleBee provides a powerful, portable, vendor-neutral automated test environment for XQuery. With BumbleBee you can automate your regression testing, compare multiple XQuery engines, and learn the language through structured challenges.
The latest BumbleBee release, version 1.2, includes support for seven vendors and numerous specification draft releases. The easy-to-write .bee file format allows for quick development of tests, including negative tests and compound tests.
Future BumbleBee versions may include a graphical query execution environment and test authoring tool. If this sparks your interest, write to "buzz" at xquery.com so it'll be sure to get done. If it doesn't spark your interest, write in anyway with what would.
Getting BumbleBee
A free evaluation download of BumbleBee can be found at http://xquery.com/bumblebee. Non-expiring licenses are available for commercial use by emailing with your usage requirements. Licenses are also available free of charge for developers of open source XQuery implementations and qualified non-profit and educational use. Email "buzz" to discuss qualifying for such a license.