Introduction to dbXML
November 28, 2001
In my recent XML.com article, Introduction to Native XML Databases, I provided a general overview of native XML databases and what they might be good for. In this article we'll take a look at a native XML database implementation, the open source dbXML Core.
What it Offers
The dbXML Core has been under development for a little more than a year. The current version is 1.0 beta 4 with a 1.0 final release expected to appear shortly. Full source code is available from the dbXML site.
Most of the basic native XML database features are covered, including:
- storage of collections of XML documents,
- multi-threaded database engine optimized for XML data,
- schema independent semi-structured data store,
- pre-parsed compressed document storage,
- XPath query engine,
- collection indexes to improve query performance,
- XML:DB XUpdate implementation for updates,
- XML:DB Java API implementation for building applications, and
- complete command line management tools.
Proper transaction support is the major missing feature right now; it will appear in the 1.5 release.
In order to get the most from this article you might want to download dbXML and follow the installation instructions (UNIX or Windows) to get it running.
The Basic Model
The main idea behind dbXML is to provide a simple way to store and manage large numbers of XML documents. This is accomplished by storing documents in collections, where each individual document is stored in a compressed, pre-parsed form. Because documents do not have to be re-parsed on every access, working with the stored XML data is significantly faster. The dbXML engine is optimized for small XML documents, up to about 50K in size. The server can store larger documents, but that isn't an ideal scenario.
Storing documents in collections provides an easy mechanism for querying and manipulating the documents as a set. If you wanted to draw a parallel to a relational database, you could consider a collection roughly equivalent to a table and each document in a collection equivalent to a row in that table. One major difference, besides the obvious use of XML, is that in dbXML the schema of what can be stored in a collection is not constrained. This gives you tremendous flexibility in which document types can be stored in a collection. If you want, you can even mix documents of completely different schemas in the same collection. There probably isn't much benefit in doing that, but there is benefit in being able to store and query documents that are similar, but not exactly the same, in structure. Consider, for instance, a product catalog where each product type needs specialized data. In this case all products will have some common data and also some specialized data. In dbXML you could store all products together and then either query the common data as a set or restrict your query to a particular product type and query on the product-specific data.
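The product catalog idea can be sketched in a few lines of Java. This is only an illustration of querying similar but not identical documents as a set: it uses the XPath engine built into the JDK rather than dbXML, and the class name MixedCatalog, the sample product documents, and their element names are all invented for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class MixedCatalog {
    // Hypothetical product documents: both share <name> and <price>,
    // but each product type adds its own specialized element.
    static final String[] DOCS = {
        "<product type=\"book\"><name>XML Basics</name><price>30</price><isbn>12345</isbn></product>",
        "<product type=\"shirt\"><name>Logo Tee</name><price>15</price><size>L</size></product>"
    };

    // Evaluate one XPath expression against every document in the "collection"
    // and collect the text content of each matching node.
    static List<String> query(String xpath) {
        XPath engine = XPathFactory.newInstance().newXPath();
        List<String> hits = new ArrayList<>();
        try {
            for (String doc : DOCS) {
                NodeList nodes = (NodeList) engine.evaluate(
                        xpath, new InputSource(new StringReader(doc)), XPathConstants.NODESET);
                for (int i = 0; i < nodes.getLength(); i++) {
                    hits.add(nodes.item(i).getTextContent());
                }
            }
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + xpath, e);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Common data, queried across every product type.
        System.out.println(query("/product/name"));
        // Product-specific data, restricted to one product type.
        System.out.println(query("/product[@type = 'book']/isbn"));
    }
}
```

The first query matches both documents because they share the name element; the second matches only the book, since no other product type carries an isbn.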
Working from the Command Line
The dbXML server comes with a nice set of command line tools that allow you to perform all the basic administration functions that you would expect. To get a feel for how these work, let's look at a few examples. I won't explain the details of these commands, but it's likely to be clear what they are doing. A more detailed explanation of their usage can be found in the dbXML users guide.
For all the commands we'll assume myaddress.xml contains this simple document.
<address id="1">
  <name>
    <first>John</first>
    <last>Smith</last>
  </name>
</address>
Using the command line tools we can:
create a collection:
dbxmladmin add_collection -c /db -n addresses
add a document:
dbxmladmin add_document -c /db/addresses -n myaddress -f myaddress.xml
retrieve a document:
dbxmladmin retrieve_document -c /db/addresses -n myaddress
create an index on the id attribute:
dbxmladmin add_indexer -c /db/addresses -n id_idx -p @id
run an XPath query:
dbxmladmin xpath -c /db/addresses -q "/address[@id = 1]"
The basic pattern is to run the dbxmladmin command, tell it what operation you want, what collection context it should be executed in (-c switch), and any operation specific arguments. The gory details on all the possible operations are available in the dbXML command line tools reference.
Developing Applications
While having a nice set of administration tools is important, the real value of dbXML comes when developing custom applications. For this we use the Java XML:DB API. This API is intended to enable the development of portable XML database applications and can be considered the equivalent of JDBC or ODBC for relational databases. The XML:DB API is fairly simple to use and gives you a fair amount of flexibility when developing applications. To get a flavor of what the API offers let's take a look at a simple program that works with the collection we created earlier.
import org.xmldb.api.base.*;
import org.xmldb.api.modules.*;
import org.xmldb.api.*;

public class Example1 {
    public static void main(String[] args) throws Exception {
        Collection col = null;
        try {
            String driver = "org.dbxml.client.xmldb.DatabaseImpl";
            Class c = Class.forName(driver);

            Database database = (Database) c.newInstance();
            DatabaseManager.registerDatabase(database);
            col = DatabaseManager.getCollection("xmldb:dbxml:///db/addresses");

            String xpath = "/address[@id = 1]";
            XPathQueryService service =
                (XPathQueryService) col.getService("XPathQueryService", "1.0");
            ResourceSet resultSet = service.query(xpath);

            ResourceIterator results = resultSet.getIterator();
            while (results.hasMoreResources()) {
                Resource res = results.nextResource();
                System.out.println((String) res.getContent());
            }
        }
        catch (XMLDBException e) {
            System.err.println("XML:DB Exception occurred " + e.errorCode + " " +
                e.getMessage());
        }
        finally {
            if (col != null) {
                col.close();
            }
        }
    }
}
This program simply creates a connection to the dbXML server, performs a basic XPath query, and then prints the results. In dbXML, queries can be executed against a collection of documents or a single document. In this case we're querying the entire collection. To query a single document you change the query() method to queryResource() and provide the dbXML id of the document to query.
If you're storing large numbers of documents it might be useful to index your collection to improve query response. In this case we indexed the id attribute in an earlier example, so the XPath engine should be quite speedy. Of course you'd need more than one document in your collection for this to be of any real value.
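To see why an index helps, consider the following conceptual sketch. It is not how dbXML implements its indexers; the class IdIndex and its methods are hypothetical, and it matches values as plain strings. It only shows the general idea: record which documents carry which id value when they are stored, so that a query on id can go straight to the candidate documents instead of scanning the whole collection.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class IdIndex {
    // Maps an indexed value (here, the id attribute) to the names of
    // the documents that contain it.
    private final Map<String, List<String>> byId = new HashMap<>();

    // Called when a document is stored: record which id value it carries.
    void add(String documentName, String idValue) {
        byId.computeIfAbsent(idValue, k -> new ArrayList<>()).add(documentName);
    }

    // Candidate documents for a query on a given id value; an id that was
    // never indexed yields an empty list rather than a full scan.
    List<String> lookup(String idValue) {
        return byId.getOrDefault(idValue, Collections.emptyList());
    }

    public static void main(String[] args) {
        IdIndex index = new IdIndex();
        index.add("myaddress", "1");
        index.add("otheraddress", "2"); // hypothetical second document
        System.out.println(index.lookup("1")); // prints [myaddress]
    }
}
```

With only one document in the collection the lookup saves nothing, which is why an index pays off only as the collection grows.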
Another noteworthy feature of dbXML is support for XML:DB XUpdate. XUpdate provides a simple update language for XML documents. This allows you to declaratively specify what changes should be made without worrying about the details of how the database makes the changes. For instance, using our sample document, if we wanted to change John's last name to "Herman", we could use this XUpdate document.
<xupdate:modifications version="1.0"
    xmlns:xupdate="http://www.xmldb.org/xupdate">
  <xupdate:update select="/address[@id = 1]/name/last">Herman</xupdate:update>
</xupdate:modifications>
This makes it easy to make changes to XML documents and in the case of dbXML you can use XUpdate to make changes to entire collections of XML documents. You can execute XUpdate modifications through the XML:DB API or using the dbXML command line tools. More XUpdate examples are available in the XUpdate use cases document.
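To make the effect of that update concrete, here is a small sketch that applies the equivalent change by hand. It does not use XUpdate or dbXML at all; it relies only on the DOM, XPath, and serialization classes that ship with the JDK, and the class name UpdateEffect and its update method are invented for the example. It selects the same nodes as the select attribute above and replaces their text, which is roughly the work the database does for you when it processes an xupdate:update instruction.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class UpdateEffect {
    // Replace the text content of every node matched by `select`
    // and return the modified document as a string.
    static String update(String xml, String select, String newText) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            // Find the nodes the update should touch.
            NodeList targets = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(select, doc, XPathConstants.NODESET);
            for (int i = 0; i < targets.getLength(); i++) {
                targets.item(i).setTextContent(newText);
            }

            // Serialize the changed DOM tree back to text.
            Transformer out = TransformerFactory.newInstance().newTransformer();
            out.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter buffer = new StringWriter();
            out.transform(new DOMSource(doc), new StreamResult(buffer));
            return buffer.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Update failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<address id=\"1\"><name><first>John</first>"
                + "<last>Smith</last></name></address>";
        // Same selection as the XUpdate document: the last name of address 1.
        System.out.println(update(xml, "/address[@id = 1]/name/last", "Herman"));
    }
}
```

The printed document contains <last>Herman</last> in place of the original last name. The XUpdate version says the same thing declaratively and leaves the parsing, selection, and write-back to the database.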
If you're interested in developing applications for dbXML, much more detail is available in the dbXML developers guide.
Wrapping Up
In this article I've just skimmed the surface of dbXML functionality. I encourage you to find out whether it could be something useful to you. Like all native XML databases dbXML is just a tool. It will be right for some jobs and completely wrong for others, and like all tools the best way to find out if it works is to try it.
This is an exciting time for dbXML; it's on the verge of an initial production release and will soon be receiving a new name and a new home. Development of dbXML is coming under the stewardship of the Apache Software Foundation XML sub-project, and dbXML will be renamed Xindice in the process. The project has come a long way, and now is the best time to get involved to help shape the future of open source native XML database technology.