An Introduction to the XML:DB API
January 9, 2002
In my last article, Introduction to dbXML, I provided an example that used the XML:DB API to access the dbXML server. This time around we'll take a more detailed look at the XML:DB API in order to get a better feel for what the API is about and how it can help you build applications for native XML databases (NXD).
The Proliferation of Native XML Databases
Currently, there are about 20 different native XML databases on the market. Among them are commercial products such as Tamino, X-Hive, and Excelon, as well as open source NXDs such as dbXML (now renamed Apache Xindice), eXist, and Ozone/XML. While this selection is a nice thing to see in an emerging market, it makes developing applications quite a bit more difficult. Each NXD defines its own API, which prevents the development of software that will work with more than one NXD without coding for each specific server. If you've worked with relational databases, then you've likely worked with ODBC or JDBC to abstract away from proprietary relational database APIs. The goal of the XML:DB API is to bring similar functionality to native XML databases.
Status
The XML:DB API project was started a little over a year ago by the XML:DB Initiative and is currently still evolving. Most of the core framework is stable, and it has already been implemented by dbXML/Xindice and eXist. There's also a reference implementation in Java available, and there are several other implementations in progress, including some for commercial databases. The latest information on implementations can be found on the XML:DB API project site.
Basic Concepts
While the XML:DB API is simple to use once you're familiar with it, there are some introductory terms and concepts that we need to discuss first.
Drivers
Each database that supports the XML:DB API must provide a database-specific driver that encapsulates all the database access logic. Drivers are implementations of the Database interface and are managed by the DatabaseManager. If you're familiar with JDBC, ODBC, or SAX, then the driver concept should also be familiar; it doesn't differ much in the XML:DB API.
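The driver pattern itself can be sketched without a database at hand. The following is only an illustration of the idea, not the XML:DB API: SketchDriver and SketchDriverManager are hypothetical stand-ins for the roles played by Database and DatabaseManager, and the driver name "demo" is invented. It assumes the manager picks a driver by the name that follows "xmldb:" in the URI, which is the shape of the URIs used later in this article (for example xmldb:dbxml:///db/).

```java
import java.util.HashMap;
import java.util.Map;

class DriverSketch {
    public static void main(String[] args) {
        SketchDriverManager manager = new SketchDriverManager();
        manager.register(() -> "demo"); // a trivial driver named "demo"
        SketchDriver driver = manager.resolve("xmldb:demo:///db/addresses");
        System.out.println("Resolved driver: " + driver.getName());
    }
}

// Hypothetical stand-in for the role a driver plays; not the real Database interface.
interface SketchDriver {
    String getName(); // the name that appears in URIs after "xmldb:"
}

// Hypothetical stand-in for a driver manager; not the real DatabaseManager.
class SketchDriverManager {
    private static final String SCHEME = "xmldb:";
    private final Map<String, SketchDriver> drivers = new HashMap<>();

    // Makes a driver available under its own name.
    void register(SketchDriver driver) {
        drivers.put(driver.getName(), driver);
    }

    // Finds the driver named in a URI such as "xmldb:demo:///db/addresses".
    SketchDriver resolve(String uri) {
        if (!uri.startsWith(SCHEME)) {
            throw new IllegalArgumentException("Not an xmldb URI: " + uri);
        }
        String rest = uri.substring(SCHEME.length());
        int end = rest.indexOf(':');
        if (end < 0) {
            throw new IllegalArgumentException("No driver name in URI: " + uri);
        }
        SketchDriver driver = drivers.get(rest.substring(0, end));
        if (driver == null) {
            throw new IllegalStateException("No driver registered for: " + uri);
        }
        return driver;
    }
}
```

The application only ever talks to the manager and the driver interface, so swapping databases comes down to registering a different driver and using a different URI prefix.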
Collections
In native XML databases, collections are the containers in which XML documents are stored. Compared to a relational database, a collection is roughly equivalent to a table. The XML:DB API makes extensive use of the collection concept and assumes that any implementing database has at least one collection where documents are stored. Collections are represented in the API by the Collection interface.
Services
The XML:DB API is designed to be very flexible and extensible. This capability is achieved through services. In fact, you can't do a whole lot of useful work with the API without services. The most widely used example of a service is the XPathQueryService. As its name implies, this service enables execution of XPath queries against the database. The API specification defines several services, including XPathQueryService, XUpdateQueryService, and CollectionManagementService. In the future, services will be added for W3C XQuery and other specifications as needed. The service mechanism is also open for vendors to add custom services, as long as it is clear that those services are not portable.
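To make the service idea concrete, here is a small self-contained sketch of looking up a capability by name and version and then casting it to its own interface. None of these types belong to the XML:DB API: SketchService, SketchCollection, and CountService are hypothetical, and returning null for an unknown service is a choice made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;

class ServiceSketch {
    public static void main(String[] args) {
        SketchCollection col = new SketchCollection();
        col.store("<address id=\"1\"/>");
        col.store("<address id=\"2\"/>");

        // Ask for a service by name and version, then cast to its own interface.
        SketchService service = col.getService("CountService", "1.0");
        if (service instanceof CountService) {
            System.out.println("Documents: " + ((CountService) service).count());
        }
        System.out.println("Unknown service: " + col.getService("NoSuchService", "1.0"));
    }
}

// Hypothetical base type: every service identifies itself by name and version.
interface SketchService {
    String getName();
    String getVersion();
}

// Hypothetical service that reports how many documents a collection holds.
class CountService implements SketchService {
    private final List<String> documents;

    CountService(List<String> documents) {
        this.documents = documents;
    }

    public String getName() { return "CountService"; }
    public String getVersion() { return "1.0"; }

    int count() {
        return documents.size();
    }
}

// Hypothetical collection that exposes its capabilities only through services.
class SketchCollection {
    private final List<String> documents = new ArrayList<>();
    private final List<SketchService> services = new ArrayList<>();

    SketchCollection() {
        services.add(new CountService(documents));
    }

    void store(String xml) {
        documents.add(xml);
    }

    // Returns the matching service, or null when none is available.
    SketchService getService(String name, String version) {
        for (SketchService s : services) {
            if (s.getName().equals(name) && s.getVersion().equals(version)) {
                return s;
            }
        }
        return null;
    }
}
```

Because the collection itself knows nothing about counting or querying, new capabilities can be added as services without changing the core interfaces.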
Resource Abstraction
Since there are several common ways of working with XML data, the XML:DB API defines an abstraction for the content stored in the database. This abstraction is encapsulated in the generic Resource interface. By specializing Resource it's possible to support other types of data beyond XML, for example, binary data. For XML, the XMLResource specialization is provided and allows you to easily access and update the underlying XML data as either textual XML, a W3C DOM, or a SAX event stream.
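If the three representations are unfamiliar, the following sketch shows the same XML content viewed as text, as a DOM tree, and as a stream of SAX events. It uses only the standard Java XML parsers rather than XMLResource, and the sample address document is invented for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class ThreeViews {
    // Sample content; the element names are invented for this sketch.
    static final String XML =
        "<address id=\"1\"><name>Jane Doe</name><city>Phoenix</city></address>";

    // View 2: the content as a W3C DOM tree.
    static Document asDom(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // View 3: the content as SAX events; here we record each element start.
    static List<String> saxElementNames(String xml) {
        final List<String> names = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                names.add(qName);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return names;
    }

    public static void main(String[] args) {
        // View 1: the content as plain text.
        System.out.println("Text: " + XML);
        System.out.println("DOM root id: "
                + asDom(XML).getDocumentElement().getAttribute("id"));
        System.out.println("SAX elements: " + saxElementNames(XML));
    }
}
```

Text is the easiest form to print or store, a DOM tree suits random access and modification, and SAX events suit streaming through large documents without holding them in memory.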
API Core Levels
Because the XML:DB API is designed to be modular, it's necessary to group features together to form baselines for application developers to work against. These baselines are called core levels, and there are currently two defined in the API specification. Core Level 0 is the base API that all drivers must implement. It includes the basic interfaces for collections, resources, and services. Core Level 1 extends Core Level 0 to include the XPathQueryService. The idea is that applications will require a particular Core Level of driver support. Then, if an application wants to use additional API elements, it should test for them before using them.
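The example program later in this article tests the level with an exact string comparison against "1". Since Core Level 1 extends Core Level 0, an application could instead accept any level at or above the one it needs. The helper below is a hypothetical sketch of that idea. It assumes the driver reports its level as a numeric string, and it treats anything unparseable as unsupported.

```java
class CoreLevelCheck {
    // Returns true when the reported level is at least the required level.
    // Assumption: levels are numeric strings and higher levels include lower ones.
    static boolean satisfies(String reportedLevel, int requiredLevel) {
        if (reportedLevel == null) {
            return false;
        }
        try {
            return Integer.parseInt(reportedLevel.trim()) >= requiredLevel;
        } catch (NumberFormatException e) {
            return false; // unknown format: treat as not supported
        }
    }

    public static void main(String[] args) {
        System.out.println("Level 1 driver, needs 1: " + satisfies("1", 1));
        System.out.println("Level 0 driver, needs 1: " + satisfies("0", 1));
        System.out.println("Level 1 driver, needs 0: " + satisfies("1", 0));
    }
}
```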
Putting It All Together
Let's examine an example program that illustrates how everything fits together. Since this is an introductory article we need to keep things very simple. Our example won't do anything useful, but it will put all the concepts to work. If you read my article on dbXML, this program is a more generic version of the example included there. You should refer to that article for more information on setting up a dbXML repository to work with this program.
import org.xmldb.api.base.*;
import org.xmldb.api.modules.*;
import org.xmldb.api.*;

public class Query {
    public static void main(String[] args) throws Exception {
        Collection col = null;
        try {
            String driver = null;
            String prefix = null;
            if ( ( args.length == 1 ) && args[0].equals("dbxml") ) {
                driver = "org.dbxml.client.xmldb.DatabaseImpl";
                prefix = "xmldb:dbxml:///db/";
            }
            else {
                driver = "org.xmldb.api.reference.DatabaseImpl";
                prefix = "xmldb:ref:///";
            }

            Class c = Class.forName(driver);
            Database database = (Database) c.newInstance();
            if ( ! database.getConformanceLevel().equals("1") ) {
                System.out.println("This program requires a Core Level 1 XML:DB " +
                    "API driver");
                System.exit(1);
            }
            DatabaseManager.registerDatabase(database);

            col = DatabaseManager.getCollection(prefix + "addresses");

            String xpath = "/address[@id = 1]";
            XPathQueryService service =
                (XPathQueryService) col.getService("XPathQueryService", "1.0");
            ResourceSet resultSet = service.query(xpath);

            ResourceIterator results = resultSet.getIterator();
            while (results.hasMoreResources()) {
                Resource res = results.nextResource();
                System.out.println((String) res.getContent());
            }
        }
        catch (XMLDBException e) {
            System.err.println("XML:DB Exception occurred " + e.errorCode + " " +
                e.getMessage());
        }
        finally {
            if (col != null) {
                col.close();
            }
        }
    }
}
Let's look at the various parts and pieces. For this particular program we've hard-coded driver configurations for dbXML and the XML:DB API reference implementation. Two pieces of information are required here: the name of the driver implementation class and the driver-specific URI prefix. In a real program you would probably want to read these values from a configuration file. This section of code also includes a check to ensure that the driver supports Core Level 1. The way the check is coded will work with current drivers, but this is an area where there are likely to be changes to the API in the future.
String driver = null;
String prefix = null;
if ( ( args.length == 1 ) && args[0].equals("dbxml") ) {
    driver = "org.dbxml.client.xmldb.DatabaseImpl";
    prefix = "xmldb:dbxml:///db/";
}
else {
    driver = "org.xmldb.api.reference.DatabaseImpl";
    prefix = "xmldb:ref:///";
}

Class c = Class.forName(driver);
Database database = (Database) c.newInstance();
if ( ! database.getConformanceLevel().equals("1") ) {
    System.out.println("This program requires a Core Level 1 XML:DB " +
        "API driver");
    System.exit(1);
}
DatabaseManager.registerDatabase(database);
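As a rough sketch of the configuration-file approach mentioned above, the two values could be kept in a standard Java properties file. The property names xmldb.driver and xmldb.prefix and the DriverConfig class are invented for this example. To stay self-contained the sketch parses the settings from a string, where a real program would read them from a file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class DriverConfig {
    final String driverClass; // name of the driver implementation class
    final String uriPrefix;   // driver-specific URI prefix

    private DriverConfig(String driverClass, String uriPrefix) {
        this.driverClass = driverClass;
        this.uriPrefix = uriPrefix;
    }

    // Parses properties-format text; the key names are hypothetical.
    static DriverConfig parse(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new DriverConfig(require(props, "xmldb.driver"),
                                require(props, "xmldb.prefix"));
    }

    // Fails early with a clear message when a setting is missing.
    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalArgumentException("Missing setting: " + key);
        }
        return value.trim();
    }

    public static void main(String[] args) {
        String text = "xmldb.driver=org.dbxml.client.xmldb.DatabaseImpl\n"
                    + "xmldb.prefix=xmldb:dbxml:///db/\n";
        DriverConfig config = parse(text);
        System.out.println("Driver class: " + config.driverClass);
        System.out.println("Collection URI: " + config.uriPrefix + "addresses");
    }
}
```

With the settings externalized, the driver class name passed to Class.forName() and the prefix passed to getCollection() no longer need to be compiled into the program.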
Now that the driver is configured for our chosen database, we need to request a Collection instance. The parameter to getCollection() consists of the fully qualified URI as defined by the specific driver. In this particular case the collection we want is named addresses and is simply appended to the driver-specific prefix we defined earlier.
col = DatabaseManager.getCollection(prefix + "addresses");
Now we want to run a query against the retrieved collection, so we need to get an XPathQueryService implementation from that collection. Other API services are accessed in a similar manner.
XPathQueryService service = (XPathQueryService) col.getService("XPathQueryService", "1.0");
Each service defines a custom interface for the operations that it performs. The XPathQueryService, for example, defines a query method that returns a ResourceSet containing the query results.
ResourceSet resultSet = service.query(xpath);
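To see what the expression /address[@id = 1] selects without a running database, it can be evaluated against a single in-memory document using the standard javax.xml.xpath package. This is only an approximation of what a query service does, since the service runs the query against the documents stored in the collection. The sample documents and the XPathSketch class are invented for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathSketch {
    // Counts the nodes that an XPath expression selects in one XML document.
    static int countMatches(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression,
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Query failed: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String query = "/address[@id = 1]";
        // Root element is "address" and its id attribute equals 1: selected.
        System.out.println(countMatches("<address id=\"1\"><name>Jane Doe</name></address>", query));
        // Same structure, different id: not selected.
        System.out.println(countMatches("<address id=\"2\"><name>John Doe</name></address>", query));
    }
}
```

The leading slash anchors the expression at the document root, so only documents whose root element is address with an id of 1 produce a result.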
We just want to print out the results, so we iterate through the result set using a ResourceIterator. Each query result is encapsulated as an XMLResource, and the simplest way to print the XML content is to just retrieve it as text. For this we call getContent(). However, if we wanted to get the XML as a DOM tree we could call getContentAsDOM(), or we could set up a SAX ContentHandler implementation and call getContentAsSAX() to retrieve it as a SAX event stream. This is one of the nicest features of the API.
ResourceIterator results = resultSet.getIterator();
while (results.hasMoreResources()) {
    Resource res = results.nextResource();
    System.out.println((String) res.getContent());
}
What's left is housekeeping to make sure we close our Collection instances. For some drivers this is critical, while for others it doesn't matter. It's always good practice to ensure that any resources being used by the server are released.
if (col != null) {
    col.close();
}
Learning More
Obviously there is much more to the XML:DB API than what's illustrated in this simple example and short article. But I have given you a better idea of what the API is and how it is used. If you want to find out more you should take a look at the XML:DB API site and the dbXML developers guide. The eXist documentation also contains some information about developing with the API.
While there is still a lot of work to do on the XML:DB API, what is available today is already usable and provides a solid framework to build on. In fact, projects like Apache Xindice are using the XML:DB API as the primary Java API for accessing the server. Participating in API development is open to anyone who's interested; feel free to join the project mailing list and contribute to the development of the XML:DB API.