Using XML and Relational Databases with Perl
December 13, 2000
Introduction
This month we're focusing on integrating XML with relational databases. There's not enough room in this column to give a comprehensive treatment of this topic, but we will explore a few of the options available for transferring data between XML documents and relational databases. I'll demonstrate how using Perl can make these types of transfers painless and straightforward. You may not want to try using XML to transfer terabytes of data over the wire, but having a basic understanding of how Perl can be used to help XML and databases work together is a useful skill to add to your bag of tricks.
All the examples from this article are available from the download link in the Resources box at the bottom.
DBI Refresher
If you've used Perl for a while it is likely that you have at least some experience with the popular DBI module and it's associated DBD::* drivers. For those who may not be familiar with how DBI works, consider the following fragment, which prints a list of names and email addresses from records stored in a local MySQL database called "mydb."
use DBI; my $dsn = "DBI:mysql:database=mydb;host=localhost"; my $dbh = DBI->connect($dsn, 'username', 'password', {RaiseError => 1}) || die $DBI::errstr; my $sth = $dbh->prepare(qq{ SELECT name, email FROM users_table }); $sth->execute || die "Error fetching data: $DBI::errstr"; while (my ($name, $email) = $sth->fetchrow_array) { print qq{ $name may be reached at: $email \n}; } $dbh->disconnect;
This code creates a connection to the database, prepares an SQL statement, executes
it,
prints the results as they are returned, and closes the connection. The only potentially
tricky bit from the Perl programmer's point of view is if an error occurs during the
initial
connection or execution of the statement, the resulting message is available through
$DBI::errstr
rather than Perl's global $!
variable, where you
might typically expect to find it. Note that I've selected MySQL for the examples
in this
article for no other reason than it is freely available for most of the platforms
that
support Perl (sorry, MacPerlers).
Using DBIx::XML_RDB
Let's begin our overview of XML-database integration with a quick look at Matt Sergeant's DBIx::XML_RDB module. It provides a simple interface for returning the results of an arbitrary SQL select statement as well-formed XML.
use strict; use DBIx::XML_RDB; my $dsn = "DBI:mysql:database=mydb;host=localhost"; my $dbx = DBIx::XML_RDB->new($dsn, 'mysql', $user, $pass); $dbx->DoSql(qq{ SELECT name, email FROM users_table }); open(XML, ">users.xml") or die "Could not open file for writing: $! \n"; print XML $dbx->GetData; close XML or die "Error closing file: $!\n";
After running this script, if we open the users.xml file we should find something like
<?xml version="1.0"?> <DBI driver="mysql:database=mydb;host=localhost"> <RESULTSET statement=" SELECT user_id, email FROM users_table"> <ROW> <user_id>adent</user_id> <email>arthur@dent.earth.co.uk</email> </ROW> <ROW> <user_id>froodydude</user_id> <email>zaphodb@freemail.galaxy.uni</email> </ROW> <ROW> <user_id>happybot</user_id> <email>marvin@droidlove.bot</email> </ROW> </RESULTSET> </DBI>
The DBIx-XML_RDB distribution ships with two command-line utilities (xml2sql.pl and sql2xml.pl) to facilitate transferring data back and forth between database tables and XML documents.
The script above works well for moving data from a database to an XML document, but what if we wanted to move data the other way? We could certainly use the xml2sql.pl script just mentioned, but given the simplicity of the data in question let's try a do-it-yourself approach.
use strict; use DBI; use XML::XPath; my $dbh = DBI->connect('dbi:mysql:mydb', 'username', 'password') || die $DBI::errstr; my $xp = XML::XPath->new(filename => 'dump.xml'); my $sth = $dbh->prepare(qq{INSERT INTO users VALUES (?, ?)}); # loop through the records foreach my $row ($xp->findnodes('/DBI/RESULTSET/ROW')) { # extract from the XML my $email = $row->find('email')->string_value; my $user = $row->find('user_id')->string_value; # insert into the db (using placeholders) $sth->execute($email, $user) || die $DBI::errstr; } $dbh->disconnect; print "Success!!! \n";
Let's take a short look at this script. First we open a connection to the database
and
create a new XML::XPath object (passing the name of our output file from the
previous example), then we call DBI's
prepare
method to build a statement handle. We then loop through the "ROW"
elements, extracting the string values for the "email" and "user_id" children and
inserting
that data into the database as we go by executing our prepared statement. Finally
we close
the database connection and print a cheerful message.
When Worlds Collide
The examples above rely heavily on the fact that the tables for the databases between which we are sharing information are identical, both in structure and naming convention. This is very rarely the case in the real world. More likely, you will be asked to glue together systems whose only common ground is the need to share information. We need, then, a way to move beyond these simple table dumps to a system that allows us to map our XML data to a variety of possible data models. Fortunately for us, Ron Bourret's XML-DBMS application was specifically designed to handle this task.
XML-RDBMS
The project page for XML-RDBMS describes it as
middleware for transferring data between XML documents and relational databases. It views the XML document as a tree of data-specific objects in which element types are generally viewed as classes and attributes and PCDATA as properties of those classes. It then uses an object-relational mapping to map these objects to the database. An XML-based mapping language is used to define the view and map it to the database.
Or, said another way, XML-DBMS provides an abstract view of an XML document, which allows us to map its contents to fields in a relational database, and provides a mechanism for transferring data, bidirectionally, between the database and the XML file.
Let's continue working with our list of users from the previous examples. Imagine that this data represents a list of people who have requested an email alert whenever the contents of our web site change. Now let's say that in time we develop a friendship with the folks that maintain a like-minded site in Germany, and we want to give users from each site a way to receive updates for changes to both sites (at their request, of course). Like us, our friends in Germany use a mailer application that accesses user records directly from the database, so it would be impractical for either party to alter its existing system simply to share users. We need a way, then, to share user information between sites without either party having to fundamentally alter their working systems.
We will start by having a look at the relevant database table descriptions for our two systems.
First theirs
+--------------+-------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +--------------+-------------+------+-----+---------+-------+ | benutzername | varchar(80) | | | | | | email | varchar(80) | | PRI | | | +--------------+-------------+------+-----+---------+-------+
Then ours
+----------+-------------+------+-----+---------+-------+ | Field | Type | Null | Key | Default | Extra | +----------+-------------+------+-----+---------+-------+ | user_id | varchar(20) | | PRI | | | | email | varchar(80) | YES | | | | +----------+-------------+------+-----+---------+-------+
When comparing the two descriptions we see three basic differences. The first is a simple matter of naming convention, where each site's administrator has chosen to name the field containing the user's name according to their respective native languages. Next we see that out German partners have chosen to make the email field the table's primary key, while we have chosen the user_id column instead. Finally we note that the allowable length for the user_id/benutzername column varies significantly.
Using XML-DBMS makes dealing with the first two differences quite simple. Since XML-DBMS' mapping language allows us to declare abstract relationships between an XML element (or attribute) and a database column, the matter of divergent naming conventions is easily solved. We agree upon a common alias for the user_id/benutzname columns, using that alias as the XML element name in our document for the data that those columns hold, then map that element to the appropriate columns in our individual databases. The mapping language also makes trivial dealing with the fact that the primary keys are different. It allows us to declare a particular XML element as representative of the primary key column within our individual database tables without altering the document's structure. Let's imagine, then, that our German friends have graciously consented to use our existing English naming conventions for the XML document under the root element <users>. A sample document, then, would look like
<?xml version="1.0"?> <users> <user> <email>name@domain.tld</email> <user_id>id</user_id> </user> </users>
For this format to work for both parties we need only create the files that describe the relationships between these elements and the columns in our databases.
Creating Mappings
Before we dive into writing the actual map files I should point out that XML-DBMS' mapping language is quite rich, and our use of it in this example takes advantage of only a small subset of what that language has to offer. That said, I would encourage you to explore the XML-DBMS mapping language reference for a more complete overview.
Right, on with defining our map files. Not surprisingly, an XML-DBMS map file is, itself, an XML document. Let's name ours users.map.
After the standard XML declaration we will begin with the required <XMLToDBMS> root element, then tell the XML-DBMS engine to treat all empty strings as NULL values when transferring data between the our XML and the database.
<?xml version="1.0" ?> <XMLToDBMS Version="1.0"> <Options> <EmptyStringIsNull/> </Options>
Next we begin the actual mapping declarations that will allow us to mark up the data from our site into the desired XML format. Since the data we will be transferring is not tree-shaped, neither a root element nor an explicit wrapper element for each row of data exist. We must define them. Per our agreement with our friends, we will set the root element name to "users" and the row element to "user". We achieve this by using the mapping language's IgnoreRoot/PseudoRoot construct. And while we are at it, we will also declare that the key for our "user" element is contained in the "user_id" column by adding a CandidateKey child element.
<Maps> <IgnoreRoot> <ElementType Name="users"/> <PseudoRoot> <ElementType Name="user"/> <CandidateKey Generate="No"> <Column Name="user_id"/> </CandidateKey> </PseudoRoot> </IgnoreRoot>
Table-to-element (class) relationships are defined by the ClassMap element. Since our data is stored in a single table we will need only one.
<ClassMap> <ElementType Name="user"/> <ToClassTable> <Table Name="users"/> </ToClassTable>
Next we will map our columns to the elements contained in our XML document. These mappings are defined by the PropertyMap elements and their children.
<PropertyMap> <ElementType Name="user_id"/> <ToColumn> <Column Name="user_id"/> </ToColumn> </PropertyMap> <PropertyMap> <ElementType Name="email"/> <ToColumn> <Column Name="email"/> </ToColumn> </PropertyMap>
Finally we will end our mapping file by closing our open ClassMap, Maps, and root elements.
</ClassMap> </Maps> </XMLToDBMS>
We're not quite done with our maps just yet. We have created the mapping between our database and the XML document, now we need to create mapping between the XML document and our friends' database. Given that our data models are very similar the only differences will be in column name we select for the primary key and the column names that the XML elements are mapped to. We will name this file benutzer.map.
<?xml version="1.0" ?> <XMLToDBMS Version="1.0"> <Options/> <Maps> <IgnoreRoot> <ElementType Name="users"/> <PseudoRoot> <ElementType Name="user"/> <CandidateKey Generate="No"> <Column Name="benutzername"/> </CandidateKey> </PseudoRoot> </IgnoreRoot> <ClassMap> <ElementType Name="user"/> <ToClassTable> <Table Name="benutzer"/> </ToClassTable> <PropertyMap> <ElementType Name="user_id"/> <ToColumn> <Column Name="benutzername"/> </ToColumn> </PropertyMap> <PropertyMap> <ElementType Name="email"/> <ToColumn> <Column Name="email"/> </ToColumn> </PropertyMap> </ClassMap> </Maps> </XMLToDBMS>
Putting It To Work
With our map files defined, we have only to write the Perl scripts that transfer data
to
and from our databases and an XML document. XML-DBMS makes these tasks extremely
straightforward. Our first script, fetch_users.pl just connects to the database,
creates an instance of the XML::XMLtoDBMS object, sets the appropriate map file,
calls the retrieveDocument
method, then prints the resulting DOM tree to an XML
file.
use strict; use DBI; use XML::XMLtoDBMS; my $dbh = DBI->connect('dbi:mysql:mydb', 'username', 'password') || die $DBI::errstr; my $xmlTodbms = new XML::XMLtoDBMS($dbh); $xmlTodbms->setMap('users.map'); # select all of our users my $doc = $xmlTodbms->retrieveDocument('users'); open (XML, ">users.xml") or die "could not open file for writing: $! \n"; print XML qq{<?xml version="1.0"?>}; print XML $doc->toString; close XML or die "error closing file 'users.xml': $!"; # clean-up $doc->dispose; $xmlTodbms->destroy; $dbh->disconnect; print "Success!! \n";
Next, we will create the put_users.pl script. This will update our users record
with an XML instance created by fetch_users.pl. This script connects to the database,
creates an instance of the XML::XMLtoDBMS object, sets the appropriate map file,
then calls the storeDocument
method to insert our records into the database.
use strict; use DBI; use XML::XMLtoDBMS; my $dbh = DBI->connect('dbi:mysql:mydb', 'username', 'password') || die $DBI::errstr; my $xmlTodbms = new XML::XMLtoDBMS($dbh); # change the file name to fit the location. $xmlTodbms->setMap('benutzer.map'); my $doc = $xmlTodbms->storeDocument( Source => {File => "users.xml"} ); # clean-up $doc->dispose; $xmlTodbms->destroy; $dbh->disconnect; print "Success!! \n";
The Fine Print
If you've been reading closely you will notice that we never addressed the difference in length between our user_id column and our partner's benutzername column. There is little we can do about this discrepancy apart from altering our own table to deal with the larger size, since XML-DBMS has no built-in type-checking to warn us. There's also the sticky issue of duplicate records. If our XML document contains a user record that we've seen before, our script will die from a database error at the point of the duplicate record, ignoring the remaining records in that file. While this is merely inconvenient for the trivial system we've described here, it would be a recipe for disaster for a mission-critical system. The good news is that both of these weaknesses are being addressed for the 2.0 release XML-DBMS.
Summing Up
In this column, I've only scratched the surface of how you might use Perl to combine the power of relational databases with the portability of XML. I do hope, however, that the examples covered will give you enough confidence to begin experimenting on your own. And remember, new Perl XML modules are being released all the time, and many older ones are adding XML support, so be sure to check CPAN early and often.
Resources
- Download all the files from this article.
- DBIx-XML_RDB is available from The Comprehensive Perl Archive Network (CPAN)
- Information about XML-DBMS, including sources, documentation, and samples is available at the XML-DBMS project page.