Storing XML in Relational Databases

June 20, 2001

Introduction

Table of Contents

Oracle XML-SQL Utility (XSU)

IBM DB2 XML Extender

Microsoft SQL Server 2000

Sybase Adaptive Server

Vendor Comparison

An Alternative Strategy

Mapping DTDs to Databases

Using XML and Relational Databases with Perl

RAX: An XML Database API

There are various ways to solve the problem of effective, automatic conversion of XML data into and out of relational databases. Database vendors such as IBM, Microsoft, Oracle, and Sybase have developed tools to assist in converting XML documents into relational tables. The various solutions are as follows.

Oracle XML SQL Utility models XML document elements as a collection of nested tables. Enclosed elements are modeled by employing the Oracle Object datatype. The "SQL-to-XML" conversion constructs an XML document by using a one-to-one association between a table, referenced by Object datatype, and a nested element. "XML-to-SQL" might require either data model amending (converting it from relational into object-relational) or restructuring the original XML document.
IBM DB2 XML Extender allows storing XML documents either as BLOB-like objects or as decomposed into a set of tables. The latter transformation, known as XML collection, is defined in XML 1.0 syntax.
Microsoft approaches the problem by extending SQL-92 and by introducing OPENXML row set.
Sybase Adaptive Server introduces the ResultSetXml Java class as a base for processing XML documents in both directions.

In this article, we will explore in detail these vendor's solutions. Thereafter, we will attempt to answer the questions:

Can we restructure the and simplify the problem?
What is the right approach in a heterogeneous database environment?

I'll use the following vocabulary as an example.


<!-- Primitive Types -->



<!ELEMENT CURRENCY1   (#PCDATA)>

<!ATTLIST CURRENCY1    e-dtype    NMTOKEN   #FIXED  "string"

                        e-dsize    NMTOKEN   #FIXED  "3">



<!ELEMENT CURRENCY2   (#PCDATA)>

<!ATTLIST CURRENCY2    e-dtype    NMTOKEN   #FIXED  "string"

                        e-dsize    NMTOKEN   #FIXED  "3">



<!ELEMENT AMOUNT      (#PCDATA)>

<!ATTLIST AMOUNT       e-dtype    NMTOKEN   #FIXED  "decimal">



<!ELEMENT SETTLEMENT  (#PCDATA)>

<!ATTLIST SETTLEMENT   e-dtype    NMTOKEN   #FIXED  "date">



<!ELEMENT BANKCODE    (#PCDATA)>

<!ATTLIST BANKCODE     e-dtype    NMTOKEN   #FIXED  "string">



<!ELEMENT BANKACCT    (#PCDATA)>

<!ATTLIST BANKACCT     e-dtype    NMTOKEN   #FIXED  "string">



<!-- Derived Types -->



<!ELEMENT ACCOUNT  (BANKCODE, BANKACCT)>



<!ELEMENT FXTRADE  (CURRENCY1, CURRENCY2, AMOUNT, SETTLEMENT, ACCOUNT)>

Oracle XML-SQL Utility (XSU)

SQL to XML Mapping

Oracle translates the chain of object references from the database into the hierarchical structure of XML elements. In an object-relational database, the field ACCOUNT in the table FXTRADE is modeled as an object reference of type AccountType:


CREATE TABLE FXTRADE 

{

     CURRENCY1      CHAR (3),

     CURRENCY2      CHAR (3),

     AMOUNT         NUMERIC (18,2),

     SETTLEMENT     DATE,

     ACCOUNT        AccountType // object reference

}



CREATE TYPE AccountType as OBJECT

{  

     BANKCODE       VARCHAR (100),

     BANKACCT       VARCHAR (100) 

}

A corresponding XML document generated from the given object-relational model (using ";SELECT * FROM FXTRADE") looks like


<?xml version="1.0"?>

<ROWSET>

     <ROW num="1"> 

        <CURRENCY1>GBP</CURRENCY1>

        <CURRENCY2>JPY</CURRENCY2>

        <AMOUNT>10000</AMOUNT>

        <SETTLEMENT>20010325</SETTLEMENT>

        <ACCOUNT>

   <BANKCODE>812</BANKCODE>

   <BANKACCT>00365888</BANKACCT>

        </ACCOUNT>

    </ROW>

    <!-- additional rows ... -->

 </ROWSET>

Extracting XML from the database

The example below is taken from Oracle's XSU documentation with appropriate substitution of SQL statements and using Oracle's pure Java JDBC thin driver.

First, an instance of OracleXMLQuery is created; second, a query is executed, and the result is represented in the form of an XML document as above. Similarly, an XML document can be extracted in the form of DOM; in this case qry.getXMLDOM() would be invoked instead of getXMLString().


import oracle.jdbc.driver.*;

import oracle.xml.sql.query.OracleXMLQuery;

import java.lang.*;

import java.sql.*;



// class to test XML document generation as String

class testXMLSQL {



   public static void main(String[] args)

   {

     try {

      // Create the connection

      Connection conn  = getConnection("scott","tiger");



      // Create the query class

      OracleXMLQuery qry = new OracleXMLQuery(conn,

         "SELECT  * FROM FXTRADE");



      // Get the XML string

      String str = qry.getXMLString();



      // Print the XML output

      System.out.println("The XML output is:\n"+str);



      // Always close the query to get rid of any resources..

      qry.close();

     } catch(SQLException e) {

      System.out.println(e.toString());

     }

   }



   // Get the connection given the user name and password.!

   private static Connection getConnection(String username,

        String password)

        throws SQLException

   {

      // register the JDBC driver..

       DriverManager.registerDriver(new 

          oracle.jdbc.driver.OracleDriver());



      // Create the connection using the OCI8 driver

       Connection conn =

        DriverManager.getConnection(

           "jdbc:oracle:thin:@dlsun489:1521:ORCL",

           username,password);



      return conn;

   }

}

Storing XML in the database

In this example OracleXMLSave is employed to store our XML document in an object-relational model; the insertXML method performs the actual insertion of the data.

import java.sql.*;

import oracle.xml.sql.dml.OracleXMLSave;

public class testXMLInsert

{

   public static void main(String args[])

             throws SQLException

  {

    Connection conn = getConnection("scott","tiger");

    OracleXMLSave sav = new OracleXMLSave(conn, "scott. FXTRADE");

  // Assume that the user passes in this document as 0-arg

      sav.insertXML(args[0]);

      sav.close();

   }



  ...

}

That's fine if the XML and object-relational model in the database are synchronized, but what if they aren't? You have two options in that case.

Adjust the object-relational model -- a modifiable object-relational view can be constructed to accomplish multi-table modifications; or,
alternatively, the XML document can be decomposed into a collection of "flat" subdocuments, using XSLT.

XSU does not permit storage of attribute values; it's recommended that you transform attributes into elements.

Summary of Oracle XSU

An XML to SQL mapping is modeled by an object-relational model construction rule as follows: each nested XML element is mapped onto an object reference of appropriate type. Mapping rules are implicitly embedded in the database model.

The Java API consists of the classes OracleXMLQuery and OracleXMLSave.

IBM DB2 XML Extender

SQL to XML Mapping

IBM's XML Extender provides two access and storage methods for using DB2 as an XML repository:

XML column: stores and retrieves entire XML documents as DB2 column data
XML collection: decomposes XML documents into a collection of relational tables, or composes XML documents from a collection of relational tables.

DTDs are stored in the DTD repository, a DB2 table called DTD_REF; it's schema name is "db2xml". Each DTD in the DTD_REF table has a unique ID. The mapping between the database tables and the structure of the XML document is defined by means of a Data Access Definition (DAD) file. DAD refers to a processed document DTD, thus providing a bridge between an XML document, its DTD, and mapping rules onto database tables.

Here's an example DAD.

<?xml version="1.0"?>

<!DOCTYPE    DAD    SYSTEM    "dad.dtd">    

<DAD>    

    <dtdid>FXTRADE.DTD</dtdid>    

    <validation>YES</validation>    

    <Xcollection>    

        <prolog>?xml version="1.0"?</prolog>    

        <doctype>!DOCTYPE     FXTRADE       FXTRADE.DTD </doctype>    

        <root_node>    

   <element_node     name="FXTRADE">    

       <RDB_node>    

           <table name="FXTRADE"/>    

           <table name="ACCOUNT" key="ID"/>    

           <condition>    

      FXTRADE.ACCOUNT=ACCOUNT.ID    

           </condition>    

       </RDB_node>    

       <element_node name="CURRENCY1">    

               <text_node>    

           <RDB_node>    

                 <table name="FXTRADE"/>    

                 <column name="CURRENCY1" type="CHAR(3)"/>    

           </RDB_node>    

                </text_node>

       </element_node>

       <element_node name="CURRENCY2">

               <text_node>

           <RDB_node>

                 <table name="FXTRADE"/>

                 <column name="CURRENCY2" type="CHAR(3)"/>

           </RDB_node>

              </text_node>

       </element_node>

       <element_node name="AMOUNT">

               <text_node>

           <RDB_node>

                 <table name="FXTRADE"/>

                 <column name="AMOUNT" type="DECIMAL(18,2)"/>

           </RDB_node>

               </text_node>

       </element_node>

       <element_node name="SETTLEMENT">

             <text_node>

           <RDB_node>

                 <table name="FXTRADE"/>

                 <column name="SETTLEMENT" type="DATE"/>

           </RDB_node>

              </text_node>

       </element_node>

       <element_node name="ACCOUNT"> 

           <element_node name="BANKCODE">

               <text_node>

                   <RDB_node>

                       <table name="ACCOUNT"/>

                       <column name="BANKCODE"

                           type="VARCHAR(100)"/>

                   </RDB_node>

               </text_node>

           </element_node>

           <element_node name="BANKACCT">

               <text_node>

                   <RDB_node>

                        <table name="ACCOUNT"/>

                        <column name="BANKACCT"

                            type="VARCHAR(100)"/> 

                    </RDB_node>

               </text_node>

           </element_node>

       </element_node> <!--end of  Account element-->

   </element_node>    <!-- end of  FxTrade element -->

        </root_node>

    </Xcollection>

</DAD>

DAD defines a mapping between XML elements and relational database columns by employing element_node to RDB_node associations. The top-level element_node FXTRADE is defined as a join between tables FXTRADE and ACCOUNT, with field ID in ACCOUNT table as a primary key. Child element CURRENCY1 is mapped onto field CURRENCY1 in the table FXTRADE, and so on. Note that the ACCOUNT element does not include any RDB_node -- this is not required since the link between ACCOUNT and FXTRADE tables has been defined earlier. Child elements of ACCOUNT, BANCCODE, and BANKACCT respectively are defined in the corresponding columns of ACCOUNT table. Atomic XML elements are tagged in DAD as text_node. In the example above all elements, except FXTRADE and ACCOUNT, are atomic.

Extracting XML from the database

XML document composition and decomposition are handled by stored procedures: stored procedure dxxGenXML() extracts XML documents from a database; stored procedure dxxShredXML() stores XML documents in a database.

The main input parameters of dxxGenXML() are

DAD: stored in form of DB2 CLOB data type;
result table name: the constructed XML document is forwarded to this table.

Other input parameters specify the maximum number of rows returned, and <RDB_node> <condition> element override options. The output parameters include the actual number of returned rows, return code, and return message.

An extensive example of stored procedures within a C program can be found in "IBM DB2 Universal Database XML Extender Administration and Programming, Version 7".

Storing XML in the database

Putting an XML document into a database is accomplished by dxxShredXML(), a stored procedure.

The input parameters of dxxShredXML() are

DAD: stored in the form of DB2 CLOB data type;
input XML document: stored in the form of DB2 XMLCLOB data type.

Its output parameters are a return code and return message.

Summary

The XML-SQL mapping rules are specified by means of a Data Access Definition (DAD) file, which is an XML document. The DB2 XML Extender administration facilities include a means to construct DAD files for each persistent DTD.

Future enhancements will include a new XML-SQL transformation syntax, which will employ an XML Transforms language, a subset of W3C XSLT.

Microsoft SQL Server 2000

SQL to XML Mapping

SQL Server's SQL-to-XML and XML-to-SQL mapping rules employ different syntaxes. Mapping details are discussed in the sections on extraction and storing.

Extracting XML from the database

Mapping between database columns and XML elements or attributes is defined by means of AS aliases in a SELECT:

<database column> AS [Element Name! Nesting Level! Attribute Name! Directive]

The document top level is assigned level 1, as below. By default, column data is mapped onto attribute values. The directive "element" may be used to change this default setting.

The process of generating an XML document from the database has two steps.

Step 1. Create As-aliases to atomic elements in your desired output XML; the alias defines the parent/child relationships between elements. The table below shows the aliases for our example document.


FXTRADE       /* LEVEL=1 */    

    CURRENCY1   [FXTRADE!1!CURRENCY1]    

    CURRENCY2   [FXTRADE!1!CURRENCY2]    

    AMOUNT      [FXTRADE!1!AMOUNT]    

    SETTLEMENT  [FXTRADE!1!SETTLEMENT]    

    ACCOUNT   /* LEVEL=2  */    

        BANKCODE     [ACCOUNT!2!BANKCODE]    

        BANKACCT     [ACCOUNT!2!BANKACCT]

Step 2. Define the output tree structure in SQL. Each level of the tree is defined through a SELECT statement, thereafter the levels are combined together into the tree by means of a UNION ALL statement. The level-1 SELECT statement introduces the names of atomic elements on all levels. Each SELECT statement introduces a tree level tag and its parent tag. There is a single record in the result set corresponding to the tree root, as defined in the first SELECT statement below.


SELECT    

    1          AS    Tag,    

    NULL       AS    Parent,    

    NULL       AS    [FXTRADE!1!CURRENCY1],    

    NULL       AS    [FXTRADE!1!CURRENCY2],    

    NULL       AS    [FXTRADE!1!AMOUNT],    

    NULL       AS    [FXTRADE!1!SETTLEMENT],    

    NULL       AS    [ACCOUNT!2!BANKCODE],    

    NULL       AS    [ACCOUNT!2!BANKACCT]    

FROM            

    FXTRADE    

UNION ALL    

SELECT    

    2,    

    1,    

    FXTRADE.CURRENCY1,    

    FXTRADE.CURRENCY2,    

    FXTRADE.AMOUNT,    

    FXTRADE.SETTLEMENT,    

    ACCOUNT.BANKCODE,    

    ACCOUNT.BANKACCT    

FROM    

    FXTRADE,    ACCOUNT    

WHERE    

    FXTRADE.ACCOUNT = ACCOUNT.ID    

ORDER    BY    [ACCOUNT!2!BANKCODE],    

        [ACCOUNT!2!BANKACCT]    

FOR    XML    EXPLICIT, ELEMENTS

The FOR XML constructs an XML document by analyzing tags and AS-aliases in the combined row set. The keyword EXPLICIT selects the most flexible, user-defined mode of constructing XML document. Another mode, AUTO, constructs XML document by applying default rules. The keyword ELEMENTS models SQL columns at element level; otherwise, the default is that columns are modeled on attribute level.

Storing XML in the database

Storing XML documents employs OPENXML, a new row set function, similar to table or view. OPENXML can be used for inserts or updates, or for SELECT INTO target tables. OPENXML simplified syntax is shown below:

OPENXML    (<XML document handler>, <path pattern>, <flags>)    

WITH     (Schema | Table)

The process of storing an XML document has three steps.

Obtain an XML document handler by compiling XML document into internal DOM representation, using the stored procedure sp_xml_preparedocument.
Construct a schema by associating schema fields with atomic XML elements.

XML elements are defined by a path pattern (absolute base path) plus a relative element path. Element-centric mapping is indicated by a flags value of 2. Existing tables can be used instead of a schema, with field names equal to XML names.
Remove the compiled XML document from memory by using the stored procedure sp_xml_removedocument.

These steps are illustrated by the following example.

DECLARE @idoc int    

DECLARE @doc varchar(1000)    

SET @doc ='    

<FXTRADE>    

    <CURRENCY1>GBP</CURRENCY1>    

    <CURRENCY2>JPY</CURRENCY2>    

    <AMOUNT>10000</AMOUNT>    

    <SETTLEMENT>20010325</SETTLEMENT>    

    <ACCOUNT>    

        <BANKCODE>812</BANKCODE>    

        <BANKACCT>00365888</BANKACCT>    

    </ACCOUNT>    

</FXTRADE>'    

-- Create internal DOM representation of the XML document.    

EXEC sp_xml_preparedocument @idoc OUTPUT, @doc    

-- Execute a SELECT statement using OPENXML row set provider.    

SELECT *    

FROM OPENXML (@idoc, '/FXTRADE/ACCOUNT', 2)    

WITH (    

  CURRENCY1     CHAR (3),       '../@CURRENCY1',    

  CURRENCY2     CHAR (3),       '../@CURRENCY2',    

  AMOUNT        NUMERIC (18,2), '../@AMOUNT',    

  SETTLEMENT    DATETIME,       '../@SETTLEMENT',    

  BANKCODE      VARCHAR (100),  '@BANKCODE',    

  BANKACCT      VARCHAR (100),  '@BANKACCT' )

EXEC sp_xml_removedocument @idoc

Summary

With Microsoft SQL Server 2000, the extraction and storage of XML documents does not use symmetrical syntax. Extraction extends a SELECT-clause by use of the FOR XML construct. Storing introduces a row set function OPENXML, analogous to a table or view. Extraction mapping rules are based on (a) introducing tags for specifying tree level and (b) associating table fields with parent/child relationships between XML document elements. Storing restructures an XML document into a flat schema or table; "field-element" association is defined using XPath notation.

Sybase Adaptive Server

SQL to XML Mapping

Sybase employs an XML document type, ResultSet, to describe both XML document metadata (element name, type, size, etc.) and actual row data. Here is excerpt from a hypothetical FxTradeSet.xml:

<?xml version="1.0"?>    

<!DOCTYPE ResultSet SYSTEM "ResultSet.dtd">    

<ResultSet>    

    <ResultSetMetaData>    

        <ColumnMetaData    

        ...

        getColumnLabel="CURRENCY1"    

        getColumnName="CURRENCY1"    

        getColumnType="12"    

         ... />    

   ...    

   </ResultSetMetaData>    

   <ResultSetData>    

        <Row>    

           <Column name="CURRENCY1">GBP</Column>    

        ...    

        </Row>    

    </ResultSetData>    

</ResultSet>

The ResultSet DTD does not appear to permit definition of the nested elements.

Extracting XML from the database

The Java class ResultSetXml has a constructor which takes an SQL query as an argument, thereafter the getXmlLText method extracts an XML document from result set:

jcs.xml.resultset.ResultSetXml     rsx = new jcs.xml.resultset.ResultSetXml    

           ("Select * from FxTrade", <other parameters>);    

FileUtil.string2File ("FxTradeSet.xml", rsx.getXmlText());

Storing XML in the database

The ResultSetXml class constructor can also take an XML document as an argument. Then the method toSqlScript generates sequences of SQL statements for insert/update into a specified table from a result set.

String    xmlString = FileUtil.file2string ("FxTradeSet.xml");    

jcs.xml.resultset.ResultSetXml     rsx = new jcs.xml.resultset.ResultSetXml    

           (xmlString);    

String     sqlString  = rsx.toSqlScript ("FxTrade", <other parameters>)

Summary

The extraction and storage of XML documents are symmetrical in nature. Storing does not seem to permit modifying more than one table. Extraction transforms the results of an SQL query into a document with a flat structure.

Vendor Comparison

Vendor	Mapping rules	Single table / Multiple tables	Means of transformation	Symmetrical extraction / storing
Oracle	Implicitly; by constructing object-relational data model	Multiple	Designated Java classes	Symmetrical, if XML document and object-relational model match
IBM	Data Access Definition file	Multiple	Designated stored procedures	Symmetrical
Microsoft	SQL extension; row set function	Multiple for extraction; Single for storing	By using SQL construct FOR XML and row set OPENXML	Asymmetrical
Sybase	Result Set DTD	Single; query may encompass multiple tables	By using Java classes	Symmetrical

Common features between vendors:

XML persistence is achieved on an ad hoc basis, i.e., there's no general facility for storing arbitrary XML documents); special mapping is required if the XML document uses a new grammar;
storage often requires data preprocessing, such as number/date reformatting with respect to the locale being used; XSLT can be utilized to perform XML data massaging.

An Alternative Strategy

Can the task of XML document storage in a database be split into stages? First, store XML in generic database structures without applying any mapping rules; and, second, applying a framework for subsequent data processing? The advantage of this approach would be to have a universal acquisition agent for virtually any XML document. The remainder of this article proposes just such a solution.

In this strategy, XML documents are saved as normalized tree structures -- raw tree data -- for further application processing. The framework for further data processing is then a set of database objects (stored procedures), similar to DOM APIs, for navigating data by employing traditional SQL processing.

There are advantages to using a framework approach as opposed to full automation: the population of database tables is normally controlled by application logic distributed between "pure" application code, transaction processing, database layer, and stored procedures, not to mention that some tables may reside in remote databases.

Additionally, in a heterogeneous database environment, having a unified tool to perform the same task for different databases has obvious advantages.

The author's solution, ObjectCentric Solutions x-Persistence Utility, implements the strategy discussed here.

Raw tree structures are defined by means of the following tables.

1) Tree Node Definition    



CREATE TABLE NodeDef  (

   nodeID       udd_id         NOT NULL, // unique node id    

   dimID        udd_id         NOT NULL, // tree category: XML    

   name         udd_name       NOT NULL, // node name    

   value        udd_paramvalue NULL,     // node value    

   value_type   udd_type       NULL,     // value type    

   power        udd_power      NULL,     // node tree level    

   isRoot       udd_logical    NOT NULL, // top node flag    

   descriptor   udd_descriptor NULL,     // DOM Node type     

   color        udd_color      NULL      // informal data    

)    



2) Parent-Child relationships between nodes    



CREATE TABLE NodeLinks    (    

   parentID     udd_id     NOT NULL,   // parent node    

   childID      udd_id     NOT NULL    // child node    

)

Example usage

Storing an XML document in a database is a matter of invoking the XML2SQL program:

XML2SQL  <XML document filename>

Extracting an XML document from a database as a tree structure is implemented using the following stored procedures:

get_tree_roots <tree category> -- extract all document root ids for a given forest
get_tree <root id> -- extract all nodes for a given root (document)
get_children <node, level> -- extract all children of a certain level for a given node

Implementation details:

Current platform coverage: Sybase, MS SQL Server.
Platforms being evaluated: Oracle, DB2, and PostgreSQL.
The utility is built on top of the Xerces XML Parser.
Database installation involves the addition of just a few tables, and stored procedures to navigate the database tables as tree structures.

The x-Persistence Utility is currently used as the core of an XML-based data acquisition agent in Open Finance Laboratory, a customizable integration platform for a range of financial applications such as portfolio management, asset-liability management, risk management, etc. Open Finance Laboratory adopted FpML as a descriptive language for Interest Rate Derivatives products. x-Persistence Utility is currently used as a FpML/FRA data acquisition agent.

References

Oracle XML-SQL Utility, http://otn.oracle.com/tech/xml/oracle_xsu
IBM DB2 XML Extender, www.ibm.com/software/data/db2/extenders/xmlext
XML Perspective. In control with FOR XML Explicit. SQL Server Magazine, http://msdn.mcrosoft.com/library/periodic/period01/xmlExplicit.htm
Writing XML Using OPENXML
OPENXML
Technology Preview: x-Persistence Utility, ObjectCentric Solutions, Inc., 2001, www.objcentric.com