Storing XML in Relational Databases
June 20, 2001
Introduction
Table of Contents
Oracle XML-SQL Utility (XSU)
Related Articles |
There are various ways to solve the problem of effective, automatic conversion of XML data into and out of relational databases. Database vendors such as IBM, Microsoft, Oracle, and Sybase have developed tools to assist in converting XML documents into relational tables. The various solutions are as follows.
-
Oracle XML SQL Utility models XML document elements as a collection of nested tables. Enclosed elements are modeled by employing the Oracle Object datatype. The "SQL-to-XML" conversion constructs an XML document by using a one-to-one association between a table, referenced by Object datatype, and a nested element. "XML-to-SQL" might require either data model amending (converting it from relational into object-relational) or restructuring the original XML document.
-
IBM DB2 XML Extender allows storing XML documents either as BLOB-like objects or as decomposed into a set of tables. The latter transformation, known as XML collection, is defined in XML 1.0 syntax.
-
Microsoft approaches the problem by extending SQL-92 and by introducing OPENXML row set.
-
Sybase Adaptive Server introduces the ResultSetXml Java class as a base for processing XML documents in both directions.
In this article, we will explore in detail these vendor's solutions. Thereafter, we will attempt to answer the questions:
- Can we restructure the and simplify the problem?
- What is the right approach in a heterogeneous database environment?
I'll use the following vocabulary as an example.
<!-- Primitive Types --> <!ELEMENT CURRENCY1 (#PCDATA)> <!ATTLIST CURRENCY1 e-dtype NMTOKEN #FIXED "string" e-dsize NMTOKEN #FIXED "3"> <!ELEMENT CURRENCY2 (#PCDATA)> <!ATTLIST CURRENCY2 e-dtype NMTOKEN #FIXED "string" e-dsize NMTOKEN #FIXED "3"> <!ELEMENT AMOUNT (#PCDATA)> <!ATTLIST AMOUNT e-dtype NMTOKEN #FIXED "decimal"> <!ELEMENT SETTLEMENT (#PCDATA)> <!ATTLIST SETTLEMENT e-dtype NMTOKEN #FIXED "date"> <!ELEMENT BANKCODE (#PCDATA)> <!ATTLIST BANKCODE e-dtype NMTOKEN #FIXED "string"> <!ELEMENT BANKACCT (#PCDATA)> <!ATTLIST BANKACCT e-dtype NMTOKEN #FIXED "string"> <!-- Derived Types --> <!ELEMENT ACCOUNT (BANKCODE, BANKACCT)> <!ELEMENT FXTRADE (CURRENCY1, CURRENCY2, AMOUNT, SETTLEMENT, ACCOUNT)> |
Oracle XML-SQL Utility (XSU)
SQL to XML Mapping
Oracle translates the chain of object references from the database into the hierarchical structure of XML elements. In an object-relational database, the field ACCOUNT in the table FXTRADE is modeled as an object reference of type AccountType:
CREATE TABLE FXTRADE { CURRENCY1 CHAR (3), CURRENCY2 CHAR (3), AMOUNT NUMERIC (18,2), SETTLEMENT DATE, ACCOUNT AccountType // object reference } CREATE TYPE AccountType as OBJECT { BANKCODE VARCHAR (100), BANKACCT VARCHAR (100) } |
A corresponding XML document generated from the given object-relational model (using ";SELECT * FROM FXTRADE") looks like
<?xml version="1.0"?> <ROWSET> <ROW num="1"> <CURRENCY1>GBP</CURRENCY1> <CURRENCY2>JPY</CURRENCY2> <AMOUNT>10000</AMOUNT> <SETTLEMENT>20010325</SETTLEMENT> <ACCOUNT> <BANKCODE>812</BANKCODE> <BANKACCT>00365888</BANKACCT> </ACCOUNT> </ROW> <!-- additional rows ... --> </ROWSET> |
Extracting XML from the database
The example below is taken from Oracle's XSU documentation with appropriate substitution of SQL statements and using Oracle's pure Java JDBC thin driver.
First, an instance of OracleXMLQuery
is created; second, a query is executed,
and the result is represented in the form of an XML document as above. Similarly,
an XML
document can be extracted in the form of DOM; in this case qry.getXMLDOM()
would be invoked instead of getXMLString()
.
import oracle.jdbc.driver.*; import oracle.xml.sql.query.OracleXMLQuery; import java.lang.*; import java.sql.*; // class to test XML document generation as String class testXMLSQL { public static void main(String[] args) { try { // Create the connection Connection conn = getConnection("scott","tiger"); // Create the query class OracleXMLQuery qry = new OracleXMLQuery(conn, "SELECT * FROM FXTRADE"); // Get the XML string String str = qry.getXMLString(); // Print the XML output System.out.println("The XML output is:\n"+str); // Always close the query to get rid of any resources.. qry.close(); } catch(SQLException e) { System.out.println(e.toString()); } } // Get the connection given the user name and password.! private static Connection getConnection(String username, String password) throws SQLException { // register the JDBC driver.. DriverManager.registerDriver(new oracle.jdbc.driver.OracleDriver()); // Create the connection using the OCI8 driver Connection conn = DriverManager.getConnection( "jdbc:oracle:thin:@dlsun489:1521:ORCL", username,password); return conn; } } |
Storing XML in the database
In this example OracleXMLSave
is employed to store our XML document in an
object-relational model; the insertXML
method performs the actual insertion of
the data.
import java.sql.*; import oracle.xml.sql.dml.OracleXMLSave; public class testXMLInsert { public static void main(String args[]) throws SQLException { Connection conn = getConnection("scott","tiger"); OracleXMLSave sav = new OracleXMLSave(conn, "scott. FXTRADE"); // Assume that the user passes in this document as 0-arg sav.insertXML(args[0]); sav.close(); } ... } |
That's fine if the XML and object-relational model in the database are synchronized, but what if they aren't? You have two options in that case.
- Adjust the object-relational model -- a modifiable object-relational view can be constructed to accomplish multi-table modifications; or,
- alternatively, the XML document can be decomposed into a collection of "flat" subdocuments, using XSLT.
XSU does not permit storage of attribute values; it's recommended that you transform attributes into elements.
Summary of Oracle XSU
An XML to SQL mapping is modeled by an object-relational model construction rule as follows: each nested XML element is mapped onto an object reference of appropriate type. Mapping rules are implicitly embedded in the database model.
The Java API consists of the classes OracleXMLQuery
and
OracleXMLSave
.
IBM DB2 XML Extender
SQL to XML Mapping
IBM's XML Extender provides two access and storage methods for using DB2 as an XML repository:
- XML column: stores and retrieves entire XML documents as DB2 column data
- XML collection: decomposes XML documents into a collection of relational tables, or composes XML documents from a collection of relational tables.
DTDs are stored in the DTD repository, a DB2 table called DTD_REF; it's schema name is "db2xml". Each DTD in the DTD_REF table has a unique ID. The mapping between the database tables and the structure of the XML document is defined by means of a Data Access Definition (DAD) file. DAD refers to a processed document DTD, thus providing a bridge between an XML document, its DTD, and mapping rules onto database tables.
Here's an example DAD.
<?xml version="1.0"?> <!DOCTYPE DAD SYSTEM "dad.dtd"> <DAD> <dtdid>FXTRADE.DTD</dtdid> <validation>YES</validation> <Xcollection> <prolog>?xml version="1.0"?</prolog> <doctype>!DOCTYPE FXTRADE FXTRADE.DTD </doctype> <root_node> <element_node name="FXTRADE"> <RDB_node> <table name="FXTRADE"/> <table name="ACCOUNT" key="ID"/> <condition> FXTRADE.ACCOUNT=ACCOUNT.ID </condition> </RDB_node> <element_node name="CURRENCY1"> <text_node> <RDB_node> <table name="FXTRADE"/> <column name="CURRENCY1" type="CHAR(3)"/> </RDB_node> </text_node> </element_node> <element_node name="CURRENCY2"> <text_node> <RDB_node> <table name="FXTRADE"/> <column name="CURRENCY2" type="CHAR(3)"/> </RDB_node> </text_node> </element_node> <element_node name="AMOUNT"> <text_node> <RDB_node> <table name="FXTRADE"/> <column name="AMOUNT" type="DECIMAL(18,2)"/> </RDB_node> </text_node> </element_node> <element_node name="SETTLEMENT"> <text_node> <RDB_node> <table name="FXTRADE"/> <column name="SETTLEMENT" type="DATE"/> </RDB_node> </text_node> </element_node> <element_node name="ACCOUNT"> <element_node name="BANKCODE"> <text_node> <RDB_node> <table name="ACCOUNT"/> <column name="BANKCODE" type="VARCHAR(100)"/> </RDB_node> </text_node> </element_node> <element_node name="BANKACCT"> <text_node> <RDB_node> <table name="ACCOUNT"/> <column name="BANKACCT" type="VARCHAR(100)"/> </RDB_node> </text_node> </element_node> </element_node> <!--end of Account element--> </element_node> <!-- end of FxTrade element --> </root_node> </Xcollection> </DAD> |
DAD defines a mapping between XML elements and relational database columns by employing element_node to RDB_node associations. The top-level element_node FXTRADE is defined as a join between tables FXTRADE and ACCOUNT, with field ID in ACCOUNT table as a primary key. Child element CURRENCY1 is mapped onto field CURRENCY1 in the table FXTRADE, and so on. Note that the ACCOUNT element does not include any RDB_node -- this is not required since the link between ACCOUNT and FXTRADE tables has been defined earlier. Child elements of ACCOUNT, BANCCODE, and BANKACCT respectively are defined in the corresponding columns of ACCOUNT table. Atomic XML elements are tagged in DAD as text_node. In the example above all elements, except FXTRADE and ACCOUNT, are atomic.
Extracting XML from the database
XML document composition and decomposition are handled by stored procedures: stored
procedure dxxGenXML()
extracts XML documents from a database; stored procedure
dxxShredXML()
stores XML documents in a database.
The main input parameters of dxxGenXML() are
- DAD: stored in form of DB2 CLOB data type;
- result table name: the constructed XML document is forwarded to this table.
Other input parameters specify the maximum number of rows returned, and <RDB_node> <condition> element override options. The output parameters include the actual number of returned rows, return code, and return message.
An extensive example of stored procedures within a C program can be found in "IBM DB2 Universal Database XML Extender Administration and Programming, Version 7".
Storing XML in the database
Putting an XML document into a database is accomplished by dxxShredXML()
, a
stored procedure.
The input parameters of dxxShredXML() are
- DAD: stored in the form of DB2 CLOB data type;
- input XML document: stored in the form of DB2 XMLCLOB data type.
Its output parameters are a return code and return message.
Summary
The XML-SQL mapping rules are specified by means of a Data Access Definition (DAD) file, which is an XML document. The DB2 XML Extender administration facilities include a means to construct DAD files for each persistent DTD.
Future enhancements will include a new XML-SQL transformation syntax, which will employ an XML Transforms language, a subset of W3C XSLT.
Microsoft SQL Server 2000
SQL to XML Mapping
SQL Server's SQL-to-XML and XML-to-SQL mapping rules employ different syntaxes. Mapping details are discussed in the sections on extraction and storing.
Extracting XML from the database
Mapping between database columns and XML elements or attributes is defined by means
of
AS
aliases in a SELECT
:
<database column> AS [Element Name! Nesting Level! Attribute Name! Directive]
The document top level is assigned level 1, as below. By default, column data is mapped onto attribute values. The directive "element" may be used to change this default setting.
The process of generating an XML document from the database has two steps.
Step 1. Create As
-aliases to atomic elements in your desired output XML; the
alias defines the parent/child relationships between elements. The table below shows
the
aliases for our example document.
FXTRADE /* LEVEL=1 */ CURRENCY1 [FXTRADE!1!CURRENCY1] CURRENCY2 [FXTRADE!1!CURRENCY2] AMOUNT [FXTRADE!1!AMOUNT] SETTLEMENT [FXTRADE!1!SETTLEMENT] ACCOUNT /* LEVEL=2 */ BANKCODE [ACCOUNT!2!BANKCODE] BANKACCT [ACCOUNT!2!BANKACCT] |
Step 2. Define the output tree structure in SQL. Each level of the tree is defined
through
a SELECT
statement, thereafter the levels are combined together into the tree
by means of a UNION ALL
statement. The level-1 SELECT
statement
introduces the names of atomic elements on all levels. Each SELECT
statement
introduces a tree level tag and its parent tag. There is a single record in the result
set
corresponding to the tree root, as defined in the first SELECT
statement below.
SELECT 1 AS Tag, NULL AS Parent, NULL AS [FXTRADE!1!CURRENCY1], NULL AS [FXTRADE!1!CURRENCY2], NULL AS [FXTRADE!1!AMOUNT], NULL AS [FXTRADE!1!SETTLEMENT], NULL AS [ACCOUNT!2!BANKCODE], NULL AS [ACCOUNT!2!BANKACCT] FROM FXTRADE UNION ALL SELECT 2, 1, FXTRADE.CURRENCY1, FXTRADE.CURRENCY2, FXTRADE.AMOUNT, FXTRADE.SETTLEMENT, ACCOUNT.BANKCODE, ACCOUNT.BANKACCT FROM FXTRADE, ACCOUNT WHERE FXTRADE.ACCOUNT = ACCOUNT.ID ORDER BY [ACCOUNT!2!BANKCODE], [ACCOUNT!2!BANKACCT] FOR XML EXPLICIT, ELEMENTS |
The FOR XML
constructs an XML document by analyzing tags and
AS
-aliases in the combined row set. The keyword EXPLICIT
selects
the most flexible, user-defined mode of constructing XML document. Another mode,
AUTO
, constructs XML document by applying default rules. The keyword
ELEMENTS
models SQL columns at element level; otherwise, the default is that
columns are modeled on attribute level.
Storing XML in the database
Storing XML documents employs OPENXML
, a new row set function, similar to
table or view. OPENXML
can be used for inserts or updates, or for SELECT
INTO
target tables. OPENXML
simplified syntax is shown below:
OPENXML (<XML document handler>, <path pattern>, <flags>) WITH (Schema | Table)
The process of storing an XML document has three steps.
-
Obtain an XML document handler by compiling XML document into internal DOM representation, using the stored procedure
sp_xml_preparedocument
. -
Construct a schema by associating schema fields with atomic XML elements.
XML elements are defined by a path pattern (absolute base path) plus a relative element path. Element-centric mapping is indicated by a flags value of 2. Existing tables can be used instead of a schema, with field names equal to XML names.
-
Remove the compiled XML document from memory by using the stored procedure
sp_xml_removedocument
.
These steps are illustrated by the following example.
DECLARE @idoc int DECLARE @doc varchar(1000) SET @doc =' <FXTRADE> <CURRENCY1>GBP</CURRENCY1> <CURRENCY2>JPY</CURRENCY2> <AMOUNT>10000</AMOUNT> <SETTLEMENT>20010325</SETTLEMENT> <ACCOUNT> <BANKCODE>812</BANKCODE> <BANKACCT>00365888</BANKACCT> </ACCOUNT> </FXTRADE>' -- Create internal DOM representation of the XML document. EXEC sp_xml_preparedocument @idoc OUTPUT, @doc -- Execute a SELECT statement using OPENXML row set provider. SELECT * FROM OPENXML (@idoc, '/FXTRADE/ACCOUNT', 2) WITH ( CURRENCY1 CHAR (3), '../@CURRENCY1', CURRENCY2 CHAR (3), '../@CURRENCY2', AMOUNT NUMERIC (18,2), '../@AMOUNT', SETTLEMENT DATETIME, '../@SETTLEMENT', BANKCODE VARCHAR (100), '@BANKCODE', BANKACCT VARCHAR (100), '@BANKACCT' ) EXEC sp_xml_removedocument @idoc |
Summary
With Microsoft SQL Server 2000, the extraction and storage of XML documents does not
use
symmetrical syntax. Extraction extends a SELECT
-clause by use of the FOR
XML
construct. Storing introduces a row set function OPENXML
,
analogous to a table or view. Extraction mapping rules are based on (a) introducing
tags for
specifying tree level and (b) associating table fields with parent/child relationships
between XML document elements. Storing restructures an XML document into a flat schema
or
table; "field-element" association is defined using XPath notation.
Sybase Adaptive Server
SQL to XML Mapping
Sybase employs an XML document type, ResultSet, to describe both XML document metadata (element name, type, size, etc.) and actual row data. Here is excerpt from a hypothetical FxTradeSet.xml:
<?xml version="1.0"?> <!DOCTYPE ResultSet SYSTEM "ResultSet.dtd"> <ResultSet> <ResultSetMetaData> <ColumnMetaData ... getColumnLabel="CURRENCY1" getColumnName="CURRENCY1" getColumnType="12" ... /> ... </ResultSetMetaData> <ResultSetData> <Row> <Column name="CURRENCY1">GBP</Column> ... </Row> </ResultSetData> </ResultSet> |
The ResultSet DTD does not appear to permit definition of the nested elements.
Extracting XML from the database
The Java class ResultSetXml
has a constructor which takes an SQL query as an
argument, thereafter the getXmlLText
method extracts an XML document from
result set:
jcs.xml.resultset.ResultSetXml rsx = new jcs.xml.resultset.ResultSetXml ("Select * from FxTrade", <other parameters>); FileUtil.string2File ("FxTradeSet.xml", rsx.getXmlText()); |
Storing XML in the database
The ResultSetXml
class constructor can also take an XML document as an
argument. Then the method toSqlScript
generates sequences of SQL statements for
insert/update into a specified table from a result set.
String xmlString = FileUtil.file2string ("FxTradeSet.xml"); jcs.xml.resultset.ResultSetXml rsx = new jcs.xml.resultset.ResultSetXml (xmlString); String sqlString = rsx.toSqlScript ("FxTrade", <other parameters>) |
Summary
The extraction and storage of XML documents are symmetrical in nature. Storing does not seem to permit modifying more than one table. Extraction transforms the results of an SQL query into a document with a flat structure.
Vendor Comparison
Vendor |
Mapping rules |
Single table / Multiple tables |
Means of transformation |
Symmetrical extraction / storing |
Oracle |
Implicitly; by constructing object-relational data model |
Multiple |
Designated Java classes |
Symmetrical, if XML document and object-relational model match |
IBM |
Data Access Definition file |
Multiple |
Designated stored procedures |
Symmetrical |
Microsoft |
SQL extension; row set function |
Multiple for extraction; Single for storing |
By using SQL construct FOR XML and row set OPENXML |
Asymmetrical |
Sybase |
Result Set DTD |
Single; query may encompass multiple tables |
By using Java classes |
Symmetrical |
Common features between vendors:
- XML persistence is achieved on an ad hoc basis, i.e., there's no general facility for storing arbitrary XML documents); special mapping is required if the XML document uses a new grammar;
- storage often requires data preprocessing, such as number/date reformatting with respect to the locale being used; XSLT can be utilized to perform XML data massaging.
An Alternative Strategy
Can the task of XML document storage in a database be split into stages? First, store XML in generic database structures without applying any mapping rules; and, second, applying a framework for subsequent data processing? The advantage of this approach would be to have a universal acquisition agent for virtually any XML document. The remainder of this article proposes just such a solution.
In this strategy, XML documents are saved as normalized tree structures -- raw tree data -- for further application processing. The framework for further data processing is then a set of database objects (stored procedures), similar to DOM APIs, for navigating data by employing traditional SQL processing.
There are advantages to using a framework approach as opposed to full automation: the population of database tables is normally controlled by application logic distributed between "pure" application code, transaction processing, database layer, and stored procedures, not to mention that some tables may reside in remote databases.
Additionally, in a heterogeneous database environment, having a unified tool to perform the same task for different databases has obvious advantages.
The author's solution, ObjectCentric Solutions x-Persistence Utility, implements the strategy discussed here.
Raw tree structures are defined by means of the following tables.
1) Tree Node Definition CREATE TABLE NodeDef ( nodeID udd_id NOT NULL, // unique node id dimID udd_id NOT NULL, // tree category: XML name udd_name NOT NULL, // node name value udd_paramvalue NULL, // node value value_type udd_type NULL, // value type power udd_power NULL, // node tree level isRoot udd_logical NOT NULL, // top node flag descriptor udd_descriptor NULL, // DOM Node type color udd_color NULL // informal data ) 2) Parent-Child relationships between nodes CREATE TABLE NodeLinks ( parentID udd_id NOT NULL, // parent node childID udd_id NOT NULL // child node ) |
Example usage
Storing an XML document in a database is a matter of invoking the XML2SQL program:
XML2SQL <XML document filename>
Extracting an XML document from a database as a tree structure is implemented using the following stored procedures:
get_tree_roots
<tree category> -- extract all document root ids for a given forestget_tree
<root id> -- extract all nodes for a given root (document)get_children
<node, level> -- extract all children of a certain level for a given node
Implementation details:
- Current platform coverage: Sybase, MS SQL Server.
- Platforms being evaluated: Oracle, DB2, and PostgreSQL.
- The utility is built on top of the Xerces XML Parser.
- Database installation involves the addition of just a few tables, and stored procedures to navigate the database tables as tree structures.
The x-Persistence Utility is currently used as the core of an XML-based data acquisition agent in Open Finance Laboratory, a customizable integration platform for a range of financial applications such as portfolio management, asset-liability management, risk management, etc. Open Finance Laboratory adopted FpML as a descriptive language for Interest Rate Derivatives products. x-Persistence Utility is currently used as a FpML/FRA data acquisition agent.
References
- Oracle XML-SQL Utility, http://otn.oracle.com/tech/xml/oracle_xsu
- IBM DB2 XML Extender, www.ibm.com/software/data/db2/extenders/xmlext
- XML Perspective. In control with FOR XML Explicit. SQL Server Magazine, http://msdn.mcrosoft.com/library/periodic/period01/xmlExplicit.htm
- Writing XML Using OPENXML
- OPENXML
- Technology Preview: x-Persistence Utility, ObjectCentric Solutions, Inc., 2001, www.objcentric.com