XML and Database Mapping in .NET
October 23, 2002
The designers of the .NET runtime put a lot of thought into the issue of binding XML to a database. The path from XML to database and back again starts with ADO.NET, .NET's database persistence layer, which can be thought of as analogous to Java's JDBC.
The System.Data.SqlClient
namespace has all the classes you'd expect to see,
like the SqlConnection
, SqlCommand
, and DataReader
(analogous to the Java Connection
, SqlCommand
, and
ResultSet
respectively). The classes in this namespace, whose names all begin
with Sql
, are meant to work with Microsoft's database of choice, SQL Server.
There is an entire suite of classes in the namespace System.Data.OleDb
, whose
names start with OleDb
, which work with any OLE DB provider. If you've used
JDBC, you already know basically how to use these classes.
Along with these classes, ADO.NET also introduces the DataSet
. A
DataSet
instance represents an entire database, including the ability to
track changes made to individual data elements and to persist them to the underlying
database when necessary. The DataSet
can maintain its state while disconnected
from the actual database.
The DataSet
can be used to build a data model without writing any SQL. It can
also be used to persist the data model to an XML Schema or to read an existing XML
Schema
and dynamically build the data model. The DataSet
can also read and write its
data to XML.
On With the Show
You'll recall that in previous articles I used a dog show to demonstrate generating classes in Java and C# from an XML schema. This time, since we're building a database, the database schema for the dog show will require the following schema:
Normally if you were building a database like this, you would use SQL or have access
to a
graphical entity relationship modeling tool. With the .NET DataSet
class,
however, you can build a database schema dynamically. The following code demonstrates
one
way to do that:
using System; using System.Data; public class MakeDataSet { public static void Main(string [] args) { DataSet dataSet = new DataSet("DogShow"); // create Show table DataTable showTable = dataSet.Tables.Add("Show"); DataColumn showIdColumn = showTable.Columns.Add("Id", typeof(Int32)); showTable.Columns.Add("Name", typeof(String)); showTable.PrimaryKey = new DataColumn [] {showIdColumn}; // create Breed table DataTable breedTable = dataSet.Tables.Add("Breed"); DataColumn breedIdColumn = breedTable.Columns.Add("Id", typeof(Int32)); breedTable.Columns.Add("Name", typeof(String)); breedTable.PrimaryKey = new DataColumn [] {breedIdColumn}; // create Dog table DataTable dogTable = dataSet.Tables.Add("Dog"); DataColumn dogIdColumn = dogTable.Columns.Add("Id", typeof(Int32)); dogTable.Columns.Add("Name", typeof(String)); DataColumn dogBreedIdColumn = dogTable.Columns.Add("BreedId", typeof(Int32)); dogTable.PrimaryKey = new DataColumn [] {dogIdColumn}; // create foreign key relationship dataSet.Relations.Add("DogBreed", breedIdColumn, dogBreedIdColumn); // create Judge table DataTable judgeTable = dataSet.Tables.Add("Judge"); DataColumn judgeIdColumn = judgeTable.Columns.Add("Id", typeof(Int32)); judgeTable.Columns.Add("FirstName", typeof(String)); judgeTable.Columns.Add("LastName", typeof(String)); judgeTable.PrimaryKey = new DataColumn [] {judgeIdColumn}; // create ShowRing table DataTable showRingTable = dataSet.Tables.Add("ShowRing"); DataColumn showRingIdColumn = showRingTable.Columns.Add("Id", typeof(Int32)); showRingTable.Columns.Add("Name", typeof(String)); showRingTable.PrimaryKey = new DataColumn [] {showRingIdColumn}; // create Judging table DataTable judgingTable = dataSet.Tables.Add("Judging"); judgingTable.Columns.Add("ShowTime", typeof(DateTime)); DataColumn judgingBreedIdColumn = judgingTable.Columns.Add("BreedId", typeof(Int32)); DataColumn judgingJudgeIdColumn = judgingTable.Columns.Add("JudgeId", typeof(Int32)); DataColumn judgingShowRingIdColumn = judgingTable.Columns.Add("ShowRingId", typeof(Int32)); DataColumn judgingShowIdColumn = judgingTable.Columns.Add("ShowId", typeof(Int32)); // create foreign key relationships dataSet.Relations.Add("JudgingBreed", breedIdColumn, judgingBreedIdColumn); dataSet.Relations.Add("JudgingJudge", judgeIdColumn, judgingJudgeIdColumn); dataSet.Relations.Add("JudgingShowRing", showRingIdColumn, judgingShowRingIdColumn); dataSet.Relations.Add("JudgingShow", showIdColumn, judgingShowIdColumn); } }
Where's the DB?
I mentioned that the DataSet
class can represent a disconnected view of a
database. In fact, as in this case, a DataSet
doesn't really need a database
backing it at all. You can even think of the XML output we're producing as the database,
although it lacks all the usual ACID properties of a relational database. Still, it
makes
for a convenient demonstration of some of the XML functionality.
Having created the DataSet
for the dog show, you could now go on and populate
it with data using code something like this:
DataRow show = showTable.NewRow();
show["Id"] = 1;
show["Name"] = "O'Reilly
Invitational Dog Show";
showTable.Rows.Add(show);
That works well enough, although it's a lot of work, and there's nothing being done
at
compile time to guarantee that you actually have a column named "Id" in the
DataTable
. It would be much better if you could access tables and columns in
a more type-safe way. Writing the code to do that could be a lot of work, but you'd
only
have to do it once. Actually, the .NET framework can do a lot of the heavy lifting
for you.
In addition to the work you saw it do in the last article, the xsd
tool can
generate a subclass of DataSet
for a specific W3C XML Schema.
But first you need a schema. The DataSet
class has a
WriteXmlSchema()
method which creates an W3C XML Schema from the
DataSet
. You simply need to add a line to the end of the
MakeDataSet
program to write the schema to a file:
dataSet.WriteXmlSchema("DogShow.xsd");
Now you can generate a DogShow
class with the following command line:
xsd /dataset DogShow.xsd
The resulting code, placed by default in the file DogShow.cs
, is quite
lengthy, and I won't include the source listing here (you can read it here). Suffice it to say that xsd
has generated a class which is a subclass of DataSet
, which you can use to
directly access tables and columns. It's quite convenient.
Here's the source code for a program which uses the generated class to create an
instance
of DogShow
and add data to it. After that, it writes the data to an XML file.
using System; using System.Data; public class DogShowWriter { public static void Main(string [] args) { DogShow dogShow = new DogShow(); // insert a show row DogShow.ShowRow show = dogShow.Show.AddShowRow(1, "O'Reilly Invitational"); // insert a breed row DogShow.BreedRow breed = dogShow.Breed.AddBreedRow( 1, "English Springer Spaniel"); // insert a couple of dog rows dogShow.Dog.AddDogRow( 1, "Wil-Orion's Angus Highlander", breed); dogShow.Dog.AddDogRow( 2, "Len-Lear's Webmaster", breed); dogShow.Dog.AddDogRow( 3, "Ch. Sallylyn's Condor", breed); // insert a judge row DogShow.JudgeRow judge = dogShow.Judge.AddJudgeRow(1, "John", "Smith"); // insert a show ring row DogShow.ShowRingRow showRing = dogShow.ShowRing.AddShowRingRow(1, "Ring 1"); // insert a judging row DateTime judgingTime = new DateTime(2002,10,20,14,00,00); dogShow.Judging.AddJudgingRow(judgingTime, breed, judge, showRing, show); // write the data dogShow.WriteXml("DogShow.xml"); } }
You can see that with the generated DogShow
class, it's much easier to write
and debug your code, for a few reasons. First, since the compiler will warn you when
you
reference a table or column that doesn't exist, you're much less likely to run into
runtime
errors. Second, xsd
has generated constructors for each table's rows, so you
don't need to go through the tedious effort of referencing each column through the
row's
indexer property.
You should note the last line of code, where I've called
dogShow.WriteXml("DogShow.xml");
. It creates an XML file representing the
DataSet
that I just created and filled with data. Here's the file:
<?xml version="1.0" standalone="yes"?> <DogShow> <Show> <Id>1</Id> <Name>O'Reilly Invitational</Name> </Show> <Breed> <Id>1</Id> <Name>English Springer Spaniel</Name> </Breed> <Dog> <Id>1</Id> <Name>Wil-Orion's Angus Highlander</Name> <BreedId>1</BreedId> </Dog> <Dog> <Id>2</Id> <Name>Len-Lear's Webmaster</Name> <BreedId>1</BreedId> </Dog> <Dog> <Id>3</Id> <Name>Ch. Sallylyn's Condor</Name> <BreedId>1</BreedId> </Dog> <Judge> <Id>1</Id> <FirstName>John</FirstName> <LastName>Smith</LastName> </Judge> <ShowRing> <Id>1</Id> <Name>Ring 1</Name> </ShowRing> <Judging> <ShowTime>2002-10-20T14:00:00.0000000-04:00</ShowTime> <BreedId>1</BreedId> <JudgeId>1</JudgeId> <ShowRingId>1</ShowRingId> <ShowId>1</ShowId> </Judging> </DogShow>
This file can also be read back into a DataSet
-- or into an instance of the
generated DogShow
class -- with the ReadXml()
method.
Making the Data Fit
So now we've got an XML file which represents the O'Reilly Invitational Dog Show. Unfortunately, it's not the same as the one in the last article, which was more tree-oriented. The XML generated here is more table-oriented, which is entirely appropriate, given that we are talking about a relational database. It would be nice to be able to transform the original tree-oriented XML into the new table-oriented schema. Seems like there ought to be a tool to do that.
Thanks to the fact that all of .NET's XML classes are tightly integrated, you can
use XSLT
to transform a document on disk into an XmlDocument
in memory, and then pass
that to another class. So, given an appropriate XSLT stylesheet named "DogShow.xsl",
the
following code will read it right into the DataSet
:
using System; using System.Data; using System.Xml; using System.Xml.Xsl; public class DogShowTransformer { public static void Main(string [] args) { // Create a DataSet instance DogShow dogShow = new DogShow(); // Read the data XmlDocument document = new XmlDocument(); document.Load("DogShow.xml"); // Read the stylesheet XslTransform xslt = new XslTransform(); xslt.Load("DogShow.xsl"); // Transform the data XmlReader reader = xslt.Transform(document, null); // Load the DataSet dogShow.ReadXml(reader); } }
Now What?
DataSet
does quite a bit more than I can cover here. For example, I mentioned
its ability to keep track of changes while disconnected from the database. The
DiffGram
is responsible for this ability. The DiffGram
is itself
an XML document, and it is used to serialize DataSet
s across a web service. The
DiffGram
consists of three sections: one containing the current data,
<DataInstance>
; one containing any changes to the DataSet
,
<diffgr:before>
; and one listing any errors,
<diffgr:errors>
. Any changes in the <diffgr:before>
section and errors in the <diffgr:errors>
section are related back to the
data in the <DataInstance>
section by the diffgr:id
attribute.
I haven't touched on the XmlDataDocument
class yet, either. This class
enables you to synchronize multiple of views of the same DataSet
, much as I
demonstrated with XSLT earlier. (The main reason I did not use XmlDataDocument
is that it depends on the names of elements being the same in both schemas, so that
the
elements can be mapped to each other properly. As you will recall, in the previous
article,
the XML schema had different element names.)
There's still a whole lot more to be covered, including the SqlCommand
's
ExecuteXmlReader()
method, which returns an XmlReader
instance
you can use to read data from a database with the regular .NET XML tools.
I continue to be impressed with the extent of .NET's XML integration. All the major technologies are represented, and Microsoft has use XML in innovative ways throughout the framework. I believe that the Java XML community can certainly learn a thing or two from .NET.
Resources |
Postscript
One of the challenges of comparing the handling of XML in Java and C# is the fact that XML processing is not built into the Java class library. When deciding what features to compare, I've often resorted to searching the Web to find comparable Java tools for features that are a standard part of the .NET Common Language Runtime.
In my last article, I wanted to compare XML databinding in Java and C#. I knew that the .NET CLR contained excellent support for generating classes from an XML Schema, and I knew that Castor also supported databinding. As I examined Castor, I saw that it also supported binding Java objects to relational databases using something called JDO, and I put this fact aside to investigate in a future article.
Unfortunately, I forgot to read the fine print:
Does Castor JDO comply with the SUN JSR-000012 specification?
No, Castor JDO doesn't comply with the SUN's JDO specification.
Although Castor JDO carries very similar goals as SUN's JDO, it has been developed independently from the JSR.
And this pretty much tells the story when it comes to support for XML in Java. There are many implementations; some predate the applicable standards, and some conform to the standards.