From JDOM to XmlDocument
April 3, 2002
The Microsoft .NET framework is becoming well known for its integration of XML into nearly all data-manipulation tasks. In the first article in this series, I walked through the process of porting a simple Java application using SAX to one using .NET's XmlReader. I concluded that there are advantages and disadvantages to each language's way of doing things, but pointed out that if you are not a fan of forward-only, event-based XML parsing, neither one will really fit your fancy. This article focuses on porting whole-document XML programs from Java to C#.
The Exercise
I begin with one of those standard small programs that everyone has written at some point to learn about XML. I've written a Java program that uses JDOM to read and write an XML file representing a catalog of compact discs. Of course, this program is of little practical use, given how easy it is to find similar free and open source applications; however, it represents a fairly simple problem domain, and it also allows me to show off the diversity of my CD collection.
So, to begin, here is the source listing for CDCatalog.java. I have not bothered to create any sort of DTD or schema at this time because it's a very simple document, and validation is not necessary.
    package com.xml;

    import org.jdom.Document;
    import org.jdom.Element;
    import org.jdom.JDOMException;
    import org.jdom.input.SAXBuilder;
    import org.jdom.output.XMLOutputter;

    import java.io.File;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.PrintStream;
    import java.util.Iterator;
    import java.util.List;

    public class CDCatalog {

        File file = null;
        Document document = null;

        public static void main(String args[]) {
            // both a file name and an action are required
            if (args.length > 1) {
                String xmlFile = args[0];
                CDCatalog catalog = new CDCatalog(xmlFile);
                String action = args[1];
                try {
                    if (args.length == 5) {
                        String title = args[2];
                        String artist = args[3];
                        String label = args[4];
                        if (action.equals("add")) {
                            catalog.addCD(title, artist, label);
                        } else if (action.equals("delete")) {
                            catalog.deleteCD(title, artist, label);
                        }
                    }
                    // save the changed catalog
                    catalog.save();
                } catch (Exception e) {
                    e.printStackTrace(System.err);
                }
            }
        }

        public CDCatalog(String fileName) {
            try {
                file = new File(fileName);
                if (file.exists()) {
                    loadDocument(file);
                } else {
                    createDocument();
                }
            } catch (Exception e) {
                e.printStackTrace(System.err);
            }
        }

        private void loadDocument(File file) throws JDOMException {
            SAXBuilder builder = new SAXBuilder();
            document = builder.build(file);
        }

        private void createDocument() {
            Element root = new Element("CDCatalog");
            document = new Document(root);
        }

        public void addCD(String title, String artist, String label) {
            Element cd = new Element("CD");
            cd.setAttribute("title",title);
            cd.setAttribute("artist",artist);
            cd.setAttribute("label",label);
            document.getRootElement().getChildren().add(cd);
        }

        public void deleteCD(String title, String artist, String label) {
            List cds = document.getRootElement().getChildren();
            // getChildren() returns a live list, so walk it backward;
            // detaching an element then cannot skip its neighbor
            for (int i = cds.size() - 1; i >= 0; i--) {
                Element next = (Element)cds.get(i);
                // getAttributeValue() returns null for a missing attribute
                if (title.equals(next.getAttributeValue("title")) &&
                    artist.equals(next.getAttributeValue("artist")) &&
                    label.equals(next.getAttributeValue("label"))) {
                    next.detach();
                }
            }
        }

        public void save() throws IOException {
            XMLOutputter outputter = new XMLOutputter();
            FileWriter writer = new FileWriter(file);
            outputter.output(document,writer);
            writer.close();
        }
    }
I very intentionally used JDOM for this program rather than DOM; since the target of our port is a real DOM, I wanted to add the twist of starting off with an API that, while DOM-like, is really not DOM.
Compiling and running it a few times, we get this XML file (reformatted to be more human-readable):
    C:\>java com.xml.CDCatalog CDCatalog.xml add "Dummy" "Portishead" "Go!"

    <?xml version="1.0" encoding="UTF-8"?>
    <CDCatalog>
      <CD title="Dummy" artist="Portishead" label="Go!" />
      <CD title="Caribe Atomico" artist="Aterciopelados" label="BMG" />
      <CD title="New Favorite" artist="Alison Kraus + Union Station" label="Rounder" />
      <CD title="Soon As I'm On Top Of Things" artist="Zoe Mulford" label="MP3.com" />
      <CD title="Japanese Melodies" artist="Yo-Yo Ma" label="CBS" />
      <CD title="In This House, On This Morning" artist="Wynton Marsalis Septet" label="Columbia" />
    </CDCatalog>
Everything's a Node
XmlDocument is the .NET XML document tree view object. Much like JDOM's Document class, XmlDocument allows you to access any node in the XML tree randomly. Unlike Document, however, XmlDocument is itself a subclass of XmlNode. I didn't talk about XmlNode in the previous article, so let's take a look at it and some of its members now. If you are already familiar with DOM, this will explain how .NET implements it; if not, this should serve as a basic introduction, with the caveat that we're talking specifically about the .NET implementation.
The System.Xml assembly is somewhat monolithic; everything you might find in an XML document is a subclass of XmlNode. Besides XmlDocument, this includes the document type (XmlDocumentType), elements (XmlElement), attributes (XmlAttribute), CDATA (XmlCDataSection), even the humble entity reference (XmlEntityReference). And, as you can tell, the object names are very descriptive.
As an experienced object-oriented developer, you know that this makes for some very nicely polymorphic code. All of XmlNode's subclasses inherit, override, or overload several important members. Among the properties that you can access are the following.
Name - the name of the node
NamespaceURI - the namespace of the node
NodeType - the type of the node
OwnerDocument - the document in which the node appears
Prefix - the namespace prefix of the node
Value - the value of the node
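To make the polymorphism concrete, here is a minimal sketch that is not part of the CD catalog program. The class name NodeProperties and the helper Describe() are my own, chosen for illustration; the point is that one method can handle a document, an element, and an attribute purely through members they inherit from XmlNode.

```csharp
using System;
using System.Xml;

public class NodeProperties
{
    // Hypothetical helper: summarizes any node using only XmlNode members.
    public static string Describe(XmlNode node)
    {
        string value = node.Value == null ? "(null)" : node.Value;
        return String.Format("{0} name={1} value={2}",
                             node.NodeType, node.Name, value);
    }

    public static void Main()
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml("<CDCatalog><CD title=\"Dummy\"/></CDCatalog>");
        XmlNode cd = document.DocumentElement.FirstChild;

        // Three different subclasses, one code path.
        Console.WriteLine(Describe(document));          // Document name=#document value=(null)
        Console.WriteLine(Describe(cd));                // Element name=CD value=(null)
        Console.WriteLine(Describe(cd.Attributes[0]));  // Attribute name=title value=Dummy
    }
}
```

Note that Value is null for the document and the element; only node types that can carry text, such as attributes, report one.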
I said all of these members are properties. It may at first seem like a violation of object-oriented encapsulation, but you may access any of these properties directly if you want to change, for example, the value of a node. Once you realize that getting or setting a C# property actually involves calling an implicit accessor method, however, any objections to this technique should disappear. These accessor methods are generated automatically through the use of a special syntax, which you'll see if you look up XmlNode.Value, for example, in the .NET framework SDK reference:
public virtual string Value {get; set;}
A full explanation of the mechanism at work here is beyond the scope of this article; suffice it to say, it works.
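For readers who want a quick look anyway, the following is a minimal sketch of a hand-written property. The Disc class and its Title property are hypothetical and have nothing to do with System.Xml; they only show that what looks like a field access actually runs accessor code, which is why encapsulation survives.

```csharp
using System;

public class Disc
{
    private string title = "";

    // Callers write disc.Title, but every read and write goes
    // through the get and set accessors below.
    public string Title
    {
        get { return title; }
        set
        {
            if (value == null)
                throw new ArgumentNullException("value");
            title = value.Trim();   // the class still controls its own state
        }
    }

    public static void Main()
    {
        Disc disc = new Disc();
        disc.Title = "  Dummy  ";                 // runs the set accessor
        Console.WriteLine("[{0}]", disc.Title);   // runs the get accessor; prints [Dummy]
    }
}
```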
There are exceptions to this willy-nilly access to C# properties, of course; you cannot set the NodeType of an XmlNode, because it already is what it is. Also, you can set some properties of some node types and not others; for example, while you can set the Value of an XmlAttribute, attempting to set the Value of an XmlElement will cause an InvalidOperationException to be thrown, because an element cannot have a value (though it can certainly have child elements, attributes, and other types of nodes).
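The difference between the two node types can be seen in a short sketch. The ValueDemo class and the TrySetValue() helper are my own names, used only for illustration; the exception type is the one described above.

```csharp
using System;
using System.Xml;

public class ValueDemo
{
    // Hypothetical helper: returns true if the node accepted the value,
    // false if System.Xml rejected it with InvalidOperationException.
    public static bool TrySetValue(XmlNode node, string value)
    {
        try
        {
            node.Value = value;
            return true;
        }
        catch (InvalidOperationException)
        {
            return false;
        }
    }

    public static void Main()
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml("<CDCatalog><CD title=\"Dumy\"/></CDCatalog>");
        XmlElement cd = (XmlElement)document.DocumentElement.FirstChild;

        // An attribute has a value, so the assignment succeeds.
        Console.WriteLine(TrySetValue(cd.Attributes["title"], "Dummy"));  // True
        // An element has children rather than a value, so it fails.
        Console.WriteLine(TrySetValue(cd, "Dummy"));                      // False
        Console.WriteLine(cd.GetAttribute("title"));                      // Dummy
    }
}
```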
Additionally, some nodes may be marked as read-only. You can check the IsReadOnly property of any XmlNode to verify whether it can be changed. Setting some properties of a read-only node will cause an ArgumentException to be thrown. In those cases where the node is read-only, all you can really do is remove it from one location in the tree and insert it elsewhere.
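As a rough illustration, the sketch below checks IsReadOnly on two kinds of node. It assumes a document type node as the read-only example, since those are always read-only in System.Xml; the ReadOnlyDemo class and its Build() method are hypothetical names, and the empty DOCTYPE is created only for this demonstration.

```csharp
using System;
using System.Xml;

public class ReadOnlyDemo
{
    // Hypothetical helper: builds a document with a bare DOCTYPE and a root.
    public static XmlDocument Build()
    {
        XmlDocument document = new XmlDocument();
        // No public ID, system ID, or internal subset is supplied.
        XmlDocumentType docType =
            document.CreateDocumentType("CDCatalog", null, null, null);
        document.AppendChild(docType);   // the DOCTYPE must precede the root
        document.AppendChild(document.CreateElement("CDCatalog"));
        return document;
    }

    public static void Main()
    {
        XmlDocument document = Build();
        Console.WriteLine(document.DocumentType.IsReadOnly);     // True
        Console.WriteLine(document.DocumentElement.IsReadOnly);  // False
    }
}
```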
XmlNodes have methods as well as properties. Some of the ones you'll use a lot are AppendChild() and InsertAfter(), both of whose names are fairly descriptive.
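Here is a small sketch of the two methods side by side. InsertDemo, NewCD(), and Build() are names I made up for the example; the CD titles are borrowed from the catalog shown earlier.

```csharp
using System;
using System.Xml;

public class InsertDemo
{
    // Hypothetical helper: creates a <CD> element owned by the document.
    static XmlElement NewCD(XmlDocument document, string title)
    {
        XmlElement cd = document.CreateElement("CD");
        cd.SetAttribute("title", title);
        return cd;
    }

    public static XmlDocument Build()
    {
        XmlDocument document = new XmlDocument();
        XmlElement root = document.CreateElement("CDCatalog");
        document.AppendChild(root);

        XmlElement first = NewCD(document, "Dummy");
        root.AppendChild(first);                             // always goes last
        root.AppendChild(NewCD(document, "New Favorite"));
        // InsertAfter() takes the new node, then the existing sibling
        // it should follow.
        root.InsertAfter(NewCD(document, "Caribe Atomico"), first);
        return document;
    }

    public static void Main()
    {
        // Prints Dummy, Caribe Atomico, New Favorite, one per line.
        foreach (XmlElement cd in Build().DocumentElement.ChildNodes)
            Console.WriteLine(cd.GetAttribute("title"));
    }
}
```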
That's enough description for now; let's dive into the code.
Ready, Set, Port
Remembering our basic C# language lessons from part one, let's skip right on ahead to the API-specific code.
There are a few changes we have to worry about that were not relevant in our first exercise. For example, while there is a File class in C#, it's not directly comparable to the Java File; for one thing, nearly all its methods are static.
As I mentioned earlier, we're converting our code not just from Java to C#, but from JDOM to DOM. While this is not necessarily a formidable task, it does complicate things a bit. Perhaps the easiest part of this port will be changing instances of JDOM's Document and Element to C#'s XmlDocument and XmlElement, respectively.
JDOM's way of doing things is often the reverse of DOM's way. For example, in our Java createDocument() method, we instantiate a root Element and then instantiate the Document, passing in the Element. In C#, we instantiate the XmlDocument, call its CreateElement() method to create an XmlElement, and then insert the resultant XmlElement as the root element of the tree with AppendChild().
A similar pattern is used in the AddCD() method; the XmlDocument's CreateElement() method is called, and the XmlElement is then inserted as a child of the root element with AppendChild(). In short, the XmlDocument serves a dual role as a representation of the document itself and as a factory to create new elements.
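The dual role is easiest to see by inspecting an element between the two steps. This is only a sketch; the FactoryDemo class and its CreateDetachedRoot() method are hypothetical names chosen for the example.

```csharp
using System;
using System.Xml;

public class FactoryDemo
{
    // Hypothetical helper: asks the document to manufacture an element
    // without attaching it to the tree.
    public static XmlElement CreateDetachedRoot(XmlDocument document)
    {
        return document.CreateElement("CDCatalog");
    }

    public static void Main()
    {
        XmlDocument document = new XmlDocument();
        XmlElement root = CreateDetachedRoot(document);

        // Factory role: the element already belongs to the document...
        Console.WriteLine(root.OwnerDocument == document);     // True
        // ...but it is not yet part of the tree.
        Console.WriteLine(root.ParentNode == null);            // True
        Console.WriteLine(document.DocumentElement == null);   // True

        // Document role: attaching it makes it the root element.
        document.AppendChild(root);
        Console.WriteLine(document.DocumentElement == root);   // True
    }
}
```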
Another difference between JDOM and System.Xml, and indeed between JDOM and DOM itself, is that while JDOM deals exclusively with standard Java collections (such as List, in deleteCD()), DOM defines its own collections of nodes. In our C# DeleteCD() method, we're dealing with an XmlNodeList. In practical terms, an XmlNodeList is not dealt with any differently than a Java List.
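A short sketch shows the similarity, along with one caveat the two collections share: the child list is live, so removing nodes changes it while you are looping over it. The NodeListDemo class and RemoveByArtist() method are hypothetical, and the sample data is invented for the demonstration.

```csharp
using System;
using System.Xml;

public class NodeListDemo
{
    // Hypothetical helper: removes every <CD> child by the given artist
    // and returns how many were removed.
    public static int RemoveByArtist(XmlDocument document, string artist)
    {
        XmlNodeList cds = document.DocumentElement.ChildNodes;
        int removed = 0;
        // Count and the indexer work like size() and get() on a Java List.
        // Walking backward means a removal never shifts a node we have
        // not visited yet.
        for (int i = cds.Count - 1; i >= 0; i--)
        {
            XmlElement cd = cds[i] as XmlElement;   // skip non-element nodes
            if (cd != null && cd.GetAttribute("artist") == artist)
            {
                document.DocumentElement.RemoveChild(cd);
                removed++;
            }
        }
        return removed;
    }

    public static void Main()
    {
        XmlDocument document = new XmlDocument();
        document.LoadXml(
            "<CDCatalog>" +
            "<CD title=\"A\" artist=\"X\"/>" +
            "<CD title=\"B\" artist=\"X\"/>" +
            "<CD title=\"C\" artist=\"Y\"/>" +
            "</CDCatalog>");

        Console.WriteLine(RemoveByArtist(document, "X"));              // 2
        Console.WriteLine(document.DocumentElement.ChildNodes.Count);  // 1
    }
}
```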
Down Another Path
Upon further thought, it seems like our original Java program is missing something; there's no way to search for entries. This sounds like a job for XPath. XPathNavigator implements XPath in .NET, and you create an XPathNavigator by calling XmlNode.CreateNavigator(). The following XPath will find the Zoe Mulford CDs in my collection:
//CD[@artist='Zoe Mulford']
In C# code, we'll compile this path and select those nodes that match it, then print them.
    XPathNavigator nav = document.CreateNavigator();
    // compile the path once, then select the nodes that match it
    XPathExpression expr = nav.Compile(
        "//CD[@artist=normalize-space('" + artist + "')]");
    XPathNodeIterator iterator = nav.Select(expr);
    Console.WriteLine("CDs by {0}:", artist);
    while (iterator.MoveNext()){
        XPathNavigator nav2 = iterator.Current.Clone();
        // the title is the first attribute of each CD element
        Console.WriteLine(" \"{0}\"",
            ((IHasXmlNode)nav2).GetNode().Attributes[0].Value);
    }
You'll notice that I added a call to normalize-space() in the XPath. If you're not familiar with it, normalize-space() strips off leading and trailing white space from the value and reduces any repeating whitespace to a single space character. While it's not strictly necessary, I thought it might be useful in this case because the data, which was entered manually by a person, might not be normalized.
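To see what normalize-space() does in isolation, here is a minimal sketch that evaluates it on a string literal. The NormalizeDemo class and Normalize() helper are hypothetical, and the sketch assumes the text contains no apostrophes, since one would end the XPath string literal early.

```csharp
using System;
using System.Xml;
using System.Xml.XPath;

public class NormalizeDemo
{
    // Hypothetical helper: runs normalize-space() on a literal.
    // Assumption: text contains no apostrophes.
    public static string Normalize(string text)
    {
        // An empty document is enough; the expression reads no nodes.
        XPathNavigator nav = new XmlDocument().CreateNavigator();
        return (string)nav.Evaluate("normalize-space('" + text + "')");
    }

    public static void Main()
    {
        // Leading and trailing spaces go; inner runs collapse to one space.
        Console.WriteLine("[{0}]", Normalize("  Zoe   Mulford "));  // [Zoe Mulford]
    }
}
```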
By using XPath, we can navigate directly to any CDs we want to find. This may or may not be any more efficient than searching through an entire CD catalog by hand; but it is easier to use in a consistent manner and has the added benefit that any performance improvements in the .NET runtime will automatically be reflected in your application.
So, here's our final code listing.
    using System;
    using System.IO;
    using System.Xml;
    using System.Xml.XPath;

    public class CDCatalog {

        FileStream file = null;
        XmlDocument document = null;

        public static void Main(string [] args) {
            // both a file name and an action are required
            if (args.Length > 1) {
                string xmlFile = args[0];
                CDCatalog catalog = new CDCatalog(xmlFile);
                string action = args[1];
                if (args.Length == 5) {
                    string title = args[2];
                    string artist = args[3];
                    string label = args[4];
                    if (action == "add") {
                        catalog.AddCD(title, artist, label);
                    } else if (action == "delete") {
                        catalog.DeleteCD(title, artist, label);
                    }
                } else if (args.Length == 3) {
                    string artist = args[2];
                    if (action == "find") {
                        catalog.SearchForArtist(artist);
                    }
                }
                // save the changed catalog
                catalog.Save();
            }
        }

        public CDCatalog(string fileName) {
            if (File.Exists(fileName)) {
                LoadDocument(fileName);
            } else {
                CreateDocument(fileName);
            }
        }

        private void LoadDocument(string fileName) {
            file = File.Open(fileName,FileMode.Open);
            document = new XmlDocument();
            document.Load(file);
        }

        private void CreateDocument(string fileName) {
            file = File.Create(fileName);
            document = new XmlDocument();
            XmlElement root = document.CreateElement("CDCatalog");
            document.AppendChild(root);
        }

        public void AddCD(string title, string artist, string label) {
            XmlElement cd = document.CreateElement("CD");
            cd.SetAttribute("title",title);
            cd.SetAttribute("artist",artist);
            cd.SetAttribute("label",label);
            document.DocumentElement.AppendChild(cd);
        }

        public void DeleteCD(string title, string artist, string label) {
            XmlNodeList cds = document.DocumentElement.ChildNodes;
            // ChildNodes is a live list, so walk it backward;
            // removing a node then cannot skip its neighbor
            for (int i = cds.Count - 1; i >= 0; i--) {
                XmlElement next = (XmlElement)cds[i];
                if (next.GetAttribute("title") == title &&
                    next.GetAttribute("artist") == artist &&
                    next.GetAttribute("label") == label) {
                    document.DocumentElement.RemoveChild(next);
                }
            }
        }

        public void SearchForArtist(string artist) {
            XPathNavigator nav = document.CreateNavigator();
            XPathExpression expr = nav.Compile(
                "//CD[@artist=normalize-space('" + artist + "')]");
            XPathNodeIterator iterator = nav.Select(expr);
            Console.WriteLine("CDs by {0}:", artist);
            while (iterator.MoveNext()){
                XPathNavigator nav2 = iterator.Current.Clone();
                // the title is the first attribute of each CD element
                Console.WriteLine(" \"{0}\"",
                    ((IHasXmlNode)nav2).GetNode().Attributes[0].Value);
            }
        }

        public void Save() {
            // rewind and overwrite the file we opened earlier
            file.Position = 0;
            XmlTextWriter writer = new XmlTextWriter(
                new StreamWriter(file));
            document.WriteTo(writer);
            // flush so that Position reflects everything written,
            // then cut off any leftover bytes from the old content
            writer.Flush();
            file.SetLength(file.Position);
            writer.Close();
        }
    }
And now we'll compile and run a couple of tests:
    C:\>csc /debug /r:System.Xml.dll /t:exe CDCatalog.cs
    Microsoft (R) Visual C# .NET Compiler version 7.00.9466
    for Microsoft (R) .NET Framework version 1.0.3705
    Copyright (C) Microsoft Corporation 2001. All rights reserved.

    C:\>CDCatalog CDCatalog.xml add "High Strung Tall Tales" "Adrian Legg" "Relativity"

    C:\>CDcatalog CDCatalog.xml find "Alison Kraus + Union Station"
    CDs by Alison Kraus + Union Station:
     "New Favorite"
Conclusions
I've shown you how to port your SAX Java code to XmlReader and your JDOM Java code to XmlDocument, with a small helping of XPath. These are the basic technologies that most developers are familiar with, and you should now be ready to apply them in your C# programming.
But the original task I set out to accomplish was to see what could be learned from Microsoft's XML APIs. In my first article, I concluded that Microsoft's one-stop-shopping is both positive and negative, depending on your point of view. However, I'm beginning to see a greater benefit to this single source of objects; the XmlNodeType that you deal with in XmlReader is exactly the same object that you deal with in DOM. This could easily have the benefit of shortening your learning cycle, as well as making your code more reusable. The Java community could certainly stand to learn something here.
In the next installment of this series, I'll take another look at the venerable RSSReader, and make it a better C# program by using XmlReader the way it was meant to be used, as a pull-parser. And I'll compare that to some pull-parsers in the Java world.