XSH, An XML Editing Shell
July 10, 2002
Introduction
A few months ago we briefly examined some of the command line utilities available
to users of Perl and XML. This month we will continue in that vein by looking at the
300-pound gorilla of Perl/XML command line tools, Petr Pajas' intriguing
XML::XSH
.
XML::XSH
and the xsh
executable provide a rich shell environment
which makes performing common XML-related tasks as terse and straightforward as using
a UNIX
shell like bash
or csh
. Yes, that's right -- an XML editing
shell. As we will see, it's not as crazy as it seems.
xsh Basics
Before we look at xsh
's advanced tricks, let's get familiar with the
environment it provides. We'll begin by starting the xsh
shell:
[user@host user]$ xsh -i
-----------------------------------------------------
xsh - XML Editing Shell version 0.9 (Revision: 1.6)
-----------------------------------------------------
...
xsh scratch:/>
The xsh
shell starts in interactive mode, creating a new default scratch pad
document, called new_document.xml
with the ID scratch
. The shell
prompt takes the form of the current document's ID (scratch, in this case), followed
by a
colon, and then the current working context within that document expressed as an XPath
location (/, in this case). In other words, we can tell from the prompt that we are
at the
root (/) level of the current XML document, whose ID is scratch
.
We can open an existing XML document from the file system in order to figure out how to navigate within and between documents:
xsh scratch:/> open cams=files/camelids.xml
parsing files/camelids.xml
done.
xsh cams:/>
The open
command opens the document camelids.xml
from the
directory files
in the same directory in which we started the xsh
shell, assigns it the ID of cams,
and changes the working context to the root
(/) of that document.
To list the elements contained in the current context we use the ls
command.
xsh cams:/> ls
<?xml version="1.0" encoding="iso-8859-1"?>
<camelids>...</camelids>
Found 1 node(s).
xsh cams:/>
Since the current context is the abstract root of the document, we see the XML declaration
and the sole top-level <camelids>
element. If our document contained
processing instructions or a Document Type Definition between the XML declaration
and the
top-level element, they would appear here, too.
This is where things get interesting. Just like its UNIX shell
cousins, many of xsh
's commands accept paths as arguments, specifying the
context in which that command is evaluated. The difference is that in xsh
those
paths are XPath expressions which provide access to the contents of the open XML documents,
rather than file system paths that provide an interface to the files and directories
of the
mounted volumes.
So, for example, if we want to list all of the <habitat>
elements in our
camelids document, we need only supply the appropriate XPath expression to the
ls
command:
xsh cams:/> ls //habitat
This yields:
<habitat>
  Bactrian camels' habitat consists mainly of Asia's deserts. The temperature
  ranges from -29 degrees Celsius in the winter to 38 degrees Celsius in the
  summer.
</habitat>
<habitat>
  Dromedary camels prefer desert conditions characterized by a long dry season
  and a short rainy season. Introduction of the dromedary into other climates
  has proven unsuccessful as the camel is sensitive to the cold and humidity
  (Nowak 1991).
</habitat>
<habitat>
  Llamas are found in deserts, mountainous areas, and grasslands.
</habitat>
<habitat>
  Guanacos inhabit grasslands and shrublands from sea level to 4,000m.
  Occasionally they winter in forests.
</habitat>
<habitat>
  Vicunas are found in semiarid rolling grasslands and plains at altitudes of
  3,500-5,750 meters. These lands are covered with short and tough vegetation.
  Due to their daily water demands, vicunas live in areas where water is
  readily accessible. Climate in the habitat is usually dry and cold.
  Nowak (1991), Grizmek (1990).
</habitat>
Found 5 node(s).
xsh cams:/>
Or, if we want our query to be more specific, we can use predicate expressions in our XPath statement. For example,
xsh cams:/> ls //habitat[ancestor::species/@name='Lama guanicoe']
selects just the Guanaco's habitat element.
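For readers who want to try the same kind of query outside the shell, here is a rough sketch using Python's standard xml.etree.ElementTree module. It is an analogue for comparison, not part of xsh. The inline sample document is a made-up, heavily trimmed stand-in for camelids.xml; only the element and attribute names are taken from the listings above. ElementTree's limited XPath support has no ancestor axis, so the predicate is moved onto the species step instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for files/camelids.xml.
SAMPLE = """
<camelids>
  <species name="Lama glama">
    <common-name>Llama</common-name>
    <natural-history>
      <habitat>Llamas are found in deserts, mountainous areas, and grasslands.</habitat>
    </natural-history>
  </species>
  <species name="Lama guanicoe">
    <common-name>Guanaco</common-name>
    <natural-history>
      <habitat>Guanacos inhabit grasslands and shrublands.</habitat>
    </natural-history>
  </species>
</camelids>
"""

root = ET.fromstring(SAMPLE)

# Rough analogue of: ls //habitat
all_habitats = root.findall(".//habitat")

# Rough analogue of: ls //habitat[ancestor::species/@name='Lama guanicoe']
# (the ancestor test is rewritten as a predicate on the species step)
guanaco_habitats = root.findall(".//species[@name='Lama guanicoe']//habitat")

for habitat in guanaco_habitats:
    print(habitat.text.strip())
print(f"Found {len(guanaco_habitats)} node(s).")
```

The point of the comparison is how much ceremony the shell removes: the xsh version is a single line typed at a prompt.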
Similarly, we can change the command evaluation context within the current document
by
giving an XPath expression to the cd
command:
xsh cams:/> cd //species[@name='Camelus dromedarius']/natural-history
xsh cams:/camelids/species[2]/natural-history>
The location in our shell prompt changes to reflect the new context to which
we have navigated. Commands that are not explicitly passed an absolute location
path will now be evaluated in the context of the <natural-history> element
contained in the document's second <species> element (the one whose name
attribute is equal to "Camelus dromedarius"). So, if we give the ls command
with no path specified, we'll see the contents of the new context:
xsh cams:/camelids/species[2]/natural-history> ls
<natural-history>
  <food-habits>...</food-habits>
  <reproduction>...</reproduction>
  <behavior>...</behavior>
  <habitat>...</habitat>
</natural-history>
Found 1 node(s).
xsh cams:/camelids/species[2]/natural-history>
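The idea of a working context can be illustrated outside the shell as well. The sketch below, again in Python's standard xml.etree.ElementTree and again only an analogue, picks a context node and then evaluates relative paths against it. The sample document and its placeholder text are assumptions made for illustration; they are not the real contents of camelids.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for camelids.xml; the text content is placeholder data.
SAMPLE = """<camelids>
  <species name="Camelus bactrianus">
    <natural-history><habitat>(placeholder)</habitat></natural-history>
  </species>
  <species name="Camelus dromedarius">
    <natural-history>
      <food-habits>(placeholder)</food-habits>
      <habitat>(placeholder)</habitat>
    </natural-history>
  </species>
</camelids>"""

root = ET.fromstring(SAMPLE)

# Rough analogue of: cd //species[@name='Camelus dromedarius']/natural-history
context = root.find(".//species[@name='Camelus dromedarius']/natural-history")

# Relative paths are now resolved against the context node, not the root.
child_names = [child.tag for child in context]
relative_habitats = context.findall("habitat")

# A search that starts at the root still sees every habitat in the document.
absolute_habitats = root.findall(".//habitat")

print(child_names)
print(len(relative_habitats), len(absolute_habitats))
```

In xsh the context node is carried by the shell and shown in the prompt; in the sketch it is just a variable that we have to pass around ourselves.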
In addition, xsh
provides a way to execute commands on any currently open
document without changing the element context by prepending that document's ID and
a colon
to the XPath expression:
xsh cams:/camelids/species[2]/natural-history> cd /
xsh cams:/> open xmlnews=http://www.xml.com/xml/news.rss
parsing http://www.xml.com/xml/news.rss
done.
xsh xmlnews:/>
xsh xmlnews:/> ls cams:/camelids/species[3]/common-name
<common-name>Llama</common-name>
Found 1 node(s).
xsh xmlnews:/>
Notice that the context changed to the root of the newly opened RSS document once
it was
parsed into memory, but we still have easy access to the data contained in the camelids
document by adding that document's ID (cams
) and a colon to the front of the
path.
Also note that the location of the file passed to the open
command is not
limited to files on the local machine; it can also be an HTTP or FTP URL, so long
as a
well-formed XML document is returned.
To see a list of all the currently open documents and their associated IDs, use the
files
command:
xsh xmlnews:/> files
cams = files/camelids.xml
xmlnews = http://www.xml.com/xml/news.rss
xsh xmlnews:/>
Closing an open document is as easy as passing its ID to the close
command.
xsh xmlnews:/> close xmlnews
closing file http://www.xml.com/xml/news.rss
xsh :>
If we wanted to save a local copy of the xmlnews
document before closing, we
would use the saveas
command:
xsh xmlnews:/> saveas xmlnews files/xmldotcom_news.rss
xmlnews=new_document1.xml --> files/xmldotcom_news.rss (utf-8)
saved xmlnews=files/xmldotcom_news.rss as files/xmldotcom_news.rss in utf-8 encoding
xsh :>
We've now reviewed xsh
basics: we can start the shell, and we can open, close, and
navigate through the contents of XML documents. If this were all there was to xsh
, it
would still be a winner as an XPath testbed and teaching tool (making it quite useful
to
users of XSLT and XPathScript, as well as XML::LibXML
and the other Perl
modules which offer an XPath interface). But xsh
bills itself as an XML
editing shell, and as we will see, it's that and a fair bit more.
Creating and Editing XML Documents
We can begin by creating a new XML document:
xsh :> create mynews news-channels
This creates a new document with the ID mynews
with the top-level element
news-channels
and changes the context to the root of the new document. Let's
have a look:
xsh mynews:/> ls
<?xml version="1.0" encoding="utf-8"?>
<news-channels/>
xsh mynews:/>
So far, so good. Now let's add an element to the news-channels
element.
xsh mynews:/> cd news-channels
xsh mynews:/news-channels> add element channel into .
We use the add
command to add a channel
element into the current
context element, which is represented by a period character, and we can verify the
result by
listing the current context:
xsh mynews:/news-channels> ls
<news-channels><channel/></news-channels>
Found 1 node(s).
xsh mynews:/news-channels>
Note that the first argument for the add
command must be the type of
node being added (element
in this case).
Suppose we need to add a name
attribute to the new channel
element, as well as an rss-url
child element.
xsh mynews:/news-channels> add attribute "name='seepan uploads'" into ./channel[1]
xsh mynews:/news-channels> add element rss-url into ./channel[1]
Next, we'll add the URL of the CPAN RSS file as a text node of the rss-url
element:
xsh mynews:/news-channels> add text "http://search.cpan.org/recent.rdf" into ./channel[1]/rss-url
Let's add another channel
element:
xsh mynews:/news-channels> add element channel before //channel[1]
xsh mynews:/news-channels> add attribute "name='perl news'" into ./channel[1]
xsh mynews:/news-channels> add element rss-url into ./channel[1]
xsh mynews:/news-channels> add text "http://www.perl.com/pace/perlnews.rdf" into ./channel[1]/rss-url
We used the before
location expression as the third argument to the
add
command, specifying the first channel
element as the
evaluation context. This inserts the new channel into the list as the preceding sibling
of
the previously created channel.
Again, we can verify this by listing all the channels in the document:
xsh mynews:/news-channels> ls //channel
<channel name="perl news"><rss-url>http://www.perl.com/pace/perlnews.rdf</rss-url></channel>
<channel name="seepan uploads"><rss-url>http://search.cpan.org/recent.rdf</rss-url></channel>
Found 2 node(s).
xsh mynews:/news-channels>
Careful readers will have noticed the "seepan" typo -- we can fix this using map,
which applies a block of Perl code to nodes returned by the subsequent XPath
expression:
xsh mynews:/news-channels> map { $_ = 'cpan uploads' } //channel[2]/@name
Here's a view of the full contents of our new document, obtained by listing the document's root:
xsh mynews:/news-channels> ls /
<?xml version="1.0" encoding="utf-8"?>
<news-channels>
  <channel name="perl news">
    <rss-url>http://www.perl.com/pace/perlnews.rdf</rss-url>
  </channel>
  <channel name="cpan uploads">
    <rss-url>http://search.cpan.org/recent.rdf</rss-url>
  </channel>
</news-channels>
Found 1 node(s).
Our new document is a bit simplistic, to be sure. But our goal here is just to demonstrate
the basics of editing documents with xsh
. What we've learned so far can be
applied to the most complex XML documents.
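To make the sequence of edits easier to compare with ordinary DOM-style code, here is a rough sketch of the same steps in Python's standard xml.etree.ElementTree. It is an analogue written for illustration, not a description of how xsh works internally; each comment names the xsh command that the line loosely corresponds to.

```python
import xml.etree.ElementTree as ET

# Rough analogue of: create mynews news-channels
root = ET.Element("news-channels")

# add element channel into . / add attribute ... / add element rss-url / add text
cpan = ET.SubElement(root, "channel", {"name": "seepan uploads"})
ET.SubElement(cpan, "rss-url").text = "http://search.cpan.org/recent.rdf"

# add element channel before //channel[1]  (insert as the first child)
perl = ET.Element("channel", {"name": "perl news"})
ET.SubElement(perl, "rss-url").text = "http://www.perl.com/pace/perlnews.rdf"
root.insert(0, perl)

# map { $_ = 'cpan uploads' } //channel[2]/@name  (fix the typo in place)
root.findall("channel")[1].set("name", "cpan uploads")

print(ET.tostring(root, encoding="unicode"))
```

The explicit bookkeeping in the sketch (holding on to element objects, choosing an insert index) is what the into and before location keywords take care of in the shell.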
To finish up, let's save our new document to disk and quit the shell:
xsh mynews:/news-channels> saveas mynews files/perl_channels.xml
mynews=new_document2.xml --> files/perl_channels.xml (utf-8)
saved mynews=files/perl_channels.xml as files/perl_channels.xml in utf-8 encoding
xsh mynews:/news-channels>
xsh mynews:/news-channels> exit
[user@host user]$
xsh Scripting
No shell would be complete without the ability to perform automated or scripted tasks.
As a
final example, let's create an xsh
script, which uses the data contained in the
perl_channels.xml
document we just created, to fetch all the current Perl
news items from all the channels into a single XML document:
quiet;
open sources=files/perl_channels.xml;
create merge news-items;
$i = 0;
foreach sources://rss-url {
  $name = string(.);
  open $i=$name;
  xcopy $i://item into merge:/news-items;
  close $i;
  $i=$i+1;
};
close sources;
saveas merge files/headlines.xml;
close merge;
Looking closer at this script we see that it loads the perl_channels.xml
document, iterates over all of its <rss-url>
elements, fetches each
document from the Web using the open
command to grab the URL, and copies all of
each channel's <item>
elements into a new document. The new document is
then saved to disk as headlines.xml
before exiting.
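The control flow of the script may be easier to follow next to a conventional program. The sketch below mirrors the loop in Python's standard xml.etree.ElementTree. It is an analogue under stated assumptions: the feed URLs and feed contents are invented, and the hypothetical fetch() helper returns canned data instead of making HTTP requests so that the example runs offline. Real RSS 1.0 (RDF) feeds use XML namespaces, so matching their item elements would require namespace-qualified names.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed contents standing in for documents fetched over HTTP.
FEEDS = {
    "http://example.org/a.rdf": (
        "<rss><channel>"
        "<item><title>A1</title></item>"
        "<item><title>A2</title></item>"
        "</channel></rss>"
    ),
    "http://example.org/b.rdf": (
        "<rss><channel><item><title>B1</title></item></channel></rss>"
    ),
}

# Hypothetical stand-in for files/perl_channels.xml.
SOURCES = """<news-channels>
  <channel name="a"><rss-url>http://example.org/a.rdf</rss-url></channel>
  <channel name="b"><rss-url>http://example.org/b.rdf</rss-url></channel>
</news-channels>"""


def fetch(url):
    """Return the parsed feed for url (canned data here, no network access)."""
    return ET.fromstring(FEEDS[url])


def merge_items(sources_xml):
    """Gather every <item> from each listed feed under one <news-items> root."""
    merged = ET.Element("news-items")
    # Rough analogue of: foreach sources://rss-url { ... }
    for rss_url in ET.fromstring(sources_xml).iter("rss-url"):
        feed = fetch(rss_url.text.strip())
        # Rough analogue of: xcopy $i://item into merge:/news-items
        merged.extend(list(feed.iter("item")))
    return merged


merged = merge_items(SOURCES)
print(ET.tostring(merged, encoding="unicode"))
```

Writing the result to disk, the counterpart of saveas, would be one more call such as ET.ElementTree(merged).write(path).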
Starting to see why an XML editing shell isn't such a crazy idea? I know I am.
Going Further
I've offered a glimpse of the ease and power that xsh
provides, but there are
many more commands and features available. For example,
xslt doc1 some_stylesheet.xsl doc2
transforms the document with the ID doc1
using the XSLT stylesheet
some_stylesheet.xsl
and stores the result in a new document with the ID
doc2
.
Similarly, the command
xupdate myxupdate doc1
alters the content of the doc1
document using the rules contained in the
XUpdate document stored in myxupdate
.
For a complete list of commands, type help command
at the
xsh
prompt, or help commandname
for detailed usage
of a specific command.
Conclusions
I was initially skeptical about the notion of an "XML editing shell". At first glance,
it
seemed to me to be pushing the file path/XPath metaphor a bit too far; surely it's
little
more than a technical curiosity? But I was very wrong, and I don't mind admitting
it.
XML::XSH
is an astonishingly powerful tool that has quickly become a regular
part of my daily XML work. I highly recommend it.