Introducing XML::SAX::Machines, Part One
February 13, 2002
Introduction
In recent columns we have seen that SAX provides a modular way to generate and filter XML content. For those just learning how SAX works, though, the task of hooking up to the correct parser-generator-driver and building chains of filters can be tricky. More experienced SAX users may have a clearer picture of how to proceed, but they often find that initializing complex filter chains is tedious and lends itself to lots of duplicated code.
Consider the following simple filter chain script:
    use XML::SAX::ParserFactory;
    use XML::SAX::Writer;
    use My::SAXFilter::One;
    use My::SAXFilter::Two;
    use My::SAXFilter::Three;

    my $writer  = XML::SAX::Writer->new();
    my $filter3 = My::SAXFilter::Three->new( Handler => $writer );
    my $filter2 = My::SAXFilter::Two->new( Handler => $filter3 );
    my $filter1 = My::SAXFilter::One->new( Handler => $filter2 );
    my $parser  = XML::SAX::ParserFactory->parser( Handler => $filter1 );

    $parser->parse_uri( $xml_file );
Not too bad for this tiny example, perhaps, but imagine how it might look in a complex system with 10 or 15 filters all doing their part. Also, new SAX users often stumble over the fact that the handler chain must be built in reverse order ($filter3 has to be initialized before $filter2 so it can be passed in as the handler, for example). Yet another potential weakness in this script is that the filters in the chain are hard-coded from the start. While it is possible to make some aspects more flexible, adding the ability to have a dynamic list of filters only adds to the complexity of the script.
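To make the reverse-order point concrete, here is a minimal sketch of what a hand-rolled dynamic chain involves. It does not use real SAX filters: Demo::Filter and build_chain are hypothetical stand-ins invented for this illustration, and each "filter" only remembers its name and the handler it would forward to.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a SAX filter: it stores a name and the
# handler that events would be forwarded to.
package Demo::Filter;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

# Describe the chain from this filter down to the final handler.
sub describe {
    my $self = shift;
    my $next = $self->{Handler};
    return $self->{Name} unless ref($next) && $next->can('describe');
    return $self->{Name} . ' -> ' . $next->describe;
}

package main;

# Build a chain by hand from names given in processing order. Each
# filter needs its handler at construction time, so we walk the list
# backwards, starting from the final handler.
sub build_chain {
    my ( $final_handler, @names ) = @_;
    my $handler = $final_handler;
    for my $name ( reverse @names ) {
        $handler = Demo::Filter->new( Name => $name, Handler => $handler );
    }
    return $handler;
}

my $writer = Demo::Filter->new( Name => 'Writer' );
my $head   = build_chain( $writer, qw( One Two Three ) );
print $head->describe, "\n";    # One -> Two -> Three -> Writer
```

Even in this toy form, the caller has to remember that the list is consumed back to front, which is exactly the kind of bookkeeping we would rather not repeat in every script.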
Barrie Slaymaker's outstanding new XML::SAX::Machines addresses both the complexity and the tedium of creating SAX systems. Compare the following snippet to the one above.
    use XML::SAX::Machines qw( :all );

    my $machine = Pipeline(
        "My::SAXFilter::One",
        "My::SAXFilter::Two",
        "My::SAXFilter::Three",
        \*STDOUT
    );

    $machine->parse_uri( $xml_file );
Less verbose, more intuitive (note that the chain is declared in processing order) and, perhaps most importantly, making the filter chain dynamic is as simple as creating a list of strings containing module names:
my $machine = Pipeline( @filter_list, \*STDOUT );
Where @filter_list is built dynamically elsewhere in the application.
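How the list gets built is up to the application. As one possibility, the sketch below reads a comma-separated setting and turns it into a list of class names. The helper filter_list_from_string and the SAX_FILTERS environment variable are assumptions made up for this example, not part of XML::SAX::Machines.

```perl
use strict;
use warnings;

# Turn a setting such as "My::SAXFilter::One, My::SAXFilter::Two" into a
# list of class names, rejecting anything that does not look like a Perl
# package name.
sub filter_list_from_string {
    my ($setting) = @_;
    $setting = '' unless defined $setting;

    my @names;
    for my $piece ( split /,/, $setting ) {
        $piece =~ s/^\s+//;    # trim leading whitespace
        $piece =~ s/\s+$//;    # trim trailing whitespace
        next unless length $piece;
        die "Not a valid class name: '$piece'\n"
            unless $piece =~ /\A[A-Za-z_]\w*(?:::\w+)*\z/;
        push @names, $piece;
    }
    return @names;
}

# SAX_FILTERS is a hypothetical setting; fall back to a fixed list.
my $setting = defined $ENV{SAX_FILTERS}
    ? $ENV{SAX_FILTERS}
    : 'My::SAXFilter::One, My::SAXFilter::Two';

my @filter_list = filter_list_from_string($setting);
print "$_\n" for @filter_list;
```

The resulting @filter_list can then be handed to Pipeline exactly as in the one-line example above.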
The story does not end there, however. XML::SAX::Machines and its associated Machine classes provide a small host of options for building easy-to-maintain SAX-based XML processing systems. Over the next two months we will be looking at this inventive distribution, beginning with this month's introduction.
Machine Types
XML::SAX::Machines is a high-level wrapper class that allows its various Machine classes (which may also be used as standalone libraries) to be easily chained together to create complex SAX filtering systems. XML::SAX::Machines currently installs and knows about several Machines by default.
Pipeline
Implemented by XML::SAX::Pipeline, a Pipeline provides a way to set up a linear series of filters (or other Machines) that works like the traditional hand-rolled SAX filter chain that we looked at in the introduction. That is, the events fired go directly to the next filter or handler on the chain with no intervention.
    my $machine = Pipeline(
        "My::SAXFilter::One",
        "My::SAXFilter::Two",
        "My::SAXFilter::Three",
        \*STDOUT
    );
In this example, the three filter classes are fired in linear order with the results of My::SAXFilter::One being sent to My::SAXFilter::Two and so on.
Manifold
Manifold Machines provide a way to create multi-pass filters. The events are cached at the beginning of the Manifold's run, and duplicate copies of that event stream are sent through the filters one by one and recompiled into a single document upon completion. It is implemented by XML::SAX::Manifold.
    my $machine = Pipeline(
        Manifold(
            "My::SAXFilter::A",
            "My::SAXFilter::B",
            "My::SAXFilter::C",
        ),
        \*STDOUT
    );
Here, events fired during parsing are buffered and sent directly to each of the three filters (in order) and the output of each of the filters is merged into a single stream before being handed off to the Writer class.
Tap
Implemented by XML::SAX::Tap, a Tap offers a way to insert a class that examines one or more SAX events, but in no way alters the data passed to the next filter or handler. This can be extremely useful for cases where you want to examine the result of a given filter or other Machine part for debugging purposes. The handler that you use for your Tap need not forward the events as a typical filter would, since the same events will also be sent to the next handler in the chain as if the Tap did not exist. For example:
    my $machine = Pipeline(
        "My::SAXFilter::One",
        "My::SAXFilter::Two",
        Tap( "My::SAXDumper" ),
        "My::SAXFilter::Three",
        \*STDOUT
    );
In this case, we have taken the Pipeline from above and added a Tap to send events fired by My::SAXFilter::Two to our SAXDumper for debugging.
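The My::SAXDumper class itself is not shown in this column. As a rough idea of what such a handler could look like, here is a minimal sketch that prints an indented outline of the elements it sees. It is a plain handler that forwards nothing, in line with the description above; the Out option and the output format are our own assumptions for illustration, and we have not verified how the Tap constructs the class it is given.

```perl
package My::SAXDumper;
use strict;
use warnings;

# Hypothetical debugging handler. "Out" is an option invented for this
# sketch; it defaults to STDERR so the dump stays out of the document.
sub new {
    my ( $class, %args ) = @_;
    my $self = { Out => $args{Out} || \*STDERR, Depth => 0 };
    return bless $self, $class;
}

# Print the opening tag name, indented by the current nesting depth.
sub start_element {
    my ( $self, $el ) = @_;
    my $indent = '  ' x $self->{Depth};
    $self->{Depth}++;
    print { $self->{Out} } "$indent<$el->{Name}>\n";
}

# Step back one level and print the closing tag name.
sub end_element {
    my ( $self, $el ) = @_;
    $self->{Depth}--;
    my $indent = '  ' x $self->{Depth};
    print { $self->{Out} } "$indent</$el->{Name}>\n";
}

1;
```

Because the handler only implements start_element and end_element, every other event simply passes it by, which is all we need for a quick look at the element structure coming out of a filter.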
ByRecord
ByRecord carves up record-oriented XML documents and sends each record through each filter in the ByRecord machine as a separate event stream delimited by start_document and end_document events. All other events (data outside of the records) are forwarded appropriately to the downstream filter or handler. It is implemented by XML::SAX::ByRecord.
    my $machine = Pipeline(
        ByRecord(
            "My::RecordFilter::One",
            "My::RecordFilter::Two",
        ),
        "My::SAXFilter::One",
        "My::SAXFilter::Two",
        "My::SAXFilter::Three",
        \*STDOUT
    );
In this case, we have taken the Pipeline from above and added a ByRecord Machine to process the record-oriented parts of the document before beginning the rest of the Pipeline chain.
Now that we have an idea of the various Machines that are currently available, let's get straight to this month's code example.
Example -- Adding Custom Tag Libraries to XHTML
One of the more interesting ideas to emerge in the Web development world in recent years is the notion of custom tag libraries (or taglibs, for short). In a taglib implementation one or more custom tags are defined and the server application evaluates and expands or replaces those tags with the result of running some chunk of code on the server. This allows document authors to add reusable bits of server-side functionality to their pages without the hair loss associated with embedding code in the documents.
For this month's example we will write a mod_perl handler that allows us to create our own custom taglibs. We will do this by creating SAX filters that transform the various tags in our library into the desired results. And we'll use XML::SAX::Machines within our Apache handler to manage the filter chain.
First, we need to define our taglib. To keep the example simple we start off with only two tags: an <include> tag that provides a way to insert the contents of an external document defined by the uri attribute at the location of the tag, and a <fortune> tag that inserts a random quote.
To avoid possible collision with the elements allowed in the documents that will contain the tags from our taglib, we will quarantine them in their own XML namespace and bind that namespace to the prefix "widget".
Here is an example of a simple XHTML document containing our custom tags:

    <?xml version="1.0"?>
    <html xmlns:widget="http://localhost/saxpages/widget">
      <head>
        <title>My Cool Taglib-Enabled Page</title>
      </head>
      <body>
        <widget:include uri="/path/to/widgets/common_header.xml"/>
        <p> Today quote is: </p>
        <pre><widget:fortune/></pre>
        <p> Thanks for stopping by. </p>
        <widget:include uri="/path/to/widgets/common_footer.xml"/>
      </body>
    </html>
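Now let's create our SAX filters to expand our custom tags. We'll write the filter that includes an external XML document first.

    package Widget::Include;
    use strict;
    use XML::SAX::Base;             # load the base class before inheriting from it
    use XML::SAX::ParserFactory;    # used below to parse the included document
    use vars qw(@ISA $WidgetURI);
    @ISA = qw(XML::SAX::Base);
    $WidgetURI = 'http://localhost/saxpages/widget';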
After a bit of initialization we get straight to the SAX event handlers. In the start_element handler we examine the current element's NamespaceURI and LocalName properties to see if we have an "include" element in our widgets namespace. If we find one, we further check for a uri attribute and, if it is present, pass that URI on to a new parser instance that uses the current filter as its handler.
    sub start_element {
        my ( $self, $el ) = @_;
        if ( $el->{NamespaceURI} eq $WidgetURI && $el->{LocalName} eq 'include' ) {
            if ( defined $el->{Attributes}->{'{}uri'} ) {
                my $uri = $el->{Attributes}->{'{}uri'}->{Value};
                # Parse the included document, feeding its events through this filter.
                my $parser = XML::SAX::ParserFactory->parser( Handler => $self );
                $parser->parse_uri( $uri );
            }
        }
If we did not get an element with the right name in the right namespace we forward the event to the next filter in the chain.
        else {
            $self->SUPER::start_element( $el );
        }
    }
We do a similar test in the end_element event handler, forwarding the events that we are not interested in.
    sub end_element {
        my ( $self, $el ) = @_;
        $self->SUPER::end_element( $el )
            unless $el->{NamespaceURI} eq $WidgetURI
                and $el->{LocalName} eq 'include';
    }
That's it. Since this filter inherits from XML::SAX::Base we need only implement the event handlers that are required for the task at hand. All other events will be safely forwarded to the next filter/handler.
The filter that implements the <widget:fortune> tag is very similar. We check to see if the current element is named "fortune" and is bound to the correct namespace. If so, we replace the element with the text returned from a system call to the fortune program. If not, the events are forwarded to the next filter.
    package Widget::Fortune;
    use strict;
    use XML::SAX::Base;    # load the base class before inheriting from it
    use vars qw(@ISA $WidgetURI);
    @ISA = qw(XML::SAX::Base);
    $WidgetURI = 'http://localhost/saxpages/widget';

    sub start_element {
        my ( $self, $el ) = @_;
        if ( $el->{NamespaceURI} eq $WidgetURI && $el->{LocalName} eq 'fortune' ) {
            # Replace the tag with the output of the fortune program.
            my $fortune = `/usr/games/fortune`;
            $self->SUPER::characters( { Data => $fortune } );
        }
        else {
            $self->SUPER::start_element( $el );
        }
    }

    sub end_element {
        my ( $self, $el ) = @_;
        $self->SUPER::end_element( $el )
            unless $el->{NamespaceURI} eq $WidgetURI
                and $el->{LocalName} eq 'fortune';
    }
With the filters out of the way we turn to the Apache handler that will make our filters work as expected for the files on our server. The basic Apache handler module that makes our taglibs work is astonishingly small considering what it provides. We simply load XML::SAX::Machines; then, inside the required handler subroutine, we create a Pipeline machine, passing in the names of the widget filter classes we just created. Then we send the required HTTP headers and call parse_uri on the file being requested by the client.
    package SAXWeb::MachinePages;
    use strict;
    use Apache::Constants qw( OK );
    use XML::SAX::Machines qw( :all );

    sub handler {
        my $r = shift;
        my $machine = Pipeline(
            "Widget::Include" => "Widget::Fortune" => \*STDOUT
        );
        $r->content_type('text/html');
        $r->send_http_header;
        $machine->parse_uri( $r->filename );
        return OK;    # tell Apache the request was handled
    }

    1;
Finally, we need to upload the XML documents to the server and add a small bit to one of our Apache configuration files so our handler is called appropriately. I used:

    <Directory /www/sites/myhostdocroot>
      <FilesMatch "\.(xml|xhtml)">
        SetHandler perl-script
        PerlHandler SAXWeb::MachinePages
      </FilesMatch>
    </Directory>
After restarting Apache, a request for the XML document we created earlier will return something like the following:

    <html xmlns:widget='http://localhost/saxpages/widget'>
      <head>
        <title>My Cool Taglib-Enabled Page</title>
      </head>
      <body>
        <div class='header'>
          <h2>MySite.tld</h2>
          <hr />
        </div>
        <p> Today quote is: </p>
        <pre>The faster we go, the rounder we get. -- The Grateful Dead </pre>
        <p> Thanks for stopping by. </p>
        <div class='footer'>
          <hr />
          <p>Copyright 2000 MySite.tld, Ltd. All rights reserved.</p>
        </div>
      </body>
    </html>
No Webby awards here, to be sure, but the basic foundation is sound, and implementing new tags for our tag library is a matter of creating new SAX filter classes and adding them to the Pipeline in the Apache handler.
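As a sketch of what adding a tag might involve, consider a hypothetical <widget:date> tag that expands to the server's current time. Neither the tag nor the Widget::Date class below is part of the example above; both are assumptions introduced here, modeled closely on the Widget::Fortune filter.

```perl
package Widget::Date;
use strict;
use warnings;
use XML::SAX::Base;
use vars qw(@ISA $WidgetURI);
@ISA       = qw(XML::SAX::Base);
$WidgetURI = 'http://localhost/saxpages/widget';

# True when the element is <widget:date> in our taglib namespace.
sub _is_date_tag {
    my ($el) = @_;
    return defined $el->{NamespaceURI}
        && $el->{NamespaceURI} eq $WidgetURI
        && $el->{LocalName} eq 'date';
}

sub start_element {
    my ( $self, $el ) = @_;
    if ( _is_date_tag($el) ) {
        # Replace the tag with character data holding the current time.
        $self->SUPER::characters( { Data => scalar localtime() } );
    }
    else {
        $self->SUPER::start_element($el);
    }
}

sub end_element {
    my ( $self, $el ) = @_;
    # Swallow the closing tag of our own element, forward everything else.
    $self->SUPER::end_element($el) unless _is_date_tag($el);
}

1;
```

With the class in place, the only change needed in the Apache handler would be to name it in the chain, for instance by placing "Widget::Date" between "Widget::Fortune" and \*STDOUT in the Pipeline call.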
Conclusions
We've only scratched the surface of what XML::SAX::Machines can do. Tune in next month when we will delve deeper into the API and show off some of its advanced features.