Introducing XML::SAX::Machines, Part Two
March 20, 2002
Introduction
In last month's column we began our introduction to XML::SAX::Machines, a group of modules which greatly simplifies the creation of complex SAX applications with multiple filters. This month we pick up where we left off by further illustrating how XML::SAX::Machines can be used to remove most of the drudgery of building SAX-based XML processing applications. If you have not read last month's offering, please do so now.
Example One - MachinePages Revisited
In last month's column we created a simple mod_perl handler that uses XML::SAX::Machines to allow developers and HTML authors to use custom tag libraries in their HTML documents. This example was fine, as far as it went, but it can be made a lot more robust and flexible with very little effort. For example, the list of SAX filters in the previous example was hard-coded into the handler script itself. One of the best features of the interface that XML::SAX::Machines provides is that the filter chains and other machine definitions can be built dynamically at run-time using simple Perl arrays.
For our first example this month we will extend the previous MachinePages handler to capitalize on XML::SAX::Machines' dynamic abilities by allowing the SAX filters applied to a given document to be passed in through Apache's configuration API. In addition, we will give developers the option to apply one or more XSLT stylesheets to the filtered SAX event streams, again allowing the choice of stylesheets to be made via a configuration directive.
    package SAXWeb::MachinePages;
    use strict;
    use Apache::Constants;
    use XML::SAX::Machines qw( :all );
    use XML::Filter::XSLT;
    use XML::SAX::Writer;

    sub handler {
        my $r = shift;
        my @filters;
With the basic initialization out of the way, we can begin reading in the list of filters that are to be applied to the given request. We do this by calling the dir_config method on the Apache request object, splitting the string containing a custom MachineFilters option (if one exists) into an array, and pushing that array onto our top-level list of SAX filters.
        if ( defined( $r->dir_config('MachineFilters') ) ) {
            my @widgets = split /\s+/, $r->dir_config('MachineFilters');
            push @filters, @widgets;
        }
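As a small aside on the splitting step, here is a minimal sketch in plain Perl, with no Apache required. The helper name parse_option_list is hypothetical and not part of the handler. It uses split ' ' rather than split /\s+/ because the latter returns an empty leading field when the option value happens to start with whitespace, and that empty string would otherwise end up in the filter list.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a whitespace-separated option value
# (such as the MachineFilters string) into a list of names.
sub parse_option_list {
    my ($value) = @_;
    return () unless defined $value;

    # split ' ' (a literal single-space string) skips leading whitespace,
    # so no empty first element is produced.
    return split ' ', $value;
}

my @filters = parse_option_list("  My::FilterOne My::FilterTwo ");
print join( ',', @filters ), "\n";    # prints "My::FilterOne,My::FilterTwo"
```

Inside the handler, the same helper could be fed the value of $r->dir_config('MachineFilters') directly, since it already copes with an undefined option.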
Next, we use dir_config to check for a MachineStyles option and, if present, we process that option and use the resulting filenames to configure a chain of XML::Filter::XSLT instances. As above, we append those instances onto the top-level list of filters to be applied to the event stream.
        if ( defined( $r->dir_config('MachineStyles') ) ) {
            my @stylesheets = split /\s+/, $r->dir_config('MachineStyles');
            foreach my $stylesheet ( @stylesheets ) {
                my $xsl_filter = XML::Filter::XSLT->new();
                $xsl_filter->set_stylesheet_uri( $stylesheet );
                push @filters, $xsl_filter;
            }
        }
Note the difference between this block and the previous one. In the MachineFilters block we simply added strings containing the class names of the SAX filters to the list of filters, while here we have created instances of the XML::Filter::XSLT class and pushed those blessed objects onto the list. XML::SAX::Machines invisibly copes with both cases by autoloading and creating new instances of those filter classes passed in as plain strings, while working in the predictable way for those filters which are passed as blessed references.
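To make the "class name or blessed object" idea concrete, here is a minimal sketch of how such a resolver might look in plain Perl. It is not the actual XML::SAX::Machines implementation; the helper resolve_filter and the toy class My::ToyFilter are assumptions made up for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw( blessed );

# Toy class standing in for a real SAX filter (hypothetical).
package My::ToyFilter;
sub new { my $class = shift; return bless {@_}, $class }

package main;

# Hypothetical helper: accept either a class name or an existing object.
sub resolve_filter {
    my ($spec) = @_;

    # Blessed references are used as they are.
    return $spec if blessed $spec;

    # Plain strings are treated as class names: load the class from disk
    # unless it is already defined in this process, then instantiate it.
    unless ( $spec->can('new') ) {
        ( my $file = "$spec.pm" ) =~ s{::}{/}g;
        require $file;
    }
    return $spec->new;
}

my @filters = map { resolve_filter($_) }
    ( 'My::ToyFilter', My::ToyFilter->new( id => 2 ) );
print ref($_), "\n" for @filters;
```

Both entries come out as objects, which is the property that lets a single list mix the two styles.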
Moving on, we create a new XML::SAX::Writer object and set its output stream to point at a plain scalar variable inventively named $output.
        my $output = '';
        my $writer = XML::SAX::Writer->new( Output => \$output );
Next, we create a Pipeline machine, which gives us a linear chain of SAX filters, passing in the list of filters we have collected and setting the instance of XML::SAX::Writer as the final handler.
        my $machine = Pipeline( @filters, $writer );
We then call the machine's parse_uri method, passing in the file name of the document that the client requested.
        $machine->parse_uri( $r->filename );
Note that we did not create an instance of a SAX parser class but, rather, called parse_uri on the Pipeline object. Again, XML::SAX::Machines "does what you mean" in this case by creating an instance of an XML::SAX parser behind the scenes.
To finish off our Apache handler, we have to set the appropriate HTTP headers and send the result of the SAX process to the client.
        $r->content_type('text/html');
        $r->send_http_header;
        $r->print( $output );
        return OK;
    }

    1;
Our new MachinePages handler now allows for fine-grained control over which filters are applied to which documents using options from the server-wide httpd.conf file or .htaccess files.
    PerlSetVar MachineFilters "My::FilterOne My::FilterTwo"
    PerlSetVar MachineStyles "/www/htdocs/stylesheets/default.xsl ..."
These options can be used as-is or wrapped in <Directory>, <Files>, or <FilesMatch> containers, or any of the other common Apache configuration control blocks, for greater control.
Example Two - Creating A Smart SAX Controller
In complex applications where producing XML is only one part of the overall functionality provided, it is often wise to keep the XML processing facilities as separate from the core application as possible. One way to achieve this is to create an abstract "controller" class that handles the gory details, allowing the core application to call a few simple methods to achieve complex results. XML::SAX::Machines is especially well-suited for creating these simple but powerful abstract controllers. For our second and final example we will build a class that implements this pattern.
Consider the following illustration:
We see from this diagram that the SAX controller class is responsible for establishing the SAX processing chain from end to end, while providing a simple one-stop interface to the rest of the application. The application simply calls one or two methods in the controller class to obtain the result it expects.
Typical SAX Controller Program Flow
- The developer initializes a new Controller object inside the core application, passing in instances of the desired Generator, Handler, and any desired Filters.
- The developer calls the Controller's parse() method, passing in whatever data the Generator needs to initialize the SAX event stream.
- The Controller passes that data along to the Generator via the prepare() method (if it implements one).
- If implemented, the Generator's prepare() method examines or alters the data passed and returns the (possibly altered) data back to the Controller. During this examination, the Generator has a chance to see what it is about to parse and has the opportunity to set additional Filters.
- The Controller calls the Generator's get_filters() method (if implemented) to retrieve any additional Filters.
- The Controller initializes the SAX filter chain (via XML::SAX::Machines), setting the passed Handler as the Machine's final SAX Handler, and the Machine itself as the event handler for the passed Generator.
- The Controller calls the Generator's parse() method, passing in the altered data returned from the prepare() method.
- The Generator begins the SAX event stream, firing the events (start_document, start_element, end_document, etc.) at the first Filter in the chain, if any.
- The final Filter (or Generator, if no Filters were added to the chain) fires the SAX events at the Handler. The Handler does something with the data passed through the event methods (builds a DOM tree, writes an XML document to the file system or browser, etc.).
- The result of the parse is returned to the application.
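The flow above can be walked through with a toy sketch in plain Perl. Everything here is a stand-in invented for illustration: the Toy::* classes, the run_flow function, and the simplified events (real SAX events carry hash references, and XML::SAX::Machines would build the chain for us instead of the hand-written loop). The method names prepare, get_filters, set_handler, and parse follow the controller code in this article.

```perl
use strict;
use warnings;

# Toy Generator: inspects its input, asks for a filter, fires events.
package Toy::Generator;
sub new         { return bless { filters => [] }, shift }
sub set_handler { $_[0]{handler} = $_[1] }
sub prepare {
    my ( $self, $data ) = @_;
    push @{ $self->{filters} }, 'Toy::UpperCase';   # request an extra filter
    $data =~ s/^\s+|\s+$//g;                        # alter the data (trim it)
    return $data;
}
sub get_filters { return @{ $_[0]{filters} } }
sub parse {
    my ( $self, $data ) = @_;
    my $h = $self->{handler};
    $h->start_document;
    $h->characters($data);
    return $h->end_document;
}

# Toy Filter: upper-cases character data and forwards every event.
package Toy::UpperCase;
sub new            { return bless {}, shift }
sub set_handler    { $_[0]{handler} = $_[1] }
sub start_document { $_[0]{handler}->start_document }
sub characters     { $_[0]{handler}->characters( uc $_[1] ) }
sub end_document   { return $_[0]{handler}->end_document }

# Toy Handler: collects text and returns it as the parse result.
package Toy::Collector;
sub new            { return bless { text => '' }, shift }
sub start_document { $_[0]{text} = '' }
sub characters     { $_[0]{text} .= $_[1] }
sub end_document   { return $_[0]{text} }

package main;

sub run_flow {
    my ( $generator, $handler, $input, @filters ) = @_;

    # Let the Generator see (and possibly alter) what it will parse.
    my $data = $generator->can('prepare') ? $generator->prepare($input) : $input;

    # Collect any filters the Generator asked for.
    push @filters, $generator->get_filters if $generator->can('get_filters');

    # Wire the chain back to front: each stage's handler is the next stage.
    my $next = $handler;
    for my $class ( reverse @filters ) {
        my $filter = $class->new;
        $filter->set_handler($next);
        $next = $filter;
    }
    $generator->set_handler($next);

    # Start the event stream and hand the result back to the caller.
    return $generator->parse($data);
}

print run_flow( Toy::Generator->new, Toy::Collector->new, "  hello sax  " ), "\n";
# prints "HELLO SAX"
```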
Let's get down to business and create the actual controller.
    package MyController;
    use strict;
    use vars qw( $DefaultSAXHandler $DefaultSAXGenerator );
    use XML::LibXML;
    use XML::SAX::Machines qw( :all );

    $DefaultSAXHandler   ||= 'StringWriter';
    $DefaultSAXGenerator ||= 'XML::SAX::ParserFactory';
After a bit of initialization, we create the constructor for our controller. Borrowing from XML::SAX::Machines' DWIM nature, we will provide default classes for the Generator and Handler options, allowing developers to pass these in either as simple class names or blessed instances.
    sub new {
        my $class = shift;
        my %args  = @_;
        my $self;

        # Handler: accept an instance, a class name, or fall back to the default.
        if ( defined $args{Handler} ) {
            if ( ! ref( $args{Handler} ) ) {
                my $handler_class = $args{Handler};
                eval "require $handler_class";
                $args{Handler} = $handler_class->new();
            }
        }
        else {
            eval "require $DefaultSAXHandler";
            $args{Handler} = $DefaultSAXHandler->new();
        }

        # Generator: same rules as the Handler.
        if ( defined $args{Generator} ) {
            if ( ! ref( $args{Generator} ) ) {
                my $driver_class = $args{Generator};
                eval "use $driver_class";
                $args{Generator} = $driver_class->new();
            }
        }
        else {
            eval "use $DefaultSAXGenerator";
            $args{Generator} = $DefaultSAXGenerator->new();
        }

        $args{FilterList} ||= [];
        $self = bless \%args, $class;
        return $self;
    }
Next, we get to the meat of the controller class, the parse() method. In addition to allowing developers to pass SAX filters in during initialization, we will go a step further by allowing the SAX generator to set additional filters by giving it a chance to see what it's about to parse (via an optional prepare() method).
Also, we will send the result of the generator's prepare() method as the sole argument to its parse() method. This idea is especially interesting given the fact that SAX event streams can be generated from more than just XML documents. So, for example, we could easily write a custom SAX generator that subclasses XML::Generator::DBI and implements a prepare() method that maps URLs to specific SQL queries. In that case, prepare() would return the SQL select statement rather than the URL passed in from the core application.
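As a rough sketch of that URL-to-SQL idea, the prepare() method might look something like the following. The package name My::QueryGenerator, the URL paths, and the table and column names are all hypothetical. To keep the sketch self-contained it does not actually subclass XML::Generator::DBI; a real generator would do so and pass the returned statement on to its own parsing code.

```perl
use strict;
use warnings;

package My::QueryGenerator;    # hypothetical generator class

# Hypothetical lookup table mapping request URLs to SQL statements.
my %query_for = (
    '/reports/customers' => 'SELECT id, name FROM customers',
    '/reports/orders'    => 'SELECT id, total FROM orders',
);

sub new { return bless {}, shift }

# Called by the controller before parse(): swap the URL for its query,
# and refuse URLs that have no mapping.
sub prepare {
    my ( $self, $url ) = @_;
    my $sql = $query_for{$url}
        or die "No query mapped for '$url'\n";
    return $sql;
}

package main;

my $generator = My::QueryGenerator->new;
print $generator->prepare('/reports/orders'), "\n";
# prints "SELECT id, total FROM orders"
```

Keeping the mapping in a fixed table, rather than building SQL from the URL, means the core application never hands raw request data to the database layer.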
    sub parse {
        my $self         = shift;
        my $to_be_parsed = shift;

        # filters passed to the object from the application side.
        my @filterlist = @{ $self->{FilterList} || [] };

        # give the Generator a peek at what it's about to parse and alter it, if needed.
        if ( $self->{Generator}->can('prepare') ) {
            $to_be_parsed = $self->{Generator}->prepare( $to_be_parsed );
        }

        # allow filters to be passed from the generator
        # (could be hard-coded, or filters set during prepare()).
        if ( $self->{Generator}->can('get_filters') ) {
            push @filterlist, $self->{Generator}->get_filters;
        }

        # build the filter machine, setting the last stage to the passed Handler
        my $machine = Pipeline( @filterlist, $self->{Handler} );

        # set the generator to fire its events at the pipeline
        $self->{Generator}->set_handler( $machine );

        # get the result and return it to the app.
        my $parse_result = $self->{Generator}->parse( $to_be_parsed );
        return $parse_result;
    }
To keep things flexible, we will also provide a few simple configuration methods for setting up the SAX controller.
    sub set_handler {
        my $self    = shift;
        my $handler = shift;
        if ( defined( $handler ) ) {
            if ( ! ref( $handler ) ) {
                my $handler_class = $handler;
                eval "use $handler_class";
                $self->{Handler} = $handler_class->new();
            }
            else {
                $self->{Handler} = $handler;
            }
        }
    }

    sub set_generator {
        my $self      = shift;
        my $generator = shift;
        if ( defined( $generator ) ) {
            if ( ! ref( $generator ) ) {
                my $generator_class = $generator;
                eval "use $generator_class";
                $self->{Generator} = $generator_class->new();
            }
            else {
                $self->{Generator} = $generator;
            }
        }
    }

    sub set_filterlist {
        my $self    = shift;
        my @filters = @_;
        $self->{FilterList} = \@filters;
    }

    1;
That's it, we're done with our controller. Here are a few examples of how it may be called from the core application.
For those who like methods --
    my $sax_controller = MyController->new();
    $sax_controller->set_generator( 'Some::Generator' );
    $sax_controller->set_handler( $my_blessed_instance );
    $sax_controller->set_filterlist( 'XML::Filter::Foo', 'XML::Filter::Bar' );
    my $result = $sax_controller->parse( $something );
And the same for those who like constructor arguments instead --
    my $sax_controller = MyController->new(
        Generator  => 'Some::Generator',
        Handler    => $my_blessed_instance,
        FilterList => \@list_of_filter_names,
    );
    my $result = $sax_controller->parse( $something );
As with the MachinePages example above, XML::SAX::Machines adds significant value to our application by making the filter chain both easy to configure and trivial to create dynamically.
Conclusions
XML::SAX::Machines makes the task of creating complex SAX-based applications extremely simple and straightforward, while providing a level of flexibility that would be painful at best to duplicate by hand. It offers a modern, easy-to-use interface that, like all Perlish things, makes easy things easy and makes hard things... well, not just possible, but easy, too. If you are considering SAX as the API of choice for your XML processing applications, XML::SAX::Machines should be at the top of your evaluation list.