AxKit: XML Web Publishing with Apache and mod_perl
May 24, 2000
Introduction
One of XML's major benefits to web developers is that it is a standard way to separate data from presentation, and to create a consistent templating system for a web site. That promise has yet to be fully realized by many, due to the immature state of XML tool support, especially in authoring.
An important part of using XML for web publishing is content delivery. Although XML-to-HTML conversion is partially possible in browsers such as Internet Explorer, the reality is that HTML or XHTML will be served from web servers for a long time to come. This means server-side XML transformation is the most viable option for publishing with XML today.
Server side transformations can be handled at various levels. The most basic of these is static transformations (e.g., using an XSLT processor and some shell scripts), but this method can quickly become awkward, and is not satisfactory for dynamic web sites.
Another option is application server environments such as Zope or Enhydra. If you have a real need to use these products, they are a good choice. But keep in mind that they have a tendency to operate within their own enclosed universe.
A third choice is to use an XML content delivery infrastructure such as that provided by the Apache Cocoon project. Cocoon is a Java-based environment for pipelined transformation of XML, resulting in web pages served to the user. It also offers more advanced features, including support for active server pages.
AxKit, a mod_perl and Apache-based XML content delivery solution, takes an approach similar to Cocoon. It provides simple ways for web developers to deliver XML utilizing multiple processing stages and style sheets, all programmable through Perl. AxKit takes care of caching so that the developer doesn't have to worry about it. It's also tightly bound to the Apache web server, providing a good route forward for those with an existing investment in mod_perl and Apache.
The fundamental way in which XML is delivered to a client in AxKit is through transformation with one or more style sheets. AxKit does not see style sheets solely in terms of XSLT transformations, but as more generic processing stages allowing arbitrary languages and operations.
In this article, I will describe AxKit's architecture, and give details of its installation and future development. Some familiarity with transforming XML would be helpful in reading this article.
Overview
AxKit is based on a plugin architecture. This allows the developer to quickly design modules based on currently available technology to create:
- new style sheet (transformation) languages;
- new methods for delivering alternate style sheets;
- new methods for determining media types.
Because AxKit is built in Perl, these plugins are simple to develop. Not long after releasing AxKit, a developer wrote a suffix-based style sheet chooser module (which returns different style sheets if the user requests file.xml.html or file.xml.text) in just 15 lines of code!
The plugin architecture also makes developing new style sheet modules easy, using some of the readily available code in Perl's excellent CPAN (the Comprehensive Perl Archive Network). A style sheet module to deliver XML-News files as HTML would only take a few lines of code based on David Megginson's XMLNews::HTMLTemplate module, and AxKit works out all the nuances of caching for you.
AxKit comes with a number of pre-built style sheet modules, including two XSLT modules: one built around Perl's XML::XSLT module, a DOM based XSLT implementation that is in the beginning stages, and one built around Ginger Alliance Ltd's Sablotron XSLT library, which is a much more complete and fast XSLT implementation built in C++.
For the closet XSLT haters out there, there's XPathScript -- a language of my invention that takes some of the good features of XSLT, such as node matching and finding using XPath, and combines them with the power of ASP-like code integration and inline Perl scripting. XPathScript also compiles your style sheet into native Perl code whenever it changes, so execution times are very good for XML style sheet processing.
The core of AxKit delivers good performance. Serving cached results, it runs at about 80% of the speed of Apache. It achieves this primarily because it's built in mod_perl. The tight coupling with Apache that mod_perl provides means that a lot of the code is running in compiled C. In order to deliver cached results, AxKit just tells Apache where to find the cached file, and that it doesn't want to handle it. Apache then serves the page with its usual efficiency.
Finally, AxKit works hand-in-hand with Apache. So any webmaster skills you might have in Apache administration won't go to waste. AxKit integrates directly with Apache's <Files>, <Location> and <Directory> directives. All AxKit's configuration takes this approach, so you won't have to teach a webmaster new tricks to build up your XML site.
How AxKit Works
AxKit registers two "handlers" with Apache in order to do its work. In Apache terms, these are modules that work in various parts of the request phase (which covers things like Authentication, Type checking, Response, and Logging). When a request for a file comes in, AxKit does some quick checking to see if the file is XML. The main checks performed are to see if the file extension is .xml, and/or to check the first few characters of the file for the <?xml?> declaration. If the file is not XML, AxKit lets Apache deliver the file as it would normally. Note that it's possible to only apply AxKit to certain parts of your web site.
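The two checks described above can be sketched in a few lines. This is only an illustration in Python, not AxKit's Perl code; the function name looks_like_xml and the decision to skip a UTF-8 byte order mark are assumptions made for the example.

```python
import os

# The XML declaration is "<?xml" followed by whitespace; requiring the
# whitespace avoids mistaking "<?xml-stylesheet" for a declaration.
_XML_WHITESPACE = (b" ", b"\t", b"\r", b"\n")

def looks_like_xml(filename: str, head: bytes) -> bool:
    """Guess whether a file is XML from its name or its first few bytes.

    `head` is the start of the file's content (a few dozen bytes is enough).
    """
    # Check 1: the file extension.
    if os.path.splitext(filename)[1].lower() == ".xml":
        return True
    # Check 2: the first few characters (assumption: tolerate a UTF-8 BOM).
    if head.startswith(b"\xef\xbb\xbf"):
        head = head[3:]
    return head[:5] == b"<?xml" and head[5:6] in _XML_WHITESPACE
```

Anything that fails both checks would simply be left for the web server to deliver as usual.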
When an XML file is detected, the next step is to call plugin modules to determine the media type and/or style sheet preference. Media type chooser plugins normally look at the User-Agent header, or possibly at the Accept header. However, it's possible to define any method at all to determine the media type.
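As a rough illustration of a media type chooser, the following Python sketch inspects the two headers mentioned above. The matching rules are purely hypothetical (they are not taken from any AxKit plugin); the returned names are the usual style sheet media descriptors.

```python
def choose_media(headers: dict) -> str:
    """Pick a media type from request headers (hypothetical rules)."""
    accept = headers.get("Accept", "").lower()
    agent = headers.get("User-Agent", "").lower()
    # Assumption: a client that accepts WML is a handheld device.
    if "text/vnd.wap.wml" in accept:
        return "handheld"
    # Assumption: a text-mode browser wants a terminal-friendly layout.
    if "lynx" in agent:
        return "tty"
    # Everything else is treated as an ordinary screen browser.
    return "screen"
```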
The existing style sheet choosers are based on examining the path info (this is a path following the filename, so you could request myfile.xml/mystyle), query string (for example myfile.xml?style=mystyle), and file suffix (myfile.xml.mystyle).
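To make the three request forms concrete, here is a small Python sketch of how a style name could be pulled out of each. The function names are hypothetical and the code is not AxKit's; it only mirrors the three URL shapes described above.

```python
from urllib.parse import parse_qs

def style_from_path_info(path_info: str):
    """'/mystyle' (the part after myfile.xml) -> 'mystyle'."""
    name = path_info.strip("/")
    return name or None

def style_from_query(query_string: str, param: str = "style"):
    """'style=mystyle' -> 'mystyle'; None if the parameter is absent."""
    values = parse_qs(query_string).get(param)
    return values[0] if values else None

def style_from_suffix(uri: str):
    """'myfile.xml.mystyle' -> ('myfile.xml', 'mystyle').

    A URI without an extra suffix comes back unchanged with a style of None.
    """
    base, dot, suffix = uri.rpartition(".")
    if dot and base.lower().endswith(".xml"):
        return base, suffix
    return uri, None
```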
The final part is plumbing together all the style sheets with the XML file in the right order, implementing cascading where appropriate, and also "doing the right thing" with regard to the cache. AxKit invalidates the page cache when external entities (parsed or unparsed) change, as well as when the original document is altered. This allows modular style sheets to change only part of their make-up and ensure that changes to these sub-components cause a re-build of the cache.
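One simple way to picture this kind of invalidation is a modification-time comparison between the cached output and everything it was built from. The Python sketch below assumes that approach; the article does not say how AxKit actually tracks changes, and the function name cache_is_fresh is hypothetical.

```python
import os

def cache_is_fresh(cache_path: str, source_path: str, dependencies=()) -> bool:
    """True if the cached output is at least as new as every input.

    `dependencies` stands for external entities, style sheets and their
    sub-components (an assumption about what would be tracked).
    """
    try:
        cached = os.path.getmtime(cache_path)
    except OSError:
        return False  # no cached copy yet
    for path in (source_path, *dependencies):
        try:
            if os.path.getmtime(path) > cached:
                return False  # an input changed after the cache was written
        except OSError:
            return False  # an input disappeared; rebuild to be safe
    return True
```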
Mapping XML Files to Style Sheets
AxKit uses two separate methods for mapping XML files to style sheets. The primary method is that specified in the W3C Recommendation at http://www.w3.org/TR/xml-stylesheet. This specifies that an <?xml-stylesheet?> processing instruction at the beginning of the XML file (after the <?xml?> declaration, and before the first element) defines the location and type of the style sheet.
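For readers who have not met this processing instruction before, the following Python sketch collects the <?xml-stylesheet?> instructions that appear before the first element and splits out their pseudo-attributes. It uses the standard library's minidom for brevity and is not how AxKit itself reads the file; the function name is hypothetical.

```python
import re
from xml.dom import minidom, Node

# Pseudo-attributes look like name="value" or name='value'.
PSEUDO_ATTR = re.compile(r'''([\w-]+)\s*=\s*(?:"([^"]*)"|'([^']*)')''')

def stylesheet_pis(xml_text: str):
    """Return one dict of pseudo-attributes per xml-stylesheet PI in the prolog."""
    doc = minidom.parseString(xml_text)
    found = []
    for node in doc.childNodes:
        if node is doc.documentElement:
            break  # only instructions before the first element count
        if (node.nodeType == Node.PROCESSING_INSTRUCTION_NODE
                and node.target == "xml-stylesheet"):
            attrs = {}
            for name, double_quoted, single_quoted in PSEUDO_ATTR.findall(node.data):
                attrs[name] = double_quoted or single_quoted
            found.append(attrs)
    return found
```

Each returned dictionary carries at least the href (location) and type of one style sheet, which is the information the rest of the process needs.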
The second method of mapping XML files to style sheets is used when no usable <?xml-stylesheet?> processing instructions are found in the XML file. This uses a DefaultStyleMap option in your Apache configuration files. This option can be used anywhere within Apache's <Files>, <Location>, and <Directory> sections, and in the .htaccess configuration system.
In this way it's possible to define complex mapping rules for different file types and locations in whichever manner pleases you, without having each XML file individually specify its style sheet.
AxKit then uses the type of the style sheet (in the type="..." pseudo-attribute of the <?xml-stylesheet?> processing instruction, or the first parameter of the DefaultStyleMap option) to decide which module to use to process the file. Types are mapped to a module using another Apache configuration option: AxAddStyleMap. Again, this directive can appear anywhere within Apache's configuration structure. This allows you to try different modules for processing the same file.
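The lookup described in this section can be pictured as two tables: one from style sheet type to processing module, and one of default style sheets used when the document names none. The Python sketch below models that logic under those assumptions; resolve_stages and the data shapes are hypothetical, and only the text/xsl to Apache::AxKit::Language::XSLT pairing comes from this article.

```python
def resolve_stages(declared, default_styles, style_map):
    """Return [(module_name, href), ...] for the style sheets that apply.

    declared       -- list of dicts with 'type' and 'href', one per
                      <?xml-stylesheet?> instruction found in the document
    default_styles -- list of (type, href) pairs from the configuration
    style_map      -- dict mapping a style sheet type to a module name
    """
    # Keep only declared style sheets whose type has a registered module.
    usable = [(s["type"], s["href"]) for s in declared
              if s.get("type") in style_map and "href" in s]
    # Fall back to the configured defaults when nothing usable was declared.
    if not usable:
        usable = [(t, h) for t, h in default_styles if t in style_map]
    return [(style_map[t], h) for t, h in usable]
```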
Choosing a Style Sheet
Often, AxKit will have more than one style sheet option for serving a particular file. How does it choose which one to use?
The choice is made based on media type and on "style sheet preference." For a style sheet to be chosen for the file currently being served, the media types must match, or the style sheet must have a type of "all."
Style sheet preferences are slightly more complex. AxKit has three concepts of style sheets: persistent, preferred, or alternate. Without drowning in detail here, this facility allows a processor further up the "pipeline" to determine a style sheet -- so, for instance, a user could personalize their look and feel by determining which style sheet was applied.
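The selection rules can be sketched as follows. This Python example assumes the usual reading of the xml-stylesheet pseudo-attributes (no title means persistent, a title means preferred, alternate="yes" means alternate) and treats a missing media value as matching everything; AxKit's own rules may differ in detail, and all function names are hypothetical.

```python
def matches_media(sheet: dict, requested: str) -> bool:
    """A sheet applies if its media list names the requested medium or 'all'."""
    media = {m.strip().lower() for m in sheet.get("media", "all").split(",")}
    return "all" in media or requested.lower() in media

def preference(sheet: dict) -> str:
    """Classify a sheet as persistent, preferred, or alternate."""
    if sheet.get("alternate", "no") == "yes":
        return "alternate"
    return "preferred" if sheet.get("title") else "persistent"

def choose(sheets, requested_media: str, requested_title=None):
    """Persistent sheets always apply; titled sheets apply when selected.

    With no requested title, the preferred sheets are used. With a requested
    title (for example a user's personal choice), only titled sheets carrying
    that title are used alongside the persistent ones.
    """
    chosen = []
    for sheet in sheets:
        if not matches_media(sheet, requested_media):
            continue
        kind = preference(sheet)
        if kind == "persistent":
            chosen.append(sheet)
        elif requested_title is not None:
            if sheet.get("title") == requested_title:
                chosen.append(sheet)
        elif kind == "preferred":
            chosen.append(sheet)
    return chosen
```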
Cascading Style Sheets
It's easy to get confused by the term "style sheet" here--in AxKit they are not restricted to XSLT sheets, and are best thought of as general processing and transformation stages. Style sheets in AxKit's terms can do anything, provided you can build a Language module to parse it. This includes the function of creating original XML content, as well as transforming and formatting it. So it becomes possible to, for instance, retrieve database results, add tags, and format the result into WML or HTML.
Cascading refers to the case of one style sheet's results "cascading" into the next (alternatively, you can think of this as a pipeline of processing stages). With AxKit there are a number of ways to achieve this. The first and simplest method is to have all your style sheets based on DOM, and produce DOM trees. When all the style sheets have finished processing, AxKit takes care of outputting the result to the user agent and disposing of your DOM tree.
The second method of cascading is to simply pass around the textual result of your output. This is necessary with modules like Sablotron where there is no DOM tree available. Modules further down the processing stream can parse this result as XML, and continue processing.
The final, and possibly most interesting, method of cascading processing stages is to use "end-to-end SAX." Here, AxKit sets up a chain of SAX handlers to process the document. Each style sheet stage based on SAX simply sends on SAX events to the next SAX handler up the chain. The final SAX handler in the chain simply outputs its results as text to the browser. The key advantage of this end-to-end system is that it starts outputting data to the browser as soon as parsing begins.
This system allows database modules to avoid building DOM trees in memory, which can be very resource intensive. Instead, they simply fire SAX events, and the output from the database appears as results become available.
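The idea of chaining SAX handlers can be illustrated with Python's standard xml.sax module. This is only a sketch of the general technique, not AxKit code; the RenameElements stage and the run_pipeline helper are hypothetical names invented for the example.

```python
import io
from xml.sax import parseString
from xml.sax.saxutils import XMLFilterBase, XMLGenerator

class RenameElements(XMLFilterBase):
    """One processing stage: rename elements, pass everything else through."""

    def __init__(self, mapping, parent=None):
        super().__init__(parent)
        self.mapping = mapping

    def startElement(self, name, attrs):
        # Forward the (possibly renamed) event to the next handler.
        super().startElement(self.mapping.get(name, name), attrs)

    def endElement(self, name):
        super().endElement(self.mapping.get(name, name))

def run_pipeline(xml_bytes: bytes, stages, out) -> None:
    """Chain the stages so each one feeds the next, ending in a text writer."""
    handler = XMLGenerator(out, encoding="utf-8")  # final stage: serialize
    for stage in reversed(stages):
        stage.setContentHandler(handler)
        handler = stage
    # The parser drives the first stage; output is written as events arrive.
    parseString(xml_bytes, handler)

out = io.StringIO()
run_pipeline(b"<doc><para>hi</para></doc>",
             [RenameElements({"doc": "html"}), RenameElements({"para": "p"})],
             out)
print(out.getvalue())
```

No stage holds the whole document in memory: each event is handed on as soon as it has been processed, which is what allows output to begin before parsing has finished.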
Setting up AxKit
Now that we've been through the theory of how AxKit works, it's time to install and start using it. I don't believe in tools like this being hard to use or set up, so provided you can use an editor and modify a few Apache configuration files, installation should be simple.
Obviously, AxKit requires an installed Apache web server. AxKit also requires mod_perl, so if you don't have it already installed, you need to add it into your Apache. Depending on your platform, this can be complex. More information is available at http://perl.apache.org/guide/.
To install AxKit, first download the distribution. Extract the archive and change to the directory it creates. Then type:
perl Makefile.PL
make
make test
make install
(If you don't have 'apxs' in your path, mod_perl versions below 1.24 will produce a warning at the first step. This warning can be ignored.)
Once AxKit is installed, you will need to edit your Apache web server's configuration files. First you need to enable AxKit so that Apache understands AxKit's configuration directives, so add the following line to your httpd.conf file:
PerlModule AxKit
Finally, you can add in the core components of AxKit--the XMLFinder and the StyleFinder. These can be added to any .htaccess file, or other Apache configuration file:
PerlTypeHandler Apache::AxKit::XMLFinder
PerlHandler Apache::AxKit::StyleFinder
AxAddStyleMap text/xsl Apache::AxKit::Language::XSLT
The last line associates the type text/xsl with its style sheet code module.
Now you're ready to start serving up XML files. To get started, try looking at the example files in the AxKit distribution.
Conclusion
AxKit provides web developers with the tools they need to deliver complex XML-based systems quickly, and eases them into the development process. It provides the power to develop their own system for style sheet negotiation, and also the flexibility to design completely new style sheet languages.
Although AxKit is not finished yet, the majority of the features described above are built and working reliably. The most significant things missing from AxKit are SAX-based style sheet languages (which need to be designed and built--I have a number of ideas for these) and alternate ways to generate the initial XML file (as opposed to filesystem XML). These will be coming in future releases.
As AxKit is an open source project, I hope people will jump in and help. We have the beginnings of an active mailing list, where you can vote on features, help develop them, or simply lurk. We're moving extremely quickly with the features. Developing in Perl allows us to do this, while still maintaining readable code (something I deem very important -- so don't assume because it's written in Perl that it's going to be a ball of spaghetti!). If there's something you'd like to see in AxKit, please join the mailing list and participate in the project.