The State of the Python-XML Art
September 18, 2002
Welcome to the first Python-XML column. Every month I'll offer tips and techniques for XML processing in Python and close coverage of particular packages. Python is an excellent language for XML processing, and there is a wealth of tools and resources to help the intrepid developer be productive. In what follows I'll survey these tools and resources, giving a sense of how broadly Python supports XML technologies and giving you a head start on the more in-depth topics to follow.
The world of Python-XML
One of the best things about Python-XML is the active community of practitioners and contributors. From introductory texts to references to mailing lists, these resources will provide answers to most questions worth asking about Python and XML. If you are new to Python and coming from the XML perspective, Sean McGrath's article XML Processing with Python is an older but still very pertinent introduction to the area.
The Python XML SIG is the primary focus of Python work for XML, and its mailing list is a good place for discussion. The XML SIG has also produced some important general XML work such as the XML Bookmark Exchange Language (XBEL), which is now used in several Web browsers. There is also a lot of general Python-XML discussion on the 4Suite and Zope-xml mailing lists.
There's a lot of material in print on XML processing with Python. Python & XML, by Christopher A. Jones and Fred L. Drake, Jr. (O'Reilly) is a valuable book on the topic. Definitive XML Application Development, by Lars Marius Garshol (Prentice Hall), introduces XML processing, using Python throughout as the implementation language. It also uses Java and thus provides a useful comparison for those familiar with Java-XML programming. Python Cookbook, edited by Alex Martelli and David Ascher (O'Reilly), has a section with XML recipes. Python How to Program, by Deitel, Deitel, Liperi and Wiedermann (Prentice Hall), has several chapters and a detailed case study covering XML topics. Python Web Programming, by Steve Holden and David Beazley (New Riders), has a section on XML. The first Python-XML book, XML Processing with Python, by Sean McGrath (Prentice Hall), is a bit dated in general but covers topics that none of the other books do. XML Processing with Perl, Python, and PHP, by Martin C. Brown (Sybex), devotes six chapters to Python. Most general Python books that cover version 2.0 and up will introduce the built-in XML processing libraries. Mark Lutz maintains a list of Python books. I'll be reviewing a selection of Python-XML books later on in this column.
Andrew Kuchling's Python/XML HOWTO is good starter documentation. The on-line slides for Alexandre Fayolle's EuroPython 2002 tutorial on Python-XML processing are also a useful starter. The Python Cookbook offers a good number of recipes on XML. I maintain a collection of recipes, tips and pointers on XML processing in Python. You will find many other on-line resources referenced there. The XML SIG maintains a Wiki, but it doesn't have a great deal of content yet.
Python and XML Software
The following table lists the currently available Python-XML software that I judge to be significant. It is not a list of every bit of software in Python that has anything to do with XML: for example, I do not list pyglade, which is software for generating user interfaces in the GNOME desktop system for UNIX. The user interface specifications in question are in XML, but this is not really enough to call it an XML processing tool for Python. However, you can certainly use the tools I mention for convenient manipulation of pyglade specifications. The general rules of thumb for including software are, first, whether it implements a technology or set of technologies strongly associated with XML; and, second, whether it does so in a way that is useful for any arbitrary XML file I may want to process.
I've organized the table according to the areas of XML technology. This will give newcomers to Python a quick look at the coverage of XML technologies in Python and should serve as a quick guide to where to go to address any particular XML processing need. I rate the vitality of each listed project as either "weak", "steady" or "strong" according to the recent visible activity on each project: mailing list traffic, releases, articles, other projects that use it, etc. I will often omit entries I judge to be of weak vitality in areas where there are other projects of steady or strong vitality.
name | description | vitality |
---|---|---|
XML parsing | ||
PyLTXML | PyLTXML is a Python extension wrapping the LTXML parser. It supports DTD validation. | steady |
cDomlette | cDomlette is part of 4Suite. It is a fast C-based DOM implementation with a Python API, and includes a wrapper of the expat parser. It supports DTD validation. It also supports XInclude and XML Base. | strong |
libxml/Python | This Python extension module is a wrapper for libxml. It supports DTD validation. | strong |
pyRXP | pyRXP is a Python extension wrapping the RXP XML parser. It supports DTD validation. | steady |
pyexpat | Pyexpat is part of PyXML and is a wrapper of the expat parser. Expat is a non-validating parser, so pyexpat has no DTD validation support. | strong |
qp_xml | qp_xml is part of PyXML. It is a simple parser written entirely in Python with no validation support. | steady |
xmlproc | xmlproc is part of PyXML. It is a parser written entirely in Python. It supports DTD validation and provides API access to parsed DTD constructs. | steady |
XPath, XSLT and XPointer | ||
4XSLT | 4XSLT is part of 4Suite, as are 4XPath and 4XPointer. 4XSLT supports a large portion of EXSLT. | strong |
Pyana | Pyana is a Python extension module wrapping the Xalan XSLT engine. | strong |
libxslt/Python | This Python extension module is a wrapper for libxslt. It supports a large portion of EXSLT. | strong |
Schema languages (besides DTD) | ||
XSV | XSV is a W3C XML Schema (WXS) implementation. It is actually one of the first WXS implementations, and drives the W3C's on-line validator. | steady |
XVIF | XVIF implements RELAX NG, enhanced with the XML Validation Interoperability Framework for XML processing pipelining. It includes an implementation of XML Regular Fragmentations. 4Suite includes experimental RELAX NG and XVIF integration through this software. | steady |
Protocols | ||
Python Web Services | This is a collection of Python modules for SOAP, WSDL and related technologies. | steady |
WDDX/Python | PyXML comes with a WDDX module for Python. | |
wsdl4py | wsdl4py is a simple Python library for WSDL processing. See also uddi4py. | steady |
xmlrpclib | Python versions from 2.2 up bundle XML-RPC client and server modules. | strong |
RDF and Topic Maps | ||
4RDF | 4RDF is part of 4Suite. It includes an RDF/XML and NTriples parser, RDF store system, Python triples API and an implementation of the Versa query language. | strong |
Redfoot and RDFLib | Redfoot is an RDF server written in Python. RDFLib is the triple store and RDF/XML parser component. | strong |
Redland/Python | This is a Python interface for the Redland RDF Application Framework. | strong |
tmproc | tmproc is a Python implementation of XML Topic Maps, based on ISO/IEC 13250 Topic Maps. | strong |
DOM | ||
4DOM | 4DOM is part of PyXML. It is a comprehensive implementation of W3C DOM Level 2. | steady |
cDomlette | See the "XML parsing" section. | strong |
minidom | Python versions from 2.0 up bundle a minidom module. Minidom is a lightweight, more Pythonic DOM implementation that follows the general lines of DOM Level 2. | strong |
pulldom | Python versions from 2.0 up bundle a pulldom module. Pulldom is a special DOM-like implementation that only loads parts of an XML document as requested. | strong |
Miscellany | ||
4XLink | 4XLink is part of 4Suite. It implements a portion of XLink. | weak |
4XUpdate | 4XUpdate is part of 4Suite. It is a Python implementation of XUpdate. It can be used to apply difference patches generated by XMLdiff. | strong |
Pyxie | Pyxie is a line-oriented XML processor. | weak |
XIST | XIST, "object oriented XSLT", uses an easily extensible, DOM-like view of source and target XML documents to do tree transformations. | strong |
XMLTools | XMLTools is a small suite of tools that includes a graphical XML tree viewer and editor for the GTK windowing library. | strong |
XMLdiff | XMLdiff is a Python tool that figures out the significant differences between two XML files or DOM trees. It can generate XUpdate output. | strong |
c14n.py | c14n.py is part of PyXML. It implements XML canonicalization. | strong |
xml.sax | Python versions from 2.0 up bundle a SAX module. | strong |
xmlarch | xmlarch is an XML architectural forms processor written in Python, using SAX. | weak |
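To give a feel for how the bundled minidom, xml.sax and pulldom modules listed above differ in style, here is a minimal sketch that extracts the same data three ways. It assumes a current Python 3 interpreter, where these modules still ship in the standard library, rather than the Python 2.x releases discussed in this article. The sample document and every function and class name are invented for illustration.

```python
# Assumption: Python 3 standard library. SAMPLE and all names defined
# here are hypothetical, chosen only for this illustration.
import xml.sax
from xml.dom import minidom, pulldom

SAMPLE = """<?xml version="1.0"?>
<books>
  <book year="2002"><title>Python &amp; XML</title></book>
  <book year="2000"><title>XML Processing with Python</title></book>
</books>"""


def _text_of(element):
    """Concatenate the direct text children of a DOM element."""
    return "".join(child.data for child in element.childNodes
                   if child.nodeType == child.TEXT_NODE)


def titles_with_minidom(text):
    """minidom: build the whole tree in memory, then query it."""
    doc = minidom.parseString(text)
    titles = [_text_of(node) for node in doc.getElementsByTagName("title")]
    doc.unlink()  # release the tree's internal references
    return titles


class TitleHandler(xml.sax.ContentHandler):
    """SAX: react to parse events; no tree is ever built."""

    def __init__(self):
        super().__init__()
        self.titles = []
        self._buffer = None  # a list while inside <title>, else None

    def startElement(self, name, attrs):
        if name == "title":
            self._buffer = []

    def characters(self, content):
        # The parser may deliver one text run in several chunks.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "title":
            self.titles.append("".join(self._buffer))
            self._buffer = None


def titles_with_sax(text):
    handler = TitleHandler()
    xml.sax.parseString(text.encode("utf-8"), handler)
    return handler.titles


def titles_with_pulldom(text):
    """pulldom: stream events, expanding only the subtrees we want."""
    titles = []
    events = pulldom.parseString(text)
    for event, node in events:
        if event == pulldom.START_ELEMENT and node.tagName == "title":
            events.expandNode(node)  # build a DOM for this element only
            titles.append(_text_of(node))
    return titles


if __name__ == "__main__":
    for extract in (titles_with_minidom, titles_with_sax, titles_with_pulldom):
        print(extract.__name__, extract(SAMPLE))
```

All three functions return the same list of titles for this sample. The practical difference is how much of the document each holds in memory: minidom keeps the entire tree, SAX keeps only what the handler stores, and pulldom keeps just the elements that are explicitly expanded.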
There are many Python projects for storage and network serving of data which have specialized facilities for XML documents. These include Maki, Zope, the 4Suite repository and XDisect. I do not list these in the table above because whether they can be considered XML tools is especially subjective. I also plan to cover Python server frameworks with XML facilities in this column in future.
I have probably missed a few entries here and there. Please post any omissions to the comments section at the end of this article. For those working on new Python and XML goodies, do not forget to post announcements to the Python XML SIG mailing list. This is the best way to be sure that I and a lot of others are aware of your work. I'll mention new software and significant updates regularly in this column.
Into the fray...
I hope these pointers give you a good start into the world of Python and XML. In the next article I'll tour the many facilities added to core Python by the PyXML package.