The State of Python-XML in 2004
October 13, 2004
The table below lists the currently available Python-XML software that I judge to be significant. It is not a list of every bit of software in Python that has anything to do with XML. For example, I do not list pyglade (part of PyGTK), which is software for generating user interfaces in the GNOME desktop system for UNIX. The user interface specifications in question are in XML, but this is not really enough to call it an XML processing tool for Python. However, you can certainly use the tools I mention for convenient manipulation of pyglade specifications.
The general rules of thumb for including software are, firstly, whether it implements a technology or set of technologies strongly associated with XML; and secondly, whether it does so in a way that is useful for any arbitrary XML file I may want to process.
Another example of a project that doesn't fit these parameters is Mark Pilgrim's excellent Universal Feed Parser, which parses almost every known RSS and Atom newsfeed format, including some that are not well-formed XML. This package is not a general-purpose tool for XML processing, but is rather focused on a specific XML vocabulary. I did make a bit of a compromise on this principle to cover RDF packages, since even though RDF/XML is a specific XML vocabulary, it is generally acknowledged as a valid way to express the data in any XML.
I organize the table according to selected areas of XML technology. This will give newcomers to Python a quick look at the coverage of XML technologies in Python and should serve as a quick guide to where to go to address any particular XML processing need. I have added reference links to column articles on software I've covered in this column. I have set a "heartbeat" rating for each project. One heart means the project is almost inactive and three means the project is very active. I judge this rating subjectively, according to recent activity I can find for each project: mailing list traffic, releases, articles, other projects that use it, etc.
In 2002 I reported 34 Python-XML projects. Last year I added 24 and this year 16 (marked with an asterisk) for a grand total of 74. This month alone two new projects have emerged, showing the continuing interest in Python processing of XML. This year I added a new category, for XML generators, with 9 entries. There has been a bloom in Python packages for generating XML. An existing category that keeps on growing is Pythonic APIs, or data bindings. There are 15 as of this year's count. There is no doubt that patience for non-Pythonic ways of processing XML has worn thin, but considering that my list may not even be complete (rumor has it Guido van Rossum has a data-binding tool of his own), one wonders whether this area is ripe for consolidation. At this point I leave you to judge such matters for yourself.
XML Parsing Engines
Parsing engines offer unique, low-level parsers. Many packages offer additional capabilities, but this section mainly documents the various low-level XML parsers for Python, on which other packages then build. Packages do not support DTD validation unless such support is explicitly stated. Note: There are no Python parsers that I know of that support XML 1.1, although, as many have remarked (1), (2), (3), XML 1.1 is probably in trouble as far as adoption is concerned.
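To give a feel for what "low-level" means here, the following is a minimal sketch using the expat binding in the standard library (`xml.parsers.expat`). It assumes a current Python 3 interpreter; the helper name `collect_events` and the sample document are made up for illustration. The parser only reports events as it reads the input, and everything else (trees, bindings, validation) is layered on top by other packages.

```python
import xml.parsers.expat


def collect_events(xml_bytes):
    """Parse a document with expat and return the raw events it reports."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # expat calls these handlers as it encounters each construct.
    parser.StartElementHandler = lambda name, attrs: events.append(
        ("start", name, dict(attrs))
    )
    parser.EndElementHandler = lambda name: events.append(("end", name))
    parser.CharacterDataHandler = lambda data: events.append(("text", data))
    # The second argument tells expat this is the final chunk of input.
    parser.Parse(xml_bytes, True)
    return events


# Hypothetical sample input, not taken from any real project.
SAMPLE = b'<doc id="1"><item>one</item></doc>'
events = collect_events(SAMPLE)
for event in events:
    print(event)
```

Note that expat may deliver character data in more than one chunk, so code built on it should concatenate consecutive text events rather than assume one event per text node.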
DOM
The Document Object Model is probably the best-known API for XML, and is very well-represented in the Python world.
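As a small illustration of the DOM style, here is a sketch using `xml.dom.minidom` from the standard library, assuming a current Python 3 interpreter. The sample document and its element names are invented for the example. It reads an attribute and some text, then adds a new element through the usual DOM factory methods.

```python
from xml.dom import minidom

# Hypothetical sample document for illustration only.
SAMPLE = '<catalog><book id="b1"><title>Python &amp; XML</title></book></catalog>'

doc = minidom.parseString(SAMPLE)

# Navigate with the standard DOM methods.
book = doc.getElementsByTagName("book")[0]
book_id = book.getAttribute("id")
title_el = book.getElementsByTagName("title")[0]
# Text lives in child text nodes, so gather them explicitly.
title_text = "".join(
    node.data for node in title_el.childNodes if node.nodeType == node.TEXT_NODE
)

# Mutate the tree: nodes are created by the document, then attached.
price = doc.createElement("price")
price.appendChild(doc.createTextNode("29.95"))
book.appendChild(price)

serialized = doc.documentElement.toxml()
print(book_id, title_text)
print(serialized)
```

The verbosity of gathering text nodes by hand is a fair example of why many Python developers look for more Pythonic alternatives.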
Data Bindings and Specialized APIs
SAX and DOM are perhaps the best-known XML processing APIs, but there are many projects that strive for an API that focuses on the strengths of Python.
XPath and XSLT
XPath and XSLT are perhaps the most universal XML processing tools. XSLT is not just a styling tool but a full-blown (if verbose) scripting language for XML. XPath is embedded in almost every other XML technology you can think of.
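For a taste of path-based selection, the sketch below uses the limited XPath subset built into `xml.etree.ElementTree`, assuming a current Python 3 interpreter where ElementTree ships in the standard library. This is not a full XPath implementation; complete XPath and XSLT support requires one of the dedicated packages. The project names and categories in the sample are placeholders.

```python
import xml.etree.ElementTree as ET

# Placeholder data; the names and categories are purely illustrative.
SAMPLE = """<projects>
  <project name="alpha" category="xslt"/>
  <project name="beta" category="xslt"/>
  <project name="gamma" category="dom"/>
</projects>"""

root = ET.fromstring(SAMPLE)

# An attribute predicate selects only the matching child elements.
xslt_names = [
    project.get("name")
    for project in root.findall("./project[@category='xslt']")
]

# The descendant axis (.//) finds elements at any depth.
all_projects = root.findall(".//project")

print(xslt_names)
print(len(all_projects))
```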
Schema Languages (Not Built into Parsers)
Schema languages allow one to communicate XML formats, validate that instances match the constraints, and even assess convenience features for the XML formats. DTD is the original schema language, and is usually implemented in the XML parser (and so most implementations are covered in the section on parsers).
XML Generators
These are Python tools that can be used to generate XML.
Protocols
One of the earliest and most discussed uses of XML is to transmit data from one application or machine to another. These tools provide such XML protocol facilities for use in Python.
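As one concrete instance of XML used as a wire format, here is a sketch with XML-RPC, using `xmlrpc.client` from the standard library of a current Python 3 interpreter. No network is involved: the example only marshals a call into XML and unmarshals it again. The method name `example.rate` and the parameter values are hypothetical.

```python
import xmlrpc.client

# Marshal a method call into its XML-RPC wire representation.
# "example.rate" is a made-up method name for illustration.
payload = xmlrpc.client.dumps(("sample", 3), methodname="example.rate")
print(payload)

# Unmarshal the XML back into Python values, as a server would.
params, method = xmlrpc.client.loads(payload)
print(method, params)
```

The payload is ordinary XML, so any of the general-purpose tools covered in this article can inspect or transform it.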
RDF and Topic Maps
The Resource Description Framework is a system for managing metadata. Its primary serialization syntax is an XML vocabulary. These are Python tools for processing this RDF/XML syntax.
Miscellany
In this category is software that does not fall into any other area.
The Community Marches On
I'm sure I've missed some resources in this update article. If you know of any I've neglected, please mention them in a comment to this article and I'll be sure to take note of them for future updates. I mention new or newly discovered resources at the end of each column article, and I compile the updates yearly. Certainly anyone working where Python meets XML should participate on the Python XML SIG mailing list, and post announcements there. Doing so is the best way to be sure that I and a lot of others are aware of your work.
This month's regular update starts with something mind-bogglingly brilliant (if odd). My xmlhack colleague Oleg Paraschenko created Pysch, a Scheme runtime environment in Python that he wrote expressly for the purpose of running the Scheme tools SXPath and SXSLT under Python. Pysch already runs these target packages, and according to Paraschenko, "I think that Pysch can be used to run any Scheme code, after first using third-party tools to process the Scheme code and save it in XML format for parsing by Pysch." But there is the expected limitation: "Pysch is very slow. I'm not going to fix it yet. I use Pysch for research goals and not in production."
Mike Hostetler announced XMLBuilder 1.3. "You create an XMLBuilder object, send it some dictionary data, and it will generate the XML for you." I just mentioned the 1.1 release last month and I only post consecutive updates upon major changes. That certainly is the case here. It appears this is the first actually usable version of XMLBuilder. The announcement says "Support for non-ascii character." I hadn't realized such a limitation in earlier releases. I applaud the author and contributors for putting in the work to establish the "XML" in "XMLBuilder."
As I always vehemently argue, it ain't XML if it doesn't support Unicode. I probably have to weaken this rule a bit for XML generation code, saving the full strictness for XML parsing code, but I'm not comfortable with and can't recommend XML generation code that doesn't support the full character model. See the XMLBuilder announcement.
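To make the Unicode requirement concrete, here is a minimal sketch of generation code that handles non-ASCII text and markup-significant characters. It does not use XMLBuilder; it assumes a current Python 3 interpreter and uses `XMLGenerator` from the standard library's `xml.sax.saxutils`. The helper name `build_greeting` and the element names are invented for the example. The output is parsed back to confirm that the text survives the round trip.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import XMLGenerator
from xml.sax.xmlreader import AttributesImpl


def build_greeting(text, lang):
    """Serialize one element as UTF-8 bytes, escaping markup characters."""
    out = io.BytesIO()
    gen = XMLGenerator(out, encoding="utf-8")
    gen.startDocument()
    gen.startElement("greeting", AttributesImpl({"lang": lang}))
    gen.characters(text)  # '&' and '<' are escaped for us
    gen.endElement("greeting")
    gen.endDocument()
    return out.getvalue()


ORIGINAL = "Grüße & 你好"
xml_bytes = build_greeting(ORIGINAL, "de")

# Round trip: a conforming parser must hand back the same characters.
round_tripped = ET.fromstring(xml_bytes).text
print(xml_bytes.decode("utf-8"))
print(round_tripped == ORIGINAL)
```

A generator that builds output by plain string concatenation tends to fail exactly this test, either by emitting a bare ampersand or by mangling the non-ASCII characters.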
Roland Leuthe released minixsv 0.2, "a lightweight XML schema validator written in pure Python. It implements only a subset of the W3C XML schema [WXS] 1.0 recommendation." The WXS subset is very limited, but Leuthe admits the package is "pre-alpha," and I'll keep an eye out for further developments. minixsv works with the standard minidom or elementtree. As the page says, "Other DOM implementations can be easily adapted by implementing a newly derived XML interface class."
The major update rule also applies to my release of Scimitar 0.9.0. Scimitar is a fast ISO Schematron implementation that works by compiling a Schematron schema into a Python validator script. It now supports the full draft ISO Schematron spec, including variables and abstract patterns. See the announcement.
Philippe Normand announced XMLObject 0.1.3, a data-binding tool that allows you to map from customized Python classes to XML, and vice versa. See the announcement.
Fredrik Lundh released ElementTree 1.2.1. He says: "ElementTree 1.2.1 is 1.2 plus code that takes advantage of new expat features in newer versions of Python. As a result, the parser is now 20-30% faster on many kinds of XML documents. Enjoy!"
For users of various .NET Python tools, Srijit Kumar Bhadra posted some useful sample code for generating XML output. Later he posted some corrections to the code comments.