An Atom-Powered Wiki
April 14, 2004
In my last article I covered the changes from version 7 to version 8 of the draft AtomAPI. The latest version of the AtomAPI is now version 9, which adds support for SOAP. That change, and its impact on API implementers, will be covered in a future article. In this article I'm going to build a simple implementation of the AtomAPI.
The first task at hand is to pick a viable candidate. I had a list of criteria: a small code base, written in Python, with the target also being a slightly unconventional application of the AtomAPI. The reason I wanted a small code base in Python is that it's a language I'm familiar with, and small is good for the sake of exposition. The reason I picked an unconventional application of the AtomAPI is that I've found that to be a good technique for stretching a protocol, looking for strengths and weaknesses.
The application I've picked is PikiPiki, which is a wiki, a cooperative authoring system for the Web. It's written in Python, is GPL'd, has a small code base, and the code is easy to navigate. It also has a good lineage given that MoinMoin is based on PikiPiki. The source for both the client and the modified server described in this article can be downloaded from the EditableWebWiki.
To create an implementation of the AtomAPI there are a few operations we need to support. Each entry, which in the case of a wiki will be the content for a WikiWord, needs a unique URI, called the EditURI, that supports GET, PUT and DELETE. In addition, a single PostURI that accepts POST to create new entries needs to be added. Lastly, we'll add a FeedURI that supports GET to return a list of the entries. Supporting the listed operations on these URIs is all that's needed to have a fully functioning Atom server. (This of course ignores SOAP, which I'll cover later.)
Character Encoding
Character encoding is often overlooked, despite being an important part of working with any XML format, and Atom is no exception. Before making any additions to PikiPiki we'll need to make a few small changes to ensure that all of our data is encoded correctly. For a good introduction to character encoding consult the excellent introduction by Jukka Korpela.
To make things easier we can encode all of PikiPiki's data as UTF-8. There are many encodings to choose from, all with different advantages and disadvantages, but UTF-8 has some special properties: it allows us to use any Unicode character, it lets us for the most part treat the data like regular "C" strings, and we are guaranteed support from any conforming XML parser. Also, support for UTF-8 is one of the few things that most browsers do right.
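Those properties are easy to see in a couple of lines of Python. This sketch uses modern Python string methods for illustration, not code from PikiPiki itself:

```python
# ASCII text passes through UTF-8 unchanged, while any Unicode
# character still has a lossless round trip through the encoding.
ascii_text = u"FrontPage"
unicode_text = u"R\u00e9sum\u00e9Page"   # a hypothetical accented page title

assert ascii_text.encode('utf-8') == b"FrontPage"      # ASCII subset intact
round_trip = unicode_text.encode('utf-8').decode('utf-8')
assert round_trip == unicode_text                      # nothing lost
```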
Since this is a wiki, and for now all the data coming into it comes through a form, we need to ensure that all incoming data is encoded as UTF-8. The easiest way to do this is by specifying that the encoding of the form page is UTF-8; lacking any other indication, a browser will submit the data from a form using the same character encoding that the page is served in. While HTML forms can specify, via the accept-charset attribute, alternate character sets that the server will accept when data is submitted, support for this is spotty (meaning it worked perfectly in Mozilla, and I failed to get it working in Microsoft's Internet Explorer). So our first change to PikiPiki is to add a meta tag to the generated HTML.
def send_title(text, link=None, msg=None, wikiword=None):
    print "<head><title>%s</title>" % text
    print '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
Now all of our web pages should submit UTF-8 encoded data, and since all of the web pages produced by the wiki are combinations of the ASCII markup embedded in the Python program and the UTF-8 in the stored wiki entries, we can be sure our output is UTF-8.
A Wiki revolves around WikiWords, mixed-case words that are the title for and unique identifiers of every page on the wiki. In the case of PikiPiki, the WikiWord is also the filename that the text of the page is stored in.
The next change is to move the configuration of PikiPiki into a separate file. We'll be creating two new CGI programs to handle the AtomAPI, and they both need access to some configuration information. The configuration section is just a set of global variables that we'll move into piki_conf.py:
from os import path
import cgi

data_dir = '/home/myuserpath/piki.bitworking.org/'
text_dir = path.join(data_dir, 'text')
editlog_name = path.join(data_dir, 'editlog')
cgi.logfile = path.join(data_dir, 'cgi_log')
logo_string = '<img src="/piki/pikipiki-logo.png" border=0 alt="pikipiki">'
changed_time_fmt = ' . . . . [%I:%M %p]'
date_fmt = '%a %d %b %Y'
datetime_fmt = '%a %d %b %Y %I:%M %p'
show_hosts = 0
css_url = '/piki/piki.css'
nonexist_qm = 0
EditURI
The next task at hand is to handle the functions of the EditURI. In the AtomAPI each entry has an associated EditURI, a URI you can dereference in order to retrieve the representation of the entry. You can also PUT an Atom entry to the EditURI to update the entry. In this case, each definition of a WikiWord in PikiPiki will act as a single entry. To handle the EditURI functions we'll create a Python script, atom.cgi.
First let's map out the GET. We need to package up the UTF-8 encoded contents of a WikiWord and send it back. We also need to decide on the form of the URI we are going to use. In this case we are going to be calling a CGI program and need to pass in the WikiWord as a parameter. We could pass it in either as a query parameter or as a sort of path. For example, in the first case, if the WikiWord was "FrontPage", the EditURI could be atom.cgi?wikiword=FrontPage. In the second case, the EditURI might be atom.cgi/FrontPage. We'll choose the latter; the WikiWord will be passed in via the "PATH_INFO" environment variable.
def main(body):
    method = os.environ.get('REQUEST_METHOD', '')
    wikiword = os.environ.get('PATH_INFO', '/')
    wikiword = wikiword.split("/", 1)[1]
    wikiword = wikiword.strip()
    word_anchored_re = re.compile(WIKIWORD_RE)
    if method == 'POST':
        ret = create_atom_entry(body)
    elif word_anchored_re.match(wikiword):
        if method in ['GET', 'HEAD']:
            ret = get_atom_entry(wikiword)
        elif method == 'PUT':
            ret = put_atom_entry(wikiword, body)
        elif method == 'DELETE':
            ret = delete_atom_entry(wikiword)
        else:
            ret = report_status(405, "Method not allowed", "")
    else:
        ret = report_status(400, "Not a valid WikiWord",
            "The WikiWord you referred to is invalid.")
    return ret[1]
Our CGI pulls the HTTP method from the "REQUEST_METHOD" environment variable and the WikiWord from the "PATH_INFO" environment variable. Based on those two pieces of information we dispatch to the correct function. When we process GET we are careful to respond to HEAD requests too. This is an important point, as the Apache web server will do the right thing with the HEAD response, that is, generate the right headers and send only the headers, discarding the body.
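The helper report_status appears throughout the dispatch code but is never shown in the article. A plausible reconstruction, inferred purely from its call sites (main() returns ret[1] as the response, and create_atom_entry() later checks ret[0] against 200), might look like this:

```python
def report_status(code, reason, message):
    # Hypothetical reconstruction: report_status is used but not shown in
    # the article. From its usage we can infer it returns a tuple of
    # (numeric status, complete CGI response), where the response carries
    # a CGI "Status:" header followed by a blank line and the message body.
    response = ("Status: %d %s\r\n"
                "Content-type: text/plain; charset=utf-8\r\n"
                "\r\n"
                "%s") % (code, reason, message)
    return (code, response)
```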
def get_atom_entry(wikiword):
    filename = getpath(wikiword)
    base_uri = piki_conf.base_uri
    if path.exists(filename):
        issued = last_modified_iso(filename)
        content = file(filename, 'r').read()
    else:
        issued = currentISOTime()
        content = "Create this page."
    return (200, ENTRY_FORM % vars())
Where ENTRY_FORM is defined as:

ENTRY_FORM = """Content-type: application/atom+xml; charset=utf-8
Status: 200 Ok

<?xml version="1.0" encoding='utf-8'?>
<entry xmlns="http://purl.org/atom/ns#">
  <title>%(wikiword)s</title>
  <link rel="alternate" type="text/html"
        href="%(base_uri)s/%(wikiword)s" />
  <id>tag:dev.bitworking.org,2004:%(wikiword)s</id>
  <issued>%(issued)s</issued>
  <content type="text/plain">%(content)s</content>
</entry>"""
There are two important points to note about this code. The first is what we do if the desired WikiWord does not exist. If we were writing this for a typical CMS, a GET for an entry that didn't exist would normally return a status code of 404. Wikis, in contrast, when dealing with the HTML content, present what appears to be an infinite URI space. That is, you can request any URI at a wiki and, as long as you specify a validly formed WikiWord, you won't get a 404. Instead you will get a web page that prompts you to enter the content for that WikiWord. Go ahead and try it on the PikiPiki wiki that is set up for testing this implementation of the AtomAPI. This WikiWord currently doesn't have a definition: http://piki.bitworking.org/piki.cgi/SomeWikiWordThatDoesntExist. To keep parity with the HTML interface, the AtomAPI interface works the same way.
The second point is character encoding. Note that we state the character encoding in two places in the response: in the HTTP Content-type: header and in the XML declaration.
There are two more HTTP methods to handle for the EditURI, DELETE and PUT. PUT is used to update the content for a WikiWord, replacing the existing content with that delivered by the PUT. DELETE is used to remove an entry; it's easy to implement: just delete the associated file.
def delete_atom_entry(wikiword):
    ret = report_status(200, "OK", "Delete successful.")
    if wikiwordExists(wikiword):
        try:
            os.unlink(getpath(wikiword))
        except:
            ret = report_status(500, "Internal Server Error",
                "Can't remove the file associated with that word.")
    return ret
Note that unless something really bad happens, we return with a status code of 200 OK. That is, if the entry doesn't exist then we still return 200. You might be scratching your head if you remember we just talked about our implementation always returning an entry for every valid WikiWord, whether or not it actually had filled in content. That is, if you come right back and do a GET on the URI we just DELETE'd, it will not give you a 404, but instead will return the default filled in entry, "Create this page". Is this a problem? No. It may seem a bit odd, but it's not a problem at all. DELETE and GET are two different, orthogonal requests. There is no guarantee that some other agent, or some process on the server itself, didn't come along and recreate that URI between the DELETE and the GET.
Supporting PUT allows us to change the content of a WikiWord. To make the handling of XML easier I've used the Python wrapper for libxml2, an excellent tool for handling XML, in particular because it lets you use XPath expressions to query XML documents. In this case we're using them to pull out the content element.
def put_atom_entry(wikiword, content):
    ret = report_status(200, "OK", "Entry successfully updated.")
    doc = libxml2.parseDoc(content)
    ctxt = doc.xpathNewContext()
    ctxt.xpathRegisterNs('atom', 'http://purl.org/atom/ns#')
    text_plain_content_nodes = ctxt.xpathEval(
        '/atom:entry/atom:content[@type="text/plain" or not(@type)]')
    all_content_nodes = ctxt.xpathEval('/atom:entry/atom:content')
    content = ""
    if len(text_plain_content_nodes) > 0:
        content = text_plain_content_nodes[0].content
    if len(text_plain_content_nodes) > 0 or len(all_content_nodes) == 0:
        writeWordDef(wikiword, content)
        append_editlog(wikiword, os.environ.get('REMOTE_ADDR', ''))
    else:
        # There are 'content' elements, but of some unknown type.
        ret = report_status(415, "Unsupported Media Type",
            "This wiki only supports plain text")
    return ret
The detail to notice in the implementation is the XPath used to pick out the content element. Content elements may have a 'type' attribute, but if it is not present then it defaults to 'text/plain'. Since 'text/plain' is the only type of content we can support in a wiki, it's the only type of content we'll look for.
That takes care of the EditURI; we just have the PostURI and FeedURI to go.
PostURI
The PostURI is used for creating new WikiWord entries.
def create_atom_entry(body):
    wikiword = extractWikiWord(body)
    if wikiword:
        if wikiwordExists(wikiword):
            ret = report_status(409, "Conflict",
                "An entry with that name already exists.")
        else:
            ret = put_atom_entry(wikiword, body)
            if (ret[0] == 200):
                ret = (201, CREATED_RESP % {
                    'base_uri': base_uri,
                    'atom_base_uri': atom_base_uri,
                    'wikiword': wikiword})
    else:
        ret = report_status(409, "Conflict",
            "Not enough information to form a wiki word.")
    return ret
The function 'extractWikiWord' pulls out the contents of the title element and converts it into a WikiWord. If we have a good WikiWord and it doesn't already exist, then we use 'put_atom_entry' to create it. Otherwise we respond with an HTTP status code of 409 to indicate that we won't let a POST overwrite an already existing WikiWord.
FeedURI
The FeedURI is the last piece we need to implement. The FeedURI is used by clients to locate the PostURI for creating new entries and the EditURIs for editing each entry. The format of the FeedURI is exactly that of an Atom feed. This is different from the Atom we use with the PostURI and the EditURI, which is just the 'entry' element from Atom. Since the format of the FeedURI is the same as that of a regular feed, you might be tempted to use the same feed for both aggregation and editing. This might work in the case of a wiki, but not for a general site: you may have entries in draft or unpublished form which must appear at the FeedURI so you can edit them, but must not appear in your aggregation feed. Given that this is a publicly editable wiki, we don't have such a constraint, so we can use this feed for both purposes.
The FeedURI is implemented as a separate script, atomfeed.cgi, that builds a feed. The code, which is a bit too long to include here, builds an Atom feed by sorting all the files that contain WikiWord definitions in reverse chronological order, then takes each WikiWord and its associated content and formats it as an Atom entry. The entries are concatenated together and placed in an Atom feed. The only special additions are the link elements that contain the PostURI and the EditURIs, which are denoted with the attributes rel="service.post" and rel="service.edit" respectively. Here is a snippet from the Atom feed produced by atomfeed.cgi.
<?xml version="1.0" encoding="utf-8"?>
<feed version="0.3" xmlns="http://purl.org/atom/ns#">
  <title>PikiPiki</title>
  <link rel="alternate" type="text/html"
        href="http://piki.bitworking.org/piki.cgi"/>
  <link rel="service.post" type="application/atom+xml"
        href="http://piki.bitworking.org/atom.cgi"/>
  <link rel="next" type="application/atom+xml"
        href="http://piki.bitworking.org/atomfeed.cgi/10"/>
  <modified>2004-03-09T21:32:58-05:00</modified>
  <author>
    <name>Joe Gregorio</name>
    <url>http://bitworking.org/</url>
  </author>
  <entry>
    <title>JustTesting</title>
    <link rel="service.edit" type="application/atom+xml"
          href="http://piki.bitworking.org/atom.cgi/JustTesting" />
    <link rel="alternate" type="text/html"
          href="http://piki.bitworking.org/piki.cgi/JustTesting" />
    <id>tag:piki.bitworking.org,2004:JustTesting</id>
    <issued>2004-03-09T21:32:58-05:00</issued>
    <modified>2004-03-09T21:32:58-05:00</modified>
    <content type="text/plain">
        This is content posted from an AtomAPI client.
    </content>
  </entry>
  <entry>
    <title>PikiSandBox</title>
    <link rel="service.edit" type="application/atom+xml"
          href="http://piki.bitworking.org/atom.cgi/PikiSandBox" />
    <link rel="alternate" type="text/html"
          href="http://piki.bitworking.org/piki.cgi/PikiSandBox" />
    <id>tag:piki.bitworking.org,2004:PikiSandBox</id>
    <issued>2004-03-04T21:49:03-05:00</issued>
    <modified>2004-03-04T21:49:03-05:00</modified>
    <content type="text/plain">
        '''I dare you''': press the Edit button and add something
        to this page. -- MartinPool
    </content>
  </entry>
</feed>
This feed also contains one more link element of a type we haven't talked about yet. The second link, the one with rel="next", points to the next set of entries. That is, when producing the feed you don't want to put all the entries into a single feed; that could end up being hundreds if not thousands of entries, which would be impractical to handle. Instead, put in a fixed number, like 20, and have the 'next' link point to another feed with the next 20 entries. If a feed is in the middle of such a chain then it also contains a link with rel="prev", which points to the set of entries previous to the current one. In this way clients can navigate around the list of entries in manageably sized sets. It should be noted here that the client code that comes with this implementation does not implement traversing the 'next' and 'prev' links in a feed.
The Client
An AtomAPI enabled wiki wouldn't be worth much if there wasn't a client available, so I've included a wxPython client that allows you to create new entries on the wiki and to edit old entries.
Remember how careful we were when specifying and using the character encoding? There isn't much code involved in supporting and processing everything in UTF-8, but careful planning ahead pays dividends. Here is a screenshot of the client editing one of the pages on a wiki with some Unicode characters in it:
All of the source for both the client and the server can be downloaded from the EditableWebWiki, which is running the code described above. Note that the client is a GUI application written in Python; you must use the version of wxPython that is compiled with Unicode support. Lastly, you'll have to ensure that your platform has fonts available to display the Unicode characters you are going to be using.
Rough Spots
One of the reasons we started using the AtomAPI on a wiki was to stretch the API and see where things broke down. Nothing really awful showed up, though we did find some rough spots. The first rough spot cropped up when doing a GET on the EditURI, where we encounter a slight mismatch between the formulation of the AtomAPI and this wiki implementation. The problem is that, according to version 9 of the draft AtomAPI, when doing a GET on an EditURI the issued element is required. Since PikiPiki only stores the raw contents in a file, and doesn't store any other data, we are limited to using the last-modified date stored in the file system for each file, which isn't the same as the issued element.
The second rough spot is in the area of content. The only type of content we accept is 'text/plain', but that isn't the only type of content a client could post. In fact, most can produce 'text/html' and some may even be able to produce 'application/xhtml+xml'. Now we may be able to add code to this implementation to convert HTML into WikiML, but the broader question still stands: how does a client know what kinds of content, i.e., which mime-types, an AtomAPI server will accept? This is an open question as of today.
Summary
Using Python and the XPath facilities of libxml2, it was straightforward to build an AtomAPI implementation for a wiki. There isn't even very much code: atom.cgi is just 146 lines of code, while atomfeed.cgi is just 122 lines.
This is just a basic implementation that does the minimum to support the AtomAPI. In a future article the way the server handles HTTP can be enhanced to provide significant performance boosts by using the full capabilities of HTTP. In addition, SOAP-enabling the server will require some changes. After that we can add the ability to edit the wiki's templates.