Dispatching in a REST Protocol Application
August 17, 2005
In my last column I covered how to dispatch based on mime type. That's only part of the dispatching story. The other two major pieces of information to use when dispatching are the HTTP method and the URI. Let's look at the HTTP method first.
The first and easiest way to handle the HTTP method is to handle different methods within the same CGI application. When a request comes in, the HTTP method is passed in via the REQUEST_METHOD environment variable. We can just look up the right handler:
    #!/usr/bin/python
    import os

    # The web server passes the HTTP method in REQUEST_METHOD.
    method = os.environ.get('REQUEST_METHOD', 'GET')

    print("Content-type: text/plain")
    print("")
    if method in ['GET', 'HEAD']:
        print(method.lower())
    elif method == 'POST':
        print(method.lower())
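The same lookup can be written as a table instead of an if/elif chain. This is only a sketch, not part of the original script: the handler functions, the `HANDLERS` table, the `respond()` helper and the 405 fallback are all hypothetical names and choices made for illustration.

```python
import os
import sys

# Hypothetical handlers; each returns a (status, body) pair.
def handle_get(method):
    # GET and HEAD share a handler, as in the script above.
    return "200 OK", method.lower()

def handle_post(method):
    return "200 OK", method.lower()

def handle_unsupported(method):
    # Assumption: unknown methods get a 405 rather than being ignored.
    return "405 Method Not Allowed", "unsupported method: " + method

# Map each HTTP method to the function that handles it.
HANDLERS = {"GET": handle_get, "HEAD": handle_get, "POST": handle_post}

def respond(environ):
    """Build the CGI response text for the method found in environ."""
    method = environ.get("REQUEST_METHOD", "GET")
    handler = HANDLERS.get(method, handle_unsupported)
    status, body = handler(method)
    return "Status: %s\nContent-type: text/plain\n\n%s\n" % (status, body)

if __name__ == "__main__":
    sys.stdout.write(respond(os.environ))
```

Supporting another method then means adding one entry to the table rather than another branch.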
That's not our only choice, though. Because we are being RESTful, using only a handful of methods with well-defined semantics, we can dispatch based on method at completely different levels.
We can dispatch requests to the same URI to different handlers based on the method. For example, let's pretend we have different CGI applications, one for each of the methods we use: get.cgi, post.cgi, put.cgi, delete.cgi. And further assume that these CGI applications are located in the directory /myresource/. A GET request to /myresource/ needs to be dispatched to /myresource/get.cgi, and a POST request to /myresource/ needs to be dispatched to /myresource/post.cgi, etc. This is easy to do with Apache's mod_rewrite.
Ok, "easy to do" is a little misleading. Many things are easy in mod_rewrite because it is so powerful. But any powerful module can also be dangerous. Ask me sometime about that recursive mod_rewrite rule I wrote that brought my shared host to its knees.
The great thing about mod_rewrite is it gives you all the configurability and flexibility of Sendmail. The downside to mod_rewrite is that it gives you all the configurability and flexibility of Sendmail.
-- Brian Behlendorf, Apache Group
One of the many things that mod_rewrite can do is rewrite a URI based on those same environment variables we used in our CGI application. Here is a .htaccess file that does the above rewriting of URIs based on the request method:
    RewriteEngine On
    RewriteBase /myresource/
    RewriteCond %{REQUEST_METHOD} ^GET$
    RewriteRule (^.*$) get.cgi [L]
    RewriteCond %{REQUEST_METHOD} ^POST$
    RewriteRule (^.*$) post.cgi [L]
Lather, rinse, and repeat for each method that you want to support. Not only can the requests be redirected to other CGI applications on the local server, they can be redirected to a completely different server. That means we could parcel out requests across many servers, distributing the load. Of course, that means that this server is acting as a reverse proxy, for which there's already an Apache module: mod_proxy. But we won't go there right now.
The Simplest Thing that Could Possibly Work
When it comes to dispatching on the URI, we'll start with the simplest thing that can possibly work: a single CGI application that handles all of our requests. As our service grows wildly in popularity we may have to change how we dispatch, but more on that later.
Dispatching in Python
Let's build a simple Python module to help dispatch incoming requests. The design goals for this module are:
- Not a framework, just a library.
- Provide a simple function robustly.
- Allow dispatching based on method and mime type.
The module dispatch.py defines a single class for dispatching: BaseHttpDispatch. To use the module just subclass BaseHttpDispatch and define your own member functions for the types of requests that you want to handle. Dispatching is a matter of instantiating the derived class and calling dispatch() with the requested method and media range. For example, if you wanted to handle POSTs with a mime type of application/xbel+xml and have those requests routed to a member function called POST_xbel(), here is the class you would define:
    class MyHandler(BaseHttpDispatch):
        def __init__(self):
            BaseHttpDispatch.__init__(self,
                {'application/xbel+xml': 'xbel'})

        def POST_xbel(self):
            pass
Note how the __init__() function of BaseHttpDispatch takes a mapping from application/xbel+xml to xbel. That mapping is used when looking up the function name to call:
    handler = MyHandler()
    handler.dispatch('POST', 'application/xbel+xml')
This will call the POST_xbel() member function. You can handle any number of mime types and methods, and even create fallback functions, such as

    def GET(self):
        pass
This will get called if no other GET function with a mime-type specifier matches.
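To make the lookup order concrete, here is a small self-contained sketch. It is not the real module: `MiniDispatch` is a hypothetical stand-in that matches mime types by exact string comparison rather than by media-range matching, and its handlers return plain strings so the chosen function is easy to see.

```python
class MiniDispatch(object):
    """Simplified stand-in for BaseHttpDispatch (exact mime-type match only)."""

    def __init__(self, mime_types_supported=None):
        self.mime_types_supported = mime_types_supported or {}

    def dispatch(self, method, mime_type):
        short_name = self.mime_types_supported.get(mime_type, "")
        # Try the specific name first (e.g. GET_xbel), then the fallback (GET).
        for name in (method + "_" + short_name, method):
            handler = getattr(self, name, None)
            if callable(handler):
                return handler()
        return "no match"


class MyHandler(MiniDispatch):
    def __init__(self):
        MiniDispatch.__init__(self, {"application/xbel+xml": "xbel"})

    def GET_xbel(self):
        return "GET_xbel"

    def GET(self):
        return "GET"


handler = MyHandler()
print(handler.dispatch("GET", "application/xbel+xml"))  # specific handler
print(handler.dispatch("GET", "text/html"))             # fallback handler
print(handler.dispatch("DELETE", "text/html"))          # nothing defined
```

A request whose mime type has no specific handler falls through to the bare method name, and a method with no handler at all reaches the no-match case.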
Here is the dispatch.py module:
    try:
        from StringIO import StringIO   # Python 2
    except ImportError:
        from io import StringIO         # Python 3

    import mimeparse  # supplies best_match(), used in dispatch()


    class BaseHttpDispatch:
        """Dispatch HTTP events based on the method and requested mime-type"""

        def __init__(self, mime_types_supported = {}):
            """mime_types_supported is a dictionary that maps
            supported mime-type names to the shortened names
            that are used in dispatching.
            """
            self.mime_types_supported = mime_types_supported

        def nomatch(self, method, mime_type):
            """This is the default handler called if there is no
            match found. Override to add your own behaviour."""
            return ({"Status": "404 Not Found", "Content-type": "text/plain"},
                    StringIO("The requested URL was not found on this server."))

        def exception(self, method, mime_type, exception):
            """This is the default handler called if an exception
            occurs while processing."""
            return ({"Status": "500 Internal Server Error", "Content-type": "text/plain"},
                    StringIO("The server encountered an unexpected condition "
                             "which prevented it from fulfilling the request."))

        def _call_fn(self, fun_name, method, mime_type):
            # Any exception raised by a handler becomes a 500 response.
            try:
                return getattr(self, fun_name)()
            except Exception as e:
                return self.exception(method, mime_type, e)

        def dispatch(self, method, mime_type):
            """Pass in the method and the mime-type and the best
            matching function will be called. For example, if
            BaseHttpDispatch is constructed with a mime type map
            that maps 'text/xml' to 'xml', then if dispatch is
            called with 'POST' and 'text/xml' it will first look
            for 'POST_xml' and then if that fails it will try to
            call 'POST'.

            Each function so defined must return a tuple
            (headers, body) where 'headers' is a dictionary of
            headers for the response and 'body' is any object
            that simulates a file.
            """
            returnValue = ({}, StringIO(""))
            if mime_type and self.mime_types_supported:
                match = mimeparse.best_match(self.mime_types_supported.keys(), mime_type)
                mime_type_short_name = self.mime_types_supported.get(match, '')
            else:
                mime_type_short_name = ""
            fun_name = method + "_" + mime_type_short_name
            if fun_name in dir(self) and callable(getattr(self, fun_name)):
                returnValue = self._call_fn(fun_name, method, mime_type)
            elif method in dir(self) and callable(getattr(self, method)):
                returnValue = self._call_fn(method, method, mime_type)
            else:
                returnValue = self.nomatch(method, mime_type)
            return returnValue
Example
Let's stub out our bookmark service using dispatch.py. Here are the target URIs we want to handle:
URI | Type of Resource | Description |
---|---|---|
[user]/bookmark/[id]/ | Bookmark | A single bookmark for "user." |
[user]/bookmarks/ | Bookmark Collection | The 20 most recent bookmarks for "user." |
[user]/bookmarks/all/ | Bookmark Collection | All the bookmarks for "user." |
[user]/bookmarks/tags/[tag] | Bookmark Collection | The 20 most recent bookmarks for "user" that were filed in the category "tag." |
[user]/bookmarks/date/[Y]/[M]/ | Bookmark Collection | All the bookmarks for "user" that were created in a certain year [Y] or month [M]. |
[user]/config/ | Keyword List | A list of all the "tags" a user has ever used. |
We'll assume that there is a single CGI application, bookmark.cgi, that handles all of these URIs. So, for example, our first URI is
http://example.com/bookmark.cgi/[user]/bookmark/[id]/
We'll subclass BaseHttpDispatch for each of the types of URIs. Here is the class that will handle the Bookmark URI:
    class Bookmark(BaseHttpDispatch):
        def __init__(self):
            BaseHttpDispatch.__init__(self, {'application/xbel+xml': 'xbel'})

        def GET_xbel(self):
            pass

        def PUT_xbel(self):
            pass

        def DELETE(self):
            pass
This is just a stub, and when we come back to fill out this class we'll replace the stubbed code with code that actually, you know, does something. Here is the class that handles the [user]/bookmarks/ resource:
    class Bookmarks(BaseHttpDispatch):
        def __init__(self):
            BaseHttpDispatch.__init__(self, {'application/xbel+xml': 'xbel'})

        def GET_xbel(self):
            pass

        def POST_xbel(self):
            pass
You get the idea. The last, missing piece is mapping from URIs into instances of our classes. We can do this by picking the class to instantiate based on the path. Recall that the path components after the CGI application come in on the PATH_INFO environment variable. Just look at PATH_INFO, figure out which class to instantiate, and then call dispatch() on it. We'll leave that bit of code as an exercise.
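For readers who want a starting point for that exercise, here is one possible sketch of the path-to-class step. It is an illustration under assumptions, not the author's solution: the `ROUTES` table, the regular expressions and the `resolve()` function are hypothetical, and the two classes are empty stand-ins for the stubs above so the block runs on its own.

```python
import re

# Empty stand-ins for the Bookmark and Bookmarks classes stubbed above.
class Bookmark(object):
    pass

class Bookmarks(object):
    pass

# Hypothetical routing table: (pattern for PATH_INFO, class to instantiate).
ROUTES = [
    (re.compile(r"^/(?P<user>[^/]+)/bookmark/(?P<id>[^/]+)/$"), Bookmark),
    (re.compile(r"^/(?P<user>[^/]+)/bookmarks/$"), Bookmarks),
]

def resolve(path_info):
    """Return (handler_class, params) for the first matching route,
    or (None, {}) if no route matches."""
    for pattern, handler_class in ROUTES:
        match = pattern.match(path_info)
        if match:
            return handler_class, match.groupdict()
    return None, {}

handler_class, params = resolve("/joe/bookmark/42/")
print(handler_class.__name__, sorted(params.items()))
```

In bookmark.cgi the value passed to `resolve()` would come from `os.environ.get('PATH_INFO', '')`. The resulting class would then be instantiated and its dispatch() called with the request's method and media range.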
Given our simple dispatching class we actually have lots of different ways that we can break up our service. For example, consider this URI from our bookmark service:
http://example.org/bookmark/[user]/bookmarks/date/[Y]/[M]/
We can implement this in any of the following ways:
    http://example.org/bookmark.cgi/[user]/bookmarks/date/[Y]/[M]/
    http://example.org/bookmark/[user]/bookmarks.cgi/date/[Y]/[M]/
    http://example.org/bookmark/[user]/bookmarks/date.cgi/[Y]/[M]/
And don't get too hung up on the fact that [user] comes so early in the URI; we can also use mod_rewrite to move the user to the end, or even tack it on the end as a query parameter to the CGI application that ultimately gets called.
There are two things to note here. First, even though we are using a .cgi extension for our CGI applications, we really don't have to have that as our exposed, public URI. We can use mod_rewrite to mask the extension. That's a good idea since we don't want our URI structure beholden to our current implementation details.
The second thing to note is that all of this flexibility didn't just land in our laps out of sheer luck. That power and flexibility came by deciding up front to design a RESTful service that uses separate resources for all the things in our service. Once we make that decision then we get a service that has a lot of options for scaling.
Scaling Up
Now our initial implementation was simple and put all the functionality into a single CGI application. Here are some ways we can modify our bookmark service to handle increased load.
Make some content static. Apache is fantastic at serving up static content, so one way to optimize the system would be to keep static versions of frequently requested resources and to route GET requests to those static versions.
    -> (POST,PUT,DELETE) -> /bookmark.cgi
       (GET)             -> /some-static-document-uri
Mod_proxy. Remember I mentioned mod_proxy earlier? That can be used to distribute the requests over a group of machines.
    -> [reverse proxy] -> server1/bookmark.cgi
                       -> server2/bookmark.cgi
                       -> server3/bookmark.cgi
That might work if each bookmark collection could be updated from any server. If not, then distribute the GETs while keeping all the PUTs, POSTs, and DELETEs to one server.
    -> [reverse proxy] -> POSTPUTDELserver/bookmark.cgi
                       -> GETserver1/bookmark.cgi
                       -> GETserver2/bookmark.cgi
                       -> GETserver3/bookmark.cgi
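In practice this routing would live in the proxy's configuration, but the decision it makes is simple enough to sketch in a few lines of Python. The server names are taken from the diagram above. Everything else is an assumption made for illustration: the `pick_backend()` function, the round-robin rotation among the read servers, and the choice to treat HEAD like GET.

```python
import itertools

# The single server that accepts all state-changing requests.
WRITE_BACKEND = "POSTPUTDELserver"

# Read-only requests rotate among these servers (round-robin is an assumption).
READ_BACKENDS = itertools.cycle(["GETserver1", "GETserver2", "GETserver3"])

def pick_backend(method):
    """Return the name of the server that should handle this HTTP method."""
    if method in ("GET", "HEAD"):
        return next(READ_BACKENDS)
    return WRITE_BACKEND

for method in ("GET", "POST", "GET", "DELETE", "GET", "GET"):
    print(method, "->", pick_backend(method) + "/bookmark.cgi")
```

Because GET has well-defined, safe semantics, the read servers can be added or removed without touching the one server that owns writes.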
These aren't the only ways to handle an increased load. Part III: Advanced Setup and Performance of the mod_perl User Guide is a good starting point for learning the options, and the pros and cons, of each strategy.
Source
The source for dispatch.py is, of course, freely available.