Introducing WSGI: Python's Secret Web Weapon
September 27, 2006
Web Server Gateway Interface Part I: Getting Started
The recent Python 2.5 release features the addition of the Web Server Gateway Interface Utilities and Reference Implementation package (wsgiref) to Python's standard library.
In this two-part article, we will look at what the Web Server Gateway Interface is, how to use it to write web applications, and how to use middleware components to quickly add powerful functionality. Before diving into these topics, we will also take a brief look at why the specification was created in the first place.
The Many Frameworks Problem
Python is a great language for web development. It is straightforward to learn, has a broad and powerful standard library, and benefits from an active community of developers who maintain a range of XML and database tools, templating languages, servers, and application frameworks. In 2003, when the Web Server Gateway Interface specification was drawn up, the Python community also had one major problem. It was often easier for developers to write their own solutions to web-development problems from scratch rather than reusing and improving existing projects. This resulted in a proliferation of largely incompatible web frameworks. If developers wanted a full and powerful solution, they could use Zope, but many of them coming into the community in search of a lightweight framework found it hard to know where to start.
Developers within the Python community quickly recognized that it would be preferable to have fewer and better-supported frameworks, but since each framework had its strengths and weaknesses at the time, none stood out as a clear candidate for adoption.
In the Java world, the servlet architecture meant that applications written with one framework could run on any server supporting the servlet API. The Web Server Gateway Interface (often written WSGI, pronounced "whiskey") was designed to bring the same interoperability that the Java world enjoyed to Python, and to go some way toward unifying the Python web-framework world without stifling the diversity.
The full specification is defined in PEP 333. (PEP is an acronym for Python Enhancement Proposal.) The abstract of the PEP sums up the specification's goals very clearly, stating:
"This document specifies a proposed standard interface between web servers and Python web applications or frameworks, to promote web application portability across a variety of web servers."
Most Python web frameworks today have a WSGI adapter, and most server technologies (Apache, mod_python, FastCGI, CGI, etc.) can be used to run WSGI applications, so the vision of web-application portability is fast becoming a reality.
The separation of server from application has the clear benefit that application developers supporting the API do not have to worry about how their applications are served, and server developers need only create one interface to support the majority of Python web applications. Simply supporting the WSGI is enough for server and application developers to guarantee a large degree of interoperability.
In the next sections, we will look at how to develop and deploy WSGI applications; in Part II of this article, we will look at how to use middleware components to provide facilities such as session handling, interactive debugging, and much more.
The HTTP Protocol
The Web is largely built on the HTTP protocol, so to understand the WSGI, it is essential to understand how the Web works at the HTTP protocol level.
It is very useful to be able to see the HTTP information being sent back and forth between a web browser and a server. One good tool for doing this is the LiveHTTPHeaders extension for the Firefox web browser, which, when loaded, is visible in the sidebar and displays the HTTP information sent and received on each request. Once it's installed, you can select View->Sidebar->LiveHTTPHeaders from the menu to load the extension in the sidebar.
When you request a page, the browser sends an HTTP request to the server. When the server receives that request, it will perform some action (typically running an application or serving a file) and return an HTTP response. Below is the HTTP information sent when visiting the PEP 333 page:
Figure 1. HTTP Request
and here is the response returned:
Figure 2. HTTP Response
As you can see, the browser sends quite a lot of information to the server. The server makes this and other information available to the application. The application then returns a status code (in this case, 200 OK) along with any HTTP headers it wishes to send back. Of particular note is the Content-type header, which tells the browser what sort of content is going to be sent (in this case, text/html). Finally, the application returns the content that will be displayed by the browser (in this case, the HTML that makes up the page). The server may add extra HTTP headers or perform other modifications before the response is returned.
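To make the shape of a response concrete, here is a minimal sketch that splits a raw HTTP response into its status, headers, and body. The response text is hand-written for illustration (it is not the captured traffic from the figures), parse_response is a hypothetical helper name, and the code assumes Python 3.

```python
# A hand-written example response; real servers usually send more headers.
RAW_RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 38\r\n"
    b"\r\n"
    b"<html><body>Hello World!</body></html>"
)


def parse_response(raw):
    """Split a raw HTTP/1.x response into (status, headers, body)."""
    # A blank line (CRLF CRLF) separates the head from the body.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # The status line looks like "HTTP/1.1 200 OK".
    _version, status = lines[0].split(" ", 1)
    headers = []
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    return status, headers, body


status, headers, body = parse_response(RAW_RESPONSE)
print(status)
print(headers)
```

The status string and the list of (name, value) header pairs recovered here are the same two pieces of information a WSGI application hands to the server.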
Hello World!
Here is a simple CGI application that would produce a Hello World! message in the browser:
print "Content-Type: text/html\n\n"
print "<html><body>Hello World!</body></html>"
and here is the same application as a WSGI application:
def application(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/html')])
    return ['<html><body>Hello World!</body></html>']
At first glance, this looks a little complicated, so let's think about what information a server needs from an application and vice versa. First of all, an application needs to know about the environment in which it is running. In a CGI application, you can obtain this information with this code:
import os
environ = os.environ
In a WSGI application, this information is supplied directly by the WSGI server as the first parameter to the application callable; in our example above, this parameter is named environ.
The server also needs to know the status and headers to set before any content is returned to the browser, so it supplies a second argument to the application -- a callback function taking the status and a list of tuple pairs of headers as arguments. In our example, it is named start_response. Our application calls start_response() to set a status of 200 OK, which means everything went fine, and to set the Content-type header to text/html.
Finally, the server needs to know the content to return to the browser. Rather than requiring an application to return all the content in one go, the server iterates over the application, returning data to the browser as the application returns it. In our example, this result is achieved by returning a simple list of strings containing the HTML to display the Hello World! message.
In summary then, a WSGI application is any callable (in our case, a simple function) taking an environment dictionary and start_response callable as positional parameters. The application should call start_response() with a status code and a list of tuple pairs of headers before it returns an iterable response (in our case, a list with just one string). In normal circumstances, applications can only call start_response() once; after all, it wouldn't make a lot of sense to start the response twice.
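To see this calling convention from the server's side, the sketch below invokes an application by hand, much as a very naive server might. It assumes Python 3, where PEP 3333 (the Python 3 revision of PEP 333) requires the response body to be bytes rather than the str used in this article's Python 2 listings. call_application is a hypothetical helper, not part of wsgiref.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/html')])
    return [b'<html><body>Hello World!</body></html>']


def call_application(app, environ=None):
    """Call a WSGI app once and collect (status, headers, body)."""
    environ = dict(environ or {})
    setup_testing_defaults(environ)  # fill in a minimal fake request
    captured = {}

    def start_response(status, headers, exc_info=None):
        # Record what the application asked the server to send.
        # (A real server would also return a write() callable here.)
        captured['status'] = status
        captured['headers'] = headers

    result = app(environ, start_response)
    try:
        # The server iterates over whatever the application returned.
        body = b''.join(result)
    finally:
        # The specification asks servers to call close() if it exists.
        if hasattr(result, 'close'):
            result.close()
    return captured['status'], captured['headers'], body


status, headers, body = call_application(application)
print(status, headers, body)
```

Reading the captured status only after the body has been joined matters: an application written as a generator does not call start_response() until iteration begins.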
These requirements can also be met by other kinds of callable: generator functions, instances of classes that define the __call__() method, and classes whose instances define the __iter__() method. It is possible to use any of these as WSGI applications instead of following the example above and using a simple function.
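The three alternatives might look something like the following sketch. The class and function names are invented for illustration, the run() helper is a hypothetical stand-in for a server, and the code assumes Python 3 (so response bodies are bytes).

```python
class CallableApp:
    """An instance is the application: the server calls instance(...)."""

    def __init__(self, greeting):
        self.greeting = greeting

    def __call__(self, environ, start_response):
        start_response('200 OK', [('Content-type', 'text/plain')])
        return [self.greeting]


class IterableApp:
    """The class is the application: calling it builds an iterable."""

    def __init__(self, environ, start_response):
        self.environ = environ
        self.start_response = start_response

    def __iter__(self):
        self.start_response('200 OK', [('Content-type', 'text/plain')])
        yield b'Hello from a class with __iter__\n'


def generator_app(environ, start_response):
    """A generator function: the body is produced chunk by chunk."""
    start_response('200 OK', [('Content-type', 'text/plain')])
    yield b'Hello '
    yield b'from a generator\n'


def run(app):
    """Drive an app with a stub start_response; return (status, body)."""
    seen = []

    def start_response(status, headers, exc_info=None):
        seen.append(status)

    body = b''.join(app({}, start_response))
    return seen[0], body


for app in (CallableApp(b'Hello from __call__\n'), IterableApp, generator_app):
    print(run(app))
```

Note that IterableApp is passed to run() as a class, not as an instance, whereas CallableApp must be instantiated first.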
The WSGI specification also discusses a third parameter to start_response() named exc_info, used in error handling, and a writable object returned by start_response(), used for backward compatibility -- not for use in new applications or frameworks. (We do not need to worry about these details, but they are mentioned for completeness.)
Testing the Application
We will start by running our test application as a CGI script. Create a new file named test.py and add the following to it:
from wsgiref.handlers import CGIHandler

def application(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/html')])
    return ['<html><body>Hello World!</body></html>']

CGIHandler().run(application)
If you are running a version of Python prior to 2.5, you will need to download and install the wsgiref package. You can do this by extracting the wsgiref-0.1.2.zip file (or most recent update) and executing the command:
> python setup.py install
You can then run the test application from the command line to see what information it outputs:
> python test.py
You will see the following:
Figure 3. Test output
Notice that the CGIHandler has acted the way any other CGI server would, and added the Status and Content-Length headers to the output.
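The same behavior can be inspected without a terminal by capturing what the handler writes. The sketch below is one way to do that, assuming Python 3: it uses wsgiref's BaseCGIHandler (the parent class of CGIHandler, which accepts explicit streams) with in-memory buffers in place of the real standard streams. The variable names are illustrative.

```python
import io
from wsgiref.handlers import BaseCGIHandler
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/html')])
    return [b'<html><body>Hello World!</body></html>']


# Build a minimal fake request environment instead of reading os.environ.
environ = {}
setup_testing_defaults(environ)

captured = io.BytesIO()      # stands in for standard output
handler = BaseCGIHandler(
    io.BytesIO(),            # stdin: empty request body
    captured,                # stdout: where the response is written
    io.StringIO(),           # stderr: error log
    environ,
)
handler.run(application)

output = captured.getvalue()
print(output.decode('iso-8859-1'))
```

The captured bytes begin with the Status line the handler adds, followed by the application's own header, the computed Content-Length, a blank line, and the body.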
Of course, one of the original motivations for the WSGI was to run the same application on multiple WSGI servers without modification. As an example, this is how you would serve the same application using FastCGI instead of CGI, using the flup package:
from flup.server.fcgi import WSGIServer
WSGIServer(application).run()
You can also run this same, unmodified application on all other WSGI-compliant servers.
The environ Dictionary
We've seen how to write a simple WSGI application and deploy it as a CGI script or use it with FastCGI, but to write any real application, we'll need access to the environment. This is provided by the first parameter passed to the application by the server, the environ dictionary.
The environ dictionary contains all the information about the environment and the request that the application needs. Below is an application named show_environ that displays a list of all the keys in the environ dictionary:
def show_environ(environ, start_response):
    start_response('200 OK', [('Content-type', 'text/html')])
    sorted_keys = sorted(environ.keys())
    return [
        '<html><body><h1>Keys in <tt>environ</tt></h1><p>',
        '<br />'.join(sorted_keys),
        '</p></body></html>',
    ]

from wsgiref import simple_server

httpd = simple_server.WSGIServer(
    ('', 8000),
    simple_server.WSGIRequestHandler,
)
httpd.set_app(show_environ)
httpd.serve_forever()
This example serves the application using a standalone server. If you run this example, you will be able to see the output from the application by visiting http://localhost:8000. You will notice that environ contains all the usual keys you would expect to see in a CGI environment, such as PATH_INFO, REMOTE_ADDR, etc. It also contains the Web Server Gateway Interface keys wsgi.errors, wsgi.file_wrapper, wsgi.input, wsgi.multiprocess, wsgi.multithread, wsgi.run_once, wsgi.url_scheme, and wsgi.version, which provide information about the WSGI environment. The use of these variables is described in the environ variables section of the specification.
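As a small illustration of reading values rather than just listing keys, the sketch below echoes a few CGI-style and wsgi.* entries. It assumes Python 3 (bytes bodies); describe_request is a hypothetical name, and the request is faked with wsgiref.util.setup_testing_defaults rather than coming from a real server.

```python
from wsgiref.util import setup_testing_defaults


def describe_request(environ, start_response):
    """Echo a few CGI-style and wsgi.* values from environ."""
    lines = [
        'method: %s' % environ['REQUEST_METHOD'],
        'path: %s' % environ.get('PATH_INFO', ''),
        'scheme: %s' % environ['wsgi.url_scheme'],
        # wsgi.version is a tuple such as (1, 0).
        'wsgi version: %s.%s' % environ['wsgi.version'],
    ]
    start_response('200 OK', [('Content-type', 'text/plain')])
    return ['\n'.join(lines).encode('utf-8')]


# Fake a GET request for /hello; the helper fills in the wsgi.* keys.
environ = {'REQUEST_METHOD': 'GET', 'PATH_INFO': '/hello'}
setup_testing_defaults(environ)

body = b''.join(describe_request(environ, lambda status, headers: None))
print(body.decode('utf-8'))
```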
The wsgi.errors key contains an output stream (filelike object). Any information written to the output stream is sent by the server to its error log. This can be useful for debugging purposes, as well as issuing errors or warnings, and it can be used like this:
environ['wsgi.errors'].write('This will be sent to the error log')
Adding this line to the example above, restarting the server, and refreshing the page outputs the message to the console as expected:
Figure 4. Console
From the information in environ, you can build any web application, and tools already exist to simplify common tasks, such as extracting the variables submitted in a form from the QUERY_STRING or rebuilding the original URL. Have a look at Paste for the lower-level tools or Pylons for a full WSGI framework.
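For a feel of what such tools do, both tasks can be roughly sketched with the Python 3 standard library alone. This is not how Paste does it: the example substitutes urllib.parse.parse_qs and wsgiref.util.request_uri, and the environ values are made up for the example.

```python
from urllib.parse import parse_qs
from wsgiref.util import request_uri

# A hand-built environ for a request to /search?q=wsgi&page=2.
environ = {
    'wsgi.url_scheme': 'http',
    'HTTP_HOST': 'localhost:8000',
    'SCRIPT_NAME': '',
    'PATH_INFO': '/search',
    'QUERY_STRING': 'q=wsgi&page=2',
}

# Each form variable maps to a list, since a name may appear more than once.
form = parse_qs(environ['QUERY_STRING'])

# Reassemble scheme, host, path, and query string into the requested URL.
url = request_uri(environ, include_query=True)

print(form)
print(url)
```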
The environ dictionary can actually contain one other type of key: a key defined by the server or a middleware component. Such keys are named using only lowercase letters, numbers, dots, and underscores, and are prefixed with a name that is unique to the defining server or middleware. In Part II of this article, we will look at WSGI middleware and how to add it to existing applications. We will also see an example of a middleware component that adds a key to the environ dictionary to provide session functionality to an application.
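As a preview of the naming rule only, the sketch below wraps an application in a made-up middleware that adds one prefixed key. The names add_request_note, show_note, and example_mw.note are all hypothetical, the code assumes Python 3, and it is not the session middleware discussed in Part II.

```python
def add_request_note(app):
    """Hypothetical middleware that adds one namespaced key to environ."""
    def middleware(environ, start_response):
        # 'example_mw.' is the prefix unique to this (made-up) middleware.
        environ['example_mw.note'] = 'added by middleware'
        return app(environ, start_response)
    return middleware


def show_note(environ, start_response):
    """Application that reads the key the middleware supplied."""
    start_response('200 OK', [('Content-type', 'text/plain')])
    return [environ['example_mw.note'].encode('utf-8')]


wrapped = add_request_note(show_note)
body = b''.join(wrapped({}, lambda status, headers: None))
print(body)
```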