Wednesday, September 12, 2007

Parallel Python discussion and mod_wsgi.

The sad fact is that many high profile Python developers like to ignore what has been done in relation to the use of Python in conjunction with the Apache web server. I'm not sure whether this is because of a bias towards pure Python solutions, whether they just can't be bothered, or that they simply don't have the time to look properly at what is being done by others. Anyway, the latest comment which shows a lack of understanding of what already exists comes from Guido himself in his blog entry made in response to Bruce Eckel's blog on Python 3K or Python 2.9.

In his blog entry Guido says the following in relation to concurrency in Python:

"Another route I'd like to see explored is integrating one such solution into an existing web framework (or perhaps as WSGI middleware) so that web applications have an easy way out without redesigning their architecture."

Reality is that this ability to spread a Python based web application across multiple processes already exists with Apache when using mod_python or mod_wsgi. This is because Apache itself (on UNIX at least) is implemented as a multi process web server. As such, incoming requests are distributed across the numerous Apache child processes dealing with requests. When using Apache at least, there is therefore no problem with properly utilising multiple core processors. As I have also blogged before in 'Web hosting landscape and mod_wsgi', the fact that a lot of other stuff is also going on in Apache at the same time, which is not implemented using Python, adds to the fact that for solutions embedding Python into Apache the GIL is not the big issue people think it is.

In addition to solutions such as mod_python and mod_wsgi, which embed Python into the Apache child processes, there are also other solutions such as mod_fastcgi and daemon mode of mod_wsgi, which are able to create multiple distinct daemon processes to which requests are proxied. This again results in requests being distributed across multiple processors.

The WSGI specification even takes into consideration that such multi process web servers exist through the existence of the 'wsgi.multiprocess' flag in the WSGI environment passed to a WSGI application.
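As a rough illustration of that flag, the sketch below is a minimal WSGI application that simply reports what the hosting server says about its process and thread model. The 'wsgi.multiprocess' and 'wsgi.multithread' keys come from the WSGI specification; the `call_app` helper is a hypothetical convenience added here for poking at the application without a real server, and the code is written for a current Python rather than the versions in use when this was posted.

```python
from wsgiref.util import setup_testing_defaults


def application(environ, start_response):
    """Report how the hosting server describes its process/thread model."""
    # Both keys are defined by the WSGI specification and set by the server.
    multiprocess = bool(environ.get('wsgi.multiprocess', False))
    multithread = bool(environ.get('wsgi.multithread', False))
    text = 'multiprocess=%s multithread=%s\n' % (multiprocess, multithread)
    body = text.encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]


def call_app(app, environ):
    """Hypothetical helper: call a WSGI app directly, return (status, body)."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured['status'] = status

    body = b''.join(app(environ, start_response))
    return captured['status'], body


if __name__ == '__main__':
    env = {}
    setup_testing_defaults(env)       # fills in a minimal, single-process environ
    env['wsgi.multiprocess'] = True   # pretend we are under a multi process server
    print(call_app(application, env))
```

Under Apache with multiple child processes or multiple daemon processes, one would expect the multiprocess value to come back true, which is the hint an application can use to avoid assuming that all requests share the same memory.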

Now, it may be the case that Guido had more in mind the ability within a WSGI application, using WSGI middleware, to forward some subset of URLs on to another process. But then, even this can already be achieved using existing WSGI middleware for proxying requests to another web server. To use such a feature though means making a conscious decision and changing the code of your application, although using something like Paste Deploy may at least limit that to being a configuration change.
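To make the middleware approach a little more concrete, here is a minimal sketch of the dispatching half of it. The `PrefixDispatcher` class and the stub applications are hypothetical names invented for this illustration, not part of any existing package. In a real deployment the application mounted at '/admin' would be a proxying WSGI application that forwards the request to a web server running in another process; a stub stands in for it here so that the example is self-contained.

```python
class PrefixDispatcher(object):
    """WSGI middleware: hand requests under given URL prefixes to other apps."""

    def __init__(self, default_app, mounts):
        self.default_app = default_app
        # Check the longest prefix first so '/admin/reports' wins over '/admin'.
        self.mounts = sorted(mounts.items(),
                             key=lambda item: len(item[0]), reverse=True)

    def __call__(self, environ, start_response):
        path = environ.get('PATH_INFO', '')
        for prefix, app in self.mounts:
            # Match the prefix itself or anything below it, but not
            # unrelated paths that merely share leading characters.
            if path == prefix or path.startswith(prefix + '/'):
                return app(environ, start_response)
        return self.default_app(environ, start_response)


def make_stub(label):
    """Build a trivial WSGI app that just answers with its label."""
    def app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [label.encode('utf-8')]
    return app


main_site = make_stub('main')
# Stand-in for a proxy app that would forward to another process.
admin_proxy = make_stub('admin')

application = PrefixDispatcher(main_site, {'/admin': admin_proxy})
```

The point to note is that the decision about where '/admin' runs now lives in application code (or at best in its configuration), which is the cost referred to above.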

In addition to proxy middleware, mod_wsgi also has an ability to divide up an existing monolithic application to run across multiple processes. In the case of mod_wsgi no changes at all need to be made to the structure of the WSGI application. Instead, the mapping of a particular subset of URLs to a distinct process is handled by mod_wsgi even before the specific WSGI application is invoked.

As an example, imagine that one was running Django and wanted all the '/admin' pages to be executed within the context of their own process. To achieve this, all that is required is for the following Apache configuration to be used:

WSGIDaemonProcess django processes=3 threads=10
WSGIDaemonProcess django-admin processes=1 threads=10

WSGIProcessGroup django

WSGIScriptAlias / /usr/local/django/mysite/apache/django.wsgi

<Location /admin>
WSGIProcessGroup django-admin
</Location>

This results in the bulk of the Django application being distributed across 3 multithreaded processes. Using a combination of the 'Location' and 'WSGIProcessGroup' directives, the process group to be used for the '/admin' URL is then overridden. The result is that any handlers related to '/admin', and URLs underneath that point, are instead executed by a different process.
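One way to check that the split is actually taking effect is to temporarily point the same configuration at a small diagnostic WSGI script in place of the Django one. The sketch below assumes that mod_wsgi exposes the name of the daemon process group in the WSGI environment under the key 'mod_wsgi.process_group'; treat that key as an assumption to verify against the mod_wsgi documentation for the version in use. Outside of mod_wsgi the key will simply be absent.

```python
import os


def application(environ, start_response):
    """Report which process (and, if known, process group) served the request."""
    # Assumed mod_wsgi-specific key; empty or missing when not in daemon mode.
    group = environ.get('mod_wsgi.process_group', '') or '(none reported)'
    text = 'path=%s process_group=%s pid=%d\n' % (
        environ.get('PATH_INFO', ''), group, os.getpid())
    body = text.encode('utf-8')
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]
```

With the configuration above, requests under '/admin' would be expected to report the 'django-admin' group and a single process ID, while other URLs would report 'django' and rotate between up to three process IDs.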

So, the ability for distributing execution of a Python web application across multiple processes and thereby reducing the impact of the Python GIL already exists. Future changes to mod_wsgi should make this even more flexible, with the introduction of transient daemon processes and an ability to anchor a user session to a specific daemon process using cookies where required.

Wednesday, September 5, 2007

Version 1.0 of mod_wsgi is now available.

Version 1.0 of mod_wsgi is now available and can be downloaded from http://www.modwsgi.org. The package is regarded as being quite stable and therefore suitable for use in production environments.

For those not familiar with mod_wsgi, the aim in developing it is to implement a simple to use Apache module which can host any Python application which supports the Python WSGI interface. In addition, the module should be suitable for use in hosting high performance production web sites, as well as your average personal sites running on commodity web hosting services.

This initial version of mod_wsgi is suitable for use on dedicated systems or virtual private servers. With suitable configuration it could also be used by web hosting companies specialising in providing hosting for Python web applications.

With a bit more work and encouragement, future versions of mod_wsgi will include additional features which should help it to also break into the truly low cost commodity web hosting market where Python is currently sadly lacking as an option. So, stay tuned for more updates.

Sunday, September 2, 2007

Relative popularity of Python web frameworks.

The battle between the different Python web frameworks over which is technically best is always interesting to watch. In watching these battles and monitoring the various discussion forums for each, one gets a pretty good feel as to which at least is winning the popularity contest. All the same it would be nice to see some actual figures to back up the assumptions one makes. One way that I can see of doing this is to look at the number of unique visits to the mod_wsgi documentation describing how to host each framework on top of mod_wsgi. Although there are lots of caveats, the result of doing such an analysis is pretty well what I expected, with Django coming out on top.

The web frameworks (or non frameworks as some like to call themselves) for which instructions are currently provided for mod_wsgi are CherryPy, Django, Karrigell, Pylons, TurboGears and web.py. Instructions for each have all been up for more than a month on the mod_wsgi web site, so for the analysis I have taken the statistics for the month of August. For that period, the number of unique page views against each was as follows:


Package      Count
Django       332
Pylons       96
TurboGears   89
web.py       75
CherryPy     71
Karrigell    34
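To put those raw counts in proportion, the following short sketch works out each framework's share of the combined page views. It uses only the figures from the table above; the variable names are just for illustration.

```python
# Unique page views for August, taken from the table above.
counts = {
    'Django': 332,
    'Pylons': 96,
    'TurboGears': 89,
    'web.py': 75,
    'CherryPy': 71,
    'Karrigell': 34,
}

total = sum(counts.values())

# Percentage of the combined framework page views for each package.
shares = {name: 100.0 * count / total for name, count in counts.items()}

for name, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print('%-12s %4d  %5.1f%%' % (name, count, shares[name]))
```

The six pages total 697 views, so Django accounts for a little under half of them on its own, and more than the next four frameworks combined.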

FWIW, the mod_wsgi documentation also provides instructions for using Trac and MoinMoin on top of mod_wsgi. The number of unique page views for these packages was:


Package      Count
Trac         324
MoinMoin     44

Although interesting, these results cannot tell the whole story for a variety of reasons. These include whether or not respective packages actually reference mod_wsgi (or how prominently) as a hosting solution in their own documentation and how often I have personally referred to mod_wsgi on each package's mailing lists as an alternate solution to a particular person's problem.

Beyond those issues, there are actually a number of technically related reasons as to why for a particular package there may not have been as much traffic to the mod_wsgi web site.

The main issue is that although all are capable of being hosted using Apache and mod_wsgi, of the web frameworks only Django promotes strongly the idea that for production sites one should use Apache. At the moment the recommendation in that respect is mod_python, but at least the idea of using Apache is not a foreign concept. Thus for Django, the builtin web server is only seen as being a practical hosting solution for a development instance of Django.

Most of the other packages instead see the builtin web server they provide as being capable enough to support a production site. Thus, although they may describe or reference other ways of hosting a site developed using the package, the only way that Apache generally factors into the equation is as a proxy to their own web server and as a means of hosting static files.

As far as using a web server implemented in pure Python as opposed to hosting on top of Apache, there does also seem to be a reasonable amount of bias against using Apache. In part this appears to be due to some ignorance as to the pros and cons of using Apache and how to set it up properly, but also partly because of Python zealotry. In other words, just like with every programming language, some are so strongly enamoured of Python that they simply cannot accept that there are other ways of doing things.

Such a pro Python only stance could actually be seen as being detrimental to the chances of Python being accepted within commodity web hosting. This is because commodity web hosting companies will not find it acceptable that they would have to set up and support pure Python back end web server applications to which they merely proxy requests.

Instead, commodity web hosting companies want a system that can be easily integrated with their existing Apache installations (normally set up for PHP), yet doesn't place undue memory requirements and overhead on Apache. Above all, the ability to provide hosting for Python web applications must be very simple to configure and fit in with the large scale automated systems they have for configuring the many sites they would host using one Apache installation.

Thus in some respects, packages which try to steer developers to always using the builtin web server are only going to make it harder for that package to be accepted by web hosting companies. Some thought must be given to ensuring that packages are easy to deploy and set up under Apache in a web hosting environment. If this is not done, then you will not see those packages being supported by web hosting companies and as a result people will simply move to those packages which have put in the effort to make it easy to deploy under Apache.