Friday, March 21, 2008

Version 2.0 of mod_wsgi is now available.

Due to the arrival of a baby 1.0, version 2.0 of mod_wsgi has been a bit slower in coming than originally planned, but the wait is now over and it is available for download. The major improvements in version 2.0 of mod_wsgi are detailed below, but there are various other little goodies as well, so check out the change notes on the wiki.


Process Reloading Mechanism

When using daemon mode of mod_wsgi, the default behaviour is now that if the WSGI script file for your application is changed, the daemon processes are restarted automatically upon receipt of the next request against that application. Thus, when making changes to any code or configuration data for your application, all you need to do is touch the WSGI script file, and the daemon processes for just that application will be restarted and the application reloaded. It is no longer necessary to explicitly send signals to the daemon processes, or to restart the whole of Apache. This also means that users do not require elevated privileges, and that in a shared hosting environment, restarting one user's application will not affect applications owned by other users.
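The check behind this can be pictured with a small Python sketch. It is only an illustration of the idea, comparing the script file's modification time against the value recorded when the application was loaded; mod_wsgi itself does this in C, and the class name ScriptReloadMonitor is hypothetical.

```python
import os


class ScriptReloadMonitor:
    """Hypothetical helper: remember a WSGI script file's modification time
    and report when it has changed, mimicking the 'touch to reload' idea."""

    def __init__(self, script_path):
        self.script_path = script_path
        # Modification time at the point the application was loaded.
        self.loaded_mtime = os.path.getmtime(script_path)

    def needs_reload(self):
        # Any change in modification time (e.g. after 'touch app.wsgi')
        # means the daemon process should restart before the next request.
        return os.path.getmtime(self.script_path) != self.loaded_mtime
```

In this model, running `touch app.wsgi` updates the modification time, so the next call to needs_reload() returns True and the daemon process would be restarted before handling the request.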

Apache Authentication Provider

When using Apache 2.2, mod_wsgi provides the means to implement Apache authentication providers in Python. This means that password checking for HTTP Basic and Digest authentication, as well as for other custom authentication mechanisms implemented by other Apache modules, can be delegated to your Python application. This could, for example, be used to implement HTTP authentication for a Trac instance against a user database maintained within a Django instance running on the same site. If using Apache 2.0, the mechanism is also available, but only in support of standard Apache HTTP Basic authentication.
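As a sketch of what such a provider could look like, assuming the interface described in the mod_wsgi documentation: the Apache configuration uses 'AuthBasicProvider wsgi' together with a WSGIAuthUserScript directive naming a script, and that script defines check_password(), plus get_realm_hash() for Digest authentication. The dictionary-based user store below is a hypothetical stand-in for something like a Django user database.

```python
import hashlib

# Hypothetical user store; a real provider might query Django's user table.
USERS = {"spy": "secret"}


def check_password(environ, user, password):
    """HTTP Basic: return True (valid), False (bad password) or
    None (unknown user, so other providers may be consulted)."""
    if user not in USERS:
        return None
    return USERS[user] == password


def get_realm_hash(environ, user, realm):
    """HTTP Digest: return the MD5 hex digest of 'user:realm:password',
    or None if the user is unknown."""
    if user not in USERS:
        return None
    value = "%s:%s:%s" % (user, realm, USERS[user])
    return hashlib.md5(value.encode("utf-8")).hexdigest()
```

Returning None rather than False for an unknown user lets Apache distinguish "no such user" from "wrong password".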

Python Virtual Environments

More integrated support is now provided for Python virtual environments such as 'virtualenv'. These changes make it easy to associate different daemon process groups with distinct Python virtual environments. Where daemon process groups are being set up for different users, or to separate different applications, using Python virtual environments means that each can use different versions of modules or packages without interfering with the others.
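One way a WSGI script file for a given daemon process group might activate its own virtual environment is to add that environment's 'site-packages' directory with site.addsitedir() and then move the new entries to the front of sys.path, so they take precedence over packages in the main installation. The helper name and the directory path used with it are hypothetical.

```python
import site
import sys


def activate_site_dirs(site_dirs):
    """Hypothetical helper for a WSGI script file: add each directory (and
    any .pth entries inside it) to sys.path, ahead of existing entries."""
    previous = list(sys.path)
    for directory in site_dirs:
        site.addsitedir(directory)

    # addsitedir() appends, so move the newly added entries to the front.
    added = [item for item in sys.path if item not in previous]
    for item in added:
        sys.path.remove(item)
    sys.path[:0] = added
    return added
```

Each daemon process group's script file would call this with its own environment's directory, for example activate_site_dirs(['/usr/local/pythonenv/ENV1/lib/python2.5/site-packages']).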

WSGI File Wrapper Extension

Support for the 'wsgi.file_wrapper' extension has been added, with operating system mechanisms such as sendfile() and mmap() being used where possible to speed up sending data back to a client. Provided an application is written to use this optional extension, the performance of serving static files from within the application should be greatly improved.
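A minimal sketch of an application using the extension, following the pattern given in the WSGI specification (PEP 333): use 'wsgi.file_wrapper' when the server offers it, otherwise fall back to reading the file in blocks. The factory name make_static_file_app and the 8192 byte block size are assumptions.

```python
import os

BLOCK_SIZE = 8192


def _read_blocks(filelike, block_size):
    # Fallback when no file wrapper is offered: stream the file in blocks
    # and close it once iteration has finished.
    try:
        while True:
            block = filelike.read(block_size)
            if not block:
                break
            yield block
    finally:
        filelike.close()


def make_static_file_app(path, content_type):
    """Hypothetical factory returning a WSGI application serving one file."""
    def application(environ, start_response):
        filelike = open(path, "rb")
        start_response("200 OK", [
            ("Content-Type", content_type),
            ("Content-Length", str(os.path.getsize(path))),
        ])
        file_wrapper = environ.get("wsgi.file_wrapper")
        if file_wrapper is not None:
            # Hand the open file to the server, which may use sendfile()/mmap().
            return file_wrapper(filelike, BLOCK_SIZE)
        return _read_blocks(filelike, BLOCK_SIZE)
    return application
```

The application stays portable: under a server without the extension it still works, just without the operating system level speed up.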

Daemon Mode Now Even Faster

Some underperforming code related to the socket used to communicate between the Apache child processes and the daemon processes has been replaced. This has resulted in a 40% improvement in base level performance for a simple hello world program, so daemon mode now performs even better relative to competing solutions. Do remember though that the network layer is rarely the bottleneck; it is usually the Python application and its database queries where things slow down. Thus, although daemon mode is quicker, in the grander scheme of things the improvement wouldn't be noticed in most applications.
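For reference, a hello world WSGI application of the kind typically used for this sort of base level measurement is shown below. The exact program behind the 40% figure isn't given here, so treat this as an assumed stand-in.

```python
def application(environ, start_response):
    # Do almost no work, so timings reflect request/response overhead.
    output = b"Hello World!"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(output)))])
    return [output]
```

Because it does almost no work per request, timings for an application like this are dominated by the cost of getting the request to the daemon process and the response back, which is the code path that was sped up.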

Sunday, November 18, 2007

Version 1.3 of mod_wsgi is now available.

Version 1.3 of mod_wsgi is a bug fix only release, addressing issues with mod_wsgi daemon processes hanging under certain conditions. Users of mod_wsgi daemon mode are strongly recommended to upgrade to this new version. A third release candidate for version 2.0 is also now available, incorporating the same fix and adding a feature to detect errant Python C extension modules that don't release the Python GIL when running long, potentially blocking, operations.

Sunday, November 11, 2007

Poor man's Python virtual environment.

There have now been a number of attempts at implementing virtual environments for Python. That is, providing a means of having multiple isolated environments for the one Python installation on a system, such that it would be possible to run different applications on the same system, but using different sets of installed Python packages. The prime standalone examples of these are virtual-python, workingenv and virtualenv.

It may well just be that I choose to use MacOS X with the older Python 2.3.5 that comes with the operating system, but even with the more recent virtualenv, it just doesn't always seem to work properly. Even when I have ensured that the 'bin' directory for the virtual environment comes first in PATH, such that the environment specific versions of tools such as 'easy_install' are found first, for whatever reason some packages will still want to write back into the operating system '/Library/Python' directory when I wouldn't expect them to. I also don't seem to be alone in having such problems, as evidenced by comments on Ian Bicking's blog where he originally announced virtualenv as a replacement for workingenv.

Now, I will admit that I still haven't found time to properly dig into the internals of Python eggs and so may be missing something, but for creating Python virtual environments, what I don't understand is why simply setting the PYTHONHOME environment variable isn't sufficient to get it all working for the typical case. Yes, it means that the environment variable always has to be set, as well as PATH having to include the 'bin' directory for the virtual environment, but it avoids the idiosyncrasies arising from the way that Python tries to work out where the installed Python library directory is.

To understand what the PYTHONHOME environment variable is all about, one has to consult the source code comments in the file 'Modules/getpath.c' of the Python source code, as any other online documentation seems to be rather lacking. Read the source code as well as the comments and you will find that Python goes through a number of steps to try and determine where the Python lib directory is located when it is run. These can be summarised as:

  1. Look relative to argv[0] to determine if being run out of Python source code build directory.
  2. Consult the PYTHONHOME environment variable for directory prefix corresponding to where Python was installed.
  3. Look relative to argv[0] to determine if being run out of Python installation directories. If argv[0] isn't an absolute path, search PATH for the executable which was used and look relative to that instead.
  4. Look relative to directory prefix where Python was supposed to have been installed.

The way virtualenv appears to work is that it tries to set things up so that step 3 still applies.

On MacOS X things get a bit tricky though, as it doesn't use exactly the method described. Instead, it seems that an absolute path to the Python framework, encoded into the executable itself, is somehow used. This means that to get virtualenv to work, it has to copy the Python executable and then use the MacOS X 'install_name_tool' program to change where that executable picks up the Python framework from, as otherwise it will continue to use the Python lib directory from the original Python installation.

Windows doesn't follow the rules either, with the location of the Python DLL somehow determining where the Python lib directory needs to be.

Either way, once Python has found what it believes is the location of the Python lib directory it will use that and will skip the subsequent steps.
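To make the search order concrete, here is a deliberately simplified Python model of steps 1 to 4 for UNIX. It is not the real getpath.c logic: the function name find_prefix and its parameters are hypothetical, the build directory landmark is assumed here to be 'Modules/Setup', and details such as the symlink chasing applied to the executable and the separate search for exec_prefix are left out. The `exists` predicate lets the logic be exercised without touching the real filesystem.

```python
import posixpath


def find_prefix(argv0, environ, path_dirs, compiled_prefix, exists,
                version="2.3"):
    """Simplified, hypothetical model of the UNIX prefix search.
    The first step that succeeds wins and later steps are skipped."""
    landmark = posixpath.join("lib", "python" + version, "os.py")

    # Resolve a bare argv[0] (e.g. 'python') by searching PATH.
    if "/" not in argv0:
        for directory in path_dirs:
            candidate = posixpath.join(directory, argv0)
            if exists(candidate):
                argv0 = candidate
                break
    argv0_dir = posixpath.dirname(argv0)

    # Step 1: running out of a source build directory?
    if argv0_dir and exists(posixpath.join(argv0_dir, "Modules", "Setup")):
        return argv0_dir

    # Step 2: PYTHONHOME, when set, is used as-is.
    if environ.get("PYTHONHOME"):
        return environ["PYTHONHOME"]

    # Step 3: walk up from the executable's directory looking for the landmark.
    directory = argv0_dir
    while directory:
        if exists(posixpath.join(directory, landmark)):
            return directory
        parent = posixpath.dirname(directory)
        if parent == directory:
            break
        directory = parent

    # Step 4: fall back to the prefix compiled into the interpreter.
    return compiled_prefix
```

With only '/usr/bin/python2.3' and '/usr/lib/python2.3/os.py' present, find_prefix('python2.3', {}, ['/usr/bin'], '/opt', files.__contains__) settles on '/usr' at step 3, whereas adding PYTHONHOME to the environment makes step 2 return that directory before step 3 is ever reached.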

Now, although how step 3 works differs depending on the platform you are running on, step 2 is the same everywhere. As such, setting the PYTHONHOME environment variable seems to be a simpler and more deterministic way of specifying where the Python lib directory is located, avoiding the need to perform fixups to the Python executable on MacOS X.

As to how to set up a Python virtual environment based on the PYTHONHOME environment variable, on UNIX based systems it is just a matter of creating a parallel copy of the Python installation using symlinks. In some respects it is therefore quite similar to the original virtual-python, except that the Python executable itself is also just a symlink and not a copy.

On MacOS X with Python 2.3 the required steps would therefore be:

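# Directory to hold all the virtual environments.
mkdir -p "$HOME/pythonenv"
cd "$HOME/pythonenv"

# Skeleton for the environment 'ENV1'.
mkdir -p ENV1/bin
mkdir -p ENV1/include
mkdir -p ENV1/lib/python2.3

# Symlink the executable itself rather than copying it.
ln -s /usr/bin/python2.3 ENV1/bin/
ln -s python2.3 ENV1/bin/python

ln -s /usr/include/python2.3 ENV1/include/

# Mirror the standard library directory entry by entry.
for i in /usr/lib/python2.3/*; do ln -s "$i" ENV1/lib/python2.3/; done

# Replace the symlinked 'site-packages' with an empty, private directory.
rm ENV1/lib/python2.3/site-packages
mkdir ENV1/lib/python2.3/site-packages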

To use the virtual environment the 'bin' directory would be added to the head of your PATH and the PYTHONHOME environment variable set.

PATH="$HOME/pythonenv/ENV1/bin:$PATH"
export PATH

PYTHONHOME="$HOME/pythonenv/ENV1"
export PYTHONHOME

Note that the intent here is specifically that the 'site-packages' directory from the original Python installation be ignored. It would therefore be necessary to reinstall all required packages, including 'setuptools', once the PATH and PYTHONHOME variables have been set up.
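To confirm that an environment set up this way is actually in effect, a small Python script such as the following can be run with the environment's 'python'; the function name and output format are hypothetical. With PYTHONHOME pointing at $HOME/pythonenv/ENV1, sys.prefix would be expected to report that directory, and the 'site-packages' entry on sys.path should be the empty one under ENV1 rather than the system one.

```python
import os
import sys


def describe_environment():
    """Report where this interpreter believes its installation lives."""
    return {
        "PYTHONHOME": os.environ.get("PYTHONHOME"),
        "sys.prefix": sys.prefix,
        "sys.executable": sys.executable,
        # Only entries ending in 'site-packages' are of interest here.
        "site_packages": [p for p in sys.path if p.endswith("site-packages")],
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print("%s: %s" % (key, value))
```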

Obviously, setting PYTHONHOME has implications if, from one Python application, you want to run standalone Python scripts that belong to a different Python virtual environment, as those scripts will inherit the PYTHONHOME setting. Other issues come up if trying to run scripts which use a completely different version of Python. As such, this poor man's version of Python virtual environments isn't going to work for everyone, but for what I am doing with web applications and mod_wsgi it works fine, without giving me the problems that virtualenv does on MacOS X.

How exactly Python virtual environments, of any variety, can be used with mod_wsgi, and how mod_wsgi version 2.0 has been enhanced to make this reasonably simple to manage, I'll cover in a subsequent blog entry. If you can't wait, you can check out a rougher, unedited description of it on the mod_wsgi user group.

Wednesday, October 31, 2007

Version 1.2 of mod_wsgi is now available.

Mark beat me to the punch again and got word out about mod_wsgi 1.2 before I myself got a chance to sit down and blog about it. I'll have to start paying him as my publicist soon.

Version 1.2 of mod_wsgi is a bug fix only release, addressing issues with WSGI specification compliance, sub process invocation from Python in a mod_wsgi daemon process and most importantly of all, an issue whereby a second sub interpreter instance could be created for each WSGI application group when targeted by a specifically formed URL.

This latter issue of a second sub interpreter being created only affects users of Apache 1.3 and 2.0. Because it can have the effect of doubling the memory used by the application, and could therefore be exploited as a form of remote denial of service attack in a memory constrained environment, users of these Apache versions are strongly recommended to upgrade to mod_wsgi 1.2.

At the same time as mod_wsgi 1.2 has been released, the first release candidate for mod_wsgi 2.0 has also been released. This version provides a number of new features including integration with the Apache authentication and authorisation mechanisms in Apache 2.2, a new process reloading option for mod_wsgi daemon processes which makes reloading a Python application when changes are made trivial, and direct support for Python virtual environments such as workingenv and virtualenv. I'll blog about these and other new features in mod_wsgi 2.0 in the coming weeks.

If you want to discuss any of the new mod_wsgi 2.0 features in the meantime, check out the change notes or pop on over to the mod_wsgi Google Group.

Sunday, October 14, 2007

Google hates mod_wsgi.

According to Google Analytics, 99% of all search engine traffic landing at the mod_wsgi site comes via Google. The consequences are therefore pretty significant now that Google, for no good reason that I can see, has dropped the mod_wsgi site home page from its search results. What makes even less sense is that mod_wsgi is hosted on the Google Code hosting service.

Hopefully the famed Google page rank algorithm will wake from its slumber at some point and start listing it again, otherwise it is going to be hard for people to find the site. More worrying is that Google looks like it might be starting to drop individual pages on the site as well, as searches for quite specific terms which appear within the site pages aren't turning them up either, although they did previously.

Is this Google's way of getting retribution on me for grumbling so much about how the search on Google groups has been stuffed up so often of late and is always quite far behind with its search results, or is it just symptomatic of the rot starting to set in at Google. :-(

Monday, October 1, 2007

Version 1.1 of mod_wsgi is now available.

Version 1.1 of mod_wsgi is now available and can be downloaded from http://www.modwsgi.org. This is a bug fix release only and no new features are included. The two main problems addressed are the possibility of processes crashing when multiple threads hit a race condition while sending output via sys.stdout/sys.stderr, and a conflict with the Apache mod_logio module which would result in mod_wsgi daemon processes crashing. A description of all changes in this version can be found in the change notes. Updating to this version is recommended for all users.

Wednesday, September 12, 2007

Parallel Python discussion and mod_wsgi.

The sad fact is that many high-profile Python developers like to ignore what has been done in relation to the use of Python in conjunction with the Apache web server. I'm not sure whether this is because of a bias towards pure Python solutions, whether they just can't be bothered, or whether they simply don't have the time to look properly at what is being done by others. Anyway, the latest comment showing a lack of understanding of what already exists comes from Guido himself, in his blog entry made in response to Bruce Eckel's blog on Python 3K or Python 2.9.

In his blog entry Guido says the following in relation to concurrency in Python:

"Another route I'd like to see explored is integrating one such solution into an existing web framework (or perhaps as WSGI middleware) so that web applications have an easy way out without redesigning their architecture."

The reality is that this ability to spread a Python based web application across multiple processes already exists with Apache when using mod_python or mod_wsgi. This is because Apache itself (on UNIX at least) is implemented as a multi process web server. As such, incoming requests are distributed across the numerous Apache child processes dealing with requests. When using Apache at least, there is therefore no problem with properly utilising multi-core processors. As I have also blogged before in 'Web hosting landscape and mod_wsgi', a lot of other work not implemented in Python is going on in Apache at the same time, which further reduces the impact of the GIL, so for solutions embedding Python into Apache the GIL is not the big issue people think it is.

In addition to solutions such as mod_python and mod_wsgi, which embed Python into the Apache child processes, there are also other solutions such as mod_fastcgi and daemon mode of mod_wsgi, which are able to create multiple distinct daemon processes to which requests are proxied. This again results in requests being distributed across multiple processors.

The WSGI specification even takes into consideration that such multi process web servers exist through the existence of the 'wsgi.multiprocess' flag in the WSGI environment passed to a WSGI application.
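A small diagnostic application along these lines can show the flag in practice, along with the ID of the process that served the request; hitting it repeatedly under a multi process Apache configuration would be expected to show several different process IDs. The application itself is just an illustrative sketch.

```python
import os


def application(environ, start_response):
    """Report which process served the request and how the server runs it."""
    lines = [
        "pid: %d" % os.getpid(),
        "wsgi.multiprocess: %s" % environ.get("wsgi.multiprocess"),
        "wsgi.multithread: %s" % environ.get("wsgi.multithread"),
    ]
    output = "\n".join(lines).encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(output)))])
    return [output]
```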

Now, it may be the case that Guido had more in mind the ability, within a WSGI application and using WSGI middleware, to forward some subset of URLs on to another process. But then, even this can already be achieved using existing WSGI middleware for proxying requests to another web server. Using such a feature though means making a conscious decision and changing the code of your application, although using something like Paste Deploy may at least limit that to a configuration change.
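The kind of URL based split being described can be sketched as a tiny piece of WSGI middleware. Here the '/admin' branch is just another WSGI application; in the scenario above it would be a proxying application that forwards the request to another server. The class name PrefixDispatcher is hypothetical, and for simplicity SCRIPT_NAME and PATH_INFO are passed through unchanged.

```python
class PrefixDispatcher:
    """Hypothetical middleware: send requests under `prefix` to `other_app`
    (e.g. a proxy to another process) and everything else to `default_app`."""

    def __init__(self, default_app, prefix, other_app):
        self.default_app = default_app
        self.prefix = prefix.rstrip("/")
        self.other_app = other_app

    def __call__(self, environ, start_response):
        path = environ.get("PATH_INFO", "")
        # Match the prefix itself and anything below it, but not
        # '/administrator' when the prefix is '/admin'.
        if path == self.prefix or path.startswith(self.prefix + "/"):
            return self.other_app(environ, start_response)
        return self.default_app(environ, start_response)
```

The catch, as noted, is that this wiring lives in the application's own code or deployment configuration, whereas the mod_wsgi approach leaves the application untouched.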

In addition to proxy middleware, mod_wsgi also has an ability to divide up an existing monolithic application to run across multiple processes. In the case of mod_wsgi no changes at all need to be made to the structure of the WSGI application. Instead, the mapping of a particular subset of URLs to a distinct process is handled by mod_wsgi even before the specific WSGI application is invoked.

As an example, imagine that one was running Django and wanted all the '/admin' pages to be executed within the context of their own process. To achieve this, all that is required is for the following Apache configuration to be used:


WSGIDaemonProcess django processes=3 threads=10
WSGIDaemonProcess django-admin processes=1 threads=10

WSGIProcessGroup django

WSGIScriptAlias / /usr/local/django/mysite/apache/django.wsgi

<Location /admin>
WSGIProcessGroup django-admin
</Location>

This results in the bulk of the Django application being distributed across 3 multithreaded processes. Using a combination of the 'Location' and 'WSGIProcessGroup' directives, the process group used for the '/admin' URL is then overridden. The result is that any handlers related to '/admin', and URLs underneath that point, are instead executed by a different process.
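To check that the override is working, the Django application could be wrapped in a small piece of middleware that records which process group handled each request. This sketch assumes mod_wsgi places the group name in the 'mod_wsgi.process_group' WSGI environ key; the wrapper name and the 'X-Process-Group' response header are hypothetical.

```python
def annotate_process_group(app):
    """Hypothetical wrapper adding an X-Process-Group response header."""
    def wrapper(environ, start_response):
        # Assumed to be set by mod_wsgi; empty or absent outside daemon mode.
        group = environ.get("mod_wsgi.process_group", "") or "(none)"

        def _start_response(status, headers, exc_info=None):
            headers = list(headers) + [("X-Process-Group", group)]
            return start_response(status, headers, exc_info)

        return app(environ, _start_response)
    return wrapper
```

Requests under '/admin' would then be expected to come back with 'X-Process-Group: django-admin', and everything else with 'django'.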

So, the ability to distribute execution of a Python web application across multiple processes, and thereby reduce the impact of the Python GIL, already exists. Future changes to mod_wsgi should make this even more flexible, with the introduction of transient daemon processes and an ability to anchor a user session to a specific daemon process using cookies where required.