- http://www.modpython.org/pipermail/mod_python/2006-December/022841.html
- http://www.modpython.org/pipermail/mod_python/2006-December/022881.html
As I described in those posts, I saw mod_wsgi as only being a first step. Before considering again what one might do beyond mod_wsgi though, it is worthwhile to look at what mod_wsgi has become and how the result fits into the web hosting landscape. In particular, does it actually have the potential to improve the lot of Python developers by providing a compelling solution which will be attractive to companies providing commodity web hosting?
To understand this, one needs to look at what features mod_wsgi provides and specifically the two different modes of operation that have been implemented.
The first mode of operation I tend to refer to as 'embedded' mode. This is where your Python web application runs in the context of the standard Apache child processes. At least in terms of how Python sub interpreters are used, this is the same as how things work with mod_python. Thus, if you have both mod_python and mod_wsgi loaded, applications running under each will share the same process, although they would generally at least run in distinct Python sub interpreters. As far as sharing goes, the process may also be host to PHP or mod_perl applications.
Running applications in the Apache child processes will generally result in the best performance possible when compared to the other alternatives available for using Python with Apache, such as mod_fastcgi, mod_scgi or even a second web server behind mod_proxy. This is because the Python application is running in the same process that is accepting and performing the initial parsing of the request from a client. In other words, overhead is as low as it can be, as everything is done together in the one process.
In addition to the low overhead, there are other benefits deriving from how Apache works when using this mode. The first is that Apache uses multiple child processes to handle requests. As a result, contention for the Python GIL within any single process does not limit the server as a whole, as each process is independent and has its own GIL. Thus there is no impediment to making use of multiprocessor systems.
That said, the GIL is not as big a deal as some people make out, even when using Apache with only one multi-threaded child process for accepting requests. This is because the code which accepts requests and determines which Apache handler should process each request, along with the code for reading the request content and writing out the response content, is all written in C and is in no way linked to Python. As a consequence there are large sections of code where the GIL is not being held. On top of that, the same web server may also be serving up static files, where again the GIL doesn't even come into the picture. So there is more than enough opportunity for making good use of those multiple processors.
The second major benefit comes from Apache's ability to scale up to meet increases in load. The way this works is that Apache will initially create only a certain number of child processes to handle requests. If however the number of requests builds up to the point that the processes wouldn't be able to keep pace, Apache will create additional child processes to meet the demand. It will keep doing this as need be, although eventually it will stop based on whatever the maximum number of child processes is set to, so as not to totally overload your system.
When the number of requests finally starts to drop down once more, to recover resources Apache will start to kill off any child processes now deemed as unnecessary, eventually getting back to the starting level. So it is that Apache is able to comfortably deal with the ebb and flow of demand without unduly choking.
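The ebb and flow described above can be sketched as a toy model. This is only an illustration and not Apache's actual algorithm: the function name and its parameters are hypothetical, loosely modelled on the idea of keeping the number of idle child processes between a lower and an upper bound while never exceeding a hard ceiling. Real Apache also grows and shrinks the pool gradually rather than in one jump.

```python
def next_child_count(children, busy, min_spare=5, max_spare=10, max_children=150):
    """Return how many child processes a toy pool keeps after one check.

    All names and default values here are hypothetical, chosen only to
    illustrate the scale-up / scale-down behaviour described in the text.
    """
    idle = children - busy
    if idle < min_spare:
        # Too few idle children: grow the pool, but never past the ceiling.
        return min(max_children, busy + min_spare)
    if idle > max_spare:
        # Too many idle children: shrink back toward the spare limit.
        return busy + max_spare
    # The pool is within bounds, so leave it alone.
    return children


if __name__ == "__main__":
    children = 8
    # Simulated number of busy children over time: load rises, then falls.
    for busy in (7, 40, 149, 60, 3):
        children = next_child_count(children, busy)
        print("busy=%d children=%d" % (busy, children))
```

With these made-up numbers the pool grows as the load climbs, stops at the ceiling, and then contracts again once most children are idle.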
So there is a lot of good to be had from how Apache works when using mod_wsgi in this mode. At the same time however a number of issues also arise.
The first is that the child processes generally run as a special non privileged user. This means that this user needs to be given access to the files which make up an application or which the application in turn needs to read. This user will also need to be given special access to any files or directories the application needs to write data to. Because Apache may be used to host a number of different applications, all running as that same user, it means that every application can read the files making up any other application, and can modify any directories or files which those other applications have made writable to that user.
The second problem is that although in mod_wsgi distinct Python sub interpreters are used to keep different applications separate, this isn't foolproof. Problems can arise where different applications attempt to use different versions of a particular C extension module, as Python only loads C extensions once for the whole process and not separately for each sub interpreter. Thus, whichever application loads its version first wins out, and when subsequent applications load the module they will get their own version of any Python wrappers, but that code may not match the API provided by the C extension module already loaded.
A third, more serious problem is that since Python supports C extension modules, it would be possible for someone with nefarious intent to load a module which gives them access to other sub interpreters' data and code, thereby bypassing the firewalls put in place by mod_wsgi. Such a module would thus allow them to spy into another application, change how it works or steal private information. A very wily hacker may take this even further and poke into the internals of Apache, possibly inserting special handler code into various phases of the request processing cycle, or modifying configuration data used by other modules.
All up, what this means is that although mod_wsgi goes to great lengths to try and ensure that applications can't interfere with each other, it can't be made completely bulletproof. As a result, 'embedded' mode of mod_wsgi would only be suitable in situations where the owners of the web server are also the owners of the applications running under it. At no time would 'embedded' mode be recommended as a basis for running applications owned by different users in a web hosting environment.
Do note that these problems aren't the fault of mod_wsgi specifically. Some derive from the way Apache works and others from how Python works. Using mod_python as an alternative will not offer anything better. In fact mod_python actually has more problems due to the open nature of how it hooks into Apache, which makes it easier to modify the behaviour of Apache and potentially gain access to other applications or steal private information.
Originally the intent in writing mod_wsgi was to only target users who also controlled the web server they were using. As a consequence, these issues weren't specifically seen as being a problem that needed to be countered. During the development of mod_wsgi however, its very existence seemed to be raising the hopes of many that a suitably simple solution for commodity Python web hosting might not be far away, and so it became necessary to look at how one could address the problems. The end result of this was the addition of 'daemon' mode to mod_wsgi.
The main difference between 'daemon' and 'embedded' mode is that in 'daemon' mode the actual application code is not run within the context of the Apache child processes, but within separate daemon processes able to be run as a distinct user. Although there is a performance penalty resulting from having to proxy the request through to the distinct daemon process which is to handle the request, because the application is now isolated into a separate process the problems described above for 'embedded' mode are eliminated.
In the first instance, because the daemon process runs as a distinct user, only that user and not the user that the Apache child processes run as will need access to the Python code files that make up the application. The same applies to writable directories or files with them only needing to be modifiable by the user that the daemon process runs as. Thus, any actual Python code or private data pertaining to the application is protected and safe from access by other users of the system.
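As a small illustration of what "only that user" means in terms of file permissions, here is a minimal sketch for a POSIX system. The helper names are hypothetical and are not part of mod_wsgi; the code simply checks that a file's permission bits grant nothing to the group or to other users, which is the state you would want for application code owned by the user the daemon process runs as.

```python
import os
import stat


def is_private_mode(mode):
    """Return True if the permission bits grant nothing to group or others."""
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


def path_is_private(path):
    """Check a real file or directory (hypothetical helper, POSIX only)."""
    return is_private_mode(os.stat(path).st_mode)


if __name__ == "__main__":
    # 0o600: read/write for the owner only, so the Apache user is locked out.
    print(is_private_mode(0o600))
    # 0o644: readable by everyone, including the Apache user.
    print(is_private_mode(0o644))
```

Static files that Apache itself serves would of course fail this check deliberately, since they have to remain readable to the user the Apache child processes run as.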
The only files which would still need to be readable to the user that the Apache child process runs as are any static files such as HTML pages, graphics or media files. This is because the main Apache child process would still provide the service of serving up these files.
The problem with C extension modules being global to a process is also eliminated with 'daemon' mode by the fact that multiple daemon processes can be created and each application assigned to its own process. This ability to isolate an application from others by assigning them to different processes also prevents hackers from interfering with another user's running application.
As a consequence, although 'embedded' mode would not be suitable for a server environment where applications owned by different users need to be hosted together, 'daemon' mode has the necessary protections available to make it safe to use in such a hostile environment and thus it would be suitable for shared web hosting environments.
When one looks at mod_wsgi as a whole, the result is a package which is suitable both for building high performance web sites and for commodity web hosting. In both cases configuration is simple, with the one application script file being suitable for use in both modes. A complex Python web application may even make use of both modes at the same time. For example, application components requiring better performance could be run in 'embedded' mode, while other application components which require special access privileges, or which are memory hungry or processor intensive, are delegated off to distinct daemon processes.
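To make the point about the one script file concrete, here is a minimal sketch of a WSGI application script that could be used unchanged in either mode. It assumes the `mod_wsgi.process_group` key that current mod_wsgi releases document as being present in the WSGI environment, empty in 'embedded' mode and set to the daemon process group name otherwise. Treat that detail as an assumption rather than something this post establishes; under any other WSGI server the key is simply absent and the script reports 'embedded'.

```python
def application(environ, start_response):
    """Minimal WSGI application that reports which mode it appears to run in."""
    # Assumption: mod_wsgi sets this key to '' in embedded mode and to the
    # daemon process group name in daemon mode. Absent under other servers.
    group = environ.get('mod_wsgi.process_group', '')
    if group:
        mode = 'daemon (%s)' % group
    else:
        mode = 'embedded'

    body = ('Running in %s mode\n' % mode).encode('utf-8')
    headers = [
        ('Content-Type', 'text/plain'),
        ('Content-Length', str(len(body))),
    ]
    start_response('200 OK', headers)
    return [body]
```

Which mode the script ends up running in is decided entirely by the Apache configuration, not by anything in the script itself.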
In the end, this combination of abilities makes mod_wsgi a somewhat more flexible platform than other available solutions for developing WSGI applications using Apache. At the same time, because everything is in a single package all managed through Apache, configuration is much simpler and there is no need to install or manage any distinct back end infrastructure.
So, although my original plans didn't envision incorporating a 'daemon' mode, the effort in adding it has been quite worthwhile, with the elusive goal of a way of providing commodity web hosting for Python applications now perhaps being achievable after all. :-)