As well as the source code archive, a binary version of the module for Apache 2.2/Python 2.6 on Windows is available. Both can be downloaded from:
Monday, May 11, 2009
Version 2.5 of mod_wsgi is now available.
This is a really minor update addressing a MacOS X build issue and reverting a wrongly implemented change in mod_wsgi 2.4. Details can be found at:
Friday, May 8, 2009
Blocking requests and nginx version of mod_wsgi.
My own mod_wsgi module for Apache has been available in one form or another now for over two years. What many people don't know is that there is also a mod_wsgi module for nginx, although development on it appears to have stalled, with no changes having been made for quite some time.
The nginx version of mod_wsgi borrows some code from my original Apache version, but obviously since the internals of Apache and nginx are very different, the main parts of the code which interface with the web server are unique. Although I condoned use of the source code, I do wish I had insisted from the outset that it not be called mod_wsgi due to the confusion that has at times arisen.
Although development on the nginx version of mod_wsgi appears to no longer be happening, this isn't stopping people from using it and many are quite happy with it. The question is whether they really understand anything about how nginx works and the shortcomings in how nginx and mod_wsgi work together.
Admittedly the author of the mod_wsgi module for nginx has been up front in pointing out that, because nginx is asynchronous and WSGI was not designed for such a system, once the WSGI application is entered all other activity by the web server is blocked. The recommendation resulting from this is that static files should not be served from the same web server. Use of multiple nginx worker processes is also suggested as a way of mitigating the problem.
All well and good, but unfortunately there is a bit more to it than that. Although people may think that by using nginx underneath they are going to get the fastest system possible, it is debatable whether that is going to be the case for real world applications.
To understand the problems with the mod_wsgi module for nginx it is helpful to first look at how Apache works.
When using Apache on UNIX systems there is a choice of multiprocessing modules (MPM). The two main options are the prefork and worker MPM. In both cases multiple processes are used to handle requests, but whereas prefork is single threaded, worker uses multiple threads within each process to handle requests. On Windows the winnt MPM is the only option. It is multithreaded as well, but only a single process is used.
The important thing about Apache, no matter which MPM is used, is that a connection is not accepted from a client until a process or thread is available to actually process and handle the request. This means that if a prefork MPM process is busy, or if all threads in a worker MPM process are busy, then that process will not accept any new connections until it is ready to. This ensures that any new connections will always be handled by another free process and you will not end up with a situation where a connection is accepted by a process, but it isn't actually able to process it at that time.
Because nginx is an asynchronous system based on an event driven model, things work a bit differently. What would normally occur is that connections would be accepted regardless, although nginx does enforce an upper limit as defined by the 'worker_connections' setting. The work of processing all those concurrent requests would then be interleaved, with work being stopped on one request and switched to another whenever the first would otherwise block waiting on network activity or some other event.
For serving of static files an asynchronous system is a good model, allowing more concurrent requests to be handled with fewer resources. As to mod_wsgi for nginx however, the picture isn't so rosy.
The basic problem is that you are embedding within an asynchronous system a component which is synchronous. That is, the WSGI application doesn't cooperate by giving up control when it would block waiting for some event to occur. Instead it will only hand back control when the complete request is finished. This means for example that if a request takes one second to complete, for that one second, the web server will not be able to do any processing on behalf of the concurrent requests that the process has already accepted.
The recommendation as mentioned above is thus to use multiple nginx worker processes by setting 'worker_processes' in the configuration and to push handling of static files onto a completely different server instance. In practice this is not enough however and you still risk having some percentage of requests being blocked while some other request is being handled.
The reason this occurs is that the synchronous nature of a WSGI application effectively means that mod_wsgi for nginx is not much different to using the single threaded prefork MPM with Apache. The big difference though is that nginx will greedily accept new connections, when it should really only accept one at a time, commensurate with how many concurrent connections it can truly process through the WSGI application.
End result is that all those connections which it greedily accepts between calls into the WSGI application will be processed serially and not concurrently. So, if you get a request which takes a long time, all those other requests will block.
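To put rough numbers on that, here is a minimal sketch of what serial processing means for requests that have already been accepted by one worker process. It is a toy model only, not nginx code: the function name and the service times are made up for illustration, and it assumes every request in the list was accepted by the same process before the slow one started.

```python
def serial_wait_times(service_times):
    """Return how long each request waits before it starts being handled,
    when a single worker process handles accepted requests one after another."""
    waits = []
    elapsed = 0.0
    for service_time in service_times:
        waits.append(elapsed)      # this request could not start any earlier
        elapsed += service_time    # the process is busy until this one finishes
    return waits


# Hypothetical workload: one slow request (5 seconds) followed by three
# quick ones (0.1 seconds each), all accepted by the same worker process.
service_times = [5.0, 0.1, 0.1, 0.1]
waits = serial_wait_times(service_times)

for service_time, wait in zip(service_times, waits):
    print("service %.1fs, waited %.1fs before starting" % (service_time, wait))
```

Every quick request waits at least the full five seconds of the slow one, even though each needs only a tenth of a second of work itself.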
The use of multiple nginx worker processes is supposed to mitigate this, but in practice it may not. This is because new connections aren't going to get distributed out to the worker processes evenly. Instead you may actually see a tendency for the last active process to get precedence, as that process will already be in a running state and thus is quicker to wake up and detect that a new connection is pending. As a consequence, most of the time the new connection will be accepted by the same worker process rather than being accepted by another.
If you are running a finely tuned Python web application where all requests are well under one second you may not notice any delays as a result of individual requests blocking, but if the handling of some requests can take a long time, such as a large file transfer, then the effect could be quite dramatic with distinct users seeing quite noticeable delays in getting a response to their own unrelated requests.
Now I am sure the author of mod_wsgi for nginx understood all this, but the very little documentation available for it doesn't really say much. In some respects one can get an impression from the documentation that only static file serving is affected by the problem, whereas any concurrent requests, including those for the WSGI application, are affected. Use of multiple nginx worker processes may help, particularly on a machine with multiple processors, but the benefit may be quite limited.
Overall I am not sure I would want to trust a production application to mod_wsgi for nginx due to the unpredictability as to whether requests are going to be blocked or not. As much as Apache may have a larger memory footprint due to use of distinct processes and/or threads to handle requests, at least you are guaranteed that when a connection is accepted, it will be processed promptly.
In respect of the greater memory usage of Apache, it also needs to be reiterated that the additional memory overhead of Apache is usually going to be a relatively small percentage when viewed in respect of the overall memory usage of today's even fatter Python web applications. So, anyone who likes to disparage Apache by saying it is fat is living in a bit of a fantasy land when it comes to hosting Python web applications. It simply isn't the big deal they like to make out it is. It often actually just shows they don't know how to configure Apache properly.
Finally, to those people searching for the elusive fastest web server in the world, I really suggest you give up. That isn't going to be where the bottleneck is in your application anyway. So, rather than wasting time trying to work out which system may be faster, just choose one, any one, and get on with actually optimising your application, database and caching systems instead. Work on those areas and you will see much more dramatic gains than the few piddly percent difference in request throughput you may get from using a different web hosting mechanism.
Tuesday, April 28, 2009
Python 3.0 support in mod_wsgi to be disabled.
I have been hanging off releasing mod_wsgi 3.0 due to uncertainties in how the WSGI 1.0 specification should be implemented in Python 3.0. Despite a number of discussions on the Python WEB-SIG about it, there has never really been any final consensus on the issue. This isn't helped one bit by the fact there is no formal process for agreeing, by way of vote or otherwise, on changes or amendments to the WSGI specification. Thus, all the conversation just amounts to a lot of hot air at the end of the day as nothing ever gets agreed upon.
As a result, I am going to move ahead with releasing mod_wsgi 3.0, but am going to disable support for Python 3.0. I will not remove the code entirely, but will make it necessary to go through some hoops to allow you to build mod_wsgi with Python 3.0, with suitably large disclaimers that if you use what is there, you will likely have to change your code at a future date when any amendments are agreed upon. In the meantime I will not be supporting the code related to Python 3.0.
It has come to this as it appears that other WSGI adapters attempting to support Python 3.0 are not implementing it the same way. This isn't a complaint against the authors of those other WSGI adapters as they are in the same boat as I am. The real problem is that there is no WSGI specification which covers Python 3.0. It is already bad enough that the WSGI 1.0 specification has various areas that aren't well defined or which are too restricting, to the extent that many of the major frameworks possibly don't even adhere to it. So, one can see this as being my protest at the lack of any formal processes for the development of the WSGI specification.
And before you have a go at me and say that I should then instigate such a formal process, let it be known that I have tried that already and there was no interest. The so-called consensus was that consensus was sufficient.
Some have suggested to me that I would be effectively setting the standard with whatever I released in mod_wsgi. Well, I don't want that to be the case. Although I wrote mod_wsgi I don't write web applications myself and am not across all the nuances of what would or wouldn't work for Python 3.0. Thus, I rely on the experience of others in helping define what the WSGI specification should be and merely implement that specification. So, no specification, no support.
Monday, April 20, 2009
Accreditation For Python Web Hosting
I have already described the need for better Python web hosting. Some of the comments in that discussion make me wonder if we need some sort of accreditation for Python web hosting. After all, there is a huge difference between a web hosting company who only offers CGI and whose only goal is to maximise profit by cramming as many unsuspecting users into one machine as possible, and a web hosting company who consciously regards quality of service as being equally or more important, and as such offers a much higher quality of hosting than just CGI, with a ratio of users to machines which also benefits the users and not just themselves.
So, maybe one of the things that could come out of any project to improve the quality of Python web hosting, is a checklist of what would be regarded as the minimum criteria to be regarded as a provider of quality Python web hosting services. If the Python community saw this as worthwhile, maybe the Python Software Foundation itself might want to give it its blessing. In order to get the accreditation, as well as satisfy the criteria, part of the deal could be that web hosting companies give some sort of donation back to the Python Software Foundation for the right to carry the accreditation.
Obviously trying to run such a program could be fraught with danger and maybe accreditation should be only for some set period before being reviewed, but maybe something to think about.
Friday, April 17, 2009
Improving Commercial Python/WSGI Hosting Options
I'd like to think that, through my work with mod_python and mod_wsgi, Python web hosting options have improved, but the truth is that neither mod_python nor mod_wsgi (at this stage) is really suitable for mass virtual hosting. As such, for low cost commodity Python web hosting the only real options are still CGI and FASTCGI.
In the case of FASTCGI this usually means mod_fastcgi or mod_fcgid under Apache, and although many web hosting companies do use these modules and so can provide support for Python, they often don't, or the support provided is less than ideal.
In taking the view that support for Python isn't very good, one does have to be careful however. This is because when you read support forums and irc channels, you obviously are only going to see the complaints and the calls for help to get things working. It may well be the case that this is an outspoken minority and the bulk of people are having no problem at all. Either way, there is still a perception that the Python community isn't being well serviced by web hosting companies and that something better is required.
As I have previously described in the mod_wsgi roadmap, the intention is to support features that would allow mod_wsgi to be used in mass virtual hosting, but there is a lot more to it than just providing yet another option that they might be able to use. In fact, there is no real reason why good Python web hosting couldn't be offered using FASTCGI right now.
I tend to think that the real problem is in part one of education. That is, a lack of good documentation on how to set up FASTCGI for running Python within a commercial web hosting operation, and of a clear indication of what the Python community's expectations are as to what should be available.
Some of the problems which arise are web hosting companies that provide only woefully out of date Python versions, no easy ability to install Python modules/packages, and in the case of FASTCGI, not even providing flup or some other FASTCGI bridge. End result is that although one may be able to use Python, it isn't necessarily easy and a lot of the hard work is pushed onto the user, rather than the web hosting company providing an environment which is easy to use to begin with.
With that in mind I am currently contemplating whether to start up a distinct uber project which has the specific goal of improving commercial Python/WSGI hosting options. This would not be done with the intent of just pushing my separate mod_wsgi software, but would look at all available software and come up with guidelines and other documentation on how best to use whatever is available, including CGI and FASTCGI.
I can also see this going beyond just documentation, with it also producing code libraries and applications. For example, at the moment for someone to host a Python WSGI web application under CGI they need to know about what CGI/WSGI adapters are available. Similarly for FASTCGI you need to know about what FASTCGI/WSGI adapters are available. That, or the Python web application being used needs to somehow support CGI or FASTCGI directly itself.
Frankly, with WSGI, these days it is pretty stupid for Python web applications themselves to be worried about CGI or FASTCGI. At the same time, the user should not need to know about them either. What would be much better is that, no matter what underlying Python hosting mechanism is used, the web hosting company provides a means of hosting WSGI applications themselves.
As an example, when using mod_wsgi all you need to do is provide a WSGI script file which contains an 'application' object as the entry point for the WSGI application. That WSGI script can also include any other code required to set up the environment for the WSGI application. There is no reason why this couldn't also be applied to CGI and FASTCGI.
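For anyone who hasn't seen one, a WSGI script file can be as small as the following sketch. It is written in Python 3 syntax, and the directory added to sys.path is a made-up placeholder standing in for whatever environment setup a real application needs.

```python
import sys

# Environment setup done by the script itself. The directory below is a
# hypothetical placeholder, not a path that mod_wsgi requires.
sys.path.insert(0, "/home/example/site-packages")


def application(environ, start_response):
    # 'application' is the entry point name looked for in the WSGI script file.
    body = b"Hello from a WSGI script file\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

Nothing in that file refers to Apache, CGI or FASTCGI, which is the whole point.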
So, instead of a user having to provide a .cgi or .fcgi file, they would provide a .wsgi file. It would then be up to the web hosting company to automatically ensure that the right thing happens.
Obviously, web hosting companies are going to be clueless as to how to make that work, and this is where one product of the project would be a small set of Python wrapper applications which perform that mapping, along with instructions on how a web hosting company would integrate them into their systems. This would therefore need to include guidelines on how to set up Apache, including how to integrate it with suexec or cgiwrap as appropriate.
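As a rough idea of what such a wrapper might look like for CGI, here is a minimal sketch in Python 3 using the CGIHandler class from the standard library wsgiref package. The function names are hypothetical, and how the path to the .wsgi file would reach the wrapper (an argument, an environment variable, the Apache configuration) is an open question that this sketch does not try to answer.

```python
from wsgiref.handlers import CGIHandler


def load_wsgi_application(path):
    """Execute the WSGI script file at 'path' and return its 'application'."""
    # Give the script its own namespace, much as if it were a module.
    namespace = {"__file__": path, "__name__": "_wsgi_script_"}
    with open(path) as script:
        code = compile(script.read(), path, "exec")
    exec(code, namespace)
    return namespace["application"]


def run_as_cgi(path):
    """Serve a single CGI request using the application found in 'path'."""
    application = load_wsgi_application(path)
    # CGIHandler reads the request from os.environ and sys.stdin and
    # writes the response to sys.stdout, as CGI expects.
    CGIHandler().run(application)
```

A hosting company would call something like run_as_cgi() from a small CGI program that it controls, so the user only ever supplies the .wsgi file.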
One of the problems that this wrapper application can solve is fixing up WSGI variables like SCRIPT_NAME and PATH_INFO. At the moment Python web applications often have hacks in them, or users themselves are forced to put hacks in the WSGI script file, to adjust these variables where they aren't passed through correctly from the web server.
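The kind of fix up involved can be sketched as a small piece of WSGI middleware, shown here in Python 3. The class name and the idea of the hosting company supplying a known mount point are assumptions made for illustration. Real servers get these variables wrong in different ways, so an actual wrapper would need rules specific to each hosting mechanism.

```python
class ScriptNameFixup:
    """Hypothetical middleware which forces a known mount point into
    SCRIPT_NAME and leaves the remainder of the URL path in PATH_INFO."""

    def __init__(self, application, mount_point):
        self.application = application
        self.mount_point = mount_point.rstrip("/")

    def __call__(self, environ, start_response):
        # Rebuild the full URL path from whatever the server passed through.
        full_path = environ.get("SCRIPT_NAME", "") + environ.get("PATH_INFO", "")
        mount = self.mount_point
        if full_path == mount or full_path.startswith(mount + "/"):
            environ["SCRIPT_NAME"] = mount
            environ["PATH_INFO"] = full_path[len(mount):]
        return self.application(environ, start_response)
```

With a mount point of '/blog', a request which arrives with an empty SCRIPT_NAME and a PATH_INFO of '/blog/post/1' reaches the application with SCRIPT_NAME set to '/blog' and PATH_INFO set to '/post/1'.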
Another problem that can be solved here is ensuring that logging from Python web applications ends up somewhere where the user can actually see and make use of it. One often sees instances where people are having trouble with something like FASTCGI, but due to how the system is set up, any error messages output when the FASTCGI script is being started disappear, making it really hard to debug problems. Because the wrapper application is in control of loading the WSGI script file, it can ensure that any log files are set up properly. It could even provide a feature to capture the errors and return them in an error page to the browser rather than them going to the log only.
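A minimal sketch of the error capture idea, again as Python 3 middleware with a hypothetical class name and option. It only catches exceptions raised when the application is called, not ones raised later while the response is being iterated over, and a real wrapper would put a similar try/except around the loading of the WSGI script file itself.

```python
import sys
import traceback
from html import escape


class ErrorPage:
    """Hypothetical middleware which logs uncaught exceptions and,
    optionally, also shows the traceback in the browser."""

    def __init__(self, application, show_in_browser=False):
        self.application = application
        self.show_in_browser = show_in_browser

    def __call__(self, environ, start_response):
        try:
            return self.application(environ, start_response)
        except Exception:
            details = traceback.format_exc()
            # Always record the error in the log the server provides.
            environ.get("wsgi.errors", sys.stderr).write(details)
            if self.show_in_browser:
                text = "<pre>%s</pre>" % escape(details)
            else:
                text = "<p>Internal Server Error</p>"
            body = text.encode("utf-8")
            headers = [
                ("Content-Type", "text/html; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ]
            # Passing exc_info lets the server replace headers already set.
            start_response("500 Internal Server Error", headers, sys.exc_info())
            return [body]
```

Showing tracebacks in the browser would want to be something the user turns on while debugging rather than leaves on, since tracebacks can reveal details of the application.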
So, that is the dream. In part I need to indirectly do some of the ground work for this in order to work out what features I need to add to make mod_wsgi more useful in a mass virtual hosting setup. It would be nice though if others out there who have some measure of passion for seeing Python web hosting options improved were to contribute as well. Most of all, I would dearly like to get the web hosting companies themselves directly involved.
In respect of dealing with web hosting companies, to date my experiences in dealing with them have not been very inspiring. Where I have actively tried to contact them to try and learn how they run things, so I can work out what features mod_wsgi should provide to make it easy for them to use, they have been quite unwilling to give up any information. Even when web hosting companies have contacted me about mod_wsgi, it seems the contact is coming from managers or sales people and not the technical people. Even at the requests of these same people, their own technical people aren't necessarily forthcoming with the information I really need. Overall it has been quite frustrating to say the least.
Hopefully then if this project were to get off the ground and were seen to have active backing from the Python community, we might be able to make some progress. We may even be able to make web hosting companies see that there is more than just PHP out there.
Right now any feedback you may want to give on the whole idea and whether there is a need for it would be most helpful. Maybe I am barking up the wrong tree and all is actually wonderful after all. As much as I may believe there is a problem here needing to be solved, I am sure that existing mod_wsgi users would prefer I concentrate on just mod_wsgi and not worry about all this other stuff. :-)
Saturday, April 11, 2009
Version 2.4 of mod_wsgi is now available.
Version 2.4 of mod_wsgi is a bug fix update. The most important of the bug fixes addresses a response data truncation issue when using wsgi.file_wrapper extension on UNIX with keep alive enabled in Apache.
A number of other issues are also addressed, including memory leaks, configuration corruption and request content truncation. A small number of other minor improvements have also been made.
Because of the issue related to truncation of response data, it is highly recommended that you upgrade if you are using any prior version of mod_wsgi 2.X with a web application that makes use of the wsgi.file_wrapper extension, such as Trac.
A description of changes in version 2.4 can be found in the change notes at:
http://code.google.com/p/modwsgi/wiki/ChangesInVersion0204
If you have any questions about mod_wsgi or wish to provide feedback, use the Google group for mod_wsgi found at:
http://groups.google.com/group/modwsgi
Friday, April 3, 2009
WSGI and printing to standard output.
If you use WSGI on top of CGI, the WSGI adapter communicates with the web server using standard input (sys.stdin) and standard output (sys.stdout). Available WSGI adapters for CGI do not do anything to try and protect the original sys.stdin and sys.stdout. This means that if you use 'print' to output debug messages for your application, without redirecting 'print' to sys.stderr explicitly within your code, then you will actually screw up the response from your WSGI application.
Although CGI may not be the most popular platform to host WSGI applications, with the intent of trying to promote the cause of writing portable WSGI application code, in mod_wsgi the decision was made to restrict access to sys.stdin and sys.stdout to highlight when non portable WSGI code was being written.
The result of doing this is that when 'print' was used in a WSGI application hosted by mod_wsgi, a Python exception would be raised of the type:
IOError: sys.stdout access restricted by mod_wsgi
This was all done with good intention, but what has been found is that people can't be bothered reading the documentation which explains why it was done and even when they do, they still can't be bothered fixing up the code not to use 'print'. It seems the convenience of using 'print' outweighs the ideal of writing code that may actually work across different WSGI hosting mechanisms.
More annoying is that whenever questions arise about this error on the irc channels, rather than people being told to read the documentation and/or fix their code not to use 'print', voodoo is summoned and they are instead told to use the magic incantation of:
sys.stdout = sys.stderr
Yes this is given as one of the workarounds in the documentation, the other being to disable the restriction using the configuration directive specifically for the purpose, but the only reason the workaround is given is for where you have no choice because you cannot change the code to remove the 'print' statement. People aren't told this though, all they are told is to make that change and effectively ignore the whole issue.
The whole mythology that is developing around this is now getting to the extent that some have been saying that neither 'sys.stdout' nor 'sys.stderr' works in mod_wsgi. The suggestion is starting to come out now that if you want to get any debug output from your WSGI application you have to use a separate log file of your own creation, optionally hooked up to the 'logging' module. In one case, a BuildOut recipe is explicitly providing an option to define the separate log file that they believe has to be used to replace 'sys.stdout' and 'sys.stderr'.
So, what is the real answer? Well, if you care about writing portable WSGI application code, then do not use 'print' by itself, instead redirect it to 'sys.stderr' by writing:
print >> sys.stderr, 'message ...'
This is especially important if you are writing framework libraries or plugins to be used in some other application or by other users. You shouldn't be making an assumption that 'sys.stdout' can always be used. If it is a debug or error message, then use 'sys.stderr' as it is meant to be.
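For reference, the same advice in Python 3 syntax, where 'print' is a function, looks like the sketch below. It also shows the 'wsgi.errors' stream, which the WSGI specification requires every server to supply in the request environment. The function name and messages are made up for illustration.

```python
import sys


def debug_application(environ, start_response):
    # Python 3 spelling of "print >> sys.stderr, 'message ...'".
    print("handling request for", environ.get("PATH_INFO", ""), file=sys.stderr)

    # The per request error stream supplied by the WSGI server.
    environ["wsgi.errors"].write("debug message via wsgi.errors\n")

    body = b"done\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]
```

Neither form touches 'sys.stdout', so the code behaves the same whether it is hosted under CGI or anything else.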
If for some reason you really don't want to care about the issue, then rather than use the magic voodoo above, you should simply disable the restrictions that mod_wsgi puts into place altogether. This is done by putting in the main Apache configuration file:
WSGIRestrictStdin Off
WSGIRestrictStdout Off
Anyway, because of all the contention arising over all of this, in mod_wsgi 3.0 I will be giving up and will be making the restrictions off by default. If you want to write a non portable WSGI application, you can quite happily do so. If you do care about portable WSGI application code, then you will be able to optionally reenable the restriction using the same directives above.