Tuesday, September 2, 2014

Debugging with pdb when using mod_wsgi.

In the early days of mod_wsgi I made a decision to impose a restriction on the use of stdin and stdout by Python WSGI web applications. My reasoning around this was that if you want to make a WSGI application portable to any WSGI deployment mechanism, then you should not be attempting to use stdin/stdout. This includes either reading or writing to these file objects, or even performing a check on them to try and determine if code is running in a process attached to a TTY device.

The restriction was driven largely by the fact that WSGI adapters for CGI relied on stdin and stdout to communicate with the web server the script was running under. Such CGI/WSGI adapters could have saved away the original stdin and stdout for their own use and then replaced 'sys.stdin' and 'sys.stdout' with working alternatives so that user code didn't need to care. The original example CGI/WSGI adapter in the WSGI specification never did that, however, so no one implementing a CGI/WSGI adapter thought about the issue or did something about it themselves.

The problem was that if any user code used 'print()' to dump out debugging information so that it appeared in a WSGI server log, then when that WSGI application was hosted using a CGI/WSGI adapter, the debug output would end up in the HTTP response sent back to the client, because stdout is what a CGI script uses to communicate with the web server.
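To make the adapter-side fix concrete, here is a minimal sketch of what a CGI/WSGI adapter could have done: keep the real response stream for itself and point 'sys.stdout' at a log stream while the application runs. The names 'run_cgi_style' and 'demo_app' and all parameters are hypothetical, not taken from any actual adapter, and the header handling is deliberately simplified.

```python
import sys


def run_cgi_style(application, environ, response_stream, log_stream):
    """Run a WSGI application, keeping stray print() output out of the response.

    Hypothetical helper: response_stream stands in for the real stdout that a
    CGI script uses to talk to the web server, log_stream for the error log.
    """
    saved_stdout = sys.stdout
    sys.stdout = log_stream  # print() in user code now lands in the log

    def start_response(status, headers, exc_info=None):
        # Simplification: headers are written immediately rather than being
        # deferred until the first body chunk as a full adapter would do.
        lines = ["Status: %s" % status]
        lines.extend("%s: %s" % (name, value) for name, value in headers)
        response_stream.write(("\r\n".join(lines) + "\r\n\r\n").encode("latin-1"))

    try:
        result = application(environ, start_response)
        try:
            for chunk in result:
                response_stream.write(chunk)
        finally:
            if hasattr(result, "close"):
                result.close()
    finally:
        sys.stdout = saved_stdout  # always restore the original stream


def demo_app(environ, start_response):
    print("debug: handling request")  # would corrupt a plain CGI response
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello"]
```

Passing an 'io.BytesIO()' as the response stream and an 'io.StringIO()' as the log stream shows the debug line ending up in the log while the response contains only the headers and body.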

So all was well and good, and I thought I was doing a good thing by encouraging people to write portable WSGI application code. This isn't how users saw things though. They didn't care about such things, and because they got an exception when they tried to use stdin or stdout, they blamed mod_wsgi rather than the fact that what they were doing wasn't portable.

What happened therefore is that documentation for some Python web frameworks and various blog posts started to say that mod_wsgi has these restrictions and/or was broken, and here is how you work around it. The Flask documentation even today still carries such a warning even though it isn't relevant to more recent mod_wsgi versions. The restriction was removed back in mod_wsgi 3.0, which was released on 21st November 2009, almost five years ago.

For some more background on this issue you can read my prior blog post back in 2009 about it. In short though, if you are using:

WSGIRestrictStdout Off

in the Apache configuration file, or using:

import sys
sys.stdout = sys.stderr

in the WSGI script file, you no longer need to if you are using mod_wsgi version 3.0 or later.
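If you would rather decide at runtime whether the old workaround is still needed, a rough sketch is to look at the mod_wsgi version. This assumes the 'mod_wsgi.version' key that mod_wsgi places in the WSGI environ as a tuple; the function name 'needs_stdout_workaround' is hypothetical.

```python
def needs_stdout_workaround(environ):
    """Return True only when hosted under a mod_wsgi older than 3.0."""
    version = environ.get("mod_wsgi.version")
    if version is None:
        # Not running under mod_wsgi, so its restriction does not apply.
        return False
    return tuple(version) < (3, 0)
```

For example, an environ carrying the version tuple (2, 8) would report that the workaround is needed, while (3, 0) or (4, 3, 0) would not.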

The reason this issue came up in my discussions with people during the hallway track of DjangoCon was that we were discussing the Django debug toolbar and Python debuggers such as pdb.

In the case of pdb, in order for it to work, it needs to have access to the original stdin and stdout attached to your console in order to provide you with an interactive session.

When you remap 'sys.stdout' to 'sys.stderr' in your WSGI script file you are replacing the original stdout with stderr where stderr is always going to be connected to the Apache error log. Any output from pdb would therefore end up in the Apache error log and would not show in your interactive console.
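The effect can be seen with a small sketch. When no explicit stream is passed, 'pdb.Pdb()' takes whatever 'sys.stdout' refers to at the moment the debugger instance is created, so a remapped 'sys.stdout' is what pdb ends up writing to. The helper name 'pdb_output_stream' is hypothetical.

```python
import pdb
import sys


def pdb_output_stream():
    """Return the stream a newly created pdb.Pdb() instance will write to."""
    # With no stdout argument, Pdb uses the current value of sys.stdout.
    return pdb.Pdb().stdout
```

If 'sys.stdout = sys.stderr' has already been executed, the stream returned here is 'sys.stderr', which under Apache means the error log.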

But wait, you say, Apache/mod_wsgi runs all the processes which run your actual WSGI application as background processes, so how could it work anyway? There is no way at that point that stdin and stdout would still be connected to any console shell, and since Apache is generally started as root on system startup, how is that even helpful?

What is little known is that it is in fact possible to run Apache with mod_wsgi in a single process mode where Apache is run in the foreground and where stdin and stdout are attached to your console, allowing you to potentially interact with the process.

If using a standard Apache setup, the steps required are admittedly a bit fiddly to get this running.

First, if you are using mod_wsgi daemon mode, you have to comment out the mod_wsgi directives which set that up. Your WSGI application then defaults back to running in embedded mode.

Next, if you are using the worker or event MPMs of Apache, you need to change the MPM configuration to create only a single worker thread per process.

Finally, you need to manually start the Apache server from a shell, giving it the '-DONE_PROCESS' or '-X' option.

/usr/sbin/httpd -X

If you are on a Linux system, it is possible you will also need to set the 'APACHE_RUN_USER' and 'APACHE_RUN_GROUP' environment variables. This is because on some Linux systems, the standard Apache configuration is dependent on these environment variables having been set by the 'apachectl' script. If you need to set them, they should be set to the user and group of the standard Apache user.
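The manual start can also be wrapped in a small launcher so the environment variables are not forgotten. This is only a sketch: the function names are hypothetical, and "www-data" is an assumption (the usual Apache account on Debian-style systems) that you should replace with the user and group your Apache normally runs as.

```python
import os
import subprocess


def foreground_apache(httpd="/usr/sbin/httpd", run_user=None, run_group=None):
    """Build the command and environment for running Apache with -X."""
    env = dict(os.environ)
    if run_user is not None:
        env["APACHE_RUN_USER"] = run_user
    if run_group is not None:
        env["APACHE_RUN_GROUP"] = run_group
    return [httpd, "-X"], env


def main():
    # "www-data" is an assumed account name; adjust for your system.
    command, env = foreground_apache(run_user="www-data", run_group="www-data")
    # The child inherits this terminal's stdin and stdout, which pdb needs.
    return subprocess.call(command, env=env)
```

Calling 'main()' from a shell session keeps Apache attached to that terminal until it is interrupted.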

Do all that and you can now place in your code:

import pdb; pdb.set_trace()

and when that code is executed you will be thrown into an interactive pdb session where you can interact with your WSGI application. To exit out of the pdb session enter 'cont' and it will continue with the request.
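As an illustration of where the call might go, here is a minimal WSGI application that only enters the debugger when asked to, so the breakpoint cannot fire by accident under a normal background Apache. The 'WSGI_PDB' environment variable is a hypothetical switch invented for this sketch, not something mod_wsgi knows about.

```python
import os
import pdb


def application(environ, start_response):
    # WSGI_PDB is a hypothetical opt-in switch used only in this sketch.
    if os.environ.get("WSGI_PDB") == "1":
        pdb.set_trace()  # interactive session on the attached console

    body = b"Hello World!"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

With the variable set to 1 in the shell that starts Apache in single process mode, each request stops at the breakpoint until 'cont' is entered.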

You can find further information about all this in the mod_wsgi documentation about pdb. Do be warned that the WSGI middleware described there isn't strictly correct and only intercepts an exception which occurs when creating the iterable to be returned, which for a generator is even before your code gets executed. It may therefore be best to stick with 'pdb.set_trace()' for now until I fix that WSGI middleware.
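For what it is worth, one way such a middleware could cover the whole request is to wrap consumption of the iterable as well as its creation. The following is a sketch under that assumption, not the middleware from the mod_wsgi documentation nor a fix endorsed there; the class name 'PostMortemDebugger' and its 'post_mortem' parameter are hypothetical.

```python
import pdb
import sys


class PostMortemDebugger:
    """WSGI middleware that opens pdb on any exception raised by the app."""

    def __init__(self, application, post_mortem=pdb.post_mortem):
        self.application = application
        self.post_mortem = post_mortem  # injectable so it can be stubbed out

    def __call__(self, environ, start_response):
        # This method is itself a generator, so the wrapped application is
        # both called and iterated inside the try block below.
        iterable = None
        try:
            iterable = self.application(environ, start_response)
            for chunk in iterable:
                yield chunk
        except Exception:
            self.post_mortem(sys.exc_info()[2])  # debug at the failure point
            raise
        finally:
            if hasattr(iterable, "close"):
                iterable.close()
```

Because the wrapper is a generator, an exception raised part way through a generator-based application is caught as well, which is the case the warning above is about.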

So it is possible to use pdb with WSGI applications hosted using Apache/mod_wsgi, but the steps do make it a bit onerous.

This is the point where some of the more recent work I am doing on mod_wsgi makes this more practical.

With the newer mod_wsgi express variant you don't have to worry about the Apache configuration, making it an ideal way to run up Apache/mod_wsgi in a development environment.

For this specific use case of wanting to run pdb, the next version of mod_wsgi (4.3.0) supports a new option for mod_wsgi express which runs it in this single process mode for you automatically, making it easier to use pdb to debug a WSGI application running under Apache/mod_wsgi.

2 comments:

Unknown said...

Thank you for this article. Your posts have been tremendously helpful to me. mod_wsgi is great!

Graham Dumpleton said...

If you are interested in using pdb to debug your application, you would be much better off using mod_wsgi-express.

mod_wsgi-express start-server script.wsgi --debug-mode --enable-debugger

It worries about all the stuff for getting pdb running and can be done on the command line distinct from your main Apache.