Wednesday, March 31, 2010
Going To Singapore PyCon.
It is still a bit over two months away, but I have organised myself a holiday in Singapore for the second week of June. This just so happens to coincide with the PyCon conference being held in Singapore this year. So, as well as getting some shopping done with the family and eating all the great Singapore food, I intend to drop in on the conference talks and maybe also the tutorials. I will be in Singapore for the whole week, so if anyone is interested in meeting up to talk about mod_wsgi related stuff, let me know.
Sunday, March 28, 2010
An improved WSGI script for use with Django.
Far too often one sees complaints on the Django users list and #django IRC channel that code that worked fine with the Django development server doesn't work with Apache/mod_wsgi. In a number of those cases you will see the accusation that Apache/mod_wsgi must be wrong or is somehow broken. The real reason, however, is that the Django development server carries out various setup steps which aren't performed if you use the WSGI handler interface provided by Django. The available Django documentation on using the WSGI interface doesn't go into a great deal of technical detail. The end result is that it isn't obvious what needs to be done when using the Django WSGI interface to have the process environment set up equivalently to the Django development server, therefore guaranteeing trouble free porting of an application to a production environment using Apache/mod_wsgi, or any other WSGI hosting mechanism.
The purpose of this post is to explain what actually happens when you use the Django development server, as far as the setup of critical parts of the process environment is concerned, and to compare that to what happens if you use the Django WSGI interface in the manner described in the Django documentation. From that I will describe an alternate way of setting up and configuring Django for use with the supplied WSGI interface so as to better replicate how things are done within the Django development server.
To help track down what happens we will instrument the 'settings.py' from a Django site to include the following.
import sys, os
print "__name__ =", __name__
print "__file__ =", __file__
print "os.getpid() =", os.getpid()
print "os.getcwd() =", os.getcwd()
print "os.curdir =", os.curdir
print "sys.path =", repr(sys.path)
print "sys.modules.keys() =", repr(sys.modules.keys())
print "sys.modules.has_key('mysite') =", sys.modules.has_key('mysite')
if sys.modules.has_key('mysite'):
    print "sys.modules['mysite'].__name__ =", sys.modules['mysite'].__name__
    print "sys.modules['mysite'].__file__ =", sys.modules['mysite'].__file__
print "os.environ['DJANGO_SETTINGS_MODULE'] =", os.environ.get('DJANGO_SETTINGS_MODULE', None)
Now, for my example the Django site is located at '/usr/local/django/mysite'. To run the Django development server, I run 'python manage.py runserver' from within that directory. The result is the following.
__name__ = settings
__file__ = /usr/local/django/mysite/settings.pyc
os.getpid() = 3441
os.getcwd() = /usr/local/django/mysite
os.curdir = .
sys.path = ['/usr/local/django/mysite', ...]
sys.modules.keys() = [..., 'settings', ...]
sys.modules.has_key('mysite') = False
os.environ['DJANGO_SETTINGS_MODULE'] = None
__name__ = mysite.settings
__file__ = /usr/local/django/mysite/../mysite/settings.pyc
os.getpid() = 3441
os.getcwd() = /usr/local/django/mysite
os.curdir = .
sys.path = ['/usr/local/django/mysite', ...]
sys.modules.keys() = [..., 'mysite.settings', ..., 'mysite.sys', 'mysite.os', ..., 'mysite', ..., 'settings', ...]
sys.modules.has_key('mysite') = True
sys.modules['mysite'].__name__ = mysite
sys.modules['mysite'].__file__ = /usr/local/django/mysite/../mysite/__init__.pyc
os.environ['DJANGO_SETTINGS_MODULE'] = mysite.settings
__name__ = settings
__file__ = /usr/local/django/mysite/settings.pyc
os.getpid() = 3442
os.getcwd() = /usr/local/django/mysite
os.curdir = .
sys.path = ['/usr/local/django/mysite', ...]
sys.modules.keys() = [..., 'settings', ...]
sys.modules.has_key('mysite') = False
os.environ['DJANGO_SETTINGS_MODULE'] = None
__name__ = mysite.settings
__file__ = /usr/local/django/mysite/../mysite/settings.pyc
os.getpid() = 3442
os.getcwd() = /usr/local/django/mysite
os.curdir = .
sys.path = ['/usr/local/django/mysite', ...]
sys.modules.keys() = [..., 'mysite.settings', ..., 'mysite.sys', 'mysite.os', ..., 'mysite', ..., 'settings', ...]
sys.modules.has_key('mysite') = True
sys.modules['mysite'].__name__ = mysite
sys.modules['mysite'].__file__ = /usr/local/django/mysite/../mysite/__init__.pyc
os.environ['DJANGO_SETTINGS_MODULE'] = mysite.settings
Two things stand out from this. The first is that there are two different processes involved and the second is that the same settings file is imported twice by each process but using a different Python module name in each instance.
The existence of the two processes is explained by the fact that the Django development server has a reload option whereby, if changes are made to any code, it will automatically restart the application. To do this it is necessary to have a supervisor or monitor process and an actual worker process. Each time that a code change is made and detected, the worker process is killed off and the supervisor process creates a new process to replace it. In that way the worker process, which is what is accepting the HTTP requests and handling them, will always have the most up to date code.
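The supervisor and worker arrangement can be sketched roughly as follows. This is only an illustration written for a current Python 3, not Django's actual reloading code: the 'supervise()' function, the 'RELOAD_EXIT_CODE' value and the stand-in worker script are all assumptions made for the example. The worker here simply pretends to have detected a code change on its first two runs.

```python
import os
import subprocess
import sys
import tempfile

# Assumed convention for this sketch: a worker exits with this code
# to ask the supervisor to replace it with a fresh process.
RELOAD_EXIT_CODE = 3

# Stand-in worker: it counts its runs in a file and pretends to have
# detected a code change on the first two runs.
WORKER_SOURCE = """
import os, sys
counter_path = sys.argv[1]
with open(counter_path) as f:
    runs = int(f.read()) + 1
with open(counter_path, "w") as f:
    f.write(str(runs))
print("worker pid %d, run %d" % (os.getpid(), runs))
sys.exit(3 if runs < 3 else 0)
"""


def supervise(worker_script, counter_path):
    """Run the worker, starting a new one each time it asks for a reload."""
    restarts = 0
    while True:
        exit_code = subprocess.call(
            [sys.executable, worker_script, counter_path])
        if exit_code != RELOAD_EXIT_CODE:
            return restarts
        restarts += 1


work_dir = tempfile.mkdtemp()
worker_script = os.path.join(work_dir, "worker.py")
counter_path = os.path.join(work_dir, "runs.txt")
with open(worker_script, "w") as f:
    f.write(WORKER_SOURCE)
with open(counter_path, "w") as f:
    f.write("0")

restarts = supervise(worker_script, counter_path)
print("supervisor pid %d restarted the worker %d times"
      % (os.getpid(), restarts))
```

The supervisor keeps the same process ID throughout while each worker gets a new one, which is why two different values of 'os.getpid()' show up in the output above.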
That the settings file is imported more than once is a bit more tricky and it is likely that the majority wouldn't even know that this occurs. Possibly people would only notice if they had placed debugging statements in the settings file like above, or they had added code to it other than simple variable settings and the code performed an action which was cumulative and thus a problem occurred through the action occurring twice.
So, how does the settings module get imported twice?
The first import happens when you run the 'python manage.py runserver' command: the 'manage.py' file imports the settings file as the 'settings' module from the same directory.
Worth noting at this point is that 'sys.path' includes the path '/usr/local/django/mysite', which is the directory the 'manage.py' and 'settings.py' are located in. This appears in 'sys.path' as it is standard behaviour of Python to add the directory that a script is contained in to 'sys.path' when a script is passed to Python to execute.
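This behaviour is easy to confirm outside of Django. The following is a small sketch for a current Python 3 in which the script name 'show_path.py' and the temporary directories are made up for the example. It runs a throwaway script from an unrelated working directory and reports the first entry of 'sys.path' as seen by that script.

```python
import os
import subprocess
import sys
import tempfile

script_dir = tempfile.mkdtemp()
other_dir = tempfile.mkdtemp()

script_path = os.path.join(script_dir, "show_path.py")
with open(script_path, "w") as f:
    f.write("import sys\nprint(sys.path[0])\n")

# Run from a different working directory to show that it is the
# location of the script, not the current directory, that is added.
output = subprocess.check_output(
    [sys.executable, script_path], cwd=other_dir, universal_newlines=True)
first_entry = output.strip()

print("script directory:", script_dir)
print("sys.path[0] seen by script:", first_entry)
```

The two paths reported should refer to the same directory, although they may differ textually if the temporary directory is reached through a symbolic link.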
Moving on, after having imported the settings module, the 'manage.py' file will eventually call 'django.core.management.execute_manager()', where the argument supplied is the reference to the 'settings' module it just imported. The code for the 'execute_manager()' function is as follows.
def execute_manager(settings_mod, argv=None):
    """
    Like execute_from_command_line(), but for use by manage.py, a
    project-specific django-admin.py utility.
    """
    setup_environ(settings_mod)
    utility = ManagementUtility(argv)
    utility.execute()
The first function called by this is 'setup_environ()'. This function does two important things.
The first thing the 'setup_environ()' function does is set the 'DJANGO_SETTINGS_MODULE' environment variable. Whereas the settings module was originally imported as 'settings', the environment variable instead refers to it as a sub module of the package which is the Django site. Thus, instead of 'settings' it is referenced as 'mysite.settings' in this example.
The second thing that is done is that the parent directory of the site is added to 'sys.path'. That is, where the site directory is '/usr/local/django/mysite', the directory '/usr/local/django' is added. Having done that, the site package root is imported. For this example, this means that 'mysite' is imported. As soon as it has been imported, however, the directory which was added, i.e., '/usr/local/django' in this case, is removed from 'sys.path' again.
Having done this initialisation, the 'execute_manager()' function creates an instance of the Django 'ManagementUtility' class. Control is then handed off to this class by calling the 'execute()' method.
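The two steps just described can be approximated by the sketch below. It is written for a current Python 3 and is a simplified stand-in, not the Django source: the function name 'setup_environ_sketch()' and the throwaway directory layout are assumptions made for the example, and details such as error handling are left out.

```python
import importlib
import os
import sys
import tempfile


def setup_environ_sketch(settings_mod):
    """Approximate the two steps described above for a settings module."""
    project_directory, settings_filename = os.path.split(
        os.path.abspath(settings_mod.__file__))
    project_name = os.path.basename(project_directory)
    settings_name = os.path.splitext(settings_filename)[0]

    # Step 1: record the settings module as a sub module of the site package.
    os.environ['DJANGO_SETTINGS_MODULE'] = '%s.%s' % (
        project_name, settings_name)

    # Step 2: expose the parent directory just long enough to import the
    # site package root, then remove it again.
    sys.path.append(os.path.join(project_directory, os.pardir))
    try:
        importlib.import_module(project_name)
    finally:
        sys.path.pop()
    return project_directory


# Throwaway site layout standing in for '/usr/local/django/mysite'.
parent_dir = tempfile.mkdtemp()
site_dir = os.path.join(parent_dir, 'mysite')
os.mkdir(site_dir)
for name in ('__init__.py', 'settings.py'):
    open(os.path.join(site_dir, name), 'w').close()

# Mimic 'manage.py': the site directory is on sys.path and the settings
# file is imported as the top level module 'settings'.
sys.path.insert(0, site_dir)
importlib.invalidate_caches()
settings_mod = importlib.import_module('settings')

setup_environ_sketch(settings_mod)
print(os.environ['DJANGO_SETTINGS_MODULE'])
print('mysite' in sys.modules)
print(os.path.join(site_dir, os.pardir) in sys.path)
```

Adding the parent directory in the form of the site directory followed by '..' would also account for the '/usr/local/django/mysite/../mysite/' paths seen in the '__file__' values in the output earlier.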
Delving down into the 'ManagementUtility' class, the next important function to be called is one called 'get_commands()' in the 'django.core.management' module.
What this function does is come up with a list of all the possible management commands. These can be management commands that are supplied as standard with Django, such as 'runserver', or can be management commands associated with installed Django applications for the site as listed in the 'INSTALLED_APPS' variable of the settings module.
To get the 'INSTALLED_APPS' variable however, it has to load the settings module. To get this the 'django.conf' module is imported and a global object called 'settings' within that module is accessed. That object isn't however the settings module itself, but an instance of the 'LazySettings' class.
This 'LazySettings' object acts as a wrapper around the actual settings module. So, it does still import the settings module, but also provides a mechanism for user code to configure various settings variables such that they will override those from the actual settings module.
Before it can do that however, it still has to import the original settings module. When it does this it ignores the fact that it was originally imported by the 'manage.py' command and instead loads the settings module based on the name which is recorded in the 'DJANGO_SETTINGS_MODULE' environment variable which was set above by the 'setup_environ()' function.
It is this section of code where the notorious error 'Could not import settings '%s' (Is it on sys.path? Does it have syntax errors?)' comes from that so often afflicts people using mod_python and mod_wsgi when the Python module search path isn't set up correctly or the Apache user doesn't have the appropriate permissions to read the file.
Anyway, this is where the second import of the settings module is triggered.
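The general idea of a lazily loaded settings wrapper can be sketched as follows. This is a minimal illustration for a current Python 3 and is not the real 'LazySettings' class: the class name 'LazySettingsSketch', the 'DEMO_SETTINGS_MODULE' environment variable and the in-memory 'demo_site_settings' module are all made up for the example, and the override mechanism mentioned above is not modelled.

```python
import importlib
import os
import sys
import types


class LazySettingsSketch(object):
    """Defer importing the settings module until a setting is first read."""

    def __init__(self, env_var):
        self._env_var = env_var
        self._module = None

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself,
        # which in practice means the individual settings.
        if self._module is None:
            module_name = os.environ[self._env_var]
            self._module = importlib.import_module(module_name)
        return getattr(self._module, name)


# Hypothetical settings module, registered in memory so that the
# example does not need any files on disk.
demo_module = types.ModuleType('demo_site_settings')
demo_module.INSTALLED_APPS = ('demo.polls',)
sys.modules['demo_site_settings'] = demo_module
os.environ['DEMO_SETTINGS_MODULE'] = 'demo_site_settings'

lazy_settings = LazySettingsSketch('DEMO_SETTINGS_MODULE')
loaded_before = lazy_settings._module is not None
apps = lazy_settings.INSTALLED_APPS
loaded_after = lazy_settings._module is not None

print(loaded_before, apps, loaded_after)
```

The point to take away is that the import is driven purely by the name held in the environment variable at the time of first access, regardless of whether the same file has already been imported under some other name.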
We aren't quite done however, as there are a couple of other important things that the initialisation of the settings object does which are worth mentioning.
The first is that if 'INSTALLED_APPS' uses a wildcard to refer to a group of applications contained within a module, then in order to enumerate what those applications are, the module containing them is imported to determine where the module is located. That directory is then scanned for sub directories and each of those is taken to be an actual application.
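A rough sketch of that kind of wildcard expansion is shown below. It is written for a current Python 3 and is not the Django implementation: the function name 'expand_installed_apps()', the 'demo_apps' package and its sub directories are assumptions made for the example, as is the rule used to decide which directory names count as applications.

```python
import importlib
import os
import re
import sys
import tempfile


def expand_installed_apps(installed_apps):
    """Expand entries ending in '.*' into one entry per sub directory."""
    name_pattern = re.compile(r'[a-zA-Z]\w*$')
    expanded = []
    for app in installed_apps:
        if not app.endswith('.*'):
            expanded.append(app)
            continue
        package_name = app[:-2]
        # Importing the containing module is what reveals where it lives.
        package = importlib.import_module(package_name)
        package_dir = os.path.dirname(package.__file__)
        for entry in sorted(os.listdir(package_dir)):
            is_dir = os.path.isdir(os.path.join(package_dir, entry))
            if is_dir and name_pattern.match(entry):
                expanded.append('%s.%s' % (package_name, entry))
    return expanded


# Throwaway package containing two hypothetical applications.
base_dir = tempfile.mkdtemp()
package_dir = os.path.join(base_dir, 'demo_apps')
os.mkdir(package_dir)
open(os.path.join(package_dir, '__init__.py'), 'w').close()
for app_name in ('polls', 'blog'):
    os.mkdir(os.path.join(package_dir, app_name))

sys.path.insert(0, base_dir)
importlib.invalidate_caches()

expanded = expand_installed_apps(['django.contrib.auth', 'demo_apps.*'])
print(expanded)
```

Note that it is the import of the containing package which is the side effect of interest here, since any code in that package's '__init__.py' runs at that point.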
The second thing that is done is that if the 'TIME_ZONE' variable is set within the settings module, then the 'TZ' environment variable is set and the 'time.tzset()' function called.
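In isolation, that time zone step amounts to something like the following. This is a sketch for a current Python 3, with the value 'UTC' standing in for whatever 'TIME_ZONE' is set to in the settings module. The 'time.tzset()' function only exists on Unix, hence the check.

```python
import os
import time

TIME_ZONE = 'UTC'  # stand-in for the value from the settings module

# The change is made to the environment of the whole process.
os.environ['TZ'] = TIME_ZONE
if hasattr(time, 'tzset'):  # time.tzset() is only available on Unix
    time.tzset()

print(os.environ['TZ'], time.tzname)
```

Because this alters process wide state, the point in time at which it happens matters to anything else in the same process that works with local time.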
Okay, we went on a little side trip there to understand what happens when the settings module is imported. As pointed out though, this had to be done to get that list of installed applications, including those enumerated when a wildcard was used. Having got that list, it is then possible to generate the list of commands that those installed applications provide.
To round out the picture, once the list of commands is generated it will for the case of 'python manage.py runserver' load the command for 'runserver'. This will result in 'django.core.management.commands.runserver' module being imported and control passed to it. The effect of that will be in this case to run the Django development server and start serving HTTP requests.
Although the command module for the 'runserver' command principally deals with creating an instance of the development web server, it also does a couple of other configuration related steps which may be significant when we come to talk about Apache/mod_wsgi later.
from django.conf import settings
from django.utils import translation
print "Validating models..."
self.validate(display_num_errors=True)
print "\nDjango version %s, using settings %r" % (django.get_version(), settings.SETTINGS_MODULE)
print "Development server is running at http://%s:%s/" % (addr, port)
print "Quit the server with %s." % quit_command
# django.core.management.base forces the locale to en-us. We should
# set it up correctly for the first request (particularly important
# in the "--noreload" case).
translation.activate(settings.LANGUAGE_CODE)
The first of these is that the models used by the application are validated. The second is that support for language locale is activated.
So, we have worked out why it is the settings file was imported twice. We have also worked out why the package root for the site has been able to be imported even though after the fact the parent directory of the site isn't listed in 'sys.path'. The analysis also shows that various other side effects can occur, including importing of parts of the application, validation of data models, setting up of the language locale and time zone setting.
Now let us compare all this to what happens when the WSGI interface is used under Apache/mod_wsgi.
The guidance has always been that all you really needed to do for WSGI was to use:
import os
import sys
os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings'
import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()
That in itself is not necessarily going to be sufficient. This is because Python isn't going to know where to find 'mysite' when it goes to import the settings module unless it had been installed in the standard Python 'site-packages' directory, which is unlikely. Thus, what you really need is as follows.
import os
import sys
sys.path.insert(0, '/usr/local/django')
os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings'
import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()
That is, we have inserted the parent directory of the site into 'sys.path'.
The result now, if we start up Apache/mod_wsgi and make a request, will be the following.
__name__ = mysite.settings
__file__ = /usr/local/django/mysite/settings.pyc
os.getpid() = 3733
os.getcwd() = /Users/grahamd
os.curdir = .
sys.path = ['/usr/local/django', ...]
sys.modules.keys() = [..., 'mysite.settings', ..., 'mysite.os', ..., 'mysite.sys', ..., 'mysite', ...]
sys.modules.has_key('mysite') = True
sys.modules['mysite'].__name__ = mysite
sys.modules['mysite'].__file__ = /usr/local/django/mysite/__init__.pyc
os.environ['DJANGO_SETTINGS_MODULE'] = mysite.settings
What is different?
First off there is only one process, but that is simply because there is no supervisor or monitor process to handle reloading like with the Django development server.
The second difference is that the settings module is only imported once and that is from the site package and not as a top level module. That is, it is imported as 'mysite.settings' and not 'settings'.
The final difference is that 'sys.path' lists '/usr/local/django', whereas in the Django development server it listed '/usr/local/django/mysite'.
Remember though that with the Django development server the directory '/usr/local/django' was added to 'sys.path' but only long enough to have imported the 'mysite' package root for the site.
The consequence of the directory having been removed when using the Django development server, is that if you wanted to import Python packages from a sibling directory to the site directory, you would need to explicitly add it to the 'PYTHONPATH' variable in the user environment from which 'python manage.py runserver' was run.
In the case of using the WSGI interface directly as shown, the directory has to be included so that the settings module can be imported. Because it is added explicitly and stays in 'sys.path', you do have to be careful about what is contained in any sibling directories if you hadn't explicitly added the directory when using the Django development server. Those sibling directories will be considered when doing later module imports, whereas with the Django development server they wouldn't.
The bigger issue in respect of the differences between 'sys.path' for each hosting mechanism is that under Apache/mod_wsgi the directory '/usr/local/django/mysite' is missing.
Why this causes a problem is that when using the Django development server people become used to being able to reference parts of a site without needing to use the site package prefix. This is especially problematic when references are within strings in the URL mappings contained in 'urls.py' but can also occur for Python module imports as well where they are within a further subdirectory of the site. Where either is used, the code will fail if the site is migrated to run under Apache/mod_wsgi.
Obviously, the way around that is in the WSGI script file used for Apache/mod_wsgi to also add the directory '/usr/local/django/mysite' to 'sys.path', thus yielding the following.
import os
import sys
sys.path.insert(0, '/usr/local/django/mysite')
sys.path.insert(0, '/usr/local/django')
os.environ['DJANGO_SETTINGS_MODULE'] = 'mysite.settings'
import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()
It should be noted that having to do this highlights an arguable flaw in what Django permits when using the development server. This is because it is now possible to import the same module file via two different names. If naming isn't done consistently, you could end up with multiple copies of the module in memory and with any code within it being executed twice. If any code references global variables within the module, then different parts of the code may not end up accessing the same variable.
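The effect of importing one file under two names can be demonstrated without Django at all. The sketch below is for a current Python 3, and the 'demosite' package and 'counters' module are made up for the example. Both the site directory and its parent are placed in 'sys.path', as in the WSGI script file above, and the same file is then imported by its short name and by its package qualified name.

```python
import importlib
import os
import sys
import tempfile

# Throwaway site package containing one module with a global variable.
parent_dir = tempfile.mkdtemp()
site_dir = os.path.join(parent_dir, 'demosite')
os.mkdir(site_dir)
open(os.path.join(site_dir, '__init__.py'), 'w').close()
with open(os.path.join(site_dir, 'counters.py'), 'w') as f:
    f.write('hits = 0\n')

# Both the site directory and its parent are on the module search path.
sys.path.insert(0, site_dir)
sys.path.insert(0, parent_dir)
importlib.invalidate_caches()

by_short_name = importlib.import_module('counters')
by_full_name = importlib.import_module('demosite.counters')

# Update the global through one name only.
by_short_name.hits += 1

print(by_short_name is by_full_name)
print(by_short_name.hits, by_full_name.hits)
```

The two names yield two separate module objects, so the update made through one is not visible through the other.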
These points aside, is there anything else different between the Django development server and Apache/mod_wsgi?
The obvious ones are that, because the Django development server is never run, the validation of the models and the setup of the language locale are never done.
More significant is that the Django 'ManagementUtility' class is never created and the list of available management commands is never calculated. This means that any implicit actions resulting from that are never performed.
The main side effect is that of initialisation of the settings object which wraps the settings module. As described, this sets up the timezone but also can cause additional module imports to be done as the installed applications have to be loaded when wildcards are used.
Because the settings object is so key to the operation of Django, it will still be initialised at a later point when required. Even so, there is still a potential difference due to the order in which things are done.
Even more problematic than the order, though, is the context in which the settings module is finally loaded and the associated initialisation performed.
For the Django development server only a single thread is used and all the initialisation is done up front before any requests are handled.
In the case of Apache/mod_wsgi where particular configurations can be multithreaded, the initialisation is only done within the context of the handling of the first request. While this is being done it is possible that a concurrent request could also be occurring.
If there is anything about the Django core code or the way in which user code makes use of it which is not completely thread safe, then there is a risk that multithreading could result in aspects of the settings and other global data being accessed before it is properly initialised from concurrent requests other than the one that got to trigger the initialisation.
One can only speculate on how such problems may manifest, but certainly it could explain a number of the odd problems people see when running under Apache/mod_wsgi, especially where the application can be under load from the moment that a process gets restarted.
I am starting to run out of steam with this blog post, so let's just jump straight to a possible solution. This is in the form of the alternate WSGI script file contents below.
import sys
sys.path.insert(0, '/usr/local/django/mysite')
import settings
import django.core.management
django.core.management.setup_environ(settings)
utility = django.core.management.ManagementUtility()
command = utility.fetch_command('runserver')
command.validate()
import django.conf
import django.utils.translation
django.utils.translation.activate(django.conf.settings.LANGUAGE_CODE)
import django.core.handlers.wsgi
application = django.core.handlers.wsgi.WSGIHandler()
What this is doing is duplicating the way that Django development server is set up.
Key is that because an import lock is held when the WSGI script file is first imported, everything is done up front, effectively in the context of a single thread before any concurrent requests can start executing. This avoids all the problems with multithreading.
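The general behaviour being relied upon can be seen in the sketch below, which says nothing about mod_wsgi itself and only illustrates Python's import locking. It is written for a current Python 3, which uses per module import locks rather than the single global lock of Python 2, and the module name 'slow_setup_demo' is made up for the example. Several threads import a module whose body is deliberately slow, and each records whether it saw the module fully initialised.

```python
import importlib
import os
import sys
import tempfile
import threading

# Throwaway module whose top level code takes a while to finish.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, 'slow_setup_demo.py'), 'w') as f:
    f.write(
        'import time\n'
        'time.sleep(0.2)   # stand-in for slow initialisation\n'
        'READY = True      # only set once initialisation has finished\n'
    )
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

observed = []


def handle_request():
    # Every thread blocks here until the first import has completed.
    module = importlib.import_module('slow_setup_demo')
    observed.append(getattr(module, 'READY', False))


threads = [threading.Thread(target=handle_request) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(observed)
```

No thread gets to use the module until its top level code has run to completion, which is the property that doing all the setup at import time of the WSGI script file takes advantage of.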
We also don't even set the 'DJANGO_SETTINGS_MODULE' environment variable and instead leave it up to Django to set it, just like with the Django development server. This does mean that the site directory is added to 'sys.path' and that the settings module has to be explicitly imported, with a subsequent second import of the settings module under a different name, but this is exactly what the Django development server does. The temporary adding of the parent directory of the site to 'sys.path' in order to import the site package root is still done by Django, just as with the Django development server.
Further, the management command infrastructure is initialised and the loading of commands triggered by fetching the command object for 'runserver'. The validation of models is even triggered along with initialisation of the language locale.
After all that, we create the WSGI application entry point as normal and we are done.
All up, this should be nearly identical to what happens when the Django development server is used. About the only difference is that when 'python manage.py runserver' is used, the current working directory would usually be the site directory. Under Apache/mod_wsgi the current working directory is going to be something else. But then, you shouldn't ever be using relative paths for file system resources anyway.
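One common way of avoiding any dependence on the current working directory is to calculate absolute paths from the location of a known file. The helper below is a small sketch of that approach for a current Python 3. The function name 'site_path()' and the 'templates' directory are made up for the example.

```python
import os


def site_path(settings_file, *parts):
    """Build an absolute path relative to the directory of settings_file."""
    site_dir = os.path.dirname(os.path.abspath(settings_file))
    return os.path.join(site_dir, *parts)


template_dir = site_path('/usr/local/django/mysite/settings.py', 'templates')
print(template_dir)
```

Within an actual settings file one would pass '__file__' as the first argument, so that the result is the same no matter which directory the process happens to have been started in.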
Anyway, my brain is about fried now.
If you know anything about the Django internals, I hope you find this interesting and will validate if my analysis is correct.
If you are having problems with porting code between the Django development server and Apache/mod_wsgi, then perhaps you will give this alternate WSGI script file contents a go and see if things then work without problem.
Once I get sufficient feedback and validation that this is a better solution for the WSGI script file, then I will update the integration guide for Django on the mod_wsgi site.
What would be nice though is if Django simply supplied a WSGI application entry point that could be supplied the site directory and which would internally simply ensure that everything is done correctly so that it behaves the same as the Django development server.
Thursday, March 11, 2010
Dropping support for Apache 1.3 in mod_wsgi.
The Apache Software Foundation has finally put Apache 1.3 out to pasture. This has been a long time coming and quite overdue in my mind. Even though Apache 1.3 is quite antiquated, mod_wsgi has continued supporting it at the same time as supporting newer versions of Apache. For the record, mod_python gave up supporting Apache 1.3 about six years ago, but then, mod_python hasn't seen a release in three years and development on it is arguably dead at this point.
With this decision having been made by the ASF, it is a good time to consider whether it is necessary to keep trying to support Apache 1.3 in newer versions of mod_wsgi.
In making this decision, one has to consider that when running mod_wsgi under Apache 1.3 only embedded mode is supported; it is not possible to use daemon mode. Further, all the significant changes in the most recent versions of mod_wsgi have related to daemon mode. In fact, there isn't really anything else in the way of functionality that could be added for embedded mode which would provide anything above what can currently be done when using Apache 1.3.
As such, I can't see any reason for continuing to provide support for Apache 1.3 in future versions of mod_wsgi. This doesn't mean you can't use mod_wsgi with Apache 1.3, just that you would be directed to use whatever is the latest from the mod_wsgi 3.X branch.
Will users of Apache 1.3 lose out by such a decision? I somewhat doubt it. At this point mod_wsgi has shown itself to be very stable, more so when embedded mode is used, and certainly more stable than mod_python ever was. Anyway, if some major issue did ever come up which a user/company really wanted fixing, then I am not going to ignore it if they are prepared to throw money my way. ;-)
So, before I make the final decision and wield the axe to extricate the code specific to Apache 1.3 from mod_wsgi, does anyone have any valid objections, or alternatively, comments in support of such a move?