Friday, March 30, 2007

Reloading of Python code into web applications.

One of the major complaints with Python web frameworks is the need to restart the application whenever changes are made to the code. To try and avoid restarts or to make it easier to manage, different Python web frameworks and hosting technologies employ a number of different techniques. These include reloading Python modules into the existing running application, using a supervisor process to monitor for code changes and restart the actual server process automatically, or simply providing a means for a normal user, as opposed to a super user, to completely restart the web server.

As far as reloading Python modules into the existing running application goes, the most developed example of this technique is the module importer present in mod_python 3.3. The module importer in mod_python is different to other more simplistic module importers as it tracks the parent/child relationships between imported modules. This information means the module importer can determine that it needs to reload a top level request handler module even though the module itself hasn't changed but where some other module it is dependent on has changed.

Whereas module importers normally keep modules in sys.modules and they must all be uniquely named, the mod_python module importer also avoids a whole host of problems by not holding the web applications modules in sys.modules but in a separate caching system whereby they are identified by the absolute path name of the modules code file. This means it is possible for the same name to be used for a code file in multiple directories without the need to artificially hold modules in a Python package to avoid name space collisions. When a module is reloaded it is also not reloaded on top of the existing module as is done for modules in sys.modules, therefore eliminating problems with module pollution when attributes are deleted from the code but still exist in the loaded module, as well as various multi-threading problems which can arise due to reloading new code and data on top of existing code and data that may be getting used at the time.

All these features do come with some cost in performance. More of an issue though is that the importer is bound to mod_python and thus is only of use in web applications which themselves bind closely to the mod_python API. As such, although mod_python has this quite sophisticated module importer, it is absolutely useless if you are running some WSGI based application on top of mod_python as by the very nature of what WSGI is, such an application will only use features that can be hosted on top of multiple hosting technologies so can't make use of it.

One could separate the module importer from mod_python and make it a standalone package, but even then you are unlikely to see it adopted by any of the existing Python web frameworks. This is because these Python web frameworks already have their own way of doing things, and even if the module importer may be a better solution, to use the module importer would more than likely break compatibility of older applications and require users to perform some restructuring of their code. Use of the module importer may also not be able to be made totally transparent and thus one would be forcing new concepts on the user that they have to deal with. Finally, as good as the mod_python module importer may be, it still isn't going to be suitable in all situations, and thus you will still end up with some subset of modules that cannot be safely reloaded into an existing running application.

So, all in all it is very unlikely that one will see any form of sophisticated module importer system for reloading modules on the fly that works properly and is used as some sort of standard across the various Python web frameworks. Instead one will continue to see half baked solutions which don't really work. This will not necessarily be from lack of trying on the efforts of the people implementing them, it is just that reloading code safely into Python is hard for the general case, if not impossible.

Given the above, the only real practical solution that will work with all Python web frameworks is to throw away the interpreter contents and start over whenever one needs to pick up any code changes. To date this has always meant killing off the whole process and restarting it. This brute force approach may be fine where you manage and control you own web hosting environment, but isn't really practical for all those who rely on shared web hosting implemented using Apache for their web services. This type of service is problematic because the company managing the web server is hardly likely to be amendable to a constant stream of user requests to restart the web server every time they make a change to their code.

Packages for Apache such as mod_fastcgi, mod_scgi and mod_proxy_ajp have tried to address this problem by moving the actual web application into a specialised back end process and merely using Apache as a proxy for requests. Again using proxying, one could even use another web server as the back end process.

By using a back end process in this way a number of problems can be solved. The first is that because the back end process can be dedicated to a specific user and run as that user, control for restarting it can then be handed to the user. The second problem that is solved is that any code is no longer executing as the user that Apache would run as. This eliminates problems with user code accessing parts of the file system they potentially shouldn't and with user code interfering with a different users code as they can be given their own dedicated file system space to write to.

Although these provide the control that a user needs, a solution that doesn't just use another Apache server instance as the back end process is going to be foreign to most web hosting companies and can as a result be be hard for them to setup. This is because not only does one need to build and install the required Apache module, there are multiple choices as to how to implement the back end parts of the system and it may not be obvious as to why one should be used over another. This isn't made any better by virtue of a lack of good solid documentation and less than adequate support for running and managing such systems. For a web hosting company that wants something that is quick and cheap to get working and requires little management they currently appear not to be particularly attractive solutions.

In terms of how else one can solve this problem, the only other alternative for Python is that instead of killing off the whole process which is hosting a web application, one could just destroy the particular interpreter instance within the running process. If one is to pursue this approach, what is required though is the ability to be able to create and control additional Python sub interpreters and be able to run the whole web application or parts of it, inside of a sub interpreter rather than the main Python interpreter. Having that, it would then be possible to kill off a particular sub interpreter thereby destroying that part of the web application and recreate it in a new sub interpreter using the new code base.

At present your standard Python runtime doesn't support such manipulation of Python sub interpreters. Using Python sub interpreters is not new though, with mod_python using sub interpreters to provide separation between web applications or parts of them. What mod_python doesn't allow though is for new sub interpreters to be able to be created and used from a web application itself in some way.

That there is currently no way of manipulating Python sub interpreters from within a Python application doesn't mean it can't be done though. All that is required is a C extension module that provides a means of creating the sub interpreters and then mediating a way of making a call from one sub interpreter to another. Although it sounds simple in practice, there are various gotchas in getting it to work correctly. It also potentially opens up a big can of worms due to issues that can arise with sharing objects between sub interpreters as well as safely managing the destruction of interpreters.

That is it for now though. In the next blog installment I'll go more into this idea of using disposable sub interpreters within Python web applications, explaining the various problems and also showing examples of some working code. Will also discuss how one could constrain the idea so as to make it a moderately safe technique to make use of in mod_wsgi and possibly other WSGI application stacks.

No comments: