Thursday, December 18, 2014

Apache/mod_wsgi for Windows on life support.

The critics will say that Apache/mod_wsgi as a whole is on life support and not worth saving. I have a history of creating Open Source projects that no one much wants to use, or get bypassed over time, but I am again having lots of fun with Apache/mod_wsgi so I don't particularly care what they may have to say right now.

I do have to say though, that right now it looks like continued support for the Windows platform is in dire straits unless I can get some help.

A bit of history is in order to understand how things got to this point.

First up, I hate Windows with an absolute passion. Always have and always will. Whenever I even try and use a Windows box it is like my whole body is screaming "stop, don't do it, this is not natural". My patience in having to use a Windows system is therefore very low, even when I can load it up with a decent shell and set of command line tools.

Even with my distaste for Windows I did at one point manage to get mod_wsgi ported to Windows, though that was possibly more luck than anything else. A few issues have come up with the Windows support over the years as new versions of Python came out, but luckily very few were Windows specific and I could in the main blissfully ignore Windows; whatever I was doing on more familiar UNIX systems just worked. The Apache ecosystem and runtime libraries helped a lot here as they hid nearly everything about the differences between Windows and UNIX. It was more with Python that I had to deal with the little differences.

All my development back then for mod_wsgi on Windows was done with a 32 bit version of Windows XP. I did for a time make available precompiled binaries for two different Python versions, but only 32 bit and only for the binary distribution of Apache that the Apache Software Foundation made available at the time.

As 64 bit platforms came along I couldn't do much to help people and I also never tried third party distributions of Apache for Windows either. Luckily though someone who I don't even know stepped in and started making available precompiled Windows binaries for a range of Python and Apache versions for both 32 bit and 64 bit platforms. I am quite grateful for the effort they put in as it meant I could in the main ignore Windows and only have to deal with it when something broke because a new version of Python had come out.

Now a number of years ago I got into extreme burnout over my Open Source work and mod_wsgi got neglected for a number of years. No big deal for users because mod_wsgi had proven itself so stable over the years that it was possible to neglect it and it didn't cause anyone any problems.

When I finally managed to truly dig myself out of the hole I was in earlier this year I well and truly ripped into mod_wsgi and started making quite a lot of changes. I made no attempt though to try and ensure that it kept working on Windows. Even the makefiles I used for Windows would no longer work as I finally started breaking up the large single monolithic source code file that was mod_wsgi into many smaller code files.

To put in context how much I have been working on mod_wsgi this year, in the 6 1/2 years up to the start of this year there had been about 20 separate releases of mod_wsgi. In just this year alone I am already nearly up to 20 releases. The most recent release at this time is version 4.4.2.

So I haven't been idle, doing quite a lot of work on mod_wsgi and doing much more frequent releases. Now if only the Linux distributions would actually bother to catch up, with some still shipping versions of mod_wsgi from a number of years ago.

What this all means is that Windows support has been neglected for more than 20 versions and obviously a lot could have changed in that time in the mod_wsgi source code. There have also been a number of new Python releases since the last time I even tried to build mod_wsgi myself on Windows.

With the last version of mod_wsgi for Windows available being version 3.5, the calls for updated binaries have been growing. My benefactor who builds the Windows binaries again stepped in and tried himself to get some newer builds compiled. He did manage this and started making them available.

Unfortunately, reports of a range of problems started to come in. These ranged from Apache crashing on startup, or after a number of requests, to cases where the Apache configuration for having requests sent through to the WSGI application wasn't even working.

Eventually I relented and figured I had better try and sort it out. Just getting myself to a point where I could start debugging things was a huge drama. First up, my trusty old copy of Windows XP was too old to even be usable with the current Microsoft tools required to compile code for Python. I therefore had to stoop to actually going out and buying a copy of Windows 8.1. That was a sad day for me.

Being the most recent version of Windows, I would have thought the experience of setting things up would have improved over the years. Not so; it turned out to be even more aggravating. Just getting Windows 8.1 installed into a VM and applying all of the operating system patches took over 6 hours. Did I say I hate Windows?

Next I needed to get the Microsoft compilers installed. Which version of these you need for Python these days appears to be documented in some dusty filing cabinet in a distant galaxy. At least I couldn't find it in an obvious place in the documentation, which was frustrating. The result was that I wasted time installing the wrong version of Visual Studio Express only to find it wouldn't work.

So I worked out I needed to use Visual Studio Express 2008, but was there an obvious link in the documentation for where to get that from? Of course not, or at least not that I could find. You have to go searching with Google, weeding out the dead links until you finally find the right one.

I had been at this for a few days now.

Rather stupidly I thought I would ignore my existing makefiles for Windows and try and build mod_wsgi using a Python 'setup.py' file like I have been doing with the 'pip' installable version for UNIX systems. That failed immediately because I had installed a 64 bit version of Python and distutils can only build 32 bit binaries using the Express edition of Visual Studio 2008. So I had to remove the 64 bit versions of Python and Apache and install 32 bit versions.

So I got distutils compiling the code, but then found I had to create a dummy Python module initialisation function that wasn't required on UNIX systems. I finally did though manage to get it all compiled and the one 'setup.py' file would work for different Python versions.

The joy was short lived though as Apache would crash on startup with every single Python version I tried.

My first suspicion was that it was the dreaded DLL manifest problem that has occasionally cropped up over the years.

The problem here is that long ago when building Python modules using distutils, it would attach a manifest to the resulting Python extension module which matched that used for the main Python binary itself. This meant that any DLL runtimes such as 'msvcr90.dll' would be explicitly referenced by the extension module DLL.

Some bright spark decided though that this was redundant and changed distutils to stop doing this. This would be fine so long as, when using Python in an embedded system, the main executable, Apache in this instance, was compiled with the same compiler and so linked that runtime DLL.

When Apache builds started to use newer versions of the Microsoft compilers, though, Apache stopped linking that DLL. So right now the most recent Apache builds don't link 'msvcr90.dll'. The result of distutils no longer linking that DLL was that 'msvcr90.dll' was missing when mod_wsgi was loaded into Apache, and Apache crashed.

I therefore went back to my previous makefiles. This actually turned out to be less painful than I was expecting and I even managed to clean up the makefiles, separating out the key rules into a common file, with the version specific makefiles having just the locations of Apache and Python. I even managed to avoid needing to define where the Microsoft compilers were located, which varied depending on various factors. It did mean that to do the compilation you have to do it from the special Windows command shell you can start up from the application menus, with all the compiler specific shell variables already set, but I could live with that.

Great, all things were compiling again. I even managed to use a DLL dependency walker to verify that 'msvcr90.dll' was being linked in properly so it would be found okay when mod_wsgi was loaded into Apache.
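If you want to perform the same sort of check yourself, the 'dumpbin' tool which ships with the Microsoft compilers can list the DLLs a module depends on. From the Visual Studio command prompt, and assuming the compiled module 'mod_wsgi.so' is in the current directory, that is:

dumpbin /dependents mod_wsgi.so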

First up I tried Python 2.7 and I was happy to see that it actually appeared to work. When I tried in turn Python 2.6, 3.2, 3.3 and 3.4 though, they either crashed immediately when initialising Python or when handling the first request.

I have tried for a while to narrow down where the problems are by adding in extra logging, but the main culprit is deep down in Python somewhere. Although I did actually manage to get the Microsoft debugger attached to the process before it crashed, because neither of the precompiled binaries for Apache and Python have debugging information, what one got was absolutely useless.

So that is where I am at. I have already wasted quite a lot of time on this and lack both the patience and the skills for debugging stuff on Windows. The most likely next step I guess would be to try and build up versions of Apache and Python from scratch with debugging symbols so that the exact point of the crashes can be determined. I can easily see that being an even larger time suck, so I am not prepared to go there.

The end result is that unless someone else can come in and rescue the situation, that is likely the end of Windows support for Apache/mod_wsgi.

So, summarising where things are at:

1. I have multiple versions of Python installed on a Windows 8.1 system. These are Python 2.6, 2.7, 3.2, 3.3 and 3.4. They are all 32 bit because getting Visual Studio Express to compile 64 bit binaries is apparently a drama in itself if using Python distutils, as one has to modify the installed version of distutils. I decided to skip that problem and just use 32 bit versions. Whether having many versions of Python installed at the same time is a problem in itself I have no idea.

2. I have both Visual Studio Express 2008 and 2012 installed. Again, I don't know whether having both installed at the same time will cause conflicts of some sort.

3. I am using Apache 2.4 32 bit binaries from Apache Lounge. I haven't bothered to try and find older Apache 2.2 versions at this point.

4. The latest mod_wsgi version compiled for Python 2.7 does actually appear to work, or at least it did for the few requests I gave it.

5. Use Python version 2.6, 3.2, 3.3 or 3.4 and mod_wsgi will always crash on Apache startup or when handling the first request.

Now I know from experience that it is a pretty rare person who even considers whether they might be able to contribute to mod_wsgi. The combination of Apache, embedding Python using the Python C APIs, non trivial use of Python sub interpreters and multithreading seems to scare people away. I can't understand why; such complexity can actually be fun.

Anyway, I throw all this information out here in case there is anyone brave enough to want to help me out.

If you are, then the latest mod_wsgi source code can be found on GitHub at:

  • https://github.com/GrahamDumpleton/mod_wsgi

The source code has a 'win32' subdirectory. You need to create a Visual Studio 2008 Command Prompt window and get yourself into that directory. You can then build specific Apache/Python versions by running:
nmake -f ap24py33.mk clean
nmake -f ap24py33.mk
nmake -f ap24py33.mk install
Just point at the appropriate makefile.
 
The 'install' target will tell you what to add into the Apache configuration file to load the mod_wsgi module after it has copied it to the Apache modules directory.
 
You can then follow normal steps for configuring a WSGI application to run with Apache, start up Apache and see what happens.
 
If you have success or learn anything, then jump on the mod_wsgi mailing list and let me know. We can see where we go from there.
 
Much thanks and kudos in advance to anyone who does try.

Tuesday, December 16, 2014

Launching applications in Docker containers.

So far in this current series of blog posts I introduced the Docker image I have created for hosting Python WSGI applications using Apache/mod_wsgi. I then went on to explain what happens when you build your own image derived from it which incorporates your specific Python web application. In this blog post I am going to explain what happens when you run the image and how your Python web application gets started.

As shown in the previous blog posts I gave an example of the Dockerfile you would use for a simple WSGI hello world application:

FROM grahamdumpleton/mod-wsgi-docker:python-2.7-onbuild
CMD [ "wsgi.py" ]

I also presented a more complicated example for a Django site. The Dockerfile for that was still only:

FROM grahamdumpleton/mod-wsgi-docker:python-2.7-onbuild
CMD [ "--working-directory", "example", \
"--url-alias", "/static", "example/htdocs", \
"--application-type", "module", "example.wsgi" ]

In neither of these though is there anything that looks like a command to start up Apache/mod_wsgi or any other WSGI server. So first up, let’s explore how an application is started up inside of a Docker container.

Containers as an application

There are actually two different approaches one can take as to how to start up an application inside of a Docker container.

The first is to create a base image which contains the application you want to run. When you now start up the Docker container, you provide the full application command line as part of the command to launch the container.

For example, if you had created an image containing the Python interpreter and a Django web application with all dependencies installed, you could start up the Django web application using the Django development server using:

docker run my-python-env python example/manage.py runserver

If you wanted to start up an interactive shell so you could explore the environment of the container and/or manually run up your application, you could start it with:

docker run -it my-python-env bash

The second approach entails viewing the container itself as an application. If set up in this way, the image would be hardwired to start up a specific command when the container is launched.

For example, if when building your container you specified in the Dockerfile:

CMD [ "python", "example/manage.py", "runserver" ]

when you now start the container, you do not need to provide the command to run. In other words, running:

docker run my-python-env

would automatically start the Django development server.

You could if you wish still provide a command, but it will override the default specified by the ‘CMD’ instruction and cannot be used to extend the existing command with additional options.

If you therefore wanted to change the port the Django development server was listening on within the container for some reason, you would have to duplicate the whole command:

docker run my-python-env python example/manage.py runserver 80

The ‘CMD’ instruction, although it allows you to supply a default command, doesn’t therefore go so far as to make the container behave like it is an application in its own right, which can accept arbitrary command line arguments when run.

To have a container behave like that, we use an alternative instruction to ‘CMD’ called ‘ENTRYPOINT’.

So we swap the ‘CMD’ instruction with ‘ENTRYPOINT’, setting it to the default command for the container.

ENTRYPOINT [ "python", "example/manage.py", "runserver" ]

When we now run the container as:

docker run my-python-env

the Django development server will again be run, but we can now supply additional command line options which will be appended to the default command.

docker run my-python-env 80

Although we changed to the ‘ENTRYPOINT’ instruction, it and the ‘CMD’ instruction are not exclusive. You could actually write in the Dockerfile:

ENTRYPOINT [ "python", "example/manage.py", "runserver" ]
CMD [ "80" ]

In this case when you start up the container, the combined command line of:

python example/manage.py runserver 80

would be run.

Supply any command line options when starting the container in this case, and they will override those specified by the ‘CMD’ instruction, but the ‘ENTRYPOINT’ instruction will be left as is. Running:

docker run my-python-env 8080

would therefore result in the combined command line of:

python example/manage.py runserver 8080

Those therefore are the basic principles around how to launch an application within a container when it is started. Now let’s look at what the Docker image I have created for running a Python WSGI application is using.

Inheritance of an ENTRYPOINT

In the prior blog post I explained how the Docker image I provide isn’t just one image but a pair of images. The base image packaged up all the required tools, including Python, Apache and mod_wsgi. The derived image was what was called an ‘onbuild’ image and controlled how your specific Python web application would be built and combined with the base image.

The other thing that the ‘onbuild’ image did was to define the process for how the web application would then be started up when the container was run. The complete Dockerfile for the ‘onbuild’ image was:

FROM grahamdumpleton/mod-wsgi-docker:python-2.7

WORKDIR /app

ONBUILD COPY . /app
ONBUILD RUN mod_wsgi-docker-build

EXPOSE 80

ENTRYPOINT [ "mod_wsgi-docker-start" ]

As can be seen, this specified an ‘ENTRYPOINT’ instruction, which as we now know from above, will result in the ‘mod_wsgi-docker-start’ command being run automatically when the container is started.

Remember though that to create an image with your specific Python web application you actually had to create a further image deriving from this ‘onbuild’ image. For our Django web application that was:

FROM grahamdumpleton/mod-wsgi-docker:python-2.7-onbuild

CMD [ "--working-directory", "example", \
"--url-alias", "/static", "example/htdocs", \
"--application-type", "module", "example.wsgi" ]

This final Dockerfile doesn’t itself specify an ‘ENTRYPOINT’ instruction, but it does define a ‘CMD’ instruction.

This highlights an important point. That is that an ‘ENTRYPOINT’ instruction will be inherited from a base image and will also be applied to the derived image. Thus when we start up the container corresponding to the final image, that command will still be run.

As it turns out, a ‘CMD’ instruction will also be inherited by a derived image, but in this case the final image specified its own ‘CMD’ instruction. The final result of all this was that when the container was started up, the full command that was executed was:

mod_wsgi-docker-start --working-directory example \
    --url-alias /static example/htdocs \
    --application-type module example.wsgi

As was the case when building the final image, where the ‘mod_wsgi-docker-build’ script was being run in the context of building that image, there is various magic going on inside ‘mod_wsgi-docker-start’, so it is time to delve into what it is doing.

Preparing the environment on start up

When the image containing your Python web application was being built, the ‘mod_wsgi-docker-build’ script allowed you to provide special hook scripts which would be run during the process of building the image. These were a ‘pre-build’ and ‘build’ hook. These allowed you to perform special actions prior to ‘pip’ being run to install any Python packages, as well as afterwards, to perform any subsequent application specific setup.

When deploying the web application and starting it up, it is common practice to again provide an ability to have special hook scripts to allow you to perform additional steps. These are sometimes called a ‘deploy’ and ‘post-deploy’ hook.

One of the first things that the ‘mod_wsgi-docker-start’ script therefore does is execute any ‘deploy’ hook.

if [ -x .docker/action_hooks/deploy ]; then
    echo " -----> Running .docker/action_hooks/deploy"
    .docker/action_hooks/deploy
fi

As with the hooks for the build process, this should reside in the ‘.docker/action_hooks’ directory and the script needs to be executable.

Now normally for a web site, if you have a backend database, it would be persistent and the data stored in it would have a life independent of the lifetime of the individual containers running your web application.

If however you were using Docker to create a throw away instance of a Python web application and it was paired to a transient database instance that only existed for the life of the Python web application, then one of the steps you would need to perform is that of preparing the database, creating any tables and possibly populating it with data. This would need to be done before starting the actual Python web application.

One use of the ‘deploy’ hook when running a Django web application is therefore to run the ‘migrate’ Django management command from inside the hook script.

#!/usr/bin/env bash
python example/manage.py migrate

There is one important thing to note here which is different to what happens when hook scripts are run during the build phase. That is that during the build phase the ‘mod_wsgi-docker-build’ script itself and any hook scripts would use:

set -eo pipefail

The intent of this was to cause a hard failure when building the image rather than leaving the image in an inconsistent state.

What to do about errors during the actual deployment phase is a bit more problematic. If one causes a hard failure then it would cause the container to shut down immediately. Taking such a drastic action may be undesirable, as it robs you of the possibility of trying to deal with any errors and/or alerting someone to the problem, and then running in a degraded state while someone is able to look at and rectify the issue.

It is something where I don’t know what the best answer is right now, so I am open to suggestions on how to deal with it. For now, if an error does occur in a ‘deploy’ hook, the actual Python web application will be started up anyway.
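As an illustration only, one pattern would be for the ‘deploy’ hook itself to trap a failure and log a warning rather than exiting with an error. Using the earlier Django ‘migrate’ example:

#!/usr/bin/env bash

# Log a warning on failure rather than aborting, so that Apache will
# still be started and the problem can be investigated.
if ! python example/manage.py migrate; then
    echo " -----> WARNING: 'migrate' failed, continuing startup anyway"
fi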

Starting up the Python web application

Having run any ‘deploy’ hooks we are now ready to start up the actual Python web application. The part of the ‘mod_wsgi-docker-start’ script which does this is:

SERVER_ARGS="--log-to-terminal --startup-log --port 80"
exec mod_wsgi-express start-server ${SERVER_ARGS} "$@"

The key step in this is the execution of the ‘mod_wsgi-express’ command. I will defer to a subsequent blog post to talk more about what ‘mod_wsgi-express' is doing, but it is enough to know right now that it is what is actually starting up Apache and loading up your Python WSGI application so that it can then handle any web requests.

In running ‘mod_wsgi-express’ we by default supply it with the ‘--log-to-terminal’ option to have it log to stdout/stderr so that Docker can collect the logs automatically, making them available to commands such as ‘docker logs’.

When we do run ‘mod_wsgi-express’, it is important to note that we do it via an ‘exec’. This is done so that Apache will replace the current script process and so inherit process ID 1. Docker treats process ID 1 as being special: it is that process to which it delivers any signals injected into the container from outside. Having Apache run as process ID 1 therefore ensures that it receives shutdown signals for the container properly and can attempt to shut down the Python web application in an orderly manner.
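If you want to verify that Apache did end up as process ID 1, the ‘docker top’ command will list the processes running inside of a container. Using an auto generated container name such as the one in the example later in this post:

docker top hungry_bardeen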

What process ID 1 should be in a Docker container is actually something that sees a bit of debate. In one corner the Docker philosophy holds that a container should only run one application and so it is sufficient for that application process to run as process ID 1.

In the other corner you have others who argue that in the limited environment of a Docker container you are missing many of the things that the normal init process running as process ID 1 on a UNIX system would do, such as cleaning up zombie processes and the like. You also lack the ability for an application process to be restarted if it crashes.

You therefore will see people who will run something like ‘runit’ or ‘supervisord’ inside of the container with those starting up and managing the actual application processes.

For the image I am providing, I am relying on the fact that Apache is its own process supervisor for its managed child processes and has demonstrated stability. Although its child processes may crash, the Apache parent process is rock solid and doesn’t crash.

I did contemplate the use of ‘supervisord’ for other reasons, but a big problem with ‘supervisord’ is that it still has not been ported to Python 3. This is an issue because when using Docker it is common to provide separate images for different Python versions. This is done to cut down on the amount of fat in the images. This means that for a Python 3 image there only exists Python 3. Having to install a copy of Python 2 as well just so one can run ‘supervisord’ is therefore somewhat annoying.

The problem of ‘post-deploy’ actions

Now I mentioned previously that it is common to have both a ‘deploy’ and ‘post-deploy’ hook script, with the ‘deploy’ script as shown being run prior to starting the actual Python web application. The idea with the ‘post-deploy’ script is that it would be run after the Python web application has been started.

Such ‘post-deploy’ scripts cause a number of complications.

The first is that to satisfy the requirement that the Apache process be process ID 1, it was necessary to ‘exec’ the ‘mod_wsgi-express’ command when run. Because an ‘exec’ was done, then nothing else can be done after that within the ‘mod_wsgi-docker-start’ script as it is no longer running.

One might be able to support a ‘post-deploy’ script when using something like ‘runit’ or ‘supervisord’ if they allow ordering relationships to be defined for when commands are started up, and they allow for one off commands to be run rather than attempting to rerun the command if it exits straight away.

Even so, this doesn’t solve all the problems that exist with what might be run from a ‘post-deploy’ script.

To understand what these issues are we need to look at a couple of examples of what might be done in a ‘post-deploy’ script.

The first example is that you might want to use a ‘post-deploy’ script to hit your Python web application with requests against certain URLs to forcibly preload parts of the application code base before it starts handling web requests. This might be desirable with Django in particular because of the way it lazily loads view handlers in most cases only when they are accessed.

The second example is where you need to poll the Python web application to ensure that it has actually started up properly and is serving requests okay. When all is okay, you might then use some sort of notification system to announce availability of this instance of the Python web application, resulting in it being added into a server group of a front end load balancer.

In both these cases, such a script has to be tolerant of the web application being slow to start and cannot assume that it will actually be ready when the script runs. Since they would need to poll for availability of the Python web application anyway, the need for having a separate ‘post-deploy’ phase is somewhat diminished. What one can instead do is start these actions as background processes during the ‘deploy’ phase.

So due to the requirement that Apache needs to run as process ID 1, there is at this point no distinct ‘post-deploy’ phase. To achieve the same result, these actions should be run as background tasks from the ‘deploy’ phase instead, polling for availability of the Python web application before then running whatever it is intended they should do.
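As a sketch of that approach, and assuming ‘curl’ is available in the image, which it may not be by default, a ‘deploy’ hook could start a background task which polls the web application and only acts once it is responding. The URL being preloaded here is purely illustrative:

#!/usr/bin/env bash

# Run in a backgrounded subshell so the 'deploy' hook returns
# immediately and Apache can be started up.
(
    # Poll until the web application is accepting requests.
    until curl -fs http://localhost/ > /dev/null; do
        sleep 1
    done

    # Now make requests to preload parts of the application.
    curl -s http://localhost/expensive/view/ > /dev/null
) &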

Starting up an interactive shell

As previously described, one of the things that one can do when ‘ENTRYPOINT’ is used, is override the actual command that would be run by a container, even when the Dockerfile for the image defined a ‘CMD’ instruction. It was thus possible to run:

docker run -it my-python-env bash

This gives us access to an interactive shell to explore the container or run any commands manually.

When we use ‘ENTRYPOINT’ it would appear that we lose this ability.

All is not lost though, it just means we need to handle this a bit differently.

Before I show how that is done, one feature of Docker worth pointing out is that it is actually possible to get an interactive shell into an already running container. This is something that originally would have required running ‘sshd’ inside of the container, or using a mechanism called ‘nsenter’. Because this was a common requirement though, Docker has for a while now provided the ‘exec’ command.

What you can therefore do if you already have a running container you want to inspect is run:

docker exec -it hungry_bardeen bash

where the name is the auto generated name for the container or one you have assigned when the container was started.

If we don’t already have a running container though, what do we do?

Normally we can supply ‘bash’ as the command and it will work because there is an implicit ‘ENTRYPOINT’ for a container of ‘bash -c’. This is what allows us to specify any command.

In our case the implicit ‘ENTRYPOINT’ has been replaced with ‘mod_wsgi-docker-start’ and anything we supply on the command line when the container is run will be passed as options to it instead.

The first thing we have to do is work out how we can reset the value of the ‘ENTRYPOINT’ from the command line. Luckily this can be done using the ‘--entrypoint’ option.

So we can try running:

docker run -it --entrypoint bash my-python-app

This will have the desired effect of running bash for us, but will generally fail.

The reason it will fail is that any ‘CMD’ defined within the image will be passed to the ‘bash’ command as arguments when run as the entry point.

Trying to wipe out the options specified by ‘CMD’ using:

docker run -it --entrypoint bash my-python-app ''

doesn’t help as ‘bash’ will then look for and try to execute a script with an empty name.

To get around this problem and make things a little easier, the image I supply for hosting Python WSGI applications supplies an additional command called ‘mod_wsgi-docker-shell’. What one would therefore run is:

docker run -it --entrypoint mod_wsgi-docker-shell my-python-app

In this case the ‘mod_wsgi-docker-shell’ script would be run and although what is defined by the ‘CMD’ instruction is still passed as arguments to it, they will be ignored and ‘mod_wsgi-docker-shell’ will ‘exec’ the ‘bash’ command with no arguments.
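Although I haven’t shown the actual script here, in effect all that something like ‘mod_wsgi-docker-shell’ needs to amount to is:

#!/usr/bin/env bash

# Ignore any arguments passed through from the 'CMD' instruction and
# replace this process with an interactive shell.
exec bash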

That therefore is my little backdoor. The only other way I found of getting around the issue is the non-obvious command of:

docker run -it --entrypoint nice my-python-app bash

In this case we are relying on the fact that the ‘nice’ command will in turn execute any arguments it is supplied. I thought that ‘mod_wsgi-docker-shell’ may be more obvious though.

What is ‘mod_wsgi-express’?

So what exactly is ‘mod_wsgi-express’ then? It is something that I have been working on for over a year now and has been available since early this year. Although I have commented a bit about it on Twitter, and in a conference talk I did, I haven’t as yet written any blog posts about it. My next blog post will therefore be the first proper public debut of ‘mod_wsgi-express’ and what it can do.

Wednesday, December 10, 2014

Deferred build actions for Docker images.

In my last blog post I introduced what I have been doing on creating a production quality Docker image for hosting Python WSGI applications using Apache/mod_wsgi.

In that post I gave an example of the Dockerfile you would use for a simple WSGI hello world application:

FROM grahamdumpleton/mod-wsgi-docker:python-2.7-onbuild

CMD [ "wsgi.py" ]

I also presented a more complicated example for a Django site. The Dockerfile for that was still only:

FROM grahamdumpleton/mod-wsgi-docker:python-2.7-onbuild

CMD [ "--working-directory", "example", \
"--url-alias", "/static", "example/htdocs", \
"--application-type", "module", "example.wsgi" ]

In the case of the Django site there are actually going to be quite a number of files that are required, including the Python code and static file assets. There is also a list of Python packages needing to be installed, defined in a pip 'requirements.txt' file.

The big question therefore is how with only that small Dockerfile were all those required files getting copied across and how were all the Python packages getting installed?

Using an ONBUILD Docker image

The name of the Docker image which the local Dockerfile derived from was in this case called:

grahamdumpleton/mod-wsgi-docker:python-2.7-onbuild

The clue as to what is going on here is the 'onbuild' qualifier in the name of the image.

Looking at the Dockerfile used to build that 'onbuild' image we find:

FROM grahamdumpleton/mod-wsgi-docker:python-2.7
WORKDIR /app
ONBUILD COPY . /app
ONBUILD RUN mod_wsgi-docker-build
EXPOSE 80
ENTRYPOINT [ "mod_wsgi-docker-start" ]

If you were expecting to see here a whole lot of instructions for creating a Docker image consisting of Python, Apache and mod_wsgi you will be sadly disappointed. This is because all that more complicated stuff is actually contained in a further base image called:

grahamdumpleton/mod-wsgi-docker:python-2.7

So what has been done is to create a pair of Docker images. An underlying base image is what provides all the different software packages we need. The derived 'onbuild' image is what defines how a specific user's application is combined with the base image to form the final deployable application image.

This type of split is a common pattern one sees for Docker images which are trying to provide a base level image for deploying applications written in a specific programming language.

Using a derived image to define how the application is combined with the base image means that if someone doesn't agree with the specific way that is being done, they can ignore the 'onbuild' image and simply derive direct from the base image and define things their own way.

Deferring execution using ONBUILD

When creating a Dockerfile, you will often use an instruction such as:

COPY . /app

What this does is that at the point that the Docker image is being built, it will copy the contents of the directory the Dockerfile is contained in into the Docker image. In this case everything will be copied into the '/app' directory of the final Docker image.

If you are doing everything in one Dockerfile this is fine. In this case though I want to include that sort of boilerplate, which is always required, in the base image, but have the instruction only be executed when a user is creating their own derived image with their specific application code.

To achieve this I used the 'ONBUILD' instruction, prefixing the original 'COPY' instruction.

The effect of using 'ONBUILD' is that the 'COPY' instruction will not actually be run. All that will happen is that the 'COPY' instruction will be recorded. Any instruction prefixed with 'ONBUILD' will then only be replayed when a further derived image is being created which derives from this image.

Specifically, any such instructions will be run as the first steps after the 'FROM' instruction in a derived image, with:

FROM grahamdumpleton/mod-wsgi-docker:python-2.7
WORKDIR /app
ONBUILD COPY . /app
ONBUILD RUN mod_wsgi-docker-build
EXPOSE 80
ENTRYPOINT [ "mod_wsgi-docker-start" ]

and:

FROM grahamdumpleton/mod-wsgi-docker:python-2.7-onbuild
CMD [ "wsgi.py" ]

being the same as if we had written:

FROM grahamdumpleton/mod-wsgi-docker:python-2.7
WORKDIR /app
EXPOSE 80
ENTRYPOINT [ "mod_wsgi-docker-start" ]

and:

FROM grahamdumpleton/mod-wsgi-docker:python-2.7-onbuild
COPY . /app
RUN mod_wsgi-docker-build
CMD [ "wsgi.py" ]

You can determine whether a specific image you want to use is using 'ONBUILD' instructions by inspecting the image:

$ docker inspect grahamdumpleton/mod-wsgi-docker:python-2.7-onbuild

Look in the output for the 'OnBuild' section and you should find all the 'ONBUILD' instructions listed.

"OnBuild": [
"COPY . /app",
"RUN mod_wsgi-docker-build"
],

Anyway, as noted before, the point here is to make what the final user does as simple as possible and avoid the need to always include specific boilerplate, thus the reason for using 'ONBUILD'.

Installing extra system packages

For this specific Docker image, the deferred steps triggered by the 'ONBUILD' instructions will perform a number of tasks.

The first and most obvious action is that performed by the instruction:

COPY . /app

As explained before, this will copy the contents of the application directory into the Docker image.

The next step which is performed is:

RUN mod_wsgi-docker-build

This will execute the program 'mod_wsgi-docker-build'. This magic program is what is going to do all the hard work. The actual program was originally included into the image by way of the base image that the 'onbuild' image derived from. That is, in:

grahamdumpleton/mod-wsgi-docker:python-2.7

The key thing that the 'mod_wsgi-docker-build' script will do is run 'pip' to install any Python packages which are listed in the 'requirements.txt' file. Before 'pip' is run though there are a few other things we also want to do.

The first thing that the 'mod_wsgi-docker-build' script does is:

if [ -x .docker/action_hooks/pre-build ]; then
    echo " -----> Running .docker/action_hooks/pre-build"
    .docker/action_hooks/pre-build
fi

This executes an optional 'pre-build' hook script which can be supplied by the user and allows a user to perform additional steps prior to any Python packages being installed using 'pip'. Such a hook script would be placed into the '.docker/action_hooks' directory.

The main purpose of the 'pre-build' hook script would be to install additional system packages that may be required to be present when installing the Python packages. This would be necessary due to the fact that the base image is a minimal image that only installs the minimum system packages required for Python and Apache to run. It does not try and anticipate every single common package that may be required to cover the majority of use cases.

If one did try and guess what additional packages people might need, then the base image would end up being significantly larger. As such, installation of additional system packages is up to the user based on their specific requirements. Being Docker though, there are no restrictions on installing additional system packages. This is in contrast to your typical PaaS providers, where you are limited to the packages they decided you might need.

As an example, if the Python web application to be run needed to talk to a Postgres database using the Python 'psycopg2' module, then the Postgres client libraries will need to be first installed. To ensure that they are installed a 'pre-build' script containing the following would be used.

#!/usr/bin/env bash

set -eo pipefail

apt-get update
apt-get install -y libpq-dev

rm -r /var/lib/apt/lists/* 

The line:

set -eo pipefail

in this script is important as it ensures that the shell script will exit immediately on any error. The 'mod_wsgi-docker-build' script does the same, with the reason being that you want the build of the image to fail on any error, rather than errors being silently ignored and an incomplete image created.

The line:

rm -r /var/lib/apt/lists/*

is an attempt to clean up any temporary files so that the image isn't bloated with files that aren't required at runtime.

Do note that the script itself must also be executable, else it will be ignored. It would be nice to not have this requirement and for the script to be made executable automatically if it wasn't, but doing so seems to trip a bug in Docker.
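In other words, if the hook script isn't already executable, before building the image you need to run something like:

chmod +x .docker/action_hooks/pre-build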

Installing required Python packages

The next thing that the 'mod_wsgi-docker-build' script does is:

if [ ! -f requirements.txt ]; then
    if [ -f setup.py ]; then
        echo "-e ." > requirements.txt
    fi
fi

In this case we are actually checking to see whether there even is a 'requirements.txt' file. Not having one is okay and we simply wouldn't install any Python packages.

There is though a special case we check for if there is no 'requirements.txt' file. That is if the directory for the application contains a 'setup.py' file. If it does, then we assume that the application directory is itself a Python package that needs to first be installed, or at least, that the 'setup.py' has defined a set of dependencies that need to be installed.

Now normally when you have a 'setup.py' file you would run:

python setup.py install

We can though achieve the same result by listing the current directory as '.' in the 'requirements.txt' file, preceded by the special '-e' option.

This approach of using a 'setup.py' as a means of installing a set of Python packages was an approach often used before 'pip' became available and the preferred option. It is still actually the only option that some PaaS providers allow for installing packages, with them not supporting the use of a 'requirements.txt' file and automatic installation of packages using 'pip'.
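As a minimal sketch, with the package details being hypothetical, a 'setup.py' along the following lines would see 'Django' pulled in as a dependency when 'pip' processes the generated '-e .' entry:

from setuptools import setup

setup(
    name='example',
    version='1.0',
    packages=['example'],
    install_requires=['Django'],
)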

Interestingly, this ability to use a 'setup.py' file and installing the application as a package is something that the official Docker images for Python can't do at this time. This is because they use:

RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
ONBUILD COPY requirements.txt /usr/src/app/
ONBUILD RUN pip install -r requirements.txt
ONBUILD COPY . /usr/src/app

The problem they have is that they only copy the 'requirements.txt' file into the image before running 'pip' and because the current working directory when running 'pip' only contains that 'requirements.txt' file, then the 'requirements.txt' file cannot refer to anything contained in the application directory.

So the order in which they do things prohibits it from working. I am not sure I fathom why they have chosen to do it the way they have and whether they have a specific reason. It is perhaps something they should change, but I don't rule out they may have a valid reason for doing that so haven't reported it as an issue at this point.

After seeing whether the packages to install may be governed by a 'setup.py' file, the next steps run are:

if [ -f requirements.txt ]; then
    if (grep -Fiq "git+" requirements.txt); then
        echo " -----> Installing git"
        apt-get update && \
            apt-get install -y git --no-install-recommends && \
            rm -r /var/lib/apt/lists/*
    fi
fi

if [ -f requirements.txt ]; then
    if (grep -Fiq "hg+" requirements.txt); then
        echo " -----> Installing mercurial"
        pip install -U mercurial
    fi
fi

These steps determine whether the 'requirements.txt' file lists packages to be installed which are actually managed under a version control system such as git or mercurial. If there are, then since the base image provides only a minimal set of packages, it will be necessary to install git or mercurial as appropriate.
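For illustration, a 'requirements.txt' containing an entry such as the following, where the repository URL is purely hypothetical, is the sort of thing which would trigger the installation of git:

Django
git+https://github.com/example/example-package.git#egg=example-package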

Finally we can now run 'pip' to install any packages listed in the 'requirements.txt' file:

if [ -f requirements.txt ]; then
    echo " -----> Installing dependencies with pip"
    pip install -r requirements.txt -U --allow-all-external \
        --exists-action=w --src=.docker/tmp
fi

Performing final application setup

All the system packages and Python packages are now installed. This is along with all the Python application code also having been copied into the image. There is one more thing to do though. That is to provide the ability for a user to provide a 'build' hook of their own to perform any final application setup.

if [ -x .docker/action_hooks/build ]; then
    echo " -----> Running .docker/action_hooks/build"
    .docker/action_hooks/build
fi

As with the 'pre-build' script, the 'build' script needs to be placed into the '.docker/action_hooks' directory and needs to be executable.

The purpose of this script is to allow any steps to be carried out that require all the system packages and Python packages to have first been installed.

An example of such a script would be if running Django and you wanted to have the building of the Docker image collect together all the static file assets, rather than having to do it prior to attempting to build the image.

#!/usr/bin/env bash

set -eo pipefail

python example/manage.py collectstatic --noinput

Some PaaS providers in their build systems will try and automatically detect if Django is being used and run this step for you without you knowing. Although I have no problem with using special magic, I do have an issue where the system would actually be having to guess if it was your intent that such a step be run. I therefore believe in this case it is better that it be explicit and that the user define the step themselves. This avoids any unexpected problems and eliminates the need to provide special options to disable such magic steps when they aren't desired and go wrong.

Starting the Python web application

The above steps complete the build process for the Docker image. The next phase would be to deploy the Docker image and get it running. This is where the final part of the 'onbuild' Dockerfile comes into play:

EXPOSE 80
ENTRYPOINT [ "mod_wsgi-docker-start" ]

I will explain what happens with that in my next post on this topic.

Tuesday, December 2, 2014

Hosting Python WSGI applications using Docker.

As I mentioned in my previous blog post I see a lot of promise for Docker. The key thing that I personally see as being able to gain from Docker, as a provider of a hosting solution for Python WSGI applications, is that I can get back some control over the hosting experience that developers will have.

Right now things can quickly become a bit of a mess, because the experience that developers have of Apache/mod_wsgi is going to be dictated by how a Linux distribution or hosting provider has set up Apache, and how easy they have made customising it in order to add the ability to host Python WSGI applications and then tune the Apache server. The less than optimal experience that developers usually have means they do not get to appreciate how well Apache/mod_wsgi can work and simply desert it for other options.

In the case of Docker, I can provide a pre packaged image for hosting Python WSGI applications which uses my knowledge of how to set up Apache and mod_wsgi properly to give the best experience. I can therefore hope that Docker may help me to win back some of those who otherwise never really understood the strengths of Apache and mod_wsgi.

Current offerings for Python and Docker

Although Docker is still young, to be frank, the bulk of the information around about running Python WSGI applications with Docker is pretty woeful. The instructions provided focus more on how to use Docker itself rather than on how to create a production capable hosting solution for Python WSGI applications within the container. Nearly all explanations I have found describe the use of the builtin development servers for Python web frameworks such as Flask and Django. Using builtin development servers for production is generally a very bad idea.

In some cases they will suggest the use of gunicorn or CherryPy WSGI servers, but these themselves cannot handle hosting of static files. How exactly you are meant to host the static files they don't really provide details on, at most perhaps suggesting the use of S3 as a completely separate mechanism for hosting them.

There are some Docker images available for using uWSGI, but they are generally set up with the specific requirements of that user in mind, rather than trying to provide a good reusable image that can be applied across many use cases without you yourself having to do some measure of re-configuration. Again, they aren't exactly transparent as far as handling static files goes and leave that mostly up to you to work out how to solve.

The final problem with the uWSGI Docker images is that they are effectively unsupported efforts and haven't been updated in some time. They therefore are not keeping up to date with any security fixes or general bug fixes in the packages they are using.

Using Apache/mod_wsgi and Docker

To date I have not seen anyone attempt to describe how to use Apache and mod_wsgi with Docker. It isn't exactly something that I am going to do either, inasmuch as rather than describe how you yourself could create an image for using Apache and mod_wsgi with Docker, I am simply going to provide a pre packaged image instead. What I will describe therefore is how best to use that image in its pre packaged form.

This blog post is therefore the first introduction to this endeavour. I will show you how to use the Docker image with a couple of different WSGI applications and then in subsequent blog posts I will start peeling apart the layers and explain the different parts that go into it and what capabilities it has. Provided I don't get too carried away with doing more coding, which is obviously the fun bit, I will back things up by finally starting to upgrade the mod_wsgi documentation to cover it and all the other new features that are available in mod_wsgi these days.

Running a Hello World WSGI application

Let's start out therefore with the canonical WSGI hello world application.

def application(environ, start_response):
    status = '200 OK'
    output = 'Hello World!'

    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]

    start_response(status, response_headers)

    return [output]

Create a new empty directory and place this in a file called 'wsgi.py'.

This Hello World program has no associated static files, nor does it require any additional Python modules to be installed. Even though no separate modules are required at this point, we will still create a 'requirements.txt' file in the same directory. This 'requirements.txt' file will be left empty for this example.
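In other words, assuming a directory name of 'hello-world' for the example, the setup amounts to:

mkdir hello-world
cd hello-world

# Add the WSGI application code above as 'wsgi.py'.

touch requirements.txt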

The next step is to create a 'Dockerfile' to build up our Docker image. As we are going to use the pre packaged Docker image I am providing and it embeds various magic, all that the 'Dockerfile' needs to contain is:

FROM grahamdumpleton/mod-wsgi-docker:python-2.7-onbuild
CMD [ "wsgi.py" ]

For the image being derived from, an 'ENTRYPOINT' is already defined which will run up Apache/mod_wsgi. The 'CMD' instruction therefore only needs to provide any options, which at this point consists only of the path to the WSGI script file, which we had called 'wsgi.py'.

We can now build the Docker image for the Hello World example:

docker build -t my-python-app .

and then run it:

docker run -it --rm -p 8000:80 --name my-running-app my-python-app

The Hello World WSGI application will now be accessible by pointing your browser at port 8000 on the Docker host.
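If for example Docker is running on the local host, you can test it with something like:

curl http://localhost:8000/

If Docker is instead running inside of a VM, replace 'localhost' with the address of the Docker host.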

Running a Django based web site

We don't run Hello World applications as our web sites, so we also need to be able to run whole Python web sites constructed using web frameworks such as Django. It is with more complicated web applications that we start to also have static files that need to be hosted at the same time, so we need to deal with that somehow. The Python module search path can also require special setup so that the Python interpreter can actually find where the Python code for the web application is located.

So imagine that you have a Django web application constructed using the standard layout. From the top directory of this we would therefore have something like:

example/
example/example/
example/example/__init__.py
example/example/settings.py
example/example/urls.py
example/example/views.py
example/example/wsgi.py
example/htdocs/
example/htdocs/admin
example/htdocs/admin/...
example/manage.py
requirements.txt

The 'requirements.txt' which was used to create any local virtual environment used during development would already exist, and at the minimum would contain:

Django

Within the directory would then be the actual project directory which was created using the Django admin 'startproject' command.

As this example requires static files, we setup the Django settings file to define the location of a directory to keep the static files:

STATIC_ROOT = os.path.join(BASE_DIR, 'htdocs')
STATIC_URL = '/static/'

and then run the Django admin 'collectstatic' command. The 'collectstatic' command copies all the static file assets from any Django applications into the common 'htdocs' directory. This directory will then need to be mounted at the '/static' URL when we run Apache/mod_wsgi.
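That is, from the top level directory of the project:

python example/manage.py collectstatic --noinput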

What we are going to do now is create a 'Dockerfile' in the same directory as the 'requirements.txt' file. This will be the root of our application when copied across to the Docker image.

Now normally when Apache/mod_wsgi is run with the pre packaged image, the root directory of the application would be the current working directory for the application and would also be added to the Python module search path. For a Django site, what we really want is for the top level 'example' directory to be the current working directory and for it to be searched for Python modules. This is necessary so that the correct directory is searched for the Django settings file, which for this example has the module path 'example.settings'.

With the way Django lays out the project and creates the 'wsgi.py' file such that it is importable as 'example.wsgi', it can be preferable to use it as a module rather than as a WSGI script file. I'll get into the distinction another time, but importing it as a module does allow me to show off that it is possible to use a WSGI script file, a module or even a Paste style ini configuration file as the application entry point.

With all that said, we now actually create the 'Dockerfile' and in it we place:

FROM grahamdumpleton/mod-wsgi-docker:python-2.7-onbuild
CMD [ "--working-directory", "example", \
"--url-alias", "/static", "example/htdocs", \
"--application-type", "module", "example.wsgi" ]

The options to the 'CMD' instruction in this case serve the following purposes.

The '--working-directory' option says that the 'example' directory should actually be set to be the current working directory for the WSGI application when run. That directory will also be added automatically to the Python module search path so that the 'example' package which contains all the code can be found.

The '--url-alias' option says that the static files in the 'example/htdocs' directory should be mounted at the '/static' URL as was specified by the 'STATIC_URL' setting in the Django settings module.

The '--application-type' option says that rather than the WSGI application entry point being specified as a WSGI script file, it is defined by the listed module path. The default for this would have been 'script', with another possible value being 'paste' for a Paste style ini configuration file.

Finally, the 'example.wsgi' option is the Python module path for the 'wsgi.py' sub module in the project 'example' package.

As before we build the Docker image and then run it.
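That is, using a hypothetical tag of 'my-django-app' this time:

docker build -t my-django-app .
docker run -it --rm -p 8000:80 --name my-running-app my-django-app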

In a real Django site we would normally also have a database and possibly a key/value cache of some sort. Setting these up is beyond the scope of this post but would follow normal Docker practices. That or you might use a tool such as Fig to manage the linking and spin up of all the containers.

Setting up the Apache configuration file

In short there isn't one and you do not have to concern yourself with it.

This is a key point with being able to supply a pre packaged image for hosting using Apache/mod_wsgi. Since users often don't want to learn how to set up Apache properly, and since doing so causes so much grief, I can completely remove the need for a developer to have to worry about it.

Instead I can provide a simplified set of command line options which implement the basic features that most sites would want to use when setting up Apache/mod_wsgi. The scripts underpinning the pre packaged Docker image can then dynamically generate the Apache configuration on the fly based on the specific options provided.

This is where I can apply my knowledge of how to set up Apache/mod_wsgi and ensure things are done correctly, securely and in a way that gives a good level of performance out of the box.

This doesn't mean you won't need to tune the settings to get Apache/mod_wsgi to run well for your specific site, but the number of knobs you have to worry about is greatly reduced, as everything else is handled automatically.

But how does this all actually work?

So this is an initial introduction to just one of a number of new things I have been working on related to mod_wsgi. As I start to peel back some of the layers to explain how all this works I will start to introduce some of the other things I have been cooking up over the last year, including alluding to other things that hopefully I will get to down the track.

If you can't wait and want to play around with these Docker images, they can be found, along with some basic setup information, on Docker Hub.

If you need help in working out how to use them or have any other questions, then rather than try and post questions against this blog, go and ask your questions on the mod_wsgi mailing list. Please do not use StackOverflow or related sites as I don't answer questions there any more and no one there will know anything anyway since this is all so new.