Tuesday, July 26, 2016

Installing mod_wsgi on MacOS X with native operating system tools.

Operating systems inevitably change over time, and because writing documentation is often an afterthought, or developers have no time, the existing instructions on how to install a piece of software can suffer bit rot and stop working. This has been the case for a while with various parts of the documentation for mod_wsgi. This post is at least a first step in getting the documentation for installing mod_wsgi on MacOS X brought up to date. The post will focus on installing mod_wsgi using the native tools that the MacOS X operating system provides.

Installing direct from source code

A precompiled binary package for mod_wsgi is actually available from Apple as part of the Mac OS X Server app, available from the MacOS X App Store. The last time I looked, this was a very old version of mod_wsgi from many years ago. Unless for some reason you really need to use the version of mod_wsgi provided with MacOS X Server, I would instead recommend you install an up to date version of mod_wsgi direct from source code.

Installation of mod_wsgi from source code on MacOS X used to be a simple matter, but with the introduction of System Integrity Protection in MacOS X El Capitan this has become a bit more complicated. Let’s step through the normal steps for installing mod_wsgi to see what the issue is.

After downloading and extracting the latest source code for mod_wsgi, installing it direct into an Apache installation involves the traditional steps used by most Open Source packages, that is, ‘configure’, ‘make’ and ‘sudo make install’.

Before you do that, it is important that you have at least installed the Xcode command line tools. This is an Apple supplied package for MacOS X which contains the C compiler we will need to build the mod_wsgi source code, as well as the header files and other support files for the Apache httpd web server.

To check that you have the Xcode command line tools you can run ‘xcode-select --install’. If you have them installed already, you should see the message below, otherwise you will be stepped through the installation of the package.

$ xcode-select --install
xcode-select: error: command line tools are already installed, use "Software Update" to install updates

Do ensure that you have run software update to get the latest version for your operating system revision if you don’t regularly update.

With the Xcode command line tools installed, you can now run the ‘configure’ script found in the mod_wsgi source directory.

$ ./configure
checking for apxs2... no
checking for apxs... /usr/sbin/apxs
checking for gcc... gcc
checking whether the C compiler works... yes
checking for C compiler default output file name... a.out
checking for suffix of executables...
checking whether we are cross compiling... no
checking for suffix of object files... o
checking whether we are using the GNU C compiler... yes
checking whether gcc accepts -g... yes
checking for gcc option to accept ISO C89... none needed
checking for prctl... no
checking Apache version... 2.4.18
checking for python... /usr/bin/python
configure: creating ./config.status
config.status: creating Makefile

Important to note here is that we want the ‘apxs’ version found to be ‘/usr/sbin/apxs’ and the ‘python’ version found to be ‘/usr/bin/python’. If these aren’t the versions found, it indicates that you have a separately installed Python or Apache httpd server installation rather than the native versions supplied with MacOS X. I am not going to cover using separate Python or Apache httpd server installations in this post and assume you only have the native tools.

The next step is to run ‘make’.

$ make
./apxs -c -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -DENABLE_DTRACE -DMACOSX -DNDEBUG -DNDEBUG -DENABLE_DTRACE -Wc,-g -Wc,-O2 -Wc,'-arch x86_64' src/server/mod_wsgi.c src/server/wsgi_*.c -L/System/Library/Frameworks/Python.framework/Versions/2.7/lib -L/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/config -arch x86_64 -lpython2.7 -ldl -framework CoreFoundation
./libtool --silent --mode=compile /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc -DDARWIN -DSIGPROCMASK_SETS_THREAD_MASK -DDARWIN_10 -I/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.11.Internal.sdk/usr/include/apr-1 -I/usr/include/apache2 -I/usr/include/apr-1 -I/usr/include/apr-1 -g -O2 -arch x86_64 -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -DENABLE_DTRACE -DMACOSX -DNDEBUG -DNDEBUG -DENABLE_DTRACE -c -o src/server/mod_wsgi.lo src/server/mod_wsgi.c && touch src/server/mod_wsgi.slo
...
./libtool --silent --mode=link /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc    -o src/server/mod_wsgi.la  -rpath /usr/libexec/apache2 -module -avoid-version    src/server/wsgi_validate.lo src/server/wsgi_thread.lo src/server/wsgi_stream.lo src/server/wsgi_server.lo src/server/wsgi_restrict.lo src/server/wsgi_metrics.lo src/server/wsgi_memory.lo src/server/wsgi_logger.lo src/server/wsgi_interp.lo src/server/wsgi_daemon.lo src/server/wsgi_convert.lo src/server/wsgi_buckets.lo src/server/wsgi_apache.lo src/server/mod_wsgi.lo -L/System/Library/Frameworks/Python.framework/Versions/2.7/lib -L/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/config -arch x86_64 -lpython2.7 -ldl -framework CoreFoundation

If you look closely you might note something strange in the output from running ‘make’. That is, rather than running ‘/usr/sbin/apxs’, it is running a version of ‘apxs’ out of the directory where ‘make’ was run. Similarly, the system version of ‘libtool’ is ignored and a local copy used instead.

For those not familiar with what ‘apxs’ is, it is a tool supplied with the Apache httpd server package to assist in the compilation and installation of Apache modules. Unfortunately, every time a new version of MacOS X comes out, Apple somehow breaks the ‘apxs’ tool so that it doesn’t work. Typically this is because the ‘apxs’ tool embeds paths to a special variant of the C compiler used when Apple builds its own packages. This is different to the C compiler which we as users get when we install the Xcode command line tools. More specifically, the C compiler from the Xcode command line tools is installed in a different location to what ‘apxs’ expects, and so it fails. A similar problem exists with ‘libtool’.

This issue with ‘apxs’ and ‘libtool’ being broken has been present for a number of MacOS X versions now and Apple seems to have no interest in fixing it. To get around the problem, the ‘configure’ script of mod_wsgi creates copies of the original ‘apxs’ and ‘libtool’ programs and fixes them up so that the correct paths are used. This is the reason why local versions of those tools are used.

With the build of mod_wsgi now complete, we just need to install it by running ‘sudo make install’. The result of this should be that the compiled ‘mod_wsgi.so’ module is installed into the modules directory of the Apache httpd server installation. Because of the System Integrity Protection feature mentioned above, though, this isn’t what now occurs. Instead the installation fails.

$ sudo make install
Password:
./apxs -i -S LIBEXECDIR=/usr/libexec/apache2 -n 'mod_wsgi' src/server/mod_wsgi.la
/usr/share/httpd/build/instdso.sh SH_LIBTOOL='./libtool' src/server/mod_wsgi.la /usr/libexec/apache2
./libtool --mode=install install src/server/mod_wsgi.la /usr/libexec/apache2/
libtool: install: install src/server/.libs/mod_wsgi.so /usr/libexec/apache2/mod_wsgi.so
install: /usr/libexec/apache2/mod_wsgi.so: Operation not permitted
apxs:Error: Command failed with rc=4653056
.
make: *** [install] Error 1

The rather obscure error message we get when this fails is ‘Operation not permitted’. This doesn’t exactly tell us a lot and is mighty confusing to anyone installing mod_wsgi, or any other Apache module.

The reason we get this error is that the System Integrity Protection feature means that even when running as root, it is no longer possible to copy new files into certain system directories on MacOS X. This is meant in part to protect the operating system directories from being messed up by a user, but means we are now prohibited from installing additional Apache httpd server modules into the standard modules directory of ‘/usr/libexec/apache2’.

Creating a separate modules directory

There are a few solutions to the problem that the System Integrity Protection feature causes.

Since it is the cause of the problem, you might think about disabling the System Integrity Protection feature. Although that sounds great, you really really really do not want to do this. This is part of the feature set that MacOS X uses to protect your system from malware, so disabling it is a bad idea. Do not go there nor even contemplate doing so.

The quickest solution therefore is to install the compiled ‘mod_wsgi.so’ module in a different location that we can write to, and set up the Apache httpd server to reference it from that location. To do that we need only override the location using the ‘make’ variable ‘LIBEXECDIR’ when we run ‘sudo make install’. For this example we will use the directory ‘/usr/local/httpd/modules’ instead of the MacOS X default of ‘/usr/libexec/apache2’.

$ sudo make install LIBEXECDIR=/usr/local/httpd/modules
Password:
mkdir -p /usr/local/httpd/modules
./apxs -i -S LIBEXECDIR=/usr/local/httpd/modules -n 'mod_wsgi' src/server/mod_wsgi.la
/usr/share/httpd/build/instdso.sh SH_LIBTOOL='./libtool' src/server/mod_wsgi.la /usr/local/httpd/modules
./libtool --mode=install install src/server/mod_wsgi.la /usr/local/httpd/modules/
libtool: install: install src/server/.libs/mod_wsgi.so /usr/local/httpd/modules/mod_wsgi.so
libtool: install: install src/server/.libs/mod_wsgi.lai /usr/local/httpd/modules/mod_wsgi.la
libtool: install: install src/server/.libs/mod_wsgi.a /usr/local/httpd/modules/mod_wsgi.a
libtool: install: chmod 644 /usr/local/httpd/modules/mod_wsgi.a
libtool: install: ranlib /usr/local/httpd/modules/mod_wsgi.a
libtool: install: warning: remember to run `libtool --finish /usr/libexec/apache2'
chmod 755 /usr/local/httpd/modules/mod_wsgi.so

Although the output from running this command shows a warning about running ‘libtool --finish’ you can ignore it. To be honest I am not actually sure how it even still knows about the directory ‘/usr/libexec/apache2’, but for MacOS X everything still works without doing that step.

With the mod_wsgi module installed, in the Apache httpd server configuration file you would then use:

LoadModule wsgi_module /usr/local/httpd/modules/mod_wsgi.so

rather than the normal:

LoadModule wsgi_module libexec/apache2/mod_wsgi.so

This gets us beyond the System Integrity Protection problem caused by using MacOS X El Capitan. You would then configure and set up the Apache httpd server and mod_wsgi for your specific WSGI application in the same way as you normally would.
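
For example, a minimal configuration for hosting a WSGI application in daemon mode might look like the following, where the paths and the process group name are hypothetical:

LoadModule wsgi_module /usr/local/httpd/modules/mod_wsgi.so

WSGIDaemonProcess myapp
WSGIProcessGroup myapp
WSGIScriptAlias / /Library/WebServer/wsgi-scripts/myapp.wsgi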

Using mod_wsgi during development

Do note that if you only want to run mod_wsgi during development, and especially if only on a non-privileged port instead of the standard port 80, you are better off installing and using mod_wsgi-express.

The benefit of using mod_wsgi-express is that it is easier to install, and gives you a command line program for starting it up, with the Apache httpd server and mod_wsgi automatically configured for you.

To install mod_wsgi-express on MacOS X you still need to ensure you have installed the Xcode command line tools as explained above, but once you have done that it is a simple matter of running:

pip install mod_wsgi

Rather than the mod_wsgi module being installed into your Apache httpd server installation, the module and ‘mod_wsgi-express’ program will be installed into your Python installation. Hosting your WSGI application with mod_wsgi can then be as simple as running:

mod_wsgi-express start-server hello.wsgi
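
The ‘hello.wsgi’ file here just needs to define a WSGI application callable named ‘application’. A minimal example would be:

def application(environ, start_response):
    # A bare bones WSGI application returning a plain text response.
    status = '200 OK'
    output = b'Hello World!'
    response_headers = [('Content-Type', 'text/plain'),
                        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return [output]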

That mod_wsgi-express installs into your Python installation makes it very easy to use mod_wsgi with different Python installations at the same time, be they different Python versions or Python virtual environments. You can therefore much more readily run mod_wsgi for both Python 2 and Python 3 on the same system. Each mod_wsgi-express instance is distinct and would need to run on a different port, as sketched below. If need be, you can use your main Apache httpd server installation as a proxy in front of these to make both available on the standard port 80 at the same time.
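
As a sketch, assuming mod_wsgi has been installed into both a Python 2 and a Python 3 installation, each command here would invoke the ‘mod_wsgi-express’ from the respective installation, with the script names being hypothetical:

$ mod_wsgi-express start-server --port 8001 hello2.wsgi
$ mod_wsgi-express start-server --port 8002 hello3.wsgi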

For more information on mod_wsgi-express, check out the documentation on PyPI for the mod_wsgi package, or read the blog post where it was introduced. I have also posted here about proxying to instances of mod_wsgi-express.

 

Friday, April 8, 2016

How are you using Docker in your development workflow?

If you have been reading my blog posts rather than simply flicking them out of your news feed as a bit of noise, you will know that I have been working on a project which aims to make the deployment of Python web applications easier. I wrote a bit about this in the post titled 'Building a better user experience for deploying Python web applications’.

That post got a surprising number of reads, many more than I would normally expect to see in such a short period of time. There definitely therefore seems to be a lot of interest in the topic.

For what I am developing, I am targeting a local development workflow where you work directly on your own computer, as well as then deploying direct to some host. At the same time though, I am providing a way to ease the transition to bundling up your application as a Docker image so that it can then be run using a Docker hosting service or a more comprehensive container application platform or PaaS.

In the workflow I am creating I am allowing for the ability to also iterate on changes to your code base while it is running inside of a Docker container. This is so that where your local development system is a different operating system to where it is deployed, you can still easily debug your code for the target system.

Personally I feel that most people will still likely develop on their own local system first, rather than doing development exclusively within Docker in some way.

Although this is my view, I am very much interested in how others see Docker fitting into the development workflow when implementing Python web applications. I would therefore like to hear your feedback, so I can factor in what people are actually doing, or what they want to be able to do, within the system I am creating.

Luckily my colleague, the awesome Steve Pousty, has recently put up a survey asking similar questions about the use of Docker in development.

It would help me immensely in what I am working on for Python if you could respond to Steve’s survey as then I can see what I can learn from the results as well. The survey is only short and should take as little as five minutes to fill in. You can take longer of course if you want to provide additional feedback on top of the short list of multiple choice questions.

When done, Steve will be collating and making available the results, so it should be interesting reading for anyone working in this space.

You can see Steve’s original blog post about the survey at:

* Input Request: How Do You Use Docker Containers For Your Local Development?

The survey itself you can find over on Survey Monkey at:

https://www.surveymonkey.com/r/dockerdev

If you fill in the survey, make sure you mark Python in the languages you are using so I know what responses may be extra relevant to me. I would also be interested to know what Python WSGI server you are using, or whether you are using some ASYNC Python web server. So add that as extra information at the end of the survey.

In the system I am developing, I am trying to cater for all the main WSGI servers in use (gunicorn, mod_wsgi-express, uWSGI, Waitress), as well as providing ways of also running up other servers based on the ASYNC model. Knowing what servers you are using will therefore help me understand what else I should be supporting in the workflow.

Looking forward to any comments you have. Thanks.

Thursday, April 7, 2016

Learning more about using OpenShift 3.

I still have a long list of topics I could post about here on my own blog site, but over the last couple of months or so, I have been having too much fun playing with the new version of OpenShift based on Docker and Kubernetes, and understanding everything about it. The more I dig into OpenShift, the more awesome it gets as far as being the best platform around for deploying applications in our new containerised world.

A platform alone isn’t though going to give you everything you may need to provide you with the best experience for working with a particular programming language, such as Python, but this is where I am working on my own magic secret source to make everything even easier for us Python developers. I described a bit about what I was doing around improving the deployment experience for Python web applications in a prior blog post. I am going to start ramping up soon on writing all the documentation for the packages I have been working on and hope to have more to show by the time of PyCon US.

In the interim, if you are interested in OpenShift and some of the things I have been looking at and uncovering, I have been posting over on the OpenShift blog site. The blog posts which I have posted up there over the past month and a bit are:

  • Using persistent volumes with docker as a Developer on OpenShift - This explains how to use persistent volumes with OpenShift. This is an area where OpenShift goes beyond what is possible with hosting environments which only support 12 factor or cloud native applications. That is, not only can you host up web applications, you can run applications such as databases, which require access to persistent file system based storage.
  • Using an Image Source to reduce build times - This post is actually an extension to a post I did here on my own site about improving Docker build times for Python applications. In this post I show how one can include the sped up build mechanism using a Python wheelhouse within the OpenShift build and deployment infrastructure.
  • Using a generic webhook to trigger builds - This describes how to use generic web hooks to trigger a build and deployment within OpenShift. This can easily be done when using GitHub to host your web application code, but in this case I wanted to trigger the build and deployment upon the successful completion of a test when using Travis CI, rather than straight away when code was pushed up to GitHub. This necessitated implementing a web hook proxy and I show how that was done.
  • Working with OpenShift configurations - Finally, this post provides a cheat sheet for where to quickly find information about what different configuration objects in OpenShift are all about and what settings they provide.

I will be posting about more OpenShift topics on the OpenShift blog in the future, so if you are at all interested in where the next generation of Platform as a Service (PaaS), or container application platforms, is headed, ensure you follow that blog.

If you are attending PyCon US this year in Portland, Oregon, you also have the opportunity to learn more about OpenShift 3. This year I will be presenting a workshop titled 'Docker, Kubernetes, and OpenShift: Python Containers for the Real World’. This is a free workshop. All you need to do, if you are already attending PyCon US, is go back to your registration details on the PyCon US web site and go into the page for adding tutorials or workshops. You will find this workshop listed there and you can add it. To repeat: attending the workshop will not cost you anything extra, so if you are in Portland early for PyCon US then come along. I will be talking about what OpenShift is, and how it uses Docker and Kubernetes. I will also be demonstrating the deployment of a Django based web application along with a database. This will most likely be using the Wagtail CMS, if you are a fan of that. Hope to see you there.

Wednesday, March 2, 2016

Speeding up Docker build times for Python applications.

I recently wrote a post where I talked about building a better user experience for deploying Python web applications. If one counts page hits as an indicator of interest in a subject, then it certainly seems like an area where people would like to see improvements.

In that post I talked about a system I was working on which simplified starting up a Python web server for your web application in your local environment, but also then how you can easily move to deploying that Python web application to Docker or OpenShift 3.

In moving to Docker, or OpenShift 3 (which internally also uses Docker), the beauty of the system I described was that you didn’t have to know how to create a Docker image yourself. Instead I used a package called S2I (Source to Image) to construct the Docker image for you.

What S2I does is use a Docker base image which incorporates all the system packages and the language run time environment you need for working in a specific programming language such as Python. That same Docker image also includes a special script which is run to incorporate your web application code into a new Docker image which builds off the base image. A further script within the image starts up an appropriate web server to run your web application. In the typical case, you don’t need to know anything at all about how to configure the web server as everything is done for you.
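
For illustration, when using the standalone ‘s2i’ command line tool, a build takes your source directory, the builder image and a name for the resulting image. The builder image name here is borrowed from an example later in this post, and the output image name is hypothetical:

$ s2i build . grahamdumpleton/warp0-debian8-python27 mywebapp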

Docker build times

A problem that can arise any time you use Docker, unless you are careful, is how long it takes to actually perform the build of the Docker image for your web application. If you are making constant changes but need to rebuild the Docker image each time to test it, or redeploy it into a live environment, you could end up waiting quite a long time over the period of your work day. Decreasing the time it takes to build the Docker image can therefore be important.

The general approach usually followed is to very carefully craft your ‘Dockerfile’ so that it uses multiple layers, where the incorporation of the parts which change most frequently is done last. By doing this, the fact that Docker will cache layers and start rebuilding only at the first layer changed means you can avoid rebuilding everything every time.
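
As a sketch of that layering approach for a Python application, when writing a ‘Dockerfile’ by hand rather than using S2I:

FROM python:2.7

WORKDIR /app

# Install the Python packages in their own layer so that this step is
# only rerun when 'requirements.txt' itself changes.
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy in the application code last, as it changes most frequently, so
# code changes don't invalidate the cached package installation layer.
COPY . .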

This approach does break down though in various ways, especially with Python. The use of S2I can also complicate matters because it aims to construct the final image incorporating your application code, as well as all the dependent packages required by your application in a single Docker layer.

One issue with Python is the use of a ‘requirements.txt’ file and ‘pip’ to install packages. If you need to install a lot of packages and you add a single new package to the list, then all of them have to be reinstalled. Further, if those packages are being installed in the same layer as when your application code is being incorporated, as is the case with S2I, then a change to the application code causes all the packages to also be reinstalled.

So although S2I provides a really simple and clean way of constructing Docker images without you yourself needing to know how to create them, long build times are obviously not ideal.

As an example of how long a build time can be, consider the creation of a Docker image for hosting a Wagtail CMS site using Django. The ‘requirements.txt’ file in this case contains only:

Django>=1.9,<1.10
wagtail==1.3.1
psycopg2==2.6.1

This isn’t all that gets installed though. The complete list of packages which get installed is:

beautifulsoup4==4.4.1
Django==1.9.2
django-appconf==1.0.1
django-compressor==2.0
django-modelcluster==1.1
django-taggit==0.18.0
django-treebeard==3.0
djangorestframework==3.3.2
html5lib==0.9999999
Pillow==3.1.1
psycopg2==2.6.1
pytz==2015.7
rcssmin==1.0.6
rjsmin==1.0.12
six==1.10.0
Unidecode==0.4.19
wagtail==1.3.1
wheel==0.29.0
Willow==0.2.2

Using my ‘warpdrive’ script from the previous blog post I referenced, it can take over 5 minutes on my slow Internet connection to bring down all the required Python packages, build them and construct the image.

(warpdrive+wagtail-demo-site) $ time warpdrive image wagtail
I0301 22:01:01.374459 16060 install.go:236] Using "assemble" installed from "image:///usr/local/s2i/bin/assemble"
I0301 22:01:01.374643 16060 install.go:236] Using "run" installed from "image:///usr/local/s2i/bin/run"
I0301 22:01:01.374674 16060 install.go:236] Using "save-artifacts" installed from "image:///usr/local/s2i/bin/save-artifacts"
---> Installing application source
---> Building application from source
-----> Installing dependencies with pip (requirements.txt)
Collecting Django<1.10,>=1.9 (from -r requirements.txt (line 1))
Downloading Django-1.9.2-py2.py3-none-any.whl (6.6MB)
Collecting wagtail==1.3.1 (from -r requirements.txt (line 2))
Downloading wagtail-1.3.1-py2.py3-none-any.whl (9.0MB)
Collecting psycopg2==2.6.1 (from -r requirements.txt (line 3))
Downloading psycopg2-2.6.1.tar.gz (371kB)
...
Installing collected packages: Django, djangorestframework, Unidecode, Pillow, rcssmin, rjsmin, six, django-appconf, django-compressor, Willow, html5lib, django-taggit, pytz, django-modelcluster, beautifulsoup4, django-treebeard, wagtail, psycopg2
...
Running setup.py install for psycopg2: finished with status 'done'
Successfully installed Django-1.9.2 Pillow-3.1.1 Unidecode-0.4.19 Willow-0.2.2 beautifulsoup4-4.4.1 django-appconf-1.0.1 django-compressor-2.0 django-modelcluster-1.1 django-taggit-0.18.0 django-treebeard-3.0 djangorestframework-3.3.2 html5lib-0.9999999 psycopg2-2.6.1 pytz-2015.7 rcssmin-1.0.6 rjsmin-1.0.12 six-1.10.0 wagtail-1.3.1
-----> Collecting static files for Django
...
Copying '/opt/warpdrive/demo/static/js/demo.js'
...
Copying '/usr/local/python/lib/python2.7/site-packages/django/contrib/admin/static/admin/img/gis/move_vertex_off.svg'
179 static files copied to '/home/warpdrive/django_static_root'.
---> Fix permissions on application source
real 5m40.780s
user 0m0.850s
sys 0m0.115s

If you were running ‘pip’ in your local environment and installing into a Python virtual environment, rerunning ‘pip’ on the ‘requirements.txt’ wouldn't be a big issue. This is because the packages would already be detected as being installed and so wouldn’t need to be downloaded and installed again. Even if you did blow away your Python virtual environment and recreate it, the downloaded packages would be in the cache that ‘pip’ maintains in your home directory. It could therefore just use those.

When creating Docker images however, you don’t get the benefit of those caching mechanisms because everything is done from scratch every time. This means that all the packages have to be downloaded every time.

Using a wheelhouse

A possible solution to this is to create a wheelhouse. That is, you use ‘pip’ to create a directory containing the packages you need to install, as Python wheels. For a pure Python package the wheel just contains the package’s code, but if a Python package includes C extensions, the wheel file will also include the compiled code as object files. This means the code doesn’t need to be recompiled every time and can simply be copied into place.
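
Using ‘pip’ directly, building a wheelhouse and then installing from it looks something like this, with the directory name being arbitrary:

$ pip wheel -r requirements.txt --wheel-dir wheelhouse
$ pip install --no-index --find-links wheelhouse -r requirements.txt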

Although this can be done, working it into how you build your Docker images can get a bit messy, as shown by Glyph in a blog post he wrote about it. It is therefore an area ripe for being simplified, and so I have also been working it into what I have been doing to simplify the deployment of web applications. In this post I want to show how that is progressing.

First step now is to create a special Docker image which acts as our Python wheelhouse. This can be done by running the following command.

(warpdrive+wagtail-demo-site) $ warpdrive image --build-target wheelhouse wagtail-wheelhouse
I0301 23:03:32.687290 17126 install.go:236] Using "assemble" installed from "image:///usr/local/s2i/bin/assemble"
I0301 23:03:32.687446 17126 install.go:236] Using "run" installed from "image:///usr/local/s2i/bin/run"
I0301 23:03:32.687475 17126 install.go:236] Using "save-artifacts" installed from "image:///usr/local/s2i/bin/save-artifacts"
I0301 23:03:32.709215 17126 docker.go:286] Image "wagtail-wheelhouse:latest" not available locally, pulling ...
---> Installing application source
---> Building Python wheels for packages
-----> Installing dependencies as wheels with pip (requirements.txt)
Collecting Django<1.10,>=1.9 (from -r requirements.txt (line 1))
Downloading Django-1.9.2-py2.py3-none-any.whl (6.6MB)
Saved ./.warpdrive/wheelhouse/Django-1.9.2-py2.py3-none-any.whl
Collecting wagtail==1.3.1 (from -r requirements.txt (line 2))
Downloading wagtail-1.3.1-py2.py3-none-any.whl (9.0MB)
Saved ./.warpdrive/wheelhouse/wagtail-1.3.1-py2.py3-none-any.whl
Collecting psycopg2==2.6.1 (from -r requirements.txt (line 3))
Downloading psycopg2-2.6.1.tar.gz (371kB)
...
---> Fix permissions on application source

This command is going to run a bit differently to the command above. Rather than use ‘pip install’ to install the actual packages, it will run ‘pip wheel’ to create the Python wheels we are after. At that point it will stop, as we don’t need it to do additional steps such as run ‘collectstatic’ for Django to gather up static file assets. This will still take up to 5 minutes though, since the bulk of the time is spent downloading and building the packages.

Once we have our wheelhouse, when building the Docker image for our application, we can point it at the wheelhouse as a source for the prebuilt Python packages we want to install. We can even tell it to take what the wheelhouse provides as authoritative and not consult the Python Package Index (PyPI) to check whether there are newer versions of any packages which haven’t been pinned to a specific version.

(warpdrive+wagtail-demo-site) $ time warpdrive image --wheelhouse wagtail-wheelhouse --no-index wagtail
warpdrive-image-17312
I0301 23:12:54.610882 17329 install.go:236] Using "assemble" installed from "image:///usr/local/s2i/bin/assemble"
I0301 23:12:54.611033 17329 install.go:236] Using "run" installed from "image:///usr/local/s2i/bin/run"
I0301 23:12:54.611089 17329 install.go:236] Using "save-artifacts" installed from "image:///usr/local/s2i/bin/save-artifacts"
---> Installing application source
---> Building application from source
-----> Found Python wheelhouse of packages
-----> Installing dependencies with pip (requirements.txt)
Collecting Django<1.10,>=1.9 (from -r requirements.txt (line 1))
Collecting wagtail==1.3.1 (from -r requirements.txt (line 2))
Collecting psycopg2==2.6.1 (from -r requirements.txt (line 3))
...
Installing collected packages: Django, Unidecode, pytz, django-modelcluster, djangorestframework, Pillow, django-treebeard, django-taggit, six, Willow, rjsmin, django-appconf, rcssmin, django-compressor, beautifulsoup4, html5lib, wagtail, psycopg2
Successfully installed Django-1.9.2 Pillow-3.1.1 Unidecode-0.4.19 Willow-0.2.2 beautifulsoup4-4.4.1 django-appconf-1.0.1 django-compressor-2.0 django-modelcluster-1.1 django-taggit-0.18.0 django-treebeard-3.0 djangorestframework-3.3.2 html5lib-0.9999999 psycopg2-2.6.1 pytz-2015.7 rcssmin-1.0.6 rjsmin-1.0.12 six-1.10.0 wagtail-1.3.1
-----> Collecting static files for Django
...
Copying '/opt/warpdrive/demo/static/js/demo.js'
...
Copying '/usr/local/python/lib/python2.7/site-packages/django/contrib/admin/static/admin/img/gis/move_vertex_off.svg'
179 static files copied to '/home/warpdrive/django_static_root'.
---> Fix permissions on application source
real 0m45.859s
user 0m3.555s
sys 0m2.575s

With our wheelhouse, building of the Docker image for our web application has dropped from over 5 minutes down to less than a minute. This is because, when installing the Python packages, it is able to reuse the prebuilt packages from the wheelhouse. This means a quicker turnaround for creating a new application image. We will only need to rebuild the wheelhouse itself if we change what packages we need to have installed.

Incremental builds

Reuse therefore allows us to speed up the building of Docker images considerably where we have a lot of Python packages that need to be installed. Prior builds can also be reused in another way, which is to reuse the existing wheelhouse itself when updating the wheelhouse after changes to the list of packages we need.

(warpdrive+wagtail-demo-site) $ time warpdrive image --build-target wheelhouse wagtail-wheelhouse
I0301 23:18:24.150533 17448 install.go:236] Using "assemble" installed from "image:///usr/local/s2i/bin/assemble"
I0301 23:18:24.151074 17448 install.go:236] Using "run" installed from "image:///usr/local/s2i/bin/run"
I0301 23:18:24.151121 17448 install.go:236] Using "save-artifacts" installed from "image:///usr/local/s2i/bin/save-artifacts"
---> Restoring wheelhouse from prior build
---> Installing application source
---> Building Python wheels for packages
-----> Installing dependencies as wheels with pip (requirements.txt)
Collecting Django<1.10,>=1.9 (from -r requirements.txt (line 1))
File was already downloaded /opt/warpdrive/.warpdrive/wheelhouse/Django-1.9.2-py2.py3-none-any.whl
Collecting wagtail==1.3.1 (from -r requirements.txt (line 2))
File was already downloaded /opt/warpdrive/.warpdrive/wheelhouse/wagtail-1.3.1-py2.py3-none-any.whl
Collecting psycopg2==2.6.1 (from -r requirements.txt (line 3))
Using cached psycopg2-2.6.1.tar.gz
...
---> Fix permissions on application source
real 1m17.180s
user 0m3.653s
sys 0m2.316s

Here we have run the exact same command as we ran before to create the wheelhouse in the first place, but instead of taking 5 minutes to build, it has taken just over 1 minute.

This speed up was achieved because we were able to copy across the ‘pip’ cache as well as the directory of Python wheel files from the previous instance of the wheelhouse.

Not a Dockerfile in sight

Now what you didn’t see here at all was a ‘Dockerfile’. For me this is a good thing.

The problem with Docker right now is that the novelty still hasn’t worn off, with it still not being seen for what it is, just another tool we can use. As a result we are still in this phase where developers using Docker like to play with it and so try and do everything themselves from scratch. We need to get beyond that phase and start incorporating best practices into canned scripts and systems and simply get on with using it.

Anyway, this is where I at least am heading with the work I am doing. That is, encapsulate all the best practices for Python web application deployment, including the building of Docker images which you can run directly, or with a PaaS using Docker such as OpenShift. The aim here is to make it so much easier for you, with you knowing that you can trust that the mechanisms have been put together with all the best practices being followed. After all, do you really want to keep reinventing the wheel all the time?

Wednesday, February 24, 2016

A walkthrough of using OpenShift 3.

Since starting with Red Hat on the OpenShift project, I have written various blogs posts here on my own site but they were mainly related to Docker. They still had some relevance to OpenShift as they talked about how to construct Docker images properly so that they will work under the more stringent security requirements imposed by a multi tenant hosting service using Docker, such as OpenShift. What I haven’t tackled head on is what it is like to use OpenShift, what role it is providing and why the Python community would be interested in it.

On that front, Grant Shipley, who is also on the OpenShift team, has just saved me a whole lot of work by posting a really good video walkthrough of using OpenShift 3. The blog post introducing the video can be found on the OpenShift blog, but I have embedded the video here for quick viewing as well. If at all interested in deploying web applications to a PaaS like environment, it is well worth a watch to understand, from a developer perspective, where OpenShift is headed, how it can very simply be used to host your web applications using provided images, or how you can run your own Docker images.

If you want to play around with OpenShift yourself, the easiest way is to use the All-In-One VM image for Vagrant.

The All-In-One image is what Grant uses in the video and allows you to run up OpenShift on your own laptop or desktop PC. The image is based on the Open Source upstream project for Red Hat’s own product. The upstream project is called OpenShift Origin.

If you like what you see with OpenShift and want to experiment further on some real hosts, you can install OpenShift Origin yourself, onto your own physical infrastructure or on an IaaS provider, using an easy to run Ansible script.

Being the upstream project, OpenShift Origin is the community supported variant of OpenShift. If you want to run OpenShift and are after all the support that comes with using a Red Hat product, including Red Hat being the one place to call for all the issues you may experience with the product, then you have a couple of options at present.

The first is OpenShift Enterprise. This supported variant of OpenShift can also be installed on your own physical infrastructure or on an IaaS provider. If you would rather not install and manage it yourself, and instead have Red Hat look after it for you, the current option is OpenShift Dedicated. This provides you with your own OpenShift environment running on an IaaS provider, but Red Hat will install it and look after it. You still don’t share OpenShift with anyone else with this option, so you can use the full resources however you want.

The option which I know many who might be reading this in the Python community are going to be more interested in is OpenShift Online. Unfortunately the current OpenShift Online is still using the prior OpenShift 2 and not OpenShift 3. It therefore still hasn’t switched to using Docker and Kubernetes as yet.

OpenShift Online is definitely coming though. Creating a full on public PaaS which is multi tenant and provides the security and performance that users expect is no simple undertaking and Red Hat wants to get it right. OpenShift Online therefore needs a bit more time baking in the oven before it is going to be ready.

All the complexity in creating a PaaS is something that I have only come to better appreciate myself after seeing the effort going into OpenShift 3. If you have tried to create a DIY PaaS on top of Docker or a so called container as a service (CaaS) product, you will no doubt be aware of some of the traps and pitfalls with trying to do it yourself. Even if you are able to get something working, you will find that a DIY PaaS comes with a high maintenance burden. This is a large part of what OpenShift is about: it takes away from you all the effort of understanding how to run a PaaS well and to do it securely.

So if interested in where PaaS environments are headed, I definitely recommend trying the All-In-One VM for OpenShift and watching Grant’s video. With the All-In-One VM available so you can try things yourself, I will also be starting to post more about using Python web applications with OpenShift.

If anyone does have any specific questions about hosting Python web applications with OpenShift, do let me know and perhaps it can be a good subject for a future post. Easiest thing to do is to drop me a message on Twitter (@GrahamDumpleton) with any suggestions.

Thursday, February 18, 2016

Building a better user experience for deploying Python web applications.

Yet again I missed out on getting a talk into PyCon US. The title of my proposed talk was the same as this blog post. Since it wasn’t accepted, I thought I might instead use a blog post to give a sneak peek at some of the more recent work I have been doing on Python web application deployment, which I would otherwise have described a bit about in my talk had it been accepted.

For those who may have been following what I have been doing in the past with creating and supplying Docker images for running Python web applications using Apache and mod_wsgi, this is a progression of that work, expanding on the scope and making it usable beyond Docker containers.

Demonstration using Django

To illustrate what it is all about, a simple demonstration is in order. For that, let’s create a new Django web application project and get it running.

$ django-admin startproject mydjangosite
$ cd mydjangosite/
$ python manage.py runserver
Performing system checks...
System check identified no issues (0 silenced).
You have unapplied migrations; your app may not work properly until they are applied.
Run 'python manage.py migrate' to apply them.
February 18, 2016 - 01:22:25
Django version 1.9.2, using settings 'mydjangosite.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.

Nothing special here, and if we go to the URL ‘http://127.0.0.1:8000/admin’ we will be presented with the login page for the Django admin interface. As we are using the builtin Django development server, the styling for the login page will look correct, since the development server automatically worries about static file assets such as style sheets.

As you should all hopefully know, the Django development server should not be used for a production system. For development though at least, the development server can be handy due to the fact that it does handle static file assets and also offers automatic code reloading. Use of the development server can though hide certain problems that will only occur in a production environment where a multi process and/or multi threaded configuration may be used.
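
As a contrived example of the sort of problem which can stay hidden, consider code relying on module level mutable state. It appears to work under the single process, single threaded development server, but misbehaves in a multi process and/or multi threaded deployment:

# Each process has its own copy of this global, so the count is not
# shared between processes, and the unguarded increment is also not
# thread safe when multiple threads handle requests concurrently.
COUNTER = 0

def increment():
    global COUNTER
    COUNTER += 1
    return COUNTER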

Setting up a production grade web server is often viewed as being a lot of trouble and people can struggle with it. Let’s therefore see if we can make that a bit easier.

Simplified web application wrapper

In order to create that Django application above, I first needed to have Django installed. This was simply so that the ‘django-admin’ program was available. Imagine though that I didn’t need that, as I had created the project skeleton by hand, or had checked out an existing Django project repository. To emphasise this, let’s use ‘virtualenvwrapper’ to create a fresh Python virtual environment. Into this I am going to install a single Python package called ‘warpdrive’.

$ mkvirtualenv warpdrive
New python executable in /Users/graham/Python/warpdrive/bin/python
Installing setuptools, pip, wheel...done.
virtualenvwrapper.user_scripts creating /Users/graham/Python/warpdrive/bin/predeactivate
virtualenvwrapper.user_scripts creating /Users/graham/Python/warpdrive/bin/postdeactivate
virtualenvwrapper.user_scripts creating /Users/graham/Python/warpdrive/bin/preactivate
virtualenvwrapper.user_scripts creating /Users/graham/Python/warpdrive/bin/postactivate
virtualenvwrapper.user_scripts creating /Users/graham/Python/warpdrive/bin/get_env_details

(warpdrive) $ pip install warpdrive
Collecting warpdrive
Installing collected packages: warpdrive
Successfully installed warpdrive-0.14.6

We will still need Django, but we definitely don’t want to install that manually as an argument to ‘pip’ on the command line. Instead we should list any such required Python packages in a ‘requirements.txt’ file for ‘pip’. We will therefore create a ‘requirements.txt’ file listing only ‘Django’ in it.

Even now I still don’t really want to run ‘pip’ by hand as setting up a Python web application project so it can be run is often more than just installing required Python packages. For example, when using Django with a production grade WSGI server, it would generally be necessary to run ‘python manage.py collectstatic’. These are steps though that can be forgotten by users. A better approach would be to automate such steps, and where necessary, record such manual steps in special build scripts that would be automatically run when setting up the environment for running a Python web application. This is where ‘warpdrive’ comes into play.

Now although I created a Python virtual environment and installed ‘warpdrive’, that was purely so that I had ‘warpdrive’ available, and to show that otherwise I had an empty Python installation.

What I am now going to do is build a separate Python virtual environment for this specific Python web application, but have ‘warpdrive’ create it and set it up for me.

(warpdrive) $ eval "$(warpdrive activate mydjangosite)" 
(warpdrive+mydjangosite) $ warpdrive build
-----> Installing dependencies with pip
Collecting Django (from -r requirements.txt (line 1))
Downloading Django-1.9.2-py2.py3-none-any.whl (6.6MB)
100% |████████████████████████████████| 6.6MB 1.4MB/s
Installing collected packages: Django
Successfully installed Django-1.9.2
Collecting mod-wsgi
Installing collected packages: mod-wsgi
Successfully installed mod-wsgi-4.4.22
-----> Collecting static files for Django
Copying ‘.../django/contrib/admin/static/admin/css/base.css’
...

56 static files copied to '/Users/graham/.warpdrive/warpdrive+mydjangosite/home/django_static_root'.

The first step here was to use ‘warpdrive activate’ to create a fresh Python virtual environment and use it for the current shell. The second step was to use ‘warpdrive build’ to set up our environment.

The ‘warpdrive build’ command is doing a few things here, but the main things are that it installed all Python packages listed in the ‘requirements.txt’ file, installed ‘mod_wsgi-express’ and finally ran ‘python manage.py collectstatic’.

You may note that we didn’t actually specify that the Django management command ‘collectstatic’ should be executed. This is because ‘warpdrive’ itself knows about various ways that Python web applications may be launched, including special support for detecting when you are running a Django web application. Knowing that you are using Django it will automatically run ‘collectstatic’ for you.

The keen eyed may even notice that we didn’t modify the Django settings module and specify ‘STATIC_ROOT’ so that ‘collectstatic’ knew where to copy static file assets. Again this is the smarts of ‘warpdrive’ kicking in, with it realising that it wasn’t defined and supplying its own value of ‘STATIC_ROOT’ instead when ‘collectstatic' is run.

As you go along and make changes to static file assets or modify the ‘requirements.txt’ file, you simply need to re-run ‘warpdrive build’ to refresh the current environment.

With the environment for the web application built, we can now start it up. To do this we are going to use ‘warpdrive start'.

(warpdrive+mydjangosite) $ warpdrive start
-----> Configuring for server type of auto
-----> Running server script start-mod_wsgi
-----> Executing server command ' mod_wsgi-express start-server --log-to-terminal --startup-log --port 8080 --application-type module --entry-point mydjangosite.wsgi --callable-object application --url-alias /static/ /Users/graham/.warpdrive/warpdrive+mydjangosite/home/django_static_root/'
Server URL : http://localhost:8080/
Server Root : /tmp/mod_wsgi-localhost:8080:502
Server Conf : /tmp/mod_wsgi-localhost:8080:502/httpd.conf
Error Log File : /dev/stderr (warn)
Startup Log File : /dev/stderr
Request Capacity : 5 (1 process * 5 threads)
Request Timeout : 60 (seconds)
Queue Backlog : 100 (connections)
Queue Timeout : 45 (seconds)
Server Capacity : 20 (event/worker), 20 (prefork)
Server Backlog : 500 (connections)
Locale Setting : en_AU.UTF-8
[Thu Feb 18 12:58:37.279748 2016] [mpm_prefork:notice] [pid 9456] AH00163: Apache/2.4.16 (Unix) mod_wsgi/4.4.22 Python/2.7.10 configured -- resuming normal operations
[Thu Feb 18 12:58:37.280085 2016] [core:notice] [pid 9456] AH00094: Command line: 'httpd (mod_wsgi-express) -f /tmp/mod_wsgi-localhost:8080:502/httpd.conf -E /dev/stderr -D FOREGROUND'

Unlike before, this time the Django development server is not being run. Instead ‘warpdrive’ is running ‘mod_wsgi-express’. In doing that it has automatically determined from the Django application itself what the WSGI application entry point is, where static file assets are mounted, as well as where the static file assets are located. Our style sheets for the Django admin page therefore work, even if you had forgotten to set up ‘STATIC_ROOT’ in the Django settings file, as ‘warpdrive’ would have detected that.

With no real extra work we have got ourselves a production grade WSGI server and can thus be more confident that we have something more comparable to when we really deploy our Django application. Notionally this could even be used as the basis of your production deployment and if it was, it means that your local environment is going to be as close as possible to the actual production platform.

As far as additional configuration or setup steps, ‘warpdrive’ supports various mechanisms for supplying hook scripts which can be executed as part of the build and deployment phases. This means you can capture setup steps and have them triggered on both a local environment and production where appropriate. Additional WSGI server options or environment variables can also be supplied to override or customise the configuration, such as tuning the number of processes and threads being used.

One specific environment variable relevant to local development is ‘MOD_WSGI_RELOAD_ON_CHANGES’. Define this when running ‘warpdrive start’ and you get back the automatic code reloading feature of the builtin Django development server, meaning you can just as readily use ‘warpdrive' during development also.
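
For example (the specific value used here is an assumption on my part; the text above only says the variable needs to be defined):

(warpdrive+mydjangosite) $ MOD_WSGI_RELOAD_ON_CHANGES=true warpdrive start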

That cool kid called Docker

You may be saying: but I use Docker, so how is this going to help me?

This is no problem, and ‘warpdrive’ actually grew out of all the work I have been doing with Docker. You could technically create your own Docker base image and, provided it satisfies a few requirements around certain system packages being available, trigger ‘warpdrive build’ from your ‘Dockerfile’ and ‘warpdrive start’ from the ‘CMD’.

The easier path though would be to use the Docker base images I have created, which already incorporate all the required base packages and integrate ‘warpdrive’.

Having to create Docker images yourself can still be a pain though, especially when doing it from scratch and you aren’t aware of all the traps and pitfalls in doing that.

To make it all easier, ‘warpdrive’ and the Docker base images I have are S2I enabled.

Most people probably wouldn’t have heard of S2I, but what it stands for is ‘Source to Image’. It is effectively the concept of build packs as implemented by some hosting services, but re-imagined and modernised to use Docker.

You can read more about Source to Image at the project’s GitHub repository: https://github.com/openshift/source-to-image

Having already shown that my Django web application runs, all I now need to do to create a Docker image for it is to run ‘warpdrive s2i’.

(warpdrive+mydjangosite) $ warpdrive s2i
---> Installing application source
---> Building application from source
-----> Installing dependencies with pip
Collecting Django (from -r requirements.txt (line 1))
Downloading Django-1.9.2-py2.py3-none-any.whl (6.6MB)
Installing collected packages: Django
Successfully installed Django-1.9.2
-----> Collecting static files for Django
Copying ‘.../django/contrib/admin/static/admin/img/icon-yes.svg’
...

56 static files copied to '/home/warpdrive/django_static_root'.
---> Fix permissions on application source
(warpdrive+mydjangosite) $ docker images | grep mydjangosite
warpdrive-mydjangosite latest 8d7fd16f7ab8 20 seconds ago 819.6 MB

The result of this is a Docker image incorporating my Django web application and all it needs, called ‘warpdrive-mydjangosite'. As before, ‘collectstatic’ was automatically run as part of the build phase for the Docker image.

Running the Docker image is then just a matter of executing ‘docker run’ and exposing the appropriate port.

(warpdrive+mydjangosite) $ docker run -p 8080:8080 warpdrive-mydjangosite
---> Executing the start up script
-----> Configuring for server type of auto
-----> Running server script start-mod_wsgi
-----> Executing server command ' mod_wsgi-express start-server --log-to-terminal --startup-log --port 8080 --application-type module --entry-point mydjangosite.wsgi --callable-object application --url-alias /static/ /home/warpdrive/django_static_root/'
[Thu Feb 18 03:08:22.406961 2016] [mpm_event:notice] [pid 19:tid 139789921310464] AH00489: Apache/2.4.18 (Unix) mod_wsgi/4.4.22 Python/2.7.11 configured -- resuming normal operations
[Thu Feb 18 03:08:22.407345 2016] [core:notice] [pid 19:tid 139789921310464] AH00094: Command line: 'httpd (mod_wsgi-express) -f /tmp/mod_wsgi-localhost:8080:1001/httpd.conf -E /dev/stderr -D MOD_WSGI_MPM_ENABLE_EVENT_MODULE -D MOD_WSGI_MPM_EXISTS_EVENT_MODULE -D MOD_WSGI_MPM_EXISTS_WORKER_MODULE -D MOD_WSGI_MPM_EXISTS_PREFORK_MODULE -D FOREGROUND'

You can then test further your web application running in the context of Docker and if happy, push the Docker image up to your hosting platform and run it.

Deploying to OpenShift 3

If using the latest version of OpenShift, based on Docker and Kubernetes, deployment is even easier. You don’t need to go through the separate step yourself of creating the Docker image and uploading it to a Docker registry. This is because OpenShift itself is aware of Source to Image and can deploy web applications direct from a Git repository.

To deploy this same application to OpenShift, all I would need to do is commit my changes and push them up to my Git repository and run:

(warpdrive+mydjangosite) $ oc new-app grahamdumpleton/warp0-debian8-python27~https://github.com/GrahamDumpleton/django-hello-world-v1.git
--> Found Docker image d148eec (8 hours old) from Docker Hub for "grahamdumpleton/warp0-debian8-python27"
Python 2.7 (Warp Drive)
-----------------------
S2I builder for Python web applications.
Tags: builder, python, python27, warpdrive, warpdrive-python27
* An image stream will be created as "warp0-debian8-python27:latest" that will track the source image
* A source build using source code from https://github.com/GrahamDumpleton/django-hello-world-v1.git will be created
* The resulting image will be pushed to image stream "django-hello-world-v1:latest"
* Every time "warp0-debian8-python27:latest" changes a new build will be triggered
* This image will be deployed in deployment config "django-hello-world-v1"
* Port 8080/tcp will be load balanced by service "django-hello-world-v1"
* Other containers can access this service through the hostname "django-hello-world-v1"
--> Creating resources with label app=django-hello-world-v1 ...
imagestream "django-hello-world-v1" created
buildconfig "django-hello-world-v1" created
deploymentconfig "django-hello-world-v1" created
service "django-hello-world-v1" created
(warpdrive+mydjangosite) $ oc expose service django-hello-world-v1
route "django-hello-world-v1" exposed

OpenShift will automatically download my Docker base image with S2I support as necessary, and the Git repository containing my application source code, trigger the S2I build process to create the final Docker image and then deploy it. We then just need to run one final step to actually make the web application publicly accessible and we are done.

Alternate PaaS providers

Could ‘warpdrive’ be used with other PaaS providers?

The answer there is yes, provided they don’t lock you out completely from the build and deployment phases, and don’t screw up the Python environment too much. I haven’t tweaked ‘warpdrive’ for this, and probably won't, but I have deployed previous iterations of all this work to OpenShift 2 and Heroku.

The end result is that we have the possibility here of having one deployment story that can work with multiple hosting services, but which can still also be used on your local development platform.

Alternate web servers

In our sample application we used Django, but if using an alternate WSGI framework you just need to supply a WSGI application entry point in a ‘wsgi.py’ file in the top directory of your project.
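
As a minimal sketch of such a ‘wsgi.py’ file, assuming a hypothetical project package exposing an application factory:

# The module and factory names here are assumed for illustration; the
# only requirement is that a WSGI callable named 'application' exists.
from myproject import create_app

application = create_app()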

By default the ‘auto’ mode of ‘warpdrive’ will use ‘mod_wsgi-express’ to host any WSGI application, including Django specific applications. This is because ‘mod_wsgi-express’ was largely purpose built for this type of deployment setup. It is therefore the best option available.

The performance of most WSGI servers is more or less the same when configured properly. If you still wish to use a different WSGI server because its characteristics better suit some unique requirement of your web application, you can still do so. To do this you just need to override the ‘auto’ mode and say which WSGI server you want to use.

Alternate WSGI servers which are supported are ‘gunicorn’, ‘uwsgi’ and ‘waitress’. When one of these is selected you just need to ensure that it is also listed in the ‘requirements.txt’ file for ‘pip’. So long as you do that, ‘warpdrive’ will start up that WSGI server for you instead, supplying a minimal set of options required to get it to listen on the correct port for HTTP connections and log to the terminal. Any other options required to ensure the WSGI server behaves properly inside of a Docker container will also be supplied as necessary.
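If for example you selected ‘gunicorn’, your ‘requirements.txt’ might look something like the following, with the Django entry only being present because our sample application uses Django:

Django
gunicorn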

As well as specifying any of these alternate WSGI servers, you can also specify explicitly that ‘mod_wsgi’ should be used. Do be aware though that overriding the deployment mechanism and not using ‘auto’ means that the configuration of the WSGI server is then entirely up to you. So if specifying ‘mod_wsgi’ or an alternate WSGI server explicitly, you would then need to tell it how to host your Django application, whereas with ‘auto’ mode that is all done for you.

For those who don’t want to use a WSGI server at all, but instead want to use something like the Tornado web server, you can supply an ‘app.py’ file. If this file exists then it will take precedence and ‘warpdrive’ will execute it as a Python script to run your Python web application. Your web application then just needs to listen on the appropriate HTTP port.
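As a sketch only, an ‘app.py’ for Tornado might look like the following. I am assuming here that port 8080 is the appropriate HTTP port; do check what port ‘warpdrive’ expects your application to listen on.

import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write('Hello World!')

application = tornado.web.Application([(r'/', MainHandler)])

# Listen on the expected HTTP port (8080 assumed here).
application.listen(8080)
tornado.ioloop.IOLoop.current().start()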

If you need even more control over startup, you can also supply an ‘app.sh’ file and so easily perform any last minute steps or set special environment variables as necessary. The only requirement at this point is that the final command in the shell script which runs the actual web application use ‘exec’, so that the web application replaces the shell process. This is to ensure signals work properly when things are run under Docker. You might use an ‘app.sh’ file for example when wishing to set up and run Jupyter Notebook.
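A hypothetical ‘app.sh’ might therefore look like the following, where the settings module name is purely an example:

#!/usr/bin/env bash

# Perform any last minute setup steps.
export DJANGO_SETTINGS_MODULE=mydjangosite.settings

# Use 'exec' so the WSGI server replaces the shell process and so
# receives signals directly when run under Docker.
exec gunicorn --bind 0.0.0.0:8080 mydjangosite.wsgi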

Special knowledge for other Python web frameworks could also be added if they have a unique and commonly used method of deployment. For example, ‘warpdrive’ will also recognise a ‘paste.ini’ file as might be used by Paste based web applications and configure and launch ‘mod_wsgi-express’ to run it.

Should you use this?

Right now ‘warpdrive’ is my play thing.

Bringing new Open Source projects out into the open is a dangerous exercise due to the demands that users place on the developers of those projects.

So right now you probably don’t want to use it because I still want the flexibility to make any sorts of changes I want to how it works. Plus I don’t really want hordes of users pestering me with simple questions.

Will I ever say it is ready to use? Maybe, maybe not. That really depends on whether there is any interest. Surprisingly, I have gotten a fair bit of push back from some quarters on this whole concept in the past. This may well be a vocal minority who think they already know how to do everything themselves, but such negative reactions aren’t always encouraging to the idea of declaring it public and usable.

If you are intrigued by what I have presented, think it has merit and might be something you would use, then at least follow me on Twitter (@GrahamDumpleton) and let me know on Twitter what you think. Thanks.

Monday, January 18, 2016

Automating deployment of Lektor blog sites.

Towards the end of last year, Armin Ronacher formally announced a new project of his called Lektor. Armin is one of those developers who, when he creates some new piece of software or publishes a blog post, you should always pay attention. His knowledge and attention to detail are something everyone should aspire to. So ever since he announced Lektor I have been aiming to put aside some time to have a play with it. I am hoping one day I can stop using Blogger for my own blog site and use something like Lektor instead. That isn’t to say Lektor is only for blog sites; it can be used for any site which could ultimately be hosted as a plain static web site and which doesn’t need a full on dynamic web site framework.

Although Lektor itself handles well the task of generating the static content for a web site and has some support for deploying the generated files, I am not too keen on any of the deployment options currently provided. I thought therefore I would have a bit of a play with automated deployment of a Lektor blog using the Open Source Source to Image project for Docker and also OpenShift. The goal I initially wanted to achieve was that simply by pushing completed changes to my blog site to a GitHub repository that my web site would be automatically updated. If that worked out okay, then the next step would be to work out how I could transition my existing blog posts off Blogger, including how to implement redirects if necessary so existing URLs would still work and map into any new URL naming convention for the new design.

Creating an initial site template

Lektor more than adequately covers the creation of an initial empty site template in its quick start guide so I will not cover that here.
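For reference, the whole process comes down to running the following and answering its prompts:

$ lektor quickstart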

My only grumble with that process was that it doesn’t like to have the directory you want to use in existence already. If you try and use an existing directory you get an error like:

Error: Could not create target folder: [Errno 17] File exists: '/Users/graham/Projects/lektor-empty-site'
Aborted!

For me this came about because I created a repository on GitHub first, made a local checkout and then tried to populate it such that everything was at the top level of the repository rather than in a sub directory. Lektor doesn’t like this. You therefore either have to create the project in a subdirectory and then move everything manually to the top level, or create the project first and then only do ‘git init’ and link it to the remote GitHub repository. If there is a way of using Lektor such that I could have populated the current directory rather than having to use a new directory, then do please let me know.

Having created the initial site template, we can then go in and make our modifications, using ‘lektor server’ on our local machine to view our updates as we make them.

Deploying static site content

When it comes to deploying a site created using Lektor, that is where you need to move beyond the inbuilt server it provides. Because Lektor generates purely static content, you don’t need a fancy dynamic web application server and any web server capable of hosting static files will do.

There are certainly any number of hosting services still around who will host a static web site for you, or you could use S3 or GitHub pages, but I wanted something which I had a bit more control over and visibility into when there is an issue. I also don’t really want to be pushing from my local machine direct to the hosting service either. I like the idea of a workflow where things go via the Git repository where the original files for the site are located. This would allow me to coordinate with others working on a site as well, using all the same sorts of workflows one would use for normal software development, such as branching, to handle working on and finally release of the content for the site.

For hosting of minimal hand crafted static sites I have in the past used the free tiers of some of the more popular platform as a service offerings (PaaS), but because these services have traditionally been biased towards dynamic web applications, that meant wrapping up the static content within a custom Python web application, using something like the WhiteNoise WSGI middleware to handle serving the static file content.

This works, but you aren’t using a proper web server designed for static file hosting, so it isn’t the best option for a more significant site which needs to handle a lot of traffic.
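For what it is worth, that wrapping only amounts to a few lines of code. A minimal sketch, assuming the generated files have been placed in a ‘static’ subdirectory, might be:

from whitenoise import WhiteNoise

def application(environ, start_response):
    # Fallback WSGI application for any request not matching a static file.
    start_response('404 Not Found', [('Content-Type', 'text/plain')])
    return [b'Not Found']

# Wrap the WSGI application so static files are served from 'static'.
application = WhiteNoise(application, root='static')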

What could I do then if I wanted to use a proper web server such as Apache or nginx?

The problem in using a traditional PaaS is that in general they do not provide either Apache or nginx as part of their standard environment and they can make it very difficult to actually install it. Alternatively, they might use Apache, but because of a fixed configuration and no ability to change it, you can’t just drop static files in and have them appear at the URL you would prefer to have them.

Using a Docker based PaaS

Now these days, because of my work with Red Hat, I get a lot of opportunity to play with Docker and Red Hat’s newest incarnation of their PaaS offering. This is OpenShift 3 and it is a complete rewrite of the prior version of OpenShift as most would know it. In OpenShift 3 Docker is used, instead of a custom container solution, with Kubernetes handling scheduling of those containers.

Because OpenShift 3 is Docker based, this means one has much greater control over what you can deploy to a container. So whereas with a traditional PaaS your options may have been limited, with OpenShift 3 and Docker you can pretty well do whatever you want in the container and use whatever software or web server you want to use.

Given the ability to use Docker, I could therefore consider setting up a traditional Apache or nginx web server. If I were to go down that path there are even existing Docker images for both Apache and nginx on the Docker Hub registry for hosting static web sites.

The problem with using such existing Docker images though is that when using Lektor, you need to trigger a build step to generate the static files from the original source files. This requires having Lektor installed to run the build step, which also means having a working Python installation as well. These base images for Apache and nginx aren’t general purpose images though and are not going to have Python installed. As a result, the generation of the static files would need to be done using a separate system first before then somehow being combined with the base image.

The alternative is to start out with one of the web server images and create a new base image based on it which adds Python and Lektor. Conversely, you could start out with a base image for Python and then install Lektor and either Apache or nginx.

With a base image which then incorporated both the web server and Lektor, and a default ‘RUN’ action to start the web server, you could within the Lektor project for your blog site add a ‘Dockerfile’ which ran the ‘lektor build’ to generate the static content as part of the build for the Docker image.
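As a rough sketch only, such a ‘Dockerfile’ based on the nginx image might look like the following, with the package names and paths being assumptions on my part:

FROM nginx

# Install a Python runtime and Lektor.
RUN apt-get update && \
    apt-get install -y python python-pip && \
    pip install Lektor

# Copy in the Lektor project and generate the static files into the
# default document root for nginx.
COPY . /site
WORKDIR /site
RUN lektor build --output-path /usr/share/nginx/html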

No matter what route you take here, they all seem a bit fiddly and would still entail a fair bit of work to get some sort of automated workflow going around them.

Builds using Source to Image

As it turns out, an Open Source project already exists which has done a lot of the work to build that workflow. It is a project called Source to Image (S2I).

If you are familiar with the concept of build packs or cartridges as they existed for traditional PaaS offerings, think of S2I as the Docker replacement for those.

The idea behind S2I is that you have a Docker image which defines what is called a builder. This is effectively a Docker base image that combines all the common stuff that would be required for deploying software for a given problem domain, for example Python web applications, along with a special ‘assemble’ script which takes your source files and combines them with the base image to create a new Docker image to be run as the actual application.

When combining the source files with the base image, if they are actual application code files, they might be compiled into an application executable, or if using a scripting language simply copied into place to be executed by an application web server. Alternatively, the source files could be some sort of data input files that are to be used directly by an application, or after some translation process has been done. In other words, you aren’t restricted to using S2I builders just to create a new application. Instead an S2I builder could be used to combine a SaaS (Software as a Service) like application with the data it needs to run.

Whatever the purpose of the builder and the resulting application, a further key component supplied by the S2I builder is a ‘run’ script. It is this script which is executed when the Docker image is run and which starts up the actual application.

So an S2I builder contains all the base software components that would be required for an application, plus the ‘assemble’ and ‘run’ scripts defining how the source code is combined with the builder image and then subsequently how to start the application.
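To make that a bit more concrete, minimal sketches of the two scripts for a hypothetical Python builder might be as follows, with ‘/tmp/src’ being the S2I convention for where injected source files land. Treat these as illustrative assumptions rather than the scripts of any real builder.

#!/bin/bash
# assemble: copy the injected source files into place and install
# any Python packages the application requires.
cp -Rf /tmp/src/. ./
pip install -r requirements.txt

#!/bin/bash
# run: start the actual application, using 'exec' so that it
# replaces the shell process.
exec gunicorn wsgi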

What isn’t obvious is how our source files get copied in as part of this process. This is where the ‘s2i’ program from the Source to Image package comes into play. It is this which takes the source code, injects it into our running S2I builder, triggers the ‘assemble’ script and then snapshots the container to create a new Docker image.

To make things a little clearer, let’s try an example.

For this I am going to use an S2I builder which has been created for use with OpenShift for deploying Python web applications. This S2I builder can be found on the Docker Hub registry and is called ‘openshift/python-27-centos7’.

In using the ‘s2i’ program there are two ways that you can supply your source files. The first is to point at a remote Git repository hosted somewhere like GitHub. The second is to point at a local file system directory containing the source files.

In this case I am going to use the repository on GitHub located at:

  • https://github.com/GrahamDumpleton/wsgi-hello-world

The ‘s2i’ program is now run, supplying it the location of the source files, the name of the S2I builder image on the Docker Hub registry and the name to be given to the Docker image produced and which will contain our final application.

$ s2i build https://github.com/GrahamDumpleton/wsgi-hello-world.git openshift/python-27-centos7 my-python-app
---> Copying application source ...
---> Installing dependencies ...
Downloading/unpacking gunicorn (from -r requirements.txt (line 1))
Installing collected packages: gunicorn
...
Successfully installed gunicorn
Cleaning up...

$ docker images
REPOSITORY      TAG      IMAGE ID       CREATED          VIRTUAL SIZE
my-python-app   latest   beda88ceb3ad   14 minutes ago   444.1 MB

With the build complete, we can now run our application.

$ docker run --rm -p 8080:8080 my-python-app
---> Serving application with gunicorn (wsgi) ...
[2016-01-17 10:49:58 +0000] [1] [INFO] Starting gunicorn 19.4.5
[2016-01-17 10:49:58 +0000] [1] [INFO] Listening at: http://0.0.0.0:8080 (1)
[2016-01-17 10:49:58 +0000] [1] [INFO] Using worker: sync
[2016-01-17 10:49:58 +0000] [30] [INFO] Booting worker with pid: 30

and access it using ‘curl’ to validate it works.

$ curl $(docker-machine ip default):8080
Hello World!

The important thing to understand here is that it wasn’t necessary to define how to create the Docker image. That is, all the WSGI ‘Hello World’ Git repository contained was:

$ ls -las
total 40
0 drwxr-xr-x 8 graham staff 272 7 Jan 19:21 .
0 drwxr-xr-x 71 graham staff 2414 17 Jan 14:05 ..
0 drwxr-xr-x 15 graham staff 510 17 Jan 15:18 .git
8 -rw-r--r-- 1 graham staff 702 6 Jan 17:07 .gitignore
8 -rw-r--r-- 1 graham staff 1300 6 Jan 17:07 LICENSE
8 -rw-r--r-- 1 graham staff 163 6 Jan 17:09 README.rst
8 -rw-r--r-- 1 graham staff 9 6 Jan 21:05 requirements.txt
8 -rw-r--r-- 1 graham staff 278 6 Jan 17:09 wsgi.py

There was no ‘Dockerfile’. It is the ‘s2i’ program in combination with the S2I builder image which does all this for you.

An S2I builder for static hosting

As you can see from above, the S2I concept already solves some of our problems of how to manage the workflow for creating a Docker image which contains our web site built using Lektor.

The part of the puzzle we still need though is a Docker base image which combines both a web server and a Python runtime, and to which we can add the ‘assemble’ and ‘run’ scripts to create an S2I builder image.

This is where I am going to cheat a little bit.

This is because although I demonstrated an S2I builder for Python above, I actually have my own separate S2I builder for Python web applications. My own S2I builder is more flexible in its design than the OpenShift S2I builder. One of the things it supports is the use of Apache/mod_wsgi for hosting a Python web application. To do this it is using ‘mod_wsgi-express’.

One of the features that ‘mod_wsgi-express’ happens to have is an easy ability to host static files using Apache in conjunction with your Python web application. It even has a mode whereby you can say that you are only hosting static files and don’t actually have a primary Python web application.

So although primarily designed for hosting Python web applications, my existing S2I builder for Python web applications provides exactly what we need in this case. That is, it combines in one base image a Python runtime, along with Apache, as well as an easy way to start Apache against static file content.

If we were running on our normal machine at this point and not using Docker, the steps required to build our static files from our Lektor project and host it using ‘mod_wsgi-express’ would be as simple as:

$ lektor build --output-path /tmp/data
Started build
U index.html
U about/index.html
U projects/index.html
U blog/index.html
U static/style.css
U blog/first-post/index.html
Finished build in 0.07 sec
Started prune
Finished prune in 0.00 sec
 
$ mod_wsgi-express start-server --application-type static --document-root /tmp/data
Server URL : http://localhost:8000/
Server Root : /tmp/mod_wsgi-localhost:8000:502
Server Conf : /tmp/mod_wsgi-localhost:8000:502/httpd.conf
Error Log File : /tmp/mod_wsgi-localhost:8000:502/error_log (warn)
Request Capacity : 5 (1 process * 5 threads)
Request Timeout : 60 (seconds)
Queue Backlog : 100 (connections)
Queue Timeout : 45 (seconds)
Server Capacity : 20 (event/worker), 20 (prefork)
Server Backlog : 500 (connections)
Locale Setting : en_AU.UTF-8

We could then access our web site created by Lektor at the URL ‘http://localhost:8000/’.

Even though this appears so simple, it is actually running a complete instance of Apache. It is this easy because ‘mod_wsgi-express’ does all the hard work of automatically generating the Apache configuration files to use for this specific site based only on the command line arguments provided. The configuration files for this instance are all generated totally independent of any existing Apache configuration you may have for the main Apache instance on your machine and so will not interfere with it.

An S2I builder for Lektor

In order to now create our S2I builder for Lektor, we are going to build on my existing S2I builder base image for Python web applications. I don’t believe I have specifically blogged about my S2I builder for Python before, although I have mentioned before some of the work I have been doing on Docker base images for Python web applications.

The existing Docker base image for Python web applications is on the Docker Hub registry as ‘grahamdumpleton/mod-wsgi-docker’. As to the S2I builder support I have been working on, this has been rolled into that same image, although if wishing to use it as an S2I builder you will need to instead use ‘grahamdumpleton/mod-wsgi-docker-s2i’. This latter image is pretty minimal and just sets the exposed ‘PORT’ and ‘USER’. 

# grahamdumpleton/mod-wsgi-docker-s2i:python-2.7

FROM grahamdumpleton/mod-wsgi-docker:python-2.7
USER 1001
EXPOSE 80
CMD [ "/usr/local/s2i/bin/usage" ]

For our Lektor S2I builder image, what we are now going to use is the following ‘Dockerfile’.

# grahamdumpleton/s2i-lektor:1.1

FROM grahamdumpleton/mod-wsgi-docker-s2i:python-2.7
RUN pip install Lektor==1.1
COPY .whiskey /app/.whiskey/

This ‘Dockerfile’ only does two additional things on top of the underlying S2I builder for Python. The first is to install Lektor and the second is to copy in some extra files into the Docker image. Those extra files are:

.whiskey/server_args
.whiskey/action_hooks/build

What you will note is that we aren’t actually adding any ‘assemble’ or ‘run’ scripts as we have talked about. This is because these already exist in the base image and already do everything we need to prepare the image and then start up a web server for us.

Unlike the OpenShift S2I Python builder, the ‘assemble’ and ‘run’ scripts here are designed to allow application specific hooks to be supplied to perform additional steps at the time of building the image or deploying the application. This is what the two files we copied into the image are for.

Of these, the ‘.whiskey/action_hooks/build’ file is a shell script which is invoked by the ‘assemble’ script during the build of the Docker image. What it contains is:

#!/usr/bin/env bash

lektor build --output-path /data

This will be run by the ‘assemble’ script in the same directory as the source files that were copied into the image from either the local source directory or the remote Git repository.

This script therefore is what is going to trigger Lektor to generate the static files for our site. The files will be generated into the ‘/data’ directory.

The second file called ‘.whiskey/server_args’ contains:

--application-type static --document-root /data

With the way that the base image is set up, and the ‘run’ script invoked when the Docker image is started, it will by default automatically run up ‘mod_wsgi-express’. It will do this with a number of default options which are required when running ‘mod_wsgi-express’ in a Docker container, such as directing logging to the terminal so that Docker can capture it. What the ‘server_args’ file does is allow us to supply any additional options to ‘mod_wsgi-express’. In this case we are giving it options to specify that it is to host static files, with no primary Python WSGI application being present, where the static files are located in the ‘/data’ directory.

And that is all there is to it. Because the base image is already doing lots of magic, we only had to provide the absolute minimum necessary, taking advantage of the fact that the base image is already employing all necessary best practices and smarts to make things work.

For the complete source code for this S2I builder image for Lektor you can see:

  • https://github.com/GrahamDumpleton/s2i-lektor

A Docker image corresponding to Lektor 1.1 is also already up on the Docker Hub registry as ‘grahamdumpleton/s2i-lektor:1.1’. As such, we can now run ‘s2i’ as:

$ s2i build https://github.com/GrahamDumpleton/lektor-empty-site.git grahamdumpleton/s2i-lektor:1.1 my-lektor-site
---> Installing application source
---> Building application from source
-----> Running .whiskey/action_hooks/build
Started build
U index.html
U about/index.html
U projects/index.html
U blog/index.html
U static/style.css
U blog/first-post/index.html
Finished build in 0.08 sec
Started prune
Finished prune in 0.00 sec
$ docker run --rm -p 8080:80 my-lektor-site
---> Executing the start up script
[Sun Jan 17 12:28:03.698888 2016] [mpm_event:notice] [pid 17:tid 140541365122816] AH00489: Apache/2.4.18 (Unix) mod_wsgi/4.4.21 Python/2.7.11 configured -- resuming normal operations
[Sun Jan 17 12:28:03.699328 2016] [core:notice] [pid 17:tid 140541365122816] AH00094: Command line: 'httpd (mod_wsgi-express) -f /tmp/mod_wsgi-localhost:80:1001/httpd.conf -E /dev/stderr -D MOD_WSGI_STATIC_ONLY -D MOD_WSGI_MPM_ENABLE_EVENT_MODULE -D MOD_WSGI_MPM_EXISTS_EVENT_MODULE -D MOD_WSGI_MPM_EXISTS_WORKER_MODULE -D MOD_WSGI_MPM_EXISTS_PREFORK_MODULE -D FOREGROUND'

Testing our site with ‘curl’ we get:

$ curl $(docker-machine ip default):8080
<!doctype html>
<meta charset="utf-8">
<link rel="stylesheet" href="./static/style.css">
<title>Welcome to Empty Site! — Empty Site</title>
<body>
<header>
<h1>Empty Site</h1>
<nav>
<ul class="nav navbar-nav">
<li class="active"><a href="./">Welcome</a></li>
<li><a href="./blog/">Blog</a></li>
<li><a href="./projects/">Projects</a></li>
<li><a href="./about/">About</a></li>
</ul>
</nav>
</header>
<div class="page">
<h2>Welcome to Empty Site!</h2>
<p>This is a basic demo website that shows how to use Lektor for a basic
website with some pages and a blog.</p>

</div>
<footer>
&copy; Copyright 2016 by Graham Dumpleton.
</footer>
</body>

Integration with OpenShift

As seen, the S2I system gives us a really easy way to produce a Docker image, not only for your own custom Python web application where you provide the source code, but also scenarios where you might be simply using existing data with an existing application. We did something like the latter with Lektor, although we actually also generated the required data to be hosted by the web server as part of the build process.

When running the ‘s2i’ program we were also able to use source files in a local directory, or from a remote Git repository. Even so, this still only gives us a Docker image and we would need to host that somewhere.

For most Docker based deployment systems, this would entail needing to push your Docker image from your own system, or a CI/CD system, to a Docker registry. The hosting service would then need to pull that image from the Docker registry in order to deploy it as a live web application.
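With a plain Docker setup, that push step would look something like the following, where the registry host name is obviously just a placeholder:

$ docker tag my-lektor-site registry.example.com/graham/my-lektor-site
$ docker push registry.example.com/graham/my-lektor-site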

If however you are using the latest OpenShift, things are even simpler. This is because OpenShift integrates support for S2I.

Under OpenShift, all I need to do to deploy my Lektor based blog site is:

$ oc new-app grahamdumpleton/s2i-lektor:1.1~https://github.com/GrahamDumpleton/lektor-empty-site.git --name blog
--> Found Docker image a95cedc (17 hours old) from Docker Hub for "grahamdumpleton/s2i-lektor:1.1"
* An image stream will be created as "s2i-lektor:1.1" that will track this image
* A source build using source code from https://github.com/GrahamDumpleton/lektor-empty-site.git will be created
* The resulting image will be pushed to image stream "blog:latest"
* Every time "s2i-lektor:1.1" changes a new build will be triggered
* This image will be deployed in deployment config "blog"
* Port 80/tcp will be load balanced by service "blog"
--> Creating resources with label app=blog ...
ImageStream "s2i-lektor" created
ImageStream "blog" created
BuildConfig "blog" created
DeploymentConfig "blog" created
Service "blog" created
--> Success
Build scheduled for "blog" - use the logs command to track its progress.
Run 'oc status' to view your app.

$ oc expose service blog
route "blog" exposed

I can then access the blog site at the host name which OpenShift has assigned it. If I have my own host name, then I just need to edit the route which was created to make the blog site public to add in my own host name instead.
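Editing the route to add a custom host name can be done with:

$ oc edit route blog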

In this case I needed to use the OpenShift command line tool to create my blog site, but we can also load up a definition into our OpenShift project which will allow us to build our blog site direct from the OpenShift UI.

This definition is provided as part of the ‘s2i-lektor’ project on GitHub and so to load it we just run:

$ oc create -f https://raw.githubusercontent.com/GrahamDumpleton/s2i-lektor/master/lektor.json
imagestream "lektor" created

If we now go to the OpenShift UI for our project we have the option of adding a Lektor based site.

[Image: OpenShift add to project - Lektor]

Clicking through on the ‘lektor:1.1’ entry we can now fill out the details for the label to be given to our site and the location of the Git repository which contains the source files.

[Image: OpenShift Lektor parameters]

Upon clicking on ‘Create’ it will then go off and build our Lektor site, including making it publicly accessible.

[Image: OpenShift Lektor service]

By default only a single instance of our site will be created, but if it were an extremely popular site, then to handle all the traffic we would just increase the number of pods (instances) running. When a web application is scaled in this way, OpenShift will automatically handle all the load balancing of traffic across the multiple instances. We do not need to worry ourselves about needing to set up any front end router or deal with registration of the back end instances with the router.
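Scaling up can be done through the UI, or from the command line with something like the following, where ‘blog’ is the name of the deployment config created earlier:

$ oc scale dc blog --replicas=3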

When it comes to making changes to our site and redeploying it we have a few options.

[Image: OpenShift Lektor build]

We could manually trigger a rebuild of the site through the UI or the command line after we have pushed up our changes to GitHub, or we could instead link the application in OpenShift with our GitHub repository. To do the latter we would configure a web hook into our repository on GitHub. What will happen then is that every time a change is made and pushed up to the Git repository, the application on OpenShift will be automatically rebuilt and redeployed for us.
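If going the web hook route, the URLs to give to GitHub can be found by inspecting the build configuration, for example with:

$ oc describe bc blog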

We have now achieved the goal I was after and have a complete workflow in place. All that I would have to worry about is updating the content of the blog site and pushing up the changes to my Git repository when I am happy for them to be published.

Trying out OpenShift yourself

Although I showed a full end to end workflow combining Docker, S2I and OpenShift, if you aren’t interested in the OpenShift part you can definitely still use S2I with a basic Docker service. You would just need to incorporate it into an existing CI/CD pipeline.

If you are interested in the new OpenShift based on Docker and Kubernetes and want to experiment with it, then you have a few options. These are:

  • OpenShift Origin - This is the Open Source upstream project for the OpenShift products by Red Hat.
  • AWS Test Drive - This is an instance of OpenShift Enterprise which you can spin up and try on Amazon Web Services.
  • All In One VM - This is a self contained VM which you can spin up with VirtualBox on your own machine.

If you do decide to try OpenShift and my Lektor S2I builder do let me know. I also have an S2I builder for creating IPython notebook server instances as well. The IPython S2I builder can pull your notebooks and any files it needs from a Git repository just like how the Lektor S2I builder does for a Lektor site. It is also possible with the IPython images to spin up a backend IPython cluster with as many engines as you need if wishing to play around with parallel computing with ‘ipyparallel’.

Unfortunately right now the existing OpenShift Online PaaS offering from Red Hat is still the older version of OpenShift and so is not based around Docker and Kubernetes. Hopefully it will not be too much longer before a version of OpenShift Online using Docker and Kubernetes is available. That should make it a lot easier to experiment with the features of the new OpenShift and see how easy it can be to get a web site hosted, like the Lektor example shown here.