Tuesday, August 11, 2015

Running ASYNC web applications under OpenShift.

In a previous post I explained how one could run custom WSGI servers under OpenShift. This can get a little tricky, as by default the OpenShift Python cartridges are set up to use Apache and mod_wsgi to host your WSGI application. In order to be able to run an alternate WSGI server such as gunicorn, uWSGI or Waitress, you need to provide a special ‘app.py’ file which exec’s a shell script, which in turn executes the target WSGI server.

This level of indirection was needed to allow the alternate WSGI server to be started up, but at the same time use the same process name as the original process which the OpenShift platform created. If this wasn’t done, OpenShift wouldn’t be able to correctly identify the WSGI server process, would think it may not have started up correctly, and would force the gear into an error state.
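
Purely as a rough sketch of that indirection (the script name ‘app.sh’ is an illustrative assumption here, and the previous post covers the exact incantation needed to preserve the process name), the ‘app.py’ file might do little more than:

import os

# Replace the current process with the shell script, which in turn starts
# the alternate WSGI server.
SCRIPT = os.path.join(os.environ['OPENSHIFT_REPO_DIR'], 'app.sh')

os.execl('/bin/bash', 'app.py', SCRIPT)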

Although the ‘app.py’ file was used there to allow us to run an alternate WSGI server, what wasn’t fully explained in that previous post was how the ‘app.py’ file would normally be used to directly run a web server embedded inside the Python web application itself.

There are actually two options here as to what one could do. The first is to use a WSGI server which can itself be run in an embedded way, instead of running a standalone WSGI server. The second option is not to use a WSGI server or framework at all, and instead use an ASYNC framework, such as the Tornado web server and framework.

The purpose of this blog post is to discuss that second option, of running an ASYNC web application implemented using the Tornado web server and framework on OpenShift. Although the new OpenShift 3 using Docker and Kubernetes was recently officially released for Enterprise customers, this post will focus on the existing OpenShift 2 and so is applicable to the current OpenShift Online offering.

Embedded web server

First up, let’s explain a bit better what is meant by an embedded web server.

When we talk about WSGI servers, the more typical thing to do is to use a standalone WSGI server and point it at a WSGI script file or application module. It is then the WSGI server’s job to load the WSGI application, handle web requests and forward the requests on to the WSGI application whose entry point was given in the WSGI script file or application module.

If for example one was using ‘mod_wsgi-express’ and had a WSGI script file, one would simply run:

mod_wsgi-express start-server /some/path/hello.wsgi

The contents of the ‘hello.wsgi’ file might then be:

def application(environ, start_response):
    status = '200 OK'
    output = 'Hello World!'

    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]

    start_response(status, response_headers)

    return [output]

Key here is that the only thing in this file is the WSGI application; there is nothing about any specific WSGI server. As such, you must use a separate WSGI server to host this WSGI application.

The alternative is to embed the WSGI server in the Python web application code file itself. Thus you might instead have a file called ‘app.py’ which contains:

def application(environ, start_response):
    status = '200 OK'
    output = 'Hello World!'

    response_headers = [('Content-type', 'text/plain'),
                        ('Content-Length', str(len(output)))]

    start_response(status, response_headers)

    return [output]

if __name__ == '__main__':
    from wsgiref.simple_server import make_server
    httpd = make_server('', 8000, application)
    httpd.serve_forever()

When this ‘app.py’ is run as:

python app.py

it will start up the WSGI server contained within the ‘wsgiref’ module of the Python standard library. In other words, the Python application script file is self-contained.

One warning that should be made here is that most WSGI servers which allow for embedding in this way are quite simplistic. Be very careful about using them and check their capabilities to ensure that they are indeed capable of being used in a production setting.

One thing that they generally do not support is a multi process configuration. That is, they only run with the one web application process. This can be an issue for CPU bound web applications as the runtime characteristics of the Python global interpreter lock will limit how much work can be done within the one process. The only solution to that is to have a web server that uses multiple processes to handle requests.
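
By way of contrast, with ‘mod_wsgi-express’ a multi process configuration is only a command line option away (the process count here is arbitrary):

mod_wsgi-express start-server --processes 3 /some/path/hello.wsgi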

Also be careful that any embeddable WSGI server isn’t just single threaded, as this means that only one request can be handled at a time, limiting the amount of traffic it can support. This is especially the case when web requests aren’t all handled quickly, as a single long running request can start causing backlogging, delaying all subsequent requests.

ASYNC web servers

One solution for long running requests, or at least those which are principally I/O bound, is to use an ASYNC web server.

In looking at ASYNC as an alternative, just be aware that ASYNC web servers have a completely different API for implementing Python web applications than WSGI does. The API for WSGI applications relies on a blocking process/thread model for handling web requests. It is not readily possible to marry up a WSGI application with an ASYNC server such that the WSGI application can benefit from the characteristics that an ASYNC web server and framework brings.

This means that you would have to convert your existing WSGI application to be ASYNC. More specifically, you would need to convert it to the API of the ASYNC web framework you choose to use. This is because there is no standardised API for ASYNC as there is with the WSGI specification for synchronous or blocking web servers.

Before going down that path, also consider whether you really need to convert completely over to ASYNC. Writing and maintaining ASYNC web applications can be a lot more work than using WSGI. You might therefore consider separating out just certain parts of an existing WSGI application and converting them to ASYNC. Only do it, though, where it really makes sense. ASYNC is not a magic solution to all problems.

That all said, one of the most popular ASYNC web frameworks in the Python world is Tornado.

A simple ASYNC Tornado web application would be written as:

import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello World!")

application = tornado.web.Application([
    (r"/", MainHandler),
])

if __name__ == "__main__":
    application.listen(8000)
    tornado.ioloop.IOLoop.current().start()

In this simple example you aren’t going to see much, if any, benefit over a WSGI application and server, as it isn’t doing any real work.

The benefits of an ASYNC framework come into play when the web application needs to make calls out to backend services such as databases and other web applications. Provided that the clients for such backend services are ASYNC aware and integrate with the ASYNC framework event loop, then rather than blocking, the web application can give up control and allow other requests to be handled while it is waiting.

The different web request handlers therefore effectively cooperate, explicitly yielding up control at points where they would otherwise block. This is done without the use of multithreading, so one isn’t encumbered with the overheads of running additional threads.
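
As a rough illustration (the backend URL here is just a placeholder), a Tornado request handler which calls out to another web service without blocking the whole process might look something like:

import tornado.gen
import tornado.httpclient
import tornado.web

class BackendHandler(tornado.web.RequestHandler):
    @tornado.gen.coroutine
    def get(self):
        client = tornado.httpclient.AsyncHTTPClient()
        # While waiting on the backend response, control is yielded back
        # to the event loop so other requests can be handled.
        response = yield client.fetch("http://backend.example.com/api")
        self.write(response.body)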

The theory is that with fewer resources being used, it is then possible to handle a much higher number of concurrent requests than might be possible with a multithreaded server.

You do have to be careful though, as this will break down where code run by a request handler does block or where you run CPU intensive tasks. ASYNC frameworks are therefore not the answer for everything and you must be very careful in how you implement ASYNC web applications.

Running on OpenShift

In order to run this Tornado application on OpenShift, you will need to make some changes. This is necessary because the OpenShift platform dictates what IP address and port the web server should be listening on. You cannot use port 80 and should avoid arbitrarily selecting a port.

If you do not use the port that OpenShift allocates for you, then your web traffic will not be routed to your web application and your web application may not even start up properly.

Although your own web application will not be running on port 80, OpenShift will still forward the HTTP requests received on either port 80 or port 443 for your externally visible host name, through to your web application. So do the right thing and all should work okay.

As to the IP address and port to use, these will be passed to your web application via the environment variables:

  • OPENSHIFT_PYTHON_PORT
  • OPENSHIFT_PYTHON_IP

The modified Tornado web application which you add to the ‘app.py’ file you push up to OpenShift would therefore be:

import os

import tornado.ioloop
import tornado.web

class MainHandler(tornado.web.RequestHandler):
    def get(self):
        self.write("Hello World!")

application = tornado.web.Application([
    (r"/", MainHandler),
])

port = int(os.environ.get('OPENSHIFT_PYTHON_PORT', '8000'))
ip = os.environ.get('OPENSHIFT_PYTHON_IP', 'localhost')

if __name__ == "__main__":
    application.listen(port, ip)
    tornado.ioloop.IOLoop.current().start()

So that the Tornado web framework code is available, you also need to add ‘tornado’ to the ‘requirements.txt’ file. For the full details of creating a Python web application on OpenShift, I defer to the online documentation.
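
A minimal ‘requirements.txt’ for this example need contain only the one entry, although in practice you may want to pin a specific version:

tornado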

In the remainder of this post I want to start a discussion about a few things that you need to be careful about when using Tornado on OpenShift. Some of these issues apply to Tornado in general, but have some added relevance when running on a PaaS where the amount of memory may be less than when running on your own hardware.

Limiting content buffering

It doesn’t matter what PaaS provider you use, the typical base offering provides only 512MB of memory. This may be greater where you have specifically paid for extra memory, or where you are using a larger instance size.

Normally memory only comes into play when considering how much memory your own web application consumes. This is especially significant where using a web server capable of running with multiple processes, as each process can use up to the nominal maximum amount of memory your web application needs. How much memory your web application uses therefore dictates how many separate processes you can run in a multi process configuration before you exhaust system memory. For example, if each process settles at around 100MB, a 512MB gear could realistically only sustain four or five processes, and fewer once other overheads are accounted for.

When using the Tornado web server, though, there is a hidden trap which I don’t believe many would appreciate exists, let alone have done something about.

The issue in this case is that when the Tornado web server receives a request which contains request content, it will by default read all that request content into memory before even passing the request to the handler for that request. This is noted in the Tornado documentation by the statement:

By default uploaded files are fully buffered in memory; if you need to handle files that are too large to comfortably keep in memory see the stream_request_body class decorator.

It is a little detail, but its potential impact is significant. Because Tornado can in theory process so many requests at the same time, each of those concurrent requests can be buffering up to 100MB simultaneously, blowing out memory usage. In fact, I am not even sure whether there is a hard limit.

Even if there is a hard limit, it is likely set quite high. Such a limit isn’t generally going to help, as Tornado will only reject a request automatically when the declared request content length is greater than 100MB. If chunked request content is being sent, it can’t even do that, as the amount of request content isn’t known in advance; it still has to read and buffer the content to work out the size and whether the limit has been reached.

With such a high default limit, and no effective limit that can be applied up front for chunked request content, it is relatively easy, as a result of the buffering, to cause a Tornado web application to use huge amounts of memory. This doesn’t even need to be the result of a concerted denial of service attack by a malicious actor. If you are using Tornado to handle large file uploads and need to deal with slow clients such as mobile devices, many concurrent requests could quite easily cause a Tornado web application to use a lot of memory just during the phase of initially reading the requests in.

This is all in contrast to WSGI applications, where no request content is read until the WSGI application itself decides to read it in, allowing the application to decide how the content is handled. This is possible with WSGI because of the blocking model and the use of processes/threads for concurrency. Things get much harder in ASYNC systems.

Tornado 4.0+ does now offer a solution to avoid these problems, but it is not the default and is an opt-in mechanism for which you have to add specific code to each request handler in your web application.

With this newer mechanism, rather than the request content being buffered up and passed to your handler complete as part of the request, it is passed to your handler via a ‘data_received()’ method as the data arrives. This will be the raw data though, which means that if handling a form post or file upload, you will need to parse and decode the raw data yourself.

Anyway, the point of raising these issues is to highlight the need to pay close attention to the Tornado web server configuration and how request content is handled. The default configuration and way things are handled is, as I understand it, susceptible to memory issues, not just through the normal operation of your web application but also through deliberate attacks. In the memory constrained environment of a PaaS, the last thing you want is to run out of memory.

What the overall best practices are for handling this in Tornado I don’t know, and I welcome anyone pointing out resources which clearly explain how best to design a Tornado web application to avoid problems with large request content.

From my limited knowledge of using Tornado, I would at least suggest looking at the following:

  • If you are not handling large uploads, set the ‘max_buffer_size’ value to be something a lot smaller than 100MB. It needs to be just enough to handle any encoded form POST data or other file uploads you need to handle.
  • Look at the new request content streaming API in Tornado 4.0+ and consider implementing a common ‘data_received()’ method for all your handlers such that more restrictive per handler limits can be placed on the ‘max_buffer_size’. This could be handled by way of a decorator on the method. With this set the limit to ‘0’ for all handlers which would never receive any request content. Even if handling a form post, set a limit commensurate with what you expect. You would though also need new common code for parsing form post data received via ‘data_received()’.
  • For large file uploads, use ‘data_received()’ to process the data straight away, or save the data straight out to a temporary file rather than buffering it up in memory, until you are ready to process it (see the sketch after this list).
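
As a rough sketch only (the handler, URL path and use of a temporary file here are illustrative assumptions), combining a smaller server level ‘max_buffer_size’ with the Tornado 4.0+ ‘stream_request_body’ decorator might look something like:

import os
import tempfile

import tornado.httpserver
import tornado.ioloop
import tornado.web

@tornado.web.stream_request_body
class UploadHandler(tornado.web.RequestHandler):
    def prepare(self):
        # Write the request content out to a temporary file as it arrives,
        # rather than letting it accumulate in memory.
        self.upload = tempfile.NamedTemporaryFile()

    def data_received(self, chunk):
        self.upload.write(chunk)

    def post(self):
        self.upload.flush()
        self.write("Received %d bytes." % self.upload.tell())
        self.upload.close()

application = tornado.web.Application([
    (r"/upload", UploadHandler),
])

port = int(os.environ.get('OPENSHIFT_PYTHON_PORT', '8000'))
ip = os.environ.get('OPENSHIFT_PYTHON_IP', 'localhost')

if __name__ == "__main__":
    # Limit buffered request content to 1MB rather than the 100MB default.
    server = tornado.httpserver.HTTPServer(application,
                                           max_buffer_size=1024*1024)
    server.listen(port, ip)
    tornado.ioloop.IOLoop.current().start()
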
Although I have looked specifically at request content, as that presents the more serious problem due to possibly being used as an attack vector, also be mindful of the degree to which Tornado may buffer up response content when it cannot be written out to a client in a timely manner. It has been a while since I looked at that side of Tornado, so I can’t remember exactly how it works.
 
In closing on this issue, it needs to be stressed that this isn’t an OpenShift specific issue. It can happen in any environment. It is raised in relation to OpenShift because PaaS offerings generally have less memory available per instance for your web application to use.
 
For those who understand the inner workings of Tornado better than I, which wouldn’t be hard, if I have misrepresented anything about how Tornado works then please let me know, providing an explanation of what does happen.

Automatic scaling of gears

Another issue which needs some attention when using ASYNC applications on OpenShift is how automatic scaling for gears works.

The issue here is that a main selling point of ASYNC web applications is that they can handle a much larger number of concurrent requests. Because of how automatic scaling works though, the fixed thresholds on when scaling occurs may result in some surprises if you enable auto scaling in OpenShift for an ASYNC web application handling a large number of concurrent requests.

As this is a more complicated issue I will look at that in a subsequent post.
