Tuesday, May 26, 2015

Performance monitoring of real WSGI application traffic.

Attempting to monitor WSGI application response times and other details of a web request using 'print' statements, as described in the last few posts, may be okay when playing around in a development environment to understand how a WSGI server works, but it is totally inadequate when you want to monitor traffic against a real web site. In a production setting, or even just a testing or QA environment, we need to get the raw information we are collecting for all the WSGI application traffic out into a separate system of some sort where it can be analysed and visualised.

A common set of Open Source tools often used for collecting and visualising such time series data is statsd and Graphite. In this blog post I am going to show how one can hook up the WSGI middleware that was previously constructed for monitoring WSGI application requests to a statsd instance. Being a lazy sort of person though, and wishing to avoid the pain of needing to set up statsd and Graphite myself, I am actually going to use the Datadog monitoring service. Datadog has a free tier which provides the main features we will need to illustrate the approach.

If you have previously used a different Python client for statsd, you will find that the Datadog API for statsd may be somewhat different from what you are used to, so I would suggest you check the Datadog documentation for any differences. In some later posts I will be using features which I believe are specific to Datadog and which make analysing the metric data easier. You may therefore need to work out how to translate that into the functionality provided by whatever direct statsd client you use.

Recording the response time

Going back and adapting the original WSGI middleware for recording the response time to use the statsd interface of Datadog we end up with:

from wrapt import decorator, ObjectProxy
from timeit import default_timer

from datadog import statsd

def record_request_metrics(application_time):
    statsd.increment('wsgi.application.requests')

    statsd.timing('wsgi.application.requests.application_time',
            1000.0 * application_time)

class WSGIApplicationIterable5(ObjectProxy):

    def __init__(self, wrapped, start):
        super(WSGIApplicationIterable5, self).__init__(wrapped)
        self._self_start = start

    def close(self):
        if hasattr(self.__wrapped__, 'close'):
            self.__wrapped__.close()

        duration = default_timer() - self._self_start

        record_request_metrics(duration)

@decorator
def timed_wsgi_application5(wrapped, instance, args, kwargs):
    start = default_timer()

    try:
        return WSGIApplicationIterable5(wrapped(*args, **kwargs), start)
    except:
        duration = default_timer() - start
        record_request_metrics(duration)
        raise

This wrapper is recording two separate metrics.

The first is 'wsgi.application.requests' and is a counter recorded using 'statsd.increment()'. It tracks the number of requests which have been made.

The second is 'wsgi.application.requests.application_time' and is used to track how long it took for the request to be handled by the WSGI application using 'statsd.timing()'. Note that statsd expects the time value in this case to be supplied as milliseconds.
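As mentioned earlier, if you are using a direct statsd client rather than Datadog's, the equivalent calls look much the same. The following is a sketch only, assuming the 'statsd' package from PyPI and a local statsd daemon on the default port; other clients may name the methods differently.

import statsd

# Plain statsd client; the host and port here are assumptions for a
# locally running statsd daemon.
client = statsd.StatsClient('localhost', 8125)

def record_request_metrics(application_time):
    # Counter tracking the number of requests handled.
    client.incr('wsgi.application.requests')

    # Timer recording how long the WSGI application took, in milliseconds.
    client.timing('wsgi.application.requests.application_time',
            1000.0 * application_time)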

Although statsd doesn't enforce any specific naming structure beyond what characters you can use in a name, the name you choose is quite important. Specifically, because all the metrics reported into statsd end up in the same metric database, you should use a naming hierarchy to clearly indicate the meaning of a metric. More descriptive names modelled as part of a naming hierarchy are better.

In this case I have used a top level name of 'wsgi' as a way of indicating that I am monitoring a WSGI server. The next level of 'application' is used to indicate that the metric is then something specific to a WSGI application, as opposed for example to something related to the WSGI server process it is running in, such as memory usage.

Finally, the 'requests' level of the naming hierarchy indicates we are dealing with metrics related to web requests handled by the WSGI application. I use this same name to also act as the counter for the number of requests made. Technically I could retrieve the same value from the counts associated with the tracking of the application time, but sometimes it makes things clearer to break out a counter separately like this.

Visualisation inside Datadog

Jumping into Datadog, we can now construct a visualisation from these metrics showing the throughput and response time.

[Image: Datadog throughput and response time charts]

For the top chart displaying the throughput I have charted the counter recorded as 'wsgi.application.requests'.

[Image: Datadog throughput chart configuration]

Being a counter, this can be displayed as the actual count, but instead I have chosen to display it as a rate showing the number of requests/sec. Doing this avoids a lot of confusion, as the result will be the same no matter what time period you have selected in the UI. If you choose to display it as a count, the number shown can vary depending on the time period displayed, due to how data buckets are merged together and values counted when displaying a larger time period.

For the bottom chart displaying the response time you can see there are actually three different lines on the chart. They correspond to the median, average and 95th percentile. That we have these sub categories available for this metric is a by-product of having used 'statsd.timing()' to record the metric.

[Image: Datadog response time chart configuration]

We chart more than one value for the response time because relying on the average alone can be misleading. This is because the average can be skewed by very large one-off response times which are outside the norm. By also displaying the 95th percentile, one gets a better indication of what time the majority of web requests come in under, with those outliers excluded.
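To see why the average alone can mislead, consider a handful of hypothetical response times where a single slow outlier drags the average well above what most requests actually experienced:

# Hypothetical response times in milliseconds for six requests.
times = [12.0, 13.0, 11.0, 12.5, 14.0, 950.0]

average = sum(times) / len(times)        # ~168.8 ms, dominated by the outlier
median = sorted(times)[len(times) // 2]  # 13.0 ms, much closer to a typical request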

When you have more than one line on the chart, you can hover over the chart and tooltip-like overlays will appear explaining what each line represents. It is also possible to expand the view of the chart itself to get a better view, along with a legend.

[Image: Datadog throughput chart expanded view]

Response content metrics

As well as the response time, we can also record metrics for the details of the response content being returned as was previously explored. Adding those in we now end up with:

from wrapt import decorator, ObjectProxy
from timeit import default_timer
 
from datadog import statsd
 
def record_request_metrics(application_time, output_time=None,
        output_blocks=None, output_length=None):
 
    statsd.increment('wsgi.application.requests')
 
    statsd.timing('wsgi.application.requests.application_time',
            1000.0 * application_time)
 
    if output_time is not None:
        statsd.timing('wsgi.application.requests.output_time',
                1000.0 * output_time)
 
    if output_blocks is not None:
        statsd.histogram('wsgi.application.requests.output_blocks',
                output_blocks)
 
    if output_length is not None:
        statsd.histogram('wsgi.application.requests.output_length',
                output_length)
 
class WSGIApplicationIterable5(ObjectProxy):
 
    def __init__(self, wrapped, start):
        super(WSGIApplicationIterable5, self).__init__(wrapped)
        self._self_start = start
        self._self_time = 0.0
        self._self_count = 0
        self._self_bytes = 0
 
    def __iter__(self):
        time = 0.0
        start = 0.0
        count = 0
        bytes = 0
 
        try:
            for data in self.__wrapped__:
                start = default_timer()
                yield data
                finish = default_timer()
                if finish > start:
                    time += (finish - start)
                start = 0.0
                count += 1
                bytes += len(data)
        finally:
            if start:
                finish = default_timer()
                if finish > start:
                    time += (finish - start)
 
            self._self_time = time
            self._self_count = count
            self._self_bytes = bytes
 
    def close(self):
        if hasattr(self.__wrapped__, 'close'):
            self.__wrapped__.close()
 
        duration = default_timer() - self._self_start
 
        record_request_metrics(duration, output_time=self._self_time,
                output_blocks=self._self_count,
                output_length=self._self_bytes)
 
@decorator
def timed_wsgi_application5(wrapped, instance, args, kwargs):
    start = default_timer()
 
    try:
        return WSGIApplicationIterable5(wrapped(*args, **kwargs), start)
 
    except:
        duration = default_timer() - start
 
        record_request_metrics(duration)
 
        raise

In the case of 'wsgi.application.requests.output_time', as it is a time value we have used 'statsd.timing()' once again. For the value tracking the number of bytes and number of blocks, we could have used 'statsd.gauge()', but this would have just given us the average. Instead we use 'statsd.histogram()', which like 'statsd.timing()' provides us with the ability to display the median and 95th percentile as well as the average.

[Image: Datadog response content charts]

In a situation where the amount of response content generated can in practice vary between requests, this may give us additional detail about the variance in what is being generated. In this test example we were generating the same amount of data as a single block every time, so each of these will actually end up being the same.

Problems of aggregation

Up till now we have been focusing on specific single test examples each time and so the expected data for each web request would be similar. In practice in a complex web site you will have many different handlers for different URLs and how long each takes to complete will vary, as will the amount of data being produced.

In such a real world application, although a global summary as shown in the charts above is still useful in its own way, the fact that all the data for the different handlers gets aggregated together hides the specifics of how the handlers for particular URLs are performing.

Unfortunately statsd alone doesn't necessarily provide a good solution for capturing both an aggregation of response time metrics across the whole WSGI application and a breakdown on a per handler basis. As I understand it, one would have to record additional metrics alongside the rollup metrics, with a different metric name for each handler.
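As a rough sketch of that approach, using a hypothetical record_handler_metrics() helper where the handler name is simply embedded into the metric name, it might look something like:

from datadog import statsd

def record_handler_metrics(handler_name, application_time):
    # Rollup metric aggregated across the whole WSGI application.
    statsd.timing('wsgi.application.requests.application_time',
            1000.0 * application_time)

    # Additional per handler metric. The handler name becomes part of the
    # metric name, so every distinct handler produces a distinct time series.
    statsd.timing('wsgi.application.requests.application_time.%s' % handler_name,
            1000.0 * application_time)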

If using the Datadog API for statsd there is a better way of handling both rollup metrics and more specific metrics which drill down into specific handlers. I will go into this in a subsequent post, but before I get to that, in the next post I first want to look at how our current WSGI middleware could be replaced by what some WSGI servers may provide themselves to support collection of metric information.

Saturday, May 23, 2015

Monitoring the response content from a WSGI application.

I have been focusing in the last couple of posts about the overheads of generating response content from a WSGI application as many separate blocks rather than as one large block. This has illustrated how there can be quite marked differences between different WSGI servers and even with a specific WSGI server depending on how it is configured or integrated into an overall architecture. 

In this post I want to look at how we could actually add monitoring which could track how much data is actually being returned in a response and how many blocks it was broken up into. We can also track how much time was being spent by the WSGI server trying to write that data back to the HTTP client.

Monitoring overall response time

Previously I presented a decorator which could be applied to a WSGI application to time the overall response time. That is, how long it took for the complete handling of the request, including the processing of the request content, as well as the generation of the response and the writing of it back to the HTTP client.

The code for the decorator was:

from __future__ import print_function
from wrapt import decorator, ObjectProxy
from timeit import default_timer

class WSGIApplicationIterable1(ObjectProxy):

    def __init__(self, wrapped, name, start):
        super(WSGIApplicationIterable1, self).__init__(wrapped)
        self._self_name = name
        self._self_start = start

    def close(self):
        if hasattr(self.__wrapped__, 'close'):
            self.__wrapped__.close()

        duration = default_timer() - self._self_start

        print('finish %s %.3fms' % (self._self_name, duration*1000.0))

@decorator
def timed_wsgi_application1(wrapped, instance, args, kwargs):
    name = wrapped.__name__
    start = default_timer()

    print('start', name)

    try:
        return WSGIApplicationIterable1(wrapped(*args, **kwargs), name, start)
    except:
        duration = default_timer() - start
        print('finish %s %.3fms' % (name, duration*1000.0))
        raise

Although described as a decorator, this is actually implementing a form of WSGI middleware. In this case the middleware was intercepting the act of calling the WSGI application to handle the request and the finalisation of the request through the calling of any 'close()' method of the returned iterable.

What we didn't intercept in any way was the request content or response content. So what we want to do now is extend this to intercept the response content as it is generated so that we can count how many bytes are being generated and as how many blocks.

Intercepting the response content

To understand what is now needed to intercept the response content we need to look at how a WSGI server actually interfaces with the WSGI application. This interaction can be seen in the sample CGI to WSGI bridge presented in the WSGI specification.

The key part of the sample code presented in the WSGI specification is as follows and has three parts.

    result = application(environ, start_response)
    try:
        for data in result:
            if data:
                write(data)
        if not headers_sent:
            write('')
    finally:
        if hasattr(result, 'close'):
            result.close()

The first part is the calling of the WSGI application. It is the result of this call which has been wrapped by our timing wrapper.

The result returned is then iterated over to yield each data block making up the response, with it being written back to the HTTP client.

Finally, the 'close()' method is then called on the iterable. The 'close()' method of the iterable is called whether or not the iterable was completely consumed. The iterable may not be completely consumed if while generating any response an exception occurred, or if when writing the response back to the HTTP client an error occurred, such as the connection being dropped.

In the middleware wrapper we are so far intercepting the original WSGI application call and the call to 'close()'.

In order to now intercept the response content we need to intercept the creation of the iterator object from the iterable when it is used in the 'for' loop.

To do that, we need to intercept the special '__iter__()' method of the iterable. As it stands right now, due to the wrapper for the iterable being implemented as a derived class of the 'ObjectProxy' class from the 'wrapt' package, the call of the '__iter__()' method is being automatically applied to the original wrapped iterable. In effect the object proxy is doing:

    def __iter__(self):
        return iter(self.__wrapped__)

We want to do more than simply process in some way the iterator object itself. Instead we want to process each yielded item. For this we need to iterate over the original wrapped iterable object ourselves, turning the '__iter__()' method into a generator function. This can be done as:

    def __iter__(self):
        for data in self.__wrapped__:
            yield data

The archaic write() callback

The job of intercepting the response content seems simple enough, but overriding '__iter__()' is unfortunately not enough if you want a completely correct WSGI middleware. This is because returning an iterable is not the only way that response content can be returned.

To support certain older style Python web applications that preceded the WSGI specification, another way of generating response content is supported. This is via a 'write()' callback function which is returned by the 'start_response()' function when called by the WSGI application.

When this 'write()' callback function is used to provide the response content, then the iterable returned by the WSGI application would normally be an empty list. The code above would therefore never actually see any response content.

I have blogged previously about this hidden 'write()' callback and although a correct WSGI middleware should support it, for the purposes of this analysis I am going to ignore it.

Ignoring the 'write()' callback function will not affect the operation of the wrapped WSGI application, but it means that if the 'write()' callback function were used, we wouldn't report anything about the response content generated that way.
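For completeness, a fully correct middleware would need to wrap 'start_response()' so that the 'write()' callback it returns can also be intercepted. The following is only a rough sketch of that part, with hypothetical names, and leaves out the wrapping of the returned iterable shown elsewhere in this post.

def counted_wsgi_application(wrapped_application):
    def _application(environ, start_response):
        metrics = {'blocks': 0, 'bytes': 0}

        def _start_response(status, response_headers, *args):
            write = start_response(status, response_headers, *args)

            def _write(data):
                # Account for content sent via the legacy write() callback.
                metrics['blocks'] += 1
                metrics['bytes'] += len(data)
                return write(data)

            return _write

        return wrapped_application(environ, _start_response)

    return _application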

Overheads of adding monitoring

Now, before we get started on adding to the basic loop for intercepting the actual response content, it is important to highlight that monitoring isn't necessarily free.

In our original timing decorator for the WSGI application we introduced a number of additional function calls, however that number was fixed for each request handled. The additional overhead from these extra calls would be acceptable so long as what those calls themselves did wasn't excessive.

When we start talking about intercepting the response content so as to accumulate metric information about the data being generated for the response, we would be adding extra calls and processing for every distinct data block.

We have already seen how an excessive number of distinct data blocks in a worst case scenario can be a problem with some WSGI servers. We therefore have to be very careful: although the additional overhead of the monitoring code may not present a problem when there are a small number of data blocks, the extra work being performed could exacerbate that worst case scenario and severely affect the responsiveness of the monitored application at the very time the server is already stressed, which is exactly when you don't want that to occur.

Ideally the generation of metric information about a request would be embedded within the WSGI server itself so as to avoid the need to add in special wrappers which add to the number of intermediate calls being made. Unfortunately the majority of WSGI servers provide no such functionality. The next best alternative is to consider writing any such monitoring wrappers as a Python C extension to limit the potential overhead.

However monitoring is added, one key thing to try and always avoid is doing too much processing during the life of the request itself. Any extra processing that is done during the request will obviously add to the response time as seen by the user. Thus you should try and do the absolute minimum while the request is being handled and try to defer more complex processing based on any raw data collected, until after the request has completed.

By doing this the impact on the response time is at least reduced, albeit that it still potentially comes at a cost. This extra cost may be in extra memory usage due to needing to hold raw data for processing at the end of the request, but also in additional CPU usage from doing that processing.

Deferring complex processing until the end of the request will also still keep that request thread busy until it is complete, preventing it from being released to handle a subsequent request, be that a request on a totally distinct connection or a request on a keep alive connection. This delayed release of the request thread can therefore potentially cause a reduction in available server capacity.

Overall, monitoring is a bit of a balancing act. You can't avoid all overheads from monitoring and so when monitoring is first added you may see a reduction in performance as a result. The visibility one gets from adding monitoring though allows you to more easily identify potential issues and fix them. You therefore quite quickly get to a better point than where you were originally without monitoring and so the hit you take from the monitoring is worth it.

You could well at this point remove the monitoring and so eliminate the overheads once more, but you would then be running in the dark again and so have no idea when you introduce some bad code once more. Best to accept the initial overheads and just get on with improving your code from what you learn. Just be careful in designing the monitoring code to reduce the overheads as much as possible.

That all said, let's try and continue by calculating a baseline to gauge how much overhead any monitoring may be adding. For that we can first try to use our prior test example whereby we returned a large string as the iterable.

from timer1 import timed_wsgi_application1
@timed_wsgi_application1
def application(environ, start_response):
    status = '200 OK'
    output = 100000 * b'Hello World!'
    response_headers = [('Content-type', 'text/plain'),
            ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return output

With our original decorator which measures just the length of time spent in the WSGI application, when we use mod_wsgi-express this time we get:

start application
finish application 6269.062ms

If we override the '__iter__()' method to introduce the loop but not do any processing:

    def __iter__(self):
        for data in self.__wrapped__:
            yield data

we now get:

start application
finish application 6291.978ms

Not much difference, but a further big warning has to be made.

The problem with making minor changes like this and triggering one-off test runs is that it is actually really hard to truly gauge the impact of the change, especially in a case like this where we were already stressing the WSGI server quite severely, with CPU usage peaking up towards 100%.

This is because when a system is being stressed to the maximum, the variability between the results from one run to the next can be quite marked and so when there isn't much difference the results for the two variations on the code can actually overlap as the result values bounce around.

Although the normal approach to eliminating such variability is to run the benchmark test many many times and try and determine some sort of average, or at least eliminate outliers, this doesn't generally help in a situation where the WSGI server is being stressed to breaking point.

This is actually one of the big mistakes people make when trying to benchmark WSGI servers to try and work out which may be able to handle the greatest throughput. That is, they just flood the WSGI servers with requests such that they are overloaded, which only serves to generate results which are quite unpredictable.

When trying to evaluate WSGI server performance it is better to evaluate how it behaves at a level of about 50% capacity utilisation. This is a level whereby you aren't triggering pathological cases of performance degradation through overloading, yet still gives you some measure of head room for transient peaks in the request throughput.

If you are constantly pushing high levels of capacity utilisation in a WSGI server then you should really be looking at scaling out horizontally to ensure you have the spare capacity to meet demands as they arise, rather than running the WSGI server at breaking point.

Trying to gauge overheads at this point is therefore somewhat of a pointless exercise with the test we have, so let's just get on with implementing the code and revisit the issue of measuring overhead later.

Counting writes and bytes

The initial change we were therefore after was tracking how much data is actually being returned in a response and how many blocks it was broken up into. Overriding the '__iter__()' method we can do this using:

class WSGIApplicationIterable2(ObjectProxy):
    def __init__(self, wrapped, name, start):
        super(WSGIApplicationIterable2, self).__init__(wrapped)
        self._self_name = name
        self._self_start = start
        self._self_count = 0
        self._self_bytes = 0
    def __iter__(self):
        count = 0
        bytes = 0
        try:
            for data in self.__wrapped__:
                yield data
                count += 1
                bytes += len(data)
        finally:
            self._self_count = count
            self._self_bytes = bytes
    def close(self):
        if hasattr(self.__wrapped__, 'close'):
            self.__wrapped__.close()
        duration = default_timer() - self._self_start
        print('write %s blocks %s bytes' % (self._self_count, self._self_bytes))
        print('finish %s %.3fms' % (self._self_name, duration*1000.0))

Running this with our test example where a string was returned as the iterable rather than a list of strings, we get:

start application
write 1200000 blocks 1200000 bytes
finish application 5499.676ms

This is in contrast to what we would hope to see, which is as small a number of blocks as possible, or even just the one block if that was achievable without blowing out memory usage.

start application
write 1 blocks 1200000 bytes
finish application 4.867ms

What we are obviously hoping for here is a low number of blocks, or at least a high average number of bytes per block. If the average number of bytes per block is very low, then it would be worthy of further inspection.

Time taken to write output

Tracking the number of bytes written and the number of blocks can highlight potential issues, but it doesn't actually tell us how long was spent writing the data back to the HTTP client. Knowing the time taken will help us confirm whether the response being returned as a large number of blocks is an issue or not. To capture information about the amount of time taken we can use:

class WSGIApplicationIterable3(ObjectProxy):
    def __init__(self, wrapped, name, start):
        super(WSGIApplicationIterable3, self).__init__(wrapped)
        self._self_name = name
        self._self_start = start
        self._self_time = 0.0
        self._self_count = 0
        self._self_bytes = 0
    def __iter__(self):
        time = 0.0
        start = 0.0
        count = 0
        bytes = 0
        try:
            for data in self.__wrapped__:
                start = default_timer()
                yield data
                finish = default_timer() 
                if finish > start:
                    time += (finish - start)
                start = 0.0
                count += 1
                bytes += len(data)
        finally:
            if start:
                finish = default_timer()
                if finish > start:
                    time += (finish - start)
            self._self_time = time
            self._self_count = count
            self._self_bytes = bytes
    def close(self):
        if hasattr(self.__wrapped__, 'close'):
            self.__wrapped__.close()
        duration = default_timer() - self._self_start
        print('write %s blocks %s bytes %.3fms' % (self._self_count,
                self._self_bytes, self._self_time*1000.0))
        print('finish %s %.3fms' % (self._self_name, duration*1000.0))

In this version of the wrapper we time how long it took to yield each value from the loop. This has the effect of timing how long any outer layer, in this case the WSGI server, took to process and write the block of data back to the HTTP client.

Running our test example now we get:

start application
write 1200000 blocks 1200000 bytes 6654.018ms
finish application 7812.504ms

We can therefore see that writing out the actual data took a large proportion of the overall response time. In this case the actual test example wasn't itself doing much work so that would have been the case anyway, but the magnitude of the amount of time spent writing the response, in conjunction with the number of blocks, is the concern.

Do keep in mind though that even if you were writing as few blocks as possible, if a large amount of data was involved the WSGI server could itself block when writing to the socket if the HTTP client wasn't reading the data quickly enough.

The time taken can therefore be helpful in pointing out other issues as well and not just the cumulative impact of writing many small blocks of data.

The big question now is how you turn such information into actionable data.

Using print statements to output timing values is all well and good in a test scenario when playing around with WSGI servers, but it isn't of any use in a production environment. The next step therefore is to look at how one can capture such information in bulk in the context of a production environment and whether it still remains useful at that point.

Tuesday, May 19, 2015

Returning a string as the iterable from a WSGI application.

The possible performance consequences of returning many separate data blocks from a WSGI application were covered in the previous post. In that post the WSGI application used as an example was one which returned the contents of a file as many small blocks of data. Part of the performance problems seen arose from how the WSGI servers would flush each individual block of data out, writing it onto the socket connection back to the HTTP client. Flushing the data in this way for many small blocks, as opposed to as few larger blocks as possible, had a notable overhead, especially when writing back to an INET socket connection.

In this post I want to investigate an even more severe case of this problem that can occur. For that we need to go back and start over with the more typical way that most Python web frameworks return response data. That is, as a single large string.

def application(environ, start_response):
    status = '200 OK'
    output = 1024*1024*b'X'
    response_headers = [('Content-type', 'text/plain'),
            ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return [output]

In this example I have increased the length of the string to be returned so it is 1MB in size.

If we run this under mod_wsgi-express and use 'curl' then we get an adequately quick response as we would expect.

$ time curl -s -o /dev/null http://localhost:8000/
real 0m0.017s
user 0m0.006s
sys 0m0.005s

Response as an iterable

As is known, the response from a WSGI application needs to be an iterable of byte strings. The more typical scenarios are that a WSGI application would return either a list of byte strings, or it would be implemented as a generator function which would yield one or more byte strings.

That the WSGI server will accept any iterable does mean that people new to writing Python WSGI applications often make a simple mistake. That is, instead of returning a list of byte strings, they simply return the byte string itself.

def application(environ, start_response):
    status = '200 OK'
    output = 1024*1024*b'X'
    response_headers = [('Content-type', 'text/plain'),
            ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return output

Because a byte string is also an iterable then the code will still appear to run fine, returning the response expected back to the HTTP client. For a small byte string the time taken to display the page back in the browser would appear to be normal, but for a larger byte string it would become evident that something is wrong.

Running this test with mod_wsgi-express again, we can see the time taken balloon out from 17 milliseconds to over 3 seconds:

$ time curl -s -o /dev/null http://localhost:8000/
real 0m3.659s
user 0m0.294s
sys 0m1.544s

Use gunicorn, and as we saw in the previous post the results are much worse again.

$ time curl -s -o /dev/null http://localhost:8000/
real 0m23.762s
user 0m1.446s
sys 0m7.085s

The reason the results are so bad is that when a string is returned as the iterable, iterating over it deals with the byte string one byte at a time. This means that when the WSGI server is writing the response back to the HTTP client, it is in turn doing it one byte at a time. The overhead of writing one byte at a time to the socket connection dwarfs everything else.
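A quick way to see this is to iterate over a byte string directly. Every byte produces a separate item, and hence a separate write by a conforming WSGI server. (Under Python 2 each item is a one byte string; under Python 3 it is actually an integer, which isn't even a valid WSGI data block.)

output = 1024 * 1024 * b'X'

# One item per byte, so a conforming WSGI server would perform over a
# million individual writes for this response.
print(len(list(iter(output))))   # 1048576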

As was covered in the prior post, this is entirely expected behaviour for a WSGI server. The mistake here is entirely in the example code and the code should be corrected.

Acknowledging that this is a problem with the user's code, let's now see what happens when this example is run through uWSGI.

$ time curl -s -o /dev/null http://localhost:8000/
real 0m0.019s
user 0m0.006s
sys 0m0.007s

Rather than the larger time we would expect to see, when using uWSGI the time taken is still down at 19 milliseconds.

This may seem to be a good result, but what is happening here is that uWSGI has decided to break with conforming to the WSGI specification and has added special checking to detect this sort of user mistake. Instead of treating the result as an iterable as it is required to by the WSGI specification, it takes the whole string and uses it as is.

You may still be thinking this is great, but it isn't really and will only serve to hide the original mistake, resulting in users writing and shipping code with a latent bug. The problem will only become evident when the WSGI application is run on a different WSGI server which does conform to the WSGI specification.

This could be disastrous if the WSGI application was being shipped out to numerous customers where it was the customer who decided what WSGI server they used.

In addition to there being a problem when run on a conforming WSGI server, there will also be a problem if someone took the WSGI application and wrapped it with a WSGI middleware.

This can be illustrated by now going back and adding the WSGI application timing decorator.

from timer1 import timed_wsgi_application1
@timed_wsgi_application1
def application(environ, start_response):
    status = '200 OK'
    output = 1024*1024*b'X'
    response_headers = [('Content-type', 'text/plain'),
            ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return output

With the decorator, running with uWSGI again we now get:

$ time curl -s -o /dev/null http://localhost:8000/
real 0m13.876s
user 0m0.925s
sys 0m3.407s

So only now do we get the sort of result we were expecting, with it taking 13 seconds.

The reason that the problem only now shows up with uWSGI is because the wrapper which gets added by the WSGI middleware, around the iterable passed back from the WSGI application, causes the uWSGI check for an explicit string value to fail.

This demonstrates how fragile the WSGI application now is. If it is intended to be something that is used by others, you cannot control whether those users may apply WSGI middleware around it. A user might therefore very easily, and unknowingly, cause problems for themselves and have no idea why, or even that there is a problem.

Application portability

As much as the WSGI specification is derided by many, it can't be denied it eliminated the problem that existed at the time with there being many different incompatible ways to host Python web applications. Adherence to the WSGI specification by both WSGI applications and servers is key to that success. It is therefore very disappointing to see where WSGI servers deviate from the WSGI specification as it is a step away from the goal of application portability.

What you instead end up with is pseudo WSGI applications which are in fact locked in to a specific WSGI server implementation and will not run correctly or perform well on other WSGI servers.

If you are developing Python web applications at the WSGI level rather than using a web framework and you value WSGI application portability, one thing you can do to try and ensure WSGI compliance is to use the WSGI validator found in the 'wsgiref.validate' module of the Python standard library.

def application(environ, start_response):
    status = '200 OK'
    output = 1024*1024*b'X'
    response_headers = [('Content-type', 'text/plain'),
            ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return output

import wsgiref.validate

application = wsgiref.validate.validator(application)

This will perform a range of checks to ensure some basic measure of WSGI compliance and also good practice.

In this particular example of where a string was returned as the WSGI application iterable, the WSGI validator will flag it as a problem, with the error:

AssertionError: You should not return a string as your application iterator, instead return a single-item list containing that string.

So run the validator on your application during development or testing from time to time and exercise your WSGI application to get good coverage. This will at least help with some of the main issues that may come up, although by no means all, given how many odd corner cases exist within the WSGI specification.
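One convenient pattern, sketched here with a hypothetical environment variable name and assuming the 'application' defined above, is to only apply the validator when explicitly enabled, so it is active during development and testing but not in production:

import os
import wsgiref.validate

# 'WSGI_VALIDATE' is a made up switch; enable it in development or CI only,
# as the validator adds extra checks on every request.
if os.environ.get('WSGI_VALIDATE'):
    application = wsgiref.validate.validator(application)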

Effects of yielding multiple blocks in a WSGI application response.

In my last post I introduced a Python decorator that can be used for measuring the overall time taken by a WSGI application to process a request and for the response to then be sent on its way back to the client.

That prior post showed an example where the complete response was generated and returned as one string. That the response has to be returned as one complete string is not a requirement of the WSGI specification, albeit for many Python web frameworks it is the more typical scenario for responses generated as the output from a page template system.

If needing to return very large responses, generating it and returning it as one complete string wouldn't necessarily be practical. This is because doing so would result in excessive memory usage due to the need to keep the complete response in memory. This problem would be further exacerbated in a multi threaded configuration where concurrent requests could all be trying to return very large responses at the same time.

In this sort of situation it will be necessary to instead return the response a piece at a time, by returning from the WSGI application an iterable that can generate the response as it is being sent.

Although this may well not be the primary way in which responses are generated from a WSGI application, it is still an important use case. It is a use case though which I have never seen covered in any of the benchmarks that people like to run when comparing WSGI servers. Instead benchmarks focus only on the case where the complete response is returned in one go as a single string.

In this post therefore I am going to look at this use case where response content is generated as multiple blocks and see how differently configured WSGI servers perform and what is going on under the covers so as to impact the response times as seen.

Returning the contents of a file

The example I am going to use here is returning the contents of a file. There is actually an optional extension that WSGI servers can implement to optimise the case of returning a file, but I am going to bypass that extension at this time and instead handle returning the contents of the file myself.
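For reference, that optional extension is the 'wsgi.file_wrapper' callable which a WSGI server may place into the environ dictionary. A minimal sketch of using it when available, falling back to reading the file ourselves otherwise, might look like the following; the fallback path and block size are illustrative only.

def application(environ, start_response):
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)

    filelike = open('/usr/share/dict/words', 'rb')

    file_wrapper = environ.get('wsgi.file_wrapper')

    if file_wrapper is not None:
        # The WSGI server can use an optimised mechanism such as sendfile().
        return file_wrapper(filelike, 8192)

    # Fall back to yielding blocks ourselves; iter() with a sentinel stops
    # once read() returns an empty byte string.
    return iter(lambda: filelike.read(8192), b'')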

The WSGI application being used in this case is:

from timer1 import timed_wsgi_application1
@timed_wsgi_application1
def application(environ, start_response):
    status = '200 OK'
    response_headers = [('Content-type', 'text/plain')]
    start_response(status, response_headers)
    def file_wrapper(filelike, block_size=8192):
        try:
            data = filelike.read(block_size)
            while data:
                yield data
                data = filelike.read(block_size)
        finally:
            try:
                filelike.close()
            except Exception:
                pass
    return file_wrapper(open('/usr/share/dict/words', 'rb'), 128)

On MacOS X the size of the '/usr/share/dict/words' file is about 2.5MB. In this example we are going to return the data in 128 byte blocks so as to better highlight the impacts of many separate blocks being returned.

Running this example with the three most popular WSGI servers we get from a typical run:

  • gunicorn app:application # 714.012ms
  • mod_wsgi-express start-server app.py # 159.944ms
  • uwsgi --http 127.0.0.1:8000 --module app:application  # 388.556ms

In all configurations only the WSGI server has been used, no front ends, and with each accepting requests directly via HTTP on port 8000.

What is notable from this test is the widely differing times taken by each of the WSGI servers to deliver up the same response. It highlights why one cannot rely purely on simple 'Hello World!' benchmarks. Instead you have to be cognisant of how your WSGI application delivers up its responses.

In this case if your WSGI application had a heavy requirement for delivering up large responses broken up into many separate chunks, which WSGI server you use and how you have it configured may be significant.

Flushing of data blocks

Having presented these results, let's now delve deeper into the possible reasons for the large disparity between the different WSGI servers.

The first step in working out why there may be a difference is to understand what is actually happening when you return an iterable from a WSGI application which yields more than one data block.

The relevant part of the WSGI specification is the section on buffering and streaming. In this section it states:

WSGI servers, gateways, and middleware must not delay the transmission of any block; they must either fully transmit the block to the client, or guarantee that they will continue transmission even while the application is producing its next block. A server/gateway or middleware may provide this guarantee in one of three ways:

1. Send the entire block to the operating system (and request that any O/S buffers be flushed) before returning control to the application, OR
2. Use a different thread to ensure that the block continues to be transmitted while the application produces the next block.
3. (Middleware only) send the entire block to its parent gateway/server.

In simple terms this means that a WSGI server is not allowed to buffer the response content and must ensure that it will actually be sent back to the HTTP client immediately or at least in parallel to fetching the next data block to be sent.

In general WSGI servers adopt option (1) and will immediately send any response content on any socket back to the HTTP client, blocking until the operating system has accepted the complete data and will ensure it is sent. All three WSGI servers tested above are implemented using option (1).

For the case of there being many blocks of data, especially smaller blocks of data, there thus can be a considerable amount of overhead in having to write out the data to the socket all the time.

In this case though the same example was used for all WSGI servers, thus the number and size of the blocks was always the same and didn't differ. There must be more to the difference than just this if all WSGI servers are writing data out to the socket immediately.

As the WSGI servers are all implementing option (1), the only other apparent difference would be the overhead of the code implementing the WSGI servers themselves.

For example, gunicorn is implemented in pure Python code and so as a result could show greater overhead than mod_wsgi and uWSGI which are both implemented in C code.

Are there though other considerations, especially since both mod_wsgi-express and uWSGI are implemented as C code yet still showed quite different results?

INET versus UNIX sockets

In all the above cases the WSGI servers were configured to accept connections directly via HTTP over an INET socket connection.

For the case of mod_wsgi-express though there is a slight difference. This is because mod_wsgi-express will run up mod_wsgi under Apache in daemon mode. That is, the WSGI application will not actually be running inside of the main Apache child worker processes which are actually handling the INET socket connection back to the HTTP client.

Instead the WSGI application will be running in a separate daemon process run up by mod_wsgi and to which the Apache child worker processes are communicating as a proxy via a UNIX socket connection.

To explore whether this may account for why mod_wsgi-express shows a markedly better response time, what we can do is run mod_wsgi-express in debug mode. This is a special mode which forces Apache and mod_wsgi to run as one process, rather than the normal situation where there is an Apache parent process, Apache child worker processes and the mod_wsgi daemon processes.

This debug mode is normally used when wishing to be able to interact with the WSGI application running under mod_wsgi, such as if using the Python debugger pdb or some other interactive debugging tool which exposes a console prompt direct from the WSGI application process.

The side effect of using debug mode though is that the WSGI application is effectively running in a similar way to mod_wsgi embedded mode, meaning that when writing back a response, the data blocks will be written directly onto the INET socket connection back to the HTTP client.

Running with this configuration we get:

  • mod_wsgi-express start-server --debug-mode app.py # 470.487ms

As it turns out there is a difference between daemon mode and embedded mode of mod_wsgi, so now let's also consider uWSGI.

Although uWSGI is being used to accept HTTP connections directly over an INET connection, the more typical arrangement for uWSGI is to use it behind nginx. Obviously using nginx as a HTTP proxy isn't really going to help, as one would see similar results to those shown, but uWSGI also supports its own internal wire protocol for talking to nginx called 'uwsgi', so let's try that instead and see if it makes a difference.

In using the 'uwsgi' wire protocol though, we still have two possible choices we can make for configuring it. The first is that an INET socket connection is used between nginx and uWSGI and the second is to use a UNIX socket connection instead.

  • uwsgi --socket 127.0.0.1:9000 --module app:application # 284.802ms
  • uwsgi --socket /tmp/uwsgi.sock --module app12:application # 143.614ms

From this test we see two things.

The first is that even when using an INET socket connection between nginx and uWSGI, the time spent in the WSGI application is improved. This is most likely because of the more efficient 'uwsgi' wire protocol being used in place of the HTTP protocol. The uWSGI process is thus able to offload the response more quickly.

The second is that switching to a UNIX socket connection reduces the time spent in the WSGI application even more due to the lower overheads of writing to a UNIX socket connection compared to an INET socket connection.

Although the time spent in the WSGI application is reduced in both cases, it is vitally important to understand that this need not translate into an overall reduced response time to the same degree as seen by the HTTP client.

This applies equally to mod_wsgi when run in daemon mode. In both the case of mod_wsgi in daemon mode and uWSGI behind nginx the front end process is allowing the backend process running the WSGI application to more quickly offload the response only. It doesn't eliminate the fact that the front end represents an extra hop in the communications with the HTTP client.

In other words, where time is being spent has, at least in part, simply been shifted out of the WSGI application into the front end proxy.

This doesn't mean that the effort is wasted though. This is because WSGI applications have a constrained set of resources in the form of processes/threads for handling web requests. Thus the quicker you can offload the response from the WSGI application, the quicker the process or thread is freed up to handle the next request.

Use of a front end proxy as exists with mod_wsgi in daemon mode, or where uWSGI is run behind nginx, actually allows both WSGI servers to perform more efficiently and so they can handle a greater load than they would otherwise be able to if they were dealing direct with HTTP clients.

Serving up of static files

Although we can use the WSGI application code used for this test to serve up static files, in general, serving up static files from a WSGI application is a bad idea. This is because the overheads will still be significantly more than serving up the static files from a proper web server.

To illustrate the difference, we can make use of the fact that mod_wsgi-express is actually Apache running mod_wsgi and have Apache serve up our file instead. We can do this using the command:

mod_wsgi-express start-server app.py --document-root /usr/share/dict/

What will happen is that if the URL maps to a physical file in '/usr/share/dict', then it will be served up directly by Apache. If the URL doesn't map to a file, then the request will fall through to the WSGI application, which will serve it up as before.

As we can't readily time in Apache how long a static file request takes to sufficient resolution, we will simply time the result of using 'curl' to make the request.

$ time curl -s -o /dev/null http://localhost:8000/
real 0m0.161s
user 0m0.018s
sys 0m0.062s
$ time curl -s -o /dev/null http://localhost:8000/words
real 0m0.013s
user 0m0.005s
sys 0m0.005s

Whereas the file took 161ms to serve via the WSGI application, it took only 13ms when served as a static file.

The uWSGI WSGI server has a similar option for overlaying static files on top of a WSGI application.

uwsgi --http 127.0.0.1:8000 --module app:application --static-check /usr/share/dict/

Comparing the two methods using uWSGI we get:

$ time curl -s -o /dev/null http://localhost:8000/
real 0m0.381s
user 0m0.029s
sys 0m0.092s
$ time curl -s -o /dev/null http://localhost:8000/words
real 0m0.025s
user 0m0.006s
sys 0m0.009s

As with mod_wsgi-express, one sees a similar level of improvement.

If using nginx in front of uWSGI, you could even go one step further and offload the serving up of static files to nginx with a likely further improvement due to the elimination of one extra hop and nginx's known reputation for being a high performance web server.

Using uWSGI's ability to serve static files is still though a reasonable solution where it would be difficult or impossible to install nginx, such as on a PaaS.

The static file serving capabilities of mod_wsgi-express and uWSGI would in general certainly be better than pure Python options for serving static files, although how much better will depend on whether such Python based solutions make use of WSGI server extensions for serving static files in a performant way. Such extensions and how they work will be considered in a future post. 

What is to be learned

The key takeaways from the analysis in this post are:

  1. A pure Python WSGI server such as gunicorn will not perform as well as C based WSGI servers such as mod_wsgi and uWSGI when a WSGI application is streaming response content as many separate data blocks.
  2. Any WSGI server which is interacting directly with an INET socket connection back to a HTTP client will suffer from the overheads of an INET socket connection when the response consists of many separate data blocks. Use of a front end proxy which allows a UNIX socket connection to be used for the proxy connection will improve performance and allow the WSGI server to offload responses quicker, freeing up the worker process or thread sooner to handle a subsequent request.
  3. Serving of static files is better offloaded to a separate web server, or separate features of a WSGI server designed specifically for handling of static files.

Note that in this post I only focused on the three WSGI servers of gunicorn, mod_wsgi and uWSGI. Other WSGI servers do exist and I intend to revisit the Tornado and Twisted WSGI containers and the Waitress WSGI server in future posts.

I am going to deal with those WSGI servers separately as they are all implemented on top of a core which makes use of asynchronous rather than blocking communications. Use of an asynchronous layer has impacts on the ability to properly time how long the Python process is busy handling a specific web request. These WSGI servers also have other gotchas related to their use due to the asynchronous layer. They thus require special treatment.

Friday, May 15, 2015

Measuring response time for web requests in a WSGI application.

The first topic I want to cover as part of my planned research for a possible upcoming PyCon talk titled 'Using benchmarks to understand how WSGI servers work' is how can one measure the response time of a web request being handled by a web application.

As far as benchmarks go this is usually done from the perspective of the tool used to generate the web requests. This is though a very gross value, incorporating not just the time spent in the WSGI application, but also network overheads and the time in any underlying or front end web server.

Although this is still a useful measure, it doesn't actually help you to identify where the time is being spent and as a consequence isn't particularly helpful for understanding whether you have configured the system in the best possible way to remove overheads.

As a first step in trying to get more fine grained timing information about where time is being spent for a web request, I want to look at what can be done to track the amount of time spent in the WSGI application itself.

Timing of function calls

The obvious quick solution people reach for to track the amount of time spent in a function call in Python is to place a timing decorator on the function. The decorator would apply a wrapper which would remember at what time the function call started and then dump out how long it took when it exited.

Using the wrapt package to create the decorator this could be implemented as:

from __future__ import print_function
from wrapt import decorator
from timeit import default_timer
@decorator
def timed_function(wrapped, instance, args, kwargs):
    start = default_timer()
    print('start', wrapped.__name__)
    try:
        return wrapped(*args, **kwargs)
    finally:
        duration = default_timer() - start
        print('finish %s %.3fms' % (wrapped.__name__, duration*1000.0))

For this implementation we are using 'timeit.default_timer()' as the clock source as it will ensure we use as high a resolution clock as possible for a specific platform.

In the case of a WSGI application we could then apply this to the callable entry point.

from timer1 import timed_function
@timed_function
def application(environ, start_response):
    status = '200 OK'
    output = b'Hello World!'
    response_headers = [('Content-type', 'text/plain'),
            ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return [output]

Running this WSGI application with any WSGI server we would then get something like:

start application
finish application 0.030ms

The time of 0.03 milliseconds for a 'Hello World!' application seems to be in the right order of magnitude, but is the decorator truly sufficient? Let's test out a few more simple WSGI applications.

First up, let's just add a 'sleep()' call into the WSGI application. With the response time of our simplest case being so small, if we sleep for a sizeable amount of time, then that base overhead should be lost as noise within the longer sleep time.

from timer1 import timed_function
from time import sleep
@timed_function
def application(environ, start_response):
    status = '200 OK'
    output = b'Hello World!'
    response_headers = [('Content-type', 'text/plain'),
        ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    sleep(1.0)
    return [output]

Trigger a web request with this WSGI application and we get:

start application
finish application 1000.653ms

And so the result is closer to the much longer sleep time as we expected.

If we take away the sleep time we end up with 0.653 milliseconds which does seem quite high compared to what we had previously. The reason for this though is likely to be the operating system overheads of waking up the process from its short sleep and resuming execution. The resulting time therefore is still reasonable.

Now in both these examples the WSGI application returned an iterable which was a list of strings. In other words the complete response was returned in one go after being generated up front.

Another way of writing a WSGI application is as a generator. That is, rather than returning all values at one time as a list, we yield each specific value making up the response. This allows the response to be generated as it is being returned.

from timer1 import timed_function
from time import sleep
@timed_function
def application(environ, start_response):
    status = '200 OK'
    output = b'Hello World!'
    response_headers = [('Content-type', 'text/plain'),
            ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    sleep(1.0)
    yield output

We again trigger a web request and the result this time is:

start application
finish application 0.095ms

In this case we are back to a sub millisecond time, so obviously there is a problem with our decorator.

The problem here is that as the WSGI application is a generator function, the decorator is only timing how long it took for the Python interpreter to create the generator object. The time will not actually include the execution of any code within the generator function.

This is the case because the initial code in the function, including the 'sleep()' call, will only be executed upon the first attempt to fetch a value from the generator object. This is something that will only be done by the WSGI server after the generator object has been returned and the wrapper applied by the decorator has also exited.
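
This behaviour is easy to demonstrate in isolation. The snippet below is my own minimal illustration, not code from the post: the body of a generator function does not run when the generator object is created, only when a value is first requested from it.

from __future__ import print_function

def generator_application():
    print('body starts executing')    # not printed when the generator is created
    yield b'Hello World!'

iterable = generator_application()    # nothing has been printed yet
next(iterable)                        # only now does 'body starts executing' appear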

The traditional method of using a decorator with before and after actions executed either side of the call of the wrapped function will not therefore work for a WSGI application which is implemented as a generator function.

There are actually other situations besides a generator function which will also fail as there are in fact numerous ways that a WSGI application callable object can be implemented. Any scenario which defers work until the point when the iterable returned is being consumed will not yield the correct results with our first attempt at a timing decorator.

If interested in the various ways that WSGI application callable objects can be implemented, you can read a previous post I wrote on the topic called 'Implementing WSGI application objects'.

Marking the end of a request

The solution to our problem with not being able to use a traditional function timing decorator can be found within the WSGI protocol specification itself. Specifically it is stated in the specification:

If the iterable returned by the application has a close() method, the server or gateway must call that method upon completion of the current request, whether the request was completed normally, or terminated early due to an application error during iteration or an early disconnect of the browser. (The close() method requirement is to support resource release by the application. This protocol is intended to complement PEP 342's generator support, and other common iterables with close() methods.)

What this means is that where the iterable from the WSGI application would otherwise be returned, we can instead return a wrapper object around that iterable and in our wrapper provide a 'close()' method. This 'close()' method is then guaranteed to be called at the end of the request regardless of whether it completes successfully or not.

We can therefore use this as the place for ending the timing of the request where the iterable was returned. One requirement in using such a wrapper though is that the wrapper itself must in turn call the 'close()' method of the iterable being wrapped, so as to preserve that iterable's expectation that its own 'close()' method will be called.

from __future__ import print_function
from wrapt import decorator, ObjectProxy
from timeit import default_timer

class WSGIApplicationIterable1(ObjectProxy):

    def __init__(self, wrapped, name, start):
        super(WSGIApplicationIterable1, self).__init__(wrapped)
        self._self_name = name
        self._self_start = start

    def close(self):
        # Preserve the close() semantics of the wrapped iterable first.
        if hasattr(self.__wrapped__, 'close'):
            self.__wrapped__.close()

        duration = default_timer() - self._self_start
        print('finish %s %.3fms' % (self._self_name, duration*1000.0))
@decorator
def timed_wsgi_application1(wrapped, instance, args, kwargs):
    name = wrapped.__name__
    start = default_timer()
    print('start', name)
    try:
        return WSGIApplicationIterable1(wrapped(*args, **kwargs), name, start)
    except:
        duration = default_timer() - start
        print('finish %s %.3fms' % (name, duration*1000.0))
        raise

In the implementation of the wrapper for the WSGI application iterable I have used the 'ObjectProxy' class from the wrapt package. The 'ObjectProxy' class in this case acts as a transparent object proxy for whatever is wrapped. That is, any action on the proxy object will be performed on the wrapped object unless that action is somehow overridden in the proxy object. So in this case we have overridden the 'close()' method to allow us to insert our code for stopping the timer.
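
As a small aside, and purely as a sketch of my own to illustrate the transparency being described (the 'ClosingProxy' name is hypothetical), attribute access and iteration on an 'ObjectProxy' fall straight through to the wrapped object unless explicitly overridden on the proxy class.

from __future__ import print_function
from wrapt import ObjectProxy

class ClosingProxy(ObjectProxy):
    def close(self):
        print('proxy close() called')
        if hasattr(self.__wrapped__, 'close'):
            self.__wrapped__.close()

data = ClosingProxy([b'Hello ', b'World!'])
print(len(data))       # len() is delegated to the wrapped list
print(list(data))      # so is iteration
data.close()           # only close() behaves differently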

This wrapper class could have been implemented as a standalone class without relying on the 'ObjectProxy' class from wrapt, however using 'ObjectProxy' from wrapt brings some benefits which will be covered in a subsequent blog post.

Using this new decorator our test example is:

from timer1 import timed_wsgi_application1
from time import sleep
@timed_wsgi_application1
def application(environ, start_response):
    status = '200 OK'
    output = b'Hello World!'
    response_headers = [('Content-type', 'text/plain'),
            ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    sleep(1.0)
    yield output

When a web request is now issued we get:

start application
finish application 1000.593ms

This is now the result we are expecting.

Just to make sure that we are properly preserving the semantics of 'close()' being called, we can use the test example of:

from __future__ import print_function
from timer1 import timed_wsgi_application1
from time import sleep
class Iterable(object):

    def __init__(self, output):
        self.output = output
    def __iter__(self):
        return self
    def next(self):
        try:
            return self.output.pop(0)
        except IndexError:
            raise StopIteration
    def close(self):
        print('close')
@timed_wsgi_application1
def application(environ, start_response):
    status = '200 OK'
    output = b'Hello World!'
    response_headers = [('Content-type', 'text/plain'),
            ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    sleep(1.0)
    return Iterable([output])

What is being done here is that rather than returning a list or using a generator function, we return an iterable implemented as a class object. This iterable object defines a 'close()' method.

If we use this test example the result is:

start application
close
finish application 1000.851ms

That 'close' is displayed means that when the WSGI server calls the 'close()' method on the result from our WSGI application timing decorator, we are in turn correctly calling the 'close()' method of the iterable that was originally returned by the WSGI application.

Applying the timing decorator

We now have a timing decorator that can be applied to a WSGI application and which will correctly time from the point the WSGI application callable object was invoked, until the point where all the response content has been written back to the client and any custom 'close()' method of the iterable, if one exists, has been called.

In the next few blog posts I will start applying this timing decorator to WSGI applications which do more than return just 'Hello World!' to see if anything can be learned about the characteristics of the most popular WSGI servers being used under different use cases.

The intent of exploring the different use cases is to show why benchmarks using a single simple test case aren't sufficient to properly evaluate which WSGI server may be the best for you. Rely on such simple tests and you could well make the wrong choice and end up using a WSGI server that doesn't perform as well as alternatives for your specific use case.

Thursday, May 14, 2015

Using benchmarks to understand how WSGI servers work.

Possibly unwisely, as I am not sure I feel motivated enough to do the topic justice at this time, nor whether I can definitely attend the conference, I have submitted a talk proposal for PyCon AU to be held at the start of August. The topic of the talk is 'Using benchmarks to understand how WSGI servers work'. To read more about the intended topic you can read the text of the proposal here.

If you read the proposal you will see that it has the potential to be a quite complex topic and there is no way I will be able to sufficiently cover it within the allotted time. In other words, it is typical of the sort of talks I have done in the past where to fully appreciate what was presented you have to watch the talk multiple times.

Since I cop a fair bit of criticism that my talks are too packed with information, and due to favourable feedback I got on my series of posts about decorators (which were written after I presented the talk which covered that topic), I am going to try something a bit different for this talk.

What I am going to do this time is that as I research and prepare for the talk, I am going to publish a series of blog posts related to the topic. The actual talk will then be a dumbed down, more sedate affair which focuses on some key takeaways from what I ended up covering in the blog posts. For those who then want more detail, they can go back and refer to the blog posts.

Now although I have two months to prepare the talk, the possible complexity of the topic means I can't delay, so I will start pumping out posts pretty soon. I have a rough idea of what sort of topics I want to post about, with the first one being about how one actually goes about measuring application response times for a Python web application. Even so I am certainly open to suggestions of topics related to the subject that people out there want to hear about. So if you do have your own suggestions, or want to see me expound further on specific topics I post about, please let me know as things progress.

Wednesday, May 6, 2015

Using mod_wsgi-express as a development server.

Apache/mod_wsgi has traditionally been seen as a deployment target only for production systems. To use it in a development environment is viewed as too difficult and not worth the trouble. The downside of developing against the development server provided by whatever web framework you use, while ultimately deploying to Apache/mod_wsgi in production, is that situations can arise where your application works fine in development but then fails in production.

One of the main reasons is that development servers are usually single process and single threaded. This means you are not able to properly test your web application in the face of many concurrent requests. This isn’t just restricted to multithreading issues within the same process, but also covers the impact of an external resource being accessed by multiple processes at the same time.

A further example of where a development server can cause problems is that the development server can load your web application differently to how a production WSGI server does. This especially occurs where a development server is actually fronted by a management script which also performs other roles. The act of loading all the management commands when working out which to execute can result in side effects, or in your web application code being loaded in a totally different way to how it is loaded in a production WSGI server. This latter issue has in the past especially been a problem with the Django development server, where how the development server loaded code would mask module import cycle issues related to Django database model code.

Development servers in pure Python code can also sometimes be an issue as far as WSGI compliance. This is because they can silently allow certain types of responses which aren’t strictly compliant with what is required by the WSGI specification. The classic example of this is where Unicode objects are used instead of byte strings. Issues arise where a development server silently allows a Unicode object to be converted to a byte string using the default encoding. Run the same code on a WSGI server which more strictly adheres to the WSGI specification and you are likely to encounter a runtime error instead.
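
As an example of the kind of non-compliant response being described, consider a slight variation on the earlier 'Hello World!' application, shown here as my own illustration rather than code from the post. A lenient pure Python development server may let this through by silently encoding the Unicode string, whereas a stricter WSGI server will complain.

def application(environ, start_response):
    status = '200 OK'
    output = u'Hello World!'          # should be a byte string, i.e. b'Hello World!'
    response_headers = [('Content-type', 'text/plain'),
            ('Content-Length', str(len(output)))]
    start_response(status, response_headers)
    return [output]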

These and other reasons are why you should strive to use the same server you intend to use in production during development. As I already said though, Apache/mod_wsgi has traditionally been seen as being too hard to run as a development server. This is no longer the case with mod_wsgi-express however, as it provides a way of easily running up an Apache/mod_wsgi instance from the command line on a non privileged port without you needing to configure Apache yourself. This is because mod_wsgi-express will automatically generate an Apache configuration for mod_wsgi and your specific web application for you and then start Apache as well.

In addition to this ability to run Apache/mod_wsgi from the command line, mod_wsgi-express also provides a range of other builtin features specifically to make it more useful in a development environment. These include automatic source code reloading, debugger, profiler and coverage support. It also has capabilities to record details about requests received and the responses generated. This is at the level of capturing the actual WSGI environ dictionary and HTTP response headers, along with request and response content. This in depth information can come in handy when you don’t understand why a web application is responding the way it is to a request, or when you need to validate the response.

Reloading of source code

Python web applications using WSGI nearly always run in a persistent process which will handle many requests over time. That is, the source code is loaded once and if you need to make code changes, you need to restart the Python web application processes. This is in contrast to a PHP web application, which effectively discards any loaded code at the end of a request and reloads it all again for the next request. This difference is a constant source of frustration to PHP developers who have migrated to using Python. They are used to simply making a code change and having it picked up automatically.

Such PHP converts will keep asking why Python can’t just work the same way, but due to the performance cost of loading all code on every request, it simply isn’t practical, or at least not in a production environment.

In a development environment at least, to avoid the need to manually restart the WSGI server all the time, development servers will implement a code monitor. Any code files associated with modules that have been loaded into the running Python web application will be monitored for changes. If any of those files on disk changes between the time it was loaded and when the check is made, then the web application process(es) will be automatically shut down and restarted. In this way, when the next request is made it will see the more recent code changes.
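
As a rough sketch of the idea, and definitely not the actual implementation used by mod_wsgi-express or any other server, such a code monitor can be as simple as a background thread which polls the files backing the modules already imported and exits the process when one of them changes, relying on the process manager to then restart it.

import os
import sys
import time
import threading

def module_files():
    # Collect the source files behind every module currently imported.
    for module in list(sys.modules.values()):
        path = getattr(module, '__file__', None)
        if path and os.path.isfile(path):
            yield path

def monitor(interval=1.0):
    timestamps = {}
    while True:
        for path in module_files():
            try:
                mtime = os.path.getmtime(path)
            except OSError:
                continue
            if path not in timestamps:
                timestamps[path] = mtime
            elif timestamps[path] != mtime:
                # A loaded file changed on disk, so force an exit and let
                # the supervising process restart the application.
                os._exit(3)
        time.sleep(interval)

thread = threading.Thread(target=monitor)
thread.daemon = True
thread.start()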

As mod_wsgi-express is still primarily intended for use in production, such a feature isn’t enabled by default, but it can be enabled on the command line using the ‘--reload-on-changes’ option. So if running ‘mod_wsgi-express’ directly, one would use:

mod_wsgi-express start-server --reload-on-changes app.wsgi

If using mod_wsgi-express integrated with a Django project, then you would use:

python manage.py runmodwsgi --reload-on-changes

The automatic reloading of your web application when code changes are made will work whether you are running with a single or multithreaded process, and even if you are using a multi process configuration.

Debugging runtime exceptions

When an exception occurs in the code for your web application, what typically happens is that the exception is caught by the web framework you are using and translated to a generic HTTP 500 server error response page. No details about the exception would normally be displayed within the response. If you are lucky, you might get details of the exception logged to an error log, but various web frameworks do not even do that by default.

What you need to do to reveal information about exceptions obviously varies based on the framework being used. In the case of the Django web framework one thing you can do is enable a debug mode which will result in details of the exception being displayed within the HTTP 500 server error response page.

# SECURITY WARNING: don't run with debug turned on in production!
DEBUG = True

This will result in the display of the exception, a stack trace, and the values of any local variables for each frame of the stack traceback. Obviously you don’t want to leave this enabled when deploying to a production environment.

As explained in the prior post where I detailed how to integrate mod_wsgi-express with Django as a management command, a safer option to capture at least the details of the exception and the stack trace is to enable Django logging to log the details of exceptions to the error log.

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'handlers': {
        'console': {
            'class': 'logging.StreamHandler',
        },
    },
    'loggers': {
        'django': {
            'handlers': ['console'],
            'level': os.getenv('DJANGO_LOG_LEVEL', 'INFO'),
        },
    },
}

This avoids the need to enable Django debug mode and is better inasmuch as the details will be captured persistently in the error log, rather than only appearing on a transient web browser page which is lost when visiting another URL.

One final option with Django is to configure it so that it will not catch the exceptions raised within your code when handling a request and convert them to an HTTP 500 server error response page itself.

DEBUG_PROPAGATE_EXCEPTIONS = True

With this setting enabled, what will instead happen is that the exception will be allowed to propagate out from Django. Provided the Django application itself isn’t wrapped by some WSGI middleware that catches and suppresses the exception, the exception will make it all the way up to the WSGI server, which would generate its own HTTP 500 server error response page.

In the case of mod_wsgi, when an exception does propagate all the way back to the WSGI server, the exception details will be automatically logged to the error log. This logging of exceptions by mod_wsgi will actually occur in a number of different situations, including cases where Django itself wouldn’t be able to log anything, such as where an exception occurs in the generator returned within a streaming HTTP response.
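
As a concrete illustration of that streaming case, here is a hedged example of my own rather than something from the post (the 'stream' view name is hypothetical): the error is raised while the response is being iterated, after Django has already handed the response object back, so only the WSGI server is in a position to log it.

from django.http import StreamingHttpResponse

def stream(request):
    def content():
        yield b'first chunk\n'
        raise RuntimeError('failure while streaming')
    return StreamingHttpResponse(content())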

So we have three different options we could rely on, but which do we use?

As it turns out, all three are useful depending on what you want to do. The question then is how do we make them easier to use when mod_wsgi-express is being used, without you needing to go in and modify the Django settings module all the time, and potentially forgetting to revert a setting like that for Django debug mode when moving to production.

Logging of handler exceptions

Even though it isn’t the default, I would always recommend that the ‘LOGGING’ setting be configured to log to the error log when using Apache/mod_wsgi. In this way you always have a record of exceptions, even in the case where the delivery of log messages to an offsite service failed for some reason.

If for some reason you do not want to log to the error log file all the time and only wish to do it when using mod_wsgi-express, where it is only being used in a development setting, then you can vary how the ‘LOGGING’ setting is configured by checking for the existence of the ‘MOD_WSGI_EXPRESS’ environment variable.

if os.environ.get('MOD_WSGI_EXPRESS'):
    LOGGING = {
        …
    }

This environment variable will only be set when using mod_wsgi-express and not when using a manually configured instance of Apache running mod_wsgi.

When using mod_wsgi-express as a development server, if having to actually go to the error log file is seen as inconvenient and you would rather see errors directly in the terminal, then you can use the ‘--log-to-terminal’ option when running mod_wsgi-express.

Exception debugging mode

Django has its debug mode, which is enabled using the ‘DEBUG’ setting. mod_wsgi-express also has its own special debug mode. This is enabled by supplying the ‘--debug-mode’ option.

So if running ‘mod_wsgi-express’ directly, one would use:

mod_wsgi-express start-server --debug-mode app.wsgi

If using mod_wsgi-express integrated with a Django project, then you would use:

python manage.py runmodwsgi --debug-mode

Use of this option actually makes mod_wsgi-express behave even more like traditional development servers, in that it will only run with a single process and single thread. This mode also opens up access to a range of special debug features in mod_wsgi-express.

Do be aware though that use of debug mode in mod_wsgi-express will disable the automatic code reloading feature if it was enabled, and you will need to exit mod_wsgi-express and restart it explicitly to pick up code changes when running in this mode. As you would only be using this mode from the command line when debugging problems in your code, this is not seen as a big issue.

The general workflow would therefore be to run mod_wsgi-express in its default multithreaded mode with code reloading enabled as you make code changes. When a problem then occurs, you would drop down into debug mode to work out the cause of an issue.

As this debug mode of mod_wsgi-express is a special mode, what can now be done is to trigger the enabling of Django debug mode off it being run. This will then have the effect of enabling the display of exception details within the browser. To do this the Django settings file would be configured as:

# SECURITY WARNING: don't run with debug turned on in production!
if os.environ.get('MOD_WSGI_DEBUG_MODE'):
    DEBUG = True
else:
    DEBUG = False

By triggering Django debug mode off of the mod_wsgi-express debug mode, you reduce the risk that you might accidentally leave the Django debug mode enabled, as you wouldn’t need to keep changing the setting. Instead Django debug mode would be enabled purely as a side effect of the ‘--debug-mode’ option being supplied to mod_wsgi-express.

Post mortem debugging

Although Django debug mode enables the display of exception details, including the values of local variables in the stack frames, it doesn’t allow you to interact with the Python interpreter itself beyond that. Such post mortem debugging generally requires the use of ‘pdb’ or a debugger built into an IDE.

In order to use ‘pdb’ to debug such a problem, one would normally have to modify the application code to invoke ‘pdb’ when an exception occurs.
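
The sort of manual instrumentation being referred to would, as a hedged sketch of my own using a hypothetical view function and helper, look something like the following: wrapping the body of a handler and dropping into ‘pdb’ post mortem mode when an exception escapes.

import pdb
import sys

def home(request):
    try:
        return render_home_page(request)    # hypothetical view implementation
    except Exception:
        # Drop into the debugger with the traceback of the current exception.
        pdb.post_mortem(sys.exc_info()[2])
        raise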

When using the debug mode of mod_wsgi-express, however, a simpler way is available for performing post mortem debugging of an exception raised within your handler code.

To enable this feature if running ‘mod_wsgi-express’ directly, one would use the ‘--enable-debugger’ option in conjunction with the ‘--debug-mode’ option:

mod_wsgi-express start-server --debug-mode --enable-debugger app.wsgi

Now when an exception occurs within your handler and that exception propagates back up to the WSGI server level, you will be automatically thrown into a ‘pdb’ debugger session.

> /Users/graham/Testing/django-site/mysite/mysite/views.py(4)home()
-> raise RuntimeError('xxx')
(Pdb)

If using mod_wsgi-express integrated with a Django project, then you could also use:

python manage.py runmodwsgi --debug-mode --enable-debugger

However, as explained above, exceptions under Django will be caught and translated into a generic HTTP 500 server error response page. In order to disable this, we need to configure Django to allow exceptions to be propagated back up to the WSGI server. This is so that ‘pdb’ can catch the exception and enter debugging mode.

As we only want exceptions to be propagated back up to the WSGI server when enabling post mortem debugging, then we can qualify the configuration in the Django settings module using:

if os.environ.get('MOD_WSGI_DEBUGGER_ENABLED'):
    DEBUG_PROPAGATE_EXCEPTIONS = True

By checking for the ‘MOD_WSGI_DEBUGGER_ENABLED’ environment variable, we are ensuring again that we don’t accidentally leave this enabled when deploying to production.

Now the debugger as shown only provides the ability to perform post mortem debugging. What though if you wanted to debug from an arbitrary point in your code before the point where the exception occurred?

You could again resort to modifying your code to add in a call to ‘pdb’ at the desired location, but you would at least still need to force mod_wsgi-express to run in single process mode with stdin/stdout of the process properly attached to the terminal, using the ‘--debug-mode’ option.
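
The manual alternative being described amounts to dropping a ‘pdb.set_trace()’ call at the point of interest. A hedged sketch, again using a hypothetical view function and helper:

import pdb

def home(request):
    pdb.set_trace()    # execution pauses here the next time the view is called
    return render_home_page(request)    # hypothetical view implementation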

An alternative provided by mod_wsgi-express to avoid needing to modify your code even in this case is available with the ‘--debugger-startup’ option.

So if running ‘mod_wsgi-express’ directly, one would also add the ‘--debugger-startup’ option:

mod_wsgi-express start-server --debug-mode --enable-debugger --debugger-startup app.wsgi

If using mod_wsgi-express integrated with a Django project, then you would use:

python manage.py runmodwsgi --debug-mode --enable-debugger --debugger-startup

What this option will do is throw you into ‘pdb’ as soon as the process has been started.

> /Users/graham/.virtualenvs/django/lib/python2.7/site-packages/mod_wsgi/server/__init__.py(1040)__init__()->None
-> self.activate_console()
(Pdb)

At the ‘pdb’ prompt you can then set breakpoints where you want the debugger to be triggered later when handling a request, and then resume execution.

(Pdb) import mysite.views
(Pdb) break mysite.views.home
Breakpoint 1 at /Users/graham/Testing/django-site/mysite/mysite/views.py:3
(Pdb) cont

When a subsequent request then results in that code being executed, the ‘pdb’ prompt will once again be presented allowing you to step through the code from that point or interrogate any objects available in that context.

(Pdb) list
  1     from django.http import HttpResponse
  2
  3 B   def home(request):
  4  ->     raise RuntimeError('xxx')
  5         #return HttpResponse('Hello world!')
[EOF]
(Pdb) locals()
{'request': <WSGIRequest: GET '/'>}

Production capable by default

So what we ultimately have here is that mod_wsgi-express in its default configuration is production capable, but it can optionally be set up for use in a development environment.

The first option one has is to leave it in its more production capable mode, but enable source code reloading as a convenience. The second option is to force it into a true debug mode where everything runs in the one process, with the process properly attached to the terminal so as to allow direct interaction with the process. This debug mode then permits the use of the Python debugger ‘pdb’.

In order to allow better integration with web frameworks when working in debug mode, or when specific debug mode features are enabled, environment variables are set to allow the web framework or application settings to be customised automatically, thereby ensuring the best experience is available.

As mentioned, mod_wsgi-express also has other capabilities available once in debug mode. These include code profiling and coverage support. I will discuss those and other development centric features in subsequent posts.