David Avraamides Code and other geekery

Page Stats Middleware

Some websites have a nice feature where they will show you a summary of the performance metrics measured during the generation of a page. I thought this would be a nice feature for some Django-based web applications I was developing at work. When a page takes a while to render, I like to quickly diagnose where the problem might be.

Stats

I wanted to know the time spent generating the page in total as well as the breakdown between Python (Django) and the database. Additionally, I wanted to know how many times the database was queried as too many queries can be a hint that you aren't accessing the database efficiently.

Since I wanted the feature to work across any Django app, a custom middleware class seemed like the right approach. Writing a custom middleware is quite easy and, like so many other things in Django, well documented.

Calculating the Metrics

Luckily, the Django developers had already thought of the value of collecting some basic database statistics, so I simply had to figure out how to access them. When DEBUG mode is on (through your settings file), a database backend class wraps a cursor in a CursorDebugWrapper object which keeps a list of each SQL query executed along with its execution time. I just had to make sure DEBUG was enabled, if not already, and then add up the time of any calls incurred during this page's invocation.

The total time to process the view is measured in the middleware's process_view routine and then the time spent in Python (i.e Django) was easy to back out by timing the entire view call and then subtracting out the database time.

Viewing the Stats

The last part of the puzzle was how to put this information in the page itself. Normally in Django you would use a custom template tag for such a chore but I couldn't since I wouldn't have completed timing the view call until the template had already been rendered.

I decided to follow the spirit of template substitution by creating a special HTML comment that described the format of the output and then replacing the placeholder with the formatted output as the middleware returned the view to its caller.

The StatsMiddleware Class

The code is pretty straightforward. First, I save the state of the debug setting and then enable debugging. This triggers the use of the debug wrapper class in the database backend which will keep the stats of the database part of the execution. Then I save the size of the connection's queries where this debug info is stored so I will know which queries were additionally called during this view's invocation. Then I build the view, keeping track of its time

import re
from operator import add
from time import time
from django.db import connection

class StatsMiddleware(object):

    def process_view(self, request, view_func, view_args, view_kwargs):

        # turn on debugging in db backend to capture time
        from django.conf import settings
        debug = settings.DEBUG
        settings.DEBUG = True

        # get number of db queries before we do anything
        n = len(connection.queries)

        # time the view
        start = time()
        response = view_func(request, *view_args, **view_kwargs)
        totTime = time() - start

        # compute the db time for the queries just run
        queries = len(connection.queries) - n
        if queries:
            dbTime = reduce(add, [float(q['time']) 
                                  for q in connection.queries[n:]])
        else:
            dbTime = 0.0

        # and backout python time
        pyTime = totTime - dbTime

        # restore debugging setting
        settings.DEBUG = debug

        stats = {
            'totTime': totTime,
            'pyTime': pyTime,
            'dbTime': dbTime,
            'queries': queries,
            }

        # replace the comment if found            
        if response and response.content:
            s = response.content
            regexp = re.compile(r'(?P<cmt><!--\s*STATS:(?P<fmt>.*?)-->)')
            match = regexp.search(s)
            if match:
                s = s[:match.start('cmt')] + \
                    match.group('fmt') % stats + \
                    s[match.end('cmt'):]
                response.content = s

        return response

After the view is called, I restore the debug setting and save the stats in a dict which will be used for argument replacement during the output formatting step. As I said earlier, I use a sort of poor-man's templating to render the stats into the page on output. An HTML comment of the form

<!-- STATS: format_string -->

will be replaced with the output of format_string after formatting against the stats dict. This blog uses the string:

<!-- STATS: Total: %(totTime).2f Python: %(pyTime).2f
     DB: %(dbTime).2f Queries: %(queries)d -->

I put this inside the footer <div> and format through the CSS style sheet.

The replacement block in the code deserves a couple of comments. I wanted to make sure the string formatting was not applied against the entire content of the page so that random % symbols wouldn't cause problems. So I search for the special comment tag and then apply the formatting only to the subsection where it was found. The resulting content is then assembled with the untouched bounding parts of the original conent and the modified subsection.

Immediately Helpful

Soon after I got this working, it helped me identify a logic bug in my code. I had created a view that showed the position of an investment fund in a fairly large table (about 700 rows). The page took a few seconds to build and render. The statistics at the bottom of the page showed a whopping 2813 queries! Yikes!

The reason became immediately clear: each row in the table displayed a Position object as well as fields from the four other models it had ForeignKey relations with. Thus, each time a row was evaluated in the template, there were four individual database queries to look up the related data. I simply added select_related() to the end of my main query and the query count dropped to 4 and the execution time was cut to less than half the original 2.5 seconds. Nice.


12 Comments

Posted by
Matt Moye
Tuesday July 11, 2006
12:52 p.m.

Hi David,

I'm guessing you're formerly from Severn? If not, I have the wrong guy, but if that's you, I've been trying to get a hold of you.

Matt mmoye at smithhanley ! com


Posted by
Tim
Saturday August 5, 2006
7:01 a.m.

Just a quick note to let you know I'm using your code, it's so simple yet so effective!! This is exactly why I like using Django, thanks for sharing your code.


Posted by
David Avraamides
Sunday August 6, 2006
10:27 p.m.

Tim: Glad you could make use of it. If you have any ideas on integrating with the page besides an embedded HTML comment, I'd love to hear them. That's one aspect of this I wasn't too happy with.


Posted by
can xiang
Friday August 11, 2006
10 p.m.

Thanks david, it's nice and I'm surprised by the stats of some of my view. I have more database tweak to do now.


Posted by
Gary Wilson
Saturday August 12, 2006
11:42 p.m.

Nice work! I added you middleware to contributed middleware on the Django wiki.


Posted by
Christian Joergensen
Thursday September 7, 2006
7:38 p.m.

Your middleware application does not seem to work with the flat pages middleware. As if process_view is not called.


Posted by
Luc
Friday September 29, 2006
12:38 p.m.

Man you rule!

Kidding aside, I just dropped in your code, and it works beautifully. How long before it gets included in Django ? ;-)

Thanks!


Posted by
David Avraamides
Friday September 29, 2006
1 p.m.

Christian: Flatpages don't go through process_view. I wrote this specifically to time view processing, and the resulting database usage behind it. A flatpage won't use the database, but you could use the same approach for flatpages if you wanted to time the Python part. Just override process_request instead.

Luc: Thanks. Gary posted it to the Django wiki: http://code.djangoproject.com/wiki/PageStatsMiddleware


Posted by
Rodney Hawkins
Thursday February 1, 2007
4:40 a.m.

Thanks for such a useful middleware. I use it now on all my app's to constantly help me re-evaluate and improve my code. Definately a keeper!


Posted by
Udi
Friday March 23, 2007
3:18 p.m.

Thanks Dave, this is so obvious and useful in retrospect. Very cool :)


Posted by
Nick Fishman
Saturday April 7, 2007
4:41 p.m.

This is very, very useful. It should definitely be mentioned in Django's Middleware documentation.


Posted by
leehinde
Tuesday February 19, 2008
4:36 p.m.

Hi;

I'm a newbie to all of this (python, django, regx, blech) and this has been tremendously helpful. Thanks!


Closed for comments