Page Stats Middleware
Some websites have a nice feature where they will show you a summary of the performance metrics measured during the generation of a page. I thought this would be a nice feature for some Django-based web applications I was developing at work. When a page takes a while to render, I like to quickly diagnose where the problem might be.
I wanted to know the time spent generating the
page in total as well as the breakdown between
Python (Django) and the database. Additionally, I
wanted to know how many times the database was
queried as too many queries can be a hint that you
aren't accessing the database efficiently.
Since I wanted the feature to work across any Django app, a custom middleware class seemed like the right approach. Writing a custom middleware is quite easy and, like so many other things in Django, well documented.
Calculating the Metrics
Luckily, the Django developers had already
thought of the value of collecting some basic
database statistics, so I simply had to figure out
how to access them. When DEBUG mode is
on (through your settings file), a database backend
class wraps a cursor in a
CursorDebugWrapper object which keeps
a list of each SQL query executed along with its
execution time. I just had to make sure
DEBUG was enabled, if not already, and
then add up the time of any calls incurred during
this page's invocation.
The total time to process the view is measured
in the middleware's process_view
routine and then the time spent in Python (i.e
Django) was easy to back out by timing the entire
view call and then subtracting out the database
time.
Viewing the Stats
The last part of the puzzle was how to put this information in the page itself. Normally in Django you would use a custom template tag for such a chore but I couldn't since I wouldn't have completed timing the view call until the template had already been rendered.
I decided to follow the spirit of template substitution by creating a special HTML comment that described the format of the output and then replacing the placeholder with the formatted output as the middleware returned the view to its caller.
The StatsMiddleware Class
The code is pretty straightforward. First, I save the state of the debug setting and then enable debugging. This triggers the use of the debug wrapper class in the database backend which will keep the stats of the database part of the execution. Then I save the size of the connection's queries where this debug info is stored so I will know which queries were additionally called during this view's invocation. Then I build the view, keeping track of its time
import re from operator import add from time import time from django.db import connection class StatsMiddleware(object): def process_view(self, request, view_func, view_args, view_kwargs): # turn on debugging in db backend to capture time from django.conf import settings debug = settings.DEBUG settings.DEBUG = True # get number of db queries before we do anything n = len(connection.queries) # time the view start = time() response = view_func(request, *view_args, **view_kwargs) totTime = time() - start # compute the db time for the queries just run queries = len(connection.queries) - n if queries: dbTime = reduce(add, [float(q['time']) for q in connection.queries[n:]]) else: dbTime = 0.0 # and backout python time pyTime = totTime - dbTime # restore debugging setting settings.DEBUG = debug stats = { 'totTime': totTime, 'pyTime': pyTime, 'dbTime': dbTime, 'queries': queries, } # replace the comment if found if response and response.content: s = response.content regexp = re.compile(r'(?P<cmt><!--\s*STATS:(?P<fmt>.*?)-->)') match = regexp.search(s) if match: s = s[:match.start('cmt')] + \ match.group('fmt') % stats + \ s[match.end('cmt'):] response.content = s return response
After the view is called, I restore the debug
setting and save the stats in a dict
which will be used for argument replacement during
the output formatting step. As I said earlier, I
use a sort of poor-man's templating to
render the stats into the page on output. An HTML
comment of the form
<!-- STATS: format_string -->
will be replaced with the output of
format_string after formatting against
the stats dict. This blog uses the
string:
<!-- STATS: Total: %(totTime).2f Python: %(pyTime).2f DB: %(dbTime).2f Queries: %(queries)d -->
I put this inside the footer
<div> and format through the CSS
style sheet.
The replacement block in the code deserves a
couple of comments. I wanted to make sure the
string formatting was not applied against the
entire content of the page so that random
% symbols wouldn't cause problems. So
I search for the special comment tag and then apply
the formatting only to the subsection where it was
found. The resulting content is then assembled with
the untouched bounding parts of the original conent
and the modified subsection.
Immediately Helpful
Soon after I got this working, it helped me identify a logic bug in my code. I had created a view that showed the position of an investment fund in a fairly large table (about 700 rows). The page took a few seconds to build and render. The statistics at the bottom of the page showed a whopping 2813 queries! Yikes!
The reason became immediately clear: each row in
the table displayed a Position object
as well as fields from the four other models it had
ForeignKey relations with. Thus, each
time a row was evaluated in the template, there
were four individual database queries to look up
the related data. I simply added
select_related() to the end of my main
query and the query count dropped to 4 and the
execution time was cut to less than half the
original 2.5 seconds. Nice.