Memcached and Google App Engine

Posted: October 9th, 2009 | Author: Giv | Filed under: Python, Tutorials

Memcached is your friend.

I have to admit I’m new to the caching world. In the past I’ve let my databases do the work for me. Caching MySQL queries and indexing helped a lot with speed and scalability, and I never really knew how to optimise HTTP requests in my mashups (P.S. that’s the last time you’ll hear me use that horrid word).

When I started working at the BBC, I quickly realised my applications needed to scale better. We were no longer talking about a few hundred page loads an hour, but millions. Thankfully, I learned a lot about scalability and the power of distributed memory caching. With memcached I was able to make fewer requests to the database, which reduced its load and drastically improved page load times.

Now I have a cheap shared host running my personal sites, and plans like that rarely offer a service like memcached. However, I was able to experiment on Google App Engine using Python and Google’s memcache service, so I figured I would write a mini tutorial on it as my first blog entry.

The idea behind memcache is really basic. Let’s say we want to grab my latest tweets and display them on my site. Let’s assume I get thousands of visitors, so every time someone loads my page, I have to make an HTTP request to Twitter’s API, get the data, parse it and display it.

This is unnecessary. All visitors would see the same list of tweets, so why make a separate request for each? It makes more sense to cache the results the first time they are requested and then serve the cached version to the rest of the visitors. That’s where memcache comes in. We can store the results of either an HTTP request or a database query for a set duration, say 60 seconds. By doing so, we’ve gone from making hundreds of HTTP requests a minute to just one. This naturally improves performance, and I’m sure Twitter’s operations guys will thank you for it.
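To make the “hundreds of requests down to one” claim concrete, here is a toy simulation. `SimpleTTLCache` is a hypothetical, single-process stand-in written for this example, not the App Engine memcache API; it only mimics the two behaviours this post relies on: `get()` returns `None` on a miss, and `add()` does not overwrite an entry that is still cached. The injectable clock lets us replay a few minutes of traffic instantly.

```python
import time


class SimpleTTLCache:
    """Hypothetical in-process stand-in for memcache, for illustration only."""

    def __init__(self, clock=time.time):
        self._clock = clock      # injectable so the simulation can fake time
        self._store = {}         # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired entries behave like a miss
            return None
        return value

    def add(self, key, value, ttl):
        if self.get(key) is not None:
            return False          # like memcache.add: don't overwrite a live entry
        self._store[key] = (value, self._clock() + ttl)
        return True


def simulate(requests_per_minute, minutes, ttl=60):
    """Count upstream fetches when every page load goes through the cache."""
    now = [0.0]
    cache = SimpleTTLCache(clock=lambda: now[0])
    fetches = 0
    for i in range(requests_per_minute * minutes):
        now[0] = i * 60.0 / requests_per_minute  # spread requests evenly
        if cache.get("twitter") is None:
            fetches += 1                          # stands in for the Twitter HTTP call
            cache.add("twitter", ["tweet"], ttl)
    return fetches


print(simulate(requests_per_minute=300, minutes=3))  # 3
```

With 300 page loads a minute for three minutes, only 3 requests reach the upstream service (one per 60-second cache window) instead of 900.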

OK, so let’s look at how we would do this in App Engine.

Let’s create a function called getTweets(). It does one of two things. First, it asks memcache whether my tweets are already cached; if so, it returns the cached list. Otherwise, it makes the HTTP call, stores the data in memcache for the next visitor and then returns it.

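from google.appengine.api import memcache
from google.appengine.api import urlfetch
from django.utils import simplejson as json

def getTweets():
    # Check the cache first; memcache.get returns None on a miss.
    data = memcache.get("twitter")
    if data is not None:
        return data

    # Cache miss: fetch the timeline from Twitter.
    result = urlfetch.fetch(url="http://twitter.com/statuses/user_timeline/givp.json")
    if result.status_code == 200:
        data = json.loads(result.content)
        # Only cache a successful response, and keep it for 60 seconds.
        memcache.add("twitter", data, 60)
    return data

data = getTweets()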

That’s it! Whatever code calls getTweets() doesn’t need to worry about how the data is fetched, because the returned results are exactly the same either way.
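Because callers can’t tell a cache hit from a miss, the check-then-fetch logic in getTweets() can be pulled out and reused for any slow call. The sketch below does that with a decorator. The `cached` decorator, the module-level `_cache` dict and `get_tweets` are all hypothetical names for illustration; the dict plays the role that the memcache service plays on App Engine, and `get_tweets` returns placeholder data rather than calling Twitter.

```python
import functools
import time

# Hypothetical in-process cache: key -> (value, expires_at).
# On App Engine the memcache service would fill this role instead.
_cache = {}


def cached(key, ttl=60):
    """Wrap a zero-argument fetch function with check-cache-then-fetch logic."""
    def decorator(fetch):
        @functools.wraps(fetch)
        def wrapper():
            entry = _cache.get(key)
            if entry is not None and time.time() < entry[1]:
                return entry[0]                    # cache hit
            value = fetch()                        # cache miss: do the slow work
            if value is not None:                  # don't cache failed fetches
                _cache[key] = (value, time.time() + ttl)
            return value
        return wrapper
    return decorator


calls = []


@cached("twitter", ttl=60)
def get_tweets():
    calls.append(1)                                # count real fetches
    return ["tweet one", "tweet two"]              # placeholder, not a real API call


first = get_tweets()
second = get_tweets()
print(first == second, len(calls))  # True 1
```

The calling code just calls `get_tweets()`; whether the value came from the cache or from a fresh fetch is invisible to it, which is exactly the property described above.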

You’ll notice that in the add and get calls I have used “twitter” as the first parameter. This is the key, or identifier, of the cached item, and you can use whatever you like. For example, if you wanted to cache the tweets of multiple people, you could use keys like “twitter~person1” and “twitter~person2”, as long as the add and get calls use the same key.
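Here is a small sketch of that per-person key scheme. `tweets_cache_key`, `fetch_timeline` and `get_tweets_for` are hypothetical helpers; a plain dict stands in for memcache (with no expiry), and `fetch_timeline` returns placeholder data instead of calling Twitter. The point is that the same key-building function is used for both the lookup and the store.

```python
def tweets_cache_key(username):
    """Build a per-user key following the "twitter~person" scheme."""
    return "twitter~" + username


cache = {}     # plain dict standing in for memcache (no expiry here)
fetched = []   # records which users actually triggered a fetch


def fetch_timeline(username):
    # Hypothetical fetch; a real version would call Twitter's API for this user.
    fetched.append(username)
    return ["latest tweet from " + username]


def get_tweets_for(username):
    key = tweets_cache_key(username)   # same key for the lookup and the store
    data = cache.get(key)
    if data is None:
        data = fetch_timeline(username)
        cache[key] = data
    return data


get_tweets_for("person1")
get_tweets_for("person2")
get_tweets_for("person1")   # served from the cache, no second fetch
print(fetched)  # ['person1', 'person2']
```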

And there you have it. My tweets are cached for 60 seconds. You can try this right now and see the difference: the first time you run this code, the HTTP request takes a while, but if you run it again within the next 60 seconds, the results are returned in a split second.

The numbers speak for themselves. These are the average page load times:

Without memcache: 0.869s
With memcache: 0.148s
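The two averages above can be put into relative terms with a little arithmetic, using only the figures quoted in this post:

```python
without_cache = 0.869  # seconds, average page load without memcache
with_cache = 0.148     # seconds, average page load with memcache

saved = without_cache - with_cache      # absolute time saved per page load
speedup = without_cache / with_cache    # how many times faster
reduction = saved / without_cache       # fraction of load time removed

print("saved {:.3f}s, {:.1f}x faster, {:.0%} less time".format(saved, speedup, reduction))
# saved 0.721s, 5.9x faster, 83% less time
```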
