iPhone App Development with Google App Engine

Posted: February 14th, 2010 | Author: Giv | Filed under: Google App Engine, Objective-c, Python | No Comments »

After almost a year of messing around with various iPhone development alternatives such as Phonegap and Titanium, I finally decided to learn Objective-C and do it all properly. I actually think those other frameworks are brilliant as they allow you to use familiar languages like JavaScript to quickly create nice apps for both iPhone and Android. But since they rely heavily on the web view element for loading HTML, creating sophisticated apps like Skype would be impossible.

So I set out to create an app for the iPhone with Objective-C. My app is pretty simple. It basically pulls RSS news, audio podcast and video podcast feeds into a UITableView list, allowing the user to read, listen to and watch news stories from the Democracy Now! website.


I managed to put together the app pretty quickly but I ran into a lot of issues when I tried to parse and massage the XML data. For starters, Cocoa does not have native support for regular expressions (but there are several external libraries). I wanted to clean up the content I was getting back before displaying it to the user but I soon realised that something that would normally take me a few minutes in Python/PHP/JavaScript would take a lot longer in Objective-C. Parsing XML using NSXMLParser was an absolute nightmare and extremely slow. I rarely work with XML these days and find JSON a much easier format to deal with. I even tested the app with some sample JSON data using the excellent json-framework library and it was much easier and faster. Alas, I only had RSS feeds to work with.

The other problem I ran into was slow HTTP requests. It would sometimes take up to 20 seconds just to load the first screen. This was due to a combination of slow connection speeds, long response times from the data provider and a slow XML parser.

The solution I came up with was to do as little as possible in the phone app as far as the data was concerned. I decided to use Google App Engine to fetch the data from the source, parse, rejig, massage and beautify in Python, then serialise and return the results in JSON to the phone app to use.
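
As a rough illustration of the server-side step, here is a minimal sketch in modern Python 3 using only the standard library. The sample feed, the field names and the helper names (`clean`, `rss_to_json`) are invented for this example and are not the app's actual code.

```python
import json
import re
import xml.etree.ElementTree as ET

# Made-up sample feed, standing in for the real RSS response.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example headlines</title>
    <item>
      <title>First story</title>
      <link>http://example.com/1</link>
      <description>&lt;p&gt;Some &lt;b&gt;marked-up&lt;/b&gt; summary.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Second story</title>
      <link>http://example.com/2</link>
      <description>Plain   summary.</description>
    </item>
  </channel>
</rss>"""

TAG_RE = re.compile(r"<[^>]+>")


def clean(text):
    """Strip HTML tags and collapse runs of whitespace."""
    return " ".join(TAG_RE.sub("", text or "").split())


def rss_to_json(rss_text):
    """Parse an RSS 2.0 document and return its items as a JSON string."""
    root = ET.fromstring(rss_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": clean(item.findtext("title")),
            "link": (item.findtext("link") or "").strip(),
            "summary": clean(item.findtext("description")),
        })
    return json.dumps(items)


print(rss_to_json(SAMPLE_RSS))
```

The phone then only has to decode a small, already-cleaned JSON list instead of walking the XML itself.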

It may sound like this would increase response times even more since the phone would have to first call GAE, then GAE would need to call the data source and then send everything all the way back to the phone. This is true for the very first request. However, once the data is with GAE we have the luxury of using memcache and datastore. The RSS and podcast feeds are updated once a day so there’s no reason to request the data from the source every time the user loads the app. Doing so means making the HTTP call, parsing the data and loading it up each time, which is extremely slow and unnecessary. We can just make one request a day, then parse, clean up and cache the results for the next user that requests it.

So the app only talks to GAE. GAE first checks memcache to see if we have a cached version. If we don’t, it will make the HTTP call, fetch the data, parse, serialise, cache and return results. If we do have a cached version, there’s nothing else to do but to return the data. A cron job will also run every 24 hours to make sure memcache is up to date.

If you really want a solid and reliable app, you need to think about all the edge cases also. What happens if the cache expires and the data provider’s website is down? At that exact moment a user loads the app only to get an error message saying there’s nothing to show. An unlikely scenario but not impossible. So the way I got around this issue was to store the serialised JSON output in GAE’s datastore as well. We always use the data from memcache but should memcache be empty and the data source down, we can switch over to the datastore and load yesterday’s content instead. Not ideal but better than having a broken app.
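
The lookup order described above (memcache first, then the data source, then the datastore copy as a last resort) can be sketched roughly as follows. This is plain Python 3 with dictionaries standing in for memcache and the datastore; `load_feed`, `SourceDown` and the `fetch` callable are hypothetical names for illustration, not App Engine APIs.

```python
import json


class SourceDown(Exception):
    """Raised by the fetcher when the data provider cannot be reached."""


def load_feed(key, cache, store, fetch):
    """Return serialised JSON for `key`, preferring the freshest copy.

    cache and store are dict-like stand-ins for memcache and the datastore;
    fetch is a callable that returns parsed data or raises SourceDown.
    """
    cached = cache.get(key)
    if cached is not None:
        return cached                  # 1. fast path: cache hit
    try:
        payload = json.dumps(fetch())  # 2. cache miss: go to the source
    except SourceDown:
        return store.get(key)          # 3. source down: last stored copy, or None
    cache[key] = payload               # refresh both layers on success
    store[key] = payload
    return payload
```

Only when all three layers come up empty does the caller get `None` back, which is the point at which the app would show its error message.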

This is a bit of overkill for such a simple app but it’s super fast and efficient and will work well for almost any app that relies on 3rd-party APIs. To be fair, it was my lack of experience with Objective-C that led me to using GAE. I feel much more comfortable in Python than Objective-C and I’m sure an experienced Cocoa developer would have no problems parsing and massaging data in the app itself.

Of course there is one other edge case – Google App Engine could go down or worse, the interwebz could break. In which case, a simple error message will suffice.

You can download the Democracy Now! app on iTunes


KALX 90.7 FM iPhone App

Posted: February 9th, 2010 | Author: Giv | Filed under: Objective-c | No Comments »

My first iPhone/iPod Touch app is out! I’m waiting for the approval of a second app. When that’s done, I’ll do a proper post about all things Objective-C and iPhone SDK.

KALX app on iTunes


Soup

Posted: January 10th, 2010 | Author: Giv | Filed under: Videos | 2 Comments »

I’ve been meaning to get back into video editing for a while so I spent the weekend putting this together:

Soup from Giv Parvaneh on Vimeo.


Django Gravatar filter

Posted: December 13th, 2009 | Author: Giv | Filed under: Python, Tutorials | 1 Comment »

If you want to add user avatars to your Django app, you can certainly use the excellent django-avatar app. This will let your users upload/edit their own avatars or use Gravatar.

But for my app I only wanted to use Gravatar so I was looking for a simpler solution that let me just pass the user object and an optional size in a template filter and have Gravatar take care of the rest.

The solution is a custom template filter. If you’re already used to using the built-in template filters, you’ll know how useful and easy they are. I wanted my Gravatar filter to be as simple as possible. Something like this:

{{ user|gravatar:20 }}

Where 20 is the optional width/height of the avatar. This would then create an img tag with the full Gravatar URL.

First create your ‘templatetags’ directory and associated files as instructed in the docs. Then create a function that takes in the user object and uses the email address to construct the Gravatar URL:

from django import template
import hashlib
from django.utils.safestring import mark_safe

register = template.Library()

@register.filter()
def gravatar(user, size=50):
    # Gravatar identifies users by the MD5 hash of their trimmed, lower-cased email
    gravatar_url = "http://www.gravatar.com/avatar"
    emailHash = hashlib.md5(user.email.strip().lower().encode('utf-8')).hexdigest()
    # mark_safe stops Django from escaping the generated <img> tag
    return mark_safe("<img src='%s/%s.jpg?d=identicon&s=%s' alt='' />" % (gravatar_url, emailHash, size))
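
To try the URL-building part outside Django, here is a small standalone sketch in Python 3. The function name `gravatar_url` and the sample address are made up for this example; the URL shape is the same one the filter produces.

```python
import hashlib
from urllib.parse import urlencode


def gravatar_url(email, size=50, default="identicon"):
    """Build a Gravatar image URL for an email address."""
    # Gravatar hashes the trimmed, lower-cased address.
    email_hash = hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()
    query = urlencode({"d": default, "s": size})
    return "http://www.gravatar.com/avatar/%s.jpg?%s" % (email_hash, query)


print(gravatar_url("someone@example.com", size=20))
```

Because the address is normalised before hashing, `" Someone@Example.com "` and `"someone@example.com"` give the same URL.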

Test-driven Development in Agile Projects

Posted: November 21st, 2009 | Author: Giv | Filed under: Experiments, PHP, Python | No Comments »

I recently posted this on the BBC Web Developer Blog:

http://www.bbc.co.uk/blogs/webdeveloper/2009/11/testdriven-development-in-agil.shtml

Developers at the BBC tend to use Agile methodologies as a way to quickly release iterations of products. But where does rigorous code testing fit in with the short development and release cycles? How can we maintain the quality of our code when things need to change so fast?



Django Geolocation

Posted: October 31st, 2009 | Author: Giv | Filed under: Python, Tutorials | 2 Comments »

One thing I love about Django models is the ability to override their methods to add extra functionality without having to write any extra code in the admin or view layers.

I have an application where a user can enter an address in the admin section. I want to plot this location on Google Maps later but I don’t want to have to parse the address and do a geocoding lookup in the view layer every time that page is viewed. The best thing to do is to store the lat/long values in the database.

I could do this by messing around with the Django admin templates but I’d rather not even let the user know the geolocation lookup is happening. Besides, what if I want to interact with the DB from the interpreter? The geolocation bit should happen no matter where the database is being used.

This is my model

class Entry(models.Model):
    title = models.CharField(max_length=200)
    description = models.TextField('description', blank=True, null=True)
    address = models.CharField(max_length=200, blank=True, null=True)
    postcode = models.CharField(max_length=200, blank=True, null=True)
    city = models.CharField(max_length=200, blank=True, null=True)
    country = models.CharField(max_length=100)
    geo_lat = models.DecimalField('latitude', max_digits=13, decimal_places=10, blank=True, null=True)
    geo_long = models.DecimalField('longitude', max_digits=13, decimal_places=10, blank=True, null=True)

In Django admin I don’t show ‘geo_lat’ and ‘geo_long’. We just ask the user to enter the address, then before saving, we do the lookup, set the lat/long values and then save the model.

Creating a new entry would still be done the same way from either admin, view or interpreter:

e = Entry(title='a new entry', address='123 Smith Road', city='London', country='UK')
e.save()

But we are going to hijack the save() method to do some extra work before saving to the database. Let’s create a method that does the geolocation lookup first:

import urllib
import urllib2
from django.utils import simplejson as json

def get_geo(address):
    # URL-encode the address so it is safe to use in the query string
    address = urllib.quote(address)
    # sensor and key are placeholders here: substitute your own values
    url = "http://maps.google.com/maps/geo?q=%s&output=json&oe=utf8&sensor=true_or_false&key=12345" % (address)
    data = urllib2.urlopen(url)
    obj = json.loads( data.read() )
    if obj['Status']['code'] == 200:
        # coordinates come back as [longitude, latitude, altitude]
        data = obj['Placemark'][0]['Point']['coordinates']
    else:
        raise Exception('Invalid address')
    return data

We pass the address to Google and if we get a 200 status code, we grab the coordinates and return them. Note that they come back with the longitude first and the latitude second.

Now let’s call this method in our overridden save() method (inside the Entry model):

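def save(self, *args, **kwargs):
    # Build one address string from the separate fields
    add = "%s, %s, %s, %s" % (self.address, self.postcode, self.city, self.country)
    # get_geo() is assumed to live in a utils module; it returns [longitude, latitude, ...]
    geo_data = utils.get_geo(add)
    self.geo_long = str(geo_data[0])
    self.geo_lat = str(geo_data[1])
    super(Entry, self).save(*args, **kwargs) # Call the "real" save() method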

This could be improved so that instead of throwing an exception for bad addresses we handle it more gracefully by informing the user, or at least save the address and skip the geolocation lookup. But either way, the model is now responsible for doing the extra work before saving the new/updated data.
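
One possible shape for the "save the address and skip the lookup" option is sketched below in plain Python 3. It assumes get_geo() is changed to raise a dedicated exception; `GeocodingError` and `coordinates_or_none` are hypothetical names introduced for this example only.

```python
class GeocodingError(Exception):
    """Raised by a geocoder when an address cannot be resolved."""


def coordinates_or_none(address, geocode):
    """Return (latitude, longitude) for address, or (None, None) on failure.

    geocode is any callable that returns [longitude, latitude, ...] or
    raises GeocodingError.
    """
    try:
        point = geocode(address)
    except GeocodingError:
        # Bad address: keep going without coordinates instead of failing the save
        return None, None
    return point[1], point[0]
```

Inside save(), the two assignments could then become something like `self.geo_lat, self.geo_long = coordinates_or_none(add, utils.get_geo)`, which works because both fields already allow null values.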


HTML Scraping – Python vs PHP (re-post)

Posted: October 26th, 2009 | Author: Giv | Filed under: Experiments, PHP, Python | No Comments »

I posted this last year on my old blog. The subject came up again today so I dug it up from the archives:

I usually hate these “X vs Y” discussions but this week I was working on a harvesting project and was trying to figure out whether I should go down the Python or PHP route. I have been using PHP for many years now so PHP was the obvious choice but recently I have been using Python a fair bit and the more I use it, the more I realise what a sloppy language PHP is.

So I set out to do some tests to see which would get my task done quicker.

For Python I used the Beautiful Soup module and for PHP I used PHP Simple HTML DOM Parser class.

The test:
Go to http://news.bbc.co.uk, look for the third <p> tag and return the text for the first link within that paragraph.

Both classes make this very easy to do so coding was not a concern:

in Python:

import urllib2
from BeautifulSoup import BeautifulSoup
page = urllib2.urlopen("http://news.bbc.co.uk")
soup = BeautifulSoup(page)
print soup.findAll('p')[2].findAll('a')[0].string

in PHP:

$html = file_get_html('http://news.bbc.co.uk');
echo $html->find('p', 2)->find('a', 0)->innertext();

I ran each 5 times to measure the execution time.

Python:
0.622022151947 seconds
0.577415943146 seconds
0.518396139145 seconds
0.503247022629 seconds
0.482849121094 seconds

PHP:
0.430239915848 seconds
0.415632009506 seconds
0.408473014832 seconds
0.413187026978 seconds
0.411664962769 seconds

Pretty damn close but PHP is on average a bit faster it seems: roughly 0.42 seconds per run against 0.54 for Python.
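
For anyone who wants to check the averages, this short Python 3 snippet recomputes them from the ten timings listed above. The variable names are just for this example.

```python
from statistics import mean

# Timings copied from the two lists above, in seconds
python_runs = [0.622022151947, 0.577415943146, 0.518396139145,
               0.503247022629, 0.482849121094]
php_runs = [0.430239915848, 0.415632009506, 0.408473014832,
            0.413187026978, 0.411664962769]

python_avg = mean(python_runs)
php_avg = mean(php_runs)

print("Python: %.3f s, PHP: %.3f s" % (python_avg, php_avg))
print("Difference: %.3f s per run" % (python_avg - php_avg))
```

With only five runs each and a live network request in the loop, a gap of this size is well within what network jitter alone could produce.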

To be fair, this really isn’t a very good way to measure the performance of the two. The real test, in my opinion, would be to see how scalable each method is and how they handle memory management once I start scraping say the entire Wikipedia collection. I could be wrong here but from what I’ve read so far, Python is the tool of choice for such heavy processing tasks.

Either way, I think I’ll go with Python :)


App Engine Datastore API – Tutorial 2

Posted: October 15th, 2009 | Author: Giv | Filed under: Python, Tutorials | No Comments »

In the last tutorial we quickly created a couple of models for storing our recipes and ingredients. In this tutorial we are going to assign ingredients to recipes as a one-to-many relationship.

But before we do that, we need to modify our data models a bit. The Recipe model stays the same but we’ll need a way of referencing the ingredients to add them to a single recipe. Like an SQL foreign key. We can do this using the “ReferenceProperty”:

class Ingredient(db.Model):
    name = db.StringProperty()
    recipe = db.ReferenceProperty(Recipe, collection_name='recipe_items')
    created = db.DateTimeProperty(auto_now_add=True)

The recipe ReferenceProperty allows us to associate ingredients with a Recipe object. Creating this relationship is straightforward:

# first create a recipe
recipe = Recipe(title = "Caprese Salad", description = "Light Italian classic").put()
 
# now create 3 ingredients and add them to the recipe object
Ingredient(recipe=recipe, name='Tomato').put()
Ingredient(recipe=recipe, name='Mozzarella').put()
Ingredient(recipe=recipe, name='Basil').put()

We just created 4 entries, 1 recipe and 3 ingredients. We can query them individually:

recipes = Recipe.all()
ingredients = Ingredient.all()

Except we’ve created a relationship between these entries so we can loop through all of our recipes and for each check to see if there are any associated ingredients. We can do this by using the collection_name we specified in our Ingredient model (“recipe_items”). So we can just get back a list of our recipes and send the whole list to the view like we did last time:

recipes = Recipe.all()
recipes.order("title")
recipeResults = recipes.fetch(limit=40)
 
# again, I'm just using Django templates
return render_to_response('main/index.html', {'recipeResults': recipeResults})

We don’t need to perform a separate query in the controller to get back the ingredients. Each recipe’s associated ingredients are reachable through the reference, so we can access them directly in the view:

<ul>
    {% for recipe in recipeResults %}
        <li>
            {{ recipe.title }}
            {# now for each recipe we'll loop through its associated ingredients #}
               ({% for ing in recipe.recipe_items %}
                   {{ ing.name }},
               {% endfor %})
        </li>
    {% endfor %}
</ul>
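
If the back-reference feels like magic, the following plain Python 3 sketch shows roughly what it amounts to: looking up every ingredient whose reference points at a given recipe. This is only an in-memory analogy with made-up helper names, not how the datastore is implemented.

```python
class Recipe(object):
    def __init__(self, title):
        self.title = title


class Ingredient(object):
    _all = []  # in-memory stand-in for the stored Ingredient entities

    def __init__(self, recipe, name):
        self.recipe = recipe
        self.name = name
        Ingredient._all.append(self)


def recipe_items(recipe):
    """Analogy for recipe.recipe_items: ingredients that point at this recipe."""
    return [ing for ing in Ingredient._all if ing.recipe is recipe]


salad = Recipe("Caprese Salad")
for name in ("Tomato", "Mozzarella", "Basil"):
    Ingredient(salad, name)

print([ing.name for ing in recipe_items(salad)])
```

The relationship lives entirely on the "many" side (the ingredient), which is why the Recipe model did not need to change.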

If you want to go further, I recommend checking out the modeling entity relationship docs. There are plenty of great examples.


App Engine Datastore API – Tutorial 1

Posted: October 11th, 2009 | Author: Giv | Filed under: Python, Tutorials | No Comments »

I’ve spent the last few days messing about with Google App Engine and I have to admit I’m liking it – a lot!

As a web engineer, the most tedious part of developing for me is configuring environments and setting up databases. I just want to code and not have to worry about configuration and optimisation. So I guess that’s why I’m enjoying App Engine so much.

One of the really exciting aspects of developing in AE so far has been working with the modeling API. It’s nice not having to deal with messy SQL schemas and complicated joins. It’s all pure OO programming. There’s a lot of good documentation on the Google site but this will help you get going quickly.

For our sample app, we want to create a recipe site. We’ll keep it simple for now. We have recipes and each recipe has a bunch of ingredients so we’ll create an object that represents a single recipe and one that represents a single ingredient.

from google.appengine.ext import db

class Recipe(db.Model):
    title = db.StringProperty()
    description = db.TextProperty()
    created = db.DateTimeProperty(auto_now_add=True)

class Ingredient(db.Model):
    name = db.StringProperty()
    created = db.DateTimeProperty(auto_now_add=True)

This should be pretty straightforward. In SQL terms, we have created 2 tables with a bunch of fields and each field has a certain type like string, text, datetime etc. You don’t have to run any commands to create the database/tables. It’s all done at runtime.

Now that we have our models, let’s populate them.

# create a new recipe
recipe = Recipe()

# set the values of each field
recipe.title = "Caprese Salad"
recipe.description = "Light Italian classic"

# save it!
recipe.put()

You can also do the above like this:

recipe = Recipe(title = "Caprese Salad", description = "Light Italian classic").put()

You can keep adding more recipes by creating a new instance of the Recipe model.

Now that we have a bunch of recipes, let’s get them all out and then output them in our view layer as HTML:

recipes = Recipe.all()
recipes.order("title")
recipeResults = recipes.fetch(limit=40)
 
# I'm just using standard Django templates here
return render_to_response('index.html', {'recipeResults': recipeResults})

And in our template we can just loop through the results

<ul>
{% for recipe in recipeResults %}
	<li>
		{{ recipe.title }} - {{ recipe.description }} 
	</li>
{% endfor %}
</ul>

That’s it! And not a single line of SQL.

In the next tutorial we’ll create some ingredients and I’ll show how to do one-to-many relationships.


Memcached and Google App Engine

Posted: October 9th, 2009 | Author: Giv | Filed under: Python, Tutorials | No Comments »

Memcached is your friend.

I have to admit I’m new to the caching world. In the past I’ve let my databases do the work for me. Caching MySQL queries and indexing helped a lot with speed and scalability and I never really knew how to optimise HTTP requests in my mashups (p.s. that’s the last time you’ll hear me use that horrid word).

When I started working at the BBC, I quickly realised my applications needed to scale better. We are no longer talking about a few hundred page loads an hour – more like millions. Thankfully, I learned a lot about scalability and the power of distributed memory caching. With memcached I was able to make fewer requests and this meant less load on the database and that made a drastic improvement on page load speeds.

Now I have a cheap shared host running my personal sites and it’s unlikely to have a service like memcache on these cheap plans but I was able to experiment on Google App Engine using Python and Google’s memcache service. I figured I would do a mini tutorial here as my first blog entry.

The function of memcache is really basic. Let’s say we want to grab my latest tweets and display them on my site. Let’s assume I get thousands of visitors on my site, so every time someone loads my page, I have to make an HTTP request to Twitter’s API, get the data, parse it and display it.

This is unnecessary. All visitors would see the same list of tweets so why make a separate request for each? It makes more sense to cache the results the first time it is requested and then serve up the cached version to the rest of the visitors. That’s where memcache comes in. We can store the results of either an HTTP request or a database query for a set duration, say 60 seconds. By doing so, we’ve gone from making hundreds of HTTP requests a minute to just 1. This naturally improves performance and I’m sure Twitter’s operations guys will thank you for it.

Ok so let’s look at how we would do this in App Engine.

Let’s create a method called getTweets(). This will do one of two things. First it’ll ask memcache if there is an existing cache for my tweets. If so, it’ll return the list. Otherwise, make the HTTP call, get back data and store in memcache for the next user then return the data.

from google.appengine.api import memcache
from google.appengine.api import urlfetch
from django.utils import simplejson as json

def getTweets():
    # try the cache first
    data = memcache.get("twitter")
    if data is not None:
        return data
    else:
        # cache miss: fetch the timeline from Twitter
        result = urlfetch.fetch(url="http://twitter.com/statuses/user_timeline/givp.json")
        if result.status_code == 200:
            data = json.loads(result.content)
            # only cache a successful response, for 60 seconds
            memcache.add("twitter", data, 60)
        return data

data = getTweets()

That’s it! The thing using the getTweets() method doesn’t need to worry about how the data is being fetched because the returned results are exactly the same either way.

You’ll notice in the add and get methods I have used “twitter” as the first parameter. This is the key or identifier of the cached item and you can use whatever you like. For example, if you wanted to cache the tweets of multiple people, you could use keys like “twitter~person1” and “twitter~person2”, as long as both the add and get methods use the same key for the same person.
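
To see per-person keys and the 60-second expiry in action without App Engine, here is a small Python 3 sketch. `TTLCache` is a made-up in-memory stand-in that mimics only the get/add behaviour used above, and `get_tweets_for` and its `fetch` argument are hypothetical names for this example.

```python
import time


class TTLCache(object):
    """Tiny in-memory stand-in for memcache: values expire after `seconds`."""

    def __init__(self, clock=time.time):
        self._items = {}
        self._clock = clock

    def get(self, key):
        entry = self._items.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._items[key]  # expired: behave as if it was never cached
            return None
        return value

    def add(self, key, value, seconds):
        # Like memcache.add: only store the value if the key is not already set
        if self.get(key) is None:
            self._items[key] = (value, self._clock() + seconds)
            return True
        return False


def get_tweets_for(username, cache, fetch, seconds=60):
    """Return cached tweets for username, calling fetch(username) on a miss."""
    key = "twitter~%s" % username  # one cache entry per person
    data = cache.get(key)
    if data is None:
        data = fetch(username)
        cache.add(key, data, seconds)
    return data
```

Passing the clock in as a parameter is just a convenience so the expiry can be exercised without actually waiting a minute.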

And there you have it. My tweets are cached for 60 seconds. You can try this right now and actually see the difference. The first time you run this code it takes a while for the HTTP request but if you run it again after the initial execution the results are returned in a split second.

The numbers speak for themselves. These are average page load speeds:

Without memcache: 0.869s
With memcache: 0.148s
