I have been trying to think of a clever but relatively simple way to use characteristics derived from photos to search for similar photos/objects in a database. I wanted this process to be completely automated the same way Open Calais uses keywords to tag and connect blocks of text.
One obvious way is to use a tool similar to Flickr’s notes and allow users to select portions of the photo and tag or label them. We can then search the database for similar tags in photos and establish a network. But this requires a fair bit of work and I’m not sure a lot of users would bother using this tool.
So how do you automate this process without using complex image recognition software and an MIT degree? Simple answer: you probably can’t. But we can do something else that doesn’t require a lot of work: it is completely automated, the tools are free, and it allows you to connect your collection objects through a medium other than the standard tags and text search.
Colour!
I got this idea from an experimental tool created by Jim Bumgardner a while ago and decided to create my own version to use against museum collections.
Take this image from the Walker Art Center database for example:

It would be nice to display, say, the 4 most commonly used colours in this photo and then allow users to search for other objects with the same colours. Here’s what I did:
1. Dither image down to 256 colours
2. Cycle through every single pixel, and extract RGB values
3. Omit white colours (since most objects will have the colour white)
4. Omit duplicate colours from the spectrum
5. Get an average of all the remaining colours and grab the top 4 colours
Now we have 4 of the most prominent colours in this photo and here they are:

We can now store these colour values in the database along with other information about the object.
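As a rough illustration, the five steps above could be sketched in Python along these lines. This is not the original implementation, and several details are assumptions: the function names are made up, "dithering" is approximated by snapping each channel to a coarse bucket, "white" means every channel at or above a threshold, and "top 4" is read as the four most frequent remaining colours. The input is any iterable of (R, G, B) tuples, however you choose to load them.

```python
from collections import Counter


def quantize(pixel, step=32):
    """Snap each channel to the centre of a `step`-wide bucket.

    Assumption: a crude stand-in for dithering down to a small palette.
    """
    return tuple((c // step) * step + step // 2 for c in pixel)


def is_near_white(pixel, threshold=240):
    """Treat a pixel as white when every channel is at or above `threshold`."""
    return all(c >= threshold for c in pixel)


def top_colours(pixels, n=4, step=32, white_threshold=240):
    """Return the `n` most frequent non-white colours after quantizing."""
    # Counting collapses duplicate colours into a single entry with a tally.
    counts = Counter(
        quantize(p, step) for p in pixels if not is_near_white(p, white_threshold)
    )
    return [colour for colour, _ in counts.most_common(n)]


def to_hex(pixel):
    """Format an (R, G, B) tuple as a hex string such as #CC3399."""
    return "#{:02X}{:02X}{:02X}".format(*pixel)
```

The hex strings from `to_hex` are one convenient form to store alongside the object record; keeping the three channel values as separate numbers as well makes the proximity search described later easier.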
How do we use the colour information?
One option is to store the hex values of the colours and let users search for an exact colour, e.g. “show me all objects that contain the hex colour #CC3399”.
Or we can have more intelligent colour searches by storing the RGB values and performing a proximity search, selecting objects that contain colours close to #CC3399. For example, say we search for the colour R: 100, G: 100, B: 100. Instead of requiring an exact colour match, we can let the database accept values within ±10 on each colour channel, so it finds similar shades of the requested colour.
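One way the proximity search could look, sketched with Python's built-in sqlite3 module. The table name `object_colours`, its columns and the function names are all hypothetical; the only idea taken from the text is the ±10 window on each channel.

```python
import sqlite3


def hex_to_rgb(hex_colour):
    """Convert a string such as '#CC3399' to an (R, G, B) tuple."""
    h = hex_colour.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def create_colour_table(conn):
    """Hypothetical schema: one row per prominent colour per object."""
    conn.execute(
        "CREATE TABLE object_colours "
        "(object_id TEXT, r INTEGER, g INTEGER, b INTEGER)"
    )


def find_similar(conn, rgb, tolerance=10):
    """Return ids of objects with a stored colour within +-tolerance per channel."""
    r, g, b = rgb
    rows = conn.execute(
        """
        SELECT DISTINCT object_id FROM object_colours
        WHERE r BETWEEN ? AND ?
          AND g BETWEEN ? AND ?
          AND b BETWEEN ? AND ?
        ORDER BY object_id
        """,
        (r - tolerance, r + tolerance,
         g - tolerance, g + tolerance,
         b - tolerance, b + tolerance),
    )
    return [row[0] for row in rows]
```

For example, `find_similar(conn, hex_to_rgb("#CC3399"))` would look for objects with a colour near #CC3399. Note that BETWEEN is inclusive, so a channel value exactly 10 away still counts as a match.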
For the sake of experimentation, I wanted to go one step further. It’s nice having the option to display objects by a single colour, but what if I wanted to search for all objects that have a dark blue in the middle only? Now this is starting to look more serious. The example image above has a lot of blue in it but the strongest shades of blue appear horizontally in the middle. So we should be able to search for objects with a similar pattern. Here’s how I tackled this challenge.

1. Slice the image into 16 segments. This is just an arbitrary number. 4 didn’t seem like it was enough and 32 was too much.
2. Cycle through the pixels of each segment and store a SINGLE average RGB value for that segment, i.e. one colour that represents the whole segment.
3. Store the value in the database, associating the colour with its segment number (segment #1 for the first one).
4. Repeat the process for all 16 segments and store in database.
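A minimal sketch of the slicing steps above, assuming the 16 segments form a 4 x 4 grid (the text does not say how the image is cut) and that the image is given as a list of rows, each row a list of (R, G, B) tuples. The function name is hypothetical, and the per-segment value here is the plain mean of each channel.

```python
def segment_averages(rows, grid=4):
    """Return grid*grid average colours, ordered left to right, top to bottom.

    Assumes the image is at least `grid` pixels wide and tall, so that
    every segment contains at least one pixel.
    """
    height = len(rows)
    width = len(rows[0])
    averages = []
    for gy in range(grid):
        # Integer boundaries so the segments cover the image without overlap.
        y0, y1 = gy * height // grid, (gy + 1) * height // grid
        for gx in range(grid):
            x0, x1 = gx * width // grid, (gx + 1) * width // grid
            block = [rows[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            count = len(block)
            # Mean of each channel, rounded down to an integer.
            averages.append(
                tuple(sum(p[ch] for p in block) // count for ch in range(3))
            )
    return averages
```

With this ordering, segment #1 is the first entry in the returned list (index 0) and the four central segments of a 4 x 4 grid sit at indices 5, 6, 9 and 10.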
Once we are finished, this is the information we end up with:

It looks like a low-resolution version of the original image and that’s exactly what it is. Except we now know which colour is used in which part of the image. Now using the proximity search method mentioned earlier and with some fancy SQL queries, we should be able to not only find colour matches but also patterns. The colour data here should return all objects that are predominantly blue where the darkest shades are in the middle of the image.
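To make the pattern search concrete, here is a small in-memory sketch in Python rather than SQL. It assumes each object is stored as a list of 16 segment colours in row-major order, and that a query names only the segments it cares about; the function names and the data layout are hypothetical. The ±10 per-channel tolerance is the same idea as the proximity search described earlier.

```python
def colours_close(a, b, tolerance=10):
    """True when every channel of `a` is within `tolerance` of `b`."""
    return all(abs(x - y) <= tolerance for x, y in zip(a, b))


def pattern_matches(query, segments, tolerance=10):
    """`query` maps a segment index to the colour wanted in that segment."""
    return all(
        colours_close(segments[index], colour, tolerance)
        for index, colour in query.items()
    )


def search_by_pattern(collection, query, tolerance=10):
    """Return sorted ids of objects whose segments satisfy every query entry."""
    return sorted(
        object_id
        for object_id, segments in collection.items()
        if pattern_matches(query, segments, tolerance)
    )
```

A query for "dark blue in the middle only" would then be something like `{5: dark_blue, 6: dark_blue, 9: dark_blue, 10: dark_blue}`, leaving the outer segments unconstrained.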
Conclusion
The processes above may seem like a lot of work for the CPU, but they really aren’t. I have tested this process on several images and the entire analysis for each image took less than 1 second.
It may not be necessary to use the second method at all. Doing a basic colour search may be sufficient in most cases. Having the ability to search a collection database by colour is pretty powerful in my opinion and adding this extra metadata to your objects won’t require much work at all!
So why not just batch process your images and store the colour value in the database rather than processing each time? It’s not as if the images change much.
This is pretty much what the Sculpteur people at Uni of Southampton were doing a few years back - they were also doing shape recognition with what I think became an open source toolset.
See - http://www.archimuse.com/mw2005/papers/addis/addis.html
I don’t know what became of that project but I expect that content provision rather than technical barrier befell it.
Something like this is actually on my list to experiment with - thanks for posting your results! I think it would be really useful to grab the dataset from this experiment so you could start searching based on color words rather than just RGB. Could be pretty cool…
Yes, you would definitely do this as a batch process once.
I did some experimentation with multi-color analysis some months back, including a two-color picker, and this thing:
http://www.krazydad.com/colrpickr/2colors.php
My method for determining the top-N colors was a little different than what you described. I used something called k-means cluster analysis, although I imagine the results are pretty similar. I also used this technique to weed out unsaturated images from the “Catchy Colors” pool, and to identify images with too many mismatched colors in the “Color Fields” pool.
Hi Jim. I love your work and I’ll definitely check out the k-means algorithm. Thank you.