Archive for October, 2008

Yahoo Search Monkey for Collections

I have to admit I can’t remember the last time I used Yahoo as a search engine. But I recently looked into Yahoo’s Search Monkey and was very impressed. Normally, you create your web page, Google picks it up and, provided you’ve got a proper HTML title, meta tags and so on, Google displays the title and description in its search results. Well and good. But wouldn’t it be nice if you had some control over how the search results were displayed?

Have a look at the search results for “The Smiths” in Yahoo’s search engine. Notice how the Last.fm results display extra information such as genre, similar artists and even a thumbnail:

I think this is pretty amazing and it would be perfect for museum collection objects.

Here’s a result from the British Museum with the default search engine format:

It doesn’t really look any different from any other search result. So let’s beautify it a bit by showing the object thumbnail, the dimensions and the owner:

Much nicer. This took me just a few minutes to achieve and you can put pretty much anything you like in these result blocks. It doesn’t require any extra set-up on your end: all you need to do is tell Yahoo, using XPath, which elements on the page you want to display.
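To give a feel for what “telling Yahoo which elements to display using XPath” amounts to, here is a minimal sketch in Python. It is not Search Monkey’s own format or API. It only shows the idea of mapping result fields to XPath expressions, run locally with the standard library. The page markup, class names and field names are all made up for illustration, and a real collection page would need its own expressions.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a collection object page.
PAGE = """
<html>
  <body>
    <div id="object">
      <h1>Bronze figure of a cat</h1>
      <img class="thumb" src="/images/cat_thumb.jpg" alt="Bronze cat" />
      <dl>
        <dt>Dimensions</dt><dd class="dimensions">Height: 42 cm</dd>
        <dt>Owner</dt><dd class="owner">Example Museum</dd>
      </dl>
    </div>
  </body>
</html>
"""

# Field name -> XPath expression. ElementTree only supports a subset of
# XPath, which is enough for simple element and attribute lookups.
TEXT_FIELDS = {
    "title": ".//div[@id='object']/h1",
    "dimensions": ".//dd[@class='dimensions']",
    "owner": ".//dd[@class='owner']",
}
THUMBNAIL_PATH = ".//img[@class='thumb']"


def extract_fields(markup):
    """Return a dict of the fields we would want shown in a search result."""
    root = ET.fromstring(markup.strip())
    result = {}
    for name, path in TEXT_FIELDS.items():
        node = root.find(path)
        has_text = node is not None and node.text
        result[name] = node.text.strip() if has_text else None
    # The thumbnail lives in an attribute rather than in the element text.
    thumb = root.find(THUMBNAIL_PATH)
    result["thumbnail"] = thumb.get("src") if thumb is not None else None
    return result


if __name__ == "__main__":
    for field, value in extract_fields(PAGE).items():
        print(field, "=", value)
```

The point is simply that each piece of the enhanced result (thumbnail, dimensions, owner) is one expression pointing at something already on the page.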

There’s obviously a lot more you can do with Search Monkey. You can set up patterns to apply the same format to all your pages and you can even use your own custom data feeds to display information that doesn’t exist on the page.

I’m hoping Google will do something similar very soon.

Content analysis and auto-tagging

When I posted my last entry I didn’t think the next one would be 4 months later! I have been extremely busy with work and haven’t had much time to experiment with anything in my spare time. I have been involved with the National Museums Online Learning Project over at the V&A in London and I am in the process of creating a federated search component across 9 national museums. I have also been fortunate enough to help some of the project partners develop or improve the existing collection search pages on their own sites.

Currently I am experimenting with content analysis or auto-tagging. I initially decided to follow in the footsteps of PHM’s collection search and use Open Calais to see what it believes to be significant in the object description. I have to admit I was a bit disappointed. I don’t think OC is suitable for museum content since it mostly looks for news-related keywords:

Entities
Anniversary, City, Company, Continent, Country, Currency, EmailAddress, EntertainmentAwardEvent, Facility, FaxNumber, Holiday, IndustryTerm, MarketIndex, MedicalCondition, MedicalTreatment, Movie, MusicAlbum, MusicGroup, NaturalDisaster, NaturalFeature, OperatingSystem, Organization, Person, PhoneNumber, Product, ProgrammingLanguage, ProvinceOrState, PublishedMedium, RadioProgram, RadioStation, Region, SportsEvent, SportsGame, SportsLeague, Technology, TVShow, TVStation, URL

Events and Facts
Acquisition, Alliance, AnalystEarningsEstimate, AnalystRecommendation, Bankruptcy, BonusShares, BusinessRelation, Buybacks, CompanyAffiliates, CompanyCustomer, CompanyEarningsAnnouncement, CompanyEarningsGuidance, CompanyInvestment, CompanyLegalIssues, CompanyLocation, CompanyMeeting, CompanyReorganization, CompanyTechnology, CompanyTicker, ConferenceCall, CreditRating, FamilyRelation, IPO, JointVenture, ManagementChange, Merger, MovieRelease, MusicAlbumRelease, PersonAttributes, PersonCommunication, PersonEducation, PersonPolitical, PersonPoliticalPast, PersonProfessional, PersonProfessionalPast, PersonTravel, Quotation, StockSplit

When I attempted to extract the significant keywords from the following object, I only got 2 tags back:

There are clearly other words in the description that are meaningful. What about the most obvious keyword, “dress”?!

I then tried using Yahoo’s Content Extraction service and I was much happier with the results:

Of course, the advantage of Open Calais is that it neatly groups your tags into specific categories (see the list above) but again, this being a Reuters project, it is very much news-centric and it ignores a lot of important semantic metadata we want to see with museum content.
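For comparison, here is a very naive auto-tagging sketch in Python. It is not how Open Calais or Yahoo’s Content Extraction service works (I don’t know their internals). It just counts words that aren’t in a small stop-word list, which is about the simplest thing that could pick up an obvious keyword like “dress”. The object description and the stop-word list are invented for the example.

```python
import re
from collections import Counter

# A deliberately tiny stop-word list; a real one would be much longer.
STOPWORDS = {
    "the", "and", "has", "with", "was", "for", "from", "this", "that",
    "are", "its", "which", "made",
}

# Hypothetical object description, not taken from any real collection record.
DESCRIPTION = (
    "Evening dress of embroidered silk satin. The dress has a fitted bodice "
    "and a full skirt, and the silk is trimmed with lace."
)


def extract_keywords(text, limit=5):
    """Return up to `limit` of the most frequent non-stop-words in `text`."""
    words = re.findall(r"[a-z]+", text.lower())
    # Drop very short words and stop words, then count what is left.
    counts = Counter(
        word for word in words if len(word) > 2 and word not in STOPWORDS
    )
    return [word for word, _ in counts.most_common(limit)]


if __name__ == "__main__":
    print(extract_keywords(DESCRIPTION))
```

Something this crude has no notion of categories, of course, which is exactly what Open Calais adds. But it does show why getting only 2 tags back from a full object description feels thin.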