When I posted my last entry I didn’t think the next one would be four months later! I have been extremely busy with work and haven’t had much spare time to experiment with anything. I have been involved with the National Museums Online Learning Project over at the V&A in London, where I am creating a federated search component across nine national museums. I have also been fortunate enough to help some of the project partners develop or improve the existing collection search pages on their own sites.
Currently I am experimenting with content analysis, or auto-tagging. I initially decided to follow in the footsteps of PHM’s collection search and use Open Calais to see what it believes to be significant in an object description. I have to admit I was a bit disappointed. I don’t think Open Calais is suitable for museum content, since it mostly looks for news-related entities, events and facts:
Entities
Anniversary, City, Company, Continent, Country, Currency, EmailAddress, EntertainmentAwardEvent, Facility, FaxNumber, Holiday, IndustryTerm, MarketIndex, MedicalCondition, MedicalTreatment, Movie, MusicAlbum, MusicGroup, NaturalDisaster, NaturalFeature, OperatingSystem, Organization, Person, PhoneNumber, Product, ProgrammingLanguage, ProvinceOrState, PublishedMedium, RadioProgram, RadioStation, Region, SportsEvent, SportsGame, SportsLeague, Technology, TVShow, TVStation, URL
Events and Facts
Acquisition, Alliance, AnalystEarningsEstimate, AnalystRecommendation, Bankruptcy, BonusShares, BusinessRelation, Buybacks, CompanyAffiliates, CompanyCustomer, CompanyEarningsAnnouncement, CompanyEarningsGuidance, CompanyInvestment, CompanyLegalIssues, CompanyLocation, CompanyMeeting, CompanyReorganization, CompanyTechnology, CompanyTicker, ConferenceCall, CreditRating, FamilyRelation, IPO, JointVenture, ManagementChange, Merger, MovieRelease, MusicAlbumRelease, PersonAttributes, PersonCommunication, PersonEducation, PersonPolitical, PersonPoliticalPast, PersonProfessional, PersonProfessionalPast, PersonTravel, Quotation, StockSplit
When I attempted to extract the significant keywords from the following object, I got only two tags back:
There are clearly other words in the description that are meaningful. What about the most obvious keyword, “dress”?!
I then tried using Yahoo’s Content Extraction service and I was much happier with the results:
Of course, the advantage of Open Calais is that it neatly groups your tags into specific categories (see the lists above). But again, this being a Reuters project, it is very much news-centric, and it ignores a lot of the important semantic metadata we want to see with museum content.
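To make the "what about dress?" complaint a bit more concrete, here is a rough Python sketch of how one might check which obvious words a tagging service missed. It is not how Open Calais or Yahoo’s service works; it is just a naive word-frequency count. The function names, the tiny stop-word list, the sample description and the sample tags are all made up for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; a real list would be much longer).
STOP_WORDS = {
    "a", "an", "and", "the", "of", "in", "with", "for", "to", "is",
    "was", "by", "on", "at", "from", "this", "that", "it", "as",
}


def candidate_keywords(description, top_n=5):
    """Return the most frequent words that are not stop words."""
    words = re.findall(r"[a-z]+", description.lower())
    counts = Counter(
        w for w in words if w not in STOP_WORDS and len(w) > 2
    )
    return [word for word, _ in counts.most_common(top_n)]


def missed_keywords(description, service_tags, top_n=5):
    """Return candidate keywords that appear in none of the service's tags."""
    tag_text = " ".join(service_tags).lower()
    return [
        w for w in candidate_keywords(description, top_n)
        if w not in tag_text
    ]


# Hypothetical object description and tags, invented for this example only.
description = (
    "Evening dress of silk satin. "
    "The dress is trimmed with lace and was made in London."
)
tags = ["London"]

print(missed_keywords(description, tags))
```

With a check like this, a word that is repeated in the description but absent from the returned tags stands out straight away, which is a cheap way to sanity-check any auto-tagging service against your own collection records.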

