Flickr Geocodr – k-means cluster enabled geocoder

Here’s a geocoder that finds places using the combined knowledge in the tags, titles and descriptions of geotagged Flickr photos (see the Example Application). Now that Flickr has 10 million geotagged photos, there’s a fair chance that people have added tags that describe the location. So searching for photos tagged with Manchester will probably bring up a lot of photos that are located in Manchester. However, it will also bring up other groups of photos located elsewhere, and this is where clustering comes into play.

geocodr.png

This work is inspired by Mikel Maron’s Flickr Geocoder, which grabs photos via the GeoRSS feed and uses the mean value of their locations. That approach has two limitations. The Flickr GeoRSS feed Mikel used also includes photos that have no location information, so we have to use an API call (flickr.photos.search) to grab more geotagged photos. And a simple mean value doesn’t take into account that the photos found may be clustered in several different areas.
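To make the problem with a simple mean concrete, here is a small sketch. The sample coordinates are made up for illustration (three points near Manchester, UK and three near Manchester, New Hampshire) and are not real Flickr results; `meanLocation` is a hypothetical helper, not code from either geocoder.

```javascript
// Hypothetical sample: approximate coordinates, not real Flickr data.
const photos = [
  { lat: 53.48, lon: -2.24 },  // near Manchester, UK
  { lat: 53.47, lon: -2.25 },
  { lat: 53.49, lon: -2.23 },
  { lat: 42.99, lon: -71.46 }, // near Manchester, New Hampshire
  { lat: 43.00, lon: -71.45 },
  { lat: 42.98, lon: -71.47 },
];

// Plain arithmetic mean of latitude and longitude.
function meanLocation(points) {
  const sum = points.reduce(
    (acc, p) => ({ lat: acc.lat + p.lat, lon: acc.lon + p.lon }),
    { lat: 0, lon: 0 }
  );
  return { lat: sum.lat / points.length, lon: sum.lon / points.length };
}

const centre = meanLocation(photos);
console.log(centre.lat.toFixed(1), centre.lon.toFixed(1));
```

The mean comes out at roughly 48.2 degrees north, 36.9 degrees west, which is in the middle of the Atlantic and matches neither Manchester.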

I ported a Java k-means clustering algorithm to PHP, changing it along the way. The clustering process seems to be very fast.

world-manchester4wide.jpg
This screenshot shows a search for Manchester across the world, along with the number of clusters. The geocoder picks out the cluster containing the largest number of points.
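The idea of clustering the points and then keeping the biggest cluster can be sketched as follows. This is not the PHP port itself, just a minimal k-means written for illustration. It assumes squared distance in raw degrees (ignoring the Earth's curvature and the wrap at ±180° longitude) and seeds the centroids with evenly spaced input points; the function names and sample points are all hypothetical.

```javascript
// Squared distance in raw degrees: a simplification for illustration.
function dist2(a, b) {
  const dLat = a.lat - b.lat;
  const dLon = a.lon - b.lon;
  return dLat * dLat + dLon * dLon;
}

// Index of the centroid closest to the given point.
function nearestIndex(point, centroids) {
  let best = 0;
  for (let i = 1; i < centroids.length; i++) {
    if (dist2(point, centroids[i]) < dist2(point, centroids[best])) best = i;
  }
  return best;
}

function kMeans(points, k, maxIterations = 50) {
  if (k < 1 || k > points.length) throw new Error("k must be between 1 and the number of points");
  // Seed centroids with k points spread evenly through the input (deterministic).
  const step = Math.floor(points.length / k);
  let centroids = [];
  for (let i = 0; i < k; i++) centroids.push({ ...points[i * step] });

  const assignment = new Array(points.length).fill(-1);
  for (let iter = 0; iter < maxIterations; iter++) {
    // Assignment step: attach each point to its nearest centroid.
    let changed = false;
    points.forEach((p, i) => {
      const c = nearestIndex(p, centroids);
      if (c !== assignment[i]) { assignment[i] = c; changed = true; }
    });
    if (!changed) break; // converged
    // Update step: move each centroid to the mean of its members.
    centroids = centroids.map((old, c) => {
      const members = points.filter((_, i) => assignment[i] === c);
      if (members.length === 0) return old; // keep an empty cluster where it was
      return {
        lat: members.reduce((s, p) => s + p.lat, 0) / members.length,
        lon: members.reduce((s, p) => s + p.lon, 0) / members.length,
      };
    });
  }
  return centroids.map((centroid, c) => ({
    centroid,
    points: points.filter((_, i) => assignment[i] === c),
  }));
}

// The geocoding answer: the cluster holding the most points.
function largestCluster(clusters) {
  return clusters.reduce((best, c) => (c.points.length > best.points.length ? c : best));
}

// Hypothetical sample: four points near Manchester, UK, two near Manchester, NH.
const sample = [
  { lat: 53.48, lon: -2.24 },
  { lat: 53.47, lon: -2.25 },
  { lat: 53.49, lon: -2.23 },
  { lat: 42.99, lon: -71.46 },
  { lat: 43.00, lon: -71.45 },
  { lat: 53.46, lon: -2.26 },
];
const best = largestCluster(kMeans(sample, 2));
console.log(best.points.length, best.centroid);
```

With this sample the largest cluster holds the four UK points, and its centroid sits among them rather than out at sea. Real data would need more care over seeding, since a poor starting choice can split one dense group and merge distant ones.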

You can get different results by changing a number of parameters, both in the clustering and in the Flickr API call. I found that three or four clusters gave a good result. Around 50 points was sufficient, though a larger sample would give a better answer. Searching by tag or by text, using a bounding box and so on could improve or change the results.

In the Example Application, and as the default setting on the geocoder, photos are returned based on “interestingness” rather than “relevance” or date. This seemed to give a good spread of different authors and photos.
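For reference, the search options discussed above map onto flickr.photos.search arguments roughly like this. The sketch only builds the request URL and does not call Flickr. The helper name `buildFlickrSearchUrl`, its option names and the placeholder API key are assumptions for illustration, and the geocoder’s own PHP code may build its request differently.

```javascript
// Build a flickr.photos.search request URL (no network call is made here).
function buildFlickrSearchUrl(place, apiKey, options = {}) {
  const params = new URLSearchParams({
    method: "flickr.photos.search",
    api_key: apiKey,
    has_geo: "1",                                 // only geotagged photos
    extras: "geo",                                // include latitude/longitude in the response
    sort: options.sort || "interestingness-desc", // or "relevance", "date-posted-desc", ...
    per_page: String(options.perPage || 50),      // sample size fed to the clustering
    format: "json",
    nojsoncallback: "1",
  });
  // Search by tag by default, or by free text (title, description and tags).
  if (options.useText) {
    params.set("text", place);
  } else {
    params.set("tags", place);
  }
  // Optional bounding box: [minLon, minLat, maxLon, maxLat].
  if (options.bbox) params.set("bbox", options.bbox.join(","));
  return "https://api.flickr.com/services/rest/?" + params.toString();
}

const url = buildFlickrSearchUrl("Manchester", "YOUR_API_KEY");
console.log(url);
```

Passing `{ useText: true }` switches from a tag search to a text search, and `{ bbox: [...] }` restricts the search to one region.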

This is clustering based on geographical proximity, but how about clustering based on other variables? The similarity of other tags? Colours in the photo? Date or time taken? A multi-variate clustering may be worth looking at. Dan Catt has talked about clustering recently too.

Possible things for the future:
Automatically search by text if no results are given by tags.
Make pure clustering webservice.
Return photo, cluster and point information in the response.

Edits: For the Ning users:
I made use of <xn:head> to insert the relevant OpenLayers JavaScript code and the marker code.
Since Ning uses Dojo, I used that to communicate with the webservice from JavaScript:

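// "place" is the search string supplied by the enclosing function (not shown).
var bindArgs = {
    url: "Flickr/flickrgeocodr.php",
    method: "get",
    content: { "place": place },
    mimetype: "text/xml",
    // Called when the request succeeds; data holds the XML response.
    load: function(type, data) {
        doPlace(data, place);
    }
};
dojo.io.bind(bindArgs);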

14 thoughts on “Flickr Geocodr – k-means cluster enabled geocoder”

  1. Pingback: Brain Off » flickr Geocodr

  2. Great. I was thinking that a tag-based geocoder should be coming from somewhere soon, glad to see a real implementation! Maybe we’ll play a bit with your API, I’m curious to see what happens…

  3. Pingback: geobloggers » Blog Archive » Links for 2006-11-27 - The Collective Intelligence Edition

  4. Glad you like it! I’m working on enabling GeoRSS feeds as an output, and optional additional cluster information, so you can find out where it placed the other clusters, what points are within them, and their bounding boxes.
    -Tim

  5. Pingback: geobloggers » Blog Archive » TagMaps from Yahoo! Research Berkeley

  6. Pingback: Brain Off » Datamining Folksonomy

  7. Pingback: High Earth Orbit » Blog Archive » TwitterVision (geo-twitter) API
