Flickr Geocodr – k-means cluster enabled geocoder

Here’s a geocoder that finds places based on the combined knowledge held in the tags, titles or descriptions of geotagged Flickr photos (see the Example Application). Now that Flickr has 10 million geotagged photos, there’s a fair chance that people have added tags that describe the location. So a search for photos tagged with Manchester will probably bring up a lot of photos located in Manchester. However, it will also bring up other groups of photos, and this is where clustering comes into play.

This work is inspired by Mikel Maron’s Flickr Geocoder, which grabs photos via the GeoRSS feed and uses the mean value of their locations. The Flickr GeoRSS feed Mikel used also includes photos that have no locational information, so we have to use an API call to grab more geotagged photos. A simple mean value also doesn’t take into account that the photos found may cluster in several separate areas.
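To illustrate why a plain mean can mislead, here is a minimal sketch with made-up sample points. The coordinates are approximate and the names `taggedPhotos` and `meanLocation` are hypothetical, not taken from either geocoder.

```javascript
// Hypothetical sample: photos tagged "manchester" from two different cities.
const taggedPhotos = [
  { lat: 53.48, lon: -2.24 },  // Manchester, UK (approximate)
  { lat: 53.47, lon: -2.25 },
  { lat: 53.49, lon: -2.23 },
  { lat: 42.99, lon: -71.46 }, // Manchester, New Hampshire (approximate)
  { lat: 43.00, lon: -71.45 },
];

// Plain arithmetic mean of latitude and longitude.
function meanLocation(points) {
  const sum = points.reduce(
    (acc, p) => ({ lat: acc.lat + p.lat, lon: acc.lon + p.lon }),
    { lat: 0, lon: 0 }
  );
  return { lat: sum.lat / points.length, lon: sum.lon / points.length };
}

const naiveMean = meanLocation(taggedPhotos);
// Roughly lat 49.3, lon -29.9: a point in the North Atlantic, in neither city.
console.log(naiveMean);
```

With these sample points the mean falls between the two groups, which is the case clustering is meant to handle.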

I ported a Java k-means clustering algorithm to PHP and adapted it. The clustering process seems to be very fast.
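For readers unfamiliar with k-means, the following is a rough JavaScript sketch of the general technique, not the PHP port described here. It assumes plain squared differences in latitude and longitude as the distance (ignoring the curvature of the earth and the date line), and a simple deterministic seeding; `kMeans`, `squaredDistance` and `samplePoints` are hypothetical names.

```javascript
// Squared distance in raw degrees; a simplification that is adequate for a sketch.
function squaredDistance(a, b) {
  const dLat = a.lat - b.lat;
  const dLon = a.lon - b.lon;
  return dLat * dLat + dLon * dLon;
}

// Basic k-means: assumes 1 <= k <= points.length.
function kMeans(points, k, maxIterations = 50) {
  // Deterministic seeding: take initial centroids spread evenly through the input.
  let centroids = [];
  for (let i = 0; i < k; i++) {
    const p = points[Math.floor((i * points.length) / k)];
    centroids.push({ lat: p.lat, lon: p.lon });
  }
  const assignments = new Array(points.length).fill(-1);

  for (let iter = 0; iter < maxIterations; iter++) {
    let changed = false;
    // Assignment step: attach each point to its nearest centroid.
    points.forEach((p, idx) => {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (squaredDistance(p, centroids[c]) < squaredDistance(p, centroids[best])) {
          best = c;
        }
      }
      if (assignments[idx] !== best) {
        assignments[idx] = best;
        changed = true;
      }
    });
    if (!changed) break; // Converged: no point moved to another cluster.

    // Update step: move each centroid to the mean of its members.
    centroids = centroids.map((old, c) => {
      const members = points.filter((_, idx) => assignments[idx] === c);
      if (members.length === 0) return old; // Keep an empty cluster's centroid.
      return {
        lat: members.reduce((s, p) => s + p.lat, 0) / members.length,
        lon: members.reduce((s, p) => s + p.lon, 0) / members.length,
      };
    });
  }
  return { centroids, assignments };
}

// Made-up sample: three points near Manchester, UK and two near Manchester, NH.
const samplePoints = [
  { lat: 53.48, lon: -2.24 },
  { lat: 53.47, lon: -2.25 },
  { lat: 53.49, lon: -2.23 },
  { lat: 42.99, lon: -71.46 },
  { lat: 43.00, lon: -71.45 },
];
const result = kMeans(samplePoints, 2);
console.log(result.assignments, result.centroids);
```

The fixed seeding keeps the sketch repeatable. Real implementations often pick random starting centroids and may run several times.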

This screenshot shows a search for Manchester across the world, along with the number of clusters found. The geocoder picks out the cluster containing the largest number of points.
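Picking the largest cluster could look something like the sketch below. It assumes the clustering step has already produced one cluster label per point. The function names and sample data are hypothetical, and using the cluster's mean as the answer is an assumption rather than a description of the geocoder's actual output.

```javascript
// Group points by cluster label and return the members of the biggest group.
function largestCluster(points, labels) {
  const groups = new Map();
  points.forEach((p, i) => {
    if (!groups.has(labels[i])) groups.set(labels[i], []);
    groups.get(labels[i]).push(p);
  });
  let best = [];
  for (const members of groups.values()) {
    if (members.length > best.length) best = members;
  }
  return best;
}

// Mean position of a non-empty set of points.
function clusterCentre(members) {
  return {
    lat: members.reduce((s, p) => s + p.lat, 0) / members.length,
    lon: members.reduce((s, p) => s + p.lon, 0) / members.length,
  };
}

// Made-up input: cluster 0 has three photos, cluster 1 has two.
const clusteredPhotos = [
  { lat: 53.48, lon: -2.24 },
  { lat: 53.47, lon: -2.25 },
  { lat: 53.49, lon: -2.23 },
  { lat: 42.99, lon: -71.46 },
  { lat: 43.00, lon: -71.45 },
];
const clusterLabels = [0, 0, 0, 1, 1];

const winner = largestCluster(clusteredPhotos, clusterLabels);
const geocoded = clusterCentre(winner);
console.log(winner.length, geocoded);
```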

You can get different results by changing a number of parameters, both in the clustering and in the Flickr API call. I found that three or four clusters gave a good result and that around 50 points was sufficient, although a larger sample would give a better answer. Searching by tag or by text, using a bounding box and so on could improve or change the results.

In the Example Application, and as the default setting on the geocoder, photos are returned based on “interestingness” rather than “relevance” or date. This seemed to give a good spread of different authors and photos.
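As a sketch of how those parameters map onto a request, the function below builds a `flickr.photos.search` URL. The helper name `buildSearchUrl`, its option names and the `YOUR_API_KEY` placeholder are assumptions for illustration. The exact call the geocoder makes is not shown in this post.

```javascript
// Build a flickr.photos.search request URL for geotagged photos.
// options: { apiKey, sort, perPage, useText, bbox } (all hypothetical names).
function buildSearchUrl(place, options = {}) {
  const params = new URLSearchParams({
    method: 'flickr.photos.search',
    api_key: options.apiKey || 'YOUR_API_KEY', // placeholder, not a real key
    has_geo: '1',                               // only geotagged photos
    extras: 'geo',                              // include latitude and longitude
    sort: options.sort || 'interestingness-desc',
    per_page: String(options.perPage || 50),
    format: 'json',
    nojsoncallback: '1',
  });
  // Search either by free text or by tag.
  if (options.useText) {
    params.set('text', place);
  } else {
    params.set('tags', place);
  }
  // Optional bounding box: [minLon, minLat, maxLon, maxLat].
  if (options.bbox) {
    params.set('bbox', options.bbox.join(','));
  }
  return 'https://api.flickr.com/services/rest/?' + params.toString();
}

const tagUrl = buildSearchUrl('manchester');
const boxedUrl = buildSearchUrl('manchester', {
  sort: 'relevance',
  perPage: 100,
  bbox: [-11, 49, 2, 61], // rough box around Britain and Ireland, for illustration
});
console.log(tagUrl);
console.log(boxedUrl);
```

Swapping `sort`, `perPage`, `useText` or `bbox` is one way to experiment with the trade-offs described above.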

This is clustering based on geographical proximity, but how about clustering based on other variables? The similarity of other tags? Colours in the photo? Date or time taken? A multi-variate clustering may be worth looking at. Dan Catt has talked about clustering recently too.

Possible things for the future:
Automatically search by text if a tag search gives no results.
Make a pure clustering web service.
Return photo, cluster and point information in the response.

Edits: For the Ning users:
I made use of <xn:head> to insert the relevant OpenLayers JavaScript code and the marker code.
Since Ning uses dojo, I used that to call the web service from JavaScript:

var bindArgs = {
    url: "Flickr/flickrgeocodr.php",
    method: "get",
    content: {"place": place },
    mimetype: "text/xml",
    load: function(type, data) {
        doPlace(data, place)