Here’s a geocoder that finds places using collective knowledge: the tags, titles or descriptions of geotagged Flickr photos. There is an Example Application too. Now that Flickr has 10 million geotagged photos, there’s a fair chance that people will have added tags that describe the location. So searching for photos tagged with Manchester will probably bring up a lot of photos that are located in Manchester. However, it will also bring up other groups of photos, and this is where clustering comes into play.
This work is inspired by Mikel Maron’s Flickr Geocoder, which grabs photos via the GeoRSS feed and uses the mean of their locations. That approach has two limitations. The Flickr GeoRSS feed Mikel used also includes photos that have no location information, so we have to use an API call (flickr.photos.search) to grab more geotagged photos. And a simple mean doesn’t take into account that the photos found may be clustered in several different areas.
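To see why a plain mean can mislead, here is a small sketch. The coordinates are made-up sample points (three near Manchester, UK and two near Manchester, New Hampshire), and meanLocation is a hypothetical helper, not code from either geocoder.

```javascript
// Hypothetical sample: three photos near Manchester, UK and
// two near Manchester, New Hampshire.
const samplePhotos = [
  { lat: 53.48, lon: -2.24 },
  { lat: 53.47, lon: -2.25 },
  { lat: 53.49, lon: -2.23 },
  { lat: 42.99, lon: -71.46 },
  { lat: 43.00, lon: -71.45 },
];

// Arithmetic mean of latitude and longitude over all points.
function meanLocation(points) {
  const sum = points.reduce(
    (acc, p) => ({ lat: acc.lat + p.lat, lon: acc.lon + p.lon }),
    { lat: 0, lon: 0 }
  );
  return { lat: sum.lat / points.length, lon: sum.lon / points.length };
}

const meanCentre = meanLocation(samplePhotos);
// Roughly 49.3, -29.9: a point in the North Atlantic, in neither Manchester.
console.log(meanCentre.lat.toFixed(2), meanCentre.lon.toFixed(2));
```

With these sample points the mean sits between the two groups, which is the case clustering is meant to handle.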
I ported a Java k-means clustering algorithm to PHP, changing it along the way. The clustering process seems to be very fast.
This screenshot shows a worldwide search for Manchester, along with the number of clusters found. The geocoder picks out the cluster with the largest number of points within it.
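The idea of clustering the points and keeping the biggest cluster can be sketched roughly as follows. This is not the ported PHP code: it is a minimal k-means written for illustration, with assumed names (kMeans, largestCluster), made-up sample coordinates, a simple deterministic seeding, and plain Euclidean distance on degrees rather than true distance on the globe.

```javascript
// Squared Euclidean distance in degrees. A simplification that
// ignores the Earth's curvature and the date line.
function dist2(a, b) {
  const dLat = a.lat - b.lat;
  const dLon = a.lon - b.lon;
  return dLat * dLat + dLon * dLon;
}

function kMeans(points, k, maxIterations = 50) {
  if (points.length < k) throw new Error("need at least k points");
  // Assumed seeding: take k points spread evenly through the input.
  const step = Math.floor(points.length / k);
  let centroids = Array.from({ length: k }, (_, i) => ({ ...points[i * step] }));
  const assignment = new Array(points.length).fill(-1);

  for (let iter = 0; iter < maxIterations; iter++) {
    let changed = false;
    // Assignment step: attach each point to its nearest centroid.
    points.forEach((p, i) => {
      let best = 0;
      for (let c = 1; c < k; c++) {
        if (dist2(p, centroids[c]) < dist2(p, centroids[best])) best = c;
      }
      if (assignment[i] !== best) {
        assignment[i] = best;
        changed = true;
      }
    });
    if (!changed) break; // converged
    // Update step: move each centroid to the mean of its members.
    centroids = centroids.map((old, c) => {
      const members = points.filter((_, i) => assignment[i] === c);
      if (members.length === 0) return old; // keep an empty cluster's centroid
      return {
        lat: members.reduce((s, p) => s + p.lat, 0) / members.length,
        lon: members.reduce((s, p) => s + p.lon, 0) / members.length,
      };
    });
  }
  return centroids.map((centroid, c) => ({
    centroid,
    points: points.filter((_, i) => assignment[i] === c),
  }));
}

// Pick the cluster that contains the most points.
function largestCluster(clusters) {
  return clusters.reduce((best, c) => (c.points.length > best.points.length ? c : best));
}

// Hypothetical sample: four points near Manchester UK, two near
// Manchester NH, one near Manchester, Jamaica.
const manchesterPoints = [
  { lat: 53.48, lon: -2.24 },
  { lat: 53.47, lon: -2.25 },
  { lat: 42.99, lon: -71.46 },
  { lat: 53.49, lon: -2.23 },
  { lat: 18.04, lon: -77.50 },
  { lat: 43.00, lon: -71.45 },
  { lat: 53.46, lon: -2.26 },
];

const winner = largestCluster(kMeans(manchesterPoints, 3));
console.log(winner.points.length, winner.centroid);
```

For this sample the largest cluster is the UK one, and its centroid is what would be returned as the geocoded location. A real k-means usually seeds randomly and may be run several times, so results can vary with the seeding.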
You can get different results by changing a number of parameters, both in the clustering and in the Flickr API call. I found that three or four clusters gave a good result. A sample of around 50 points was sufficient, though a larger sample would give a better answer. Searching by tag or by text, using a bounding box and so on could improve or change the results.
In the Example Application, and as the default setting on the geocoder, photos are returned by “interestingness” rather than “relevance” or date. This seemed to give a good spread of different authors and photos.
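As a rough illustration of the knobs mentioned above, this sketch builds a flickr.photos.search request URL. The function name buildSearchUrl, its options object and the placeholder API key are assumptions for the example. The geocoder itself makes the call from PHP and its exact parameters are not shown here.

```javascript
// Build a flickr.photos.search URL (sketch; buildSearchUrl and its
// options are hypothetical names, not part of the geocoder).
function buildSearchUrl(place, options = {}) {
  const params = new URLSearchParams({
    method: "flickr.photos.search",
    api_key: options.apiKey || "YOUR_API_KEY", // placeholder, not a real key
    has_geo: "1",                               // only geotagged photos
    extras: "geo",                              // include latitude/longitude
    sort: options.sort || "interestingness-desc",
    per_page: String(options.perPage || 50),    // sample size
    format: "json",
    nojsoncallback: "1",
  });
  // Search by tag by default, or by free text when asked.
  params.set(options.useText ? "text" : "tags", place);
  // Optional bounding box: [minLon, minLat, maxLon, maxLat].
  if (options.bbox) params.set("bbox", options.bbox.join(","));
  return "https://api.flickr.com/services/rest/?" + params.toString();
}

const tagSearchUrl = buildSearchUrl("Manchester");
const textSearchUrl = buildSearchUrl("Manchester", {
  useText: true,
  sort: "relevance",
  perPage: 100,
  bbox: [-3, 53, -2, 54],
});
console.log(tagSearchUrl);
console.log(textSearchUrl);
```

Switching sort between "interestingness-desc" and "relevance", raising per_page, or adding a bbox changes which photos come back, and therefore which clusters are found.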
This is clustering based on geographical proximity, but how about clustering based on other variables? The similarity of other tags? Colours in the photo? Date or time taken? A multi-variate clustering may be worth looking at. Dan Catt has talked about clustering recently too.
Possible things for the future:
Automatically search by text if no results are given by tags.
Make a pure clustering webservice.
Return photo, cluster and point information in the response.
Edits: For the Ning users:
I made use of <xn:head> to insert the relevant OpenLayers javascript code, and marker code.
Since Ning uses dojo, I used that to communicate via javascript to the webservice:
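// Ask the geocoder webservice for `place` (the search string) and
// pass the XML response to doPlace().
var bindArgs = {
    url: "Flickr/flickrgeocodr.php",
    method: "get",
    content: { "place": place },
    mimetype: "text/xml",
    // Called when the request succeeds; `data` holds the XML response.
    load: function(type, data) {
        doPlace(data, place);
    }
};
dojo.io.bind(bindArgs);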
Great. I was thinking that a tag-based geocoder should be coming from somewhere soon, glad to see a real implementation! Maybe we’ll play a bit with your API, I’m curious to see what happens…
Glad you like it! I’m working on enabling GeoRSS feeds as an output, and optional additional cluster information, so you can find out where it placed the other clusters, what points are within them, and their bounding boxes.
-Tim
Hi, is the code for the k-means algorithm in php available?
Sure! Just sign up to Ning and have a look at the code 🙂
So is the php code available?
Yes it is!! See the comment above! heh
… or for the more “gimme a link to click” kind of person, here’s a link
http://www.ning.com/view-source.html?appUrl=Geothings
Hey I’m trying to take a look at the ported clustering code and ning isn’t working for me. Is it available somewhere else?
There is a version of K Means in PHP
the code is here
(sorry for the double post)
K Means