Cross posted from The Wikimaps Blog wikimaps.wikimedia.fi
The wonderful, prolific and very popular Maps Mania blog featured the Wikimaps Warper a few times recently; do check them out!
The first interactive map, The Vintage Maps of Berlin, uses the Wikimaps Warper.
This collection of historical maps centers on what is now Museum Island in Berlin.
In the oldest maps you can clearly see the two towns of Cölln and Altberlin on opposite banks of the River Spree. As you progress through the maps you can see how this area of Berlin has changed and developed over the centuries.
Do check out the 11 maps of Berlin from 1652 to today here: http://homepage.ntlworld.com/keir.clarke/leaflet/berlin.htm
The second post and interactive map, entitled Maps in the Age of Cholera, is based on an epidemiological map of Leeds (coincidentally my home!).
This was also created by Keir and he writes:
Twenty Years before John Snow famously mapped the locations of cholera victims in Broad Street, London, Robert Baker plotted the deaths of cholera victims in Leeds.
Maps in the Age of Cholera is a story map based around Robert Baker’s ‘Sanitary Map of the Town of Leeds’ exploring the 1832 cholera epidemic in the Yorkshire town. Baker never made the link between cholera and contaminated water. However, in his map and in the accompanying report to the Leeds Board of Health, Baker noted that “the disease was worst in those parts of the town where there is often an entire want of sewage, drainage and paving”.
The map itself uses this Leaflet Story Map plug-in. The Leaflet Story Map library uses jQuery to create a scroll driven story map. The map tiles scheme for Robert Baker’s 1832 ‘Sanitary Map of the Town of Leeds’ comes from Wikimaps Warper.
Do go check out the interactive story map here: http://homepage.ntlworld.com/keir.clarke/leaflet/cholera.htm
The Wikimaps Warper – a free georeferencing and georectification tool for historical maps and images hosted on Wikimedia Commons – had some issues recently. Here's a short document describing some diagnostics used to identify the problems.
The Wikimaps Warper runs on the Wikimedia Labs infrastructure – essentially an OpenStack cluster of servers. Labs is run by some great admins and devops people, and they are on the ball when it comes to rolling out needed updates to all the servers and instances. However, each instance is generally administered by people within individual projects. It was after they applied a kernel update due to a security patch and restarted all the machines that the warper was seen to be offline. The underlying issue was something wrong with NFS – many other instances and tools were having problems too – though I'm not going to blame NFS entirely, since it only exposed places where the warper application itself was encountering issues.
The application is a Ruby on Rails application which runs using Apache and Passenger. It was timing out; it wouldn't start. I ssh'd into the server. The Passenger version was an old one (installed via a package rather than a gem) and didn't support increasing the timeout, so I removed it and installed a newer version. The application would also now run on a newer, faster version of Ruby. Did all that, but the warper still wouldn't start.
Back on the server, I couldn't see any errors anywhere. But running a console command, or a "rake -T" (which loads up the Rails environment), took 11 minutes. ELEVEN MINUTES!
I use a little old Atom CPU netbook with 2GB of memory, and the same thing takes less than 22 seconds to load there. Enter strace and ruby-prof. (Note that the screenshots and reports that follow are from after I had got the loading time down to something workable – there is still work to do.)
strace is a Linux command line tool which traces system calls and signals. It's amazing for debugging things, though I've hardly ever used it. I ran
strace rake -T
and watched the Rails loading process for around 5 minutes. I stopped it as it wasn't telling me much and I wasn't prepared to wait 11 minutes! But what it did show me was the inefficient way Rails loads up all the libraries, the gems. I then ran:
strace -e trace=open,stat,read,write bin/rake -T
to trace only certain file-related calls. Here is an example of what that looks like on my netbook:
It shows, firstly, that it's doing lots of file calls, and secondly, that it's getting lots of misses. Not very efficient, but also totally normal and not very revealing.
strace with a summary report was what I wanted:
strace -o strace.txt -c bin/rake -T
gave something better:
You can see in this case that it's spending around 5 seconds on open calls (the errors are the expected misses, I think). Compare this with my netbook:
5 seconds vs 0.04 seconds. But what files was it opening? What was Ruby up to? Enter ruby-prof.
ruby-prof -s total -p flat bin/rake -- -T > ruby_prof_flat.txt
ruby-prof is more of a profiling tool. I'm using the command line utility here, but it's most often used in code, for example wrapped around a block or class. Check it out.
ruby-prof can output its data in a few different formats or profiles. I tried the more colourful ones first, which didn't tell me much:
This is somewhat useful, but the document was massive and it was hard to compare results or see where the bottlenecks were. You could see that "require" – the loading of the libraries – was troublesome, though.
The colourful stack – not sure this told me much more, but it looked pretty.
Before finally choosing the default option – flat:
Here it is on the server:
And compare that with my crappy netbook:
So this gave a much better diagnostic: it was slowing right down while reading in the libraries, the gems. Now, on both machines I am using rvm, and both load from a home directory. But on the server, the home directory was on NFS, not a more local partition!
I copied the rvm directory to a local partition and symlinked things and now the application can start much quicker.
There’s probably more I can do to optimise this – I believe it’s still hitting the NFS in a few places, but it’s given me a good pointer in the right direction to making the wikimaps warper run better and stronger for the future.
The Wikimaps Wikimedia Commons Warper project has received an Individual Engagement Grant for 2016. Go read this post on the Wikimaps Blog by Susanna Ånäs, project lead for the Wikimaps Warper, introducing the work for Warper 2.0. It's one of fourteen projects that Wikimedia has funded.
Here is what we are working towards – a complete redesign of the interface.
This grant is to lay the groundwork before that redesign can happen. We need to decouple the application and rewire it. To do this I am gutting out the internals of the Mapwarper Rails application and implementing an API, so that third party applications can find and query maps and mosaics, warp and crop maps, and upload and add maps. We will also be improving the connections to Wikimedia Commons to enable easier addition of maps. We are probably going to add built-in caching of the WMS and tiles to improve performance, working towards a production-ready, performant system able to handle the loads of millions of users. We have half a year to do the work.
The OpenStreetMap Ireland community and http://www.townlands.ie/ have rectified all 675 sheets on Mapwarper.net, working towards the goal of mapping all townlands in OpenStreetMap. Awesome stuff – it's a very cool project.
They have also just surpassed adding 47,000 townlands!
— OpenStreetMap ie (@osm_ie) January 4, 2016
A couple of years ago I made a fun open source parody image generator / service information sign maker, and today it has been shut down due to lawyers from Transport for London (TfL).
I made it in a weekend, as during the week I was doing my civic duty with some good legal professionals on a period of jury service. People made images to share with their friends, make jokes, announce anniversaries, quote prayers, tell poetry, advertise events, leave personal messages, write inspirational comments and so on. I have not seen any offensive images made with it, either on the site or on Twitter. When it launched it got a fair bit of praise and positive coverage from many places including BBC America, ITV, The Londonist, The Atlantic Cities, The Guardian, The Next Web, B3ta.com and others.
This Thursday (10 Dec 2015) I got an email with a scanned letter from a lawyer from Transport for London (TfL), a UK public transport authority. Here it is with names, email and addresses redacted.
So, I've destroyed the site, deleted the code and emailed them my confirmation of that. I decided to do it as soon as I was able (within about 24 hours of the request) as I didn't want the distraction and hassle, so I can get back to work.
As of last Thursday the site tubesign.herokuapp.com is offline and I cannot put it back online. The Ruby code, misc files and the CC-By-SA images on Heroku where it was hosted have all been deleted. My repository on GitHub has also been deleted although others may have copied their own forks of the MIT licensed code. It was only a few lines of unspectacular Ruby code anyhow.
Some people have speculated that this may also have been due to the candidate for Mayor of London and MP for Tooting, Mr Sadiq Khan, tweeting one of the images someone made showing the hashtag "#youaintnomuslimbruv" – and then dozens of people replying saying it was made using the parody website. Perhaps we will never know; it doesn't really matter. It appears that, whilst a Labour MP, 'Mr Khan is no Corbynite leftwinger', but one would imagine that he might stick up for RMT Union members against TfL management. And you too should support the staff during their industrial actions – it was these same TfL bosses who issued this takedown.
I was surprised to see that letter in my inbox and disappointed that TfL were not willing to be more civil and reasonable in their approach. However, it's not the first time TfL have acted in this way – there was a similar case about a fan website of tube map variations; I remember it going around the blogs at the time, back in 2006.
Big institutions struggle and work slowly with technology, but is it just me, or is it a bit surprising to see how little progress they have made in almost ten years?
Now back to making some better transport maps.
From OpenCellID project and Mozilla Location Services data, I produced these two maps of parts of Leeds. The areas are merged together based on the network code. They were made using QGIS.
I talked about this during Superposition Leeds #008 on 20 May 2015 http://www.thesuperposition.org/event/008/
They are more pretty than useful, so here they are!
I'm proud to tell you that the project I worked on last year with Topomancy for the Library of Congress and the New York Public Library has just been released to the Internet. It's an open source, fast, temporally enabled, versioned gazetteer using open data. It also works as a platform for a fully featured historical gazetteer.
Here is an introduction to the project giving an overview and discussion of the challenges and listing the features of this software application. Enjoy!
DEMO: Head over to http://loc.gazetteer.us/ for the Digital Gazetteer – a more global gazetteer and to http://nypl.gazetteer.us/ for the NYPL’s NYC focused historic gazetteer. If you want to try out the conflation, remediation and administrative roles, let us know at firstname.lastname@example.org
A gazetteer is a geographic dictionary, sometimes found as an index to an atlas – a geographic search engine for named places in the world. This application is a temporal gazetteer with data source remediation, relations between places and revisions, and it works with many different data sources. It's fast, written in Python, and uses ElasticSearch as its database. The application was primarily written as a Digital Gazetteer for the Library of Congress's search and bibliographic geographic remediation needs, and was also developed for the New York Public Library's Chronology of Place project. It is currently being used in production by the Library of Congress to augment their search services. The software is MIT licensed.
* JSON API
* Backbone.js frontend
* ElasticSearch as revision enabled document store
* Simple data model
* Core properties: name, type, geometry
* Alternate names (incl. language, local, colloquial etc.)
* Administrative hierarchy
* Relations between places (conflation, geographic, temporal, etc.)
* Edit History, versioning, rollbacks, reverts
* Text search (with wildcard, AND and OR support – Lucene query syntax)
* Temporal search
* Search according to data source and data type
* Search within a geographic bounding box
* Search within the geography of another Place.
* GeoJSON and CSV results
* Search can consider alternate names and administrative boundaries, and address details.
* API search of historical tile layers
* Server side dynamic simplification option for complex polygon results
* Place has alternate names and administrative boundaries
* Similar Features search (similar names, distance, type etc)
* Temporal data type with fuzzy upper and lower bounds.
* Display of any associated source raster tile layer (e.g. historical map)
* Full vector editing interface for edit / creation.
* Creation of composite places from union of existing places.
* Full revision history of changes, rollback and rollforward.
* Conflates (A is the same place as B)
* Contains (A contains B spatially)
* Replaces (A is the same as B but over time has changed status, temporal)
* Subsumes (B is incorporated into A and loses independent existence, temporal)
* Comprises (B comprises A if A contains B, along with C, D and E)
We will delve into these relationship concepts later.
* GeoDjango administrative pages
* Administrative Boundary operations
* Batch CSV import of places for create / update
* Edit feature code definitions
* Edit groups and users and roles etc
* Edit layers (tile layers optionally shown for some features)
* Add / Edit data origin definitions
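As a rough illustration of how a client might use the JSON API and search features listed above, here is a sketch that builds a search URL combining text, temporal and bounding-box filters. The endpoint path and parameter names are my own guesses for illustration, not the documented interface.

```python
from urllib.parse import urlencode

def search_url(base, q, start_year=None, end_year=None, bbox=None):
    """Build a gazetteer search URL; the parameter names are illustrative."""
    params = {"q": q}                     # free text, Lucene-style query syntax
    if start_year is not None:
        params["start"] = start_year      # temporal search lower bound
    if end_year is not None:
        params["end"] = end_year          # temporal search upper bound
    if bbox is not None:                  # geographic bounds (west, south, east, north)
        params["bbox"] = ",".join(str(c) for c in bbox)
    return f"{base}/search?{urlencode(params)}"

url = search_url("http://loc.gazetteer.us", "Leeds", start_year=1800, end_year=1900)
print(url)  # http://loc.gazetteer.us/search?q=Leeds&start=1800&end=1900
```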
The Library has lots of bibliographic metadata and lots of geographic information, much of it historical, and almost all of it unstructured.
For example, they have lots of metadata about books: where each was published, the topics, subjects and so on. They want to improve the quality of the geographic information associated with the metadata, and to augment site search.
So the library needs an authoritative list of places. The Library fully understands the need for authoritative lists – they have authority files for things, ideas, places, people, files etc. – but there is no centralised listing of them, and where there are geographic records there may be no actual geospatial information attached.
So we start with a simple data model, where a named location on the Earth's surface has a name, a type and a geometry. All very simple, right? But actually it's a complex problem. Take the name of a place: which name do you use? What happens if a place has multiple names, and what happens if there are multiple records describing the same place? Taxonomies are also a large concern; establishing a set schema for every different type of feature on the Earth is not trivial!
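To make that simple core model concrete, here is a minimal sketch; the class and field names are mine, not the gazetteer's actual schema.

```python
from dataclasses import dataclass, field

@dataclass
class Place:
    """The simple core model: a named location with a type and a geometry."""
    name: str
    feature_type: str              # e.g. "city", "church", "river"
    geometry: dict                 # GeoJSON-style: point, line or polygon
    alternate_names: list = field(default_factory=list)  # where it gets complex

leeds = Place("Leeds", "city", {"type": "Point", "coordinates": [-1.55, 53.80]})
print(leeds.name, leeds.feature_type)
```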
What's the geometry of a place? Is it a point, is it a polygon, and at what scale? For administrative datasets it's often impossible to get a good, detailed, global administrative dataset – often the data simply isn't there. Timeframes and temporal gazetteers are another large area for research (see OpenHistoricalMap.org if this intrigues you!), and the ways we describe places in time are very varied, for example "in the 1880s", "mid 19th century" or "1 May 2012 at 3pm". What about places which are vague or composed of other places, like "The South" (of the US) – how would a gazetteer handle these? And the relationships between places are another very varied research topic.
We think the project has addressed these challenges. For names, the system accepts multiple additional alternate names, and conflation enables the fixing together of multiple records so that search shows the correct authoritative result. The Digital Gazetteer allows places to have any type of geometry (e.g. point, line, polygon); all the database needs is a centroid to make search work. For temporal support, places have datestamps for start and end dates, but crucially there are in addition fuzzy start and end bounds specified in days. This enables a place, for example, to have a fuzzy start date (sometime in the year 1911) and a clear end date (23 May 1945). For "The US South" example, composite places were created: the system generates the union of the component places and makes a new one. The component places still exist – they just have a relationship with their siblings and with their new parent composite place. This brings us to how the Digital Gazetteer handles relations between places.
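The fuzzy temporal idea can be sketched like this (the names and overlap logic are my own illustration, not the project's code): a place stores start and end dates plus an uncertainty in days, and a temporal query simply widens the bounds by that fuzz.

```python
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass
class TemporalExtent:
    """Start/end dates plus fuzzy bounds in days; field names are illustrative."""
    start: date
    end: date
    start_fuzz_days: int = 0   # uncertainty around the start, in days
    end_fuzz_days: int = 0     # uncertainty around the end, in days

    def may_overlap(self, query_start: date, query_end: date) -> bool:
        """True if the place could have existed during the query window,
        once the bounds are widened by their uncertainty."""
        earliest = self.start - timedelta(days=self.start_fuzz_days)
        latest = self.end + timedelta(days=self.end_fuzz_days)
        return earliest <= query_end and query_start <= latest

# Fuzzy start "sometime in 1911" (~6 months either side of mid-1911),
# clear end date of 23 May 1945:
extent = TemporalExtent(date(1911, 7, 1), date(1945, 5, 23), start_fuzz_days=183)
print(extent.may_overlap(date(1911, 1, 1), date(1911, 3, 1)))  # True
```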
Let's look in a bit more detail at the relationship model. Basically, the relationships between places help in conflation (reducing duplicate records) and in increasing search accuracy. The five relationships are as follows:
This is the most common relationship between records initially. It is effectively an ontological statement that the place in one record is the same as the place described in another record – that entries A and B are the same place. It's a spatial or a name type of relation. For example, if we had five records for the Statue of Liberty, and four of them were conflated into the remaining one, then when you searched for the statue you'd get that one record, with a link to each of the other four. Conflates hides the conflated records from search results.
Contains is a geographical relationship. Quite simply, place A contains place B. So, for example, the town of Brighton would contain the church of St. Matthew's.
Replaces is mainly a temporal relation, where one place replaces another if the other has significantly changed status, name, type or boundary. For example, the building that housed the town's Council Offices from 1830-1967 is replaced by a bank.
Subsumes is mainly a temporal relation, where a place B becomes incorporated into another place A and loses its independent existence. For example, the ward of Ifield, which existed from 1780 to 1890, becomes subsumed into the ward of Crawley.
Comprises is primarily a spatial or name relation: place A is comprised of places B, C, D and E. This relation creates composite places, which inherit the geometries of their component places. For example, "The US South" can be considered a composite place, comprised of Virginia, Alabama and so on. The composite place "The US South" has the union of the geometries of all the places that comprise it.
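Here is a toy sketch of the composite idea, in which bounding boxes stand in for real polygon unions (an actual implementation would union the component geometries with a GIS library; the names are illustrative):

```python
from dataclasses import dataclass

@dataclass
class Place:
    name: str
    bbox: tuple  # (west, south, east, north)

def compose(name, components):
    """Create a composite place whose extent is the union of its components'."""
    west = min(p.bbox[0] for p in components)
    south = min(p.bbox[1] for p in components)
    east = max(p.bbox[2] for p in components)
    north = max(p.bbox[3] for p in components)
    return Place(name, (west, south, east, north))

virginia = Place("Virginia", (-83.7, 36.5, -75.2, 39.5))
alabama = Place("Alabama", (-88.5, 30.2, -84.9, 35.0))
south = compose("The US South", [virginia, alabama])
print(south.bbox)  # (-88.5, 30.2, -75.2, 39.5)
```

The component places survive the operation; the composite just holds a relation to them and the merged extent.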
Data sources: OpenStreetMap (OSM), Geonames, US Census TIGER/Line, Natural Earth, Historical Marker Database (HMDB), National Historical GIS (NHGIS), National Register of Historic Places Database (NRHP) and Library of Congress Authority Records.
There remain two main areas for future development and research: automatic conflation and search ranking. Since there are multiple datasets, there will of course be multiple records for the same place. The challenge is how to automatically find them by some kind of similarity measure – for example, geographic distance from each other, and distance in terms of name and place type. Tricky to get right, but the system would be able to undo any of the robot's mistakes. Further information about this topic can be found on the GitHub wiki: https://github.com/topomancy/gazetteer/wiki/Conflation
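A minimal sketch of that kind of similarity measure, assuming invented thresholds (the real system's scoring would differ): combine name similarity with geographic distance to flag candidate duplicates.

```python
import math
from difflib import SequenceMatcher

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def likely_same_place(name_a, coord_a, name_b, coord_b,
                      name_threshold=0.85, max_km=5.0):
    """Candidate for conflation if names are similar and points are close."""
    name_score = SequenceMatcher(None, name_a.lower(), name_b.lower()).ratio()
    dist = haversine_km(*coord_a, *coord_b)
    return name_score >= name_threshold and dist <= max_km

print(likely_same_place("Statue of Liberty", (40.6892, -74.0445),
                        "The Statue of Liberty", (40.6893, -74.0446)))  # True
```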
By default the gazetteer uses full text search which also takes into account alternate names and administrative boundaries, but there is a need to float the more relevant places up the search results. We can also sort by distance from the search centre when searching within geographic bounds, which helps in finding similar places for conflation. We could probably look at weighting results based on place type, population and area, although population and area may not be available for many urban areas of the world. One of the most promising areas of research is using Wikipedia request logs as a proxy for importance: places could be ranked as more important if they are viewed on Wikipedia more often than other places.
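The weighting idea can be sketched as follows; the weights and formula here are invented purely to show boosting by place type and (optional) population, not the project's actual scoring.

```python
import math

# Hypothetical boosts per feature type; real weights would need tuning.
TYPE_WEIGHTS = {"country": 3.0, "city": 2.0, "town": 1.5, "building": 1.0}

def rank_score(text_score, place_type, population=None):
    """Combine a full-text relevance score with type and population boosts."""
    score = text_score * TYPE_WEIGHTS.get(place_type, 1.0)
    if population:  # population data is often missing; treat it as optional
        score *= 1.0 + math.log10(population) / 10.0
    return score

results = sorted(
    [("Leeds, UK (city)", rank_score(1.0, "city", population=750_000)),
     ("Leeds, Maine (town)", rank_score(1.0, "town", population=2_000))],
    key=lambda r: r[1], reverse=True)
print([name for name, _ in results])  # the large city outranks the small town
```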
Some other issues which I haven't got space to go into here include: synchronising changes up- and downstream to and from the various services and datasets; licensing of the datasets, especially if they are being combined; and what level of participation in the conflation and remediation steps a gazetteer should have, which depends on where the gazetteer is based and who it is being used for.
I mentioned at the beginning of the post that the New York Public Library (NYPL) was also involved with the development of the Gazetteer. That project was called The Chronology of Place, and as the name suggests is more temporal in nature. But it’s also more focused geographically. Whereas the LoC are interested in the US and the World as a whole, the NYPL’s main focus is the City of New York. They wanted to deep dive into each building of the city, exploring the history and geography of buildings, streets and neighbourhoods.
Thus the level of detail was more fine-grained, which is reflected in some custom default cartography in the web application client. A nondescript building in a city street, for example, is not usually considered a "place" worthy of a global gazetteer, but for the NYPL each building was significant. Also, the NYPL has extensive access to historical maps via the NYPL Map Warper, which Topomancy developed for them, and around a hundred thousand digitized vector buildings from these historical map atlases. This data, along with data from the city, was added to the system to augment the results. Additional data sources include the Census's historical township boundary datasets, NYC Landmarks Preservation Commission landmarks and NYC building footprints.
Two additional features were added to the application for the NYPL's Chronology of Place. The first was expanding the data model to include street addresses, so that a building with no name can be used, and the second was displaying raster tile layers (often from historical maps) for specific features. Thus, the building features which were digitized from the historical maps can be viewed alongside the source raster maps they came from.
Mauricio from the innovative NYPL Labs has just published an extensive tutorial on how to use MapWarper.net with GeoJSON, MapboxJS, and JSFiddle to create your own historical web map, as he says it is “a primer on working with various free web mapping tools so you can make your own awesome maps.” The end result is worth checking out.
In the tutorial the following steps are included:
- geo-referencing the scanned map so that web tiles can be generated
- generating GeoJSON data to be overlaid
- creating a custom base map (to serve as reference/present day)
- integrating all assets in an interactive web page
It's a very detailed introduction to a wide range of new, free and open geo tools on the web, and I cannot recommend it highly enough! It's also great to see mapwarper.net being used in this way!