Internationalization (i18n) and Translations for Mapwarper. (English, Dutch, Japanese)


Many thanks to Ellen Gehring of Erfgoed Leiden en Omstreken (ELO) for the Dutch translations. Ellen and the ELO wanted their own warper, and because of this, support for locales, internationalisation and translations was added to the codebase. Huge thanks to Kamata Ryo for the work on the Locale Changer and the Japanese translations.

If you would like to see your own language, let me know. We’ve used both LocaleApp and normal GitHub pull requests.

LeedsAdultLearning.co.uk

This year, with Leeds City Council, I developed LeedsAdultLearning.co.uk, a course finder for about 300 courses offered by the City and run by a number of providers across dozens of venues. It offers a range of first step courses for adults, such as basic IT skills, ESOL, caring and crafts. Within the first 24 hours of launch it received over 3,000 visits, and in the first month it had over 25,000 visits, with the average user spending three minutes on the site. The code’s up on my LearningInLeeds GitHub repository.

Screenshot-2017-11-22 Find courses near you in Leeds - Adult Learning in Leeds

The project evolved from an LCC Innovation Lab – similar to the Leodis project I also worked on. The key idea was that it was designed to be a pilot or prototype project, small in scope and quick to develop. It would aim to be an aspirational example of how the City can work with the Council and open data to make good IT products. The adult education department were fully engaged with the development and design of the project, giving feedback and setting priorities. This engagement was really welcome, and I think the experts say it’s crucial to any successful agile project. The department didn’t have any online course finder before, so this was bringing them something new and needed.

Screenshot-2017-11-22 Accredited Courses at Swarthmore College - Adult Learning in Leeds

It was featured in the Government Technology News site, the Yorkshire Evening Post, on Made in Leeds TV, and was shown on the big screen in Millennium Square.


Features

  • Automatic imports of courses from Data Mill North (open data site)
  • Full text search with support for sounds like and spelling mistakes
  • Geographical, near searches
  • Bus and walking directions to the start of the course from any point
  • Add to calendar links for course start
  • Showing courses by topics or categories
  • Responsive and mobile friendly.
  • Simple CMS admin UI for staff to update text pages, change records etc
  • Caching of external API requests, front page and CMS pages
  • Recent searches kept

Technology

  • Ruby on Rails
  • Devise and Active Admin for admin UI
  • Postgresql, PostGIS and pg_search for db, geo, full text stuff
  • Bootstrap for front end user interface layout, CSS etc
  • Transport API, Bing Transit, Mapzen for journey planning and geocoding etc

Future

The project could be adapted for other organisations, and it could be extended to include the whole range of courses on offer for adults across the city region. I think usage metrics would need to be gathered to see what users actually do on the site, and whether the journey planning is useful. Adding extra information about course duration, how many times a week or month a course runs, etc. would be good. Making it more mobile friendly could be looked at, including making a mobile-only app.

New shortcuts and add layer features in mapwarper.net

Mapwarper.net has been updated with several months of the latest code. Some of these features are directed towards the rectify map interface with the addition of keyboard shortcuts and the new search and add custom layer tool. If you want to support mapwarper.net donate to paypal.me/timdevelops!

Search for existing maps and mosaics and add them to the map

Search for warped maps and mosaics (previously called layers) to add to the reference map.

Add Z/X/Y tiles to reference layer

Instead of searching for maps, paste a tile template string into the box. In the animation below, we get a template from another warper and add that in. You can use any tiles so long as they are in the Z/X/Y ordered scheme, e.g. https://example.com/osm/{z}/{x}/{y}.png
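As a rough illustration of what a Z/X/Y template means, here is a minimal Python sketch that works out which tile covers a longitude/latitude and fills in the {z}, {x} and {y} placeholders. It assumes the common Web Mercator "slippy map" tile numbering; the function names and the example coordinates are made up for illustration and are not part of Mapwarper.

```python
import math


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) indices of the tile containing a WGS84 point,
    assuming the usual Web Mercator Z/X/Y numbering (y counted from the top)."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tile_url(template, zoom, x, y):
    """Fill the {z}, {x} and {y} placeholders of a tile template string."""
    return (template.replace("{z}", str(zoom))
                    .replace("{x}", str(x))
                    .replace("{y}", str(y)))


template = "https://example.com/osm/{z}/{x}/{y}.png"
x, y = lonlat_to_tile(-1.5491, 53.8008, 12)  # roughly Leeds (illustrative)
print(tile_url(template, 12, x, y))
```

A template that puts the placeholders in a different order still works with this kind of substitution, as long as the server numbers its tiles the same way.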

Keyboard shortcuts: Change Mode & Place Point

Use 1, 2 and 3 on your keyboard to switch between the place point, add point and move map modes.

Press “q” to place a point at the mouse cursor – regardless of the map mode. This enables you to quickly move around and add points.

Quick Place and Quick Save

In this animation, the user has the move map control mode activated. They move the mouse over the maps and press “q” to quickly place a point at the mouse location. They then hit “e” or “return” to save that point – that saves you clicking the “add control point” button!

Auto Place Point – MAGIC!

Here the “a” key is pressed to drop a point where the mouse is (similar to the “q” key), but crucially, the matching point on the other map is added too. This works whichever map you add the point on. The other map is zoomed to that new point. You’d need at least 2 points to give this a go.

(It uses a Helmert transform to work out, based on the existing points, where to add the other point – so it’s a simple 2D transform and won’t be accurate for very warped maps.)
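To give a feel for the idea, here is a minimal Python sketch of a 2D Helmert (similarity) transform fitted to existing control points by least squares and then used to predict where a new point should land on the other map. This is not Mapwarper's own code; the function names and the sample coordinates are assumptions made for illustration.

```python
def fit_helmert(src, dst):
    """Least-squares fit of u = a*x - b*y + tx, v = b*x + a*y + ty
    (scale, rotation and shift only) from matching point pairs."""
    n = len(src)
    if n < 2 or n != len(dst):
        raise ValueError("need at least two matching point pairs")
    # Centroids of both point sets
    xc = sum(x for x, _ in src) / n
    yc = sum(y for _, y in src) / n
    uc = sum(u for u, _ in dst) / n
    vc = sum(v for _, v in dst) / n
    num_a = num_b = denom = 0.0
    for (x, y), (u, v) in zip(src, dst):
        dx, dy, du, dv = x - xc, y - yc, u - uc, v - vc
        num_a += dx * du + dy * dv
        num_b += dx * dv - dy * du
        denom += dx * dx + dy * dy
    if denom == 0:
        raise ValueError("source points must not all coincide")
    a, b = num_a / denom, num_b / denom
    tx = uc - a * xc + b * yc
    ty = vc - b * xc - a * yc
    return a, b, tx, ty


def apply_helmert(params, point):
    """Transform a single (x, y) point with fitted parameters."""
    a, b, tx, ty = params
    x, y = point
    return a * x - b * y + tx, b * x + a * y + ty


# Two existing control points: image pixels -> map coordinates (illustrative)
image_points = [(0.0, 0.0), (1.0, 0.0)]
map_points = [(10.0, 10.0), (10.0, 12.0)]
params = fit_helmert(image_points, map_points)
print(apply_helmert(params, (0.0, 1.0)))  # predicted position on the other map
```

Because the transform only allows a uniform scale, a rotation and a shift, two point pairs are enough to pin it down, which fits the "at least 2 points" requirement. It cannot model local stretching, so the prediction is only a starting position on heavily distorted maps.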

What’s coming to Mapwarper.net – new stuff!

Mapwarper.net is getting a code upgrade early in the first week of April.

New things:

  • Keyboard shortcuts
  • New JSON-API and docs
  • Map caching for faster maps
  • Bigger map windows
  • Upgraded Ruby and Rails versions
  • Changed “my maps” to “favourites”
  • Changed “layers” to “mosaics”
  • Social logins – OpenStreetMap, Github and Facebook
  • Year / date search
  • Translation support
  • Added Mapbox Satellite layer
  • Search for an existing map or mosaic to help rectification
  • Ability to add an “Z/X/Y” Tile layer to help rectification
  • Upload CSV of Control Points for a map
  • Download CSV of control points for any map
  • Many bugs fixed and many other things I forgot to list above!


You might particularly enjoy the keyboard shortcuts:


The “a” shortcut does some auto-magic where if you have 3 or more points, it will automatically place the other point at the place where it thinks works best.


Quick place “q” and save point “e”, then auto place “a” twice and save point “e”.

To try out these latest features yourself visit http://mapwarper.net

Let me know what you think!

If you wish to support the hosting and development of this project, you can donate to Tim via PayPal here: paypal.me/timdevelops

 

 

Doing Digital History 2016 – Map Warper Tutorial

Doing Digital History 2016 was a two-week summer institute sponsored by the National Endowment for the Humanities, held at George Mason University in DC/Arlington earlier in the year. It was attended by “mid-career” historians and the focus was on digital humanities and history – making visualizations, mapping, sound and vision, and more.


I was happy to see that the participants had a workshop on georectifying maps with Map Warper.

Here’s the screenshot of the tutorial:

 

 

 

 

 

The New Cloud Atlas – Mapping the Physical Infrastructure of the Internet

Introduction

The New Cloud Atlas (newcloudatlas.org) is a global effort to map each data place that makes up the cloud in an open and accountable way. It’s a project to find and map each warehouse data centre, each internet exchange, each connecting cable and switch. Anything of any physical significance in the operation of the cloud should be observed in some way, and recorded for everyone to see and use. Data is stored in OpenStreetMap and, for the first time, users can map things using the on-site iD editor with custom telecoms presets. Map tiles in two styles have been produced and make this hidden infrastructure visible. http://newcloudatlas.org

The New Cloud Atlas, named after the nineteenth-century collaborative scientific data collection project, is about understanding and making visible the hidden “Cloud”. Although most of these telecoms features are in the open and in plain sight, many are missing from open datasets or may be considered sensitive. Telecoms infrastructure has immense importance for connectivity and power in our connected world – the more connected a place is, the more benefits it has. Indeed, the lines of fibre optic backbone have become the new ley lines of the 21st Century, powering the forces behind a new Psychogeography of places.

A bit about the name: The first Cloud Atlas was published in 1896 by the Permanent Committee of the first International Meteorological Congress. Cloud weather observatories around the world were able to share consistent observations of the clouds and observe weather systems whose scale stretched over national boundaries. The publication of the International Cloud Atlas represented a move beyond national concerns and boundaries to an international perspective.

In addition to its important role in predicting the weather, the vision is a surprisingly early call for infrastructural globalism and worldwide collaboration:

“If there be any branch of science in which work on a uniform system can be especially useful and advantageous, that branch is the inquiry into laws of weather, which, from its very nature, can only be prosecuted with a hope of success by means of very extensive observations embracing large areas, in fact, we might almost say, extending over the whole surface of the globe”

Site

The site shows frequently updated tiles generated from OpenStreetMap (OSM) data, details about the project, and a custom OSM editor that makes it easier to add map features. Here are some screenshots.


Map, Transparent Tiles, Markers, Legend

 


Cloud X-Ray Style, with scale independent(ish) building polygons

The Cloud X-Ray style, shown above, was partially inspired by Kosmtik’s data inspector style, and it shows polygons that are enlarged at low zooms. Polygons should appear to be the same size on the screen as you zoom in. It gives a sci-fi cartography, but I find it very useful for finding clusters of mapped features, as all features are shown at all zoom levels.

 


Custom iD Editor with Telecoms Presets

Note that you can also edit in the JOSM or Vespucci OSM editors using the presets here: https://simonpoole.github.io/new-cloud-atlas-preset/

Background

The New Cloud Atlas is a project initiated by experimental media technologist, artist and designer Ben Dalton with the design and research studio of Amber Frid-Jimenez and Joe Dahmen, and myself. Ben writes about the project – with the main idea that it’s about understanding what the Internet actually is in physical terms, rather than as something that remains clouded and mysterious:

The first appearance of the internet cloud was in network diagrams. The cloud symbol was used to stand in for complexity. The cloud embodied something of the way that the internet functions. The internet was designed to be ‘end-to-end’, so computers are meant to be able to connect to each other without interference as the message passes through a network of interconnections. Only the end points are meant to matter. The clouds here represent ‘something in the middle that is too complex to draw here’, a kind of neutral space through which information passes. It is an act of simplification, but it also contains an implicit statement that ‘the cloud will look after itself’ that this thing is going to carry on being there.

Beclouding is deliberately making something more confusing, in order to obfuscate or conceal its meaning. The use of the cloud has shifted in digital systems. The idea that ‘this is too complicated to think about’ has been moved front and centre and converted into a business model, shedding its innocence along the way. Through a sleight of hand, the cloud sometimes appears as a platform, and sometimes a material. This narrative rests on the idea that the services are to be trusted, and they can take care of themselves on your behalf. We trust them with our emails and our childhood photographs and our meeting plans and whatever else we use the cloud for. In this new definition of the cloud, there is a statement that ‘this is too complex to deconstruct or critique’. You shouldn’t try to look in to the cloud and see what’s there. It’s made up of vapour, and it’s not to be interrogated. Better to simply observe it from a distance and admire it at sunset.

Once the domain of national governments, information infrastructure is increasingly constructed, operated, and maintained by major multinational corporations. These corporations, which include Google, Facebook, Amazon, Apple, and Microsoft, have a similar vested interest in maintaining control over the flow of goods and information once exercised by national governments, but a reach at once more extensive and less transparent.

Psychogeography

Regular readers may know of my interest in Psychogeography. The British Psychogeography of the 90s employed Ley Lines and “Magico-Marxism”, using the language of the occult to explain the unknown forces of power at work in space and in places. I’m developing the idea that the new lines of power in the 21st Century are of Information – and the actual lines of light that transmit these bits of data, and the buildings that house them. More about that in a walk or talk later on this year!

Another more obvious connection with psychogeography is the hidden in plain sight angle. These passageways of the internet are often marked, on manhole covers, in mobile phone masts, in big buildings in light industrial estates, but they are utterly overlooked. They may travel along the margins, along canals or train tracks. They are also sited in classic psychogeographical “liminal” spaces – beaches, margins of rural and urban, wasteland, on top of tower buildings etc.

OpenStreetMap and the Telecoms WikiProject

OpenStreetMap allows anything that exists and can be verified to be mapped. There is no notability rule like the one Wikipedia has, for example. So it allows manhole covers to be mapped in detail, and it allows telephone lines and the assorted street cabinet boxes that crowd our pavements. You might get feedback as to how to map these features and you might get funny comments about why these features are being mapped (and indeed, mapping with OpenStreetMap is voluntary!) but pretty much all OSM mappers will agree that these features shouldn’t be excluded.

Telecoms features in OpenStreetMap haven’t been well mapped before. This is both good and bad: the taxonomy (or folksonomy, to be more accurate), that is the tags that describe these features, has not been standardized. We have the opportunity to define the tags, or at least standardize some of them to be more consistent across similar telecoms infrastructure features.

I started WikiProject Telecoms on the OSM Wiki, so please go there to see how to map and tag features – and if you are a telecoms, mapping or tagging specialist please suggest better ways to map these features! https://wiki.openstreetmap.org/wiki/WikiProject_Telecoms

The current features being rendered in the New Cloud Atlas map are:

  • Data Centres
  • Telephone Exchanges
  • Manhole covers
  • Telephone poles and wires
  • Submarine cables etc
  • Telecoms towers, masts, and antennae
  • Street Cabinets

Underground features may be more difficult to map, so we are relying on manhole covers, which often show what the cable underneath is used for and who operates it (in the UK at least), on the markings sprayed by utility companies, and on some data imports. If you don’t know where an underground cable goes it’s probably best to leave it out.

You might have noticed that many of the options include sound and heat for the street cabinets. One of the side effects of today’s modern fibre optic street cabinets is that they often have greater power needs than copper wire ones, and so they need a fan. Often the cabinets are warm to the touch and sometimes they make quite a loud droning sound. This type of data can be useful, I have heard, to people who are visually impaired. Sound and touch can help orientate people in space.

Update: There is a JOSM Preset and a Vespucci Preset that Simon Poole developed

Open Data / Secrecy etc

It’s probably worth talking a little bit about the privacy and secrecy issues. Although the project isn’t about getting releases of data from companies and governments, and it’s not about uncovering the secret installations, it is about collaboratively mapping the world. Almost all of the information that will make up the New Cloud Atlas will be found in the field or in public information sources.

You may be reminded of a story from 2003 (two years after 9/11) about Sean Gorman’s PhD dissertation, which the US Government wanted to classify. Although it contained only publicly available information (about the Internet connections in the US), he analyzed the data to identify the weak links: weaknesses that, for example, a disaster could take out or a terrorist could exploit. Officials in the US Govt said that his dissertation should be burnt! Sean successfully graduated and started a mapping company with the DHS as clients. (I actually ended up working there, at FortiusOne / GeoIQ, for a bit several years after that.) Now of course open data and open analysis are encouraged and promoted by governments (and, following this trend, check out Sean’s new startup Timbr.io).

You may also recall stories about how many national mapping agencies removed military bases (such as Aldermaston, or Greenham Common in the UK) from their paper maps – even when these bases were signed from the motorways and major roads and had nice big clear signs outside the fences. A relic from the Cold War, perhaps. It appears to me that even in this current year the Ordnance Survey mislabels the Menwith Hill USAF/RAF Listening Base in North Yorkshire as just “Menwith Camp”, with no indication of its real name, activity or landuse (as compared to OpenStreetMap, for example).

At this point, if you are curious, we should evoke the classic 1996 Wired article by Neal Stephenson: Mother Earth Mother Board http://www.wired.com/1996/12/ffglass/ It’s essential if you are interested in reading more about the geopolitics and technology of international internet cable laying. It’s also a great read in general.

 

Liverpool Walk / Workshop

Ben and I ran a series of walks and workshops at FACT in June 2016: Cloud Dowsing, Hunting for the Hidden Internet and Mapping the New Cloud Atlas.

We used FieldPapers to make maps to give to participants and mappers, and went around the streets of Liverpool.

Here we are near the main telephone exchange and data centre looking for manhole covers, cabinets and antennae. That’s me pointing.


You can view the photos I took on the Flickr Album https://www.flickr.com/photos/chippee/sets/72157671540933095

 

Development Notes

Code for the site is on GitHub: https://github.com/timwaters/new_cloud_atlas

The Mapnik / Kosmtik style file and processing notes are also on GitHub: https://github.com/timwaters/cloud_mapping

Mapnik X-Ray Style

Of possible interest to Mapnik style geeks could be the use of the scale denominator and the PostGIS ST_Scale and ST_Translate functions to scale up building polygons about their centroids so that they appear to be the same size regardless of the zoom. If anyone wants to fix this to make it work better, please let me know!

(select st_translate(
    st_scale(way,
      (!scale_denominator! * 0.00028) - (5 - z(!scale_denominator!)),
      (!scale_denominator! * 0.00028) - (5 - z(!scale_denominator!))),
    st_x(st_centroid(way)) * (1 - ((!scale_denominator! * 0.00028) - (5 - z(!scale_denominator!)))),
    st_y(st_centroid(way)) * (1 - ((!scale_denominator! * 0.00028) - (5 - z(!scale_denominator!))))
  ) as way,
  building AS type
FROM planet_osm_polygon
WHERE (building='data_center')) AS data
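The query scales each polygon and then translates it back so that it stays centred on its own centroid. Here is a small Python sketch of that arithmetic, outside of PostGIS. It is an illustration only: the function name is made up, and it uses the plain average of the vertices as the centre, which is a simplification of ST_Centroid's area-weighted centroid.

```python
def scale_about_centroid(points, factor):
    """Scale a ring of (x, y) vertices by `factor` while keeping it centred.

    Mirrors the SQL: scaling about the origin gives (x*f, y*f), and
    translating by (cx*(1-f), cy*(1-f)) moves the shape back onto its
    centre, which is the same as cx + f*(x - cx).
    Assumption: the vertex average stands in for ST_Centroid.
    """
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    return [(x * factor + cx * (1 - factor),
             y * factor + cy * (1 - factor)) for x, y in points]


square = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(scale_about_centroid(square, 3))
```

In the query, the factor is built from the scale denominator multiplied by 0.00028, Mapnik's standard pixel size in metres, so it grows as you zoom out; that is why the polygons are enlarged at low zooms.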

OSM Tile Generation

Tiles are kept to within around 15 minutes of the central OSM database. Occasionally a full planet import is done. I think I could use Lua scripting to ensure that the database remains lean. The system uses TileStache to enable the UTFGrids for the popups. Essentially we filter out a lot of stuff from the OSM database:

  1. Convert an osm.pbf file to an o5m file
     ./osmconvert planet-latest.osm.pbf -o=planet.o5m
  2. Filter the o5m file to an .osm file
     ./osmfilter planet.o5m --parameter-file=cloud_mapping/osmfilter_params.txt > planet.filtered.osm
  3. Import the .osm file into the database using the custom osm2pgsql style
     osm2pgsql --slim -d gis planet.filtered.osm -S cloud_mapping/default.style
  4. Set up replication using Osmosis and osm2pgsql to get changes from the OSM db
     osmosis --read-replication-interval --simplify-change --write-xml-change changes.osc.gz
     osm2pgsql -d gis -S default.style -s -C 800 -a changes.osc.gz -e10-19 -o expire_e10-19.list

http://newcloudatlas.org/

Mapwarper featured in A Digital Humanities Primer for English Students

Jenna Herdman has written an excellent free e-book about Digital Humanities for English Students, which has an entire chapter titled Digital Mapping Tool Tutorial that features Mapwarper. It’s been published using gitbook and is available in PDF, HTML and EPUB formats.

The tutorial covers adding a map to mapwarper.net to chart the movements of David in Charles Dickens’s David Copperfield.


The map is then loaded into Palladio, which is a new tool for me. It “is a web-based platform for the visualization of complex, multi-dimensional data”.


Do check out this great resource. The book has seven chapters in total and all of them are interesting and worthwhile to read! https://www.gitbook.com/book/jennaherdman/a-digital-humanities-primer-for-english-students/details


Colliding The Mental Maps of Edinburgh with Mapwarper.net

Last autumn I popped up to Edinburgh from the North of England for the State of the Map Scotland conference. Together with Edinburgh College of Art in Evolution House, participants took part in a series of workshops, “Map.Makars”.

I took part in a memory map of the city. The rules were: no looking at other maps, and the map should include the venue, the castle and the train station. We drew the city from memory on large pieces of paper. Gregory scanned or photographed these and put them on mapwarper.net to stretch them to fit. He then combined them with an interactive and animated transparency control to create the Hand Drawn Map Collider “No-map Map”. Give it a whirl! http://www.livingwithdragons.com/maps/nomap-map/


My map, in case you were wondering, was possibly the least accurate of them, coming from furthest away! http://mapwarper.net/maps/10907


 

A Digital Gazetteer of Places for the Library of Congress and the NYPL

I’m proud to tell you that the project I worked on last year with Topomancy for the Library of Congress and the New York Public Library has just been released to the Internet. It’s an open source, fast, temporal enabled, versioned gazetteer using open data. It also works as a platform for a fully featured function-laden historical gazetteer.

You can check out the official release at the Library of Congress’s GitHub page, and for issues, documentation and development branches on Topomancy’s Gazetteer project site.

Here is an introduction to the project giving an overview and discussion of the challenges and listing the features of this software application. Enjoy!

DEMO: Head over to http://loc.gazetteer.us/ for the Digital Gazetteer – a more global gazetteer and to http://nypl.gazetteer.us/ for the NYPL’s  NYC focused historic gazetteer. If you want to try out the conflation, remediation and administrative roles, let us know at team@topomancy.com

Introduction, Overview, Features

A gazetteer is a geographic dictionary sometimes found as an index to an atlas. It’s a geographic search engine for named places in the world. This application is a temporal gazetteer with data source remediation, relations between places and with revisions which works with many different data sources. It’s fast, written in Python and uses ElasticSearch as a database. The application was primarily written as a Digital Gazetteer for the Library of Congress’s search and bibliographic geographic remediation needs and was also developed for the New York Public Library’s Chronology of Place project. It is currently being used in production by the Library of Congress to augment their search services. The software is MIT licensed.

Fig 1. Library of Congress Gazetteer


Architecture

* JSON API
* Backbone.js frontend
* Django
* ElasticSearch as revision enabled document store
* PostGIS

Data Model

* Simple data model.
* Core properties, name, type, geometry
* Alternate Names, (incl. language and local, colloq etc)
* Administrative hierarchy
* Timeframe
* Relations between places (conflation, between geography and between time, etc.)
* Edit History, versioning, rollbacks, reverts

Features

Search

* Text search (with wildcard, AND and OR support – Lucene query syntax)
* Temporal search
* Search according to data source and data type
* Search within a geographic bounding box
* Search within the geography of another Place.
* GeoJSON and CSV results
* Search can consider alternate names and administrative boundaries, and address details.
* API search of historical tile layers
* Server side dynamic simplification option for complex polygon results
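As a rough sketch of how a text search with Lucene syntax and a bounding box could be expressed against an ElasticSearch-backed store, the Python below only builds the JSON request body and does not talk to a server. It is not the gazetteer's actual API or query: the function name and the "centroid" field are hypothetical, and it uses the bool / query_string / geo_bounding_box constructs from the current Elasticsearch query DSL, which may differ from the version the project was built on.

```python
import json


def build_place_query(text, bbox=None, size=10):
    """Build an Elasticsearch-style search body for a place search.

    text: a Lucene query string, e.g. 'brigh* AND (town OR city)'
    bbox: optional (west, south, east, north) in degrees
    """
    query = {"bool": {"must": [{"query_string": {"query": text}}]}}
    if bbox is not None:
        west, south, east, north = bbox
        query["bool"]["filter"] = [{
            "geo_bounding_box": {
                "centroid": {  # hypothetical geo_point field name
                    "top_left": {"lat": north, "lon": west},
                    "bottom_right": {"lat": south, "lon": east},
                }
            }
        }]
    return {"size": size, "query": query}


body = build_place_query("brigh* AND (town OR city)", bbox=(-0.3, 50.7, 0.0, 50.9))
print(json.dumps(body, indent=2))
```

Keeping the bounding box as a filter rather than part of the scored query means it narrows the results without affecting text relevance.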

Fig 2. Gazetteer Text Search


Fig 3. Temporal Search


Fig 4. geographic search


Place

* Place has alternate names and administrative boundaries
* Similar Features search (similar names, distance, type etc)
* Temporal data type with fuzzy upper and lower bounds.
* Display of any associated source raster tile layer (e.g. historical map)
* Full vector editing interface for edit / creation.
* Creation of composite places from union of existing places.
* Full revision history of changes, rollback and rollforward.

Fig 5. Alternate Names


Fig 6. Similar Names


Fig 7. Vector Editing


Relations between places

These are:
* Conflates (A is the same place as B)
* Contains (A contains B spatially)
* Replaces (A is the same as B but over time has changed status, temporal)
* Subsumes (B is incorporated into A and loses independent existence, temporal)
* Comprises (B comprises A if A contains B, along with C,D and E)

We will delve into these relationship concepts later.

Site Admin

* GeoDjango administrative pages
* Administrative Boundary operations
* Batch CSV Import of places for create / Update
* Edit feature code definitions
* Edit groups and users and roles etc
* Edit layers (tile layers optionally shown for some features)
* Add / Edit data origin definitions

Fig 8. feature code edition


Fig 9. Django origin edition


Background, Requirements and Challenges

Library of Congress and Bibliographic Remediation

The Library has lots of bibliographic metadata and lots of geographic information, much of it historical, and almost all of it unstructured.
For example, they have lots of metadata about books: where they were published, the topics, subjects etc. They want to try to improve the quality of the geo information associated with the metadata, and to augment site search.

So the library needs an authoritative list of places. The Library fully understands the needs for authoritative lists – they have authority files for things, ideas, places, people, files etc, but no centralised listing of them, and where there are geographic records there may be no actual geospatial information about them.

Initial Challenges

So we start with a simple data model, where a named location on the Earth’s surface has a name, a type and a geometry. All very simple right? But actually it’s a complex problem. Take the name of a place, what name to use? What happens if a place has multiple names, and what happens if it has multiple records to describe the same place? Taxonomies are also a large concern, for example establishing a set schema for every different type of feature on the earth is not trivial!

What’s the geometry of a place? Is it a point, is it a polygon, and at what scale? For administrative datasets, it’s often impossible to get a good detailed global administrative dataset. In many places the data is simply not there. Timeframes and temporal gazetteers are another large area for research (see OpenHistoricalMap.org if this intrigues you!). But the way we describe places in time is very varied, for example “in the 1880’s” or “mid 19th Century” or “1 May 2012 at 3pm”. What about places which are vague or composed of other places, like “The South” (of the US) – how would a gazetteer handle these? And the relationships between places are another very varied research topic.

Approach

So we think the project has tried to address these challenges. For names, the system can accept multiple additional alternate names, and conflation enables multiple records to be joined together so that the results show the correct authoritative record. The Digital Gazetteer allows places to have any type of geometry (e.g. point, line, polygon), where all the database needs is a centroid to make search work. For temporal support, places have datestamps for start and end dates but, crucially, there are in addition fuzzy start and end ranges specified in days. This enables a place, for example, to have a fuzzy start date (sometime in the year 1911) and a clear end date (23 May, 1945). For “The US South” example, composite places were created. The system generates the union of the component places and makes a new one. The component places still exist – they just have a relationship with their siblings and with their new parent composite place. This brings us to how the Digital Gazetteer handles relations between places.
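To make the fuzzy dates concrete, here is a minimal Python sketch of a timeframe with start and end dates plus a fuzziness in days on each. The class and field names are assumptions for illustration, not the gazetteer's actual schema, and treating the fuzziness as a symmetric "plus or minus N days" window is one possible reading of the description above.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Timeframe:
    start: date
    end: date
    start_range: int = 0  # fuzziness of the start date, in days (assumed +/-)
    end_range: int = 0    # fuzziness of the end date, in days (assumed +/-)

    def possibly_existed(self, day):
        """True if the place may have existed on `day`, allowing for fuzziness."""
        earliest = self.start - timedelta(days=self.start_range)
        latest = self.end + timedelta(days=self.end_range)
        return earliest <= day <= latest

    def definitely_existed(self, day):
        """True only if `day` is inside the bounds even in the narrowest case."""
        latest_start = self.start + timedelta(days=self.start_range)
        earliest_end = self.end - timedelta(days=self.end_range)
        return latest_start <= day <= earliest_end


# "Sometime in 1911" modelled as mid-year +/- 182 days; a clear end on 23 May 1945
place = Timeframe(date(1911, 7, 2), date(1945, 5, 23), start_range=182, end_range=0)
print(place.possibly_existed(date(1911, 3, 1)))
print(place.definitely_existed(date(1911, 3, 1)))
```

A temporal search can then choose whether to return only definite matches or to include the possible ones as well.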

Fig 10. Composite Place


Relationships

Let’s look in a bit more detail at the relationship model. Basically, the relationships between places help in conflation (reducing duplicate records) and in increasing search accuracy. The five relationships are as follows:

* Conflates
* Contains
* Replaces
* Subsumes
* Comprises

Conflates

This is the most common relationship between records initially. It is effectively an ontological statement that the place in one record is the same as that described in another record: entries A and B are the same place. It’s a spatial or a name type of relation. For example, if we had five records for the Statue of Liberty, and four of them were conflated to the one record, when you searched for the statue you’d get the one record, but with a link to each of the other four. Conflates hides the conflated records from search results.

Contains

Contains is a geographical relationship. Quite simply, Place A contains Place B. So for example, the town of Brighton would contain the Church St. Matthews.

Replaces

Replaces is mainly a temporal relation, where one place replaces another place if the other place has significantly changed status, name, type or boundary. For example, the building representing the Council Offices of the town from 1830-1967 is replaced by a bank.

Subsumes

Subsumes is mainly a temporal relation, where a place A becomes incorporated into another place B and loses independent existence. For example, the ward of Ifield, which existed from 1780 to 1890, becomes subsumed into the ward of Crawley.

Comprises

Comprises is primarily a spatial or name relation. Place A comprises place B along with places C, D and E. This relation creates composite places, which inherit the geometries of the component places. For example, “The US South” can be considered a composite place, which is comprised of Virginia, Alabama and so on. Virginia in this case comprises “The US South”, and the composite place “The US South” has the union of the geometries of all the places it is comprised by.

Data Sources

OpenStreetMap (OSM), Geonames, US Census Tiger/Line, Natural Earth, Historical Marker Database (HMDB), National Historical GIS (NHGIS), National Register of Historic Places Database (NRHP) and Library of Congress Authority Records

Further Challenges

Automatic Conflation

There remain two main areas for future development and research: Automatic Conflation and Search Ranking. Since there are multiple datasets, there will of course be multiple records for the same place. The challenge is how to automatically identify the same place among similar records using some kind of similarity measure, for example how close the records are geographically and how similar they are in name and place type. This is tricky to get right, but the system would be able to undo any of the robot’s mistakes. Further information about this topic can be found on the GitHub wiki: https://github.com/topomancy/gazetteer/wiki/Conflation
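To make the idea concrete, here is a rough Python sketch of one way such a similarity measure could be put together from name, geographic distance and place type. It is not the approach described on the wiki: the weights, the 1 km cut-off and the dictionary keys are all assumptions chosen for illustration, and the name comparison simply uses the standard library’s `difflib`.

```python
import math
from difflib import SequenceMatcher


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def similarity(a, b, max_km=1.0):
    """Score from 0 to 1; higher means more likely the same place.

    The weights (0.5, 0.3, 0.2) and `max_km` are illustrative guesses.
    """
    name = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    km = haversine_km(a["lat"], a["lon"], b["lat"], b["lon"])
    near = max(0.0, 1.0 - km / max_km)  # 1 when co-located, 0 beyond max_km
    same_type = 1.0 if a["type"] == b["type"] else 0.0
    return 0.5 * name + 0.3 * near + 0.2 * same_type


statue = {"name": "Statue of Liberty", "lat": 40.6892, "lon": -74.0445, "type": "monument"}
variant = {"name": "Statue of Liberty National Monument", "lat": 40.6893, "lon": -74.0445, "type": "monument"}
other = {"name": "Empire State Building", "lat": 40.7484, "lon": -73.9857, "type": "building"}
```

A robot could propose a conflation whenever `similarity` passes some threshold, leaving borderline pairs for a human to review.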

Search Ranking

By default the gazetteer uses full text search which also takes into account alternate names and administrative boundaries, but there is a need to float up the more relevant places in the search results. We can also sort by distance from the search centre if doing a search within geographic bounds, which is used for helping find similar places for conflation. We could probably look at weighting results based on place type, population and area, although population and area for many urban areas in the world may not be available. One of the most promising areas of research is using Wikipedia request logs as a proxy for importance – places could be more important if they are viewed on Wikipedia more than other places.
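As a sketch of what such weighting might look like, the Python below boosts a text-search score with whichever importance signals a place happens to have. Everything here is hypothetical: the keys (`text_score`, `population`, `area_km2`, `pageviews`), the weights and the sample numbers are invented for illustration, and missing values are simply skipped since population and area are often unavailable.

```python
import math


def importance(place):
    """Combine optional signals; missing or zero values contribute nothing."""
    score = 0.0
    for key, weight in (("population", 1.0), ("area_km2", 0.5), ("pageviews", 2.0)):
        value = place.get(key)
        if value:
            # Log scale so very large places do not swamp everything else.
            score += weight * math.log10(1 + value)
    return score


def rank(results):
    """Sort by text score boosted by importance, highest first."""
    return sorted(results,
                  key=lambda p: p["text_score"] * (1 + importance(p)),
                  reverse=True)


# Made-up sample data: same text score, different amounts of metadata.
results = [
    {"name": "Springfield (hamlet)", "text_score": 1.0},
    {"name": "Springfield (city)", "text_score": 1.0, "population": 150000, "pageviews": 20000},
]
ranked = rank(results)
```

With equal text scores, the record that has population and page-view data floats above the one with no signals at all.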

Further Issues

Some other issues which I haven’t got space to go into here include synchronising changes up and downstream to and from the various services and datasets, and the licensing of the datasets, especially where they are being combined. There is also the question of what level of participation in the conflation and remediation steps a gazetteer should have, which depends on where the gazetteer is based and who it is being used for.

NYPL Chronology Of Place

I mentioned at the beginning of the post that the New York Public Library (NYPL) was also involved with the development of the Gazetteer. That project was called The Chronology of Place, and as the name suggests is more temporal in nature. But it’s also more focused geographically. Whereas the LoC are interested in the US and the World as a whole, the NYPL’s main focus is the City of New York. They wanted to deep dive into each building of the city, exploring the history and geography of buildings, streets and neighbourhoods.

Fig 11. NYPL Chronology of Place


Thus the level of detail was more fine grained, and this is reflected in some custom default cartography in the web application client. A nondescript building in a street in a city, for example, is not usually considered a “place” worthy of a global gazetteer, but for the NYPL each building was significant. Also, the NYPL has extensive access to historical maps via the NYPL Map Warper, which Topomancy developed for them, and to around a hundred thousand digitized vector buildings from these historical map atlases. This data, along with data from the city, was added to the system to augment the results. Additional data sources include the Census’s Historical Township boundary datasets, NYC Landmarks Preservation Commission Landmarks and NYC Building Footprints.

Two additional features were added to the application for the NYPL’s Chronology of Place. The first was expanding the data model to include street addresses, so that a building with no name can still be used. The second was displaying raster tile layers (often from historical maps) for specific features. Thus, the building features which were digitized from the historical maps could be viewed alongside the source raster map that they came from.

Fig 12. Custom/Historical layers shown
