International Journal On Advances in Software, volume 8, numbers 3 and 4, 2015


Qualitative Comparison of Geocoding Systems using OpenStreetMap Data

Authors:
Konstantin Clemens

Keywords: Geocoding; Address Indexing; OpenStreetMap; Nominatim; Elasticsearch

Abstract:
OpenStreetMap is a platform where users contribute geographic data. To serve multiple use cases, these data are held in a very generic format, which makes processing and indexing OpenStreetMap data a challenge. Nominatim is an open source geocoding system that consumes OpenStreetMap data and processes it well. It relies on predefined address schemes to determine the meaning of various address elements and to discover relevant results, and it ranks results by a global precomputed score. Elasticsearch is a web service on top of Lucene, a general purpose document store. Lucene searches for documents and ranks results according to a term frequency-inverse document frequency (TF-IDF) scoring scheme. In this article, Nominatim is compared to two systems populated with exactly the same data: an out-of-the-box instance of Elasticsearch, and a specialized system that builds on top of Elasticsearch but implements a custom algorithm to aggregate house numbers on every street segment, thereby vastly reducing the index size. The three geocoding systems are thoroughly benchmarked with three different data sets and geocoding queries of increasing complexity. The analysis shows that TF-IDF based ranking yields more accurate results and is more robust, removing the need for predefined address schemes. The reduced index size of the specialized system comes at a cost, a trade-off that may be viable depending on the application scenario.
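
To illustrate the term frequency-inverse document frequency ranking mentioned in the abstract, here is a minimal Python sketch. It is a simplified textbook variant and does not reproduce Lucene's exact scoring formula or the article's systems. The sample addresses and the names `tokenize` and `tf_idf_rank` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on whitespace, treating commas as separators."""
    return text.lower().replace(",", " ").split()


def tf_idf_rank(query, documents):
    """Return (score, document) pairs sorted by descending score.

    Simplified TF-IDF (an assumption, not Lucene's formula):
    score = sum over matching query terms of
            (term count / document length) * (log(N / df) + 1)
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scored = []
    for doc, tokens in zip(documents, tokenized):
        term_freq = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term_freq[term] > 0:
                idf = math.log(n_docs / doc_freq[term]) + 1.0
                score += (term_freq[term] / len(tokens)) * idf
        scored.append((score, doc))

    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Hypothetical address documents, not taken from the article's data sets.
ADDRESSES = [
    "Hauptstrasse 1, Berlin",
    "Hauptstrasse 1, Potsdam",
    "Berliner Strasse 5, Potsdam",
]

if __name__ == "__main__":
    for score, address in tf_idf_rank("hauptstrasse 1 potsdam", ADDRESSES):
        print(f"{score:.3f}  {address}")
```

No address scheme is needed here: the query is matched term by term, and the document that shares the most terms with the query scores highest.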

Pages: 377 to 386

Copyright: Copyright (c) to authors, 2015. Used with permission.

Publication date: December 30, 2015

Published in: journal

ISSN: 1942-2628