Skip to content

Instantly share code, notes, and snippets.

@jpetazzo
Last active October 12, 2023 09:09
Show Gist options
  • Star 25 You must be signed in to star a gist
  • Fork 3 You must be signed in to fork a gist
  • Save jpetazzo/5177554 to your computer and use it in GitHub Desktop.
Save jpetazzo/5177554 to your computer and use it in GitHub Desktop.
Manual custom geocoding using OSM database

Someone asked how to get the latlong from a specific road near a town on OpenStreetMap.

If you need to do it only once (e.g., you're about to go on a trip, and your GPS cannot find your destination city, but allows you to enter GPS coordinates), you can use Nominatim, OpenStreetMap's geocoding interface.

If you need to do it multiple times, in a programmatic manner, there are at least two ways to do that.

Note: I worked with OSM data a couple of years ago, but I don't have an OSM database on my local laptop right now, so some instructions will be a bit fuzzy. I do apologize in advance.

PostGIS queries on a local OSM DB

This is probably the most complicated method; you will need to install a local PostGIS database, load OSM data in it, learn how to do SQL queries with geographic data, and learn how the OSM data is structured. It will take some time, but you will learn a lot in the process, and you will be able to solve your problem without doing queries to an external data source (except for the initial download of OSM data, of course).

Install PostGIS & tools

apt-get install postgresql-9.1-postgis (or 9.2 or whatever ships with your distro)

apt-get install osm2pgsql

Setup a PostGIS DB

I don't know if those steps are still necessary, but they were with PostgreSQL 8.4.

# check if plpgsql is installed ; if not, install it
psql -Atc "select lanname from pg_language"  | grep -q ^plpgsql$ || createlang plpgsql

# check if postgis is installed ; if not, install it
psql -Atc "select postgis_full_version()" || psql < /usr/share/postgresql/8.4/contrib/postgis-1.5/postgis.sql

# check if SRS are loaded ; if not, load them
psql -Atc "select srid from spatial_ref_sys" | grep -q ^4326$ || psql < /usr/share/postgresql/8.4/contrib/postgis-1.5/spatial_ref_sys.sql

Of course, you will have to adjust the paths. Quick explanation:

  • the first query makes sure that PL/PGSQL is available in your database (PostGIS relies on it);
  • the second query creates all the PostGIS types, functions, etc. in your database;
  • the third query creates a special table , holding all the data needed to convert between different coordinates systems.

Get OSM data

You probably don't want to get data for the whole world, because this will be EXTREMELY large, and take DAYS to import in your machine (even assuming that you have a very fast machine with truckloads of RAM and SSD drives).

I recommend to download partial data from Geofabrik, starting with a very small map, e.g. a small region of France; and once everything works, you can import data for France, or Europe, or America, or whatever you want. You want OSM data (not SHP).

Load OSM data

This is the command that my installation scripts were using; I hope it still works correctly :-)

osm2pgsql -d "$PGDATABASE" -U "$PGUSER" -H "$PGHOST" -p osm -l -s -c ile-de-france.osm.bz2

Note: if you want to do the import again later (because it failed the first time, or whatever), you might have to drop manually the osm_polygon_view:

psql -c "DROP VIEW if exists osm_polygon_view"

Inspect OSM data

It's time to get familiar with the structure of the data.

You should have the following tables:

  • osm_point ("points of interests", with just lat+long coordinates)
  • osm_lines (roads and other line-line features)
  • osm_polygons (e.g. forests, parks...)
  • osm_roads (when I used OSM, surprisingly, the roads were in osm_lines, not osm_roads!)

Note: Sometimes, data is not where you expect it. I don't have an example on the top of my mind; but if you are looking for a polygon-like feature and cannot find it in osm_polygons, have a look in osm_lines instead. Not a big deal as long as you know that you have to look everywhere if you don't find something at its more logical place.

Each table will have:

  • one geographic column called way (it's a SQL column, but instead of having type string, integer, etc., it can be a point, line, or polygon),
  • many metadata columns.

If you want to find countries/states/regions/cities/districts/..., you have to look for two columns:

  • boundary with a value of administrative,
  • admin_level will be a numerical value to indicate if we're dealing with a country, a city, etc.

I don't remember the exact values, but to give you an idea, it will look like this:

  • 4 means country
  • 6 means region
  • 8 means city
  • 10 means district or neighborhood

So, something like this should give you all the cities:

SELECT name FROM osm_polygons WHERE boundary='administrative' AND admin_level=8;

If you don't want the exact limits of the cities, but just a center point, look in osm_point, and use the column place:

SELECT name,way FROM osm_points WHERE place IN ('town', 'city');

(The way column should be the GPS coordinates of the center of the city.)

Likewise, there should be a column named highway, containing the type of road. If highway is NULL, it means that the line is not a road. Note that there are multiple types of highways which could correspond to what you're looking for:

  • motorway
  • motorway_link
  • trunk
  • trunk_link
  • primary
  • primary_link

They also have a name so you can find the exact freeway you want. There will be multiple results, because a given freeway won't be one single object, but multiple distinct objects.

Putting everything together

Let's assume that you have found:

  • one and only one row in osm_points for your city,
  • at least one (possibly multiple) rows in osm_lines for your highway,

now you want to find exactly which segment of the highway is closest to the city.

Example query:

SELECT * FROM osm_lines,osm_points 
WHERE osm_lines.name=='Autoroute XXX' AND osm_lines.highway IN ('motorway','trunk','primary')
AND osm_points.name=='Meribel' AND osm_points.place IN ('city', 'town')
ORDER BY ST_Distance(osm_lines.way, osm_points.way) ASCENDING
LIMIT 1

If the highway is broken down in many small segments, that should be good enough.

And if you want to find exactly which point of the highway is closest to the city:

SELECT ST_Line_Interpolate_Point(osm_lines.way, ST_Line_Locate_Point(osm_lines.way, osm_points.way)) WHERE ...

Of course, since cities are actually polygons, you could also do funky requests on the polygons.

See the whole PostGIS reference manual for details.

Query the Nominatim API

You will be doing essentially the same thing, but instead of doing SQL queries to retrive city boundaries and highways, you will use the Nominatim API.

The general idea is:

  • retrieve the city you're looking for,
  • retrieve the highway you're looking for,
  • find where they intersect,
  • compute the GPS coordinates.
@JamesChevalier
Copy link

@mojtabajml
Copy link

What about reverse-geocoding? Can you give me an example query on osm data?

@stavskal
Copy link

I would love to hear about the same procedure applied to reverse geocoding as well!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment