Skip to content

Instantly share code, notes, and snippets.

@richardlent
Last active March 10, 2020 17:29
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save richardlent/f101e5d1f4dfad3d9916442fdb0a96c8 to your computer and use it in GitHub Desktop.
Save richardlent/f101e5d1f4dfad3d9916442fdb0a96c8 to your computer and use it in GitHub Desktop.
This program uses the Python geocoder library (https://geocoder.readthedocs.io/)and the OpenStreetMap/Nominatim database to geocode a plain text CSV (comma-separated values)file containing street addresses. The pandas library is used for data wrangling.
# geocode.py
# Richard A. Lent, Thursday, September 19, 2019 at 11:43 AM.
# Inspired by https://janakiev.com/blog/geocoding-in-python/ .
# Tuesday, March 10, 2020. Fixed display of progress dots.
"""Python geocoder
This program uses the Python geocoder library (https://geocoder.readthedocs.io/)
and the OpenStreetMap/Nominatim database to geocode a plain text CSV (comma-separated values)
file containing street addresses. The pandas library is used for data wrangling.
Run the program from the command line like this:
python geocode.py input.csv output.csv
The input CSV file should look like this (but you can have additional variables):
label,address
"Holy Cross","1 College Street, Worcester, Massachusetts"
"Apple","One Infinite Loop, Cupertino, CA 95014"
"Google","1600 Amphitheatre Parkway, Mountain View, CA 94043"
"O'Hare Airport","ORD, Chicago, IL 60666"
"Paris","Centre Pompidou, Paris, FR"
"Times Square","Times Square, NY"
Be sure to use the name "address" for the address string.
Enclose label and address strings in double quotes to deal with
embedded commas, apostrophes, and other oddities.
The same data contained in the input file will be written to the
output file, in CSV format, with the addition of the latitude and longitude.
If an address cannot be geocoded the latitude and longitude fields
will be blank (NULL).
The geocoding loop has a built-in time delay of 1 second between each API
call to deal with Nominatim's allowed maximum of 1 request per second.
The code was tested with Anaconda Python 3.7.3. Your mileage may vary.
"""
import geocoder
import time
import sys
import pandas as pd
import datetime
if len(sys.argv) < 3:
print(__doc__)
print("Insufficient command line arguments specified,\n")
print("e.g., python geocode.py input.csv output.csv\n")
sys.exit("\nProgram terminated. Bad command line arguments.")
addresses = pd.read_csv(sys.argv[1])
addresses['latitude'] = 0
addresses['longitude'] = 0
print("Start: ", datetime.datetime.now())
for i in range(len(addresses)):
g = geocoder.osm(addresses.loc[i, 'address'])
addresses.loc[i, 'latitude'] = g.lat
addresses.loc[i, 'longitude'] = g.lng
# Sleep to ensure to be below 1 request per second.
# See https://operations.osmfoundation.org/policies/nominatim/.
time.sleep(1)
sys.stdout.write(".")
sys.stdout.flush()
print("\nDone: ", datetime.datetime.now())
addresses.to_csv(sys.argv[2], index=None, header=True)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment