@Frackalyzer
Last active August 22, 2017 04:05
Python 3 reverse geo-coder for NCEDC-formatted comma-separated-value earthquake files.

PyGeoCoderRev

Python reverse geo-coder for NCEDC-formatted comma-separated-value earthquake files.

Synopsis

This project, as currently implemented, reverse-geocodes NCEDC-formatted earthquake comma-separated-value (CSV) files. Reverse-geocoding is the process of obtaining administrative units (e.g. country, state, county/province, city/village) from latitude and longitude (lat-long) coordinates. With a modicum of effort, this program could be modified to reverse-geocode almost any file or database table that carries lat-long columns.
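To make the idea concrete, here is a toy sketch of reverse-geocoding as a nearest-neighbour lookup. It is not how GeoCoderRev.py or the reverse_geocoder package is implemented: the three-row `PLACES` table, `haversine_km`, and `nearest_place` are all hypothetical names invented for this illustration, and the search is a brute-force scan.

```python
import math

# Hypothetical three-row gazetteer: (lat, lon, cc, admin1, name).
# A real reverse geocoder works from a far larger table of populated places.
PLACES = [
    (37.7749, -122.4194, "US", "California", "San Francisco"),
    (35.4676, -97.5164, "US", "Oklahoma", "Oklahoma City"),
    (35.6895, 139.6917, "JP", "Tokyo", "Tokyo"),
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat-long points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def nearest_place(lat, lon):
    """Return the administrative units of the gazetteer row closest to (lat, lon)."""
    best = min(PLACES, key=lambda p: haversine_km(lat, lon, p[0], p[1]))
    return {"cc": best[2], "admin1": best[3], "name": best[4]}


# An epicentre in central Oklahoma resolves to the Oklahoma City row here.
print(nearest_place(35.98, -96.77))
```

The returned keys mirror three of the columns the script appends to each output row (cc, admin1, name); admin2 is left out of the toy table for brevity.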

Source(s) of Earthquake Data in CSV format

  • ANSS Composite Catalog Search
    • Choose Catalog in CSV format
    • Enter the Start date,time value, with a comma separating the date (yyyy/MM/dd) from the time (HH:mm:ss)
    • Enter the End date,time value in the same format, or leave it blank to default to today's date and time
    • Enter a Minimum magnitude value; the recommended minimum, especially for fracking research, is 2.0 or less
    • Leave Maximum magnitude blank so that all earthquakes above the Minimum magnitude will be included
    • Choose Send output to an anonymous FTP file on the NCEDC within the "Select output mechanism" section
    • Enter 10000000 in the Line limit on output box (i.e. 10,000,000 rows max)
    • Click on the Submit request button
    • On the "NCEDC_Search_Results" web page that appears after the Submit request button is pressed, wait until a URL link appears. Right-click on the link, choose the Save link as... menu item, and save the file to a location of your choosing.
    • The saved file mentioned in the bullet above, and nominally entitled catsearch.12345 with the 12345 being a variable value, is the file to which you'll point the GeoCoderRev.py script when you invoke it to reverse-geocode the rows therein.
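Before handing a downloaded file to the script, it can help to confirm that it parses as CSV and carries the two columns the script reads. The sketch below assumes only the `Latitude` and `Longitude` column names, which are the ones GeoCoderRev.py looks up; the sample rows, the other column names, and the `read_coordinates` helper are hypothetical.

```python
import csv
import io

# Hypothetical two-row sample; real files come from the catalog search.
# Only the Latitude and Longitude column names are taken from GeoCoderRev.py;
# the remaining columns are placeholders.
SAMPLE = (
    "DateTime,Latitude,Longitude,Magnitude\n"
    "2016/01/01 00:00:00.00,35.9850,-96.7670,2.4\n"
    "2016/01/02 12:30:00.00,36.1000,-97.2000,3.1\n"
)


def read_coordinates(text_stream):
    """Yield (latitude, longitude) float pairs from an NCEDC-style CSV stream."""
    reader = csv.DictReader(text_stream)
    missing = {"Latitude", "Longitude"} - set(reader.fieldnames or [])
    if missing:
        raise ValueError("missing column(s): %s" % ", ".join(sorted(missing)))
    for row in reader:
        yield float(row["Latitude"]), float(row["Longitude"])


coords = list(read_coordinates(io.StringIO(SAMPLE)))
print(coords)
```

For a file on disk, pass `open(path, newline='')` instead of the `io.StringIO` wrapper.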

Invoking the GeoCoderRev.py program

  • The simplest invocation of the program is as follows:
    • Navigate to the folder holding the PyGeoCoderRev project.
    • Open a command terminal from within that folder
      • Windows: Shift-Right-click within the project's folder, choose Open command window here
      • Linux (Ubuntu with nautilus-open-terminal installed): Right-click within the project's folder, choose Open terminal
      • Linux (Ubuntu without nautilus-open-terminal installed): Ctrl-Alt-T, then navigate to the project's folder
    • Within the command terminal, enter the following command:
      • python GeoCoderRev.py --src-file-path=/path/to/the/downloaded/NCEDC/earthquake/CSV/file --out-file-path=/path/to/the/resulting/reverse-geocoded/NCEDC/earthquake/CSV/file
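The command above names the output file explicitly. When --out-file-path is omitted, the script assembles a path from the source file's folder and the file-name options instead. The following is a minimal sketch of that rule, using the documented defaults; `default_out_path` is a hypothetical helper written for this example, not a function in GeoCoderRev.py.

```python
import os


def default_out_path(src_file_path,
                     folder=None,
                     prefix="NCEDC_earthquakes",
                     suffix="_reverse_geocoded",
                     extension=".csv"):
    """Build the output path used when no explicit output path is given."""
    # With no folder given, write next to the source file.
    if folder is None:
        folder = os.path.dirname(src_file_path)
    return os.path.join(folder, prefix + suffix + extension)


print(default_out_path(os.path.join("data", "catsearch.12345")))
```

With the defaults, a source file in a data folder therefore yields NCEDC_earthquakes_reverse_geocoded.csv in that same folder.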

Command-line arguments

The GeoCoderRev.py program has more command-line options than the two shown in the example above. A quick explanation of each follows:

  • --src-file-path: The required path to the raw NCEDC-formatted earthquake source file in CSV format.

  • --src-delimiter: The character that separates each value within the file. The default is a comma ,.

  • --src-quotechar: The character that surrounds each value within the file, should it contain a delimiter. The default is a double-quote ".

  • --src-quotemode: The quoting mode, which defaults to QUOTE_MINIMAL. Valid choices are QUOTE_MINIMAL, QUOTE_NONE, QUOTE_ALL, QUOTE_NONNUMERIC.

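  • --out-file-path: The optional path to the reverse-geocoded NCEDC-formatted earthquake output file in CSV format. If omitted, the path is built from the --out-file-name-folder, --out-file-name-prefix, --out-file-name-suffix, and --out-file-name-extension values.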

  • --out-delimiter: The character that separates each value within the file. The default is a comma ,.

  • --out-quotechar: The character that surrounds each value within the file, should it contain a delimiter. The default is a double-quote ".

  • --out-quotemode: The quoting mode, which defaults to QUOTE_MINIMAL. Valid choices are QUOTE_MINIMAL, QUOTE_NONE, QUOTE_ALL, QUOTE_NONNUMERIC.

  • --out-file-name-folder: The output file's destination folder, used only when --out-file-path is omitted. The default is None, which means the source file's folder.

  • --out-file-name-prefix: The output file name's prefix, default is NCEDC_earthquakes.

  • --out-file-name-suffix: The output file name's suffix, default is _reverse_geocoded.

  • --out-file-name-extension: The output file name's extension, default is .csv.

  • --max-rows: Mostly intended for testing, this integer argument defaults to 0, which means all rows are processed. Any positive integer limits processing to that many rows; for example, 10 means only the first ten rows are processed.

  • --flush-rows: This integer value, which defaults to 1000, controls how often a progress message is output to the console and how often buffered rows are "flushed" to the output file.

  • -h or --help: Outputs command-line usage information to the console, describing the command-line arguments for this program, and then terminates the program without any further processing.

  • --version: Outputs the program name and version to the console and then terminates the program.
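The four quoting modes accepted by --src-quotemode and --out-quotemode correspond to constants in Python's csv module. As a rough illustration of how they differ, the sketch below writes the same row under each mode; the `render` helper and the sample row are made up for this example. Note that QUOTE_NONE needs an escape character whenever a value contains the delimiter, which is supplied here only for the demonstration.

```python
import csv
import io


def render(row, quoting, **kwargs):
    """Write one row with the given quoting mode and return the resulting line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=quoting, lineterminator="\n", **kwargs)
    writer.writerow(row)
    return buffer.getvalue().rstrip("\n")


# Two strings (one containing the delimiter) and one number.
ROW = ["1", "a,b", 2.5]

minimal = render(ROW, csv.QUOTE_MINIMAL)        # quote only where needed
quote_all = render(ROW, csv.QUOTE_ALL)          # quote every field
nonnumeric = render(ROW, csv.QUOTE_NONNUMERIC)  # quote every non-numeric field
unquoted = render(ROW, csv.QUOTE_NONE, escapechar="\\")  # never quote, escape instead

for label, line in [("QUOTE_MINIMAL", minimal), ("QUOTE_ALL", quote_all),
                    ("QUOTE_NONNUMERIC", nonnumeric), ("QUOTE_NONE", unquoted)]:
    print("%-17s %s" % (label, line))
```

QUOTE_MINIMAL, the default for both the source and output files, quotes only the value that contains a comma.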

Installation

PyGeoCoderRev is a Python 3 project, so a compatible Python 3 interpreter is required. In addition, the program uses the reverse_geocoder package; installation instructions for it appear below.

For first time installation,

$ pip install reverse_geocoder

Or upgrade an existing installation using,

$ pip install --upgrade reverse_geocoder

The package can be found on PyPI.
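One way to confirm that the installation succeeded, without importing the package, is to ask Python whether the module can be located. This is only a convenience sketch; `is_installed` is a hypothetical helper, not part of GeoCoderRev.py.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(module_name) is not None


for name in ("reverse_geocoder", "scipy", "numpy"):
    print("%-17s %s" % (name, "found" if is_installed(name) else "missing"))
```

A "missing" line for any of the three names means the corresponding pip install step still needs to be run.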

Dependencies (Python 3 packages)

  1. scipy
  2. numpy

License

Copyright © 2016 Khepry Quixote

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

# ========================================================================
#
# Copyright © 2016 Khepry Quixote
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# ========================================================================
import argparse
import csv
import io
import os
from pprint import pprint
from time import time

import reverse_geocoder as rg

pgm_name = 'GeoCoderRev.py'
pgm_version = '1.0'

quotemode_choices = ['QUOTE_MINIMAL', 'QUOTE_NONE', 'QUOTE_ALL', 'QUOTE_NONNUMERIC']


# translate a quoting-mode name into the
# corresponding csv module constant
def quotemode_xlator(quote_mode_str):
    quote_mode_val = csv.QUOTE_MINIMAL
    if quote_mode_str.upper() == 'QUOTE_MINIMAL':
        quote_mode_val = csv.QUOTE_MINIMAL
    elif quote_mode_str.upper() == 'QUOTE_ALL':
        quote_mode_val = csv.QUOTE_ALL
    elif quote_mode_str.upper() == 'QUOTE_NONE':
        quote_mode_val = csv.QUOTE_NONE
    elif quote_mode_str.upper() == 'QUOTE_NONNUMERIC':
        quote_mode_val = csv.QUOTE_NONNUMERIC
    return quote_mode_val


arg_parser = argparse.ArgumentParser(prog='%s' % pgm_name, description='Reverse geo-code an NCEDC-formatted earthquake CSV file.')
arg_parser.add_argument('--src-file-path', required=True, help='source file path')
arg_parser.add_argument('--src-delimiter', default=',', help='source file delimiter character')
arg_parser.add_argument('--src-quotechar', default='"', help='source file quote character')
arg_parser.add_argument('--src-quotemode', dest='src_quotemode_str', default='QUOTE_MINIMAL', choices=quotemode_choices, help='source file quoting mode (default: %s)' % 'QUOTE_MINIMAL')
arg_parser.add_argument('--out-file-path', default=None, help='output file path (default: None, same path as source file)')
arg_parser.add_argument('--out-delimiter', default=',', help='output file delimiter character')
arg_parser.add_argument('--out-quotechar', default='"', help='output file quote character')
arg_parser.add_argument('--out-quotemode', dest='out_quotemode_str', default='QUOTE_MINIMAL', choices=quotemode_choices, help='output file quoting mode (default: %s)' % 'QUOTE_MINIMAL')
arg_parser.add_argument('--out-file-name-folder', default=None, help='output file name folder (default: None)')
arg_parser.add_argument('--out-file-name-prefix', default='NCEDC_earthquakes', help='output file name prefix (default: NCEDC_earthquakes)')
# the suffix must not carry the extension, which is appended separately
arg_parser.add_argument('--out-file-name-suffix', default='_reverse_geocoded', help='output file name suffix (default: _reverse_geocoded)')
arg_parser.add_argument('--out-file-name-extension', default='.csv', help='output file name extension (default: .csv)')
arg_parser.add_argument('--max-rows', type=int, default=0, help='maximum rows to process, 0 means unlimited')
arg_parser.add_argument('--flush-rows', type=int, default=1000, help='flush rows interval')
arg_parser.add_argument('--version', action='version', version='version=%s %s' % (pgm_name, pgm_version))
args = arg_parser.parse_args()

# if no output path was given, build one from
# the folder, prefix, suffix, and extension
if args.out_file_path is None:
    if args.out_file_name_folder is None:
        args.out_file_name_folder = os.path.dirname(args.src_file_path)
    args.out_file_path = os.path.join(args.out_file_name_folder, args.out_file_name_prefix + args.out_file_name_suffix + args.out_file_name_extension)

args.src_quotemode_enm = quotemode_xlator(args.src_quotemode_str)
args.out_quotemode_enm = quotemode_xlator(args.out_quotemode_str)
args.max_rows = abs(args.max_rows)
args.flush_rows = abs(args.flush_rows)

# expand a leading ~ and make both paths absolute
if args.src_file_path.startswith('~'):
    args.src_file_path = os.path.expanduser(args.src_file_path)
args.src_file_path = os.path.abspath(args.src_file_path)
if args.out_file_path.startswith('~'):
    args.out_file_path = os.path.expanduser(args.out_file_path)
args.out_file_path = os.path.abspath(args.out_file_path)

print('Reverse-geocoding source NCEDC earthquakes file: "%s"' % args.src_file_path)
print('Outputting to the target NCEDC earthquakes file: "%s"' % args.out_file_path)
print('')
print('Command line args:')
pprint(vars(args))
print('')

# beginning time hack
bgn_time = time()

# initialize
# row counters
row_count = 0
out_count = 0

# if the source file exists
if os.path.exists(args.src_file_path):
    # open the target file for writing
    with io.open(args.out_file_path, 'w', newline='') as out_file:
        # open the source file for reading
        with io.open(args.src_file_path, 'r', newline='') as src_file:
            # open a CSV file dictionary reader object
            csv_reader = csv.DictReader(src_file, delimiter=args.src_delimiter, quotechar=args.src_quotechar, quoting=args.src_quotemode_enm)
            # obtain the field names from
            # the first line of the source file
            fieldnames = csv_reader.fieldnames
            # append the reverse geo-coding
            # result fields to field names list
            fieldnames.append('cc')
            fieldnames.append('admin1')
            fieldnames.append('admin2')
            fieldnames.append('name')
            # instantiate the CSV dictionary writer object with the modified field names list
            csv_writer = csv.DictWriter(out_file, delimiter=args.out_delimiter, quotechar=args.out_quotechar, quoting=args.out_quotemode_enm, fieldnames=fieldnames)
            # output the header row
            csv_writer.writeheader()
            # beginning time hack
            bgn_time = time()
            # read row-by-row
            for row in csv_reader:
                row_count += 1
                # convert string lat/lon
                # to floating-point values
                latitude = float(row['Latitude'])
                longitude = float(row['Longitude'])
                # instantiate coordinates tuple
                coordinates = (latitude, longitude)
                # search for the coordinates
                # returning the cc, admin1, admin2, and name values
                # using a mode 1 (single-threaded) search
                results = rg.search(coordinates, mode=1)  # default mode = 2
                # if results obtained
                if results is not None:
                    # result-by-result
                    for result in results:
                        # map result values
                        # to the row values
                        row['cc'] = result['cc']
                        row['admin1'] = result['admin1']
                        row['admin2'] = result['admin2']
                        row['name'] = result['name']
                        # output a row
                        csv_writer.writerow(row)
                        out_count += 1
                else:
                    # map empty values
                    # to the row values
                    row['cc'] = ''
                    row['admin1'] = ''
                    row['admin2'] = ''
                    row['name'] = ''
                    # output a row
                    csv_writer.writerow(row)
                    out_count += 1
                # if row count equals or exceeds max rows
                if args.max_rows > 0 and row_count >= args.max_rows:
                    # break out of reading loop
                    break
                # if row count is a multiple of the flush count value
                # (a flush count of 0 disables flushing and progress
                # messages, and avoids a division by zero)
                if args.flush_rows > 0 and row_count % args.flush_rows == 0:
                    # flush accumulated
                    # rows to target file
                    out_file.flush()
                    # ending time hack
                    end_time = time()
                    # compute records/second
                    seconds = end_time - bgn_time
                    if seconds > 0:
                        rcds_per_second = row_count / seconds
                    else:
                        rcds_per_second = 0
                    # output progress message
                    message = "Processed: {:,} rows in {:,.0f} seconds @ {:,.0f} records/second".format(row_count, seconds, rcds_per_second)
                    print(message)
else:
    print('NCEDC-formatted Earthquake file not found: "%s"' % args.src_file_path)

# ending time hack
end_time = time()
# compute records/second
seconds = end_time - bgn_time
if seconds > 0:
    rcds_per_second = row_count / seconds
else:
    rcds_per_second = row_count
# output end-of-processing messages
message = "Processed: {:,} rows in {:,.0f} seconds @ {:,.0f} records/second".format(row_count, seconds, rcds_per_second)
print(message)
print('Output file path: "%s"' % args.out_file_path)
print("Processing finished, {:,} rows output!".format(out_count))