@iandees
Last active September 7, 2016
import unicodecsv
import argparse
import sys

import simplejson as json
from shapely import wkt
from shapely.geometry import mapping
import shapely.speedups
shapely.speedups.enable()

parser = argparse.ArgumentParser()
parser.add_argument('roadfile', type=argparse.FileType('r'))
parser.add_argument('featnamefile', type=argparse.FileType('r'))
parser.add_argument('outfile', type=argparse.FileType('w'), nargs='?', default=sys.stdout)
args = parser.parse_args()

# Build lookup tables mapping abbreviated TIGER codes to their
# expanded text, e.g. direction code -> full direction name.
with open('2016_feature_name_directionals.csv', 'r') as f:
    directionals = dict(
        (row['Direction Code'], row['Expanded Full Text'])
        for row in unicodecsv.DictReader(f)
    )

with open('2016_feature_name_qualifiers.csv', 'r') as f:
    qualifiers = dict(
        (row['Qualifier Code'], row['Expanded Full Text'])
        for row in unicodecsv.DictReader(f)
    )

with open('2016_feature_name_types.csv', 'r') as f:
    types = dict(
        (row['Type Code'], row['Expanded Full Text'])
        for row in unicodecsv.DictReader(f)
    )

# Index FEATNAMES rows by LINEARID, expanding the abbreviated name
# parts into a single display label.
featnames = dict()
for featname in unicodecsv.DictReader(args.featnamefile):
    linearid = featname['LINEARID']
    if linearid not in featnames:
        featnames[linearid] = {
            'MTFCC': featname.get('MTFCC'),
            'NAME_EXPANDED': ' '.join(filter(None, [
                qualifiers.get(featname.get('PREQUAL')),
                directionals.get(featname.get('PREDIR')),
                types.get(featname.get('PRETYP')),
                featname.get('NAME'),
                types.get(featname.get('SUFTYP')),
                directionals.get(featname.get('SUFDIR')),
                qualifiers.get(featname.get('SUFQUAL')),
            ]))
        }

# Join each road geometry to its expanded name on LINEARID and emit
# one GeoJSON Feature per line.
for road in unicodecsv.DictReader(args.roadfile):
    linearid = road['LINEARID']
    featname = featnames.get(linearid)
    parsed_shape = wkt.loads(road['WKT'])
    out = {
        'type': 'Feature',
        'geometry': mapping(parsed_shape),
        'properties': {
            'LINEARID': linearid,
        }
    }
    if featname:
        out['properties'].update(featname)
    args.outfile.write(json.dumps(out) + '\n')

Building TIGER 2016 Road Tiles

The US Census Bureau's TIGER dataset is one of the primary nationwide geographic datasets. Roughly ten years ago it was imported into OpenStreetMap, and huge swaths of it haven't been touched since, even though the TIGER dataset is updated yearly. Building on earlier work by OpenStreetMap US, Eric Fischer's TIGER2015 layer provides an overlay that helps mappers identify roads missing from OpenStreetMap and gives them a way to find street names for roads that might not have names in OpenStreetMap.

These instructions replicate that layer with the more recent TIGER 2016 release. TIGER includes a ROADS dataset and a FEATNAMES dataset: ROADS contains the geometries and a LINEARID that can be joined with the LINEARID in FEATNAMES. In FEATNAMES the road names are broken into several pieces, which we expand (unabbreviate) and concatenate to form a display label. Finally, the joined data is built into an mbtiles file with tippecanoe and uploaded to MapBox Studio for styling.
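The expansion step is a straightforward lookup-and-join on the FEATNAMES name parts. A minimal sketch, using hypothetical lookup tables and a hypothetical sample row (the real tables come from the Census feature-name CSVs):

```python
# Hypothetical excerpts of the expansion lookup tables.
directionals = {'N': 'North', 'SW': 'Southwest'}
types = {'Ave': 'Avenue', 'Rd': 'Road'}
qualifiers = {'Old': 'Old'}

def expand_name(row, qualifiers, directionals, types):
    """Concatenate the expanded name parts, skipping any that are absent."""
    parts = [
        qualifiers.get(row.get('PREQUAL')),
        directionals.get(row.get('PREDIR')),
        types.get(row.get('PRETYP')),
        row.get('NAME'),
        types.get(row.get('SUFTYP')),
        directionals.get(row.get('SUFDIR')),
        qualifiers.get(row.get('SUFQUAL')),
    ]
    return ' '.join(filter(None, parts))

# A hypothetical FEATNAMES row with only some parts present.
row = {'PREDIR': 'N', 'NAME': 'Main', 'SUFTYP': 'Ave'}
print(expand_name(row, qualifiers, directionals, types))  # North Main Avenue
```

Missing parts look up to `None` and are dropped by `filter(None, ...)`, so a row needs only the fields it actually has.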

  1. Download TIGER data. ROADS for the geometries, FEATNAMES for the split-apart road names.

    wget -e robots=off --quiet --mirror --no-parent --continue http://www2.census.gov/geo/tiger/TIGER2016/FEATNAMES/
    wget -e robots=off --quiet --mirror --no-parent --continue http://www2.census.gov/geo/tiger/TIGER2016/ROADS/
  2. Unzip the TIGER data into per-county directories.

    find www2.census.gov/geo/tiger/TIGER2016/ROADS/ -name '*.zip' -print | xargs -t -L1 --max-procs=4 -Ifile /bin/sh -c 'unzip -q file -d $(basename file _roads.zip)'
    find www2.census.gov/geo/tiger/TIGER2016/FEATNAMES/ -name '*.zip' -print | xargs -t -L1 --max-procs=4 -Ifile /bin/sh -c 'unzip -q file -d $(basename file _featnames.zip)'
  3. Convert the ROADS Shapefiles and FEATNAMES DBF files into CSVs.

    find . -name '*.dbf' -print0 | xargs -t -0 --max-procs=4 -Ifile ogr2ogr -f CSV file.csv file
    find . -name '*.shp' -print0 | xargs -t -0 --max-procs=4 -Ifile ogr2ogr -lco GEOMETRY=AS_WKT -f CSV file.csv file
  4. Use the included Python script to join the ROADS and FEATNAMES data sets and expand the abbreviated road names. The resulting data will be written as newline-separated GeoJSON features.

    find . -name '*_roads.shp' -print | xargs -t -L1 --max-procs=4 -Ifile /bin/sh -c 'base=$(basename file _roads.shp) && python merge_tiger_roads.py $base/${base}_roads.shp.csv $base/${base}_featnames.dbf.csv $base/$base.expanded.json'
  5. Run the resulting newline-delimited GeoJSON through tippecanoe to generate an mbtiles file.

    (find . -type f -name '*.expanded.json' -exec cat {} \;) | ./tippecanoe-master/tippecanoe -o expanded.mbtiles
  6. Send the mbtiles file to MapBox for rendering.
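Each line step 4 emits is one standalone GeoJSON Feature, which is the line-delimited input tippecanoe consumes in step 5. A sketch of the shape of one line, with hypothetical values:

```python
import json

# One output line from the merge script, hypothetical values throughout:
# the LINEARID join key, the MTFCC road-class code, and the expanded name.
feature = {
    'type': 'Feature',
    'geometry': {
        'type': 'LineString',
        'coordinates': [[-89.40, 43.07], [-89.39, 43.08]],
    },
    'properties': {
        'LINEARID': '110498298274',
        'MTFCC': 'S1400',
        'NAME_EXPANDED': 'North Main Avenue',
    },
}
line = json.dumps(feature)
print(line)
```

Roads with no FEATNAMES match carry only `LINEARID` in their properties.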

@wboykinm
# USAGE
# ./script.sh <filenamecontainingallcountyfipscodes.txt>

set -euo pipefail

FIPSES=$1

ianfightsatiger() {
  echo "starting county $1";
  wget -c http://www2.census.gov/geo/tiger/TIGER2016/FEATNAMES/tl_2016_$1_featnames.zip;
  wget -c http://www2.census.gov/geo/tiger/TIGER2016/ROADS/tl_2016_$1_roads.zip;
  unzip tl_2016_$1_featnames.zip;
  unzip tl_2016_$1_roads.zip;
  #ogr2ogr -f "CSV" <infile> <outfile> <something something>;
  #python <something something>;
  #mapbox upload <infile> tiles_{}; #maybe skip tippecanoe if you're hosting with mapbox anyway
  echo "done with county $1"
}
export -f ianfightsatiger

cat $FIPSES | parallel -j6 ianfightsatiger {}

@iandees
Author

iandees commented Aug 31, 2016

Thanks @wboykinm. Since each of those steps had different kinds of load (e.g. wget waits for network, ogr waits on disk, my python code waits on CPU) I was hoping for some magical system that would smartly split up the work for me without having to run the work for each county in sequence. Just another crazy idea for now :).

@NelsonMinar

The | parallel means it will run in parallel though. The jobs will naturally randomize so some are waiting on network, others are waiting on disk. Don't overthink it.
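The same pattern is easy to sketch with Python's stdlib, assuming a hypothetical `process_county()` that stands in for the per-county wget/unzip/ogr2ogr/merge steps. Six workers run concurrently, so counties blocked on network, disk, or CPU naturally interleave, much like `parallel -j6`:

```python
from concurrent.futures import ThreadPoolExecutor

def process_county(fips):
    # Hypothetical stand-in: download, unzip, convert, and merge
    # would happen here for one county FIPS code.
    return 'done with county %s' % fips

# Hypothetical county FIPS codes (Dane WI, Cook IL, Los Angeles CA).
fips_codes = ['55025', '17031', '06037']
with ThreadPoolExecutor(max_workers=6) as pool:
    results = list(pool.map(process_county, fips_codes))
print(results)
```

Threads suit this workload because the real work is subprocesses and network I/O, not Python-level CPU.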

@NelsonMinar

btw, for a while I was running the openaddresses full batch job using GNU parallel. It worked great. Notes: https://nelsonslog.wordpress.com/2015/01/09/gnu-parallel-for-openaddr-process-one/

@wboykinm

wboykinm commented Sep 7, 2016

OOoooooooooOOoooh. Thanks @NelsonMinar
