Skip to content

Instantly share code, notes, and snippets.

@arkottke
Last active December 16, 2015 23:39
Show Gist options
  • Save arkottke/5515096 to your computer and use it in GitHub Desktop.
Save arkottke/5515096 to your computer and use it in GitHub Desktop.
Discussion of how to use shape files.
#!/usr/bin/env python
# encoding: utf-8
import csv
from osgeo import ogr, osr
class AbstractLookup:
def __call__(self, lon, lat):
if lon is None or lat is None:
return [None for _ in self.fields]
# # Create the point and assign it a spatial reference
point = ogr.CreateGeometryFromWkt("POINT(%s %s)" % (lon, lat))
wgs84 = osr.SpatialReference()
wgs84.ImportFromEPSG(4326)
point.AssignSpatialReference(wgs84)
point.TransformTo(self.layer.GetSpatialRef())
# Need to reset the reading to reset the loop.
self.layer.ResetReading()
for i in range(self.layer.GetFeatureCount()):
feature = self.layer.GetFeature(i)
# Need to close the ring prior to every reading.
feature.GetGeometryRef().CloseRings()
if feature.GetGeometryRef().Contains(point):
return [feature.GetField(index)
for index, name in self.fields]
else:
return [None for _ in self.fields]
class BasinLookup(AbstractLookup):
def __init__(self):
self.dataset = ogr.Open(
'./data/Sedimentary_Basins/'
'Sedimentary_Basins_of_the_United_States.shp')
self.layer = self.dataset.GetLayer(0)
self.fields = [
(8, 'name'),
(5, 'basin_type'),
(4, 'age'),
(3, 'tectonic_setting'),
(6, 'source'),
]
class PhysiographicLookup(AbstractLookup):
def __init__(self):
self.dataset = ogr.Open(
'./data/physio_shp/physio.shp')
self.layer = self.dataset.GetLayer(0)
self.fields = [
(6, 'Division'),
(7, 'Province'),
(8, 'Section'),
]
def to_float(s):
try:
return float(s)
except ValueError:
return None
if __name__ == '__main__':
sources = [('sites-basin.csv', BasinLookup),
('sites-physio.csv', PhysiographicLookup)]
with open('./data/20130429_updated_latlons.csv', 'rt') as fin:
reader = csv.reader(fin)
next(fin)
latlons = [(to_float(row[3]), to_float(row[4])) for row in reader]
for fname, cls in sources:
lookup = cls()
with open(fname, 'wt') as fout:
writer = csv.writer(fout, lineterminator='\n')
writer.writerow(['lat', 'lon']
+ [name for _, name in lookup.fields])
for lat, lon in latlons:
writer.writerow([lat, lon] + lookup(lon, lat))

The challenge in using Shapefiles is understanding the format of the file, which I haven't found to be well documented.

The first step is to use an interactive Python session, or better yet IPython, to explore the Shapefile. In the following example, I will look at some Sedimentary Basin data. The first step is to open the Shapefile with GDAL (which is called osgeo for some reason):

>>> import osgeo imoprt ogr, osr
>>> data_set = ogr.Open('Sedimentary_Basins/Sedimentary_Basins_of_the_United_States.shp')

Once Shapefile is open, the next step is to understand the structure of the data which is arranged into layers, features, and fields:

>>> data_set.GetLayerCount()
  1
>>> layer = data_set.GetLayer(0)
>>> layer.GetFeatureCount()
  144
>>> feature = layer.GetFeature(0)
>>> feature.GetFieldCount()
  10
>>> for i in range(feature.GetFieldCount()):
  print(i, feature.GetField(i))
    0 Atlantic Coastal Plain, Florida Peninsula, New England
    1 277953138.655
    2 3611.979
    3 Pericratonic
    4 Mesozoic
    5 Passive Margin (incl. Deltaic)
    6 Cohee and others (1962), Frezon and Finn (1988), and BOEMRE (2010)
    7 1:5,000,000
    8 West Atlantic Basin
    9 144

The steps that you might use to examine the data would depend on what you found along each step. If there are multiple layers you might have to look at each of those. From this information, I can see which fields are important for the task at hand.

The next step is to write a script that performs the lookups of points. There are a few points that I should mention.

The fields of interest are defined and given a name:

self.fields = [
        (8, 'name'),
        (5, 'basin_type'),
        (4, 'age'),
        (3, 'tectonic_setting'),
        (6, 'source'),
    ]

Given longitude and latitude pair (which I have assumed to be the WGS84 coordinate system) it is important that the coordinates are transformed into the coordinate system of the Shapefile. This transformation is done by:

point = ogr.CreateGeometryFromWkt("POINT(%s %s)" % (lon, lat))
wgs84 = osr.SpatialReference()
wgs84.ImportFromEPSG(4326)
point.AssignSpatialReference(wgs84)
point.TransformTo(self.layer.GetSpatialRef())

For a specific point, each feature in the map is tested to see if the point is contained by the feature:

self.layer.ResetReading()
for i in range(self.layer.GetFeatureCount()):
    feature = self.layer.GetFeature(i)
    # Need to close the ring prior to every reading.
    feature.GetGeometryRef().CloseRings()

    if feature.GetGeometryRef().Contains(point):
        return [feature.GetField(index)
                for index, name in self.fields]
else:
    return [None for _ in self.fields]

This is relatively simple logic. Two things worth mentioning is that rings are closed if they aren't closed within the Shapefile. Ia ran into one badly formed Shapefile so I include this by default, but it isn't always required. At the start of the loop there is a self.layer.ResetReading(), which resets the looping over the features.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment