Skip to content

Instantly share code, notes, and snippets.

@tastatham
Last active April 24, 2020 11:23
Show Gist options
  • Save tastatham/cfed5c4fcd3a5ed74d78948d0cc78162 to your computer and use it in GitHub Desktop.
Save tastatham/cfed5c4fcd3a5ed74d78948d0cc78162 to your computer and use it in GitHub Desktop.
dask_geomodeling_zonalstats.py
def compute_zonal_stats(raster, vector, fid, col, statistics,tile,mode):
"""
Calculate zonal statistics using dask
Parameters
----------
raster : A GeoDataFrame to create multiple copies based on the list of density thresholds
vector : ESRI Shapefile vector grid (1km)
fid : Primary key (default is fid for exporting ESRI Shapefiles)
col : Column name for zonal stats
statistics : Zonal statistics aggregation
tile : How large to split the raster database to fit in memory
mode : Centroid / extent
Returns
-------
A GeoDataframe containing the computed zonal statistics
"""
# construct the View
rs = RasterFileSource(raster)
gs = GeometryFileSource(vector, id_field=fid)
# Define zonal statistics to calculate
agg = AggregateRaster(gs, rs, column_name=col, statistic=statistics)
# Build the data request
x1, x2, y1, y2 = rs.geometry.GetEnvelope()
projection = rs.projection
request = dict(geometry=box(x1, y1, x2, y2), projection=projection)
# Tile geometry
tiled = parallelize.GeometryTiler(agg, size=tile, projection=projection)
# Print statement
print('Geometry Tiler done for %s' %(raster))
# Define graph to store how to compute using dask
graph, name = tiled.get_compute_graph(**request, mode=mode)
# Print statement
print('Compute graph done for %s' %(raster))
# Compute using dask
tuple_output = get(graph,name) # Dask compute always returns tuples
# Convert to GeoDataFrame
gdf = list(tuple_output.items())[0][1]
return(gdf)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment