Skip to content

Instantly share code, notes, and snippets.

@johnbaums
Last active February 15, 2021 02:18
Show Gist options
  • Star 13 You must be signed in to star a gist
  • Fork 4 You must be signed in to fork a gist
  • Save johnbaums/26e8091f082f2b3dd279 to your computer and use it in GitHub Desktop.
Save johnbaums/26e8091f082f2b3dd279 to your computer and use it in GitHub Desktop.
Convert raster data to a ESRI polygon shapefile and (optionally) a SpatialPolygonsDataFrame
polygonizer <- function(x, outshape=NULL, pypath=NULL, readpoly=TRUE,
fillholes=FALSE, aggregate=FALSE,
quietish=TRUE) {
# x: an R Raster layer, or the file path to a raster file recognised by GDAL
# outshape: the path to the output shapefile (if NULL, a temporary file will
# be created)
# pypath: the path to gdal_polygonize.py or OSGeo4W.bat (if NULL, the function
# will attempt to determine the location)
# readpoly: should the polygon shapefile be read back into R, and returned by
# this function? (logical)
# fillholes: should holes be deleted (i.e., their area added to the containing
# polygon)
# aggregate: should polygons be aggregated by their associated raster value?
# quietish: should (some) messages be suppressed? (logical)
if (isTRUE(readpoly) || isTRUE(fillholes)) require(rgdal)
if (isTRUE(aggregate)) require(rgeos)
if (is.null(pypath)) {
cmd <- Sys.which('OSGeo4W.bat')
pypath <- 'gdal_polygonize'
if(cmd=='') {
cmd <- 'python'
pypath <- Sys.which('gdal_polygonize.py')
if (!file.exists(pypath))
stop("Could not find gdal_polygonize.py or OSGeo4W on your system.")
}
}
if (!is.null(outshape)) {
outshape <- sub('\\.shp$', '', outshape)
f.exists <- file.exists(paste(outshape, c('shp', 'shx', 'dbf'), sep='.'))
if (any(f.exists))
stop(sprintf('File already exists: %s',
toString(paste(outshape, c('shp', 'shx', 'dbf'),
sep='.')[f.exists])), call.=FALSE)
} else outshape <- tempfile()
if (is(x, 'Raster')) {
require(raster)
writeRaster(x, {f <- tempfile(fileext='.tif')})
rastpath <- normalizePath(f)
} else if (is.character(x)) {
rastpath <- normalizePath(x)
} else stop('x must be a file path (character string), or a Raster object.')
system2(cmd, args=(
sprintf('"%s" "%s" %s -f "ESRI Shapefile" "%s.shp"',
pypath, rastpath, ifelse(quietish, '-q ', ''), outshape)))
if(isTRUE(aggregate)||isTRUE(readpoly)||isTRUE(fillholes)) {
shp <- readOGR(dirname(outshape), layer=basename(outshape),
verbose=!quietish)
} else return(NULL)
if (isTRUE(fillholes)) {
poly_noholes <- lapply(shp@polygons, function(x) {
Filter(function(p) p@ringDir==1, x@Polygons)[[1]]
})
pp <- SpatialPolygons(mapply(function(x, id) {
list(Polygons(list(x), ID=id))
}, poly_noholes, row.names(shp)), proj4string=CRS(proj4string(shp)))
shp <- SpatialPolygonsDataFrame(pp, shp@data)
if(isTRUE(aggregate)) shp <- aggregate(shp, names(shp))
writeOGR(shp, dirname(outshape), basename(outshape),
'ESRI Shapefile', overwrite=TRUE)
}
if(isTRUE(aggregate) & !isTRUE(fillholes)) {
shp <- aggregate(shp, names(shp))
writeOGR(shp, dirname(outshape), basename(outshape),
'ESRI Shapefile', overwrite=TRUE)
}
ifelse(isTRUE(readpoly), return(shp), return(NULL))
}
@johnbaums
Copy link
Author

johnbaums commented Feb 24, 2016

Example

library(rasterVis)
download.file('https://www.dropbox.com/s/tk3kg2oce4h2snd/NEcountries.asc.zip?dl=1',
              destfile={f <- tempfile()}, quiet=TRUE, cacheOK=FALSE,
              mode='wb')
unzip(f, exdir = d <- tempdir() )
r <- raster(file.path(d, 'NEcountries.asc'), crs='+proj=longlat')
levelplot(r, margin=FALSE, col.regions=rainbow)

polyg1

p <- polygonizer(r)
p

## class       : SpatialPolygonsDataFrame 
## features    : 302 
## extent      : -180, 180, -90, 83.6  (xmin, xmax, ymin, ymax)
## coord. ref. : +proj=longlat +ellps=WGS84 +no_defs 
## variables   : 1
## names       :  DN 
## min values  :   1 
## max values  : 177

spplot(p, col.regions=rainbow(200))

polyg2

@JoshOBrien
Copy link

Hi again. FWIW, this doesn't work for me because it relies on cmd <- Sys.which('OSGeo4W.bat'), and while I've got C:\OSGeo4W64\bin on my search path, I don't have C:\OSGeo4W64 on there. One possible fix (?) would be to do cmd <- ifelse(file.exists("C:/OSGeo4W64/OSGeo4W.bat"), "C:/OSGeo4W64/OSGeo4W.bat", Sys.which("OSGeo4W.bat")).

Obviously though, fixing this so that it'll work in every possible setting is a Herculean task: that way lies madness, or at least something like gdalUtils::gdal_setInstallation... So maybe it'd be better to just add cmd as one more argument to polygonizer(), perhaps with default value like this: cmd = ifelse(.Platform$OS=="windows", "C:/OSGeo4W64/OSGeo4W.bat", "python"). That way, if they need to, users can just set the command by hand from the call to polygonizer().

Also -- again just perhaps on my box -- download.file() won't read from the dropbox URL as written, but will if I change the http to https.

@johnbaums
Copy link
Author

Sorry @JoshOBrien.. just saw this. Good idea re the cmd arg - makes total sense. And thanks for the heads up about the Dropbox link.

@adrfantini
Copy link

Now that the package sf is pretty mature, it'd be nice if this nice little function returned an sf object instead of a SpatialPolygonsDataFrame.

@adrfantini
Copy link

adrfantini commented Mar 20, 2018

Also note that gdal_polygonize.py uses the function GDALPolygonize, which truncates floats to integers. There is a C function GDALFPolygonize which does not, but no correspective python program unfortunately.

@johnbaums
Copy link
Author

Great idea, @adrfantini - I'll implement the sf suggestion when I get a chance. What's the implication of the coercion to integer? Is that truncation of vertex coordinates, or of the data in the attribute table? If the latter, I believe the only field returned with the shapefile is an integer sequence of polygon indices anyway (it's been a while since I've used the function - maybe I'm mistaken).

@hnbendini
Copy link

hnbendini commented Dec 4, 2018

Hi,
I remember I succeed using this awesome function. Very useful! But now, I'm struggling a bit, don´t know why.
The error is:
Traceback (most recent call last):
File "C:\PROGRA1\QGIS21.18\bin\GDAL_P~2.PY", line 35, in
from osgeo import gdal

Even if I set "readPoly to FALSE, I have the follow error:
Traceback (most recent call last):
File "C:\PROGRA1\QGIS21.18\bin\GDAL_P~2.PY", line 35, in
from osgeo import gdal
ImportError: No module named osgeo
ImportError: No module named osgeo
Show Traceback

Rerun with Debug
Error in ogrInfo(dsn = dsn, layer = layer, encoding = encoding, use_iconv = use_iconv, :
Cannot open data source

@hnbendini
Copy link

Hi,
I remember I succeed using this awesome function. Very useful! But now, I'm struggling a bit, don´t know why.
The error is:
Traceback (most recent call last):
File "C:\PROGRA1\QGIS21.18\bin\GDAL_P~2.PY", line 35, in
from osgeo import gdal

Even if I set "readPoly to FALSE, I have the follow error:
Traceback (most recent call last):
File "C:\PROGRA1\QGIS21.18\bin\GDAL_P~2.PY", line 35, in
from osgeo import gdal
ImportError: No module named osgeo
ImportError: No module named osgeo
Show Traceback

Rerun with Debug
Error in ogrInfo(dsn = dsn, layer = layer, encoding = encoding, use_iconv = use_iconv, :
Cannot open data source

For some reason, Sys.which("OSGeo4W.bat") does not work. Maybe because in my computer, QQIS has spaces on the path. I did this:
cmd<-paste0(dirname(dirname(Sys.which("QGIS"))),"/OSGeo4W.bat")
And it worked.

@naurban
Copy link

naurban commented Jan 3, 2019

Is there a way to extend this to an 8-way cell neighborhood instead of the default 4-way?

@johnbaums
Copy link
Author

@naurban - yes you can do something like this with the resulting polygon (p, and determining buffer width from the resolution of your raster, r):

b <- disaggregate(buffer(p, width=0.25*sqrt(sum(res(r)^2))))
b_spdf <- SpatialPolygonsDataFrame(b, data.frame(DN8=seq_along(b)))
p8 <- intersect(b_spdf, p)

I don't have gdal set up correctly on this machine, so haven't tested it, but I believe this will return a SpatialPolygonsDataFrame with a new field (DN8) giving polygon IDs that are grouped by 8-way connectedness.

@Chris-Has
Copy link

Any reason why I would get this error when I attempt to run polygonizer on a raster object?
Error in setwd(dirname(pypath)) : cannot change working directory

@johnbaums
Copy link
Author

@Chris-Has check what dirname(pypath) returns. Seems like that path doesn’t exist, or maybe pypath is NA?

@rafaleo
Copy link

rafaleo commented Jan 17, 2020

I have problem at this line:

system2(cmd, args=(
    sprintf('"%s" "%s" %s -f "ESRI Shapefile" "%s.shp"', 
pypath, rastpath, ifelse(quietish, '-q ', ''), outshape)))

Because with given data I have 2 versions of command string:

  1. While running the above code its quoted, 2) when I paste this directly to the console it's unquoted and it works.
    How can I unquote 'system2' arguments and command?

@johnbaums
Copy link
Author

@rafaleo you can use sf now:

polygonize <- function(srcfile, dstfile, mask, ogr_format, fieldname, band, connect8=FALSE) {
  options <- if(isTRUE(connect8)) 'CONNECT8=8' else character(0)
  contour_options <- character(0)
  if(missing(mask)) mask <- character(0)
  .Call("_sf_CPL_polygonize", PACKAGE = "sf", srcfile, mask, 
        'GTiff', ogr_format, layer=dstfile, options, 0, #iPixValField, 
        contour_options, use_contours=FALSE, use_integer=TRUE)
}
  
library(sf)
library(raster)
library(gdalUtilities) # for example raster
r <- raster::raster(system.file(package='gdalUtilities', 'extdata', 'tahoe.tif')) > 250
raster::writeRaster(r, f <- tempfile(fileext='.tif'))
polygonize(f, f2 <- tempfile(fileext='.shp'), ogr_format='ESRI Shapefile', connect8=TRUE)
plot(sf::read_sf(f2)) 

@rafaleo
Copy link

rafaleo commented Jan 18, 2020

@johnbaums Thank you for those alternatives. Thankfully I've managed to repair my python and can use that code as well.

@lovalery
Copy link

@johnbaums Dear John, I would like to polygonize a list of rasters using the lapply function but the polygonizer function seems to accept only one raster object (if I'm not wrong).
So, I was wondering if it would be possible for you to slightly modify the script of the function so that a list of rasters can be processed.
Thank you in advance for your answer.

@johnbaums
Copy link
Author

johnbaums commented Jun 30, 2020

@lovalery I don't think you should have any issues using lapply on a list of rasters:

library(raster)
L <- replicate(3, raster(matrix(runif(100), nrow=10)))
polys <- lapply(L, polygonizer)

See also this comment for an approach that uses sf (with a self-contained version of GDAL/OGR).

@lovalery
Copy link

lovalery commented Jul 1, 2020

@johnbaums Thank you very much for your reply.
lapply doesn't work in my case (I'm not sure why because, as you say, it should work).
I've tried the approach using 'sf' object and it works. That's the main thing :-)
Thanks again for your advice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment