@etiennebr
Last active November 13, 2023 11:24
Transform raster or terra object to data.table
#' Transform raster to data.table
#'
#' @param x Raster* object
#' @param row.names `NULL` or a character vector giving the row names for the data frame. Missing values are not allowed
#' @param optional logical. If `TRUE`, setting row names and converting column names (to syntactic names: see make.names) is optional
#' @param xy logical. If `TRUE`, also return the spatial coordinates
#' @param inmem logical. If `TRUE`, convert the whole raster at once; otherwise process it block by block
#' @param ... Additional arguments (none) passed to `raster::as.data.frame`
#'
#' @return returns a data.table object
#' @examples
#' logo <- brick(system.file("external/rlogo.grd", package="raster"))
#' v <- as.data.table(logo)
#' @import data.table
as.data.table.raster <- function(x, row.names = NULL, optional = FALSE, xy = FALSE, inmem = canProcessInMemory(x, 2), ...) {
  stopifnot(require("data.table"))
  if (inmem) {
    v <- as.data.table(as.data.frame(x, row.names = row.names, optional = optional, xy = xy, ...))
  } else {
    tr <- blockSize(x, n = 2)
    l <- lapply(seq_len(tr$n), function(i) {
      # getValues() returns a plain vector or matrix, so as.data.frame() cannot
      # add coordinates here; they are rebuilt from the cell numbers instead
      dt <- as.data.table(as.data.frame(getValues(x, row = tr$row[i], nrows = tr$nrows[i]),
                                        row.names = row.names, optional = optional, ...))
      if (xy) {
        # first and last cell of the block; complete rows are contiguous cells
        cells <- cellFromRowCol(x, c(tr$row[i], tr$row[i] + tr$nrows[i] - 1), c(1, ncol(x)))
        coords <- xyFromCell(x, cells[1]:cells[2])
        dt[, c("x", "y") := list(coords[, 1], coords[, 2])]
        setcolorder(dt, c("x", "y"))
      }
      dt
    })
    v <- rbindlist(l)
  }
  coln <- names(x)
  if (xy) coln <- c("x", "y", coln)
  setnames(v, coln)
  v
}
#' Transform SpatRaster to data.table
#'
#' @param x SpatRaster object
#' @param optional logical. Passed to `terra::as.data.frame`
#' @param xy logical. If TRUE, the coordinates of each raster cell are included
#' @param cells logical. If TRUE, the cell numbers of each raster cell are included
#' @param na.rm logical. If TRUE, cells that have a NA value in at least one layer are removed
#' @param ... Additional arguments (none) passed to `terra::as.data.frame`
#' @return returns a data.table object
#' @examples
#' r <- rast(ncols=2, nrows=2)
#' values(r) <- 1:ncell(r)
#' as.data.table(r, xy = TRUE)
#' @importFrom terra as.data.frame
#' @importFrom data.table as.data.table
as.data.table.SpatRaster <- function(x, optional = FALSE, xy = FALSE, ...) {
  stopifnot(require("data.table"))
  v <- as.data.table(as.data.frame(x, optional = optional, xy = xy, ...))
  coln <- names(x)
  if (xy) coln <- c("x", "y", coln)
  setnames(v, coln)
  v
}
if (!isGeneric("as.data.table")) {
  setGeneric("as.data.table", function(x, ...)
    standardGeneric("as.data.table"))
}
setMethod('as.data.table', signature(x='data.frame'), data.table::as.data.table)
# Load raster and/or terra (whichever you need) before registering these methods
setMethod('as.data.table', signature(x='Raster'), as.data.table.raster)
setMethod('as.data.table', signature(x='SpatRaster'), as.data.table.SpatRaster)
@ForrestStevens

Thank you for posting this great function, it's going to come in very handy and I appreciate you taking the time to put it up.

@ldemaz

ldemaz commented May 20, 2015

Etienne, great function, which I have been incorporating into my work flow. One thing I noticed today when trying to convert a large raster (where inmem = FALSE), is that the xy=TRUE argument doesn't return coordinates. I rewrote the inner bits of the function to allow this, mostly lines 21-28. I am sure it could be better, as I am new to data.table, but I wrote some extra logic to allow the columns to be reordered to have x,y first given the way I joined them to the data.table (this seemed faster than cbind with two separate data.tables, but I didn't test it too much).

Anyway, it's below, and it seems to work now. Thanks again for posting this!

if(inmem) {
  v <- as.data.table(as.data.frame(x, row.names=row.names, optional=optional, xy=xy, ...))
   coln <- names(x)
   if(xy) coln <- c("x", "y", coln)
   setnames(v, coln)
} else {
  tr <- blockSize(x)
  l <- lapply(1:tr$n, function(i) {
    DT <- as.data.table(as.data.frame(getValues(x, row = tr$row[i], nrows = tr$nrows[i]), ...))  
    if(xy == TRUE) {
      cells <- cellFromRowCol(x, c(tr$row[i], tr$row[i] + tr$nrows[i] - 1), c(1, ncol(x)))
      coords <- xyFromCell(x, cell = cells[1]:cells[2])
      DT[, c("x", "y") := data.frame(coords)]
    } 
    DT
  })
  v <- rbindlist(l)
  coln <- names(x)
  if(xy) {
    coln <- c("x", "y", coln)
    setcolorder(v, coln)
  }
}
v
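
The block logic above relies on two facts: cells are numbered row by row, so a block of complete rows is one contiguous range of cell numbers, and cell-centre coordinates follow directly from the cell number. A minimal base-R sketch of that arithmetic, assuming a made-up 4 x 3 grid (the names `block_cells` and `cell_xy` are hypothetical helpers, not functions from raster or terra):

```r
# Hypothetical 4 x 3 grid with extent [0, 3] x [0, 4] and 1 x 1 cells
# (all values here are illustrative, not taken from a real raster).
n_row <- 4; n_col <- 3
x_min <- 0; y_max <- 4
res_x <- 1; res_y <- 1

# Cell numbers covered by a block of `n` rows starting at row `first_row`.
# Cells are numbered row by row from the top-left corner, so a block of
# complete rows is always a contiguous range.
block_cells <- function(first_row, n) {
  first <- (first_row - 1) * n_col + 1
  last <- (first_row + n - 1) * n_col
  first:last
}

# Cell-centre coordinates for a vector of cell numbers.
cell_xy <- function(cell) {
  row <- (cell - 1) %/% n_col + 1
  col <- (cell - 1) %% n_col + 1
  data.frame(
    x = x_min + (col - 0.5) * res_x,
    y = y_max - (row - 0.5) * res_y
  )
}

# Process the grid in two blocks of two rows and bind the results,
# mirroring the lapply() + rbindlist() pattern used in the snippet above.
blocks <- list(c(first_row = 1, n = 2), c(first_row = 3, n = 2))
parts <- lapply(blocks, function(b) {
  cells <- block_cells(b[["first_row"]], b[["n"]])
  cbind(cell = cells, cell_xy(cells))
})
coords <- do.call(rbind, parts)
```

Binding the blocks gives one row per cell in cell order, which is why the coordinates line up with the values read by `getValues()` for the same rows.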

@philipshirk

Great function! This reduces the processing time for counting the number of cells in my raster with a particular value from 175 seconds using raster::freq() down to 1.7 seconds!

@jacksonvoelkel

Etienne, just dropping by to thank you for this great function!

@thiagoveloso

Awesome function! I wonder why it is NOT shipped in the raster package?

@mxblsdl

mxblsdl commented May 23, 2020

I recently came back to this function for an old workflow and keep running into memory allocation errors. Has anyone else experienced this? It happens with large but sparse rasters: lots of NA values over a large extent.

@etiennebr
Author

I haven't used it in a while, so I can't say. If you can provide a reproducible example I could have a look.

@ptompalski

@etiennebr fantastic tool!
Question - do you know if it is possible to adapt it to work with terra?
The terra package has terra::as.data.frame() function similar to raster::as.data.frame(), but the parameters are different. Perhaps you have looked into this already?

@etiennebr
Author

Thanks @ptompalski! I haven't worked on this, but you're right it would be useful.

@thiagoveloso

Maybe it's useful to take a look at Robert's feedback to an old request of mine here: rspatial/raster#55

@etiennebr
Author

Thanks @thiagoveloso, @ptompalski I updated the gist. Let me know if it works for you!

@mikeshewring

Updated to terra

as.data.table.raster <- function(x, row.names = NULL, optional = FALSE, xy=FALSE, inmem = terra::inMemory(x), ...) {
  stopifnot(require("data.table"))
  if(inmem) {
    v <- as.data.table(as.data.frame(x, row.names=row.names, optional=optional, xy=xy, ...))
    coln <- names(x)
    if(xy) coln <- c("x", "y", coln)
    setnames(v, coln)
  } else {
    tr <- blocks(x)
    l <- lapply(1:tr$n, function(i) {
      DT <- as.data.table(as.data.frame(terra::values(x, row = tr$row[i], nrows = tr$nrows[i]), ...))
      if(xy == TRUE) {
        cells <- terra::cellFromRowCol(x, c(tr$row[i], tr$row[i] + tr$nrows[i] - 1), c(1, ncol(x)))
        coords <- terra::xyFromCell(x, cell = cells[1]:cells[2])
        DT[, c("x", "y") := data.frame(coords)]
      }
      DT
    })
    v <- rbindlist(l)
    coln <- names(x)
    if(xy) {
      coln <- c("x", "y", coln)
      setcolorder(v, coln)
    }
  }
  v
}

@etiennebr
Author

Hi Mike, thanks for sharing! The current gist works for me with terra.

# after sourcing the gist
terra::rast(matrix(1, 2, 2)) %>% as.data.table(xy = TRUE)
#>      x   y lyr.1
#> 1: 0.5 1.5     1
#> 2: 1.5 1.5     1
#> 3: 0.5 0.5     1
#> 4: 1.5 0.5     1

But maybe there are some edge cases that are not supported. Let me know if there is anything in the current gist that didn't work for you with terra.

@tylerhoecker

tylerhoecker commented Nov 8, 2023

Thanks @etiennebr for this super useful function! I used it successfully with terra. I did encounter an error when applying this function over a list of SpatRasters using purrr::map, but not when using lapply. A non-repro example:

# WORKS
dt_list <- lapply(spatrast_list, FUN = as.data.table)

# ERROR
dt_list <- spatrast_list %>% 
  map(., as.data.table())

# Error message: Error in (function (classes, fdef, mtable)  : 
  unable to find an inherited method for function ‘as.data.table’ for signature ‘"missing"’

@etiennebr
Author

Thanks @tylerhoecker, it seems that as.data.table should be passed to map the same way as to lapply: as the function itself, without parentheses. Could you try:

dt_list <- spatrast_list %>% 
  map(as.data.table)
# or
dt_list <- map(spatrast_list, as.data.table)
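
The difference is between passing a function and calling it. Writing `as.data.table()` calls the generic with no `x`, which is why the error mentions the signature "missing". A minimal base-R sketch of the same behaviour, using `lapply` and a stand-in function (`describe` is a hypothetical name, not part of the gist):

```r
# Stand-in for a one-argument converter such as as.data.table
# (`describe` is a hypothetical name used only for this sketch).
describe <- function(x) length(x)
inputs <- list(1:3, 1:5)

# Passing the function object: it is called once per element.
ok <- lapply(inputs, describe)

# Writing describe() calls it immediately with no argument; the error is
# raised while evaluating that call, before any element is processed.
bad <- tryCatch(lapply(inputs, describe()), error = function(e) e)
```

Here `ok` holds one result per input, while `bad` holds the captured error condition.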
