Skip to content

Instantly share code, notes, and snippets.

@amsantac
Last active January 6, 2020 14:52
Show Gist options
  • Star 1 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save amsantac/63884509591810bdd6d9fbafa407ae2e to your computer and use it in GitHub Desktop.
Save amsantac/63884509591810bdd6d9fbafa407ae2e to your computer and use it in GitHub Desktop.
R ArcGIS species distribution modeling script
tool_exec <- function(in_params, out_params)
{
# load the required packages
cat(paste0("\n", "Loading packages...", "\n"))
if (!requireNamespace("dismo", quietly = TRUE))
install.packages("dismo")
if (!requireNamespace("raster", quietly = TRUE))
install.packages("raster")
require(dismo)
require(raster)
# read input and output parameters
occurrence_dataset = in_params[[1]]
continuous_rasters_folder = in_params[[2]]
biome_raster = in_params[[3]]
model = in_params[[4]]
out_raster = out_params[[1]]
out_table = out_params[[2]]
out_shp = out_params[[3]]
cat(paste0("\n", "Loading datasets...", "\n"))
# open the input shapefile dataset
d <- arc.open(occurrence_dataset)
occurrence <- arc.data2sp(arc.select(d))
# read the continuous raster files from a folder
rfiles1 <- list.files(path = continuous_rasters_folder, full.names = TRUE)
rasters1 <- stack(rfiles1[-grep(".aux", rfiles1)])
# read the categorical raster (e.g., biome)
raster2 <- raster(gsub("/", "\\\\", biome_raster))
# create a RasterStack object for the model predictors
predictors <- stack(rasters1, raster2)
# extract values from the RasterStack at the locations of the species occurrence
presvals <- as.data.frame(extract(predictors, occurrence))
cat(paste0("\n", "Adjusting model...", "\n"))
# adjust the chosen model: bioclim, domain, Mahalanobis or a generalized linear model
if(model == "bioclim"){
fitmodel <- bioclim(subset(presvals, select = -c(biome)))
p <- predict(predictors, fitmodel)
}
if(model == "domain"){
fitmodel <- domain(dropLayer(predictors, 'biome'), as.data.frame(occurrence)[, c('coords.x1', 'coords.x2')])
p <- predict(dropLayer(predictors, 'biome'), fitmodel)
}
if(model == "mahal"){
fitmodel <- mahal(dropLayer(predictors, 'biome'), as.data.frame(occurrence)[, c('coords.x1', 'coords.x2')])
p <- predict(dropLayer(predictors, 'biome'), fitmodel)
}
if(model == "glm"){
set.seed(0)
backgr <- randomPoints(predictors, 500)
absvals <- extract(predictors, backgr)
pb <- c(rep(1, nrow(presvals)), rep(0, nrow(absvals)))
sdmdata <- data.frame(cbind(pb, rbind(presvals, absvals)))
sdmdata[, 'biome'] = as.factor(sdmdata[, 'biome'])
fitmodel = glm(pb ~ ., data = sdmdata)
p <- predict(predictors, fitmodel)
}
# export the predicted species distribution raster object and the output table and shapefile
cat(paste0("\n", "Writing result datasets...", "\n"))
if (!is.null(out_raster) && out_raster != "NA")
writeRaster(p, out_raster)
if (!is.null(out_table) && out_table != "NA")
arc.write(out_table, presvals)
if (!is.null(out_shp) && out_shp != "NA")
arc.write(out_shp, presvals, coords = coordinates(occurrence), shape_info = arc.shapeinfo(d))
cat(paste0("\n", "Done.", "\n"))
return(out_params)
}
@MHeynen97
Copy link

Hi!
First of all, I would like to thank you for the tutorial on this script.
However, I have a question. I'm a beginner when it comes to the R language and for a school project, I would like to add the MaxEnt model.
My guess would be, based on the code above, to add the MaxEnt model with the following code:

if (model == "maxEnt"){ fitmodel <- mahal(dropLayer(predictors, 'biome'), as.data.frame(occurrence)[, c('coords.x1', 'coords.x2')]) p <- predict(dropLayer(predictors, 'biome'), fitmodel) }
However, I don't understand why this would (statistically) give different results as the Mahal and Domain model. I don't know if you could explain how this works? Or if I'm wrong?

and on another topic. When I try to run the script with my own data I keep getting the following errors. I am using continuous raster files.

Error in .rasterObjectFromFile : Cannot create a RasterLayer object from this file.
Error: C stack usage 665199880 is too close to the limit
Error: C stack usage 665199880 is too close to the limit
Error: C stack usage 665199880 is too close to the limit
Error: C stack usage 665199880 is too close to the limit
Error: C stack usage 665199880 is too close to the limit
Error: C stack usage 665199880 is too close to the limit
Error: C stack usage 665199880 is too close to the limit
Error: C stack usage 665199880 is too close to the limit
Failed to execute (RSDMamsantac).

Originally I thought this might be caused by low storage on my C drive. However, I still have over a 100GB available so I don't think this is the problem?

@amsantac
Copy link
Author

Hi! Thanks for your message.

These techniques may produce different results as they are based on distinct statistical theories, have different assumptions, some techniques may be better at predicting presence (occurrence) while others may be better at predicting absence, etc. You can read the 'Species distribution modeling with R' doc (https://cran.r-project.org/web/packages/dismo/vignettes/sdm.pdf) and other literature to learn more about each of these techniques.

Regarding the error, that problem is not caused by disk space but memory. One possible solution may be to use the brick() command, which points to files in disk instead of memory as stack() does. Another option would be to try the arc.raster() function from the arcgisbinding package and see how that works. arc.raster() was not part of the package when I wrote this tutorial 3 years ago. Alternatively you could try using/renting a machine with a larger memory, etc.

Hope this helps

@MHeynen97
Copy link

MHeynen97 commented Jan 2, 2020

Hi!
Thank you for your reply. I replaced the stack() by brick() and this gives me the following error:

Executing: RSDMamsantac "Aeshna_juncea_ExcelToTable Events" D:\INSECTENNETWERK\input_SSDMV1 # maxent C:\Users\merel\Documents\ArcGIS\Ooutput.tiff C:\Users\merel\Documents\ArcGIS\Default.gdb\bradypus_RSDMamsantac C:\Users\merel\Documents\ArcGIS\Aeshna_juncea_ExcelToTable_F1.shp
Start Time: Thu Jan 2 11:58:37 2020
Running script RSDMamsantac...
Loading datasets...
Error in brick : argument is of length zero
Failed to execute (RSDMamsantac).
Failed at Thu Jan 2 11:58:37 2020 (Elapsed Time: 0,12 seconds)

So I think that I might have done something wrong. I copied the code that I changed below:

`
read the continuous raster files from a folder
rfiles1 <- list.files(path = continuous_rasters_folder, full.names = TRUE)
rasters1 <- brick(rfiles1[-grep(".aux", rfiles1)])

read the categorical raster (biome) from a file
raster2 <- raster(gsub("/", "\\", biome_raster))

create a RasterBrick object for the model predictors
predictors <- brick(rasters1, raster2)

create values from the RasterBrick at the locations of the species occurence
presvals <- as.data.frame(extract(predictors, occurrence))
`

I think it's probably an easy solution and that it's probably caused by something that I didn't change in the code, but I can't figure out what.

@amsantac
Copy link
Author

amsantac commented Jan 2, 2020

I think a quick fix would be using brick(stack(...)) as follows:

rasters1 <- brick(stack(rfiles1[-grep(".aux", rfiles1)]))
predictors <- brick(stack(rasters1, raster2))

I always recommend to work first in R and make sure that the script runs without problem in R. When everything works as expected, then move to ArcGIS. The arcgisbinding package just provides an interface for the inputs and outputs.

@MHeynen97
Copy link

Hi!
So I tried the adaptation however, it doesn't change anything in the output.
I did try something different which gave me a different error. I reread the brick suggestion and noticed that you mentioned that it points to a file instead of a folder. I hadn't changed anything when adding the script to the toolbox in ArcGIS so the script still asked for a folder. I changed the Continous Raster into a multivalue type so that you can add the files in the model. This gave me the following result:

Executing: RSDMamsantacV2.2 "Aeshna_juncea_ExcelToTable Events" D:\INSECTENNETWERK\Environmental_factors_10m\afstand_tot_water_vol.tif;D:\INSECTENNETWERK\Environmental_factors_10m\ahn.tif;D:\INSECTENNETWERK\Environmental_factors_10m\ahn_frl_waterhoogte.tif;D:\INSECTENNETWERK\Environmental_factors_10m\geomofologische_kaart.tif;D:\INSECTENNETWERK\Environmental_factors_10m\isothermality.tif;D:\INSECTENNETWERK\Environmental_factors_10m\isothermality_frl.tif;D:\INSECTENNETWERK\Environmental_factors_10m\lgn7.tif;D:\INSECTENNETWERK\Environmental_factors_10m\regenval.tif;D:\INSECTENNETWERK\Environmental_factors_10m\soilmap_subgrp.tif;D:\INSECTENNETWERK\Environmental_factors_10m\vegetatie_totaal.tif;D:\INSECTENNETWERK\Environmental_factors_10m\waterindex.tif # maxent D:\INSECTENNETWERK\RSDMamsantac_Method\OuputRasters\Output.tiff D:\INSECTENNETWERK\RSDMamsantac_Method\OuputRasters\Table D:\INSECTENNETWERK\RSDMamsantac_Method\OuputRasters\shp.shp
Start Time: Mon Jan 6 09:35:50 2020
Running script RSDMamsantacV2.2...
Loading datasets...
Error in tool_exec : invalid 'path' argument
Failed to execute (RSDMamsantacV2.2).
Failed at Mon Jan 6 09:35:50 2020 (Elapsed Time: 0,15 seconds)

I have a feeling that this is going the right way, but that (since I changed an option in the script/tool) I need to adjust the code in R.
On the ArcGIS page I only found a code for arcpy.

@amsantac
Copy link
Author

amsantac commented Jan 6, 2020

Probably something was set up incorrectly in the ArcGIS tool. This may be just another error.

The initial error was due to a memory issue. I'd recommend to work in the R program, make sure that the script runs without problem and address the memory issue, if you are more interested in processing your data and getting results than in developing an ArcGIS tool.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment