#####################################################################################################
# title : Machine learning exercise for Sentinel-2 data
# purpose : Train students on implementing a machine learning workflow in R
# author : Abdulhakim M. Abdi (@HakimAbdi / www.hakimabdi.com)
# input : A multi-temporal raster stack of Sentinel-2 data comprising scenes from four dates
# output : One classified land cover map from each of three machine learning algorithms
# Note 1 : This brief tutorial assumes that you are already well-grounded in R concepts and are
# : familiar with image classification procedures and terminology
# Reference : Please refer to Abdi (2019) "Land cover and land use classification performance of machine learning
# algorithms in a boreal landscape using Sentinel-2 data" in GIScience & Remote Sensing for help with the terminology and basic concepts.
# Reference URL : https://www.tandfonline.com/doi/full/10.1080/15481603.2019.1650447
#####################################################################################################
# set working directory. This will be the directory where all the unzipped files are located
setwd("C:/Users/Hakim/")
# load required libraries
library(raster) # for the manipulation of raster data
library(caret) # for the machine learning algorithms
library(sp) # for the manipulation of spatial objects
library(nnet) # Artificial Neural Network
library(randomForest) # Random Forest
library(kernlab) # Support Vector Machines
# Load the Sentinel-2 stack of the study area
s2data = stack("S2StackSmall.tif")
# Name the layers of the Sentinel-2 stack based on previously saved information
names(s2data) = as.character(read.csv("S2StackSmall_Names.csv")[,1])
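# Optional sanity check (a minimal sketch): confirm the layer count and names, and preview the imagery.
# The band indices passed to plotRGB below are placeholders; adjust them to whichever layers in your
# stack correspond to the red, green and blue bands.
nlayers(s2data) # number of layers in the stack
names(s2data) # the band names just assigned from the CSV
plotRGB(s2data, r = 3, g = 2, b = 1, stretch = "lin") # quick colour preview (band indices are assumptions)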
# Load the sample data
# Alternatively, you can use the supplied orthophotos to generate a new set of training and validation data
# Your samples layer must have a column for each image in the raster stack, a column for the land cover class that each point represents, and an X and Y coordinate column
# You can create such a sample file using QGIS or another GIS software
samples = read.csv("Samples.csv")
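# Optional: inspect the sample table before splitting. This assumes the land cover column is named
# "class" (as used below) and is coded 1 to 8.
str(samples) # column names and types
table(samples$class) # number of available samples per land cover class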
# Split the data frame into 70% training and 30% evaluation data, stratified by class
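# Optional: fix the random seed so the 70-30 split is reproducible (the seed value itself is arbitrary)
set.seed(42)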
trainx = list(0) # list to hold the training subset of each class
evalx = list(0) # list to hold the evaluation subset of each class
for (i in 1:8){ # loop through all eight classes
  cls = samples[samples$class == i,] # samples belonging to class i
  smpl <- floor(0.70 * nrow(cls)) # number of samples that make up 70% of class i
  tt <- sample(seq_len(nrow(cls)), size = smpl) # randomly draw the row indices for training
  trainx[[i]] <- cls[tt,] # training portion of class i
  evalx[[i]] <- cls[-tt,] # evaluation portion of class i
}
# combine them all into training and evaluation data frames
trn = do.call(rbind, trainx)
eva = do.call(rbind, evalx)
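# Optional: verify that the stratified split behaved as expected
table(trn$class) # training samples per class (about 70% of each class)
table(eva$class) # evaluation samples per class (the remaining ~30%)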
# Set up a resampling method in the model training process
tc <- trainControl(method = "repeatedcv", # repeated cross-validation of the training data
                   number = 10, # number of folds
                   repeats = 5) # number of repeats
# Generate a grid search of candidate hyper-parameter values to evaluate during model training
# These hyper-parameter values are examples. You will need a more complex tuning process to achieve high accuracies
nnet.grid = expand.grid(size = seq(from = 2, to = 10, by = 2), # number of neurons (units) in the hidden layer
                        decay = seq(from = 0.1, to = 0.5, by = 0.1)) # regularization parameter to avoid over-fitting
rf.grid <- expand.grid(mtry = 1:20) # number of variables available for splitting at each tree node
svm.grid <- expand.grid(sigma = seq(from = 0.01, to = 0.10, by = 0.02), # controls for non-linearity in the hyperplane
                        C = seq(from = 2, to = 10, by = 2)) # controls the influence of each support vector
## Begin training the models. This will take about 6 minutes for all three algorithms
# Run the neural network model
nnet_model <- caret::train(x = trn[,(5:ncol(trn)-1)], y = as.factor(as.integer(as.factor(trn$class))),
                           method = "nnet", metric = "Accuracy", trControl = tc, tuneGrid = nnet.grid)
# Run the random forest model
rf_model <- caret::train(x = trn[,(5:ncol(trn)-1)], y = as.factor(as.integer(as.factor(trn$class))),
                         method = "rf", metric = "Accuracy", trControl = tc, tuneGrid = rf.grid)
# Run the support vector machines model
svm_model <- caret::train(x = trn[,(5:ncol(trn)-1)], y = as.factor(as.integer(as.factor(trn$class))),
                          method = "svmRadialSigma", metric = "Accuracy", trControl = tc, tuneGrid = svm.grid)
## Apply the models to data
# Apply the neural network model to the Sentinel-2 data
nnet_prediction = raster::predict(s2data, model=nnet_model)
# Apply the random forest model to the Sentinel-2 data
rf_prediction = raster::predict(s2data, model=rf_model)
# Apply the support vector machines model to the Sentinel-2 data
svm_prediction = raster::predict(s2data, model=svm_model)
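# Optional: write the classified maps to disk for use in a GIS. The file names below are examples,
# and GeoTIFF ("GTiff") output is assumed to be acceptable.
writeRaster(nnet_prediction, "nnet_classification.tif", format = "GTiff", overwrite = TRUE)
writeRaster(rf_prediction, "rf_classification.tif", format = "GTiff", overwrite = TRUE)
writeRaster(svm_prediction, "svm_classification.tif", format = "GTiff", overwrite = TRUE)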
# Convert the evaluation data into a spatial object using the X and Y coordinates and extract predicted values
eva.sp = SpatialPointsDataFrame(coords = cbind(eva$x, eva$y), data = eva,
                                proj4string = crs("+proj=utm +zone=33 +datum=WGS84 +units=m +no_defs +ellps=WGS84 +towgs84=0,0,0"))
## Superimpose evaluation points on the predicted classification and extract the values
# neural network
nnet_Eval = extract(nnet_prediction, eva.sp)
# random forest
rf_Eval = extract(rf_prediction, eva.sp)
# support vector machines
svm_Eval = extract(svm_prediction, eva.sp)
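# Optional sanity check: an NA here means an evaluation point fell outside the classified raster
# (e.g. a coordinate or projection problem) and would distort the error matrices below
sum(is.na(nnet_Eval)); sum(is.na(rf_Eval)); sum(is.na(svm_Eval))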
# Create an error matrix for each of the classifiers
nnet_errorM = confusionMatrix(as.factor(nnet_Eval), as.factor(eva$class))
rf_errorM = confusionMatrix(as.factor(rf_Eval), as.factor(eva$class))
svm_errorM = confusionMatrix(as.factor(svm_Eval), as.factor(eva$class))
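# Optional: pull out the overall accuracy and Kappa of each classifier for a quick side-by-side comparison
nnet_errorM$overall[c("Accuracy", "Kappa")]
rf_errorM$overall[c("Accuracy", "Kappa")]
svm_errorM$overall[c("Accuracy", "Kappa")]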
# Plot the results next to one another along with the 2018 NMD dataset for comparison
nmd2018 = raster("NMD_S2Small.tif") # load NMD dataset (Nationella Marktäckedata, Swedish National Land Cover Dataset)
crs(nmd2018) <- crs(nnet_prediction) # Correct the coordinate reference system so it matches the rest of the stack
rstack = stack(nmd2018, nnet_prediction, rf_prediction, svm_prediction) # combine the layers into one stack
names(rstack) = c("NMD 2018", "Single Layer Neural Network", "Random Forest", "Support Vector Machines") # name the stack
plot(rstack) # plot it!
# Congratulations! You conducted your first machine learning classification.