Skip to content

Instantly share code, notes, and snippets.

@tomhopper
Forked from dfalster/addNewData.R
Created October 9, 2016 00:12
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save tomhopper/9551f76bb0aa9191fee0c2e22cc64add to your computer and use it in GitHub Desktop.
Save tomhopper/9551f76bb0aa9191fee0c2e22cc64add to your computer and use it in GitHub Desktop.
The function addNewData.R modifies a data frame with a lookup table. This is useful where you want to supplement data loaded from file with other data, e.g. to add details, change treatment names, or similar. The function readNewData is also included. This function runs some checks on the new table to ensure it has correct variable names and val…
##' Modifies 'data' by adding new values supplied in newDataFileName
##'
##' newDataFileName is expected to have columns
##' c(lookupVariable,lookupValue,newVariable,newValue,source)
##'
##' Within the column 'newVariable', replace values that
##' match 'lookupValue' within column 'lookupVariable' with the value
##' newValue'. If 'lookupVariable' is NA, then replace *all* elements
##' of 'newVariable' with the value 'newValue'.
##'
##' Note that lookupVariable can be the same as newVariable.
##'
##' @param newDataFileName name of lookup table
##' @param data existing data.frame
##' @param allowedVars vector of permissible variable names for newVariable
##' @return modified data.frame
addNewData <- function(newDataFileName, data, allowedVars){
import <- readNewData(newDataFileName, allowedVars)
if( !is.null(import)){
for(i in seq_len(nrow(import))){ #Make replacements
col.to <- import$newVariable[i]
col.from <- import$lookupVariable[i]
if(is.na(col.from)){ # apply to whole column
data[col.to] <- import$newValue[i]
} else { # apply to subset
rows <- data[[col.from]] == import$lookupValue[i]
data[rows,col.to] <- import$newValue[i]
}
}
}
data
}
##' Utility function to read/process newDataFileName for addNewData
##'
##' @param newDataFileName name of lookup table
##' @param allowedVars vector of permissible variable names for newVariable
##' @return data.frame with columns c(lookupVariable,lookupValue,newVariable,newValue,source)
readNewData <- function(newDataFileName, allowedVars){
if( file.exists(newDataFileName)){
import <- read.csv(newDataFileName, header=TRUE, stringsAsFactors=FALSE,
strip.white=TRUE)
if( nrow(import)> 0 ){
#Check columns names for import are right
expectedColumns<- c("lookupVariable","lookupValue","newVariable","newValue")
nameIsOK <- expectedColumns %in% names(import)
if(any(!nameIsOK))
stop("Incorrect name in lookup table for ",
newDataFileName, "--> ", paste(expectedColumns[!nameIsOK],
collapse=", "))
#Check values of newVariable are in list of allowed variables
import$lookupVariable[import$lookupVariable == ""] <- NA
nameIsOK <- import$newVariable %in% allowedVars
if(any(!nameIsOK))
stop("Incorrect name(s) in newVariable column of ",
newDataFileName, "--> ", paste(import$newVariable[!nameIsOK],
collapse=", "))
} else {
import <- NULL
}
} else {
import <- NULL
}
import
}
lookupVariable lookupValue newVariable newValue source
id 1 species Banksia oblongifolia Daniel Falster
id 2 species Banksia ericifolia Daniel Falster
id 3 species Banksia serrata Daniel Falster
id 4 species Banksia grandis Daniel Falster
family Proteaceae Daniel Falster
location NSW Daniel Falster
id 4 location WA Daniel Falster

To install the code, do one of the following:

  1. Download it then load with source(addNewData.R)
  2. Load it directly from github
library(devtools, quietly=TRUE)
source_gist("https://gist.github.com/dfalster/5589956")

To use it, simply type

addNewData("dataNew.csv", myData, allowedVars)

where allowedVars is vector of permissible variable names for the columns of myData.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment