Skip to content

Instantly share code, notes, and snippets.

@dfalster
Last active February 19, 2023 00:29
Show Gist options
  • Save dfalster/5589956 to your computer and use it in GitHub Desktop.
Save dfalster/5589956 to your computer and use it in GitHub Desktop.
The function addNewData.R modifies a data frame with a lookup table. This is useful where you want to supplement data loaded from file with other data, e.g. to add details, change treatment names, or similar. The function readNewData is also included. This function runs some checks on the new table to ensure it has correct variable names and val…
##' Modifies 'data' by adding new values supplied in newDataFileName
##'
##' newDataFileName is expected to have columns
##' c(lookupVariable,lookupValue,newVariable,newValue,source)
##'
##' Within the column 'newVariable', replace values that
##' match 'lookupValue' within column 'lookupVariable' with the value
##' newValue'. If 'lookupVariable' is NA, then replace *all* elements
##' of 'newVariable' with the value 'newValue'.
##'
##' Note that lookupVariable can be the same as newVariable.
##'
##' @param newDataFileName name of lookup table
##' @param data existing data.frame
##' @param allowedVars vector of permissible variable names for newVariable
##' @return modified data.frame
addNewData <- function(newDataFileName, data, allowedVars){
import <- readNewData(newDataFileName, allowedVars)
if( !is.null(import)){
for(i in seq_len(nrow(import))){ #Make replacements
col.to <- import$newVariable[i]
col.from <- import$lookupVariable[i]
if(is.na(col.from)){ # apply to whole column
data[col.to] <- import$newValue[i]
} else { # apply to subset
rows <- data[[col.from]] == import$lookupValue[i]
data[rows,col.to] <- import$newValue[i]
}
}
}
data
}
##' Utility function to read/process newDataFileName for addNewData
##'
##' @param newDataFileName name of lookup table
##' @param allowedVars vector of permissible variable names for newVariable
##' @return data.frame with columns c(lookupVariable,lookupValue,newVariable,newValue,source)
readNewData <- function(newDataFileName, allowedVars){
if( file.exists(newDataFileName)){
import <- read.csv(newDataFileName, header=TRUE, stringsAsFactors=FALSE,
strip.white=TRUE)
if( nrow(import)> 0 ){
#Check columns names for import are right
expectedColumns<- c("lookupVariable","lookupValue","newVariable","newValue")
nameIsOK <- expectedColumns %in% names(import)
if(any(!nameIsOK))
stop("Incorrect name in lookup table for ",
newDataFileName, "--> ", paste(expectedColumns[!nameIsOK],
collapse=", "))
#Check values of newVariable are in list of allowed variables
import$lookupVariable[import$lookupVariable == ""] <- NA
nameIsOK <- import$newVariable %in% allowedVars
if(any(!nameIsOK))
stop("Incorrect name(s) in newVariable column of ",
newDataFileName, "--> ", paste(import$newVariable[!nameIsOK],
collapse=", "))
} else {
import <- NULL
}
} else {
import <- NULL
}
import
}
lookupVariable lookupValue newVariable newValue source
id 1 species Banksia oblongifolia Daniel Falster
id 2 species Banksia ericifolia Daniel Falster
id 3 species Banksia serrata Daniel Falster
id 4 species Banksia grandis Daniel Falster
family Proteaceae Daniel Falster
location NSW Daniel Falster
id 4 location WA Daniel Falster

To install the code, do one of the following:

  1. Download it then load with source(addNewData.R)
  2. Load it directly from github
library(devtools, quietly=TRUE)
source_gist("https://gist.github.com/dfalster/5589956")

To use it, simply type

addNewData("dataNew.csv", myData, allowedVars)

where allowedVars is vector of permissible variable names for the columns of myData.

@sacapunta
Copy link

thanks for creating this! i'm having an issue where the lookup table is not working correctly if the source data in the "lookupValue" vector ends with a space. i.e. it'll correctly function when looking up "source data" but not "source data ".

@stuartE9
Copy link

very helpful - thanks for sharing.

@marejp
Copy link

marejp commented Jul 10, 2017

Thanks for sharing Daniel, using it a lot.

@bbroke
Copy link

bbroke commented Sep 18, 2017

thank you -- really helpful!

@nrkoehler
Copy link

Thank you - it is of great help!

@jeffreyawright
Copy link

Firstly, thanks for this apparently great code. I can see it helping my present efforts. Unfortunately...

When I type:

library(devtools, quietly=TRUE)
source_gist("https://gist.github.com/dfalster/5589956")

I get an error message:

Error in r_files[[which]] : invalid subscript type 'closure'

Any tips?

 

@MaccoP
Copy link

MaccoP commented Dec 19, 2017

Hi, been trying to get the code to work but keep getting the error message:
Incorrect name in lookup table for Brand_Lookup.csv--> lookupVariable, lookupValue, newVariable, newValue

I am fairly new to R but believe it may be caused by my csv being saved from Excel and as non-English native my regional settings make my delimiter a ; instead of ,

May this be the cause and if so, what is the best way around it?

@psychemistz
Copy link

Hi, for those who have problem in source_gist, try this.
devtools::source_gist("5589956", filename = "addNewData.R")

@martungel
Copy link

Hi there, thank you very much for this very useful script!
For the French users - it is important to have your dataNew.csv file comma delimited (sep = ",") and not semi-column delimited (sep = ";") which is the default for French System Excel.

Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment