Skip to content

Instantly share code, notes, and snippets.

Embed
What would you like to do?
The function addNewData.R modifies a data frame with a lookup table. This is useful where you want to supplement data loaded from file with other data, e.g. to add details, change treatment names, or similar. The function readNewData is also included. This function runs some checks on the new table to ensure it has correct variable names and val…
##' Modifies 'data' by adding new values supplied in newDataFileName
##'
##' newDataFileName is expected to have columns
##' c(lookupVariable,lookupValue,newVariable,newValue,source)
##'
##' Within the column 'newVariable', replace values that
##' match 'lookupValue' within column 'lookupVariable' with the value
##' newValue'. If 'lookupVariable' is NA, then replace *all* elements
##' of 'newVariable' with the value 'newValue'.
##'
##' Note that lookupVariable can be the same as newVariable.
##'
##' @param newDataFileName name of lookup table
##' @param data existing data.frame
##' @param allowedVars vector of permissible variable names for newVariable
##' @return modified data.frame
addNewData <- function(newDataFileName, data, allowedVars){
import <- readNewData(newDataFileName, allowedVars)
if( !is.null(import)){
for(i in seq_len(nrow(import))){ #Make replacements
col.to <- import$newVariable[i]
col.from <- import$lookupVariable[i]
if(is.na(col.from)){ # apply to whole column
data[col.to] <- import$newValue[i]
} else { # apply to subset
rows <- data[[col.from]] == import$lookupValue[i]
data[rows,col.to] <- import$newValue[i]
}
}
}
data
}
##' Utility function to read/process newDataFileName for addNewData
##'
##' @param newDataFileName name of lookup table
##' @param allowedVars vector of permissible variable names for newVariable
##' @return data.frame with columns c(lookupVariable,lookupValue,newVariable,newValue,source)
readNewData <- function(newDataFileName, allowedVars){
if( file.exists(newDataFileName)){
import <- read.csv(newDataFileName, header=TRUE, stringsAsFactors=FALSE,
strip.white=TRUE)
if( nrow(import)> 0 ){
#Check columns names for import are right
expectedColumns<- c("lookupVariable","lookupValue","newVariable","newValue")
nameIsOK <- expectedColumns %in% names(import)
if(any(!nameIsOK))
stop("Incorrect name in lookup table for ",
newDataFileName, "--> ", paste(expectedColumns[!nameIsOK],
collapse=", "))
#Check values of newVariable are in list of allowed variables
import$lookupVariable[import$lookupVariable == ""] <- NA
nameIsOK <- import$newVariable %in% allowedVars
if(any(!nameIsOK))
stop("Incorrect name(s) in newVariable column of ",
newDataFileName, "--> ", paste(import$newVariable[!nameIsOK],
collapse=", "))
} else {
import <- NULL
}
} else {
import <- NULL
}
import
}
lookupVariable lookupValue newVariable newValue source
id 1 species Banksia oblongifolia Daniel Falster
id 2 species Banksia ericifolia Daniel Falster
id 3 species Banksia serrata Daniel Falster
id 4 species Banksia grandis Daniel Falster
family Proteaceae Daniel Falster
location NSW Daniel Falster
id 4 location WA Daniel Falster

To install the code, do one of the following:

  1. Download it then load with source(addNewData.R)
  2. Load it directly from github
library(devtools, quietly=TRUE)
source_gist("https://gist.github.com/dfalster/5589956")

To use it, simply type

addNewData("dataNew.csv", myData, allowedVars)

where allowedVars is vector of permissible variable names for the columns of myData.

@sacapunta

This comment has been minimized.

Copy link

sacapunta commented Feb 11, 2016

thanks for creating this! i'm having an issue where the lookup table is not working correctly if the source data in the "lookupValue" vector ends with a space. i.e. it'll correctly function when looking up "source data" but not "source data ".

@stuartE9

This comment has been minimized.

Copy link

stuartE9 commented Aug 12, 2016

very helpful - thanks for sharing.

@marejp

This comment has been minimized.

Copy link

marejp commented Jul 10, 2017

Thanks for sharing Daniel, using it a lot.

@bbroke

This comment has been minimized.

Copy link

bbroke commented Sep 18, 2017

thank you -- really helpful!

@nrkoehler

This comment has been minimized.

Copy link

nrkoehler commented Oct 25, 2017

Thank you - it is of great help!

@heffaywrit

This comment has been minimized.

Copy link

heffaywrit commented Nov 27, 2017

Firstly, thanks for this apparently great code. I can see it helping my present efforts. Unfortunately...

When I type:

library(devtools, quietly=TRUE)
source_gist("https://gist.github.com/dfalster/5589956")

I get an error message:

Error in r_files[[which]] : invalid subscript type 'closure'

Any tips?

 

@MaccoP

This comment has been minimized.

Copy link

MaccoP commented Dec 19, 2017

Hi, been trying to get the code to work but keep getting the error message:
Incorrect name in lookup table for Brand_Lookup.csv--> lookupVariable, lookupValue, newVariable, newValue

I am fairly new to R but believe it may be caused by my csv being saved from Excel and as non-English native my regional settings make my delimiter a ; instead of ,

May this be the cause and if so, what is the best way around it?

@psychemistz

This comment has been minimized.

Copy link

psychemistz commented Apr 16, 2018

Hi, for those who have problem in source_gist, try this.
devtools::source_gist("5589956", filename = "addNewData.R")

@martungel

This comment has been minimized.

Copy link

martungel commented Apr 15, 2019

Hi there, thank you very much for this very useful script!
For the French users - it is important to have your dataNew.csv file comma delimited (sep = ",") and not semi-column delimited (sep = ";") which is the default for French System Excel.

Thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
You can’t perform that action at this time.