Skip to content

Instantly share code, notes, and snippets.

@garyfeng
Last active January 13, 2016 17:10
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save garyfeng/27e7f8e406192a8cb33a to your computer and use it in GitHub Desktop.
Save garyfeng/27e7f8e406192a8cb33a to your computer and use it in GitHub Desktop.
A function to back fill NAs in a vector with the last non-NA value
# A function to back fill NAs in a vector with the last non-NA value
# https://gist.github.com/garyfeng/27e7f8e406192a8cb33a
backFillNA<- function (x) {
nas<- which(is.na(x))
# trick from http://stackoverflow.com/questions/24837401/find-consecutive-values-in-vector-in-r
naList<-split(nas, cumsum(c(1, diff(nas) != 1)))
# get the backfill values
valueList<-lapply(naList, function(nalist) {
prev<- nalist[1]-1
#print(nalist)
if (prev<=0) prev <-1
x[nalist]<-x[prev]
return(x[nalist])
})
# now back fill
# can't use unlist(), as it converts everything to a common class.
# see http://stackoverflow.com/questions/13859905/returning-a-vector-of-class-posixct-with-vapply
#x[unlist(naList)] <- unlist(valueList)
x[unlist(naList)] <- do.call(c, valueList)
return (x)
}
x<-c("A","B",NA,NA,"C",NA,NA,NA,NA,"D",NA,NA);
#> x
# [1] "A" "B" NA NA "C" NA NA NA NA "D" NA NA
#> backFillNA(x)
# [1] "A" "B" "B" "B" "C" "C" "C" "C" "C" "D" "D" "D"
########################
# The following recursive solution is taken from:
# https://stat.ethz.ch/pipermail/r-help/2008-July/169199.html
# it uses Recall(), R's method for recursive calling, to do back filling
# It works well for short vectors, but a test on a vector of 20k elements with many NAs
# and the recursion failed after a long while.
rna <- function(z) {
y <- c(NA, head(z, -1))
z <- ifelse(is.na(z), y, z)
if (any(is.na(z))) Recall(z) else z
}
#> x
# [1] "A" "B" NA NA "C" NA NA NA NA "D" NA NA
#> rna(x)
# [1] "A" "B" "B" "B" "C" "C" "C" "C" "C" "D" "D" "D"
@garyfeng
Copy link
Author

garyfeng commented Jan 8, 2016

hmm... the performance of the non-recursive version may be a function of how many NAs there are -- more precisely, how many NA runs there are in the data. We need to test this.

@garyfeng
Copy link
Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment