Skip to content

Instantly share code, notes, and snippets.

@garyfeng
Last active January 13, 2016 17:10
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save garyfeng/27e7f8e406192a8cb33a to your computer and use it in GitHub Desktop.
Save garyfeng/27e7f8e406192a8cb33a to your computer and use it in GitHub Desktop.
A function to back fill NAs in a vector with the last non-NA value
# A function to back fill NAs in a vector with the last non-NA value
# https://gist.github.com/garyfeng/27e7f8e406192a8cb33a
backFillNA<- function (x) {
nas<- which(is.na(x))
# trick from http://stackoverflow.com/questions/24837401/find-consecutive-values-in-vector-in-r
naList<-split(nas, cumsum(c(1, diff(nas) != 1)))
# get the backfill values
valueList<-lapply(naList, function(nalist) {
prev<- nalist[1]-1
#print(nalist)
if (prev<=0) prev <-1
x[nalist]<-x[prev]
return(x[nalist])
})
# now back fill
# can't use unlist(), as it converts everything to a common class.
# see http://stackoverflow.com/questions/13859905/returning-a-vector-of-class-posixct-with-vapply
#x[unlist(naList)] <- unlist(valueList)
x[unlist(naList)] <- do.call(c, valueList)
return (x)
}
x<-c("A","B",NA,NA,"C",NA,NA,NA,NA,"D",NA,NA);
#> x
# [1] "A" "B" NA NA "C" NA NA NA NA "D" NA NA
#> backFillNA(x)
# [1] "A" "B" "B" "B" "C" "C" "C" "C" "C" "D" "D" "D"
########################
# The following recursive solution is taken from:
# https://stat.ethz.ch/pipermail/r-help/2008-July/169199.html
# it uses Recall(), R's method for recursive calling, to do back filling
# It works well for short vectors, but a test on a vector of 20k elements with many NAs
# and the recursion failed after a long while.
rna <- function(z) {
y <- c(NA, head(z, -1))
z <- ifelse(is.na(z), y, z)
if (any(is.na(z))) Recall(z) else z
}
#> x
# [1] "A" "B" NA NA "C" NA NA NA NA "D" NA NA
#> rna(x)
# [1] "A" "B" "B" "B" "C" "C" "C" "C" "C" "D" "D" "D"
@garyfeng
Copy link
Author

Tried this with a vector with 29K elements, and recursion was too deep.

Thinking a non-recursive approach:

backFillNA<- function (x) {
nas<- which(is.na(x))

naList<-split(nas, cumsum(c(1, diff(nas) != 1)))

valueList<-lapply(naList, function(nalist) {
  prev<- nalist[1]-1
  #print(nalist)
  if (prev<=0) prev <-0
  # here we assume x is in the global
  x[nalist]<-x[prev]
  return(x[nalist])
})

x[unlist(naList)] <- unlist(valueList)
}

@garyfeng
Copy link
Author

garyfeng commented Jan 8, 2016

Compared the non-recursive version with a simple for loop based method:

loopFillNA <- function(x) {
 lastV <- x[1]
 for (i in 1:length(x)) {
   if(is.na(x[i])) x[i] <- lastV else lastV <- x[i]
  }
 x
}

We have a large numeric vector t0 with mostly NAs.

Testing results:

> system.time(loopFillNA(t0[1:10000]))
   user  system elapsed 
  1.366   0.977   2.344 
> system.time(rna(t0[1:10000]))
   user  system elapsed 
  0.961   0.238   1.198 
> system.time(backFillNA(t0[1:10000]))
   user  system elapsed 
  0.014   0.001   0.015 

@garyfeng
Copy link
Author

garyfeng commented Jan 8, 2016

hmm... the performance of the non-recursive version may be a function of how many NAs there are -- more precisely, how many NA runs there are in the data. We need to test this.

@garyfeng
Copy link
Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment