@glycerine
Created October 19, 2010 10:42
Just a couple of hints as you are learning the language:
I do very little with apply. If a matrix has to be processed row by row, as this seems to be doing, I would define one function that does the whole row transformation, then apply it once.
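A minimal sketch of that pattern, assuming a small made-up matrix and a hypothetical row.transform function (neither is from the original code): the whole per-row computation lives in one function, and apply is called a single time.

```r
# Hypothetical example data: 3 rows, 4 columns.
m <- matrix(c(4, 8, 1, 7,
              2, 2, 6, 0,
              9, 3, 5, 5), nrow = 3, byrow = TRUE)

# One function holding the whole per-row transformation (illustrative only):
# centre the row on its mean, then scale by its range.
row.transform <- function(row) {
  (row - mean(row)) / (max(row) - min(row))
}

# apply() over rows (MARGIN = 1) returns one column per input row,
# so transpose to get back the original orientation.
res <- t(apply(m, 1, row.transform))
```

Keeping the transformation in one function means the matrix is walked once, and the function can be tested on a single row by itself.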
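sort takes the very useful additional parameter index.return=TRUE (index=T or i=T also work, through partial argument matching), which returns the sorting permutation along with the sorted values, so there is no need for a separate ranking step.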
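A small illustration of what comes back, using made-up vectors (x and labels are hypothetical): the result is a list whose $x holds the sorted values and whose $ix holds the positions they came from, which can then reorder any companion vector.

```r
x <- c(30, 10, 20)
labels <- c("c", "a", "b")   # hypothetical companion vector

s <- sort(x, index.return = TRUE)
s$x            # sorted values: 10 20 30
s$ix           # where each sorted value sat in x: 2 3 1
labels[s$ix]   # companion vector in the same order: "a" "b" "c"
```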
For more sophisticated searching of sorted data, there is findInterval, which does a binary search. It took me a while to find it in R, but it's incredibly nice. Here's an example of using sort (with its index returned) and findInterval together.
# si.sample():
# put an arbitrarily sampled sequence of events onto a regularly sampled line,
# sampled at regular intervals of length si.
# IN: assumes you have the event times in tm1 and the event values (reals) in x1.
#
si.sample=function(x1, tm1, si=1, t0=floor(min(tm1)), tend=ceiling(max(tm1)), fill.na=TRUE) {
  if (length(tm1) != length(x1)) stop("x1 and tm1 must be the same length!")
  # ensure we are sorted by tm1
  s=sort(tm1, index.return=TRUE)
  tm1=s$x
  x1=x1[s$ix]
  r=list()
  N = ceiling((tend - t0)/si) + 1
  tm.new = t0 + (0:(N-1))*si
  a=rep(NA, N)
  # binary search using findInterval, which defaults to intervals that are half-open
  # on the right, [a,b). Since we want intervals half-open on the left, (a,b], we:
  # negate time, apply findInterval, then un-negate to get correct behavior at the interval boundaries.
  fi=findInterval(sort(-tm1), sort(-tm.new), rightmost.closed=TRUE) # negate time and apply findInterval
  fi2=1+length(tm.new)-rev(fi) # undo the time flip, to get correct boundary behavior.
  fi2[fi2>(length(tm.new))]=1 # send any time points past tend back to bucket 1, so they get NA-ed by a[1]=NA below
  a[fi2]=x1 # later values correctly overwrite earlier ones if there are multiple values per interval.
  a[1]=NA # bucket 1 (t0 itself in tm.new) collects pre-t0 samples; it also gets the samples past tend.
  if (fill.na) {
    r$v = fill.na.with.most.recently.seen(a)
  } else {
    r$v = a
  }
  r$tm = tm.new
  class(r)="regular.sample"
  r
}
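# and it uses this helper:
# carry the most recently seen value forward over any NAs that follow it.
fill.na.with.most.recently.seen=function(x) {
  lastseen = x[1]
  # seq_along(x)[-1] is empty when length(x) < 2, whereas 2:length(x) would count down
  for (i in seq_along(x)[-1]) {
    if (is.na(x[i])) {
      x[i]=lastseen
    } else {
      lastseen = x[i]
    }
  }
  x
}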
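To see the boundary behavior that the negation trick in si.sample is after, here is a small standalone sketch with made-up grid and event times (grid and ev are hypothetical, and rightmost.closed is left at its default to keep the comparison simple). It also shows the left.open argument, which, assuming R 3.3.0 or later, gives the same (a,b] assignment directly.

```r
grid <- c(0, 1, 2, 3)       # regular sample times
ev   <- c(0.5, 1, 1.5, 2)   # event times, two of them exactly on the grid

# Default intervals are [a, b): an event at 1 lands in the interval starting at 1.
findInterval(ev, grid)                        # 1 2 2 3

# Negation trick: on the negated axis findInterval counts the grid points
# that are >= each event; flipping the index gives the right end of (a, b].
fi  <- findInterval(sort(-ev), sort(-grid))
idx <- 1 + length(grid) - rev(fi)             # 2 2 3 3
grid[idx]                                     # 1 1 2 2

# Assumption: R >= 3.3.0, where findInterval has a left.open argument.
idx2 <- findInterval(ev, grid, left.open = TRUE) + 1
```

With the default, the event at time 1 is grouped with what follows it. With the flipped version it is grouped with the interval that ends at 1, which is the behavior wanted when each regular sample should report the latest value seen up to and including its own time.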