@jknowles
Created January 19, 2012 23:13
Functions for figuring out where user specified thresholds are met in a vector using R
#######################################
# Example data
######################################
# 100 x 10 matrix: each column holds 100 draws
# from a Poisson distribution with mean 20
testdata<-replicate(10, rpois(100, 20))
###########################################
# This function tells us how far we have to
# go before reaching a cutoff in a variable
# by sorting the vector in descending order,
# then counting how many of the largest values
# fall below the cutoff. Note that the cutoff is
# expressed as a proportion of the total
# (e.g. 0.5 for 50%)
############################################
fixcumsum<-function(x,cutoff){
  x<-sort(x,decreasing=TRUE) # sort vector descending (sort() drops NAs)
  xb<-cumsum(x) # take cumulative sum
  xc<-xb/sum(x) # express proportionally
  sum(xc<cutoff) # count the items whose cumulative share
  # is still below the cutoff
}
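# A small worked example of the steps above. The vector and the
# 0.85 cutoff are made-up illustration values, not part of the
# original data; the steps are repeated inline so that each
# intermediate result is visible.
example_x<-c(10,50,10,30)
example_sorted<-sort(example_x,decreasing=TRUE) # 50 30 10 10
example_share<-cumsum(example_sorted)/sum(example_sorted) # 0.5 0.8 0.9 1.0
example_count<-sum(example_share<0.85) # 2
# fixcumsum(example_x,0.85) should give the same count: the two
# largest values stay below 85% of the total.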
##########################################
# This function allows us to see what share
# of the total is accounted for by the largest
# `thresh` observations, expressed as a
# proportion between 0 and 1.
############################################
cutoff<-function(x,thresh){
  # x is the column or variable
  # thresh is the number of observations to count to
  x<-sort(x,decreasing=TRUE) # sort vector descending (sort() drops NAs)
  xb<-cumsum(x) # take cumulative sum (order matters)
  xc<-xb/sum(x) # express proportionally
  xc[thresh] # report cumulative proportion at given threshold
}
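# A small worked example of the calculation above. The vector and
# the threshold of 2 are made-up illustration values, not part of
# the original data; the steps are repeated inline for clarity.
demo_x<-c(10,50,10,30)
demo_sorted<-sort(demo_x,decreasing=TRUE) # 50 30 10 10
demo_share<-cumsum(demo_sorted)/sum(demo_sorted) # 0.5 0.8 0.9 1.0
demo_top2<-demo_share[2] # 0.8
# cutoff(demo_x,thresh=2) should give the same value: the two
# largest observations account for 80% of the total.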
###############################
# Now we simply have to apply the
# function over a data object such
# as a matrix or a data frame
###############################
# Here we apply the function to each column. The call has the form
# apply(data, 2, function, function arguments)
# where 2 tells R to go column-wise, which is appropriate for
# this data shape, and we specify function parameters after we
# tell R the function
thresh3<-apply(testdata,2,cutoff,thresh=3)
thresh5<-apply(testdata,2,cutoff,thresh=5)
thresh10<-apply(testdata,2,cutoff,thresh=10)
# We store these as variables because each call returns a vector
# with one value per column the function was applied to
# We combine these vectors into a matrix, one column per threshold
cutoffs<-cbind(thresh3,thresh5,thresh10)
# We convert them to percentages for Excel purposes (not necessary)
cutoffs<-cutoffs*100
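# A possible alternative sketch: compute several thresholds in one
# call by wrapping apply() in sapply(). Everything here is an
# illustration under assumptions: demo_mat is a made-up matrix and
# top_share is a hypothetical helper that mirrors cutoff() above,
# so that the snippet runs on its own.
demo_mat<-matrix(c(50,30,10,10,
                   40,40,10,10),ncol=2) # filled column by column
top_share<-function(x,k){
  x<-sort(x,decreasing=TRUE) # sort vector descending
  (cumsum(x)/sum(x))[k] # cumulative proportion at the k-th value
}
# one row per column of demo_mat, one column per threshold
demo_shares<-sapply(c(1,2),function(k) apply(demo_mat,2,top_share,k=k))
colnames(demo_shares)<-c("top1","top2")
demo_shares*100 # as percentages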