@timmyshen
Created October 7, 2013 19:08
Write a function that takes a directory of data files and a threshold for complete cases and calculates the correlation between sulfate and nitrate for monitor locations where the number of completely observed cases (on all variables) is greater than the threshold. The function should return a vector of correlations for the monitors that meet th…
corr <- function(directory, threshold = 0) {
  ## 'directory' is a character vector of length 1 indicating
  ## the location of the CSV files
  ## 'threshold' is a numeric vector of length 1 indicating the
  ## number of completely observed observations (on all
  ## variables) required to compute the correlation between
  ## nitrate and sulfate; the default is 0
  ## Return a numeric vector of correlations

  ## complete() is loaded from complete.R; getmonitor() is not sourced
  ## here and is assumed to be defined already in the calling session
  source(file = 'complete.R')
  complete.df <- complete(directory)

  ## Keep only monitors with more complete cases than the threshold
  id.greater.thresh <- complete.df$id[complete.df$nobs > threshold]

  ## Stays numeric(0) if no monitor meets the threshold
  output <- vector(mode = 'numeric')
  for (i in id.greater.thresh) {
    data <- getmonitor(id = i, directory = directory)
    ## Rows with a missing sulfate or nitrate value are dropped
    cor.one.monitor <- cor(x = data$sulfate, y = data$nitrate, use = "complete.obs")
    output <- c(output, cor.one.monitor)
  }
  output
}
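
The per-monitor step inside the loop can be tried on its own without any data files. The sketch below is only an illustration: the `monitor` data frame and its values are invented, and it stands in for what `getmonitor()` is assumed to return (a data frame with `sulfate` and `nitrate` columns).

```r
# Toy data for one hypothetical monitor (values invented for illustration)
monitor <- data.frame(
  sulfate = c(1, 2, NA, 4, 5),
  nitrate = c(2, 4, 6, NA, 10)
)

# Number of rows in which every variable is observed
nobs <- sum(complete.cases(monitor))

# Pearson correlation computed on the complete rows only
r <- cor(x = monitor$sulfate, y = monitor$nitrate, use = "complete.obs")

print(nobs)
print(r)
```

Here three rows are complete, and because nitrate is exactly twice sulfate in those rows, the correlation comes out as 1 up to floating-point error. With a threshold of 3 or more, `corr` would skip this monitor, since it keeps only monitors whose count is strictly greater than the threshold.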