@aaronschiff
Last active July 12, 2017 05:54
R function to read CSV files exported from Statistics New Zealand's Infoshare system
# Helper function to read CSV files exported from Infoshare.
# `categories` should be a list (or character vector) of names for the
# categorical variables to be created, one per row of category names in the file.
library(readr)  # read_csv
library(tidyr)  # gather, separate
library(dplyr)  # %>%, mutate

read_infoshare <- function(filename, categories) {
  num_categories <- length(categories)
  dat <- read_csv(filename)
  names(dat)[1] <- "date"

  # Drop rows where all columns except the first are NA, to remove junk at the bottom of the file
  junk_rows <- apply(dat[, -1], 1, function(x) all(is.na(x)))
  dat <- dat[!junk_rows, ]

  # Fill blanks in the rows holding category names with the value from the cell to the left
  # (columns 2 onwards; the first column is the date)
  for (i in seq_len(num_categories)) {
    for (j in 2:ncol(dat)) {
      if (is.na(dat[i, j])) dat[i, j] <- dat[i, j - 1]
    }
  }

  # Name columns (except the date column) by joining the category-name rows with ":"
  for (i in seq_len(num_categories)) {
    if (i > 1) {
      names(dat)[-1] <- paste0(names(dat)[-1], ":", as.character(dat[i, -1]))
    } else {
      names(dat)[-1] <- as.character(dat[i, -1])
    }
  }

  # Drop the category-name rows, keeping only the data rows
  dat <- dat[(num_categories + 1):nrow(dat), ]

  # Reshape to long form, splitting the combined column names back into one column per category
  # (this assumes no category value itself contains ":")
  dat <- dat %>%
    gather(key, value, -date) %>%
    separate(key, into = categories, sep = ":", convert = TRUE)

  # Convert ".." (Infoshare's marker for missing data) to NA and make the value column numeric
  dat <- mutate(dat, value = as.numeric(ifelse(value == "..", NA, value)))
  return(dat)
}
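
The reshape step at the end of the function can be hard to picture without data. The following is a minimal sketch of that step on its own, using a small made-up table rather than a real Infoshare export. The names `wide`, `long`, `region` and `sex`, and all of the values, are assumptions chosen for illustration; the column names are already in the combined `category1:category2` form that the function builds before reshaping.

```r
library(tidyr)  # gather, separate
library(dplyr)  # %>%, mutate, tibble

# Hypothetical wide table, mimicking `dat` just before the reshape step:
# one date column plus one column per combination of category values.
wide <- tibble(
  date = c("2016Q1", "2016Q2"),
  `Auckland:Male` = c("10", ".."),
  `Auckland:Female` = c("12", "13")
)

# Same pipeline as in read_infoshare(), with two illustrative category names.
long <- wide %>%
  gather(key, value, -date) %>%
  separate(key, into = c("region", "sex"), sep = ":", convert = TRUE) %>%
  mutate(value = as.numeric(ifelse(value == "..", NA, value)))

print(long)
```

Each combined column becomes a set of rows with one column per category, and the `..` cell ends up as `NA` in the numeric `value` column.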