@MattSandy
Last active March 5, 2016 03:42
Creates a wide format for matches
# How to run the script (the executable is named Rscript, with a capital R)
# Rscript csv-wide.R "~/path/to/in.csv" "~/path/to/out.csv" column_name_for_matches
args <- commandArgs(trailingOnly = TRUE)
match_column <- args[3]
max_width <- 0
csv <- read.csv(args[1], header = TRUE, sep = ",", quote = "\"",
                stringsAsFactors = FALSE, encoding = "UTF-8", fill = TRUE)
csv[[match_column]] <- as.character(csv[[match_column]])
# Find the widest output row: the number of values (excluding the match
# column) across all rows that share a key. The same count is used for both
# the comparison and the assignment so max_width can never shrink.
for (needle in unique(csv[[match_column]])) {
  width <- length(unlist(csv[which(csv[[match_column]] == needle), names(csv) != match_column, drop = FALSE]))
  if (width > max_width) {
    max_width <- width
  }
}
#creates empty matrix
formatted <- matrix("", nrow = length(unique(csv[[match_column]])), ncol = max_width)
#names rows based on the search match
row.names(formatted) <- unique(csv[[match_column]])
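# Populate the matrix: flatten each key's matching rows into one wide row
for (needle in unique(csv[[match_column]])) {
  # drop = FALSE keeps a data frame even when only one column remains
  row <- as.vector(apply(csv[which(csv[[match_column]] == needle), names(csv) != match_column, drop = FALSE], 1, unlist))
  row <- append(row, rep(NA, max_width - length(row)))
  formatted[needle, ] <- row
}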
write.csv(formatted, file=args[2], row.names = TRUE, na="")
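
A small in-memory example may make the reshaping easier to follow. The sketch below is not the script itself: it applies the same idea to a hypothetical data frame, and the sample values, the `id` column name, and the `keys` and `rows` variables are assumptions chosen for illustration. All columns are strings so that flattening does not reformat any numbers.

```r
# Hypothetical sample data; "id" plays the role of the match column.
csv <- data.frame(
  id    = c("a", "b", "a"),
  name  = c("x", "y", "z"),
  value = c("1", "2", "3"),
  stringsAsFactors = FALSE
)
match_column <- "id"
keys <- unique(csv[[match_column]])

# Collect the non-key values for each key, flattened row by row.
rows <- lapply(keys, function(key) {
  matches <- csv[csv[[match_column]] == key, names(csv) != match_column, drop = FALSE]
  as.vector(t(as.matrix(matches)))
})

# Pad every row with "" up to the widest one and stack into a matrix.
max_width <- max(lengths(rows))
formatted <- t(vapply(rows, function(r) c(r, rep("", max_width - length(r))),
                      character(max_width)))
row.names(formatted) <- keys
print(formatted)
```

Here key `a` matches two input rows, so its wide row holds `x`, `1`, `z`, `3`, while key `b` matches one row and is padded with two empty cells after `y`, `2`.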