@aaronwolen
Last active August 29, 2015 14:07
Extract and append multiple values embedded in rows
# Extract and append multiple values embedded in rows
#
# data: data.frame
# col: column name containing embedded values
# sep: regular expression to split column by
#
# df <- data.frame(key = c("a", "a;b", "a;b;c"), val = 1:3)
# unembed(df, "key", ";")
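unembed <- function(data, col, sep, ...) {
  stopifnot(is.data.frame(data))
  col_i <- which(names(data) == col)
  # strsplit() requires character input, so coerce factors first
  data[[col]] <- as.character(data[[col]])
  pieces <- strsplit(data[[col]], sep, ...)
  # number of embedded values found in each row
  ns <- vapply(pieces, length, integer(1))
  # repeat each row once per embedded value; the split column comes first
  structure(data.frame(unlist(pieces),
                       data[rep(seq_along(ns), ns), -col_i]),
            names = c(col, names(data)[-col_i]))
}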
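
For reference, here is a small worked run of the example from the header comment. It is only a sketch: the function body is copied from above so the block runs on its own, and the names `df` and `out` are purely illustrative.

```r
# Copy of unembed() from the gist, so this example is self-contained
unembed <- function(data, col, sep, ...) {
  stopifnot(is.data.frame(data))
  col_i <- which(names(data) == col)
  data[[col]] <- as.character(data[[col]])
  pieces <- strsplit(data[[col]], sep, ...)
  ns <- vapply(pieces, length, integer(1))
  structure(data.frame(unlist(pieces),
                       data[rep(seq_along(ns), ns), -col_i]),
            names = c(col, names(data)[-col_i]))
}

df <- data.frame(key = c("a", "a;b", "a;b;c"), val = 1:3)
out <- unembed(df, "key", ";")

# One output row per embedded value: 1 + 2 + 3 = 6 rows
nrow(out)
# key values in order: a, a, b, a, b, c
as.character(out$key)
# val is repeated alongside each piece: 1, 2, 2, 3, 3, 3
out$val
```

Whether `out$key` comes back as a factor or a character vector depends on the `stringsAsFactors` default in effect (it changed to `FALSE` in R 4.0.0), which is why the example wraps it in `as.character()`.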
@mdozmorov

stringsAsFactors bites. Recommend adding:

structure(data.frame(unlist(pieces),
                     data[rep(seq_along(ns), ns), -col_i], stringsAsFactors = FALSE),
          names = c(col, names(data)[-col_i]))

@aaronwolen
Author

The only downside to that approach is the columns you might actually want to remain factors would silently be converted to characters. I usually add options(stringsAsFactors = FALSE) to the top of my script to prevent this.

@aaronwolen
Author

Actually, better idea: I just forced col to be a character vector before splitting. Does that help?
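
A quick illustration of why the coercion matters, as a sketch with made-up variable names (`key`, `res`, `pieces`, `ns`): `strsplit()` only accepts character input, so a factor column has to be converted before splitting.

```r
# A factor column, as data.frame() would create by default before R 4.0.0
key <- factor(c("a", "a;b", "a;b;c"))

# strsplit() rejects non-character input; capture the error message instead of stopping
res <- tryCatch(strsplit(key, ";"), error = function(e) conditionMessage(e))
is.character(res)  # TRUE: res holds the error message, not a list of pieces

# Coercing to character first makes the split work
pieces <- strsplit(as.character(key), ";")
ns <- vapply(pieces, length, integer(1))
ns  # number of pieces per element: 1, 2, 3
```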

@mdozmorov

It does help, and the function is great.
