@arvi1000
Last active December 9, 2016 21:00
A setcols function for data.table
# I love data.table, but I find the syntax for "mutating" columns a little clunky for such a common task
# I wonder if it would be useful to have a setcols() function?
# Takes advantage of data.table's pass-by-reference
library(data.table)
setcols <- function(my_dt, cols, my_fun) {
  # validate inputs
  stopifnot(is.data.table(my_dt),
            is.character(cols),
            is.function(my_fun))
  # apply my_fun to each column named in cols, modifying my_dt by reference
  for (j in cols) set(my_dt, j = j, value = my_fun(my_dt[[j]]))
}
# Now you can do things like this.
# ...given a data.table of mixed types
dat <- data.table(letters = c('a', 'b', 'c', 'd', 'e'),
                  fruits = c('apple', 'banana', 'carrot', 'durian', 'elderberry'),
                  num1 = rnorm(5),
                  num2 = seq(10, 50, 10))
# ...change some data types
setcols(dat, c('letters', 'fruits'), as.factor)
# ...transform some numbers
setcols(dat, c('num1', 'num2'), function(x) x*2 + 5)
# This simple wrapper seems to stay idiomatic to data.table, which already has
# functions setDT and setnames which modify the data.table passed to them
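For comparison, here is a minimal sketch of the built-in idiom that setcols() wraps, presumably the syntax the opening comment calls clunky. The small table and the name num_cols are made up for illustration; `:=`, `.SD` and `.SDcols` are standard data.table features.

```r
library(data.table)

# Illustrative table (hypothetical data, not the one used above)
dat <- data.table(letters = c('a', 'b', 'c'),
                  num1 = c(1, 2, 3),
                  num2 = c(10, 20, 30))

# Columns to transform; num_cols is just an illustrative name
num_cols <- c('num1', 'num2')

# Assign by reference: .SD holds only the columns listed in .SDcols,
# and (num_cols) := ... writes the results back to those same columns
dat[, (num_cols) := lapply(.SD, function(x) x * 2 + 5), .SDcols = num_cols]
```

Both forms update the table in place. The wrapper mainly saves repeating the column vector on each side of `:=`.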

arvi1000 commented Dec 9, 2016

Not sure setcols is the right name. Maybe mutate_cols but that's dplyr talk. Seems like data.table needs something in the set[...] family
