@brodieG
Last active April 8, 2020 11:02
Corner Cases With Non-Standard Evaluation in data.table
# Because there is no way to tell data.table "interpret this variable
# as a column name", it's possible to come up with corner cases. I'll
# grant these are unlikely to occur in day-to-day use, but any function
# that uses `data.table` must account for them.
# The odds are low, and yes, there are workarounds, but this is what I
# mean when I say you have to think carefully to avoid corner cases.
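# Ex 1
library(data.table)
my.dt <- data.table(col=letters[1:5], col2=1:5)
fun <- mean
col <- "col2"
# inside `[`, `col` resolves to the column `col`, not to the variable
# holding "col2"
my.dt[, fun(get(col))]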
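# Ex 2
# This one in particular is very unlikely, but it illustrates the point
mtcars.dt <- data.table(mtcars)
mtcars.dt[,`cyl,am`:= 1]
grp <- "cyl,am"
# a single string containing commas is treated as several column names,
# so this groups by `cyl` and `am` rather than by the column `cyl,am`
mtcars.dt[,mean(hp), by=grp]
# next attempt: backtick-quote the name inside the string
grp <- "`cyl,am`"
mtcars.dt[,mean(hp), by=grp]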
# Ex 3
# This one actually works fine, but again you have to be careful to
# signal your intent by using an expression, `(cols)`, instead of a
# bare symbol name, which is not at all intuitive to anyone familiar
# with R. The `get` solution is at least internally consistent, though
# it has the collision issue I highlighted earlier.
cols <- c("hp", "mpg")
fun <- mean
# bare symbol: the LHS is taken as a column literally named `cols`
(data.table(mtcars)[, cols:=lapply(.SD, fun), .SDcols=cols])
# parenthesized: the LHS is the value of `cols`, i.e. `hp` and `mpg`
(data.table(mtcars)[, (cols):=lapply(.SD, fun), .SDcols=cols])
# Ex 4
# Let's try to group by expressions (to be fair, you can't
# really do this with `dplyr`)
exp <- list(a=quote(gear %% 2), b=quote(cut(hp, 5)))
data.table(mtcars)[, mean(mpg), by=list(a=gear %% 2, b=cut(hp, 5))]
data.table(mtcars)[, mean(mpg), by=exp] # argh
# Ex 5
group_by_exp <- function(exp)
  data.table(mtcars)[, mean(mpg), by=eval(substitute(exp))]
group_by_exp(list(a=gear %% 2, b=cut(hp, 5))) # this kind of works
# Ex 6
exp.q <- quote(list(a=gear %% 2, b=cut(hp, 5)))
group_by_exp(exp.q) # argh
group_by_exp2 <- function(exp)
  data.table(mtcars)[, mean(mpg), by=eval(eval(substitute(exp)))]
group_by_exp2(exp.q) # now we're getting crazy...
data.table(mtcars)[, mean(mpg), by=exp.q] # this actually works, but it is not documented
# Again, every one of these has a workaround, though they require
# some care. I'd like a version of `[.data.table` that allows me
# to tell it very explicitly how to interpret things, so that I don't
# have to worry about funny corner cases caused by the flexibility in
# data.table. Don't get me wrong: for the most part the flexibility
# is fantastic.
@brodieG
Author

brodieG commented Nov 6, 2014

Also, one comment re 2, 3, and 4-6 that's worth highlighting: the workarounds are all different. For one we need to use `get`, for another we need to use `()` or some such, and for the last we need to evaluate quoted expressions. Providing one SE (standard evaluation) version that handles all of this would make data.table far more accessible to programmers (as opposed to command-line users).
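
As a rough side-by-side sketch of those three workarounds, reusing `mtcars` from the gist (the names `hp_col`, `grp`, and the `res_*` results are illustrative and not part of the original script):

```r
library(data.table)

dt <- data.table(mtcars)

# 1. Column named by a string: get()
#    (breaks if the variable name collides with a column name, as in Ex 1)
hp_col <- "hp"
res_get <- dt[, mean(get(hp_col))]

# 2. Columns to assign named by a character vector: wrap the symbol in ()
cols <- c("hp", "mpg")
res_assign <- copy(dt)[, (cols) := lapply(.SD, mean), .SDcols = cols]

# 3. Grouping by a quoted expression: eval() it inside `by`
grp <- quote(list(a = gear %% 2, b = cut(hp, 5)))
res_group <- dt[, mean(mpg), by = eval(grp)]
```

Each case needs its own idiom, which is the point: a programmer has to remember which one applies to which argument of `[.data.table`.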

@brodieG
Author

brodieG commented Nov 6, 2014

Note: discussion is being continued on e-mail. Will report back with conclusions.

@wolkym

wolkym commented Oct 6, 2015

Any news?

@jangorecki

jangorecki commented Apr 8, 2020

AFAIU all those corner cases are addressed by Rdatatable/data.table#4304
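
For context, a minimal sketch of what that looks like in practice, assuming a data.table release that includes the `env` argument to `[.data.table` from that pull request (1.15.0 or later, to the best of my knowledge). The placeholder names `fun` and `var` and the variables `fun_name` and `col_name` are illustrative:

```r
library(data.table)  # assumption: version with the `env` argument (>= 1.15.0)

my.dt <- data.table(col = letters[1:5], col2 = 1:5)
fun_name <- "mean"
col_name <- "col2"

# `fun` and `var` in j are placeholders; `env` maps them to names.
# Character values in `env` are substituted as symbols, so j becomes
# mean(col2) and the column `col` no longer shadows anything.
res <- my.dt[, fun(var), env = list(fun = fun_name, var = col_name)]
```

Because `env` is an ordinary argument evaluated in the calling scope, the Ex 1 collision between the variable `col` and the column `col` does not arise here.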
