Skip to content

Instantly share code, notes, and snippets.

@arunsrinivasan
Last active August 29, 2015 14:10
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save arunsrinivasan/b14441829e7f484fdf58 to your computer and use it in GitHub Desktop.
Save arunsrinivasan/b14441829e7f484fdf58 to your computer and use it in GitHub Desktop.
dplyr join question

Suppose I've two data.frames DF1 and DF2 as shown below:

require(dplyr)
set.seed(1L)
DF1 = data.frame(x=sample(3,10,TRUE), y1=1:10, y2=11:20)
#     x y1 y2
#  1: 1  1 11
#  2: 1  5 15
#  3: 1 10 20
#  4: 2  2 12
#  5: 2  3 13
#  6: 2  8 18
#  7: 2  9 19
#  8: 3  4 14
#  9: 3  6 16
# 10: 3  7 17

DF2 = data.frame(x=2:1, mult=runif(2))
#   x      mult
# 1 2 0.2059746
# 2 1 0.1765568

What I'd like to get is sum(y1)*mult and sum(y2)*mult for each x in DF2.

Here's my attempt:

ans = left_join(DF2, DF1) %>% 
        group_by(x) %>%
        summarise(y1 = sum(y1)*mult[1L], y2 = sum(y2)*mult[1L])
ans
# Source: local data frame [2 x 3]
# 
#   x       y1        y2
# 1 1 2.824908  8.121611
# 2 2 4.531441 12.770424

My question:

  • Is there a simpler (and maybe better) way to do this? For ex: using summarise_each()?

Note that I don't want to do DF1[DF1$x %in% DF2$x, ], as in the real scenario, there may be more columns to join by - where a subset like this would not be practical.

@hadley
Copy link

hadley commented Nov 25, 2014

Two other approaches. Use summarise_each():

DF2 %>% 
  left_join(DF1) %>% 
  group_by(x) %>%
  summarise_each(funs(sum(.) * mult[1L]), y1:y2)

summarise() then join: (longer, but possibly easier to follow)

DF1 %>% 
  group_by(x) %>%
  summarise_each(funs(sum)) %>%
  inner_join(DF2) %>%
  mutate_each(funs(. * mult), y1:y2)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment