{{ message }}

Instantly share code, notes, and snippets.

arunsrinivasan/dplyr_complex_join.md

Last active Aug 29, 2015
dplyr join question

Suppose I've two `data.frame`s `DF1` and `DF2` as shown below:

```require(dplyr)
set.seed(1L)
DF1 = data.frame(x=sample(3,10,TRUE), y1=1:10, y2=11:20)
#     x y1 y2
#  1: 1  1 11
#  2: 1  5 15
#  3: 1 10 20
#  4: 2  2 12
#  5: 2  3 13
#  6: 2  8 18
#  7: 2  9 19
#  8: 3  4 14
#  9: 3  6 16
# 10: 3  7 17

DF2 = data.frame(x=2:1, mult=runif(2))
#   x      mult
# 1 2 0.2059746
# 2 1 0.1765568```

What I'd like to get is `sum(y1)*mult` and `sum(y2)*mult` for each `x` in `DF2`.

Here's my attempt:

```ans = left_join(DF2, DF1) %>%
group_by(x) %>%
summarise(y1 = sum(y1)*mult[1L], y2 = sum(y2)*mult[1L])
ans
# Source: local data frame [2 x 3]
#
#   x       y1        y2
# 1 1 2.824908  8.121611
# 2 2 4.531441 12.770424```

My question:

• Is there a simpler (and maybe better) way to do this? For ex: using `summarise_each()`?

Note that I don't want to do `DF1[DF1\$x %in% DF2\$x, ]`, as in the real scenario, there may be more columns to join by - where a subset like this would not be practical.

Two other approaches. Use `summarise_each()`:

```DF2 %>%
left_join(DF1) %>%
group_by(x) %>%
summarise_each(funs(sum(.) * mult[1L]), y1:y2)```

`summarise()` then join: (longer, but possibly easier to follow)

```DF1 %>%
group_by(x) %>%
summarise_each(funs(sum)) %>%
inner_join(DF2) %>%
mutate_each(funs(. * mult), y1:y2)```