Row bind a list of data.frames with a key

Jenny Bryan
22 August, 2014

I posed a question on Twitter (click to see the figure!):

What is most elegant (d)plyr-ish way to do? #dplyr @hadleywickham pic.twitter.com/kXsfkq7Rkq

— Jennifer Bryan (@JennyBryan) August 22, 2014

that boiled down to this: you have a list of data.frames whose element names convey information. You want to row bind them and, in the resulting data.frame, have a variable recording which list element each observation came from.

I got back an embarrassment of riches, which I'll record here.

Problem example

First, make an appropriate list of data.frames from the iris data. Note the Species information is carried only in the list names.

(my_list <-
   lapply(split(subset(iris, select = -Species), iris$Species), "[", 1:2, ))
## $setosa
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1          5.1         3.5          1.4         0.2
## 2          4.9         3.0          1.4         0.2
## 
## $versicolor
##    Sepal.Length Sepal.Width Petal.Length Petal.Width
## 51          7.0         3.2          4.7         1.4
## 52          6.4         3.2          4.5         1.5
## 
## $virginica
##     Sepal.Length Sepal.Width Petal.Length Petal.Width
## 101          6.3         3.3          6.0         2.5
## 102          5.8         2.7          5.1         1.9

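Row binding with existing rbind()-type functions does not produce a Species variable: do.call("rbind", ...) pushes the list names into the row names, and dplyr::rbind_all() drops them altogether.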

do.call("rbind", my_list) # rownames have never looked so good ...
##               Sepal.Length Sepal.Width Petal.Length Petal.Width
## setosa.1               5.1         3.5          1.4         0.2
## setosa.2               4.9         3.0          1.4         0.2
## versicolor.51          7.0         3.2          4.7         1.4
## versicolor.52          6.4         3.2          4.5         1.5
## virginica.101          6.3         3.3          6.0         2.5
## virginica.102          5.8         2.7          5.1         1.9
dplyr::rbind_all(my_list)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1          5.1         3.5          1.4         0.2
## 2          4.9         3.0          1.4         0.2
## 3          7.0         3.2          4.7         1.4
## 4          6.4         3.2          4.5         1.5
## 5          6.3         3.3          6.0         2.5
## 6          5.8         2.7          5.1         1.9
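
The row names produced by do.call("rbind", ...) above do still carry the list names, so the key can be parsed back out after the fact. This is only a sketch, and a fragile one: it assumes every row name has the "list name, dot, row name" shape shown above, and `bound` is a name introduced here for illustration.

```r
# Rebuild the example list from the section above.
my_list <-
  lapply(split(subset(iris, select = -Species), iris$Species), "[", 1:2, )

bound <- do.call("rbind", my_list)

# Strip the final ".<row name>" piece, leaving the list element name.
bound$Species <- sub("\\.[^.]*$", "", rownames(bound))
rownames(bound) <- NULL
bound
```

Parsing row names like this breaks as soon as the naming pattern changes, which is part of why the dedicated solutions below are preferable.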

dplyr::rbind_all() + mapply()

Kara Woo provided this solution:

my_list2 <-
  mapply(`[<-`, my_list, 'Species', value = names(my_list), SIMPLIFY = FALSE)
dplyr::rbind_all(my_list2)
##   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1          5.1         3.5          1.4         0.2     setosa
## 2          4.9         3.0          1.4         0.2     setosa
## 3          7.0         3.2          4.7         1.4 versicolor
## 4          6.4         3.2          4.5         1.5 versicolor
## 5          6.3         3.3          6.0         2.5  virginica
## 6          5.8         2.7          5.1         1.9  virginica
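
For readers who find the `[<-` call terse: mapply() applies it once per list element, with the matching list name as the value. A more explicit base R sketch of the same idea follows. `add_key` is a hypothetical helper name, and base rbind() stands in for dplyr::rbind_all() only so that the example needs no packages.

```r
# Rebuild the example list from the section above.
my_list <-
  lapply(split(subset(iris, select = -Species), iris$Species), "[", 1:2, )

# Hypothetical helper: add the element name as a constant column.
add_key <- function(df, key) {
  df[["Species"]] <- key
  df
}

# Map() pairs each data.frame with its name, like mapply(..., SIMPLIFY = FALSE).
my_list2 <- Map(add_key, my_list, names(my_list))

# unname() keeps rbind() from pasting the list names into the row names.
combined <- do.call("rbind", unname(my_list2))
rownames(combined) <- NULL
combined
```

dplyr::rbind_all() was later deprecated in favour of dplyr::bind_rows(), whose .id argument covers this task directly.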

tidyr::unnest() extension

Hadley Wickham pegged this as a data tidying task and added experimental new functionality for tidyr::unnest(): tidyverse/tidyr#22. I installed tidyr from this commit to try out the new list method.

library(tidyr) 
unnest_(my_list) # by default, unnest() just row binds
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1          5.1         3.5          1.4         0.2
## 2          4.9         3.0          1.4         0.2
## 3          7.0         3.2          4.7         1.4
## 4          6.4         3.2          4.5         1.5
## 5          6.3         3.3          6.0         2.5
## 6          5.8         2.7          5.1         1.9
unnest(my_list, Species) # but this creates the desired Species variable
##      Species Sepal.Length Sepal.Width Petal.Length Petal.Width
## 1     setosa          5.1         3.5          1.4         0.2
## 2     setosa          4.9         3.0          1.4         0.2
## 3 versicolor          7.0         3.2          4.7         1.4
## 4 versicolor          6.4         3.2          4.5         1.5
## 5  virginica          6.3         3.3          6.0         2.5
## 6  virginica          5.8         2.7          5.1         1.9

data.table.extras::rbindlistn()

Kevin Ushey proposed the rbindlistn() function from his data.table.extras package. I installed data.table.extras from this commit to try it out.

library(data.table.extras)
## Loading required package: data.table
rbindlistn(my_list, "Species")
##    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1:          5.1         3.5          1.4         0.2     setosa
## 2:          4.9         3.0          1.4         0.2     setosa
## 3:          7.0         3.2          4.7         1.4 versicolor
## 4:          6.4         3.2          4.5         1.5 versicolor
## 5:          6.3         3.3          6.0         2.5  virginica
## 6:          5.8         2.7          5.1         1.9  virginica

data.table::rbindlist + vapply()

Arun Srinivasan also proposed a data.table solution:

library(data.table)
rbindlist(my_list)[, Species := rep(names(my_list), vapply(my_list, nrow, 0L))][]
##    Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
## 1:          5.1         3.5          1.4         0.2     setosa
## 2:          4.9         3.0          1.4         0.2     setosa
## 3:          7.0         3.2          4.7         1.4 versicolor
## 4:          6.4         3.2          4.5         1.5 versicolor
## 5:          6.3         3.3          6.0         2.5  virginica
## 6:          5.8         2.7          5.1         1.9  virginica

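Originally, without the trailing [], the above did not print to the Console for me when run interactively, although it did appear after I render()ed this. Resolved: as Arun Srinivasan explains in the comments, := assigns by reference and returns invisibly, so a trailing [] is needed to force printing.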
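
The key vector in that one-liner is built independently of the row binding: vapply() returns the number of rows in each list element (the 0L is only a template meaning "one integer per element"), and rep() repeats each name that many times. A base R sketch, using deliberately unequal group sizes to show that the counts line up; `pieces`, `uneven`, `n_rows` and `key` are names introduced here for illustration.

```r
# Take 1, 2 and 3 rows from the three species, so the group sizes differ.
pieces <- split(subset(iris, select = -Species), iris$Species)
uneven <- Map(function(df, n) df[seq_len(n), ], pieces, 1:3)

# One row count per list element, then each name repeated that many times.
n_rows <- vapply(uneven, nrow, 0L)
key <- rep(names(uneven), times = n_rows)
key

# Stack the pieces and attach the key; the order matches by construction.
combined <- do.call("rbind", unname(uneven))
combined$Species <- key
rownames(combined) <- NULL
combined
```

Later data.table releases added an idcol argument to rbindlist() that handles this case directly.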

Python-inspired enumerate()

Kevin Ushey also shared a Python-inspired enumerate() function, but I have not yet applied it to my problem. At this point, it seems superseded by his other solution.

enumerate <- function(X, FUN, ...) {
  result <- vector("list", length(X))
  for (i in seq_along(result)) {
    tmp <- FUN(X[[i]], i, ...)
    if (is.null(tmp))
      result[i] <- list(NULL)
    else
      result[[i]] <- tmp
  }
  result
}

l <- list(a = 1, b = 2, c = 3)
enumerate(l, function(x, i) {
  cat("Name:  ", names(l)[[i]], "\n")
  cat("Value: ", x, "\n")
  invisible()
})
## Name:   a 
## Value:  1 
## Name:   b 
## Value:  2 
## Name:   c 
## Value:  3
## [[1]]
## NULL
## 
## [[2]]
## NULL
## 
## [[3]]
## NULL
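
One way enumerate() could be applied to the original problem, offered as a sketch only: the callback uses the index to look up the list name and adds it as a column, and base rbind() then stacks the results. The enumerate() definition is repeated so the block runs on its own; `tagged` and `combined` are illustrative names.

```r
# Kevin Ushey's enumerate(), repeated from above.
enumerate <- function(X, FUN, ...) {
  result <- vector("list", length(X))
  for (i in seq_along(result)) {
    tmp <- FUN(X[[i]], i, ...)
    if (is.null(tmp))
      result[i] <- list(NULL)
    else
      result[[i]] <- tmp
  }
  result
}

# Rebuild the example list from the problem section.
my_list <-
  lapply(split(subset(iris, select = -Species), iris$Species), "[", 1:2, )

# The callback receives each data.frame and its position in the list.
tagged <- enumerate(my_list, function(df, i) {
  df$Species <- names(my_list)[[i]]
  df
})

# enumerate() returns an unnamed list, so rbind() leaves the row names alone.
combined <- do.call("rbind", tagged)
rownames(combined) <- NULL
combined
```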

sessionInfo()

sessionInfo()
## R version 3.1.0 (2014-04-10)
## Platform: x86_64-apple-darwin10.8.0 (64-bit)
## 
## locale:
## [1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] data.table.extras_1.0 data.table_1.9.3      tidyr_0.1.0.9000     
## 
## loaded via a namespace (and not attached):
##  [1] assertthat_0.1   digest_0.6.4     dplyr_0.2.0.99   evaluate_0.5.5  
##  [5] formatR_0.10     htmltools_0.2.4  knitr_1.6        magrittr_1.0.1  
##  [9] parallel_3.1.0   plyr_1.8.1       Rcpp_0.11.1      reshape2_1.4    
## [13] rmarkdown_0.2.64 stringr_0.6.2    tools_3.1.0      yaml_2.1.13
@arunsrinivasan

@jennybc, great summary!

Oddly, the above does not print to Console for me when run interactively, but, lo, here it is after I render() this. Not sure what to make of that.

It's because of :=, because it assigns by reference and returns invisibly. Just add a [] at the end

rbindlist(my_list)[, Species := rep(names(my_list), vapply(my_list, nrow, 0L))][]

@jennybc

jennybc commented Aug 23, 2014

@arunsrinivasan Thanks! I fixed that.

@arunsrinivasan

👍

@karthik

karthik commented Aug 23, 2014

Fantastic write up @jennybc
Love it!

@jonnybaik

I'm a little late to the party, but here is a plyr solution:

> plyr::ldply(my_list, .id="Species")

     Species Sepal.Length Sepal.Width Petal.Length Petal.Width
1     setosa          5.1         3.5          1.4         0.2
2     setosa          4.9         3.0          1.4         0.2
3 versicolor          7.0         3.2          4.7         1.4
4 versicolor          6.4         3.2          4.5         1.5
5  virginica          6.3         3.3          6.0         2.5
6  virginica          5.8         2.7          5.1         1.9

@jennybc

jennybc commented Sep 19, 2014

@jonnybaik Nice! I just saw this. Will add to the round-up. How did we all miss that ?!?
