@jennybc
Created January 13, 2015 01:00
Binding a list of lists

Jenny Bryan
12 January, 2015

I want to convert a list of lists into another list of lists through ... catenation or binding? Hard to describe in words -- easier to illustrate.

FWIW, in real life, I make a sequence of requests to an API, traversing all possible pages of results. Each GET request yields a list representing the data for one page, with components for the content, headers, status code, etc. I want to wrangle the resulting list of lists into something closer to what I would have gotten in the absence of pagination.

I illustrate with a simpler example that has nothing to do with an API. Here's a picture, but there's R code below:

[Image: http://i.imgur.com/6IBInP9.png]

x is the input list of lists, with two conformable components, april and july.

library(plyr)
library(magrittr)

x <-
  list(april = list(n_days = 30,
                    holidays = list(list("2015-04-01", "april fools"),
                                    list("2015-04-05", "easter")),
                    month_info = c(number = "4", season = "spring")),
       july = list(n_days = 31,
                   holidays = list(list("2014-07-04", "july 4th")),
                   month_info = c(number = "7", season = "summer"))) %T>% str
## List of 2
##  $ april:List of 3
##   ..$ n_days    : num 30
##   ..$ holidays  :List of 2
##   .. ..$ :List of 2
##   .. .. ..$ : chr "2015-04-01"
##   .. .. ..$ : chr "april fools"
##   .. ..$ :List of 2
##   .. .. ..$ : chr "2015-04-05"
##   .. .. ..$ : chr "easter"
##   ..$ month_info: Named chr [1:2] "4" "spring"
##   .. ..- attr(*, "names")= chr [1:2] "number" "season"
##  $ july :List of 3
##   ..$ n_days    : num 31
##   ..$ holidays  :List of 1
##   .. ..$ :List of 2
##   .. .. ..$ : chr "2014-07-04"
##   .. .. ..$ : chr "july 4th"
##   ..$ month_info: Named chr [1:2] "7" "summer"
##   .. ..- attr(*, "names")= chr [1:2] "number" "season"

y, below, is indicative of my desired output, though I'm flexible on details like (row) names, matrix vs. data.frame, etc. I want to catenate or bind each component, such as n_days or holidays, across all the months.

y <- list(n_days = c(april = 30, july = 31),
          holidays = list(list("2015-04-01", "april fools"),
                          list("2015-04-05", "easter"),
                          list("2014-07-04", "july 4th")),
          month_info = cbind(april = c(number = "4", season = "spring"),
                             july = c(number = "7", season =
                                        "summer"))) %T>% str
## List of 3
##  $ n_days    : Named num [1:2] 30 31
##   ..- attr(*, "names")= chr [1:2] "april" "july"
##  $ holidays  :List of 3
##   ..$ :List of 2
##   .. ..$ : chr "2015-04-01"
##   .. ..$ : chr "april fools"
##   ..$ :List of 2
##   .. ..$ : chr "2015-04-05"
##   .. ..$ : chr "easter"
##   ..$ :List of 2
##   .. ..$ : chr "2014-07-04"
##   .. ..$ : chr "july 4th"
##  $ month_info: chr [1:2, 1:2] "4" "spring" "7" "summer"
##   ..- attr(*, "dimnames")=List of 2
##   .. ..$ : chr [1:2] "number" "season"
##   .. ..$ : chr [1:2] "april" "july"

I can of course do this with brute force. But it isn't a great starting place for a general solution to the actual problem, which is catenating across pages returned by an API.

brute_force_y <-
  list(n_days = laply(x, `[[`, "n_days"),
       holidays = llply(x, `[[`, "holidays") %>% unlist(recursive = FALSE),
       month_info = lapply(x, function(z) z[["month_info"]] %>% unlist) %>%
         do.call("cbind", .))

## agree up to naming stuff
all.equal(y, brute_force_y)
## [1] "Component \"n_days\": names for target but not for current"  
## [2] "Component \"holidays\": names for current but not for target"

Feels like there must be some way to do this with mapply() or tidyr or ???

Is there a (better) name for what I am trying to do? Is this close to some pre-existing workflow that I could exploit by approaching differently?
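
One base R route, sketched here as an illustration rather than a recommendation: `mapply()` with `SIMPLIFY = FALSE` walks the months in parallel and regroups their components by name, turning the list of months into a list of components. The names `by_component` and `y2` are made up for this sketch, and it assumes every month has the same components in the same order.

```r
x <- list(
  april = list(n_days = 30,
               holidays = list(list("2015-04-01", "april fools"),
                               list("2015-04-05", "easter")),
               month_info = c(number = "4", season = "spring")),
  july = list(n_days = 31,
              holidays = list(list("2014-07-04", "july 4th")),
              month_info = c(number = "7", season = "summer")))

# Regroup by component: mapply() pairs up the i-th element of every month.
# Assumes all months share the same components, in the same order.
by_component <- do.call(mapply, c(list(FUN = list, SIMPLIFY = FALSE), x))
str(by_component, max.level = 2)

# Each component still needs its own combining step.
y2 <- list(
  n_days = unlist(by_component$n_days),
  holidays = unlist(unname(by_component$holidays), recursive = FALSE),
  month_info = do.call(cbind, by_component$month_info))
str(y2)
```

The regrouping step is generic, but the last step is not: each component still gets a hand-picked combining function, which is the part that resists a general solution.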


jennybc commented Jan 13, 2015

Via Twitter, @rdpeng says:

I think the Reduce() function is what you want. It's not necessarily prettier, but it walks the list and combines as you go.
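
A rough sketch of how `Reduce()` might be applied here, assuming the combining function for each component is supplied up front. `combiners` and `bind_two` are hypothetical names for this illustration, not part of any package. The pairwise combination does not carry the month labels along, so they are restored at the end.

```r
x <- list(
  april = list(n_days = 30,
               holidays = list(list("2015-04-01", "april fools"),
                               list("2015-04-05", "easter")),
               month_info = c(number = "4", season = "spring")),
  july = list(n_days = 31,
              holidays = list(list("2014-07-04", "july 4th")),
              month_info = c(number = "7", season = "summer")))

# One combining function per component (an assumption of this sketch).
combiners <- list(
  n_days = c,
  holidays = c,
  month_info = function(a, b) cbind(a, b, deparse.level = 0))

# Combine the running result with the next month, component by component.
bind_two <- function(acc, page) {
  Map(function(f, a, b) f(a, b),
      combiners[names(acc)], acc, page[names(acc)])
}

z <- Reduce(bind_two, unname(x))

# Restore the month labels that the pairwise combination dropped.
names(z$n_days) <- names(x)
colnames(z$month_info) <- names(x)
str(z)
```

Because the accumulator keeps the same shape after every step, the same `bind_two` should work for any number of pages, provided each page has the components named in `combiners`.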

jennybc commented Jan 13, 2015

Via Twitter, @noamross says:

This is a job for rlist, I think.

@noamross

Here's a start.

library(rlist)
library(pipeR)

renamed = function(obj, newnames) {
  names(obj) = newnames
  return(obj)
}

y = x %>>% 
  list.ungroup %>>%
  list.group(.name) %>>%
  list.map(renamed(., names(x))) %>>%
  list.map(unlist(., recursive=FALSE))
str(y)  
List of 3
 $ n_days    : Named num [1:2] 30 31
  ..- attr(*, "names")= chr [1:2] "april" "july"
 $ holidays  :List of 3
  ..$ april1:List of 2
  .. ..$ : chr "2015-04-01"
  .. ..$ : chr "april fools"
  ..$ april2:List of 2
  .. ..$ : chr "2015-04-05"
  .. ..$ : chr "easter"
  ..$ july  :List of 2
  .. ..$ : chr "2014-07-04"
  .. ..$ : chr "july 4th"
 $ month_info: Named chr [1:4] "4" "spring" "7" "summer"
  ..- attr(*, "names")= chr [1:4] "april.number" "april.season" "july.number" "july.season"

There's still a bit of a problem here because unlist combines a list of vectors in month_info into a single vector, rather than something tabular. So the function in list.map may still need to have some logic that depends on the data type. (Combine scalars to a vector, vectors to a table, lists to a list, etc.)
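
The type-dependent logic described above could look something like the following base R sketch. `combine_by_type` is a hypothetical helper, and its rules (lists are concatenated, length-one atomic values become a named vector, longer vectors become matrix columns) are only one possible set of defaults.

```r
# Hypothetical helper: choose a combining rule from the type of the pieces.
# `pieces` is one component collected across all months, named by month.
combine_by_type <- function(pieces) {
  if (all(vapply(pieces, is.list, logical(1)))) {
    # lists to a list: concatenate, dropping the per-month names
    unlist(unname(pieces), recursive = FALSE)
  } else if (all(lengths(pieces) == 1)) {
    # scalars to a vector, named by month
    unlist(pieces)
  } else {
    # vectors to a table, one column per month (assumes equal lengths)
    do.call(cbind, pieces)
  }
}

n_days <- list(april = 30, july = 31)
holidays <- list(april = list(list("2015-04-01", "april fools"),
                              list("2015-04-05", "easter")),
                 july = list(list("2014-07-04", "july 4th")))
month_info <- list(april = c(number = "4", season = "spring"),
                   july = c(number = "7", season = "summer"))

str(combine_by_type(n_days))
str(combine_by_type(holidays))
str(combine_by_type(month_info))
```

The rules are easy to fool: a month with a single atomic holiday, or named scalars, would take a different branch than intended. That is why an explicit per-component choice may still be needed.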

@renkun-ken

My solution here:

I'm using the latest development version of rlist. The trick here is simplify2array I think.

library(rlist)
library(pipeR)

list(
  n_days = x %>>% list.mapv(n_days),
  holidays = x %>>% 
    list.map(holidays) %>>%
    list.ungroup %>>%
    unname,
  month_info = x %>>% 
    list.map(month_info) %>>%
    simplify2array)

The output is the same as the desired result:

List of 3
 $ n_days    : Named num [1:2] 30 31
  ..- attr(*, "names")= chr [1:2] "april" "july"
 $ holidays  :List of 3
  ..$ :List of 2
  .. ..$ : chr "2015-04-01"
  .. ..$ : chr "april fools"
  ..$ :List of 2
  .. ..$ : chr "2015-04-05"
  .. ..$ : chr "easter"
  ..$ :List of 2
  .. ..$ : chr "2014-07-04"
  .. ..$ : chr "july 4th"
 $ month_info: chr [1:2, 1:2] "4" "spring" "7" "summer"
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:2] "number" "season"
  .. ..$ : chr [1:2] "april" "july"

hadley commented Jan 13, 2015

Here's a solution using lowliner. The key is to recognise that the initial translation is called an unzip():

library(lowliner)
library(dplyr)

# Unzip gets you most of the way there
y <- x %>% unzip()
str(y)

# It's then just a matter of making the data frame you want
y$month_info %>% 
  map(. %>% as.list %>% as_data_frame) %>%
  bind_rows()

@noamross

It's interesting that we all found a quick or canned solution to the inversion/unzipping problem, but all ran into the need to customize for the "combining data of different types" problem.

This seems like a general problem, and I'd note there's a close parallel here to the various JSON input/output R packages, which have different defaults or approaches for dealing with the translation of tabular data between JSON and R.

plyr has the property that you can specify how you want data combined by changing a single letter to choose the appropriate combining function (l[?]ply).

Perhaps rlist or lowliner could include a solution with similarly easy syntax to specify the combining function for different data types.

hadley commented Jan 13, 2015

@noamross If as_data_frame() knew what to do with a vector/matrix (tidyverse/dplyr#876) then my solution would be even briefer. I think that's about as far as you can go without losing generality.

@noamross

@hadley It would need to know what to do with lists with more than one level, too, right? If holidays had fields with sub-fields, it wouldn't combine to a data frame in an obvious way. There are probably a number of such cases, so you can't have a totally general solution, but maybe a function with syntax that lets you specify your preferences easily.

@renkun-ken

With the latest rlist (v0.4), list.unzip is defined so that this problem can be easily solved using:

list.unzip(x, holidays = c("list.ungroup", "unname", "simplify2array"))
