@aammd
Last active Jan 5, 2021
turning a named list into a dataframe using dplyr

Is there an easy way to convert a named list into a dataframe, preserving the elements of the list in a "list-column"?

    library(dplyr)
    library(magrittr)

    ## make a random matrix
    rand_mat <- function() {
      Nrow <- sample(2:15,1)
      Ncol <- sample(2:15,1)
      rpois(Nrow*Ncol,20) %>%
        matrix(nrow = Nrow,ncol = Ncol)
    }

    ## make a named list
    unnamed.list <- replicate(10,rand_mat(),simplify = FALSE) 

    named.list <- unnamed.list %>% set_names(LETTERS[1:10])

    list_to_df <- function(listfordf){
      if(!is.list(listfordf)) stop("it should be a list")

      if(listfordf %>% names %>% is.null) {
        ## unnamed list: index the elements by position, one row each
        seq_along(listfordf) %>%
          data.frame(matname = ., stringsAsFactors = FALSE) %>%
          rowwise %>%
          do(list.element = listfordf %>% extract2(.$matname))
      } else {
        ## named list: one group per name, so the names survive as a column
        names(listfordf) %>%
          data.frame(matname = ., stringsAsFactors = FALSE) %>%
          group_by(matname) %>%
          do(comm.matrix = listfordf %>% extract2(.$matname))
      }
    }

    list_to_df(unnamed.list)

    ## Source: local data frame [10 x 1]
    ## Groups: <by row>
    ## 
    ##    list.element
    ## 1   <int[7,11]>
    ## 2  <int[15,14]>
    ## 3    <int[6,4]>
    ## 4  <int[11,14]>
    ## 5   <int[2,10]>
    ## 6  <int[11,13]>
    ## 7  <int[11,11]>
    ## 8   <int[13,8]>
    ## 9   <int[11,5]>
    ## 10  <int[4,15]>

    list_to_df(named.list)

    ## Source: local data frame [10 x 1]
    ## Groups: <by row>
    ## 
    ##    list.element
    ## 1   <int[7,11]>
    ## 2  <int[15,14]>
    ## 3    <int[6,4]>
    ## 4  <int[11,14]>
    ## 5   <int[2,10]>
    ## 6  <int[11,13]>
    ## 7  <int[11,11]>
    ## 8   <int[13,8]>
    ## 9   <int[11,5]>
    ## 10  <int[4,15]>

The dplyr package implements the "Split-Apply-Combine" approach to data analysis. Many useRs will previously have used named lists in combination with plyr::llply or plyr::ldply, applying a function to each element before combining the results.

rdinnager commented Jun 6, 2014

Could you replace group_by(matname) with rowwise?

rdinnager commented Jun 6, 2014

No wait, that would not preserve the list names..

hadley commented Jun 6, 2014

I'd do it like this:

list_to_df <- function(listfordf){
  if(!is.list(named.list)) stop("it should be a list")

  df <- list(list.element = listfordf)
  class(df) <- c("tbl_df", "data.frame")
  attr(df, "row.names") <- .set_row_names(length(listfordf))

  if (!is.null(names(listfordf))) {
    df$name <- names(listfordf)
  }

  df
}

rdinnager commented Jun 9, 2014

Well, that makes a lot of sense. I keep forgetting that data.frame objects (and I suppose also tbl_df objects) are just lists "under the hood". Neat!
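
To illustrate the "lists under the hood" point, here is a minimal base-R sketch. The object names `df` and `manual` are made up for illustration and are not part of the gist.

```r
# A data frame is a list of equal-length columns plus a few attributes.
df <- data.frame(x = 1:3, y = c("a", "b", "c"), stringsAsFactors = FALSE)

is.list(df)   # TRUE
typeof(df)    # "list"

# Building the same thing by hand from a plain list (base R only):
manual <- list(x = 1:3, y = c("a", "b", "c"))
class(manual) <- "data.frame"
attr(manual, "row.names") <- .set_row_names(3L)

is.data.frame(manual)
nrow(manual)
```

Setting the class and the `row.names` attribute is all that separates the plain list from a data frame, which is the trick the function above relies on.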

aammd (author) commented Jun 10, 2014

Thanks, Hadley! I've never seen this .set_row_names function before; I see now that it is an internal object. I really like this; it is very succinct!
...
er, I just checked with microbenchmark, and it turns out your version is fully 200x faster than my original! Is that because of the direct list-to-dataframe conversion?

krlmlr commented Sep 8, 2015

Now in my misc package, with slight modifications: https://github.com/krlmlr/kimisc/blob/develop/R/list_to_df.R

devtools::install_github("krlmlr/kimisc")

austinj commented Dec 20, 2017

I'm very late to the party, but I think that the second line of the list_to_df function Hadley wrote above should name the argument as listfordf, not named.list.

EvanFalcone commented Apr 19, 2018

^ confirming. It's no biggy, the input variable name probably just changed midway or something.
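
With the fix from the two comments above applied, Hadley's function would read roughly as follows. This is only a sketch: the single change is checking `listfordf` instead of `named.list`, and the example list `mats` is made up for illustration.

```r
list_to_df <- function(listfordf) {
  # check the argument itself rather than the global named.list
  if (!is.list(listfordf)) stop("it should be a list")

  # a data frame is a list of columns, so wrap the input as one list-column
  df <- list(list.element = listfordf)
  class(df) <- c("tbl_df", "data.frame")
  attr(df, "row.names") <- .set_row_names(length(listfordf))

  # keep the names, if there are any, as an ordinary column
  if (!is.null(names(listfordf))) {
    df$name <- names(listfordf)
  }

  df
}

# hypothetical input, for illustration only
mats <- list(A = matrix(1:6, nrow = 2), B = matrix(1:4, nrow = 2))
res <- list_to_df(mats)
names(res)
nrow(res)
```

Without the fix, the check silently depends on an object called `named.list` existing in the calling environment, so the function fails or tests the wrong object when used elsewhere.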

dsolito commented Feb 17, 2019

Hello, I had the same problem.
Finally, I resolved it with :

list(site1 = c("url1", "url2"), site2 = c("url2", "url3", "url3")) %>% 
  enframe() %>% 
  unnest()

Tayflo commented Jan 5, 2021

> Hello, I had the same problem.
> Finally, I resolved it with :
>
> list(site1 = c("url1", "url2"), site2 = c("url2", "url3", "url3")) %>%
>   enframe() %>%
>   unnest()

Very nice, thanks! I needed this to properly use a JSON file imported from jsonlite.

If it can be of use to anyone, here is a full reprex of the above code (with some details):

library(magrittr)
list(
  site1 = c("url1", "url2"),
  site2 = c("url2", "url3", "url3")
) %>%
  tibble::enframe("site", "url") %>%
  tidyr::unnest(cols = url)
#> # A tibble: 5 x 2
#>   site  url  
#>   <chr> <chr>
#> 1 site1 url1 
#> 2 site1 url2 
#> 3 site2 url2 
#> 4 site2 url3 
#> 5 site2 url3

Created on 2021-01-05 by the reprex package (v0.3.0)
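
For the original question (keeping each element intact in a list-column rather than unnesting it), `tibble::enframe()` on its own may already be enough. A minimal sketch, assuming the tibble package is installed; the list `mats` and the column names are made up for illustration.

```r
# hypothetical named list of matrices, standing in for named.list above
mats <- list(
  A = matrix(1:6, nrow = 2),
  B = matrix(1:12, nrow = 3),
  C = matrix(1:4, nrow = 2)
)

# one row per list element; the matrices stay intact in a list-column
out <- tibble::enframe(mats, name = "name", value = "list.element")

out$name                    # the original list names
dim(out$list.element[[2]])  # dimensions of the second matrix
```

Skipping `unnest()` is the design choice here: unnesting suits elements that are plain vectors, as in the URL example, while matrices are easier to keep whole.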