@rstats-tips-gist
Created January 28, 2021 17:35
title: Iterating over the lines of a data.frame with purrr
author: rstats
date: 2020-11-21
categories: tidyverse
tags: purrr
slug: iterating-over-the-lines-of-a-data-frame-with-purrr
lastmod: 2020-11-21T17:41:43+02:00
comment: false
toc: true
autoCollapseToc: false
postMetaInFooter: false
hiddenFromHomePage: false
contentCopyright: false
reward: false
mathjax: false
mathjaxEnableSingleDollar: false
mathjaxEnableAutoNumber: false
hideHeaderAndFooter: false
flowchartDiagrams:
  enable: false
  options: ""
sequenceDiagrams:
  enable: false
  options: ""

I sometimes have a function that takes some parameters and returns a data.frame. I also have a data.frame in which each row is one set of parameters. So I want to apply the function to each row of this parameter-data.frame and rbind the resulting data.frames.

There are several ways to do it. Let's have a look:

The function ...

So let's build a simple function we can use:

https://gist.github.com/77b37ed4f28a03460b2372a1afba1573

https://gist.github.com/e600e6851e02b45538091f12cc1c5883

So this function takes three arguments and returns a data.frame. The number of rows of the data.frame depends on the last parameter.
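The function itself sits behind the gist links above and is not reproduced here, so the following is only a minimal sketch of a function with that shape. The name my_function is taken from later in the post; the argument names a, b and n and the output columns are assumptions made for illustration.

```r
# Hypothetical stand-in for the function described in the post.
# The argument names a, b and n are assumptions made for this sketch.
my_function <- function(a, b, n) {
  data.frame(
    a = rep(a, n),    # first argument, repeated once per row
    b = rep(b, n),    # second argument, repeated once per row
    i = seq_len(n),   # running index from 1 to n
    stringsAsFactors = FALSE
  )
}

# A single call returns a data.frame with n rows (three rows here).
example <- my_function(1, "x", 3)
print(example)
```

Because the last argument controls the number of rows, the results of several calls generally differ in length, which is why they are combined with rbind rather than stored in a fixed-size structure.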

... and its parameters

So now we have several tuples of parameters. Each tuple is a row of our parameter-data.frame:

https://gist.github.com/83ca77d693bcbb46fff66621970f44d7

https://gist.github.com/c5fa0c41ec9ebc639d6a3e0af15aac15

So now we want to apply our function three times, once for each row of the data.frame parameters.
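A parameter-data.frame of this kind could look like the sketch below. The values and the column names a, b and n are made up for illustration; the one real constraint is that it has three rows, one per intended function call.

```r
# Hypothetical parameter-data.frame: one row per call, one column per argument.
# Column names and values are assumptions chosen for this sketch.
parameters <- data.frame(
  a = c(1, 2, 3),
  b = c("x", "y", "z"),
  n = c(2, 1, 3),
  stringsAsFactors = FALSE
)

print(parameters)
```

Naming the columns after the function's arguments is a deliberate choice here: it lets name-based tools pick the right column for each argument.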

Iterating with ...

There are several ways to iterate.

... a for-loop

The most common way in programming is a for-loop:

https://gist.github.com/57c663ea0bfd59c7b4da41dcdfb2ea51

https://gist.github.com/bdcfd7f53a77f19c90d00ce20df8a96b

That's very ugly: you have to initialize the result-data.frame, and growing it row by row is slow. Whenever you want to use a for-loop in R, step back and think about using something else.
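For reference, a for-loop of the kind criticized here might look like the following sketch. It is not the code from the gist; my_function and parameters are hypothetical stand-ins with assumed argument and column names a, b and n.

```r
# Hypothetical stand-ins (argument and column names are assumptions).
my_function <- function(a, b, n) {
  data.frame(a = rep(a, n), b = rep(b, n), i = seq_len(n),
             stringsAsFactors = FALSE)
}
parameters <- data.frame(a = c(1, 2, 3), b = c("x", "y", "z"),
                         n = c(2, 1, 3), stringsAsFactors = FALSE)

# The result has to be initialized before the loop ...
result <- data.frame()

# ... and is then grown by one rbind per row of the parameters.
for (row in seq_len(nrow(parameters))) {
  result <- rbind(
    result,
    my_function(parameters$a[row], parameters$b[row], parameters$n[row])
  )
}

print(result)
```

Each rbind builds a new data.frame from everything collected so far plus the new rows, so the cost of the loop grows with the size of the accumulated result.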

... lapply()

Instead of for-loops you should use lapply or another member of the apply family. But lapply works on lists. A data.frame is a list, but a list of columns, not of rows.

So we need to split the data.frame parameters into a list row-wise using split. Then we can apply my_function to each element. Finally we use do.call(rbind, x) to merge the results into one data.frame.

https://gist.github.com/882f0cf309d896345315d550325ec00f

https://gist.github.com/f97e03955b1c14f2e5c10bc3c23cde30
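The split, lapply and do.call steps could be sketched as follows. This is an illustration under assumptions rather than the gist's code: my_function and parameters are hypothetical stand-ins with assumed names a, b and n.

```r
# Hypothetical stand-ins (argument and column names are assumptions).
my_function <- function(a, b, n) {
  data.frame(a = rep(a, n), b = rep(b, n), i = seq_len(n),
             stringsAsFactors = FALSE)
}
parameters <- data.frame(a = c(1, 2, 3), b = c("x", "y", "z"),
                         n = c(2, 1, 3), stringsAsFactors = FALSE)

# 1. Split row-wise: a list of one-row data.frames.
rows <- split(parameters, seq_len(nrow(parameters)))

# 2. Apply the function to every one-row data.frame.
pieces <- lapply(rows, function(p) my_function(p$a, p$b, p$n))

# 3. Bind all resulting data.frames into one.
result <- do.call(rbind, pieces)

print(result)
```

Because the list is named by split, the row names of the result carry the group labels; rownames(result) <- NULL resets them if they are not wanted.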

That's a lot more R-like. But the winner is:

... pmap_dfr() out of the purrr-package

The most elegant way I know of is purrr's pmap_dfr:

https://gist.github.com/2534bde812e39ea26f1f7f3f0193e5d4

https://gist.github.com/5ce3211a9d0c4f2ec1be8753671d4fae
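A minimal sketch of such a call, assuming the purrr package (and dplyr, which pmap_dfr uses for row-binding) is installed. As before, my_function and parameters are hypothetical stand-ins with assumed names a, b and n, not the code from the gist.

```r
library(purrr)

# Hypothetical stand-ins (argument and column names are assumptions).
my_function <- function(a, b, n) {
  data.frame(a = rep(a, n), b = rep(b, n), i = seq_len(n),
             stringsAsFactors = FALSE)
}
parameters <- data.frame(a = c(1, 2, 3), b = c("x", "y", "z"),
                         n = c(2, 1, 3), stringsAsFactors = FALSE)

# pmap_dfr() calls my_function once per row, passing the columns as named
# arguments, and row-binds the returned data.frames.
result <- pmap_dfr(parameters, my_function)

print(result)
```

No initialization, no explicit split and no do.call are needed. Note that purrr 1.0.0 and later mark pmap_dfr() as superseded; the suggested replacement there is pmap() followed by list_rbind().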

pmap_dfr matches the column-names of the parameter-data.frame to the parameter-names of the function. So the columns can be in any order in the parameter-data.frame:

https://gist.github.com/edce95cae72ed09b0933509456a3a473

https://gist.github.com/957784a39f2e176ef22f904d1f1287ce
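The name matching can be checked with a small sketch in which the columns are deliberately reordered. It assumes purrr and dplyr are installed and uses hypothetical stand-ins for my_function and parameters (the names a, b and n are assumptions).

```r
library(purrr)

# Hypothetical stand-ins (argument and column names are assumptions).
my_function <- function(a, b, n) {
  data.frame(a = rep(a, n), b = rep(b, n), i = seq_len(n),
             stringsAsFactors = FALSE)
}
parameters <- data.frame(a = c(1, 2, 3), b = c("x", "y", "z"),
                         n = c(2, 1, 3), stringsAsFactors = FALSE)

# Same parameters, but with the columns in a different order.
parameters_mixed <- parameters[, c("n", "a", "b")]

result_ordered <- pmap_dfr(parameters, my_function)
result_mixed   <- pmap_dfr(parameters_mixed, my_function)

# Arguments are matched by name, so both calls should give the same result.
same_result <- isTRUE(all.equal(result_ordered, result_mixed))
print(same_result)
```

The flip side of name matching is that every column has to correspond to an argument: an extra column would be passed as an unknown argument unless the function accepts `...`.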
