title | author | date | categories | tags | slug | lastmod | comment | toc | autoCollapseToc | postMetaInFooter | hiddenFromHomePage | contentCopyright | reward | mathjax | mathjaxEnableSingleDollar | mathjaxEnableAutoNumber | hideHeaderAndFooter | flowchartDiagrams | sequenceDiagrams
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Iterating over the lines of a data.frame with purrr | rstats | 2020-11-21 | | | iterating-over-the-lines-of-a-data-frame-with-purrr | 2020-11-21T17:41:43+02:00 | false | true | false | false | false | false | false | false | false | false | false | |
I sometimes have a function that takes some parameters and returns a data.frame as a result. I also have a data.frame in which each row is one set of parameters. So I want to apply the function to each row of the parameter-data.frame and rbind the resulting data.frames.
There are several ways to do it. Let's have a look.
First, let's build a simple function we can use:
https://gist.github.com/77b37ed4f28a03460b2372a1afba1573
https://gist.github.com/e600e6851e02b45538091f12cc1c5883
This function takes three arguments and returns a data.frame. The length of the data.frame depends on the last parameter.
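The gist above holds the real function. As a minimal sketch of the same idea, here is a stand-in with the same shape: three arguments, a data.frame as the result, and the last argument controlling how many rows come back. The argument names a, b and n and the column i are my assumptions for illustration, not taken from the gist.

```r
# Hypothetical stand-in for my_function; argument names are assumptions.
my_function <- function(a, b, n) {
  # a and b are recycled, so the result has n rows
  data.frame(a = a, b = b, i = seq_len(n))
}

example <- my_function(1, "x", 3)
print(example)
```

Calling it with n = 3 gives a data.frame with three rows, one per value of i.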
Now we have several tuples of parameters. Each tuple is a row of our parameter-data.frame:
https://gist.github.com/83ca77d693bcbb46fff66621970f44d7
https://gist.github.com/c5fa0c41ec9ebc639d6a3e0af15aac15
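A parameter-data.frame of this kind could be sketched as follows. The column names a, b and n and all values are made up for illustration. The only thing that matters is that each column is named after one argument of the function and each row is one call.

```r
# Hypothetical parameter-data.frame: one row per call, one column per argument.
parameters <- data.frame(
  a = c(1, 2, 3),
  b = c("x", "y", "z"),
  n = c(2, 1, 3),
  stringsAsFactors = FALSE
)

print(parameters)
```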
Now we want to apply our function three times, once for each row of the data.frame parameters.
There are several ways to iterate.
The most common way in programming is a for-loop:
https://gist.github.com/57c663ea0bfd59c7b4da41dcdfb2ea51
https://gist.github.com/bdcfd7f53a77f19c90d00ce20df8a96b
That's very ugly: you have to initialize the result-data.frame, and it's slow. Whenever you want to use a for-loop in R, step back and think about using something else.
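To make the complaint concrete, here is a sketch of such a loop. The function and the parameter-data.frame are hypothetical stand-ins (the names a, b, n are my assumptions), and the loop is only one plausible way to write it, not necessarily what the gist does.

```r
# Hypothetical stand-ins; names and values are assumptions for illustration.
my_function <- function(a, b, n) {
  data.frame(a = a, b = b, i = seq_len(n))
}
parameters <- data.frame(
  a = c(1, 2, 3),
  b = c("x", "y", "z"),
  n = c(2, 1, 3),
  stringsAsFactors = FALSE
)

# The result has to exist before the loop starts ...
result <- data.frame()
for (row in seq_len(nrow(parameters))) {
  # ... and it is rebuilt by rbind on every iteration.
  result <- rbind(
    result,
    my_function(parameters$a[row], parameters$b[row], parameters$n[row])
  )
}

print(result)
```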
Instead of for-loops you should use apply or one of its derivatives, such as lapply. But functions like lapply work on lists. A data.frame is a list too, but a column-wise one.
So we need to split the data.frame parameters into a list row-wise using split. Then we can apply my_function to each element. Finally we use do.call(rbind, x) to merge the results into one data.frame.
https://gist.github.com/882f0cf309d896345315d550325ec00f
https://gist.github.com/f97e03955b1c14f2e5c10bc3c23cde30
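The three steps (split, apply, rbind) can be sketched like this. I am assuming lapply as the apply variant, and the function and parameters are hypothetical stand-ins with made-up names a, b and n.

```r
# Hypothetical stand-ins; names and values are assumptions for illustration.
my_function <- function(a, b, n) {
  data.frame(a = a, b = b, i = seq_len(n))
}
parameters <- data.frame(
  a = c(1, 2, 3),
  b = c("x", "y", "z"),
  n = c(2, 1, 3),
  stringsAsFactors = FALSE
)

# 1. Split the data.frame row-wise into a list of one-row data.frames.
rows <- split(parameters, seq_len(nrow(parameters)))

# 2. Apply the function to each one-row data.frame.
results <- lapply(rows, function(p) my_function(p$a, p$b, p$n))

# 3. Bind the list of data.frames into one data.frame.
result <- do.call(rbind, results)

print(result)
```

Because a one-row data.frame is itself a named list, step 2 could also be written as do.call(my_function, p), which passes the columns as named arguments.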
That's a lot more R-like. But the winner is:
The most elegant way I know of is purrr's pmap_dfr:
https://gist.github.com/2534bde812e39ea26f1f7f3f0193e5d4
https://gist.github.com/5ce3211a9d0c4f2ec1be8753671d4fae
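As a sketch of how short this gets, again with a hypothetical function and hypothetical parameters (the names a, b and n are my assumptions): pmap_dfr walks over the rows of the data.frame, calls the function once per row with the columns as named arguments, and row-binds the results. It needs the dplyr package installed for the binding step.

```r
library(purrr)

# Hypothetical stand-ins; names and values are assumptions for illustration.
my_function <- function(a, b, n) {
  data.frame(a = a, b = b, i = seq_len(n))
}
parameters <- data.frame(
  a = c(1, 2, 3),
  b = c("x", "y", "z"),
  n = c(2, 1, 3),
  stringsAsFactors = FALSE
)

# One call per row; the results are row-bound into one data.frame.
result <- pmap_dfr(parameters, my_function)

print(result)
```

In purrr 1.0.0 and later, pmap_dfr is marked as superseded. It still works, but pmap() followed by list_rbind() is the suggested replacement.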
pmap_dfr matches the column names of the data.frame to the parameter names of the function. So you can mix the order of the columns in the parameter-data.frame:
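Here is a small sketch of that behaviour with hypothetical stand-ins (the names a, b and n are my assumptions). The columns are reordered before the call, and because the arguments are passed by name, the function still receives the right values.

```r
library(purrr)

# Hypothetical stand-ins; names and values are assumptions for illustration.
my_function <- function(a, b, n) {
  data.frame(a = a, b = b, i = seq_len(n))
}
parameters <- data.frame(
  a = c(1, 2, 3),
  b = c("x", "y", "z"),
  n = c(2, 1, 3),
  stringsAsFactors = FALSE
)

# Same columns, different order.
shuffled <- parameters[, c("n", "a", "b")]

result <- pmap_dfr(parameters, my_function)
result_shuffled <- pmap_dfr(shuffled, my_function)

# Compare the two results column by column.
print(all.equal(result, result_shuffled))
```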