Great post! https://github.com/2DegreesInvesting/resources/issues/283
The writing is super clear. These are the only tiny details that caught my eye:
- The pseudo code uses
>
but likley you meant%>%
.
# What I think you mean
first() %>% second() %>% third()
# What I see
first() > second() > third()
-
You introduce "a couple of attempts" but present three -- not two.
-
Advertising the reprex package is nice but an overkill for every chunk.
# Or reprex(advertise = FALSE)
reprex::reprex_r("Thanks anyway.")
yields:
"Thanks anyway."
#> [1] "Thanks anyway."
- The pipe you're using comes from the magrittr package. Althought dplyr reeports the pipe, I think it's best to point readers to magrittr so they are more likely to discover related information (see below a link to an article about sequential piping). Also, maggrittr doesn't have any startup message so no need to suppress anything and your example becomes cleaner.
I'd like to explore some interesting and potentially unexpected behavior observed when piping functions in R.
As an exploration of your thoughts, I think the post is great as is. Writting is a great tool for clarifying your thinking and thus learning. While I encourage this selfish goal, I think you could make a greater contribution to the reader by expanding the goal of this article to give a bit of historical context and to teach and show a general design principle. If you do this then I think this is worth a post on our blog.
The behaviour you report is a design choice that considered tradeoffs. I wouldn't reiterate what's been said elsewhere but I would acknowledge and cite it.
I think it's important to cite Lionel Henry's post on sequential piping. That reference alone will help the reader understand the current behaviour you report wasn't always the case.
# devtools::install_version("magrittr", "1.5")
packageVersion("magrittr")
#> [1] '1.5'
suppressPackageStartupMessages(library(dplyr))
first <- function() {
print("First")
return()
}
second <- function(x) {
print("Second")
return(x)
}
third <- function(x) {
print("Third")
return(x)
}
first() %>%
second() %>%
third()
#> [1] "First"
#> [1] "Second"
#> [1] "Third"
#> NULL
Beyond the bahaviour you report, some motivated readers may thank a pointers to:
Appart from the history, readers might want to learn some principle they can use
to build pipable functions. Your solution doesn't do that. It works if you call
first()
first and second()
second -- but it doesn't generalize to calling
second()
first and first()
second. This is confusing so let me show you what
I mean:
suppressPackageStartupMessages(library(dplyr))
first <- function() {
print("First")
}
second <- function(x) {
x # forces evaluation of the argument
print("Second")
}
first() %>% second()
#> [1] "First"
#> [1] "Second"
# Fails
second() %>% first()
#> Error in first(.): unused argument (.)
The advice I'd like to read is:
If a function is called primarily for its side-effects, it should invisibly return a useful output. If there’s no obvious output, return the first argument. This makes it possible to use the function with in a pipeline. -- https://design.tidyverse.org/out-invisible.html
This still needs a bit of thought to make it work with your example -- because your example does not really need a first argument. What I would do is something like this:
library(magrittr)
# @param x Don't use it. It's only to support piping.
#
# @return Called for it's side effect. Returns fist argument invisibly
# (https://design.tidyverse.org/out-invisible.html).
first <- function(x = NULL) {
force(x)
print("First")
invisible(x)
}
second <- function(x = NULL) {
force(x)
print("Second")
invisible(x)
}
first() %>% second()
#> [1] "First"
#> [1] "Second"
second() %>% first()
#> [1] "Second"
#> [1] "First"