R is a functional language, which means that your code often contains a lot of parentheses, ( and ).
Complex code often means nesting those parentheses, which makes the code hard to read and understand.
Traditional (nested) style
hourly_delay <- filter(
  summarise(
    group_by(
      filter(
        flights,
        !is.na(dep_delay)
      ),
      date, hour
    ),
    delay = mean(dep_delay),
    n = n()
  ),
  n > 10
)
Pipeline style
hourly_delay <- flights %>%
  filter(!is.na(dep_delay)) %>%
  group_by(date, hour) %>%
  summarise(
    delay = mean(dep_delay),
    n = n()
  ) %>%
  filter(n > 10)
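The `flights` data is not included here, so the following is only a sketch of the same comparison on the built-in `mtcars` dataset. It assumes the {dplyr} package is installed; the column choices (`mpg`, `cyl`) and the names `nested` and `piped` are illustrative stand-ins, not part of the original example.

```r
library(dplyr)  # attaching dplyr also makes %>% available

# Nested style: read from the innermost call outwards.
nested <- filter(
  summarise(
    group_by(filter(mtcars, !is.na(mpg)), cyl),
    avg_mpg = mean(mpg),
    n = n()
  ),
  n > 10
)

# Pipeline style: read top to bottom, one step per line.
piped <- mtcars %>%
  filter(!is.na(mpg)) %>%
  group_by(cyl) %>%
  summarise(avg_mpg = mean(mpg), n = n()) %>%
  filter(n > 10)

# Both forms compute the same summary table.
isTRUE(all.equal(nested, piped))
```

Both versions run the same four steps; only the order in which you read them changes.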
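When to use it: when you need to apply many complex, tedious operations to your data, including plotting and data cleaning.
- Reduces development time and makes code easier to read and maintain.
- Makes R code more concise. Because R is a functional language, it easily ends up with too many nested function calls.
- Works like the Unix pipe: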
ps -aux | grep r
Main package: {magrittr}
Main operators: %>%, %<>%, and .
%>%
: The rules are simple: the object on the left-hand side is passed as the first argument to the function on the right-hand side.
%<>%
: Assigns the result of the pipeline back to the variable on the left-hand side (useful when you want to keep the same variable name).
.
: It's even possible to pass data to something other than the first argument of the function by using a . (dot) placeholder to mark the place where the object goes.
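As a minimal sketch of these two rules, assuming {magrittr} is installed (the variable names and the use of the built-in `mtcars` data are illustrative only):

```r
library(magrittr)

x <- c(1, 4, 9)

# The left-hand side becomes the first argument: same as sqrt(x).
roots <- x %>% sqrt()

# Extra arguments follow the piped value: same as round(3.14159, 2).
rounded <- 3.14159 %>% round(2)

# The dot marks where the value goes when it is not the first argument:
# same as paste("hello", "world").
greeting <- "world" %>% paste("hello", .)

# Same as lm(mpg ~ wt, data = mtcars).
fit <- mtcars %>% lm(mpg ~ wt, data = .)
```

When the dot appears as a direct argument of the call, the piped value is not also inserted as the first argument.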
car_data %>%
  (function(x) {
    if (nrow(x) > 2)
      rbind(head(x, 1), tail(x, 1))
    else x
  })
A shorthand notation:
car_data %>%
  {
    if (nrow(.) > 0)
      rbind(head(., 1), tail(., 1))
    else .
  }
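`car_data` is not defined in this section. To try the two forms, one option is to substitute any data frame; the sketch below assumes the first five rows of the built-in `mtcars` as a hypothetical stand-in and uses the same condition in both forms so the results can be compared.

```r
library(magrittr)

# Hypothetical stand-in for car_data; any data frame would do.
car_data <- head(mtcars, 5)

# Anonymous function: the parentheses around it are required.
via_lambda <- car_data %>%
  (function(x) {
    if (nrow(x) > 2) rbind(head(x, 1), tail(x, 1)) else x
  })

# Braces shorthand: the dot stands for the piped value.
via_braces <- car_data %>%
  {
    if (nrow(.) > 2) rbind(head(., 1), tail(., 1)) else .
  }

# Both keep only the first and last rows.
identical(via_lambda, via_braces)
```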
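%T>%
: tee operator, for side effects (printing, plotting, logging, etc.). Useful in the middle of a long chain when a step should run once for its side effect while the original value continues down the pipeline. (It took coming to the UK and sparring with experts there before I really appreciated this feature.)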
rnorm(200) %>%
  matrix(ncol = 2) %T>%
  plot %>% # plot usually does not return anything.
  colSums
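A small deterministic sketch of the same idea, assuming {magrittr} is installed. Here `str()` plays the role of the side-effect step because it prints a summary and returns `NULL`; the names `m` and `totals` are illustrative.

```r
library(magrittr)

m <- matrix(1:6, ncol = 2)

# With a plain %>%, colSums() would receive the NULL returned by str().
# %T>% runs str() for its side effect and forwards m itself.
totals <- m %T>% str() %>% colSums()

# totals holds the column sums of m.
totals
```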
%$%
: exposition operator. This operator is handy when functions do not themselves have a data argument, as for example lm and aggregate do. Use it to hand a whole data.frame to the function that follows, which can then refer to its columns by name.
iris %>%
  subset(Sepal.Length > mean(Sepal.Length)) %$%
  cor(Sepal.Length, Sepal.Width)

data.frame(z = rnorm(100)) %$%
  ts.plot(z)
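One way to read `%$%` is as a pipeable counterpart to base R's `with()`. The sketch below, which assumes {magrittr} is installed and uses the built-in `mtcars` data for illustration, computes the same correlation three ways.

```r
library(magrittr)

# cor() has no data argument, so column names must be made visible to it.
r_exposed <- mtcars %$% cor(mpg, wt)        # exposition operator
r_with    <- with(mtcars, cor(mpg, wt))     # base R equivalent
r_dollar  <- cor(mtcars$mpg, mtcars$wt)     # explicit column access

# All three give the same value.
c(r_exposed, r_with, r_dollar)
```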
%<>%
: compound assignment operator. %<>% can be used whenever expr <- ... makes sense, e.g.
x %<>% foo %>% bar
x[1:10] %<>% foo %>% bar
x$baz %<>% foo %>% bar
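`foo` and `bar` above are placeholders. A runnable sketch of the three assignment targets, assuming {magrittr} is installed and using `sqrt()` and `round()` as stand-in functions:

```r
library(magrittr)

# Whole variable: same as x <- x %>% sqrt()
x <- c(9, 16, 25)
x %<>% sqrt()

# Subset of a vector: same as y[1:2] <- y[1:2] %>% sqrt()
y <- c(1, 4, 9, 16)
y[1:2] %<>% sqrt()

# Column of a data frame: same as df$baz <- df$baz %>% round(1)
df <- data.frame(baz = c(1.234, 5.678))
df$baz %<>% round(1)
```

Only the first operator in the chain needs to be `%<>%`; any further steps use the ordinary `%>%`.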
Arithmetic operators can also be used inside a pipeline, in a more compact form, by quoting them with backticks:
rnorm(100) %>% `*`(5) %>% `+`(5) %>%
  {
    cat("Mean:", mean(.), "Variance:", var(.), "\n")
    head(.)
  }
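For comparison, {magrittr} also exports named aliases for the arithmetic operators, such as `multiply_by()` and `add()`. The sketch below uses a fixed input (the name `x` is illustrative) to check that the backtick form and the alias form agree.

```r
library(magrittr)

x <- c(1, 2, 3)

# Backtick-quoted operators are called like ordinary functions.
with_backticks <- x %>% `*`(5) %>% `+`(5)

# The same steps using magrittr's named aliases.
with_aliases <- x %>% multiply_by(5) %>% add(5)

# Both are equivalent to x * 5 + 5.
identical(with_backticks, with_aliases)
```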