@whizzalan
Last active June 27, 2017 07:35

Pipeline Operator in R

Concepts of Pipeline

R is a functional language, which means that your code often contains a lot of (parentheses). [1]

Complex code often means nesting those parentheses inside one another, which makes the code hard to read and understand.

Traditional (nested) style

hourly_delay <- filter( 
  summarise(
    group_by( 
      filter(
        flights, 
        !is.na(dep_delay)
      ), 
      date, hour
    ), 
    delay = mean(dep_delay), 
    n = n()
  ), 
  n > 10 
) 

Pipeline style

hourly_delay <- flights %>% 
 filter(!is.na(dep_delay)) %>% 
 group_by(date, hour) %>% 
 summarise( 
   delay = mean(dep_delay), 
   n = n() ) %>% 
 filter(n > 10)
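
To check that the two styles really are interchangeable, here is a minimal sketch that needs only {magrittr} and the built-in mtcars data. Using mtcars and base functions is an assumption made so that the snippet runs without the flights data or {dplyr}.

```r
library(magrittr)

# Nested style: has to be read from the inside out.
nested <- head(sort(unique(mtcars$cyl), decreasing = TRUE), 2)

# Pipeline style: the same steps, read from top to bottom.
piped <- mtcars$cyl %>%
  unique() %>%
  sort(decreasing = TRUE) %>%
  head(2)

# Both forms compute the same value.
identical(nested, piped)
```

The piped version lists the steps in the order they are applied, which is the readability gain the comparison above illustrates.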

Features

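When to use it: whenever you need to apply many complex, tedious operations to your data, including plotting and data wrangling.

  1. Reduces development time and makes code easier to read and maintain.
  2. Makes R code more concise and expressive. Because R is a functional language, it tends to accumulate deeply nested function calls.
  3. Works like a Unix pipe, e.g. ps -aux | grep r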

Main package: {magrittr}. Main operators: %>%, %<>%, and the . placeholder.

  1. %>%: the pipe operator. The rule is simple: the object on the left-hand side is passed as the first argument to the function on the right-hand side.
  2. %<>%: pipes the left-hand variable through the chain and assigns the result back to that variable (useful when you want to keep the same variable name).
  3. . : the dot placeholder. To pass the data to an argument other than the first, use . to mark the place where the object goes.
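
The three rules above can be sketched in a few lines. The vectors and variable names here are made up for illustration.

```r
library(magrittr)

# %>% passes the left-hand side as the first argument.
x <- c(1, 4, 9)
total <- x %>% sqrt %>% sum          # same as sum(sqrt(x))

# The dot marks where the left-hand side should go instead.
steps <- 2 %>% seq(1, 10, by = .)    # same as seq(1, 10, by = 2)

# %<>% pipes and then assigns the result back to the variable.
y <- c(3, 1, 2)
y %<>% sort                          # same as y <- sort(y)
```

When the dot appears as a direct argument of the call, magrittr does not also insert the left-hand side as the first argument, which is why the seq() call receives 2 only as by.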

Anonymous functions, or lambdas

An anonymous function can be used as a pipeline step when it is wrapped in parentheses. [2]

car_data %>%
(function(x) {
  if (nrow(x) > 2) 
    rbind(head(x, 1), tail(x, 1))
  else x
})

A shorthand notation: a block in braces is evaluated with . bound to the left-hand side.

car_data %>%
{ 
  if (nrow(.) > 2)
    rbind(head(., 1), tail(., 1))
  else .
}
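
car_data is not defined in this gist, so the sketch below assumes a small stand-in built from mtcars and checks that the lambda form and the brace form give the same result.

```r
library(magrittr)

# Stand-in for car_data (assumption: any data frame would do here).
car_data <- mtcars[1:5, c("mpg", "cyl")]

# Parenthesised anonymous function: the data arrives as x.
first_last_fn <- car_data %>%
  (function(x) {
    if (nrow(x) > 2) rbind(head(x, 1), tail(x, 1)) else x
  })

# Brace shorthand: the data is available as the dot.
first_last_dot <- car_data %>%
  {
    if (nrow(.) > 2) rbind(head(., 1), tail(., 1)) else .
  }

# Both keep only the first and last rows of the 5-row input.
identical(first_last_fn, first_last_dot)
```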

Additional pipe operators

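  1. %T>%: tee operator. It returns the left-hand side value instead of the result of the right-hand side, which suits steps that are called for their side effects (printing, plotting, logging, etc.). It is handy in the middle of a long chain when you want a one-off step without breaking the chain. (I only came to appreciate this feature after coming to the UK and comparing notes with experts there.)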
rnorm(200) %>%
matrix(ncol = 2) %T>%
plot %>% # plot usually does not return anything. 
colSums
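
A minimal sketch of the tee operator that avoids a graphics device: log_dim() is a hypothetical helper invented for this example, and it returns NULL just as a typical side-effect function would.

```r
library(magrittr)

# Hypothetical logging helper: prints the dimensions and returns NULL.
log_dim <- function(m) cat("dim:", dim(m), "\n")

col_totals <- matrix(1:6, ncol = 2) %T>%
  log_dim() %>%   # side effect only; %T>% passes the matrix on, not NULL
  colSums()
```

With a plain %>% in place of %T>%, colSums() would receive the NULL returned by log_dim() and fail.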
  2. %$%: exposition operator. This operator is handy when functions do not themselves have a data argument, as for example lm and aggregate do. It exposes the columns of the left-hand data.frame by name to the function that follows.
iris %>%
  subset(Sepal.Length > mean(Sepal.Length)) %$%
  cor(Sepal.Length, Sepal.Width)

data.frame(z = rnorm(100)) %$% 
  ts.plot(z)
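
As a deterministic sketch of the same idea, the data frame below is made up so that the result is known in advance, and the base R with() call is shown only for comparison.

```r
library(magrittr)

# Made-up data: y is exactly 2 * x, so the two columns are perfectly correlated.
df <- data.frame(x = 1:10, y = 2 * (1:10))

# cor() has no data argument; %$% exposes the columns x and y by name.
r_pipe <- df %$% cor(x, y)

# Base R equivalent.
r_with <- with(df, cor(x, y))
```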
  3. %<>%: compound assignment operator. It pipes the left-hand side through the chain and assigns the result back to it.

The %<>% can be used whenever expr <- ... makes sense, e.g.

x %<>% foo %>% bar
x[1:10] %<>% foo %>% bar
x$baz %<>% foo %>% bar
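
Here foo and bar are placeholders. A concrete sketch of the same three assignment targets, with made-up values and single-step chains to keep it short:

```r
library(magrittr)

# Whole variable: same as x <- x %>% sqrt
x <- c(9, 16, 25)
x %<>% sqrt          # x is now 3 4 5

# Subset: only the first two elements are replaced.
x[1:2] %<>% rev      # x is now 4 3 5

# List element: same as lst$baz <- lst$baz %>% sum
lst <- list(baz = c(1, 2, 3))
lst$baz %<>% sum     # lst$baz is now 6
```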

Aliases

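{magrittr} provides aliases such as multiply_by() and add() for common operators. The same steps can be written in a more compact form by calling the backquoted operators directly: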

rnorm(100) %>% `*`(5) %>% `+`(5) %>% 
{
  cat("Mean:", mean(.), "Variance:", var(.),  "\n")
  head(.)
}
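
To compare the two spellings side by side, this sketch swaps the random input for a fixed vector (an assumption made so the result is reproducible).

```r
library(magrittr)

v <- c(1, 2, 3)

# Named aliases exported by magrittr.
with_aliases <- v %>% multiply_by(5) %>% add(5)

# Backquoted operators called as ordinary functions.
with_backticks <- v %>% `*`(5) %>% `+`(5)

# Both compute v * 5 + 5.
identical(with_aliases, with_backticks)
```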

The package name is a nod to the painter René Magritte. [3]

tags: R,pipeline

Footnotes

  1. revolutionR blog

  2. magrittr

  3. magritte
