@nsabharwal
Last active March 6, 2017 01:44
# Install R (yum, from a shell), then the following packages (install.packages lines run inside an R session)
#repr failed to create
yum install R-*
install.packages("evaluate", dependencies = TRUE)
install.packages("base64enc", dependencies = TRUE)
install.packages("devtools", dependencies = TRUE)
devtools::install_github('IRkernel/repr')  # install_github is provided by devtools
install.packages("dplyr", dependencies = TRUE)
install.packages("caret", dependencies = TRUE)
install.packages("repr", dependencies = TRUE)
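# A minimal sketch for checking that the installs above succeeded, assuming it is
# run in the same R installation that Zeppelin will use. The variable names
# (required_pkgs, missing_pkgs) are illustrative and not part of the original setup.

```r
# Packages installed above, plus the ones the notebook loads with library()
required_pkgs <- c("evaluate", "base64enc", "devtools", "repr",
                   "dplyr", "caret", "magrittr", "ggplot2")

# requireNamespace() returns TRUE when a package can be loaded, FALSE otherwise
is_available <- vapply(required_pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- required_pkgs[!is_available]

if (length(missing_pkgs) > 0) {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All required packages are available")
}
```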
# Set up Zeppelin
git clone https://github.com/elbamos/Zeppelin-With-R
cd Zeppelin-With-R
mvn clean package -DskipTests
#Notebook
%spark.r
library(magrittr)
library(dplyr)
library(ggplot2)
library(caret)
# flights.csv is on the local filesystem
%spark.r
flights <- read.csv("/tmp/flights.csv")
head(flights)
# Dataset: https://github.com/SparkIQ-Labs/Demos/tree/master/datasets/nyc-flights-dataset
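# The paragraphs in this notebook reference the columns dest, arr_delay, cancelled,
# hour and minute. A small sketch for confirming that a data frame has them before
# running the summaries; missing_columns is a hypothetical helper and the toy data
# frame is invented. In the notebook, pass flights instead of toy.

```r
# Columns referenced by the notebook paragraphs in this gist
needed <- c("dest", "arr_delay", "cancelled", "hour", "minute")

# Hypothetical helper: returns the names in `needed` that `df` lacks
missing_columns <- function(df, needed) setdiff(needed, names(df))

# Invented stand-in for the real data, deliberately lacking two columns
toy <- data.frame(dest = "AAA", arr_delay = 1, hour = 1)

missing_columns(toy, needed)
```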
%spark.r
# Mean arrival delay for each destination airport
flights %>%
  group_by(dest) %>%
  summarise(
    arr_delay = mean(arr_delay, na.rm = TRUE),
    n = n()
  ) %>%
  arrange(desc(arr_delay))
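# For cross-checking the per-destination summary, here is a base R sketch of the
# same computation on a tiny invented data frame. The airport codes and delay
# values are made up for illustration; it can be run in a plain R session or a
# %spark.r paragraph.

```r
# Invented toy data; one arrival delay is missing on purpose
toy <- data.frame(
  dest = c("AAA", "AAA", "BBB", "BBB", "BBB"),
  arr_delay = c(10, NA, 5, 15, 25),
  stringsAsFactors = FALSE
)

# Mean delay per destination, ignoring NA (like mean(..., na.rm = TRUE))
mean_delay <- tapply(toy$arr_delay, toy$dest, mean, na.rm = TRUE)

# Row count per destination, NA rows included (like n())
n_flights <- tapply(toy$arr_delay, toy$dest, length)

summary_tbl <- data.frame(
  dest = names(mean_delay),
  arr_delay = as.vector(mean_delay),
  n = as.vector(n_flights),
  stringsAsFactors = FALSE
)

# Sort by mean delay, largest first (like arrange(desc(arr_delay)))
summary_tbl <- summary_tbl[order(-summary_tbl$arr_delay), ]
summary_tbl
```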
# plot
%spark.knitr
``` {r echo = FALSE}
# Mean arrival delay by time of day (hour + minute / 60), excluding cancelled flights
per_hour <- flights %>%
  filter(cancelled == 0) %>%
  mutate(time = hour + minute / 60) %>%
  group_by(time) %>%
  summarise(
    arr_delay = mean(arr_delay, na.rm = TRUE),
    n = n()
  )
# Plot only the time slots with more than 30 flights
ggplot(filter(per_hour, n > 30), aes(time, arr_delay)) +
  geom_vline(xintercept = 5:24, colour = "white", size = 2) +
  geom_point()
```