Skip to content

Instantly share code, notes, and snippets.

@anirudhjayaraman
Created December 19, 2015 09:34
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save anirudhjayaraman/2555eac5d30ba45f2207 to your computer and use it in GitHub Desktop.
Save anirudhjayaraman/2555eac5d30ba45f2207 to your computer and use it in GitHub Desktop.
group_by illustrative examples
# group_by() -------------------------------------------------------------------------
# Generate a per-carrier summary of hflights with the following variables: n_flights,
# the number of flights flown by the carrier; n_canc, the number of cancelled flights;
# p_canc, the percentage of cancelled flights; avg_delay, the average arrival delay of
# flights whose delay does not equal NA. Next, order the carriers in the summary from
# low to high by their average arrival delay. Use percentage of flights cancelled to
# break any ties. Which airline scores best based on these statistics?
hflights %>%
group_by(UniqueCarrier) %>%
summarise(n_flights = n(), n_canc = sum(Cancelled), p_canc = 100*n_canc/n_flights,
avg_delay = mean(ArrDelay, na.rm = TRUE)) %>% arrange(avg_delay)
# Generate a per-day-of-week summary of hflights with the variable avg_taxi,
# the average total taxiing time. Pipe this summary into an arrange() call such
# that the day with the highest avg_taxi comes first.
hflights %>%
group_by(DayOfWeek) %>%
summarize(avg_taxi = mean(TaxiIn + TaxiOut, na.rm = TRUE)) %>%
arrange(desc(avg_taxi))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment