# Gist by @anirudhjayaraman, created December 22, 2015 12:56
# Combine group_by with mutate-----
# First, discard flights whose arrival delay is NA. Next, create a by-carrier
# summary with a single variable: p_delay, the proportion of flights that are
# delayed at arrival (ArrDelay > 0). Then add a new variable, rank, to the summary
# that ranks carriers by p_delay. Finally, arrange the observations by this rank.
library(dplyr)
library(hflights)

hflights %>%
  filter(!is.na(ArrDelay)) %>%
  group_by(UniqueCarrier) %>%
  summarise(p_delay = sum(ArrDelay > 0) / n()) %>%
  mutate(rank = rank(p_delay)) %>%
  arrange(rank)
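# A minimal sketch of the same pipeline, run on a small hypothetical data frame
# (toy, with made-up carriers and delays) instead of hflights. It also shows that
# mean(ArrDelay > 0) is equivalent to sum(ArrDelay > 0) / n(), because the mean
# of a logical vector is the proportion of TRUE values.

```r
library(dplyr)

# Hypothetical toy data standing in for hflights: two carriers, a few arrival
# delays in minutes, and one NA that filter() should discard.
toy <- data.frame(
  UniqueCarrier = c("AA", "AA", "AA", "WN", "WN", "WN", "WN"),
  ArrDelay      = c(10,   -5,   NA,   3,    4,    -2,   7),
  stringsAsFactors = FALSE
)

p_delay_summary <- toy %>%
  filter(!is.na(ArrDelay)) %>%
  group_by(UniqueCarrier) %>%
  # mean() of a logical vector = share of TRUEs = sum(ArrDelay > 0) / n()
  summarise(p_delay = mean(ArrDelay > 0)) %>%
  mutate(rank = rank(p_delay)) %>%
  arrange(rank)

print(p_delay_summary)
```

# AA has one delayed flight out of two non-NA flights (0.5) and WN has three out
# of four (0.75), so AA ranks first.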
# In a similar fashion, keep only flights that are delayed (ArrDelay > 0 and not NA).
# Next, create a by-carrier summary with a single variable: avg, the average delay
# of the delayed flights. Again, add a new variable, rank, to the summary according
# to avg. Finally, arrange by this rank variable.
hflights %>%
  filter(!is.na(ArrDelay), ArrDelay > 0) %>%
  group_by(UniqueCarrier) %>%
  summarise(avg = mean(ArrDelay)) %>%
  mutate(rank = rank(avg)) %>%
  arrange(rank)
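# A sketch on hypothetical toy data showing that the explicit !is.na(ArrDelay)
# condition is optional here. filter() keeps only rows where every condition is
# TRUE, so rows where ArrDelay > 0 evaluates to NA are dropped anyway.

```r
library(dplyr)

# Hypothetical toy data; the NA row must not end up in any average.
toy <- data.frame(
  UniqueCarrier = c("AA", "AA", "AA", "WN", "WN", "WN", "WN"),
  ArrDelay      = c(10,   -5,   NA,   3,    4,    -2,   8),
  stringsAsFactors = FALSE
)

# Version with the explicit NA check, as in the exercise.
with_na_check <- toy %>%
  filter(!is.na(ArrDelay), ArrDelay > 0) %>%
  group_by(UniqueCarrier) %>%
  summarise(avg = mean(ArrDelay)) %>%
  mutate(rank = rank(avg)) %>%
  arrange(rank)

# Version relying on filter() dropping NA conditions.
without_na_check <- toy %>%
  filter(ArrDelay > 0) %>%
  group_by(UniqueCarrier) %>%
  summarise(avg = mean(ArrDelay)) %>%
  mutate(rank = rank(avg)) %>%
  arrange(rank)

print(without_na_check)
```

# Both versions give WN (avg of 3, 4, 8 = 5) ahead of AA (avg 10). Keeping the
# explicit check still documents intent.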
# Advanced group_by exercises-------------------------------------------------------
# Which plane (by tail number) flew out of Houston the most times? How many times?
# Name the column with this frequency n. Assign the result to adv1. To answer this
# question precisely, you will have to filter() as a final step to end up with only
# a single observation in adv1.
adv1 <- hflights %>%
  group_by(TailNum) %>%
  summarise(n = n()) %>%
  filter(n == max(n))
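# A sketch on hypothetical tail numbers comparing the adv1 pipeline with dplyr's
# count(), which is shorthand for group_by() + summarise(n = n()). Note that
# filter(n == max(n)) keeps every plane tied for the maximum, so adv1 has a single
# row only when the top plane is unique.

```r
library(dplyr)

# Hypothetical data: each row is one departure; N1 departs most often.
toy <- data.frame(
  TailNum = c("N1", "N2", "N1", "N3", "N1", "N2"),
  stringsAsFactors = FALSE
)

# Long form, as in adv1.
adv1_long <- toy %>%
  group_by(TailNum) %>%
  summarise(n = n()) %>%
  filter(n == max(n))

# Short form using count(), which names the frequency column n.
adv1_short <- toy %>%
  count(TailNum) %>%
  filter(n == max(n))

print(adv1_short)
```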
# How many airplanes only flew to one destination from Houston?
# Save the result in adv2; it should contain a single row and a single column
# named nplanes.
adv2 <- hflights %>%
  group_by(TailNum) %>%
  summarise(n_dest = n_distinct(Dest)) %>%
  filter(n_dest == 1) %>%
  summarise(nplanes = n())
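# A sketch of the adv2 logic on hypothetical data. n_distinct() counts unique
# values, so a plane that flies to the same destination several times still has
# n_dest == 1. The second summarise() then collapses the filtered rows into one
# count.

```r
library(dplyr)

# Hypothetical data: N1 always flies to DFW, N2 visits two destinations,
# N3 flies once to LAX.
toy <- data.frame(
  TailNum = c("N1", "N1", "N2", "N2", "N3"),
  Dest    = c("DFW", "DFW", "DFW", "ATL", "LAX"),
  stringsAsFactors = FALSE
)

adv2_toy <- toy %>%
  group_by(TailNum) %>%
  summarise(n_dest = n_distinct(Dest)) %>%  # unique destinations per plane
  filter(n_dest == 1) %>%                   # planes with a single destination
  summarise(nplanes = n())                  # one row, one column

print(adv2_toy)
```

# N1 and N3 qualify, so nplanes is 2.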
# Find the most visited destination for each carrier and save your solution to adv3.
# Your solution should contain four columns:
# UniqueCarrier and Dest,
# n, how often a carrier visited a particular destination,
# rank, how each destination ranks per carrier. rank should be 1 for every row,
# as you want to find the most visited destination for each carrier.
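adv3 <- hflights %>%
  group_by(UniqueCarrier, Dest) %>%
  # summarise() drops the last grouping level (Dest), so the result stays
  # grouped by UniqueCarrier and rank is computed within each carrier
  summarise(n = n()) %>%
  mutate(rank = rank(desc(n))) %>%
  filter(rank == 1)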
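# A sketch on hypothetical counts showing how ties interact with filter(rank == 1).
# Base R rank() uses ties.method = "average", so two equally frequent top
# destinations both get rank 1.5 and are dropped. dplyr's min_rank() gives tied
# values the smallest rank, so both tied rows are kept. Swapping in min_rank() is
# an alternative, not what the exercise above uses.

```r
library(dplyr)

# Hypothetical per-carrier destination counts: carrier XX visits DFW and ATL
# equally often (a tie), carrier YY has a single clear favourite.
counts <- data.frame(
  UniqueCarrier = c("XX", "XX", "XX", "YY", "YY"),
  Dest          = c("DFW", "ATL", "LAX", "DFW", "ATL"),
  n             = c(5L, 5L, 2L, 7L, 1L),
  stringsAsFactors = FALSE
)

# rank(): the tied pair for XX both get 1.5, so XX disappears from the result.
with_rank <- counts %>%
  group_by(UniqueCarrier) %>%
  mutate(rank = rank(desc(n))) %>%
  filter(rank == 1)

# min_rank(): both tied rows for XX get rank 1 and are kept.
with_min_rank <- counts %>%
  group_by(UniqueCarrier) %>%
  mutate(rank = min_rank(desc(n))) %>%
  filter(rank == 1)

print(with_rank)
print(with_min_rank)
```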
# For each destination, find the carrier that travels to that destination the most.
# Store the result in adv4. Again, your solution should contain 4 columns:
# Dest, UniqueCarrier, n and rank.
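adv4 <- hflights %>%
  group_by(Dest, UniqueCarrier) %>%
  # summarise() drops the last grouping level (UniqueCarrier), so the result
  # stays grouped by Dest and rank is computed within each destination
  summarise(n = n()) %>%
  mutate(rank = rank(desc(n))) %>%
  filter(rank == 1)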
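# A sketch of an alternative formulation on hypothetical data, assuming
# dplyr >= 1.0.0 (where slice_max() was introduced). count() produces the
# per-(Dest, UniqueCarrier) frequencies, and slice_max() keeps the top row(s)
# per destination. Ties are kept by default (with_ties = TRUE). Unlike adv4,
# this result has no rank column.

```r
library(dplyr)

# Hypothetical flights: AA flies to DFW most, WN flies to ATL most.
toy <- data.frame(
  Dest          = c("DFW", "DFW", "DFW", "ATL", "ATL", "ATL"),
  UniqueCarrier = c("AA",  "AA",  "WN",  "DL",  "WN",  "WN"),
  stringsAsFactors = FALSE
)

adv4_toy <- toy %>%
  count(Dest, UniqueCarrier) %>%  # frequency column n, ungrouped result
  group_by(Dest) %>%              # pick the top carrier within each destination
  slice_max(n, n = 1) %>%         # order by column n, keep 1 row (plus ties)
  ungroup()

print(adv4_toy)
```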