@yabyzq
Created August 17, 2017 13:41
Data.table
library(data.table)
#download data
flights <- fread("https://raw.githubusercontent.com/wiki/arunsrinivasan/flights/NYCflights14/flights14.csv")
flights
# Subset rows
flights[origin == "JFK" & month == 6L] # rows matching a condition on column values
flights[1:2] # first two rows, by row number
# Sort by origin (ascending), then by dest (descending)
flights[order(origin, -dest)]
#return column as vector
flights[, arr_delay]
#return column as data.table
flights[, .(arrival_delay = arr_delay)]
flights[, "arr_delay"]
# Compute in j
flights[, .(m_arr_mean = mean(arr_delay))] # mean arrival delay, returned as a data.table
flights[, sum((arr_delay + dep_delay) < 0)] # number of flights whose total delay is negative
flights[, .N] # number of rows
#grouping
flights[, .N, by = origin]
flights[, .N, .(dep_delay>0, arr_delay>0)]
flights[carrier == "AA", .(mean(arr_delay), mean(dep_delay)), keyby = .(origin, dest, month)]
flights[carrier == "AA", lapply(.SD, mean), by = .(origin, dest, month), # .SD holds each group's rows; without .SDcols it covers every non-grouping column
        .SDcols = c("arr_delay", "dep_delay")] # restrict .SD to these columns
flights[, head(.SD, 2), by = month] # first 2 rows of each group
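# The grouping calls above can be tried offline on a small table. This is a
# minimal sketch: `toy` and every value in it are made up for illustration and
# are not taken from flights14.csv.

```r
library(data.table)

# Toy stand-in for the flights table; all values are hypothetical.
toy <- data.table(
  origin    = c("JFK", "JFK", "LGA", "LGA", "LGA"),
  month     = c(1L, 1L, 1L, 2L, 2L),
  arr_delay = c(10, -4, 3, 7, -1),
  dep_delay = c(5, -2, 0, 9, 1)
)

# .N counts the rows in each group.
counts <- toy[, .N, by = origin]

# .SD is the per-group subset of the table; .SDcols limits it to the named columns.
means <- toy[, lapply(.SD, mean), by = origin,
             .SDcols = c("arr_delay", "dep_delay")]

# head(.SD, 2) keeps the first two rows (not columns) of every group.
first_two <- toy[, head(.SD, 2), by = origin]

print(counts)
print(means)
print(first_two)
```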