Skip to content

Instantly share code, notes, and snippets.

@aswalin
Last active March 12, 2019 02:54
Show Gist options
  • Save aswalin/295c57f10cc3b259e760929354dfad3d to your computer and use it in GitHub Desktop.
Save aswalin/295c57f10cc3b259e760929354dfad3d to your computer and use it in GitHub Desktop.
import pandas as pd, numpy as np, time
from sklearn.model_selection import train_test_split
data = pd.read_csv("flights.csv")
data = data.sample(frac = 0.1, random_state=10)
data = data[["MONTH","DAY","DAY_OF_WEEK","AIRLINE","FLIGHT_NUMBER","DESTINATION_AIRPORT",
"ORIGIN_AIRPORT","AIR_TIME", "DEPARTURE_TIME","DISTANCE","ARRIVAL_DELAY"]]
data.dropna(inplace=True)
data["ARRIVAL_DELAY"] = (data["ARRIVAL_DELAY"]>10)*1
cols = ["AIRLINE","FLIGHT_NUMBER","DESTINATION_AIRPORT","ORIGIN_AIRPORT"]
for item in cols:
data[item] = data[item].astype("category").cat.codes +1
train, test, y_train, y_test = train_test_split(data.drop(["ARRIVAL_DELAY"], axis=1), data["ARRIVAL_DELAY"],
random_state=10, test_size=0.25)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment