Skip to content

Instantly share code, notes, and snippets.

@rikturr
Created July 21, 2020 14:30
Show Gist options
  • Save rikturr/1635e22721bcb6d1bb5805c9b0298303 to your computer and use it in GitHub Desktop.
Save rikturr/1635e22721bcb6d1bb5805c9b0298303 to your computer and use it in GitHub Desktop.
pandas create features
taxi['pickup_weekday'] = taxi.tpep_pickup_datetime.dt.weekday
taxi['pickup_weekofyear'] = taxi.tpep_pickup_datetime.dt.weekofyear
taxi['pickup_hour'] = taxi.tpep_pickup_datetime.dt.hour
taxi['pickup_minute'] = taxi.tpep_pickup_datetime.dt.minute
taxi['pickup_year_seconds'] = (taxi.tpep_pickup_datetime - datetime.datetime(2019, 1, 1, 0, 0, 0)).dt.seconds
taxi['pickup_week_hour'] = (taxi.pickup_weekday * 24) + taxi.pickup_hour
taxi['passenger_count'] = taxi.passenger_count.astype(float).fillna(-1)
taxi = taxi.fillna(value={'VendorID': 'missing', 'RatecodeID': 'missing', 'store_and_fwd_flag': 'missing' })
# keep track of column names for pipeline steps
numeric_feat = ['pickup_weekday', 'pickup_weekofyear', 'pickup_hour', 'pickup_minute', 'pickup_year_seconds', 'pickup_week_hour', 'passenger_count']
categorical_feat = ['VendorID', 'RatecodeID', 'store_and_fwd_flag', 'PULocationID', 'DOLocationID']
features = numeric_feat + categorical_feat
y_col = 'total_amount'
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment