Skip to content

Instantly share code, notes, and snippets.

@unutbu
Created November 29, 2018 01:53
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save unutbu/efe6000b1eae4b51c6f9035203772f5e to your computer and use it in GitHub Desktop.
Save unutbu/efe6000b1eae4b51c6f9035203772f5e to your computer and use it in GitHub Desktop.
import pandas as pd
df = pd.read_excel('2016_Bike_Share_Toronto_Ridership_Q4.xlsx')
for col in ['trip_start_time', 'trip_stop_time']:
df[col] = pd.to_datetime(df[col])
for col in ['trip_start_time', 'trip_stop_time']:
swapped = pd.to_datetime({'year':df[col].dt.year,
'month':df[col].dt.day,
'day':df[col].dt.month,
'hour':df[col].dt.hour,
'minute':df[col].dt.minute,
'second':df[col].dt.second,}, errors='coerce')
swapped = swapped.dropna()
mask = swapped.index
df.loc[mask, col] = swapped
# check that now all dates are in 2016Q4
for col in ['trip_start_time', 'trip_stop_time']:
mask = (pd.PeriodIndex(df[col], freq='Q') == '2016Q4')
assert mask.all()
# check that `trip_start_times` are in chronological order
assert (df['trip_start_time'].diff().dropna() >= pd.Timedelta(0)).all()
# check that `trip_stop_times` are always greater than `trip_start_times`
assert ((df['trip_stop_time']-df['trip_start_time']).dropna() >= pd.Timedelta(0)).all()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment