@dottyz
Created May 2, 2019 18:29
import os
from datetime import timedelta

import pandas as pd

# Identify the date structure used by each of the files as a dict:
# * Key: data file name
# * Value: [datetime format, hours to add to convert the file's timestamps to the Eastern timezone]
date_formats = {
    'Bikeshare Ridership (2017 Q1).csv': ['%d/%m/%Y %H:%M', -4],
    'Bikeshare Ridership (2017 Q2).csv': ['%d/%m/%Y %H:%M', -4],
    'Bikeshare Ridership (2017 Q3).csv': ['%m/%d/%Y %H:%M', 0],
    'Bikeshare Ridership (2017 Q4).csv': ['%m/%d/%y %H:%M:%S', 0],
}
df = pd.DataFrame()  # Initialize an empty DataFrame
for fn, fmt in date_formats.items():
    tmp = pd.read_csv(os.path.join('./data', fn))
    # Parse the datetime in the specified format; values that do not match become NaT
    tmp['trip_start_time'] = pd.to_datetime(tmp['trip_start_time'], format=fmt[0], errors='coerce')
    # Convert the input time to the Eastern timezone
    tmp['trip_start_time'] = tmp['trip_start_time'] + timedelta(hours=fmt[1])
    # Append this file's rows to the combined DataFrame
    df = pd.concat([df, tmp], sort=False).reset_index(drop=True)
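
The fixed -4 hour offset above matches daylight time only; if the Q1 and Q2 timestamps are in UTC (an assumption suggested by the offset, not stated in the source), winter trips would need -5 instead. A minimal sketch of a DST-aware alternative follows. The helper name `to_eastern`, the `source_tz` default, and the use of `America/New_York` as the Eastern zone are all illustrative assumptions, not part of the original snippet.

```python
import pandas as pd


def to_eastern(naive, source_tz="UTC", target_tz="America/New_York"):
    """Treat naive timestamps as source_tz and return naive Eastern wall-clock times.

    Hypothetical helper: source_tz and target_tz are assumptions about the data.
    """
    localized = naive.dt.tz_localize(source_tz)    # attach the assumed source zone
    eastern = localized.dt.tz_convert(target_tz)   # DST-aware conversion
    return eastern.dt.tz_localize(None)            # drop tz info, keep local wall time


# One winter and one summer timestamp in the Q1/Q2 day-first format
sample = pd.Series(
    pd.to_datetime(["15/01/2017 12:00", "15/06/2017 12:00"], format="%d/%m/%Y %H:%M")
)
converted = to_eastern(sample)
print(converted)
```

Inside the loop, this would stand in for the `timedelta` addition on files whose offset is nonzero; files already recorded in Eastern time would be left unchanged.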