pandas' read_csv parse_dates vs explicit date conversion
# When you're sure of the format, it's much quicker to convert your dates explicitly than to use `parse_dates`.
# That makes sense; I was just surprised by the size of the time difference.
import pandas as pd
from datetime import datetime
# Parse one timestamp string using a fixed, known format (no format inference)
def to_datetime(d):
    return datetime.strptime(d, '%m/%d/%Y %H:%M')
%time trips = pd.read_csv('data/divvy/Divvy_Trips_2013.csv', parse_dates=['starttime', 'stoptime'])
# CPU times: user 1min 29s, sys: 331 ms, total: 1min 29s
# Wall time: 1min 30s
%time trips = pd.read_csv('data/divvy/Divvy_Trips_2013.csv', converters={'starttime': to_datetime, 'stoptime': to_datetime})
# CPU times: user 17.6 s, sys: 269 ms, total: 17.9 s
# Wall time: 17.9 s
# $ wc -l divvy/Divvy_Trips_2013.csv
# 759789 divvy/Divvy_Trips_2013.csv

DanGolding commented Apr 13, 2018

Have you tried:

df = pd.read_csv('data/divvy/Divvy_Trips_2013.csv')
df['starttime'] = df['starttime'].astype('datetime64[ns]')

It's much faster than using a converter in my case.
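A related option that combines both ideas is to read the columns as plain strings and then convert the whole column at once with `pd.to_datetime` and an explicit `format`. The following is a minimal sketch, not a benchmark: the three-row CSV is a made-up stand-in for the Divvy file, and the format string is the one used in the gist above. How it compares in speed to `astype` or a converter will depend on the data and the pandas version.

```python
import io

import pandas as pd

# Hypothetical sample standing in for data/divvy/Divvy_Trips_2013.csv
csv_text = (
    "trip_id,starttime,stoptime\n"
    "1,06/27/2013 12:11,06/27/2013 12:16\n"
    "2,06/27/2013 14:44,06/27/2013 14:45\n"
    "3,12/31/2013 23:58,01/01/2014 00:07\n"
)

# Read the date columns as ordinary strings first (no parse_dates, no converters)
trips = pd.read_csv(io.StringIO(csv_text))

# Convert each column in one vectorized call, giving the known format
# so pandas does not have to guess it
fmt = "%m/%d/%Y %H:%M"
for col in ("starttime", "stoptime"):
    trips[col] = pd.to_datetime(trips[col], format=fmt)

# The columns are now datetime-typed, so arithmetic works directly
durations = trips["stoptime"] - trips["starttime"]
print(trips.dtypes)
print(durations)
```

Passing `format` also makes the day/month order explicit, which avoids ambiguous dates such as `01/02/2013` being interpreted differently than intended.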


zkhodzhaev commented Mar 29, 2019

df['starttime'] = df['starttime'].astype('datetime64[ns]')

What does "ns" mean here?


jumptable commented Apr 6, 2019

Nanoseconds (as an offset from the Unix epoch I think).
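To make that concrete, here is a small sketch showing that the `ns` in `datetime64[ns]` is the unit of the underlying integer, counted from the Unix epoch (1970-01-01 00:00:00). The variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# The unit is part of the dtype itself
unit, step = np.datetime_data(np.dtype("datetime64[ns]"))

# One second after the epoch, stored with nanosecond resolution
one_second = np.datetime64("1970-01-01T00:00:01", "ns")
ns_since_epoch = int(one_second.astype("int64"))

# pandas exposes the same count through Timestamp.value
ts_ns = pd.Timestamp("1970-01-01 00:00:01").value

print(unit, ns_since_epoch, ts_ns)
```

One second after the epoch is stored as 1,000,000,000, which is one billion nanoseconds.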
