@jmaupetit
Created October 6, 2016 15:57
dateparser.parse vs dateutil.parser.parse
```python
#!/usr/bin/env python3
"""Compare (fuzzy) dateutil vs dateparser `parse` methods."""
import sys
from datetime import datetime, timedelta

from dateparser import parse as dp_parse
from dateutil.parser import parse as du_parse

NOW = datetime.now()

# Anchor dateparser's relative expressions ("15 min ago", "tuesday", ...)
# to the same instant used to build the expected values below.
DP_SETTINGS = {
    'RELATIVE_BASE': NOW,
}

EXPECTED_DATETIME = datetime(year=2016, month=9, day=1)

DATASET = (
    # (query, expected)
    ('2016/09/01', EXPECTED_DATETIME),
    ('2016-09-01', EXPECTED_DATETIME),
    ('09/01/2016', EXPECTED_DATETIME),
    ('09-01-2016', EXPECTED_DATETIME),
    ('09012016', EXPECTED_DATETIME),
    ('09/01/2016 15:20', EXPECTED_DATETIME.replace(hour=15, minute=20)),
    ('09/01/2016 at 15h20', EXPECTED_DATETIME.replace(hour=15, minute=20)),
    ('15 min ago', NOW - timedelta(minutes=15)),
    ('two hours ago', NOW - timedelta(hours=2)),
    ('a day ago', NOW - timedelta(days=1)),
    # Tuesday of the current (Monday-based) week, at midnight
    ('tuesday', (
        NOW.replace(hour=0, minute=0, second=0, microsecond=0)
        - timedelta(days=(NOW.weekday() - 1)))),
    # Monday of the current week, at noon
    ('monday at noon', (
        NOW.replace(hour=12, minute=0, second=0, microsecond=0)
        - timedelta(days=NOW.weekday()))),
)


def is_equal(time1, time2):
    return time1 == time2


def parse(parser, query, expected, **options):
    """Return 1 if `parser` turns `query` into `expected`, else 0."""
    try:
        result = parser(query, **options)
    except Exception:
        # dateutil raises on unparseable input (ValueError, OverflowError, ...)
        return 0
    # dateparser returns None instead of raising when it cannot parse
    if result and is_equal(result, expected):
        return 1
    return 0


def bench(dataset):
    du_scores = []
    dp_scores = []
    template = '| {:25} | {:>10} | {:>10} |'
    separator = template.format('-' * 25, '-' * 10, '-' * 10)
    print(template.format('query', 'dateutil', 'dateparser'))
    print(separator)
    for query, expected in dataset:
        du_score = parse(du_parse, query, expected, fuzzy=True)
        dp_score = parse(dp_parse, query, expected, settings=DP_SETTINGS)
        du_scores.append(du_score)
        dp_scores.append(dp_score)
        print(template.format(query, du_score, dp_score))
    print(separator)
    print(template.format(
        'total ({})'.format(len(du_scores)),
        sum(du_scores),
        sum(dp_scores)))


def main():
    bench(DATASET)
    return 0


if __name__ == '__main__':
    sys.exit(main())
```
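Two details of the scoring in this script are strict by design: `is_equal` uses exact `==`, so a result that differs from the expectation by even a microsecond scores 0, and the `tuesday` expectation is the Tuesday of the current Monday-based week, which is tomorrow when the script runs on a Monday. If you want more forgiving expectations, here is a minimal sketch. The helper names `is_close` and `most_recent_weekday` are hypothetical (not part of the gist or of either library), and the one-second default tolerance is an arbitrary assumption.

```python
from datetime import datetime, timedelta


def is_close(time1, time2, tolerance=timedelta(seconds=1)):
    """Hypothetical alternative to is_equal: equal within `tolerance`."""
    try:
        return abs(time1 - time2) <= tolerance
    except TypeError:
        # None, or a naive/aware mix, cannot be subtracted: treat as a miss
        return False


def most_recent_weekday(now, weekday, hour=0, minute=0):
    """Most recent `weekday` (Monday=0) on or before the date of `now`.

    The date part never lands after today's date, unlike the gist's
    `tuesday` expectation when run on a Monday.
    """
    days_back = (now.weekday() - weekday) % 7
    base = now.replace(hour=hour, minute=minute, second=0, microsecond=0)
    return base - timedelta(days=days_back)


# 2016-09-05 was a Monday, so the most recent Tuesday was 2016-08-30
print(most_recent_weekday(datetime(2016, 9, 5, 10, 30), 1))  # 2016-08-30 00:00:00
print(is_close(datetime(2016, 9, 1), datetime(2016, 9, 1, 0, 0, 0, 500000)))  # True
```

To use them, the `tuesday` row would become `('tuesday', most_recent_weekday(NOW, 1))` and `parse` would call `is_close` instead of `is_equal`; whether that matches what either parser actually returns for a bare weekday name depends on its settings.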
@amnonkhen

Hi,
Nice benchmark code.
I found that dateparser is much slower (~8x) than dateutil.parser.
Did you notice the same?
Sincerely,
Amnon
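A rough way to check a speed ratio like this on your own machine is to time both parsers over the same queries with the standard library's `timeit`. This is a sketch rather than a rigorous benchmark: the query list, the repeat count and the helper names `time_parser` and `available_parsers` are assumptions, and each parser gets one untimed warm-up pass because the first dateparser call is typically much slower than later ones.

```python
import timeit
from typing import Callable, Dict, List

ParseFn = Callable[[str], object]


def time_parser(parse_fn: ParseFn, queries: List[str], number: int = 100) -> float:
    """Average wall-clock seconds per parse_fn call over `queries`."""
    if not queries:
        raise ValueError("need at least one query to time")

    def run_all() -> None:
        for query in queries:
            try:
                parse_fn(query)
            except Exception:
                # Failed parses still cost time; count them, don't crash.
                pass

    run_all()  # untimed warm-up pass (lazy loading, caches, ...)
    total = timeit.timeit(run_all, number=number)
    return total / (number * len(queries))


def available_parsers() -> Dict[str, ParseFn]:
    """Collect whichever of the two libraries is installed."""
    parsers: Dict[str, ParseFn] = {}
    try:
        from dateutil.parser import parse as du_parse
        parsers["dateutil"] = lambda q: du_parse(q, fuzzy=True)
    except ImportError:
        pass
    try:
        from dateparser import parse as dp_parse
        parsers["dateparser"] = dp_parse
    except ImportError:
        pass
    return parsers


# A few queries taken from the gist's DATASET
QUERIES = ["2016/09/01", "09/01/2016 15:20", "15 min ago", "monday at noon"]

if __name__ == "__main__":
    for name, parse_fn in available_parsers().items():
        per_call = time_parser(parse_fn, QUERIES, number=20)
        print(f"{name:>10}: {per_call * 1e6:.1f} microseconds per parse")
```

The ratio you see will depend on the mix of queries, since failed and relative parses can cost very different amounts in each library.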

@jmaupetit (author)

TBH I don't remember...

@megz15 commented Nov 2, 2020

Dateparser is noticeably slower than dateutil when parsing a lot of date strings. Even if dateparser parses fuzzy strings more accurately, the speed is a drawback, and IMO it's better to use the dateutil parser.
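The speed/accuracy trade-off does not have to be all-or-nothing when most input is in a regular format: one option is to try a fast, strict parser first and fall back to the slower, more forgiving one only when it fails. A minimal sketch, assuming you would plug in `dateutil.parser.parse` without `fuzzy=True` (so strings it does not fully understand raise and reach the fallback instead of being partially parsed) as the fast parser and `dateparser.parse` as the fallback. The helper `make_fallback_parser` and the stand-in `toy_relative_parser` are hypothetical, and the demo uses `datetime.fromisoformat` so it runs without either library installed.

```python
from datetime import datetime
from typing import Callable, Optional

Parser = Callable[[str], Optional[datetime]]


def make_fallback_parser(fast: Parser, slow: Parser) -> Parser:
    """Try `fast` first; use `slow` only if `fast` raises or returns None."""
    def parse(text: str) -> Optional[datetime]:
        try:
            result = fast(text)
        except (ValueError, OverflowError):
            result = None
        if result is not None:
            return result
        return slow(text)
    return parse


def toy_relative_parser(text: str) -> Optional[datetime]:
    """Hypothetical stand-in for dateparser.parse: only understands 'today'."""
    if text.strip().lower() == "today":
        return datetime(2016, 9, 1)  # fixed base date for a reproducible demo
    return None  # like dateparser.parse, signal failure with None


# Real use might be: make_fallback_parser(du_parse, dp_parse)
parse = make_fallback_parser(datetime.fromisoformat, toy_relative_parser)
print(parse("2016-09-01 15:20"))  # 2016-09-01 15:20:00 (fast path)
print(parse("today"))             # 2016-09-01 00:00:00 (fallback)
print(parse("gibberish"))         # None
```

The benefit only shows up if the fast path succeeds for most inputs; for mostly free-form text, nearly every call pays for both parsers.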

@pklaus commented Nov 26, 2020

@megz15 It depends on the use case, I guess:

  • If you need to parse a large text file with many dates, you may care about speed/performance.
  • If you write a CLI with --start and --end parameters, you may want the best experience for your users, even if parsing takes a millisecond longer.
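For the CLI case, one way to wire a date parser into `--start`/`--end` options is an `argparse` `type=` converter that turns parse failures into a clean usage error. A minimal sketch: `make_date_type` and `build_cli` are hypothetical names, the demo uses `datetime.fromisoformat` so it runs with the standard library alone, and passing `dateparser.parse` instead would be the way to accept inputs such as "2 days ago". Since `dateparser.parse` returns `None` rather than raising when it cannot parse, the converter checks for that as well.

```python
import argparse
from datetime import datetime
from typing import Callable, Optional


def make_date_type(
    parse_fn: Callable[[str], Optional[datetime]]
) -> Callable[[str], datetime]:
    """Wrap a date parser as an argparse `type=` converter."""
    def convert(value: str) -> datetime:
        try:
            result = parse_fn(value)
        except (ValueError, OverflowError) as exc:
            raise argparse.ArgumentTypeError(f"invalid date {value!r}: {exc}")
        if result is None:
            # dateparser.parse signals failure by returning None
            raise argparse.ArgumentTypeError(f"invalid date {value!r}")
        return result
    return convert


def build_cli(parse_fn: Callable[[str], Optional[datetime]]) -> argparse.ArgumentParser:
    date_type = make_date_type(parse_fn)
    cli = argparse.ArgumentParser(description="Select records between two dates.")
    cli.add_argument("--start", type=date_type, required=True)
    cli.add_argument("--end", type=date_type, required=True)
    return cli


if __name__ == "__main__":
    # Explicit argv for the demo; a real CLI would call parse_args() with no list.
    args = build_cli(datetime.fromisoformat).parse_args(
        ["--start", "2016-09-01", "--end", "2016-09-02 15:20"])
    print(args.start, args.end)  # 2016-09-01 00:00:00 2016-09-02 15:20:00
```

On bad input, `argparse` prints the converter's message with the usage line and exits with status 2, which is usually friendlier than a traceback.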
