Joachim Wagner jowagner

View GitHub Profile
@jowagner
jowagner / loading_wikipedia.py
Last active June 18, 2020 14:10 — forked from thomwolf/loading_wikipedia.py
Load the full English Wikipedia dataset with the HuggingFace nlp library
#!/usr/bin/env python
# based on https://gist.github.com/thomwolf/13ca2b2b172b2d17ac66685aa2eeba62
# support for --len adapted from https://gist.github.com/lhoestq/8f317e47c6f8b6bc50ef1275f655a3a3
# support for --count-spaces Joachim Wagner 2020-06-17
# support for --read-first Joachim Wagner 2020-06-18
# requirements: pip install nlp psutil six
import os
import psutil
import timeit