Johannes Filter jfilter

jfilter / ft_wiki_preproc.py
Last active May 29, 2018 22:54 — forked from bittlingmayer/ft_wiki_preproc.py
fastText pre-trained vectors preprocessing [moved to ftio.wiki.preproc - pip install ftio / https://github.com/SignalN/ftio]
# Taken from: https://gist.github.com/bittlingmayer/7139a6a75ba0dbbc3a06325394ae3a13
# See https://github.com/facebookresearch/fastText/blob/master/get-wikimedia.sh
#
# From https://github.com/facebookresearch/fastText/issues/161:
#
# We now have a script called 'get-wikimedia.sh', that you can use to download and
# process a recent wikipedia dump of any language. This script applies the preprocessing
# we used to create the published word vectors.
#