Mojtaba Tabatabaeipour algorythmik

import os
import argparse

from gensim.corpora import WikiCorpus
import tqdm


def convert_wiki_dump_to_txt(input_file, output_file):
    """Converts Wikipedia xml dump file to text corpus."""
    with open(output_file, "w") as out_f:
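The preview stops right where the output file is opened, so the body of the loop is not visible. As a minimal sketch of what such a conversion step could look like, the helper below writes one article per line from any iterable of token lists. The name `write_articles` is hypothetical and not taken from the gist; in a real run the iterable would presumably come from `WikiCorpus(input_file).get_texts()`, which is an assumption about how the original continues.

```python
import os
import tempfile


def write_articles(texts, output_file):
    """Write each tokenized article as one space-joined line; return the count.

    `texts` is any iterable of token lists. This is a hypothetical helper,
    not the gist's actual implementation.
    """
    count = 0
    with open(output_file, "w", encoding="utf-8") as out_f:
        for tokens in texts:
            # Some older gensim releases yield bytes tokens; decode defensively.
            words = [t.decode("utf-8") if isinstance(t, bytes) else t for t in tokens]
            out_f.write(" ".join(words) + "\n")
            count += 1
    return count


if __name__ == "__main__":
    # Stand-in data so the sketch runs without gensim or a Wikipedia dump.
    sample = [
        ["anarchism", "is", "a", "political", "philosophy"],
        [b"autism", b"is", b"a", b"condition"],
    ]
    path = os.path.join(tempfile.mkdtemp(), "wiki.txt")
    write_articles(sample, path)
```

Keeping the writer separate from the corpus reader means it can be exercised on small in-memory samples before pointing it at a multi-gigabyte dump.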
### Keybase proof

I hereby claim:

* I am algorythmik on github.
* I am mojtabataba (https://keybase.io/mojtabataba) on keybase.
* I have a public key ASDj1LqTVT3O-P4X9Rfis9Tu2shN78JjJr46OBqi01YZrwo

To claim this, I am signing this object: