Mauricio maulberto3

maulberto3 / run_mlm_big_text_files.py
Created June 8, 2023 21:27 — forked from finiteautomata/run_mlm_big_text_files.py
Train MLM with big text files (Workaround)
"""
This is a workaround for `examples/run_mlm.py` for pretraining models
with big text files line-by-line.
For the time being, `datasets` is facing some issues dealing with really
big text files, so we use a custom dataset until this is fixed.
August 3rd, 2021
Author: Juan Manuel Pérez