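#!/usr/bin/env bash
# Activate the conda environment ("allen") that provides fairseq.
source ~/miniconda3/bin/activate allen
# Stop at the first failing command.
set -e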
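# Language suffix of the raw text files. Not named LANG, which would change
# the locale of every program this script runs.
SRC_LANG=en
TASK=qa_en_small
# Encode each split with the GPT-2 BPE. encoder.json, vocab.bpe and dict.txt are
# the GPT-2 BPE files used by fairseq's RoBERTa/BART examples, e.g.
# https://dl.fbaipublicfiles.com/fairseq/gpt2_bpe/encoder.json (likewise
# vocab.bpe and dict.txt). Run the script from the root of a fairseq checkout
# so that examples.roberta can be imported.
# --keep-empty writes empty lines through, so each output file stays
# line-aligned with its input.
for SPLIT in train valid
do
  python -m examples.roberta.multiprocessing_bpe_encoder \
    --encoder-json encoder.json \
    --vocab-bpe vocab.bpe \
    --inputs "$TASK/$SPLIT.$SRC_LANG" \
    --outputs "$TASK/$SPLIT.bpe.$SRC_LANG" \
    --workers 8 \
    --keep-empty
done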
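# Optional sanity check (a sketch, not part of the original recipe; the helper
# name check_bpe_line_counts is hypothetical): because of --keep-empty, every
# BPE output file should have exactly as many lines as its input. awk's NR
# also counts a final line that lacks a trailing newline, which wc -l would
# miss.
check_bpe_line_counts() {
  local dir=$1 lang=$2 split in_lines out_lines
  shift 2
  for split in "$@"; do
    in_lines=$(awk 'END { print NR }' "$dir/$split.$lang") || return 1
    out_lines=$(awk 'END { print NR }' "$dir/$split.bpe.$lang") || return 1
    if [ "$in_lines" -ne "$out_lines" ]; then
      echo "$split: $in_lines input lines but $out_lines BPE lines" >&2
      return 1
    fi
  done
}
# TASK is set near the top of this script; the guard only lets the helper be
# defined on its own without running the check.
if [ -n "${TASK:-}" ]; then
  check_bpe_line_counts "$TASK" en train valid || exit 1
fi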
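# Binarize the BPE-encoded text with the fixed GPT-2 dictionary (dict.txt).
# With --only-source, fairseq-preprocess names its outputs
# <split>.en-None.en.bin / .idx and writes the dictionary as dict.en.txt.
DESTDIR="${TASK}-bin/"
rm -rf "$DESTDIR"
fairseq-preprocess \
  --source-lang "en" \
  --trainpref "${TASK}/train.bpe" \
  --validpref "${TASK}/valid.bpe" \
  --destdir "$DESTDIR" \
  --workers 8 \
  --srcdict dict.txt \
  --joined-dictionary \
  --only-source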
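# Single-language tasks such as masked_lm look for dict.txt and
# <split>.bin / <split>.idx without a language suffix, so copy the dictionary
# and strip the "en-None.en." infix from the binarized file names.
cp dict.txt "${DESTDIR}dict.txt"
# Plain bash instead of the Perl `rename`, which is missing (or replaced by the
# incompatible util-linux `rename`) on many distributions.
for f in "${DESTDIR}"*en-None.en.*; do
  mv "$f" "${f/en-None.en./}"
done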
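# Optional final check (a sketch; the helper name check_bin_dir is
# hypothetical): confirm that the data directory now holds dict.txt plus a
# .bin/.idx pair for every split, which is the layout the steps above produce.
check_bin_dir() {
  local dir=$1 split ext
  shift
  [ -f "$dir/dict.txt" ] || { echo "missing $dir/dict.txt" >&2; return 1; }
  for split in "$@"; do
    for ext in bin idx; do
      [ -f "$dir/$split.$ext" ] || { echo "missing $dir/$split.$ext" >&2; return 1; }
    done
  done
}
# DESTDIR is set earlier in this script (with a trailing slash, stripped here).
if [ -n "${DESTDIR:-}" ]; then
  check_bin_dir "${DESTDIR%/}" train valid || exit 1
fi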