@tomsherborne
Created August 6, 2020 11:33
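# Activate the conda environment named "allen".
source ~/miniconda3/bin/activate allen
# Use SRC_LANG rather than LANG so the shell's locale variable is not overwritten.
SRC_LANG=en
TASK=qa_en_small
# BPE-encode the train and valid splits using encoder.json and vocab.bpe.
# examples.roberta must be importable, e.g. when run from the root of a fairseq checkout.
for SPLIT in train valid
do
    python -m examples.roberta.multiprocessing_bpe_encoder \
        --encoder-json encoder.json \
        --vocab-bpe vocab.bpe \
        --inputs "$TASK/$SPLIT.$SRC_LANG" \
        --outputs "$TASK/$SPLIT.bpe.$SRC_LANG" \
        --workers 8 \
        --keep-empty
done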
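# Binarize the BPE-encoded splits against the existing dictionary (dict.txt).
DESTDIR="${TASK}-bin/"
rm -rf -- "$DESTDIR"
fairseq-preprocess \
    --source-lang "en" \
    --trainpref "${TASK}/train.bpe" \
    --validpref "${TASK}/valid.bpe" \
    --destdir "$DESTDIR" \
    --workers 8 \
    --srcdict dict.txt \
    --joined-dictionary \
    --only-source
cp dict.txt "${DESTDIR}dict.txt"
# Strip the "en-None.en." infix from the output file names (Perl-style rename syntax).
rename 's/en-None\.en\.//' "$DESTDIR"*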