{
    "max_tokens": 150,
    "min_agreement": 0.79,
    "epochs": 3,
    "eval_steps": 32,
    "warmup_steps": 16,
    "lr": 0.00003,
    "seed": 42,
    "batch_size": 16,
    "grad_accum_steps": 4,
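The config preview above combines `batch_size` with `grad_accum_steps`, and `lr` with `warmup_steps`. A minimal sketch of the arithmetic these keys usually imply, assuming that `batch_size` is a per-device value multiplied by gradient accumulation, and that warmup is linear (neither is confirmed by the truncated preview). The helper names `effective_batch_size` and `warmup_lr` are hypothetical.

```python
# Values copied from the config preview; the file is truncated there,
# so only keys visible in the preview are used.
config = {
    "batch_size": 16,
    "grad_accum_steps": 4,
    "warmup_steps": 16,
    "lr": 0.00003,
}


def effective_batch_size(cfg):
    """Examples contributing to one optimizer step on a single device.

    Assumption: gradients from `grad_accum_steps` mini-batches are summed
    before each optimizer update.
    """
    return cfg["batch_size"] * cfg["grad_accum_steps"]


def warmup_lr(cfg, step):
    """Learning rate at an optimizer step under linear warmup.

    Assumption: the rate ramps linearly from 0 to `lr` over `warmup_steps`.
    Any decay after warmup is ignored in this sketch.
    """
    if step >= cfg["warmup_steps"]:
        return cfg["lr"]
    return cfg["lr"] * step / cfg["warmup_steps"]


print(effective_batch_size(config))  # 64
print(warmup_lr(config, 8))          # half of the peak rate
```

Under these assumptions each optimizer update sees 64 examples per device, and the peak rate is reached after 16 updates.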
import argparse
import json
import random
import torch
from torch.utils.data import Dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import Trainer, TrainingArguments
from tqdm import tqdm
import numpy as np
from tqdm import tqdm
import tensorflow as tf
import tensorflow_hub as hub
import tensorflow_text as text

DEFAULT_ENCODER_PATH = "https://tfhub.dev/google/LaBSE/2"
DEFAULT_PREPROCESSOR_PATH = "https://tfhub.dev/google/universal-sentence-encoder-cmlm/multilingual-preprocess/2"
rmq:
  image: rabbitmq:3.8.7
  container_name: rabbitmq
  volumes:
    - /data/rabbitmq/etc/:/etc/rabbitmq/
    - /data/rabbitmq/data/:/var/lib/rabbitmq/
    - /data/rabbitmq/logs/:/var/log/rabbitmq/
  ports:
    - 5672:5672
    - 15672:15672
import torch
from tqdm import tqdm
from transformers import BartForConditionalGeneration, MBartTokenizer


def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i : i + n]


def generate_summaries(examples, model_name, batch_size):
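The `chunks` helper above is the usual way to split a list into fixed-size batches; the preview cuts off before showing how `generate_summaries` uses it, so the following is only an illustrative sketch. The helper is copied from the script, and the `texts` list is a hypothetical input.

```python
def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i : i + n]


# Hypothetical inputs standing in for the documents to summarize.
texts = ["a", "b", "c", "d", "e"]

# With a batch size of 2, the last chunk is shorter than the others.
batches = list(chunks(texts, 2))
print(batches)  # [['a', 'b'], ['c', 'd'], ['e']]
```

Because the final chunk can be smaller than `n`, any code consuming these batches should not assume a fixed batch length.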
import torch
from transformers import (
    BartConfig,
    BartForConditionalGeneration
)


def remove_ignore_keys_(state_dict):
    ignore_keys = [
        "encoder.version",
        "decoder.version",
BART_PATH="mbart.cc25"
TASK="data"
rm -rf "${TASK}-bin/"
fairseq-preprocess \
  --source-lang "source" \
  --target-lang "target" \
  --trainpref "${TASK}/train.bpe" \
  --validpref "${TASK}/val.bpe" \
  --testpref "${TASK}/test.bpe" \
  --destdir "${TASK}-bin/" \
import torch
from setuptools import setup, find_packages
import subprocess
import sys
import warnings
import os

# ninja build does not work unless include_dirs are abs path
this_dir = os.path.dirname(os.path.abspath(__file__))
FAIRSEQ_PATH="fairseq"
CHECKPOINT_PATH="checkpoints/checkpoint_best.pt"
DATA_BIN_PATH="data-bin"
BART_PATH="mbart.cc25"
langs=ar_AR,cs_CZ,de_DE,en_XX,es_XX,et_EE,fi_FI,fr_XX,gu_IN,hi_IN,it_IT,ja_XX,kk_KZ,ko_KR,lt_LT,lv_LV,my_MM,ne_NP,nl_XX,ro_RO,ru_RU,si_LK,tr_TR,vi_VN,zh_CN
python3.7 "${FAIRSEQ_PATH}"/generate.py "${DATA_BIN_PATH}" --path "${CHECKPOINT_PATH}" \
  --task translation_from_pretrained_bart --gen-subset test -t target -s source \
  --bpe 'sentencepiece' --sentencepiece-vocab "${BART_PATH}/sentence.bpe.model" \
  --sacrebleu --max-sentences 32 --langs $langs > predicted.txt
# Image: nvidia/cuda:11.7.0-cudnn8-devel-ubuntu20.04
touch ~/.no_auto_tmux
apt-get update
apt-get install -y vim screen wget g++ git python3.8 python3.8-dev python3-distutils git-lfs
wget https://www.dropbox.com/s/cxws7jpt3nlxn2l/vimrc -O ~/.vimrc
wget https://www.dropbox.com/s/wbdlntxmujpat9o/screenrc -O ~/.screenrc
wget https://bootstrap.pypa.io/get-pip.py && python3.8 get-pip.py
python3.8 -m pip install torch transformers