🏠
Working from home

Sam Shleifer sshleifer

remove_opt_state_instructions.md
  1. Remove the optimizer state and save the result to $HOME, for example:
MODEL_DIR=/large_experiments/xlmg/models/moe/52B/xlmg.52b.fp16.bm_none.tps2048.transformer_lm_gpt2_bigger.dl24.demb1024.dffn4096.moe_w0.01.all.share.adam.b2_0.98.eps1e-08.cl0.0.lr0.0003.sqrt_world_size.wu715.dr0.0.atdr0.0.wd0.01.ms2.uf1.mu572204.s1.ngpu128

python scripts/remove_opt_state.py \
    $MODEL_DIR/checkpoint_1_105000/checkpoint_1_105000 \
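The script itself isn't shown in this preview. As a minimal sketch of what the step could look like, assuming a fairseq-style checkpoint dict that stores optimizer state under the key last_optimizer_state (an assumption, not taken from scripts/remove_opt_state.py), the optimizer entries can be dropped before re-saving. Both helper names are hypothetical.

```python
from typing import Any, Dict

# Assumption: fairseq-style checkpoints keep optimizer state under this key.
OPTIMIZER_KEYS = ("last_optimizer_state",)


def strip_optimizer_state(state: Dict[str, Any]) -> Dict[str, Any]:
    """Return a shallow copy of a checkpoint dict without optimizer-state entries."""
    return {k: v for k, v in state.items() if k not in OPTIMIZER_KEYS}


def remove_opt_state(src_path: str, dst_path: str) -> None:
    """Hypothetical helper: load a checkpoint on CPU, drop optimizer state, save a copy."""
    import torch  # imported lazily so strip_optimizer_state works without torch

    # On recent PyTorch versions you may need torch.load(..., weights_only=False)
    # for checkpoints that pickle non-tensor objects such as argparse namespaces.
    state = torch.load(src_path, map_location="cpu")
    torch.save(strip_optimizer_state(state), dst_path)


if __name__ == "__main__":
    ckpt = {"model": {"w": 1.0}, "last_optimizer_state": {"exp_avg": 0.5}}
    print(sorted(strip_optimizer_state(ckpt)))  # ['model']
```

Loading with map_location="cpu" avoids needing a GPU just to rewrite the file, and since Adam keeps two moment buffers per parameter, dropping them typically shrinks the checkpoint substantially.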
srun_workflow.md

The way I test things quickly with srun:

(1) on devfair:

srun --gres=gpu:8 --partition=devaccel --nodes=1 --cpus-per-task 64 \
    --ntasks-per-node 1 --mem=400G --constraint volta32gb \
    --time="2-00:00:00" --pty /bin/zsh -l

(2) in the resulting shell:

sshleifer / adam8bit_fair_usage.md
Last active Jul 28, 2021
How to use adam8bit

Setup

To use it on the FAIR cluster gshard branch, you need the following dependencies (install them from inside the fairseq env; this assumes CUDA 11.0):

pip install -i https://test.pypi.org/simple/ bitsandbytes-cuda110 -U
pip install -U fairscale

WARNING: if you don't do this step, your checkpoints will not be usable!
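Because skipping the install leaves checkpoints unusable, it can be worth checking the environment before launching a run. A minimal sketch, assuming the CUDA-specific bitsandbytes wheel installs the import name bitsandbytes; the helper name is hypothetical.

```python
import importlib.util
from typing import Iterable, List


def missing_modules(names: Iterable[str] = ("bitsandbytes", "fairscale")) -> List[str]:
    """Return the top-level module names that cannot be found in the current env."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", missing if missing else "none")
```

importlib.util.find_spec only locates the module without importing it, so the check is cheap and does not initialize CUDA.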

model_param_math.md

Results

Params 209,190,912. Fraction Embedding: 19%
Params 265,814,016. Fraction Embedding: 15%
Params 354,418,688. Fraction Embedding: 15%
Params 455,081,984. Fraction Embedding: 12%
Params 1,312,817,152. Fraction Embedding: 8%
Params 1,715,470,336. Fraction Embedding: 6%
Params 2,875,195,392. Fraction Embedding: 5%
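The derivation behind these numbers isn't shown. All seven totals are reproduced exactly by the simplified count below: a V × d token embedding plus 12·d² weights per layer (4d² for the Q, K, V and output projections, 8d² for a feed-forward block with d_ffn = 4d), ignoring biases, layer norms and positional embeddings. The vocabulary size V = 51,200 and the (d_model, n_layers) pairs are inferred from the totals, not stated in the gist.

```python
from typing import Tuple

VOCAB_SIZE = 51_200  # inferred: reproduces the listed totals exactly (assumption)


def param_count(d_model: int, n_layers: int, vocab_size: int = VOCAB_SIZE) -> Tuple[int, float]:
    """Simplified transformer LM parameter count and embedding fraction."""
    embedding = vocab_size * d_model  # token embedding, counted once (assumed tied)
    per_layer = 12 * d_model ** 2     # 4 d^2 attention + 8 d^2 FFN (d_ffn = 4 d)
    total = embedding + n_layers * per_layer
    return total, embedding / total


# (d_model, n_layers) pairs inferred from the results above (assumption).
CONFIGS = [(768, 24), (768, 32), (1024, 24), (1024, 32), (2048, 24), (2048, 32), (3072, 24)]

if __name__ == "__main__":
    for d, n in CONFIGS:
        total, frac = param_count(d, n)
        print(f"Params {total:,}. Fraction Embedding: {round(100 * frac)}%")
```

The embedding fraction simplifies to V / (V + 12·L·d), which is why it shrinks as the model gets wider or deeper.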
sshleifer / optim_cmds.md
Last active Jul 22, 2021
gshard optimizer experiment cmds

Setup

  • git clone git@github.com:fairinternal/fairseq-py.git && cd fairseq-py && git checkout stable-emb
  • if you don't have the fairseq conda env, follow these instructions
  • pip install numpy==1.20 (optional, but some people needed this)
  • pip install fairscale (should be > 0.3.7 as of this writing)
  • on FAIR cluster: pip install -i https://test.pypi.org/simple/ bitsandbytes-cuda110 -U
  • OR on AWS: pip install -i https://test.pypi.org/simple/ bitsandbytes-cuda111 -U
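The fairscale version requirement and the CUDA-specific wheel names can also be checked from Python. A minimal sketch: the helper names are hypothetical, the version parser deliberately ignores pre-release suffixes, and the wheel-name mapping only generalizes the two names listed (cuda110, cuda111).

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import List, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Leading numeric components of a version string, e.g. '0.3.7' -> (0, 3, 7).

    Simplification: suffixes such as 'rc1' are ignored.
    """
    nums: List[int] = []
    for piece in v.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        nums.append(int(match.group()))
    return tuple(nums)


def fairscale_is_new_enough(minimum: Tuple[int, ...] = (0, 3, 7)) -> bool:
    """True if fairscale is installed and its version is strictly greater than minimum."""
    try:
        installed = version("fairscale")
    except PackageNotFoundError:
        return False
    return parse_version(installed) > minimum


def bitsandbytes_wheel(cuda_version: str) -> str:
    """Map a CUDA version like '11.0' to a wheel name like 'bitsandbytes-cuda110'."""
    major, minor = cuda_version.split(".")[:2]
    return f"bitsandbytes-cuda{major}{minor}"


if __name__ == "__main__":
    print("fairscale > 0.3.7:", fairscale_is_new_enough())
    print(bitsandbytes_wheel("11.0"), bitsandbytes_wheel("11.1"))
```

Comparing integer tuples avoids the string-comparison trap where "0.10.0" would sort before "0.3.7".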

Common Logic for all commands

Edit this as needed

sshleifer / sharded_data_doc.md
Last active Apr 15, 2021
Construct+Use sharded dataset in fairseq

Constructing a sharded dataset

  • cat all your raw text into one huge file in /scratch/
  • run your favorite bpe on that file (20mins for 160GB with 20 workers), writing the result to /scratch.

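Then we collapse runs of blank lines into a single blank line: grep -A1 . keeps each non-empty line plus the line right after it, and grep -v "^--$" removes the "--" group separators that grep inserts between non-adjacent matches.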

grep -A1 . /scratch/rc_train_big.bpe | grep -v "^--$" > /scratch/rc.filtered.train.bpe
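To make explicit what the grep pipeline does, here is a pure-Python sketch of the same filter. It is for illustration on small inputs; the grep version is what you would run on a file of this size. The function name is hypothetical.

```python
from typing import Iterable, List


def collapse_blank_lines(lines: Iterable[str]) -> List[str]:
    """Mimic `grep -A1 . | grep -v "^--$"` on an iterable of lines.

    Keeps every non-empty line plus at most one blank line after it, so runs of
    blank lines shrink to a single separator and leading blank lines disappear.
    Like the second grep, it also drops lines that are exactly "--".
    """
    out = []
    prev_nonempty = False
    for raw in lines:
        line = raw.rstrip("\n")
        nonempty = line != ""  # grep's "." matches any line with at least one character
        if (nonempty or prev_nonempty) and line != "--":
            out.append(line)
        prev_nonempty = nonempty
    return out


if __name__ == "__main__":
    sample = ["\n", "doc one\n", "\n", "\n", "\n", "doc two\n", "\n"]
    print(collapse_blank_lines(sample))  # ['doc one', '', 'doc two', '']
```

Note the side effect shared with the grep version: a genuine text line consisting only of "--" is removed too.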
anki_setup.md
time_dbart_generate.py
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import time
from tqdm import tqdm
from pathlib import Path
import pandas as pd
models = ['sshleifer/distilbart-cnn-12-3',
'sshleifer/distilbart-cnn-12-6',
'sshleifer/distilbart-cnn-6-6',
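Only the top of the script survives in this preview. A minimal, stdlib-only sketch of a timing loop over such a list of checkpoints; run is a hypothetical callable standing in for tokenizing a batch and calling model.generate, and this is not the gist's actual code.

```python
import time
from typing import Callable, Dict, Iterable, List


def time_models(
    model_names: Iterable[str],
    run: Callable[[str], None],
    n_trials: int = 3,
) -> List[Dict[str, object]]:
    """Time run(name) n_trials times per model and summarize wall-clock seconds."""
    rows: List[Dict[str, object]] = []
    for name in model_names:
        durations = []
        for _ in range(n_trials):
            start = time.perf_counter()
            run(name)
            durations.append(time.perf_counter() - start)
        rows.append({
            "model": name,
            "mean_s": sum(durations) / len(durations),
            "min_s": min(durations),
        })
    return rows
```

The returned rows can be passed straight to pd.DataFrame(rows), which fits the script's pandas import; reporting the minimum alongside the mean helps separate one-off warm-up cost from steady-state generation time.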
sshleifer / latex_style.md
Created Oct 19, 2020
Sasha's latex style rules

Avoid:

  • [!h] for figures/tables.
  • two datasets in one plot
  • NameError: introducing terms that haven't been defined.
sshleifer / download_summ_data.py
Created Oct 7, 2020
Fetching summarization datasets
from pathlib import Path
import fire
from tqdm import tqdm
DS_TO_KEY = {
'gigaword': ('document', 'summary'),
'xsum': ('document', 'summary'),
'aeslc': ('email_body', 'subject_line'),