Sam Shleifer (sshleifer)
// Place your key bindings in this file to override the defaults
[
  // Switching between editor and terminal
  {
    "key": "ctrl+j",
    "command": "workbench.action.terminal.focus",
    "when": "editorFocus || !editorIsOpen"
  },
  {
    "key": "ctrl+j",
    // likely completion (the preview was truncated): the reverse binding back to the editor
    "command": "workbench.action.focusActiveEditorGroup",
    "when": "terminalFocus"
  }
]
import torch
import torch.nn.functional as F

d = 8          # feature dimension
seq_len = 13
bs = 1
wt = torch.rand((d, d))           # linear weight, (out_features, in_features)
x = torch.rand((seq_len, bs, d))  # input, (seq_len, batch, features)
# split the input features and the weight's input dimension into two halves
x_r0, x_r1 = x[:, :, :d//2], x[:, :, d//2:]
wt_r0, wt_r1 = wt[:, :d//2], wt[:, d//2:]
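Under F.linear's (out_features, in_features) weight convention, the two half-width projections sum to the full one; a minimal sanity check of that equivalence (my reading of the snippet's intent, since the gist preview cuts off here):

full = F.linear(x, wt)                                    # x @ wt.T
partial = F.linear(x_r0, wt_r0) + F.linear(x_r1, wt_r1)   # sum of half-width products
assert torch.allclose(full, partial, atol=1e-6)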
git fetch
git checkout paper-v2
export SD=/data/users/sshleifer/fairseq-py/roberta_azure
train_roberta_base () {
  export NCCL_DEBUG="warn"
  ./fb_sweep/bmr.py -g 8 -t 1 -n 8 --dl 12 --embed-dim 768 \
    --bs 32 --li 50 --epg 0 --mu 2000000 --ebs 2048 --arch prenorm \
    --resume-failed --nw 0 -p bl \
    --opt adam --local-checkpoints-dir $SD --checkpoints-dir $SD --use-fused-softmax \
    --ddp fully_sharded "$@"
}
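The trailing "$@" forwards any extra flags to bmr.py, so one-off variants can reuse the function, e.g. (the extra flag is illustrative and assumes bmr.py accepts it):

train_roberta_base              # launch with the defaults above
train_roberta_base --dry-run    # hypothetical extra flag, forwarded via "$@"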
  1. Remove the optimizer state and save it to $HOME, for example:
MODEL_DIR=/large_experiments/xlmg/models/moe/52B/xlmg.52b.fp16.bm_none.tps2048.transformer_lm_gpt2_bigger.dl24.demb1024.dffn4096.moe_w0.01.all.share.adam.b2_0.98.eps1e-08.cl0.0.lr0.0003.sqrt_world_size.wu715.dr0.0.atdr0.0.wd0.01.ms2.uf1.mu572204.s1.ngpu128

python scripts/remove_opt_state.py \
 $MODEL_DIR/checkpoint_1_105000/checkpoint_1_105000 \

The way I test things quickly with srun:

(1) on devfair:

srun --gres=gpu:8 --partition=devaccel --nodes=1 --cpus-per-task 64 \
    --ntasks-per-node 1 --mem=400G --constraint volta32gb \
    --time="2-00:00:00" --pty /bin/zsh -l

(2) on the resultant shell:

@sshleifer
sshleifer / adam8bit_fair_usage.md
Last active July 28, 2021 22:02
How to use adam8bit

Setup

To use it on the FAIR cluster gshard branch, install the following dependencies (from inside the fairseq env, assuming CUDA 11.0):

pip install -i https://test.pypi.org/simple/ bitsandbytes-cuda110 -U
pip install -U fairscale

WARNING: if you don't do this step, your checkpoints will not be usable!
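For reference, the optimizer can also be constructed directly from bitsandbytes outside fairseq; a minimal sketch (the model and hyperparameters here are placeholders, not from this gist):

import torch
import bitsandbytes as bnb

model = torch.nn.Linear(1024, 1024).cuda()  # placeholder model
# Adam8bit stores optimizer state in int8, cutting optimizer memory roughly 4x vs fp32 Adam
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=3e-4, betas=(0.9, 0.98))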

Results

Params 209,190,912. Fraction Embedding: 19%
Params 265,814,016. Fraction Embedding: 15%
Params 354,418,688. Fraction Embedding: 15%
Params 455,081,984. Fraction Embedding: 12%
Params 1,312,817,152. Fraction Embedding: 8%
Params 1,715,470,336. Fraction Embedding: 6%
Params 2,875,195,392. Fraction Embedding: 5%
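The embedding fraction shrinks as models grow because embedding parameters scale linearly with the embed dim while the transformer trunk scales with its square times depth. A rough check on the first row (the vocab size and embed dim are my assumptions, not stated in the gist):

vocab, d = 51200, 768                        # assumed: ~50k BPE vocab (padded), base-model width
embed_params = vocab * d                     # 39,321,600
print(f"{embed_params / 209_190_912:.0%}")   # -> 19%, consistent with the first row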
@sshleifer
sshleifer / optim_cmds.md
Last active July 22, 2021 23:39
gshard optimizer experiment cmds

Setup

  • git clone git@github.com:fairinternal/fairseq-py.git && cd fairseq-py && git checkout stable-emb
  • if you don't have the fairseq conda env, follow these instructions
  • pip install numpy==1.20 (optional, but some people needed this)
  • pip install fairscale (should be > 0.3.7, as of writing)
  • on FAIR cluster: pip install -i https://test.pypi.org/simple/ bitsandbytes-cuda110 -U
  • OR on AWS: pip install -i https://test.pypi.org/simple/ bitsandbytes-cuda111 -U

Common Logic for all commands

Edit this as needed

@sshleifer
sshleifer / sharded_data_doc.md
Last active April 15, 2021 09:11
Construct+Use sharded dataset in fairseq

Constructing a sharded dataset

  • cat all your raw text into one huge file in /scratch/
  • run your favorite BPE on that file (about 20 minutes for 160 GB with 20 workers), writing the result to /scratch.

Then we filter newlines, collapsing runs of blank lines to a single blank and dropping the -- group separators that grep inserts:

grep -A1 . /scratch/rc_train_big.bpe | grep -v "^--$" > /scratch/rc.filtered.train.bpe
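On a toy file (contents invented for illustration), the effect is that multi-blank gaps collapse to single blank separators:

printf 'a\n\n\n\nb\nc\n\nd\n' > /tmp/toy.bpe
grep -A1 . /tmp/toy.bpe | grep -v "^--$"
# output: a, <blank>, b, c, <blank>, d  (the triple blank collapsed to one)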