Sam Shleifer (sshleifer)
@sshleifer
sshleifer / apps.md
Last active September 1, 2023 15:12
My favorite apps and workflow stuff (for Mac/iOS/Python)
// Place your key bindings in this file to override the defaults
[
// Switching between editor and terminal
{
"key": "ctrl+j",
"command": "workbench.action.terminal.focus",
"when": "editorFocus || !editorIsOpen"
},
{
"key": "ctrl+j",
// assumed completion: the gist preview cuts off here; this is the usual
// counterpart binding that jumps from the terminal back to the editor
"command": "workbench.action.focusActiveEditorGroup",
"when": "terminalFocus"
}
]

How BartConfig controls when LayerNorm is applied

Six groups of models inherit from BartForConditionalGeneration. The major differences between them are:

  • pretraining objective & data
  • finetuning objective & data
  • number of layers and dimension of each layer
  • when layernorm is applied

This document focuses on layernorm timing.
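To make "timing" concrete, here is a minimal sketch of the two placements for a single sublayer. The flag name normalize_before is borrowed from fairseq-style configs and is an assumption here, not necessarily the exact BartConfig attribute:

import torch
import torch.nn as nn

class SketchBlock(nn.Module):
    # one residual sublayer with a configurable LayerNorm position
    def __init__(self, d, normalize_before):
        super().__init__()
        self.norm = nn.LayerNorm(d)
        self.ffn = nn.Linear(d, d)
        self.normalize_before = normalize_before

    def forward(self, x):
        if self.normalize_before:
            # pre-norm: normalize the input, then add the residual
            return x + self.ffn(self.norm(x))
        # post-norm (original Transformer): add the residual, then normalize
        return self.norm(x + self.ffn(x))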

import torch
import torch.nn.functional as F

d = 8         # model dim
seq_len = 13
bs = 1
wt = torch.rand((d, d))           # weight of a d -> d linear layer
x = torch.rand((seq_len, bs, d))  # activations: (seq_len, batch, d)
# shard the feature dim of x and the matching input columns of wt in half
x_r0, x_r1 = x[:, :, :d//2], x[:, :, d//2:]
wt_r0, wt_r1 = wt[:, :d//2], wt[:, d//2:]
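If the splits are meant to mimic a row-parallel linear layer (each rank holds half of the input features and the matching half of the weight columns), the partial F.linear outputs should sum to the unsharded result. This check is my addition to the snippet:

out_full = F.linear(x, wt)                                   # x @ wt.T
out_sharded = F.linear(x_r0, wt_r0) + F.linear(x_r1, wt_r1)  # per-rank partial sums
assert torch.allclose(out_full, out_sharded, atol=1e-5)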
git fetch
git checkout paper-v2
export SD=/data/users/sshleifer/fairseq-py/roberta_azure
train_roberta_base () {
export NCCL_DEBUG="warn"
./fb_sweep/bmr.py -g 8 -t 1 -n 8 --dl 12 --embed-dim 768 \
--bs 32 --li 50 --epg 0 --mu 2000000 --ebs 2048 --arch prenorm \
--resume-failed --nw 0 -p bl \
--opt adam --local-checkpoints-dir $SD --checkpoints-dir $SD --use-fused-softmax \
--ddp fully_sharded "$@"
}
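Because the function ends with "$@", any extra flags pass straight through to the sweep script, e.g. (hypothetical override; later flags typically win in argparse):

train_roberta_base --bs 16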

The way I test things quickly with srun:

(1) on devfair:

srun --gres=gpu:8 --partition=devaccel --nodes=1 --cpus-per-task 64 \
    --ntasks-per-node 1 --mem=400G --constraint volta32gb \
    --time="2-00:00:00" --pty /bin/zsh -l

(2) in the resulting shell:

  1. Remove the optimizer state and save it (to $HOME, for example):
MODEL_DIR=/large_experiments/xlmg/models/moe/52B/xlmg.52b.fp16.bm_none.tps2048.transformer_lm_gpt2_bigger.dl24.demb1024.dffn4096.moe_w0.01.all.share.adam.b2_0.98.eps1e-08.cl0.0.lr0.0003.sqrt_world_size.wu715.dr0.0.atdr0.0.wd0.01.ms2.uf1.mu572204.s1.ngpu128

python scripts/remove_opt_state.py \
 $MODEL_DIR/checkpoint_1_105000/checkpoint_1_105000 \
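The command above is truncated in the preview. The core operation is just dropping the optimizer state from a saved checkpoint; a generic sketch of that idea (not the actual scripts/remove_opt_state.py, and the fairseq key name is an assumption):

import torch

ckpt = torch.load("checkpoint.pt", map_location="cpu")
ckpt.pop("last_optimizer_state", None)  # assumed fairseq key for optimizer state
torch.save(ckpt, "checkpoint_no_opt.pt")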
@sshleifer
sshleifer / adam8bit_fair_usage.md
Last active July 28, 2021 22:02
How to use adam8bit

Setup

To use it on the FAIR cluster gshard branch, you need the following dependencies (run from inside the fairseq env, assuming CUDA 11.0):

pip install -i https://test.pypi.org/simple/ bitsandbytes-cuda110 -U
pip install -U fairscale

WARNING: if you don't do this step, your checkpoints will not be usable!
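For reference, the underlying 8-bit optimizer can be constructed directly from bitsandbytes like this. This is a generic sketch of the bitsandbytes API, not the fairseq/gshard wiring, and the model and hyperparameters are placeholders:

import bitsandbytes as bnb
import torch.nn as nn

model = nn.Linear(768, 768)  # stand-in for a real model
optimizer = bnb.optim.Adam8bit(model.parameters(), lr=3e-4, betas=(0.9, 0.98))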

Results

| Params        | Fraction Embedding |
|---------------|--------------------|
| 209,190,912   | 19% |
| 265,814,016   | 15% |
| 354,418,688   | 15% |
| 455,081,984   | 12% |
| 1,312,817,152 | 8%  |
| 1,715,470,336 | 6%  |
| 2,875,195,392 | 5%  |
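The falling embedding fraction makes sense: embedding parameters scale only with vocab × d_model, while the rest of the model also scales with depth. A rough check with assumed numbers (a RoBERTa-style vocab of ~50k and d_model = 768 are guesses, not taken from the table):

vocab, d_model = 50_257, 768
emb = vocab * d_model          # ~38.6M embedding parameters
print(emb / 209_190_912)       # ~0.185, consistent with the 19% row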