Sam Shleifer (sshleifer)
sshleifer / anki_setup.md
Created March 6, 2021 19:27
Anki Setup
sshleifer / time_dbart_generate.py
Created October 26, 2020 17:29
Timing Generate
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
import time
from tqdm import tqdm
from pathlib import Path
import pandas as pd
models = ['sshleifer/distilbart-cnn-12-3',
'sshleifer/distilbart-cnn-12-6',
'sshleifer/distilbart-cnn-6-6',
sshleifer / latex_style.md
Created October 19, 2020 15:19
Sasha's latex style rules

Avoid:

  • [!h] for figures/tables.
  • two datasets in one plot
  • NameError: introducing terms that haven't been defined.
sshleifer / download_summ_data.py
Created October 7, 2020 19:19
Fetching summarization datasets
from pathlib import Path
import fire
from tqdm import tqdm
DS_TO_KEY = {
'gigaword': ('document', 'summary'),
'xsum': ('document', 'summary'),
'aeslc': ('email_body', 'subject_line'),
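For context, a minimal sketch of how a name-to-keys mapping like DS_TO_KEY can be used with the datasets library to write source/target text files (dump_split is a hypothetical helper, not the gist's actual code):

from pathlib import Path
from datasets import load_dataset

def dump_split(name, split, out_dir):
    # Look up which fields hold the source text and the reference summary.
    src_key, tgt_key = DS_TO_KEY[name]
    ds = load_dataset(name, split=split)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / f'{split}.source', 'w') as src_f, open(out_dir / f'{split}.target', 'w') as tgt_f:
        for ex in ds:
            # One example per line; newlines inside fields are flattened to spaces.
            src_f.write(ex[src_key].replace('\n', ' ') + '\n')
            tgt_f.write(ex[tgt_key].replace('\n', ' ') + '\n')

For example, dump_split('xsum', 'test', 'xsum') would write xsum/test.source and xsum/test.target.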

How BartConfig controls when LayerNorm is applied

Six groups of models inherit from BartForConditionalGeneration. The major differences between them are:

  • pretraining objective & data
  • finetuning objective & data
  • number of layers and dimension of each layer
  • when layernorm is applied

This document focuses on layernorm timing.
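As a rough illustration of the pre-LN vs. post-LN distinction, here is a toy encoder layer toggled by a normalize_before flag (a simplified sketch in the spirit of the old BartConfig flags, not the actual transformers implementation):

import torch.nn as nn

class ToyEncoderLayer(nn.Module):
    # normalize_before=False: post-LN, LayerNorm runs after each residual add (BART-style).
    # normalize_before=True:  pre-LN, LayerNorm runs on the input to attention/FFN (mBART/Pegasus-style).
    def __init__(self, d_model=16, n_heads=2, normalize_before=False):
        super().__init__()
        self.normalize_before = normalize_before
        self.self_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.attn_norm = nn.LayerNorm(d_model)
        self.ffn = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.ReLU(), nn.Linear(4 * d_model, d_model))
        self.ffn_norm = nn.LayerNorm(d_model)

    def forward(self, x):
        residual = x
        if self.normalize_before:
            x = self.attn_norm(x)
        x, _ = self.self_attn(x, x, x)
        x = residual + x
        if not self.normalize_before:
            x = self.attn_norm(x)
        residual = x
        if self.normalize_before:
            x = self.ffn_norm(x)
        x = residual + self.ffn(x)
        if not self.normalize_before:
            x = self.ffn_norm(x)
        return x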

export b="s3://models.huggingface.co/bert"
# Mirror a model's files from the stas namespace to the facebook namespace on S3.
stas_to_fb () {
  src=$1
  shift
  aws s3 sync $b/stas/$src $b/facebook/$src $@
}
stas_to_allenai () {
  src=$1
  shift
sshleifer / dynb.md
Last active September 9, 2020 19:25

Problem:

  • In WMT datasets, there is wide variation in the length of examples: some are one sentence, some are 10 sentences.
  • The max batch size that fits on a V100 is roughly (4, 512).
  • You end up with lots of batches of shape (4, 12) or (4, small_int) that don't fully utilize the GPU.

Dynamic batch size: try to organize batches to contain roughly 4*512 = 2048 tokens, so one batch might be shaped (4, 512) and another (32, 64).
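A minimal sketch of the idea as greedy, length-sorted bucketing with a token budget (illustrative only, not fairseq's actual algorithm; batch_by_tokens is a hypothetical helper):

def batch_by_tokens(examples, max_tokens=2048):
    # examples: list of token-id lists. Sorting by length keeps padding waste low;
    # a batch is flushed once padding every example in it to the longest member
    # would exceed the token budget.
    batches, batch, longest = [], [], 0
    for ex in sorted(examples, key=len):
        new_longest = max(longest, len(ex))
        if batch and new_longest * (len(batch) + 1) > max_tokens:
            batches.append(batch)
            batch, longest = [], 0
            new_longest = len(ex)
        batch.append(ex)
        longest = new_longest
    if batch:
        batches.append(batch)
    return batches

With max_tokens=2048, long examples end up in batches shaped roughly (4, 512) and short ones in batches like (32, 64).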

Details of Fairseq Solution:

sshleifer / finetune_pegasus_xsum.sh
Last active September 8, 2020 21:19
took 25hr
python finetune.py \
--task summarization \
--learning_rate=3e-4 \
--do_train \
--do_predict \
--val_check_interval 0.25 --n_val 1000 \
--data_dir xsum \
--max_source_length 512 --max_target_length=56 \
--freeze_embeds \
--model_name_or_path google/pegasus-large \
sshleifer / mbart_lb.md
Last active September 2, 2020 15:13
Experiment results from distilling mbart-large-en-ro and finetuning mbart-large-cc25
for file in */*bleu.json
do
   echo "$file:"
   sed -n '/^\s*$/!{p;q}' "$file"
   echo "------"
done

En-Ro test BLEU (distil-mbart unless otherwise specified, before post-processing).