
@peterjliu
Last active April 25, 2023 18:03
@coventry

Thanks for linking that, @peterjliu. Am I reading the README.md correctly that training uses the full encoder-decoder transformer architecture, rather than a decoder-only architecture with memory-compressed attention?
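For reference, memory-compressed attention compresses the keys and values along the sequence axis before attending, which is what makes the decoder-only setup feasible on very long inputs. A minimal NumPy sketch of the idea, with mean pooling standing in for the paper's learned strided convolution (kernel 3, stride 3), so this is an illustration of the shape arithmetic rather than the actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def memory_compressed_attention(q, k, v, stride=3):
    """Attention where K and V are compressed along the sequence axis.

    The paper uses a learned strided convolution to compress K and V;
    mean pooling with the same stride stands in here to keep the sketch
    dependency-free. Trailing positions that don't fill a full window
    are dropped for simplicity.
    """
    n, d = k.shape
    m = n // stride  # compressed sequence length
    k_c = k[: m * stride].reshape(m, stride, d).mean(axis=1)
    v_c = v[: m * stride].reshape(m, stride, d).mean(axis=1)
    scores = q @ k_c.T / np.sqrt(d)  # (len_q, m) instead of (len_q, n)
    return softmax(scores) @ v_c
```

The score matrix shrinks from `len_q x n` to `len_q x n/stride`, which is the memory saving that lets the decoder attend over much longer sequences.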


Training

TODO(rsepassi): Put actual results achieved on wikisum_web and/or
wikisum_commoncrawl and with what hparams_set.

PROBLEM=wikisum_web  # or wikisum_commoncrawl
t2t-trainer \
  --problem=$PROBLEM \
  --model=transformer \
  --hparams_set=transformer_base \
  --train_steps=250000 \
  --eval_steps=100 \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR
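The command above assumes `$DATA_DIR` and `$TRAIN_DIR` are already set. A minimal sketch of that setup (the paths are placeholders, not anything prescribed by the README; `$DATA_DIR` must already hold the TFRecords produced by the data-generation step):

```shell
# Placeholder paths -- substitute your own locations.
DATA_DIR=$HOME/t2t/data    # generated training examples live here
TRAIN_DIR=$HOME/t2t/train  # checkpoints and event files are written here
mkdir -p "$DATA_DIR" "$TRAIN_DIR"
```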

@rfdearborn

Does anyone have processed training examples (i.e., the output of step 3 here) available to share? I'm having trouble getting GCP to release IP addresses for data generation, so I'm hoping to bypass that step for the time being...

Also, as @nlothian and @hoang-ho have asked, are pre-trained model weights available anywhere?
