
@peterjliu
Last active April 25, 2023 18:03
@peterjliu
Author

Hi folks, check out the updated link now. Thanks for your patience.

@peterjliu
Author

@rafaelbou @sai-prasanna @vedant @SHohentanner @leisurehippo @coventry @cyberandy @Diego999 @Legalamb77 @tfmorris

Mentioning folks who specifically expressed interest here.

@Diego999

@peterjliu

Thank you for sharing. I was wondering whether it would be possible to store the preprocessed dataset on a local computer (after preprocessing on the cloud), or whether it is too large. Do you have an estimate of the required space? 10 GB? 100 GB? 1 TB?

Thank you for your help!
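
(For reference, the generation step writes into whatever Cloud Storage bucket you point it at, so you can check the total size before deciding whether to copy it down. A minimal sketch, with $BUCKET standing in for your own bucket rather than any published path:)

# Check the total size of the generated data before copying it locally.
gsutil du -sh gs://$BUCKET/wikisum_web

# If it looks manageable, copy it down to a local directory.
gsutil -m cp -r gs://$BUCKET/wikisum_web ./t2t_data/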

@nlothian

This looks really useful. I noticed that the pre-processed vocabs also seem to be available in the gs://tensor2tensor-data/ bucket (vocab.wikisum_commoncrawl.32768 and vocab.wikisum_web.32768).

The TODO says you'll release the hparams_set, which would be great, but could I also request a pre-trained model release?
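
(Assuming that bucket stays world-readable, grabbing the vocabs is a straightforward copy; a small sketch, with $DATA_DIR standing in for your local t2t data directory:)

# Copy the pre-built wikisum vocab files into a local data dir
# (assumes gs://tensor2tensor-data/ allows public reads).
gsutil cp gs://tensor2tensor-data/vocab.wikisum_web.32768 $DATA_DIR/
gsutil cp gs://tensor2tensor-data/vocab.wikisum_commoncrawl.32768 $DATA_DIR/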

@hoang-ho

hoang-ho commented Oct 6, 2018

Dear all,

Is there a pre-trained model available for this wikisum problem? If so, could I have a link to it?

Thank you so much.

@coventry

Thanks for linking that, @peterjliu. Am I reading the README.md correctly here (quoted below) that training uses a full transformer architecture rather than a decoder-only architecture with memory-compressed attention?


Training

TODO(rsepassi): Put actual results achieved on wikisum_web and/or
wikisum_commoncrawl and with what hparams_set.

PROBLEM=wikisum_web  # or wikisum_commoncrawl
t2t-trainer \
  --problem=$PROBLEM \
  --model=transformer \
  --hparams_set=transformer_base \
  --train_steps=250000 \
  --eval_steps=100 \
  --data_dir=$DATA_DIR \
  --output_dir=$TRAIN_DIR
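
(One way to answer this without guessing at names is to ask the trainer what is actually registered; a sketch, assuming a recent tensor2tensor install where t2t-trainer supports --registry_help:)

# Dump the registry of models, hparams sets and problems, then filter for
# anything wikisum- or transformer-related.
t2t-trainer --registry_help 2>&1 | grep -iE "wikisum|transformer"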

@rfdearborn

Does anyone have processed training examples (i.e., the output of step 3 here) available to share? I'm having trouble getting GCP to release IP addresses for data generation, so I'm hoping to be able to bypass this for the time being...

Also, as @nlothian and @hoang-ho have asked, are pre-trained model weights available anywhere?
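
(In case someone does end up sharing generated shards, pulling them down is just a gsutil transfer; a sketch, where gs://some-shared-bucket/wikisum_web and the shard name pattern are hypothetical placeholders, not a real published location:)

# Hypothetical: copy already-generated TFRecord shards into a local data dir.
mkdir -p $DATA_DIR
gsutil -m cp "gs://some-shared-bucket/wikisum_web/wikisum_web-train*" $DATA_DIR/
gsutil -m cp "gs://some-shared-bucket/wikisum_web/wikisum_web-dev*" $DATA_DIR/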
