RoBERTa + CSPT (single model)

Model Name:

RoBERTa + CSPT (single model)

Model Description:

We first train a generation model to produce synthetic commonsense data from ConceptNet. We then build the commonsense pre-trained (CSPT) model by finetuning the RoBERTa-large model on this synthetic data together with the Open Mind Common Sense (OMCS) corpus. The final model is obtained by finetuning the commonsense pre-trained model on CSQA.
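To make the data-generation step concrete, below is a minimal sketch that turns ConceptNet triples into plain-text statements. Note that our actual pipeline trains a generation model rather than using fixed templates; the `RELATION_TEMPLATES` mapping and the example triples are purely illustrative stand-ins for the shape of the synthetic data.

```python
# Hypothetical sketch: converting ConceptNet (head, relation, tail) triples
# into natural-language statements. The real pipeline uses a learned
# generation model; fixed templates are only a simplified stand-in.

RELATION_TEMPLATES = {
    "IsA": "{head} is a {tail}.",
    "UsedFor": "{head} is used for {tail}.",
    "CapableOf": "{head} can {tail}.",
    "AtLocation": "you are likely to find {head} in {tail}.",
}

def triples_to_statements(triples):
    """Convert (head, relation, tail) triples into plain sentences."""
    statements = []
    for head, relation, tail in triples:
        template = RELATION_TEMPLATES.get(relation)
        if template is not None:
            statements.append(template.format(head=head, tail=tail))
    return statements

# Example usage with made-up triples:
print(triples_to_statements([("guitar", "UsedFor", "playing music"),
                             ("guitar", "AtLocation", "a music store")]))
```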

Experiment Details:

Commonsense Pre-training (see the sketch after this list):

  • epochs: 5
  • maximum sequence length: 35
  • learning rate: 3e-5
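
A minimal sketch of what this pre-training stage could look like with Hugging Face Transformers, assuming a masked-language-modelling objective. The objective, the batch size, the output directory, and the toy corpus below are assumptions; only the epoch count, sequence length, and learning rate come from the settings above.

```python
# Sketch of commonsense pre-training on synthetic ConceptNet data + OMCS,
# assuming an MLM objective (not confirmed by the notes above).
from transformers import (RobertaTokenizer, RobertaForMaskedLM,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
model = RobertaForMaskedLM.from_pretrained("roberta-large")

# Stand-in for the synthetic ConceptNet statements plus the OMCS corpus.
corpus = ["a guitar is used for playing music.",
          "you are likely to find a guitar in a music store."]
encodings = tokenizer(corpus, truncation=True, padding="max_length",
                      max_length=35)               # maximum sequence length: 35
train_dataset = [{"input_ids": ids, "attention_mask": mask}
                 for ids, mask in zip(encodings["input_ids"],
                                      encodings["attention_mask"])]

collator = DataCollatorForLanguageModeling(tokenizer=tokenizer,
                                           mlm_probability=0.15)
args = TrainingArguments(output_dir="cspt-roberta-large",  # hypothetical path
                         num_train_epochs=5,               # epochs: 5
                         learning_rate=3e-5,               # learning rate: 3e-5
                         per_device_train_batch_size=16)   # assumed, not listed

trainer = Trainer(model=model, args=args,
                  train_dataset=train_dataset, data_collator=collator)
trainer.train()
```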

Finetuning on CSQA (see the sketch after this list):

  • epochs: 5
  • maximum sequence length: 80
  • batch size: 8
  • learning rate: 8e-6
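
A minimal sketch of the CSQA finetuning stage, assuming the standard multiple-choice head. The toy question, the AdamW optimizer choice, and loading vanilla roberta-large weights (the real run starts from the commonsense pre-trained checkpoint) are assumptions; the sequence length, learning rate, and epoch count follow the settings above, and a single example stands in for batches of size 8.

```python
# Sketch of CSQA finetuning as 5-way multiple choice.
import torch
from transformers import RobertaTokenizer, RobertaForMultipleChoice

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")
# The real run loads the commonsense pre-trained checkpoint instead.
model = RobertaForMultipleChoice.from_pretrained("roberta-large")

question = "Where would you expect to hear a guitar played for an audience?"
choices = ["music store", "concert hall", "closet", "junk yard", "office"]

# Encode the question paired with each of the five answer choices.
enc = tokenizer([question] * len(choices), choices,
                truncation=True, padding="max_length",
                max_length=80,                      # maximum sequence length: 80
                return_tensors="pt")
inputs = {k: v.unsqueeze(0) for k, v in enc.items()}  # (batch=1, 5 choices, 80)
labels = torch.tensor([1])  # index of the correct choice in this toy example

optimizer = torch.optim.AdamW(model.parameters(), lr=8e-6)  # learning rate: 8e-6
for _ in range(5):                                          # epochs: 5
    optimizer.zero_grad()
    loss = model(**inputs, labels=labels).loss
    loss.backward()
    optimizer.step()
```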

The best model achieves 76.2% accuracy on the CSQA dev set and 69.6% on the test set.
