RoBERTa + CSPT (single model)
We first train a generation model to produce synthetic data from ConceptNet. We then build the commonsense pre-trained model by finetuning the RoBERTa-large model on the synthetic data and the Open Mind Common Sense (OMCS) corpus. The final model is finetuned from this commonsense pre-trained model on CSQA.
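To make the synthetic-data step concrete, here is a minimal sketch of turning ConceptNet triples into training sentences. Note the source uses a trained generation model for this; the hand-written templates below are a simplified stand-in, and the relation names and wording are assumptions for illustration only.

```python
# Simplified stand-in for the paper's generation model: render ConceptNet-style
# (head, relation, tail) triples as sentences via templates.
# The templates themselves are assumptions, not the authors' method.
TEMPLATES = {
    "UsedFor": "{head} is used for {tail}.",
    "AtLocation": "you are likely to find {head} in {tail}.",
    "CapableOf": "{head} can {tail}.",
}

def triple_to_sentence(head, relation, tail):
    """Render one triple as a natural-language training sentence."""
    template = TEMPLATES.get(relation)
    if template is None:
        raise ValueError(f"no template for relation {relation!r}")
    return template.format(head=head, tail=tail)

print(triple_to_sentence("a pen", "UsedFor", "writing"))
# → a pen is used for writing.
```

In the actual pipeline these synthetic sentences, together with the OMCS corpus, form the pre-training data for the commonsense model.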
Commonsense Pre-training:
- epochs: 5
- maximum sequence length: 35
- learning rate: 3e-5
Finetuning on CSQA:
- epochs: 5
- maximum sequence length: 80
- batch size: 8
- learning rate: 8e-6
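The finetuning settings above can be illustrated with a sketch of the CSQA input format: each question is paired with each of its five answer choices, and each pair is truncated to the maximum sequence length. Whitespace tokenization stands in for RoBERTa's BPE tokenizer here, and the exact pairing scheme is an assumption, not the authors' released code.

```python
# Sketch of multiple-choice input construction for CSQA finetuning.
# Whitespace tokens stand in for RoBERTa's subword tokenizer (an assumption).
MAX_SEQ_LEN = 80  # maximum sequence length from the finetuning settings above

def build_choice_inputs(question, choices, max_len=MAX_SEQ_LEN):
    """Return one truncated token sequence per answer choice."""
    inputs = []
    for choice in choices:
        # RoBERTa-style pairing: <s> question </s></s> choice </s>
        tokens = ["<s>"] + question.split() + ["</s>", "</s>"] + choice.split() + ["</s>"]
        inputs.append(tokens[:max_len])
    return inputs

batch = build_choice_inputs(
    "Where would you expect to find a pen?",
    ["desk", "zoo", "ocean", "garage", "cloud"],
)
print(len(batch))
# → 5
```

The model then scores all five sequences and the finetuning objective picks the correct choice; batches of 8 such questions are used per the settings above.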
The best model achieves 76.2% accuracy on the dev set and 69.6% on the test set.