RoBERTa + CSPT (single model)
We first train a generation model to produce synthetic data from ConceptNet. We then build the commonsense pre-trained model by finetuning the RoBERTa-large model on the synthetic data and the Open Mind Common Sense (OMCS) corpus. The final model is finetuned from this commonsense pre-trained model on CSQA.
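To make the synthetic-data step concrete, here is a minimal sketch of turning ConceptNet triples into training sentences. Note the source uses a trained generation model for this; the hand-written templates below are a simplified stand-in, and the relation names and wording are assumptions for illustration only.

```python
# Simplified stand-in for the paper's generation model: render ConceptNet-style
# (head, relation, tail) triples as sentences via templates.
# The templates themselves are assumptions, not the authors' method.
TEMPLATES = {
    "UsedFor": "{head} is used for {tail}.",
    "AtLocation": "you are likely to find {head} in {tail}.",
    "CapableOf": "{head} can {tail}.",
}

def triple_to_sentence(head, relation, tail):
    """Render one triple as a natural-language training sentence."""
    template = TEMPLATES.get(relation)
    if template is None:
        raise ValueError(f"no template for relation {relation!r}")
    return template.format(head=head, tail=tail)

print(triple_to_sentence("a pen", "UsedFor", "writing"))
# → a pen is used for writing.
```

In the actual pipeline these synthetic sentences, together with the OMCS corpus, form the pre-training data for the commonsense model.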
Commonsense Pre-training:
- epochs: 5
- maximum sequence length: 35
- learning rate: 3e-5
Finetuning on CSQA:
- epochs: 5
- maximum sequence length: 80
- batch size: 8
- learning rate: 8e-6
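The finetuning settings above can be illustrated with a sketch of the CSQA input format: each question is paired with each of its five answer choices, and each pair is truncated to the maximum sequence length. Whitespace tokenization stands in for RoBERTa's BPE tokenizer here, and the exact pairing scheme is an assumption, not the authors' released code.

```python
# Sketch of multiple-choice input construction for CSQA finetuning.
# Whitespace tokens stand in for RoBERTa's subword tokenizer (an assumption).
MAX_SEQ_LEN = 80  # maximum sequence length from the finetuning settings above

def build_choice_inputs(question, choices, max_len=MAX_SEQ_LEN):
    """Return one truncated token sequence per answer choice."""
    inputs = []
    for choice in choices:
        # RoBERTa-style pairing: <s> question </s></s> choice </s>
        tokens = ["<s>"] + question.split() + ["</s>", "</s>"] + choice.split() + ["</s>"]
        inputs.append(tokens[:max_len])
    return inputs

batch = build_choice_inputs(
    "Where would you expect to find a pen?",
    ["desk", "zoo", "ocean", "garage", "cloud"],
)
print(len(batch))
# → 5
```

The model then scores all five sequences and the finetuning objective picks the correct choice; batches of 8 such questions are used per the settings above.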
The best model achieves 76.2% accuracy on the dev set and 69.6% on the test set.