Code: https://colab.research.google.com/drive/1vltPI81atzRvlALv4eCvEB0KdFoEaCOb?usp=sharing
Can these scores be improved? YES!
Options include rerunning with more training data, training for more epochs, or using other libraries to set the learning rate and other hyperparameters before training.
- Experimenting with epochs: when I doubled the number of epochs, MuRIL improved only slightly (69.5 -> 69.7 on one task).
The point of a benchmark is to run these models through a reasonable and identical process; you can tweak hyperparameters on any model to improve results.
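
One way to keep the process identical is to freeze a single configuration and pass it unchanged to every model. The sketch below is only an illustration of that idea, not the notebook's code: `RunConfig`, `run_benchmark`, and the `train_and_score` callback are hypothetical names, and the fields shown are assumed examples of settings worth sharing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    # Hypothetical fields; the notebook's actual settings are not stated here.
    epochs: int
    learning_rate: float
    seed: int


def run_benchmark(model_names, config, train_and_score):
    """Call train_and_score(name, config) once per model, always with the same config.

    train_and_score is supplied by the caller (hypothetical): it should fine-tune
    the named model under `config` and return its score.
    """
    return {name: train_and_score(name, config) for name in model_names}
```

Because `RunConfig` is frozen, a per-model tweak cannot slip in by accident; changing a hyperparameter means building a new config and rerunning every model with it.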
The top score in each column is in bold, along with any other score within 1 percentage point of it:
Model | +/- Sentiment | Hate Speech | News Topic |
---|---|---|---|
random | 50.0 | 20.0 | 16.7 |
mBERT | 68.1 | 52.3 | 72.3 |
Bangla-ELECTRA | 69.2 | 31.0 | 82.3 |
Bangla-BERT | **70.4** | 71.8 | **89.2** |
neuralspace-reverie | 68.6 | **73.1** | **88.9** |
Indic-BERT | **71.2** | 42.1 | **88.4** |
MuRIL | 69.5 | **72.1** | **88.9** |
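
The highlighting rule stated above the table (the winner plus anything within 1 percentage point of it) can be sketched as follows. The scores are copied from the table; the function name `highlighted` is made up for this example, and treating a gap of exactly 1.0 as "within 1 point" is an assumption.

```python
# Scores copied from the table above, in percent.
SCORES = {
    "mBERT": {"Sentiment": 68.1, "Hate Speech": 52.3, "News Topic": 72.3},
    "Bangla-ELECTRA": {"Sentiment": 69.2, "Hate Speech": 31.0, "News Topic": 82.3},
    "Bangla-BERT": {"Sentiment": 70.4, "Hate Speech": 71.8, "News Topic": 89.2},
    "neuralspace-reverie": {"Sentiment": 68.6, "Hate Speech": 73.1, "News Topic": 88.9},
    "Indic-BERT": {"Sentiment": 71.2, "Hate Speech": 42.1, "News Topic": 88.4},
    "MuRIL": {"Sentiment": 69.5, "Hate Speech": 72.1, "News Topic": 88.9},
}


def highlighted(scores, task, margin=1.0):
    """Return the models whose score on `task` is within `margin` points of the best."""
    best = max(row[task] for row in scores.values())
    # Round the gap to one decimal so float noise does not affect the comparison.
    # Assumption: a gap of exactly `margin` still counts as "within".
    return [m for m, row in scores.items() if round(best - row[task], 1) <= margin]


for task in ("Sentiment", "Hate Speech", "News Topic"):
    print(task, highlighted(SCORES, task))
```

On News Topic this selects four of the six models, which is a reminder that differences under a point between fine-tuned models may not be meaningful.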
Results on the revised hate speech CSV / split:
Model | Hate Speech v2 |
---|---|
random | 16.7 |
mBERT | 50.9 |
Bangla-ELECTRA | 34.3 |
Bangla-BERT | 69.1 |
neuralspace-reverie | 76.3 |
Indic-BERT | 59.1 |
MuRIL | 62.0 |
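
The `random` rows in both tables are consistent with uniform guessing over k classes, which scores 100/k percent. The sketch below assumes that interpretation; the class counts (2, 5, and 6) are inferred from the baseline values, not stated in the source, and `random_baseline` is a name made up for this example.

```python
def random_baseline(num_classes):
    """Expected score (in percent) of uniform random guessing over num_classes labels."""
    return round(100.0 / num_classes, 1)


# Assumed class counts, inferred from the `random` rows of the tables.
for task, k in [("+/- Sentiment", 2), ("Hate Speech", 5),
                ("News Topic", 6), ("Hate Speech v2", 6)]:
    print(task, random_baseline(k))
```

Under this reading, the move from 20.0 to 16.7 on the hate speech task suggests the revised split has one more class than the original, so the v1 and v2 columns are not directly comparable.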