@mapmeld
Last active Jan 4, 2021
Bangla Benchmark runs

Code: https://colab.research.google.com/drive/1vltPI81atzRvlALv4eCvEB0KdFoEaCOb?usp=sharing

Can these scores be improved? YES!

Options include rerunning with more training data, training for more epochs, or using other libraries to set the learning rate and other hyperparameters before training.

  • Experimenting with epochs: when I doubled the number of epochs, MuRIL improved only slightly (69.5 -> 69.7 on one task)

The point of a benchmark is to run these models through a reasonable and identical process; you can tweak hyperparameters on any model to improve results.
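One way to keep the process identical while still experimenting is to generate every run from a single grid, so each model sees exactly the same settings. The sketch below only builds the run configurations; it does not train anything. The model labels are taken from the results table, while the epoch counts, learning rates, and seed are illustrative assumptions, not the values used in the notebook.

```python
from itertools import product

# Labels copied from the results table; the actual checkpoint IDs
# used in the notebook may differ.
MODELS = [
    "mBERT",
    "Bangla-ELECTRA",
    "Bangla-BERT",
    "neuralspace-reverie",
    "Indic-BERT",
    "MuRIL",
]


def build_runs(models, epochs_options, learning_rates, seed=42):
    """Return one config per (model, epochs, learning rate) combination.

    Every model gets the same set of settings and the same seed, so any
    difference in score comes from the model, not from the process.
    """
    return [
        {"model": m, "epochs": e, "learning_rate": lr, "seed": seed}
        for m, e, lr in product(models, epochs_options, learning_rates)
    ]


# Assumed values: a baseline epoch count and its double, two learning rates.
runs = build_runs(MODELS, epochs_options=[3, 6], learning_rates=[2e-5, 4e-5])
print(len(runs))  # 6 models x 2 epoch settings x 2 learning rates
```

Each dictionary can then be passed to whatever training function the notebook uses; comparing models is only fair within the same (epochs, learning rate, seed) setting.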

Bolding #1 score and other models within 1 percentage point of winner:

| Model | +/- Sentiment | Hate Speech | News Topic |
|---|---|---|---|
| random | 50.0 | 20.0 | 16.7 |
| mBERT | 68.1 | 52.3 | 72.3 |
| Bangla-ELECTRA | 69.2 | 31.0 | 82.3 |
| Bangla-BERT | 70.4 | 71.8 | 89.2 |
| neuralspace-reverie | 68.6 | 73.1 | 88.9 |
| Indic-BERT | 71.2 | 42.1 | 88.4 |
| MuRIL | 69.5 | 72.1 | 88.9 |
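The highlighting rule (top score plus anything within 1 percentage point of it) can be checked with a short script. This is a minimal sketch using the numbers from the table; the dictionary keys and the `highlighted` helper are names made up for illustration, and treating a score exactly 1.0 point behind the winner as "within" is an assumption.

```python
# Scores copied from the table above (random baseline omitted).
SCORES = {
    "mBERT": {"sentiment": 68.1, "hate_speech": 52.3, "news_topic": 72.3},
    "Bangla-ELECTRA": {"sentiment": 69.2, "hate_speech": 31.0, "news_topic": 82.3},
    "Bangla-BERT": {"sentiment": 70.4, "hate_speech": 71.8, "news_topic": 89.2},
    "neuralspace-reverie": {"sentiment": 68.6, "hate_speech": 73.1, "news_topic": 88.9},
    "Indic-BERT": {"sentiment": 71.2, "hate_speech": 42.1, "news_topic": 88.4},
    "MuRIL": {"sentiment": 69.5, "hate_speech": 72.1, "news_topic": 88.9},
}


def highlighted(scores, task, margin=1.0):
    """Return the models whose score is within `margin` points of the best."""
    best = max(row[task] for row in scores.values())
    # The small epsilon keeps an exact 1.0-point gap inclusive despite
    # floating-point rounding (an assumption about the intended rule).
    return sorted(
        name for name, row in scores.items() if best - row[task] <= margin + 1e-9
    )


for task in ("sentiment", "hate_speech", "news_topic"):
    print(task, highlighted(SCORES, task))
```

On News Topic four models land inside the 1-point band, which suggests the differences among them are small relative to the run-to-run variation reported for extra epochs.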

Revised hate speech csv / split

| Model | Hate Speech v2 |
|---|---|
| random | 16.7 |
| mBERT | 50.9 |
| Bangla-ELECTRA | 34.3 |
| Bangla-BERT | 69.1 |
| neuralspace-reverie | 76.3 |
| Indic-BERT | 59.1 |
| MuRIL | 62.0 |
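To see how much the revised csv / split moved each model, the two hate speech columns can be subtracted directly. The sketch below uses only the numbers from the two tables. The `implied_classes` helper rests on two assumptions that the source does not state: that the scores are accuracy percentages, and that the `random` row is uniform guessing over the classes.

```python
# Hate Speech scores from the first table (v1) and the revised table (v2).
V1 = {
    "random": 20.0, "mBERT": 52.3, "Bangla-ELECTRA": 31.0, "Bangla-BERT": 71.8,
    "neuralspace-reverie": 73.1, "Indic-BERT": 42.1, "MuRIL": 72.1,
}
V2 = {
    "random": 16.7, "mBERT": 50.9, "Bangla-ELECTRA": 34.3, "Bangla-BERT": 69.1,
    "neuralspace-reverie": 76.3, "Indic-BERT": 59.1, "MuRIL": 62.0,
}


def deltas(v1, v2):
    """Change in score (v2 minus v1) per model, in percentage points."""
    return {name: round(v2[name] - v1[name], 1) for name in v1}


def implied_classes(random_accuracy):
    """Class count implied by a uniform-guess accuracy (assumption, see above)."""
    return round(100 / random_accuracy)


for name, change in deltas(V1, V2).items():
    print(f"{name:20s} {change:+.1f}")

print(implied_classes(V1["random"]), implied_classes(V2["random"]))
```

Under those assumptions the random baseline changes along with the data, so the v1 and v2 columns may not be measured on the same label set and the per-model differences should be read with that in mind.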