Can these scores be improved? YES!
Most likely, by rerunning with more training data or more training epochs, or by using other libraries to choose a learning rate and other hyperparameters before training.
- Experimenting with epochs: when I doubled the number of epochs, MuRIL improved only slightly (69.5 -> 69.7 on one task).
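To illustrate the kind of tuning mentioned above, here is a minimal sketch of a grid search over learning rate and epoch count. A plain grid search stands in for whatever tuning library you might use. It assumes a hypothetical `train_and_eval(lr, epochs)` callback that fine-tunes a fresh copy of the model and returns a validation score. The toy scoring function is invented for demonstration only and says nothing about real MuRIL results.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Tuple


def grid_search(
    train_and_eval: Callable[[float, int], float],
    learning_rates: Iterable[float],
    epoch_counts: Iterable[int],
) -> Tuple[Dict[str, float], float]:
    """Try every (learning rate, epochs) pair; return the best config and its score.

    `train_and_eval` is a hypothetical callback: it should train a fresh model
    with the given settings and return a validation score (higher is better).
    """
    best_config: Dict[str, float] = {}
    best_score = float("-inf")
    epoch_counts = list(epoch_counts)  # allow reuse across learning rates
    for lr, epochs in product(learning_rates, epoch_counts):
        score = train_and_eval(lr, epochs)
        if score > best_score:
            best_config = {"learning_rate": lr, "epochs": epochs}
            best_score = score
    return best_config, best_score


def toy_train_and_eval(lr: float, epochs: int) -> float:
    # Hypothetical stand-in for a real fine-tuning run; it peaks at lr=3e-5, epochs=4.
    return -abs(lr - 3e-5) * 1e5 - abs(epochs - 4)


if __name__ == "__main__":
    best, score = grid_search(toy_train_and_eval, [1e-5, 3e-5, 5e-5], [2, 4, 8])
    print(best, score)
```

Every extra grid point costs a full training run, which is one reason a benchmark may reasonably skip this step and use one fixed setting for all models.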
The point of a benchmark is to run every model through the same reasonable process; you can tweak hyperparameters on any model to improve its results.