This model is BERT-base fine-tuned on the GLUE SST-2 dataset, with unstructured magnitude pruning, quantization, and distillation applied jointly during fine-tuning. It achieves the following results on the evaluation set:
- Torch accuracy: 0.9128
- OpenVINO IR accuracy: 0.9128
- Sparsity in transformer block linear layers: 0.80
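
A sparsity of 0.80 means that about 80% of the weights in the transformer blocks' linear layers are zero. As a rough illustration of what unstructured magnitude pruning does to a single weight matrix, here is a minimal sketch. The function names `magnitude_prune` and `sparsity_of` are hypothetical, and plain Python lists stand in for tensors. This is not the pipeline used to train this model, which applied pruning together with quantization and distillation during fine-tuning.

```python
import random


def magnitude_prune(weights, sparsity):
    """Return a copy of a 2-D weight matrix with the smallest-magnitude
    entries set to zero (unstructured: any individual weight may be pruned).

    weights:  list of equal-length rows of floats
    sparsity: target fraction of zeroed weights, in [0, 1]
    """
    n_cols = len(weights[0])
    flat = [w for row in weights for w in row]
    k = round(len(flat) * sparsity)  # number of weights to zero out

    # Rank positions by absolute value and zero the k smallest.
    order = sorted(range(len(flat)), key=lambda i: abs(flat[i]))
    for i in order[:k]:
        flat[i] = 0.0

    # Restore the original row layout.
    return [flat[r * n_cols:(r + 1) * n_cols] for r in range(len(weights))]


def sparsity_of(weights):
    """Fraction of entries in the matrix that are exactly zero."""
    flat = [w for row in weights for w in row]
    return sum(1 for w in flat if w == 0.0) / len(flat)


if __name__ == "__main__":
    random.seed(0)
    # Hypothetical 10x10 "linear layer" with Gaussian weights.
    layer = [[random.gauss(0.0, 1.0) for _ in range(10)] for _ in range(10)]
    pruned = magnitude_prune(layer, sparsity=0.80)
    print(f"sparsity after pruning: {sparsity_of(pruned):.2f}")
```

In practice the zeroing is applied to the framework's tensors and the mask is maintained throughout training, so the remaining weights can adapt; the sketch only shows the selection rule and how the sparsity figure is computed.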