- https://huzi96.github.io/compression-bench.html
- https://paperswithcode.com/paper/quantisation-and-pruning-for-neural-network
- https://github.com/yoshitomo-matsubara/torchdistill
- https://arxiv.org/pdf/2006.03669.pdf
- https://web.stanford.edu/~jurafsky/slp3/ed3book_jan122022.pdf
Probably useful, but not directly relevant:
- https://paperswithcode.com/task/neural-network-compression
- https://sites.google.com/view/vnn20?pli=1
- https://arxiv.org/pdf/2210.14991.pdf
-
A Survey of Neural Network Compression [arxiv]
Transformer-based architectures that are commonly used in NLP and CV have millions of parameters for each fully-connected layer.
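For a sense of scale, a back-of-the-envelope sketch using BERT-base's published dimensions (hidden size 768, feed-forward size 3072) shows how a single fully-connected block already accounts for millions of parameters:

```python
# Parameter count of one fully-connected (dense) layer: weight matrix + bias vector.
def fc_params(in_features: int, out_features: int) -> int:
    return in_features * out_features + out_features

# BERT-base feed-forward block: 768 -> 3072 -> 768 (published dimensions).
up = fc_params(768, 3072)    # 2,362,368 parameters
down = fc_params(3072, 768)  # 2,360,064 parameters
print(f"One transformer FFN block: {up + down:,} parameters")  # ~4.7M
```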
-
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks [arxiv]
we use neural architecture search to design a new baseline network and scale it up to obtain a family of models, called EfficientNets, which achieve much better accuracy and efficiency than previous ConvNets. In particular, our EfficientNet-B7 achieves state-of-the-art 84.3% top-1 accuracy on ImageNet, while being 8.4x smaller and 6.1x faster on inference than the best existing ConvNet. Our EfficientNets also transfer well and achieve state-of-the-art accuracy on CIFAR-100 (91.7%), Flowers (98.8%), and 3 other transfer learning datasets, with an order of magnitude fewer parameters.
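The scaling rule behind that family is compound scaling: depth, width, and input resolution are scaled jointly by a single coefficient phi. A minimal sketch, using the alpha/beta/gamma constants the paper reports from its grid search on B0 (the released models round the resulting resolutions to hand-picked values):

```python
# Compound scaling from the EfficientNet paper: for a coefficient phi,
# depth scales by alpha**phi, width by beta**phi, resolution by gamma**phi,
# under the constraint alpha * beta**2 * gamma**2 ≈ 2, so FLOPs grow ~2**phi.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # constants reported for EfficientNet-B0

def compound_scale(phi: int, base_resolution: int = 224):
    depth_mult = ALPHA ** phi                       # multiplier on layer count
    width_mult = BETA ** phi                        # multiplier on channel count
    resolution = round(base_resolution * GAMMA ** phi)  # input image size
    return depth_mult, width_mult, resolution

for phi in range(4):  # B0 through roughly B3
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution ~{r}")
```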
-
Source code: GitHub (tensorflow/tpu, models/official/efficientnet; the Lite variants are linked below)
-
"Smaller Networks" (~5M parameters): https://github.com/tensorflow/tpu/blob/master/models/official/efficientnet/lite/README.md