Speeding up deep learning

Speed Improvements and Compression for Deep Learning

Notes

  • Quantized Convolutional Neural Networks for Mobile Devices CVPR 2016
  • Deep SimNets CVPR 2016
  • Quantized Convolutional Neural Networks for Mobile Devices (2016)
    • Quantization is applied to both the convolutional and the fully connected layers. The advantage of this method is that it speeds up the convolutional layers, which matters because they are by far the most computationally expensive layers in a CNN; the disadvantage is a small loss in accuracy. See the product-quantization sketch after this list.
  • SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size (2016)
    • Uses 1x1 convolutions to reduce the number of feature maps: if a layer produces 10 feature maps and is followed by N 1x1 convolutions with N < 10, the result is a smaller set of feature maps. They propose a mini-architecture (the Fire module) of a few 1x1 convolutions (squeeze) followed by a mix of 1x1 and 3x3 convolutions (expand), repeated several times through the network; see the sketch after this list. Furthermore, they use Song Han's compression methods (pruning + quantization) to reduce the number of parameters, and they show that compressing, de-compressing, and re-training from the de-compressed model actually improves the network's accuracy.
  • Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (2016)
    • Very nice paper with an easy-to-understand technique for reducing the number and size of parameters in a CNN. There are three main steps. First, iterative pruning and re-training reduces the number of connections. Then, quantization of the weights reduces each parameter to a few bits. Finally, the parameters are encoded with Huffman coding, which assigns more bits to the weight values and indices that occur less often and fewer bits to the most frequent ones. See the pruning/quantization sketch after this list.
  • Deep Fried Convnets (2015)
    • Presents an approach to reduce both the storage and the computational cost of the fully connected layers in neural networks. The reduction is obtained by re-parameterizing the weight matrix W connecting layer l to layer l+1. If l has d activations and l+1 has n activations, the usual computational and storage costs are O(nd); with this approach the storage cost drops to O(n) and the computational cost to O(n log d). The re-parameterization, called the Adaptive Fastfood Transform, exploits the fast Hadamard transform to replace the weight matrix W with a product of several structured matrices that can be learned with standard back-propagation. See the Fastfood sketch after this list.
  • Compressing Deep Convolutional Networks using Vector Quantization
    • Uses vector quantization to reduce the size of the network by a factor of 16 to 24, at the cost of about 1% loss in classification accuracy.
  • Predicting Parameters in Deep Learning (2014)
    • Factorizes the weight matrix as W = UV, where the columns of U are basis functions that are pre-learned or chosen a priori and V holds the coefficients, so only a small fraction of the parameters has to be learned while the rest are predicted. See the low-rank sketch after this list.
  • Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation (2014)
    • They use a simpler approach than Han et al. (2016), achieving smaller compression and speed-up while losing about 1% of the final accuracy.
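
Code sketches

A minimal NumPy sketch of the Q-CNN-style product quantization idea (Quantized Convolutional Neural Networks for Mobile Devices), shown on a fully connected layer for simplicity: weight sub-vectors are clustered into per-sub-space codebooks, and the layer output is approximated by summing precomputed codeword/input inner products. All sizes, the plain k-means loop, and the variable names are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 512, 128           # layer dimensions (illustrative)
sub = 8                          # sub-vector length
n_sub = d_in // sub
k = 16                           # codewords per sub-space

W = rng.normal(size=(d_out, d_in)).astype(np.float32)
x = rng.normal(size=d_in).astype(np.float32)

# Build one codebook per sub-space with a few iterations of plain k-means.
codebooks = np.empty((n_sub, k, sub), dtype=np.float32)
assignments = np.empty((n_sub, d_out), dtype=np.int64)
for s in range(n_sub):
    block = W[:, s * sub:(s + 1) * sub]                   # (d_out, sub)
    centers = block[rng.choice(d_out, k, replace=False)]  # init from data
    for _ in range(10):
        dists = ((block[:, None, :] - centers[None]) ** 2).sum(-1)
        idx = dists.argmin(1)
        for c in range(k):
            if (idx == c).any():
                centers[c] = block[idx == c].mean(0)
    codebooks[s], assignments[s] = centers, idx

# Inference: one small lookup table per sub-space, then gather + sum.
lut = np.einsum('skc,sc->sk', codebooks, x.reshape(n_sub, sub))  # (n_sub, k)
y_approx = lut[np.arange(n_sub)[:, None], assignments].sum(0)    # (d_out,)

print(np.abs(y_approx - W @ x).mean())   # approximation error vs. full layer
```

In the paper the same lookup-table trick is also applied to convolutional layers by quantizing filter sub-vectors along the channel dimension.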
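
A sketch of the SqueezeNet Fire module described above, written in PyTorch. The channel counts below follow one of the paper's configurations (96 squeezed to 16, expanded back to 64 + 64), but this is only the building block, not the full SqueezeNet.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """SqueezeNet-style Fire module: a 1x1 "squeeze" layer shrinks the channel
    count, then parallel 1x1 and 3x3 "expand" layers restore it."""
    def __init__(self, in_ch, squeeze_ch, expand_ch):
        super().__init__()
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        return torch.cat([self.relu(self.expand1x1(x)),
                          self.relu(self.expand3x3(x))], dim=1)

# Example: 96 input channels squeezed to 16, expanded back to 64 + 64 = 128.
module = Fire(96, 16, 64)
out = module(torch.randn(1, 96, 55, 55))
print(out.shape)   # torch.Size([1, 128, 55, 55])
```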
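
A minimal NumPy sketch of the first two steps of Deep Compression applied to a single weight matrix: magnitude pruning followed by k-means weight sharing. The Huffman step is only hinted at through the codeword frequencies, and the pruning threshold, cluster count, and the plain 1-D k-means loop are illustrative assumptions.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256)).astype(np.float32)

# 1) Pruning: zero out small-magnitude weights (re-training would follow).
threshold = np.quantile(np.abs(W), 0.9)        # keep the largest 10%
mask = np.abs(W) >= threshold
W_pruned = W * mask

# 2) Quantization: cluster the surviving weights into a small shared codebook.
values = W_pruned[mask]
k = 16                                          # 4-bit codebook
centers = np.linspace(values.min(), values.max(), k)
for _ in range(20):                             # plain 1-D k-means
    idx = np.abs(values[:, None] - centers[None]).argmin(1)
    for c in range(k):
        if (idx == c).any():
            centers[c] = values[idx == c].mean()

# 3) Huffman coding would then use these index frequencies: frequent codewords
#    get short bit strings, rare ones get long bit strings.
print(Counter(idx).most_common(3))
```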
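
A minimal NumPy sketch of a (non-adaptive) Fastfood-style reparameterization of one square fully connected layer, as in Deep Fried Convnets: the dense weight matrix is replaced by S H G P H B, where B, G, S are diagonal, P is a permutation, and H is the Walsh-Hadamard matrix. In the paper the diagonal entries are learned (the "adaptive" part) and H is applied with a fast O(d log d) transform; here H is built as a dense matrix and all values are random, so treat this purely as a structural illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 256                                   # must be a power of two

H = np.array([[1.0]])
while H.shape[0] < d:                     # Sylvester construction of H
    H = np.kron(H, np.array([[1.0, 1.0], [1.0, -1.0]]))

B = rng.choice([-1.0, 1.0], size=d)       # random sign flips (diagonal)
G = rng.normal(size=d)                    # Gaussian scaling (diagonal)
P = rng.permutation(d)                    # random permutation
S = rng.normal(size=d)                    # output scaling (learned in the paper)

def fastfood(x):
    v = B * x
    v = H @ v
    v = v[P]
    v = G * v
    v = H @ v
    return S * v / d                      # rough rescaling of the two unnormalised H products

x = rng.normal(size=d)
print(fastfood(x)[:4])                    # acts like a dense d x d layer,
                                          # but stores only three d-vectors plus P
```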
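
A minimal sketch of the W = UV factorization from Predicting Parameters in Deep Learning: store a basis U and a small coefficient matrix V instead of the full weight matrix. The rank and dimensions are illustrative assumptions; in the paper U acts as a dictionary that can be pre-learned or chosen a priori, and only a small set of parameters is learned directly.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, rank = 512, 512, 32

U = rng.normal(size=(d_out, rank)) / np.sqrt(rank)   # basis (fixed or pre-learned)
V = rng.normal(size=(rank, d_in))                    # learned coefficients

x = rng.normal(size=d_in)
y = U @ (V @ x)        # layer output without ever forming the full W = U V

full = d_out * d_in
factored = d_out * rank + rank * d_in
print(f"parameters: {full} -> {factored} ({full / factored:.1f}x fewer)")
```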