
@i-amgeek
Last active November 11, 2019 10:24

Making sense of Network Pruning

Why are we even talking about this?

Well, because until recently, most AI researchers didn't talk about it. The majority were focused on squeezing out another 1% of ImageNet accuracy, even if that made the model 3x larger (which has its own advantages). But now we have accurate models that weigh in at gigabytes, and we can't deploy them (especially problematic for edge devices).

Do we have some direction to solve this issue?🙄

Umm.. Yes. While designing models, one thing researchers found particularly interesting is that most of the weights in neural networks are redundant. They don't contribute to accuracy (sometimes they even decrease it).

So, how does pruning leverage this observation?

Pruning steps

In pruning, we rank the neurons in the network according to how much they contribute. The ranking can be based on the L1/L2 norm of a neuron's weights, its mean activation, the number of times the neuron wasn't zero on some validation set, and other creative methods. The simplest method is ranking by the absolute values of the weights. We then remove the lowest-ranking neurons from the network, resulting in a smaller and faster network.
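The simplest of these rankings, magnitude pruning, can be sketched in a few lines of NumPy. This is an illustrative sketch, not the API of any framework; the function name and the 90% sparsity level are my own choices:

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.9):
    """Zero out the smallest-magnitude weights so that `sparsity`
    fraction of entries become zero (unstructured pruning)."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)          # how many weights to remove
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]  # k-th smallest magnitude
    mask = np.abs(weights) > threshold     # keep only the strong weights
    return weights * mask

rng = np.random.default_rng(0)
w = rng.normal(size=(256, 256))
pruned = magnitude_prune(w, sparsity=0.9)  # ~90% of entries are now zero
```

In practice you would prune gradually and fine-tune the network between pruning steps to recover accuracy, rather than pruning 90% in one shot.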

Does it work?

You might not believe it, but it works exceptionally well.🔥 Even the simplest methods can remove 90% of connections. More attentive approaches can remove up to 95% of weights without any significant accuracy loss (sometimes even with a gain). That's crazy, I know.

Why ain't everybody using it?

Well, if you look into the implementations of major deep learning frameworks, you will find they make heavy use of GEMM operations through BLAS libraries. These libraries are very efficient at dense matrix multiplication. But when we start removing weights by pruning, the matrices become sparse. Even with 90% fewer calculations to do, sparse matrices can take more time for matrix multiplication than dense ones.
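A toy sketch of why: a dense GEMM performs the same multiply-adds whether the entries are zero or not, while a sparse (CSR-style) product skips the zeros at the cost of indirect, irregular memory access. The pure-Python CSR code below is for illustration only, not how real sparse libraries are implemented:

```python
import numpy as np

def csr_from_dense(m):
    """Build a minimal CSR representation: values, column indices, row pointers."""
    values, col_idx, row_ptr = [], [], [0]
    for row in m:
        nz = np.nonzero(row)[0]
        values.extend(row[nz])
        col_idx.extend(nz)
        row_ptr.append(len(values))
    return np.array(values), np.array(col_idx, dtype=int), np.array(row_ptr)

def csr_matvec(values, col_idx, row_ptr, x):
    """Sparse mat-vec: only touches the non-zeros, via indirect loads."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        start, end = row_ptr[i], row_ptr[i + 1]
        y[i] = values[start:end] @ x[col_idx[start:end]]
    return y

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
w[np.abs(w) < 1.5] = 0.0            # roughly 85-90% sparse, as after pruning
x = rng.normal(size=64)

vals, cols, ptrs = csr_from_dense(w)
y_dense = w @ x                     # BLAS-style dense product: all 64*64 MACs
y_sparse = csr_matvec(vals, cols, ptrs, x)  # only the non-zero MACs
```

Both give the same answer, but the dense path enjoys contiguous memory access and vectorization, which is why unstructured sparsity often fails to translate into wall-clock speedup.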

Can this problem be solved?

To overcome this problem, research has split into two main directions.

  • Creating efficient sparse algebra libraries.
  • Structured Pruning - Pruning whole layers, filters or channels instead of particular weights.

Which approach is more practical?

Actually, I haven't seen much progress on the 1st approach, but a lot of papers are being published improving the 2nd. For the time being, pruning whole filters and channels is the better option if you want to compress your models.
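Structured pruning of a convolutional layer can be sketched as dropping the output filters with the smallest L1 norms. The result stays a smaller *dense* tensor, so ordinary GEMM still applies. Again, the function name and the 50% keep ratio below are illustrative assumptions, not any framework's API:

```python
import numpy as np

def prune_filters_l1(conv_w, keep_ratio=0.5):
    """Structured pruning: keep the output filters with the largest L1 norms.
    conv_w has shape (out_channels, in_channels, kH, kW); the returned
    tensor is smaller but still dense."""
    norms = np.abs(conv_w).sum(axis=(1, 2, 3))   # L1 norm per filter
    n_keep = max(1, int(keep_ratio * conv_w.shape[0]))
    keep = np.sort(np.argsort(norms)[-n_keep:])  # indices of strongest filters
    return conv_w[keep], keep

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32, 3, 3))              # 64 filters, 32 input channels
pruned, kept = prune_filters_l1(w, keep_ratio=0.5)
# pruned has shape (32, 32, 3, 3)
```

Note that in a real network you must also slice the *input* channels of the next layer (and any batch-norm parameters) to match the kept filters, and then fine-tune.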

Any tools to try it myself?

Although almost all big players are working on this, I find Tencent's PocketFlow and Intel's Distiller are the only working frameworks (to my knowledge; let me know if you know of better tools).

Where can I read further about this?😁

Here are some good links:

https://jacobgil.github.io/deeplearning/pruning-deep-learning

https://github.com/Eric-mingjie/rethinking-network-pruning

https://www.youtube.com/watch?v=s7DqRZVvRiQ

https://eng.uber.com/deconstructing-lottery-tickets/

https://github.com/he-y/Awesome-Pruning

https://github.com/memoiry/Awesome-model-compression-and-acceleration
