FrancescoSaverioZuppichini/blog.md Secret

## blog.md

      
    Raw
  

              blog.md
            
          
    Pytorch: how and when to use Module, Sequential, ModuleList and ModuleDict

Effective way to share, reuse and break down the complexity of your models

Updated at Pytorch 4.1
You can find the code here
Pytorch is an open source deep learning frameworks that provide a smart way to create ML models. Even if the documentation is well made, I still find that most people still are able to write bad and not organized PyTorch code.
Today, we are going to see how to use the three main building blocks of PyTorch: Module, Sequential and ModuleList. We are going to start with an example and iteratively we will make it better.
All these four classes are contained into torch.nn
https://gist.github.com/4065fd8153a85aaf018ad196dda5f0cf
Module: the main building block

The Module is the main building block, it defines the base class for all neural network and you MUST subclass it.
Let's create a classic CNN classifier as example:
https://gist.github.com/6a4db72b27d5d6aa2519ef9917b3c821
https://gist.github.com/29e4c30718ef87969e6d7a28e78f8f01
MyCNNClassifier(
  (conv1): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (bn1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (conv2): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (bn2): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (fc1): Linear(in_features=25088, out_features=1024, bias=True)
  (fc2): Linear(in_features=1024, out_features=10, bias=True)
)

This is a very simple classifier with an encoding part that uses two layers with 3x3 convs + batchnorm + relu and a decoding part with two linear layers. If you are not new to PyTorch you may have seen this type of coding before, but there are two problems.
If we want to add a layer we have to again write lots of code in the __init__ and in the forward function. Also, if we have some common block that we want to use in another model, e.g. the 3x3 conv + batchnorm + relu, we have to write it again.
Sequential: stack and merge layers

Sequential is a container of Modules that can be stacked together and run at the same time.
You can notice that we have to store into self everything. We can use Sequential to improve our code.
https://gist.github.com/008c3cce811f1110b0db7feecac0cd93
https://gist.github.com/20ca227df5e962f066d0afd793b5a72d
MyCNNClassifier(
  (conv_block1): Sequential(
    (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (conv_block2): Sequential(
    (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (decoder): Sequential(
    (0): Linear(in_features=25088, out_features=1024, bias=True)
    (1): Sigmoid()
    (2): Linear(in_features=1024, out_features=10, bias=True)
  )
)

Much Better uhu?
Did you notice that conv_block1 and conv_block2 looks almost the same? We could create a function that reteurns a nn.Sequential to even simplify the code!
https://gist.github.com/52f470958f774842686b89a65b40966e
Then we can just call this function in our Module
https://gist.github.com/c1521dc10a5199ac13913405e1b4520c
https://gist.github.com/ac3fba480d711ca01dbf2bfd2e65180a
MyCNNClassifier(
  (conv_block1): Sequential(
    (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (conv_block2): Sequential(
    (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): ReLU()
  )
  (decoder): Sequential(
    (0): Linear(in_features=25088, out_features=1024, bias=True)
    (1): Sigmoid()
    (2): Linear(in_features=1024, out_features=10, bias=True)
  )
)

Even cleaner! Still conv_block1 and conv_block2 are almost the same! We can merge them using nn.Sequential
https://gist.github.com/bccd56668c88ec62a0705978d381c10e
https://gist.github.com/79889670900342ec6de59f94ee800dfb
MyCNNClassifier(
  (encoder): Sequential(
    (0): Sequential(
      (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
    )
    (1): Sequential(
      (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
    )
  )
  (decoder): Sequential(
    (0): Linear(in_features=25088, out_features=1024, bias=True)
    (1): Sigmoid()
    (2): Linear(in_features=1024, out_features=10, bias=True)
  )
)

self.encoder now holds booth conv_block. We have decoupled logic for our model and make it easier to read and reuse. Our conv_block function can be imported and used in another model.
Dynamic Sequential: create multiple layers at once

What if we can to add a new layers in self.encoder, hardcoded them is not convinient:
https://gist.github.com/d1d594fdfbbce72a7507fc4668cc9aaa
Would it be nice if we can define the sizes as an array and automatically create all the layers without writing each one of them? Fortunately we can create an array and pass it to Sequential
https://gist.github.com/8c4f3c06b304ca6eb2739987bed236f7
https://gist.github.com/e0b6a85ee20e7a68e3ace8369c07e31b
MyCNNClassifier(
  (encoder): Sequential(
    (0): Sequential(
      (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
    )
    (1): Sequential(
      (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
    )
  )
  (decoder): Sequential(
    (0): Linear(in_features=25088, out_features=1024, bias=True)
    (1): Sigmoid()
    (2): Linear(in_features=1024, out_features=10, bias=True)
  )
)

Let's break it down. We created an array self.enc_sizes that holds the sizes of our encoder. Then we create an array conv_blocks by iterating the sizes. Since we have to give booth a in size and an outsize for each layer we ziped the size'array with itself by shifting it by one.
Just to be clear, take a look at the following example:
https://gist.github.com/18aa77e223bcd230a3738962033d56c0
1 32
32 64

Then, since Sequential does not accept a list, we decompose it by using the * operator.
Tada! Now if we just want to add a size, we can easily add a new number to the list. It is a common practice to make the size a parameter.
https://gist.github.com/45d449493b04dffae64f334965824d94
https://gist.github.com/99ad0ab21bec7917a046f358870de348
MyCNNClassifier(
  (encoder): Sequential(
    (0): Sequential(
      (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
    )
    (1): Sequential(
      (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
    )
    (2): Sequential(
      (0): Conv2d(64, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
    )
  )
  (decoder): Sequential(
    (0): Linear(in_features=25088, out_features=1024, bias=True)
    (1): Sigmoid()
    (2): Linear(in_features=1024, out_features=10, bias=True)
  )
)

We can do the same for the decoder part
https://gist.github.com/73f690e7514088abb8c16168977a0fd9
https://gist.github.com/b597cbed42b2ca9bc03adf7fd14f2f5f
MyCNNClassifier(
  (encoder): Sequential(
    (0): Sequential(
      (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
    )
    (1): Sequential(
      (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (2): ReLU()
    )
  )
  (decoder): Sequential(
    (0): Sequential(
      (0): Linear(in_features=25088, out_features=1024, bias=True)
      (1): Sigmoid()
    )
    (1): Sequential(
      (0): Linear(in_features=1024, out_features=512, bias=True)
      (1): Sigmoid()
    )
  )
  (last): Linear(in_features=512, out_features=10, bias=True)
)

We followed the same pattern, we create a new block for the decoding part, linear + sigmoid, and we pass an array with the sizes. We had to add a self.last since we do not want to activate the output
Now, we can even break down our model in two! Encoder + Decoder
https://gist.github.com/0ba9c11ce6a44a183914386d9299e59c
https://gist.github.com/12b66f62a7e5f7532d9064b3b9862990
MyCNNClassifier(
  (encoder): MyEncoder(
    (conv_blokcs): Sequential(
      (0): Sequential(
        (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU()
      )
      (1): Sequential(
        (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): ReLU()
      )
    )
  )
  (decoder): MyDecoder(
    (dec_blocks): Sequential(
      (0): Sequential(
        (0): Linear(in_features=1024, out_features=512, bias=True)
        (1): Sigmoid()
      )
    )
    (last): Linear(in_features=512, out_features=10, bias=True)
  )
)

Be aware that MyEncoder and MyDecoder could also be functions that returns a nn.Sequential. I prefer to use the first pattern for models and the second for building blocks.
By diving our module into submodules it is easier to share the code, debug it and test it.
ModuleList : when we need to iterate

ModuleList allows you to store Module as a list. It can be useful when you need to iterate through layer and store/use some information, like in U-net.
The main difference between Sequential is that ModuleList have not a forward method so the inner layers are not connected. Assuming we need each output of each layer in the decoder, we can store it by:
https://gist.github.com/abc4a138a23f20c2a84052340e960275
https://gist.github.com/f2cd6f79a875c32c47ff75fd83ddc701
torch.Size([4, 16])
torch.Size([4, 32])


[None, None]

ModuleDict: when we need to choose

What if we want to switch to LearkyRelu in our conv_block? We can use ModuleDict to create a dictionary of Module and dynamically switch Module when we want
https://gist.github.com/8b13a783083b1b8d3b38900d8a53e48b
https://gist.github.com/6d1c207f0f733df9b76d8a48fb9aecfe
Sequential(
  (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): LeakyReLU(negative_slope=0.01)
)
Sequential(
  (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU()
)

Final implementation

Let's wrap it up everything!
https://gist.github.com/c2aa8f4abc174d3718daedbb7a2dc344
https://gist.github.com/ace0f186ca4d0ebebbb7e385ca0f9b4e
MyCNNClassifier(
  (encoder): MyEncoder(
    (conv_blokcs): Sequential(
      (0): Sequential(
        (0): Conv2d(1, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): BatchNorm2d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): LeakyReLU(negative_slope=0.01)
      )
      (1): Sequential(
        (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
        (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
        (2): LeakyReLU(negative_slope=0.01)
      )
    )
  )
  (decoder): MyDecoder(
    (dec_blocks): Sequential(
      (0): Sequential(
        (0): Linear(in_features=1024, out_features=512, bias=True)
        (1): Sigmoid()
      )
    )
    (last): Linear(in_features=512, out_features=10, bias=True)
  )
)

Conclusion

So, in summary.

Use Module when you have a big block compose of multiple smaller blocks
Use Sequential when you want to create a small block from layers
Use ModuleList when you need to iterate through some layers or building blocks and do something
Use ModuleDict when you need to parametise some blocks of your model, for example an activation function

That's all folks!
Thank you for reading