@szagoruyko
Last active June 12, 2017 03:20
Torch7-FAQ

  1. Q: I ran the installation commands from the website, typed th, and nothing happens
    A: Add . ~/torch/install/bin/torch-activate (note the leading dot: the script must be sourced) to your .bash_profile, then open a new shell

  2. Q: When I run image.display(image.lena()) there is an error
    A: Start torch with qlua instead of th, since qlua preloads the Qt libraries needed for display

  3. Q: I updated or tried to update some package and have build or runtime errors
    A: Most probably another package is out of date. Update the core packages by running update.sh, or reinstall torch, nn, cutorch and cunn (in that order) with luarocks install

  4. Q: I have build/runtime errors on OS X with cutorch/cunn, like malformed mach-o or failed to load lib*.dylib
    A: Update cmake and CUDA to the latest versions

  5. Q: How do I set different learning rates/weight decays per layer?
    A: Two ways of doing it:

  • if you are using optim.sgd, pass to the optimState the fields learningRates/weightDecays, containing a Tensor with the multiplying factors (for the learning rate) or the values themselves (for the weight decay) per parameter of the network. The downside of this approach is that you need to store an extra tensor of the size of the network.
  • instead of doing parameters, gradParameters = model:getParameters(), do parameters, gradParameters = model:parameters(). This gives you a table of tensors, each corresponding to a separate weight/bias tensor per layer. While optimizing using optim, keep a separate optimState for each parameter tensor, which implies calling optim.sgd in a for loop; see the sketch after this list.
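
A minimal sketch of the second approach, assuming a hypothetical two-layer model and illustrative per-layer rates (adapt the learningRate values to your own network):

```lua
require 'nn'
require 'optim'

-- hypothetical model; any nn container works the same way
local model = nn.Sequential()
model:add(nn.Linear(10, 20))
model:add(nn.Tanh())
model:add(nn.Linear(20, 2))
model:add(nn.LogSoftMax())

local params, gradParams = model:parameters()  -- tables of per-layer tensors

-- one optimState per parameter tensor, e.g. a smaller rate for the last layer
local optimStates = {}
for i = 1, #params do
   optimStates[i] = { learningRate = (i <= 2) and 0.1 or 0.01 }
end

local criterion = nn.ClassNLLCriterion()
local input, target = torch.randn(10), 1

-- one training step: compute all gradients once, then update each tensor separately
model:zeroGradParameters()
local output = model:forward(input)
criterion:forward(output, target)
model:backward(input, criterion:backward(output, target))

for i = 1, #params do
   -- optim.sgd only needs the (already computed) gradient for this tensor
   local feval = function() return 0, gradParams[i] end
   optim.sgd(feval, params[i], optimStates[i])
end
```
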
  6. Q: How do I convert a linear layer into a convolutional layer (e.g., to use my pre-trained network in a fully-convolutional manner)?
    A: Use this function: https://gist.github.com/szagoruyko/4e0d2b7f5fdaf877a6a9

  7. Q: I trained a network on GPU, can I use it on CPU?
    A: Yes! net = net:float() will convert it to CPU floats (see the example below)
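
For example, a minimal sketch (the file names are hypothetical):

```lua
require 'cunn'  -- needed to deserialize CUDA tensors from the file

local net = torch.load('net_gpu.t7')  -- model saved during GPU training
net = net:float()                     -- convert parameters and buffers to CPU FloatTensors
torch.save('net_cpu.t7', net)         -- the saved copy now loads with plain nn
```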

  8. Q: I am trying to make a siamese network and do net_b = net_a:clone('weight','bias'), but when I try to call net:getParameters() there is an error.
    A: You need to share gradWeight and gradBias too: net_b = net_a:clone('weight','bias','gradWeight','gradBias'); see the sketch below
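
A minimal sketch of a working setup, using clone with share arguments to build the shared branch (the model itself is hypothetical):

```lua
require 'nn'

-- branch A of a toy siamese network
local net_a = nn.Sequential()
net_a:add(nn.Linear(10, 5))
net_a:add(nn.Tanh())

-- branch B shares weights, biases AND their gradient storages with branch A;
-- sharing the gradients too is what keeps getParameters() happy
local net_b = net_a:clone('weight', 'bias', 'gradWeight', 'gradBias')

local siamese = nn.ParallelTable():add(net_a):add(net_b)

local params, gradParams = siamese:getParameters()  -- no error now
local out = siamese:forward({torch.randn(10), torch.randn(10)})
```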

  9. Q: Is there multi-GPU neural network training support in Torch?
    A: Sure! See https://github.com/soumith/imagenet-multiGPU.torch

  10. Q: I want to implement my own neural network module, what should I do?
    A: Check the developer docs: http://torch.ch/docs/developer-docs.html
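
As a starting point, here is a minimal skeleton following the pattern from those docs (the module name and its behavior, multiplying the input by a constant, are made up for illustration):

```lua
require 'nn'

local MyMul, parent = torch.class('nn.MyMul', 'nn.Module')

function MyMul:__init(scale)
   parent.__init(self)
   self.scale = scale or 1
end

-- forward pass: fill self.output from the input
function MyMul:updateOutput(input)
   self.output:resizeAs(input):copy(input):mul(self.scale)
   return self.output
end

-- backward pass: fill self.gradInput from gradOutput
function MyMul:updateGradInput(input, gradOutput)
   self.gradInput:resizeAs(gradOutput):copy(gradOutput):mul(self.scale)
   return self.gradInput
end

-- usage: local m = nn.MyMul(2); print(m:forward(torch.ones(3)))
```

Modules with trainable parameters additionally implement accGradParameters; checking your gradients with nn.Jacobian is a good idea.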

  11. Q: I am curious about the difference between nn.SpatialConvolution and nn.SpatialConvolutionMM. Is it just a difference in implementation, or does it perform a different mathematical operation?
    A: There is no difference between nn.SpatialConvolution() and nn.SpatialConvolutionMM() anymore. Both of them point to the MM implementation.

  12. Q: Is it possible to specify which GPU should be used?
    A: Use cutorch.setDevice(gpuId), or start torch with CUDA_VISIBLE_DEVICES=[gpuId-1] th. The latter is a lower-level mechanism which isolates the specified GPUs from the others (see the example below)
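
For example:

```lua
require 'cutorch'

cutorch.setDevice(2)  -- GPU ids are 1-indexed in cutorch
print('using GPU ' .. cutorch.getDevice())
local t = torch.CudaTensor(3, 3)  -- allocated on the selected device
```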

  13. Q: Is torch supported on Windows?
    A: Not supported at the moment, although you can check this Google Groups thread https://groups.google.com/forum/#!topic/torch7/IOB4aY6ClEA or this repo https://github.com/diz-vara/luajit-rocks.


fmassa commented Sep 28, 2015

  • Q: How to set different learning rates/weight decays per layer

    A: Two easy ways of doing it:
    • if you are using optim.sgd, pass as argument to the optimState the fields learningRates/weightDecays, containing a Tensor with the multiplying factors (for the learning rate) or the values themselves (for the weight decay) per parameter of the network. Here is an example. The downside of this approach is that you need to store an extra tensor of the size of the network.
    • instead of doing parameters, gradParameters = model:getParameters(), do parameters, gradParameters = model:parameters(). This will give you a table of tensors, each one of them corresponding to a separate weight/bias per layer. While optimizing using optim, keep a separate optimState for each parameter (which implies calling optim.sgd in a for loop).
  • Q: How to convert a linear layer into a convolutional layer (e.g., to use my pre-trained network in a fully-convolutional manner)?

    A: See this link

@szagoruyko (Author)

Thanks Francisco, added to the list

@vislab2013

I thought of some questions that new users tend to ask or need clarification on.

Q. Which platforms/OSes is Torch available on?
A. Linux/macOS: https://github.com/torch/torch7
iOS: https://github.com/clementfarabet/torch-ios
Android: https://github.com/soumith/torch-android
Windows: not supported at the moment, although you can check this Google Groups thread https://groups.google.com/forum/#!topic/torch7/IOB4aY6ClEA or this repo https://github.com/diz-vara/luajit-rocks.

Q. Is CUDA 7.5 compatible with Torch? And cuDNN?
A. CUDA 7.0 only at the moment; 7.5 support is coming soon. cuDNN R3 is supported.

Q. I keep getting NaNs when training my network. What should I do?
A. Reduce the learning rate; this should help if gradients are exploding. A sketch of this (plus gradient clipping) follows below.
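
A sketch on a toy model; the learning rate and clamp bounds are illustrative, and the gradient clamp is a common complementary trick rather than the only fix:

```lua
require 'nn'
require 'optim'

local model = nn.Sequential():add(nn.Linear(4, 1))  -- toy model
local criterion = nn.MSECriterion()
local params, gradParams = model:getParameters()

local input, target = torch.randn(4), torch.randn(1)
local optimState = { learningRate = 1e-3 }  -- try e.g. 10x smaller than the diverging rate

local function feval()
   model:zeroGradParameters()
   local loss = criterion:forward(model:forward(input), target)
   model:backward(input, criterion:backward(model.output, target))
   gradParams:clamp(-5, 5)  -- optional: clip exploding gradients before the update
   return loss, gradParams
end

optim.sgd(feval, params, optimState)
```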

Q. Where can I get help with code errors/training/overall questions?
A. https://groups.google.com/forum/#!forum/torch7

(--- Questions collected from the Google Groups ---)

Q. I am curious about the difference between nn.SpatialConvolution and nn.SpatialConvolutionMM. Is it just a difference in the implementation or does it perform a different mathematical operation?
A. There is no difference between nn.SpatialConvolution() and nn.SpatialConvolutionMM(). Both of them point to the MM implementation.

Q. Is it possible to specify which GPU to be used?
A. Use cutorch.setDevice(gpuId).

Q. I've set cutorch.setDevice(gpuId), but it still uses the same default GPU regardless. What should I do?
A. Specify which GPUs are visible to Torch before launching your code, e.g. export CUDA_VISIBLE_DEVICES=1,2. The environment variable enforces that the process only sees devices 1 and 2, which lets you isolate GPUs between processes. An example follows below.
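
For example, a quick way to verify what the process can see (the script name is hypothetical):

```lua
-- launched as: CUDA_VISIBLE_DEVICES=1,2 th check_gpus.lua
require 'cutorch'
print(cutorch.getDeviceCount())  -- prints 2: only the exported devices are visible
-- inside this process the visible GPUs are renumbered starting from 1
```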

I'll try to get some more. Meanwhile, these are the ones I can think of.


Atcold commented Sep 28, 2015

Q: Why does Torch offer a static compilation option as well?
A: No one has answered me yet... 😞


Atcold commented Sep 28, 2015

@VisLab: There is no difference between nn.SpatialConvolution() and nn.SpatialConvolutionMM() anymore. Both of them point to the MM implementation.

@vislab2013

I thought so. Updated my previous post accordingly. Thanks!

@szagoruyko (Author)

Thanks @vislab2013, I included some of your Q&As! Sorry, not all of them; I want to keep it short

@vislab2013

I've seen this question pop up a few times recently, and for newcomers this should be relevant.

Q: Hi, what are the differences among luajit, qlua and th?
A: luajit is the plain LuaJIT interpreter. qlua is the LuaJIT interpreter with the Qt libs preloaded (so you can display things). th is the LuaJIT interpreter with the torch and paths libs preloaded, plus extra REPL features (see https://github.com/torch/trepl#trepl-a-repl-for-torch for the list).

The idea is that luajit is the base: you could do everything from it, but it can be complicated to preload some libs (like Qt). th was made to work with Torch with all its extra features, so it's the best choice for beginners.
qlua preloads one of the graphics backends (Qt), but you could also use iTorch (https://github.com/facebook/iTorch) or display (https://github.com/szym/display).

Taken from the gitter chat (props to albanD).


dmsedra commented Mar 8, 2016

@fmassa I'm trying to implement method 2 for setting different learning rates. However, this appears very tricky due to the differences between parameters() and getParameters() (the whole flattening thing). I've tried calling SGD once for each weight with a separate sgdState per your suggestion, but the gradients are off. Any ideas? Is there a simple example of this somewhere?

Thanks!
