@baldassarreFe
Created March 12, 2018 16:36
Mixed-Scale Dense Network
@EelcoHoogendoorn

Interesting example! Let me see if I interpret it correctly: forward evaluation of the MixedScaleDense network is about a factor of two slower than comparable alternative networks.

However, alternative networks can easily have many more parameters; even if they are more efficient to evaluate despite having more parameters, that does not mean they are more efficient to train or to converge. So the MixedScaleDense might still have an edge in training.

What I am after is some context for the efficiency claim in the original paper. While this comparison does demonstrate the point the authors tried to make, they don't make a quantitative claim, and this example suggests to me that the inefficiency penalty incurred in existing frameworks is manageable, and that one hardly needs their proprietary code to make this technique useful in practice.

There is some reading between the lines in their paper that I still haven't quite figured out. Generally, I would say it is a good thing if you can implement your technique in existing frameworks. So did they make this remark in their paper to justify to themselves writing their own neural network framework in PyCUDA from scratch? Or are there actual real-world examples where we see more than a factor-of-two performance difference?
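
For reference, a minimal sketch of how such a forward-pass timing could be measured in PyTorch (the `time_forward` helper and its warm-up/synchronization details are my own assumptions, not part of this gist; `net` stands for any `nn.Module`, e.g. the MSD network here or a baseline CNN):

```python
import time
import torch

def time_forward(net, x, n_runs=50):
    """Average forward-pass time of `net` on input `x`, in seconds."""
    net.eval()
    with torch.no_grad():
        for _ in range(5):                # warm-up runs
            net(x)
        if x.is_cuda:
            torch.cuda.synchronize()      # make sure queued kernels have finished
        start = time.perf_counter()
        for _ in range(n_runs):
            net(x)
        if x.is_cuda:
            torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_runs
```

Comparing `time_forward(msd_net, x)` against `time_forward(baseline_cnn, x)` on the same input would give the kind of factor-of-two figure discussed above.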

@wohe157

wohe157 commented Dec 2, 2019

@EelcoHoogendoorn Indeed, this example seems to compare the evaluation time of the MSDNet and a conventional CNN, and I agree with you that this suggests a conventional CNN with a similar number of parameters is faster to evaluate, but that such a CNN would usually need (a lot!) more parameters than the MSDNet.

However, I don't really agree with your point about efficiency. I have tried both this example and the code from D. M. Pelt on a dataset of roughly 1500 images of size 1x256x256 for denoising. The same network (w=1 and d=20) took 5 min/epoch with this PyTorch implementation, but only 50 s/epoch with the original code.

Moreover, I'm limited to batch sizes of 8 images with this PyTorch implementation because of its GPU memory usage (6 GB), while the original implementation doesn't even use 100 MB and only keeps my GPU at 55% utilization. Therefore I believe the original code could be optimized to run even faster.

Lastly, I noticed that both networks converge at about the same rate (in epochs) to a similar loss (an MSE of 0.00018 in my case), but when the PyTorch network stops improving, the original network goes on to 0.00010 and better. Because of this, I'm not even sure this PyTorch implementation matches the one from the paper.

@RyanPlt

RyanPlt commented Apr 22, 2021

@wohe157 It's indeed not the same as in the paper, since it's missing a crucial component of any network: a nonlinearity! The paper briefly mentions using ReLU, so if you add that, it would probably perform much better.
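
For illustration, a minimal sketch of one mixed-scale dense layer with the ReLU included, in PyTorch (the `MSDLayer` name, the width-1 setup, and the cycling dilation schedule are assumptions for this example, not the gist's actual code; a full network would also end with a 1x1 convolution to the output channels):

```python
import torch
import torch.nn as nn

class MSDLayer(nn.Module):
    """One mixed-scale dense layer (sketch): a dilated 3x3 convolution over
    all previously computed channels, followed by a ReLU, whose single output
    channel is concatenated to the running feature stack."""

    def __init__(self, in_channels, dilation):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, 1, kernel_size=3,
                              dilation=dilation, padding=dilation)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Dense connectivity: append the new channel to everything seen so far.
        return torch.cat([x, self.relu(self.conv(x))], dim=1)

# Width w=1, depth d=20, dilations cycling through 1..10 (assumed schedule).
layers = [MSDLayer(in_channels=1 + i, dilation=i % 10 + 1) for i in range(20)]
msd = nn.Sequential(*layers)
out = msd(torch.randn(1, 1, 256, 256))   # shape (1, 21, 256, 256)
```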
