Hello,

When you look at a repo, be sure to run git log --all and check the latest commits across all branches. You'll see that my work is on the "dec6" branch, not the main branch.
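
For example, to see every branch tip at a glance:

    git log --all --oneline --decorate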

In general, it's more reliable to DM me on Twitter than to email me. I'll usually respond within a day; emails are hit-or-miss.

Here is the critical difference:

        gamma = linear(y, num_channels, scope="gamma", use_sn=use_sn, use_bias=use_bias)
        ###########################
        ## CRITICAL DIFFERENCE STARTS HERE
        ###########################
        if scale_start != 0.0:
          gamma += scale_start
        ###########################
        gamma = tf.reshape(gamma, [-1, 1, 1, num_channels])

Please notice that scale_start defaults to 0.0 to preserve backwards compatibility with Google's repository. You must set scale_start to 1.0 for this fix to work at all.

In fact, ignore all of that and just do this. It's simpler. Change your code from this:

    gamma = tf.reshape(gamma, [-1, 1, 1, num_channels])

to this:

    gamma += 1.0
    gamma = tf.reshape(gamma, [-1, 1, 1, num_channels])

Presto, your BigGAN is now fixed.

This works because gamma is a multiplier: it needs to be centered around 1.0. Otherwise the model is forced to learn while everything it produces is multiplied by a value close to zero. That's why google/compare_gan's BigGAN-Deep implementation never worked.
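
If it helps, here's a minimal, self-contained sketch of the idea (illustrative only -- not the exact compare_gan code; the layer and function names are mine):

    import tensorflow as tf

    # Predict a per-channel multiplier gamma from the conditioning vector y,
    # then shift it so it's centered around 1.0 instead of 0.0.
    # (In real code you'd create the Dense layer once and reuse it.)
    def conditional_gamma(y, num_channels):
      gamma = tf.keras.layers.Dense(num_channels, name="gamma")(y)
      gamma += 1.0  # center the multiplier around 1, not 0
      return tf.reshape(gamma, [-1, 1, 1, num_channels])

    # Usage: x has shape [batch, height, width, num_channels].
    # x *= conditional_gamma(y, num_channels)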

Our current training run is at http://song.tensorfork.com:8097/#images if you're curious to see how the samples look. It's been training for two months now.

However, there are two additional ways that our model differs from Google's model, which you may want to merge:

  1. we use evonorm-s0 instead of batchnorm, which means you don't have to bother calculating any batchnorm statistics. This was a huge relief: it was so annoying to have to recompute batchnorm averages before being able to run inference. evonorm-s0 is almost a copy-paste replacement for batchnorm, and it seems to have zero downsides. (A sketch follows after this list.)

  2. we use a new technique called "stop loss" to prevent BigGAN from collapsing. We believe this is a world first: no one, until now, has figured out how to avoid BigGAN collapse. We've been training for two months, and the model seems to keep getting better and better over time.
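
Here's a rough sketch of evonorm-s0 (item 1), following the formula from the "Evolving Normalization-Activation Layers" paper. The group count and parameter shapes below are assumptions on my part, not necessarily what our repo uses:

    import tensorflow as tf

    def group_std(x, groups=32, eps=1e-5):
      # Per-group std over (height, width, channels-within-group); NHWC assumed.
      _, h, w, c = x.shape.as_list()
      x_g = tf.reshape(x, [-1, h, w, groups, c // groups])
      _, var = tf.nn.moments(x_g, axes=[1, 2, 4], keepdims=True)
      std = tf.broadcast_to(tf.sqrt(var + eps), tf.shape(x_g))
      return tf.reshape(std, [-1, h, w, c])

    def evonorm_s0(x, gamma, beta, v, groups=32):
      # gamma, beta, v are learnable, each of shape [1, 1, 1, c].
      # No batch statistics anywhere, so there's nothing to recompute
      # before inference.
      return x * tf.nn.sigmoid(v * x) / group_std(x, groups) * gamma + beta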

The stop-loss technique (item 2) is very simple:

    if D_loss < 0.2:
      D_loss = 0.0

In other words, "if D is too smart, skip gradients for this example." Presto, your BigGAN no longer collapses.

This logic must be applied per-example, not per-batch. This is another crucial detail: if you apply it per batch, D will be severely handicapped, whereas applying it per-example seems to have no downsides whatsoever as far as we can tell.
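
Here's a minimal sketch of the per-example version, assuming d_loss is a per-example loss tensor of shape [batch_size] (the names are illustrative):

    import tensorflow as tf

    def stop_loss(d_loss, threshold=0.2):
      # d_loss: per-example discriminator loss, shape [batch_size].
      # Zero out the loss -- and therefore the gradients -- for every
      # example where D is already "too smart."
      mask = tf.cast(d_loss >= threshold, d_loss.dtype)
      return tf.reduce_mean(d_loss * mask)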

0.2 was determined empirically: 0.15 still seemed to collapse, whereas 0.2 has now been training for two months without collapsing. It's probably a good value to start with.

Think of it this way: if your batch size is 64, then there's roughly a 63-in-64 chance that at least one example will make it through, so D continues to improve over time. But it never gets so good that G has to resort to degenerate behavior (i.e. collapse) to fool D. In a certain sense it acts as a "dynamic learning rate": D makes less and less progress over time, which is exactly what you want, because otherwise G has no choice but to collapse in order to maximize D's loss.
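
To make that arithmetic concrete (the per-example skip rate here is a made-up number, purely for illustration):

    # If each example is skipped independently with probability p, this is
    # the chance that at least one example in the batch still trains D:
    p = 0.9            # hypothetical per-example skip rate
    batch_size = 64
    print(1 - p ** batch_size)  # ~0.9988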

Here are exact instructions for reproducing our current model:

    git clone https://github.com/shawwn/compare_gan
    cd compare_gan
    git checkout dec6
    bash run_bigrun97_run6.sh

(Obviously, that bash command won't run for you; you don't have TPUs, you don't have permission to read from our cloud bucket, etc. But it gives you all the information necessary for porting the functionality to your own repository.)

Let me know if you have more questions. Keep me posted with your work -- I'd love to hear what you end up doing with this.

Best, Shawn
