Hello,

When you look at a repo, be sure to run git log --all and check the latest commits across all branches. You'll see that my work is on the "dec6" branch, not the main branch.
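
For example, to see every branch tip at a glance:

    git log --all --oneline --decorate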

In general, it's more reliable to DM me on Twitter than to email me. I'll usually respond within a day; emails are hit-or-miss.

Here is the critical difference:

        gamma = linear(y, num_channels, scope="gamma", use_sn=use_sn, use_bias=use_bias)
        ###########################
        ## CRITICAL DIFFERENCE STARTS HERE
        ###########################
        if scale_start != 0.0:
          gamma += scale_start
        ###########################
        gamma = tf.reshape(gamma, [-1, 1, 1, num_channels])

Please notice that scale_start defaults to 0.0 to preserve backwards compatibility with Google's repository. You must set scale_start to 1.0 for this fix to work at all.

In fact, ignore all of that and just do this. It's simpler. Change your code from this:

    gamma = tf.reshape(gamma, [-1, 1, 1, num_channels])

to this:

    gamma += 1.0
    gamma = tf.reshape(gamma, [-1, 1, 1, num_channels])

Presto, your BigGAN is now fixed.

This works because gamma is a multiplier: it needs to be centered around 1.0. Otherwise the model is forced to learn while everything it produces is multiplied by a value close to zero. That's why google/compare_gan's BigGAN-Deep implementation never worked.
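
If it helps, here's a minimal, self-contained sketch of the idea (illustrative only -- not the exact compare_gan code; the layer and function names are mine):

    import tensorflow as tf

    # Predict a per-channel multiplier gamma from the conditioning vector y,
    # then shift it so it's centered around 1.0 instead of 0.0.
    # (In real code you'd create the Dense layer once and reuse it.)
    def conditional_gamma(y, num_channels):
      gamma = tf.keras.layers.Dense(num_channels, name="gamma")(y)
      gamma += 1.0  # center the multiplier around 1, not 0
      return tf.reshape(gamma, [-1, 1, 1, num_channels])

    # Usage: x has shape [batch, height, width, num_channels].
    # x *= conditional_gamma(y, num_channels)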

Our current training run is at http://song.tensorfork.com:8097/#images if you're curious to see how the samples look. It's been training for two months now.

However, there are two additional ways that our model differs from Google's model, which you may want to merge:

  1. we use evonorm-s0 instead of batchnorm, which means you don't have to bother calculating any batchnorm statistics. This was a huge relief: it was so annoying to have to recompute batchnorm averages before being able to run inference. evonorm-s0 is almost a copy-paste replacement for batchnorm, and it seems to have zero downsides. (A sketch follows after this list.)

  2. we use a new technique called "stop loss" to prevent BigGAN from collapsing. We believe this is a world first: no one, until now, has figured out how to avoid BigGAN collapse. We've been training for two months, and the model seems to keep getting better and better over time.
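
Here's a rough sketch of evonorm-s0 (item 1), following the formula from the "Evolving Normalization-Activation Layers" paper. The group count and parameter shapes below are assumptions on my part, not necessarily what our repo uses:

    import tensorflow as tf

    def group_std(x, groups=32, eps=1e-5):
      # Per-group std over (height, width, channels-within-group); NHWC assumed.
      _, h, w, c = x.shape.as_list()
      x_g = tf.reshape(x, [-1, h, w, groups, c // groups])
      _, var = tf.nn.moments(x_g, axes=[1, 2, 4], keepdims=True)
      std = tf.broadcast_to(tf.sqrt(var + eps), tf.shape(x_g))
      return tf.reshape(std, [-1, h, w, c])

    def evonorm_s0(x, gamma, beta, v, groups=32):
      # gamma, beta, v are learnable, each of shape [1, 1, 1, c].
      # No batch statistics anywhere, so there's nothing to recompute
      # before inference.
      return x * tf.nn.sigmoid(v * x) / group_std(x, groups) * gamma + beta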

The stop-loss technique (item 2) is very simple:

    if D_loss < 0.2:
      D_loss = 0.0

In other words, "if D is too smart, skip gradients for this example." Presto, your BigGAN no longer collapses.

This logic must be applied per-example, not per-batch. This is another crucial detail: if you apply it per batch, D will be severely handicapped, whereas applying it per-example seems to have no downsides whatsoever as far as we can tell.
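
Here's a minimal sketch of the per-example version, assuming d_loss is a per-example loss tensor of shape [batch_size] (the names are illustrative):

    import tensorflow as tf

    def stop_loss(d_loss, threshold=0.2):
      # d_loss: per-example discriminator loss, shape [batch_size].
      # Zero out the loss -- and therefore the gradients -- for every
      # example where D is already "too smart."
      mask = tf.cast(d_loss >= threshold, d_loss.dtype)
      return tf.reduce_mean(d_loss * mask)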

0.2 was determined empirically: 0.15 still seemed to collapse, whereas 0.2 has now been training for two months without collapsing. It's probably a good value to start with.

Think of it this way: if your batch size is 64, then there's roughly a 63-in-64 chance that at least one example will make it through, so D continues to improve over time. But it never gets so good that G has to resort to degenerate behavior (i.e. collapse) to fool D. In a certain sense it acts as a "dynamic learning rate": D makes less and less progress over time, which is exactly what you want, because otherwise G has no choice but to collapse in order to maximize D's loss.
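
To make that arithmetic concrete (the per-example skip rate here is a made-up number, purely for illustration):

    # If each example is skipped independently with probability p, this is
    # the chance that at least one example in the batch still trains D:
    p = 0.9            # hypothetical per-example skip rate
    batch_size = 64
    print(1 - p ** batch_size)  # ~0.9988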

Here are exact instructions for reproducing our current model:

    git clone https://github.com/shawwn/compare_gan
    cd compare_gan
    git checkout dec6
    bash run_bigrun97_run6.sh

(Obviously, that bash command won't run for you; you don't have TPUs, you don't have permission to read from our cloud bucket, etc. But it gives you all the information necessary for porting the functionality to your own repository.)

Let me know if you have more questions. Keep me posted with your work -- I'd love to hear what you end up doing with this.

Best, Shawn
