@rdednl
Last active September 7, 2023 15:01
batch norm is bad (td3/sac)

honglu2875 commented Sep 29, 2022

Check out the attributes whose names start with "running_" (either on your batch norm layer or in its state_dict). They are "learnable" in the sense that they change during training, but not by gradients, and they are not present in parameters().

All learnable parameters are in state_dict(); parameters() contains only those that are updated by gradients.
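
For context, here is a minimal PyTorch sketch (not from the gist; the network shape is made up) showing that BatchNorm's running statistics live in state_dict() but not in parameters():

```python
import torch.nn as nn

# Minimal sketch (assumed architecture, not the gist's code): a tiny MLP with a
# BatchNorm1d layer, roughly what a TD3/SAC critic with batch norm could look like.
net = nn.Sequential(nn.Linear(4, 8), nn.BatchNorm1d(8), nn.ReLU(), nn.Linear(8, 1))

# parameters() yields only gradient-updated tensors: the Linear weights/biases
# and BatchNorm's affine weight/bias (gamma/beta).
print([name for name, _ in net.named_parameters()])
# -> ['0.weight', '0.bias', '1.weight', '1.bias', '3.weight', '3.bias']

# state_dict() additionally holds the buffers that change during training but
# not via gradients: running_mean, running_var, num_batches_tracked.
print([k for k in net.state_dict() if 'running' in k or 'num_batches' in k])
# -> ['1.running_mean', '1.running_var', '1.num_batches_tracked']
```

So a TD3/SAC-style soft (Polyak) update that loops over parameters() alone never copies those running statistics to the target network, which is presumably the kind of mismatch the gist title refers to.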

honglu2875 commented Sep 29, 2022

> My code is not from stable baselines.

Ahh... so this misunderstanding has spread wider than I thought. Maybe there is a chain of misuse and people never bother to check.
When stable-baselines came out there was no batch norm in it, by the way. The code is great and should indeed be our baseline implementation. But we, "the later generations", really have more responsibility when building on earlier code.
