Gist: dhpollack/2d7b906cd51bbdc870cafd0801da0693
Policy gradient method for solving n-armed bandit problems.
Written in Python 3 with TensorFlow r0.12.
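The description does not spell out the update rule, so the following is a hedged illustration of one common policy-gradient approach to the n-armed bandit: the softmax "gradient bandit" update, a REINFORCE-style rule with a running-average reward baseline. It is written in plain Python 3 rather than TensorFlow r0.12 so that it runs without the old API, and it is not necessarily the method used in this gist. The Gaussian rewards, the arm means, the step size `alpha`, and the function names `softmax`, `sample_action` and `train_bandit` are all hypothetical choices made for this sketch. After pulling arm A and receiving reward R with baseline R̄, every preference h(a) moves by α(R − R̄)(1{a = A} − π(a)). This is the gradient of log π(A) with respect to the preferences, scaled by the advantage.

```python
import math
import random


def softmax(prefs):
    """Convert action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(h - m) for h in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(probs, rng):
    """Draw an arm index according to the policy probabilities."""
    r = rng.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1  # guard against floating-point round-off


def train_bandit(arm_means, steps=2000, alpha=0.1, seed=0):
    """Learn a softmax policy over arms with a REINFORCE-style update.

    arm_means: hypothetical true mean reward of each arm; each pull
    returns a reward drawn from Gaussian(mean, 1) (an assumption).
    """
    rng = random.Random(seed)
    n = len(arm_means)
    prefs = [0.0] * n  # policy parameters: one preference per arm
    baseline = 0.0     # running average of all rewards, reduces variance
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        action = sample_action(probs, rng)
        reward = rng.gauss(arm_means[action], 1.0)
        baseline += (reward - baseline) / t
        advantage = reward - baseline
        # d log pi(action) / d prefs[a] = 1{a == action} - pi(a)
        for a in range(n):
            indicator = 1.0 if a == action else 0.0
            prefs[a] += alpha * advantage * (indicator - probs[a])
    return softmax(prefs)


if __name__ == "__main__":
    final_probs = train_bandit([0.2, 0.5, 1.5, -0.3])
    best = max(range(len(final_probs)), key=final_probs.__getitem__)
    print("policy:", [round(p, 3) for p in final_probs])
    print("preferred arm:", best)
```

The baseline does not change the expected gradient but lowers its variance, which is why arms that pay less than the running average have their preferences pushed down rather than merely rising more slowly.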