A Policy-Gradient algorithm that solves Contextual Bandit problems.
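As a rough illustration of the idea, here is a minimal sketch of a policy-gradient (REINFORCE-style) learner for a contextual bandit. It is not the gist's code: it assumes a tabular softmax policy with one row of logits per context, a reward of +1 or -1, and uses only the standard library. The names `train`, `greedy_arms`, and `payout_probs` are hypothetical and chosen for this example.

```python
import math
import random


def softmax(logits):
    """Turn a list of logits into action probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(payout_probs, steps=5000, lr=0.1, seed=0):
    """Learn one row of arm logits per context.

    payout_probs[c][a] is the (assumed) chance that arm `a` pays +1
    in context `c`; otherwise the reward is -1.
    """
    rng = random.Random(seed)
    n_contexts = len(payout_probs)
    n_arms = len(payout_probs[0])
    logits = [[0.0] * n_arms for _ in range(n_contexts)]

    for _ in range(steps):
        ctx = rng.randrange(n_contexts)          # observe a context
        probs = softmax(logits[ctx])             # policy for that context
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = 1.0 if rng.random() < payout_probs[ctx][arm] else -1.0

        # For a softmax policy, d log pi(arm | ctx) / d logit[a]
        # equals one_hot(arm)[a] - probs[a]. Scale it by the reward.
        for a in range(n_arms):
            grad = (1.0 if a == arm else 0.0) - probs[a]
            logits[ctx][a] += lr * reward * grad

    return logits


def greedy_arms(logits):
    """Return the highest-scoring arm for each context."""
    return [max(range(len(row)), key=row.__getitem__) for row in logits]


if __name__ == "__main__":
    # Hypothetical environment: each context has a different best arm.
    payout_probs = [
        [0.9, 0.1, 0.1],
        [0.1, 0.9, 0.1],
        [0.1, 0.1, 0.9],
    ]
    print(greedy_arms(train(payout_probs)))
```

The context only selects which row of logits is updated, so each context learns its own arm preferences. A neural-network version would replace the table with a layer that maps a one-hot context to logits.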
Here's a Python 3 version of the tutorial.