@jknthn
Created April 4, 2018 08:15
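import tqdm

# create_random_policy, test_policy and monte_carlo_e_soft are not defined in this gist; they must already be in scope.
def policy_iterator(env, n, t, epsilon=0.01):
    """Call monte_carlo_e_soft t times (n episodes each) and return the best (policy, score) pair seen."""
    # Start from a random policy and use its score as the baseline.
    random_policy = create_random_policy(env)
    random_policy_score = test_policy(random_policy, env)
    best_policy = (random_policy, random_policy_score)
    for i in tqdm.tqdm(range(t)):
        # Run monte_carlo_e_soft starting from the current best policy, then score the result.
        new_policy = monte_carlo_e_soft(env, policy=best_policy[0], episodes=n, epsilon=epsilon)
        new_policy_score = test_policy(new_policy, env)
        # Keep the new policy only if it scores strictly higher than the current best.
        if new_policy_score > best_policy[1]:
            best_policy = (new_policy, new_policy_score)
    return best_policy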
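
The loop in `policy_iterator` follows a keep-the-best pattern: propose a candidate from the current best, score it, and replace the best only on a strict improvement. Below is a minimal sketch of that pattern in isolation, using toy stand-ins rather than a real environment. The names `keep_best`, `propose`, and `score` are hypothetical and are not part of this gist; they stand in for `monte_carlo_e_soft` and `test_policy` only for illustration.

```python
import random


def keep_best(propose, score, initial, rounds):
    """Greedy keep-the-best loop (hypothetical helper, not from the gist).

    propose(current) returns a new candidate derived from the current best;
    score(candidate) returns a number where higher is better.
    """
    best = (initial, score(initial))
    for _ in range(rounds):
        candidate = propose(best[0])
        candidate_score = score(candidate)
        # Accept the candidate only on a strict improvement,
        # mirroring the `>` comparison in policy_iterator.
        if candidate_score > best[1]:
            best = (candidate, candidate_score)
    return best


# Toy problem: walk an integer toward a target of 10.
rng = random.Random(0)


def toy_score(x):
    # Higher is better; the maximum (0) is reached at x == 10.
    return -abs(x - 10)


def toy_propose(x):
    # Take a random step of +1 or -1 from the current best.
    return x + rng.choice([-1, 1])


best = keep_best(toy_propose, toy_score, initial=0, rounds=200)
print(best)
```

Because a candidate is accepted only when it scores strictly higher, the best score never decreases across rounds. With a noisy scorer such as an episodic policy test, though, a lucky evaluation can lock in a policy that is not actually better.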