@adityavk
Last active March 29, 2024 22:35
Example implementation of policy iteration for a toy MDP
# Extract the value function, the history of log V(s), and the optimal policy learnt by PI
value_func_pi, log_value_func_history_pi, policy_iteration_policy = planner.policy_iteration(gamma=0.99)
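
The `planner` object used above is not defined in this snippet, so its internals are unknown. As a rough illustration of what a policy iteration call of this shape typically computes, here is a minimal, dependency-free sketch. Everything in it is an assumption made for illustration and is not the gist author's or any library's implementation: the function signature, the `theta` tolerance, the nested-list layout of `P` and `R`, and the two-state toy MDP itself.

```python
def policy_iteration(P, R, gamma=0.99, theta=1e-10):
    """Hypothetical sketch of tabular policy iteration.

    P[s][a][s2] is the probability of moving from s to s2 under action a.
    R[s][a] is the expected immediate reward for taking action a in s.
    Returns (V, history, policy), mirroring the three values unpacked above.
    """
    n_states, n_actions = len(P), len(P[0])
    policy = [0] * n_states      # start from "always take action 0"
    V = [0.0] * n_states         # updated in place during evaluation
    history = []                 # snapshot of V after each evaluation phase

    def q_value(s, a):
        # One-step lookahead using the current value estimates.
        return R[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))

    while True:
        # Policy evaluation: sweep until the largest update falls below theta.
        while True:
            delta = 0.0
            for s in range(n_states):
                v_new = q_value(s, policy[s])
                delta = max(delta, abs(v_new - V[s]))
                V[s] = v_new
            if delta < theta:
                break
        history.append(list(V))

        # Policy improvement: act greedily with respect to the evaluated V.
        new_policy = [
            max(range(n_actions), key=lambda a: q_value(s, a))
            for s in range(n_states)
        ]
        if new_policy == policy:   # stable policy, so stop
            return V, history, policy
        policy = new_policy


# Assumed toy MDP: two states, two actions (0 = stay, 1 = switch state).
# Staying in state 1 pays reward 1; every other move pays 0.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # from state 0: stay, switch
    [[0.0, 1.0], [1.0, 0.0]],  # from state 1: stay, switch
]
R = [
    [0.0, 0.0],
    [1.0, 0.0],
]

V, history, policy = policy_iteration(P, R, gamma=0.99)
print(policy)   # greedy action per state
print(V)        # estimated value per state
```

On this toy problem the greedy policy switches out of state 0 and stays in state 1, with values close to 99 and 100 respectively (since 1 / (1 - 0.99) = 100). The evaluation step here is iterative and therefore approximate; solving the linear system for the fixed policy exactly is a common alternative.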