@pierrelux
Created April 30, 2019 21:07
import numpy as np


def induced_chain(transition, policy):
    """Marginalize the choice of actions under the given policy.

    Args:
        transition (numpy.ndarray): Transition kernel as a (A x S x S) tensor
        policy (numpy.ndarray): Policy as a (S x A) matrix

    Returns:
        numpy.ndarray: Marginalized transition matrix as a (S x S) matrix,
        where the first dimension denotes "source" states and the second
        "destination" states: entry (i, j) is the probability of moving
        from state i to state j.
    """
    return np.einsum('kij,ik->ij', transition, policy)


def discounted_stationary_distribution(transition, policy, initial_distribution, discount):
    """Solve the discounted stationary distribution equations.

    Args:
        transition (numpy.ndarray): Transition kernel as a (A x S x S) tensor
        policy (numpy.ndarray): Policy as a (S x A) matrix
        initial_distribution (numpy.ndarray): Initial distribution as a (S,) vector
        discount (float): Discount factor

    Returns:
        numpy.ndarray: The discounted stationary distribution as a (S,) vector
    """
    transition_policy = induced_chain(transition, policy)
    A = np.eye(transition_policy.shape[0]) - discount * transition_policy
    b = (1 - discount) * initial_distribution
    return np.linalg.solve(A.T, b)
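To see what the `'kij,ik->ij'` contraction computes, here is a small self-contained sketch on a made-up 2-state, 2-action MDP (the numbers are hypothetical, chosen only so each slice is a valid distribution). It checks the einsum against an explicit loop over actions and verifies that the induced chain's rows are still probability distributions.

```python
import numpy as np

# Hypothetical toy MDP: 2 actions (k), 2 states (i -> j).
transition = np.array([
    [[0.9, 0.1],   # action 0: row i gives P(j | i, a=0)
     [0.2, 0.8]],
    [[0.5, 0.5],   # action 1: row i gives P(j | i, a=1)
     [0.7, 0.3]],
])
policy = np.array([
    [0.6, 0.4],    # pi(a | s=0)
    [0.1, 0.9],    # pi(a | s=1)
])

# Marginalize actions: P_pi[i, j] = sum_k policy[i, k] * transition[k, i, j]
P_pi = np.einsum('kij,ik->ij', transition, policy)

# Equivalent explicit loop, for clarity
P_loop = np.zeros((2, 2))
for i in range(2):
    for k in range(2):
        P_loop[i] += policy[i, k] * transition[k, i]

assert np.allclose(P_pi, P_loop)
assert np.allclose(P_pi.sum(axis=1), 1.0)  # each row is a distribution
```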
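One way to sanity-check the linear solve is against the defining series of the discounted stationary distribution, d = (1 - γ) Σ_t γ^t μ0ᵀ P_πᵗ: solving (I - γ P_π)ᵀ d = (1 - γ) μ0 should match a truncated version of that sum. A minimal sketch, assuming a hypothetical 2-state induced chain `P_pi` and initial distribution `mu0`:

```python
import numpy as np

# Hypothetical induced chain and problem data
P_pi = np.array([[0.74, 0.26],
                 [0.65, 0.35]])
mu0 = np.array([1.0, 0.0])
gamma = 0.9

# Direct solve: (I - gamma * P_pi)^T d = (1 - gamma) * mu0
A = np.eye(2) - gamma * P_pi
d = np.linalg.solve(A.T, (1 - gamma) * mu0)

# Cross-check against the truncated series (1 - gamma) * sum_t gamma^t mu0^T P_pi^t
series = np.zeros(2)
state_dist = mu0.copy()
for t in range(200):
    series += (1 - gamma) * gamma**t * state_dist
    state_dist = state_dist @ P_pi

assert np.allclose(d, series, atol=1e-8)
assert np.isclose(d.sum(), 1.0)  # d is itself a probability distribution
```

That `d.sum() == 1` falls out of the equations: summing dᵀ(I - γ P_π) = (1 - γ) μ0ᵀ over all states gives (1 - γ) d·1 = (1 - γ), since P_π's rows sum to one.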