@thomasahle
Created January 6, 2022 18:46
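def update(state, t):
    # Policy at this node, computed before any child is updated.
    pi = compute_policy(state)
    score = 0
    # compute_policy returns probabilities in the order of actions(state).
    for i, p in zip(actions(state), pi):
        score_i = update(state + i, t)
        score += p * score_i
    # Running average of the scores seen in iterations 0..t.
    state.mean_score = (state.mean_score * t + score) / (t + 1)
    return score
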
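def compute_policy(state):
    # Positive part of each action's estimated gain over the current node.
    pi = []
    for i in actions(state):
        regret = (state + i).mean_score - state.mean_score
        pi.append(max(regret, 0))
    total = sum(pi)
    if total == 0:
        # No action has positive regret (e.g. on the first iteration):
        # use a uniform policy, the usual regret-matching fallback.
        return [1 / len(pi) for _ in pi]
    return [p / total for p in pi]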
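
The snippet above leaves `actions`, `state + i`, and the value of a terminal state undefined. The following is a minimal runnable sketch of the same recursion on an explicit tree, under stated assumptions: the `Node` class, its `payoff` field for leaves, and the uniform fallback when no regret is positive are all hypothetical choices made for illustration, not part of the original gist.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Hypothetical tree node: `payoff` is only used at leaves.
    payoff: float = 0.0
    children: list = field(default_factory=list)
    mean_score: float = 0.0  # running average of scores seen so far


def compute_policy(node):
    # Positive part of each child's mean score minus this node's mean score.
    gains = [max(c.mean_score - node.mean_score, 0.0) for c in node.children]
    total = sum(gains)
    if total == 0:
        # Assumption: play uniformly when no child looks better than average.
        return [1.0 / len(gains)] * len(gains)
    return [g / total for g in gains]


def update(node, t):
    if not node.children:
        # Assumption: a leaf simply reports its fixed payoff.
        score = node.payoff
    else:
        pi = compute_policy(node)  # fixed before the children are updated
        score = sum(p * update(c, t) for p, c in zip(pi, node.children))
    # Running average over iterations 0..t.
    node.mean_score = (node.mean_score * t + score) / (t + 1)
    return score


# Usage: a root with one good leaf (payoff 1) and one bad leaf (payoff 0).
root = Node(children=[Node(payoff=1.0), Node(payoff=0.0)])
for t in range(50):
    update(root, t)
print(compute_policy(root))  # [1.0, 0.0]
```

In this toy tree the first iteration plays uniformly, and every later iteration puts all probability on the better leaf, so the root's `mean_score` climbs toward 1.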