@Sultan91
Last active July 19, 2022 11:55
Pseudocode for SARSA(lambda)
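Procedure SARSA(lambda)
Initialize Q(s,a) arbitrarily and e(s,a)=0 for all s,a pairs
Repeat(per episode):
    Initialize s; choose a from s via epsilon-greedy policy on Q
    e(s,a)=0 for all s,a pairs
    Repeat(per timestep in each episode) until s is terminal:
        Take action a, observe r,s`
        Choose a` from s` via epsilon-greedy policy on Q
        error = r+discount*Q(s`,a`)-Q(s,a)    (Q(s`,a`)=0 if s` is terminal)
        e(s,:)=0    (replacing traces: clear the traces of all actions in s)
        e(s,a)=1
        Q = Q + learn_rate*error*e    (for all s,a pairs)
        e = e*discount*lambda    (for all s,a pairs)
        s = s`; a = a`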
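
The procedure above can be sketched in Python roughly as follows. This is a minimal tabular sketch, not a reference implementation. The corridor environment (chain_step), the helper epsilon_greedy, the max_steps cap, and all default hyperparameter values are assumptions made for illustration and are not part of the pseudocode.

```python
import random


def epsilon_greedy(Q, s, epsilon, rng):
    """Pick a random action with probability epsilon, otherwise a greedy one."""
    n_actions = len(Q[s])
    if rng.random() < epsilon:
        return rng.randrange(n_actions)
    best = max(Q[s])
    # Break ties at random so an all-zero row does not always yield action 0.
    return rng.choice([a for a in range(n_actions) if Q[s][a] == best])


def chain_step(s, a, n_states):
    """Hypothetical toy environment: a corridor of n_states cells.
    Action 1 moves right, action 0 moves left (walls clamp the position).
    Entering the last cell gives reward 1 and ends the episode."""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    done = s_next == n_states - 1
    return (1.0 if done else 0.0), s_next, done


def sarsa_lambda(n_states=6, n_actions=2, episodes=200, learn_rate=0.1,
                 discount=0.9, lam=0.8, epsilon=0.1, max_steps=1000, seed=0):
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(episodes):
        # Reset every eligibility trace at the start of the episode.
        e = [[0.0] * n_actions for _ in range(n_states)]
        s = 0
        a = epsilon_greedy(Q, s, epsilon, rng)
        for _ in range(max_steps):  # assumed safety cap on episode length
            r, s_next, done = chain_step(s, a, n_states)
            a_next = epsilon_greedy(Q, s_next, epsilon, rng)
            # The value of a terminal state is taken to be zero.
            target = r if done else r + discount * Q[s_next][a_next]
            error = target - Q[s][a]
            # Replacing traces: clear all traces in s, then set e(s,a)=1.
            e[s] = [0.0] * n_actions
            e[s][a] = 1.0
            # Update every Q entry in proportion to its trace, then decay.
            for i in range(n_states):
                for j in range(n_actions):
                    Q[i][j] += learn_rate * error * e[i][j]
                    e[i][j] *= discount * lam
            if done:
                break
            s, a = s_next, a_next
    return Q
```

Calling Q = sarsa_lambda() returns the learned table as a list of rows, one per state. A greedy policy can then be read off with max(range(len(Q[s])), key=lambda a: Q[s][a]) for each non-terminal state s.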