@sunskyhsh
Created December 29, 2016 13:22
Q-learning algorithm to solve CartPole on OpenAI Gym.
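def qlearning(env, policy, num_iter1, alpha, gamma):
    """Q-learning with function approximation.

    `policy` and `update` are not defined in this snippet; they are
    assumed to be provided elsewhere.
    """
    # Initialise every parameter to a small constant.
    for i in range(len(policy.theta)):
        policy.theta[i] = 0.1
    for iter1 in range(num_iter1):
        s_f = env.reset()
        a = policy.epsilon_greedy(s_f)
        count = 0
        t = False
        # Run one episode, capped at 10000 steps.
        while not t and count < 10000:
            # Classic Gym API: step() returns (observation, reward, done, info).
            s_f1, r, t, _ = env.step(a)
            # Greedy value of the next state: max over a' of Q(s', a').
            qmax = max(policy.qfunc(s_f1, a1) for a1 in policy.actions)
            # Do not bootstrap from a terminal state.
            target = r if t else r + gamma * qmax
            update(policy, s_f, a, target, alpha)
            s_f = s_f1
            a = policy.epsilon_greedy(s_f)
            count += 1
        if iter1 % 100 == 0:
            print("completed %d episodes" % iter1)
    return policy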
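
The function above relies on a `policy` object (with `theta`, `actions`, `qfunc` and `epsilon_greedy`) and on an `update` helper, neither of which is shown in the gist. The following is a minimal sketch of what they could look like, assuming a linear Q-function with one block of weights per action. The class name `LinearPolicy`, its constructor arguments, and the `features` and `greedy` methods are hypothetical and chosen only for illustration; the author's actual implementation may differ.

```python
import random


class LinearPolicy:
    """Hypothetical linear Q-function: one block of weights per action."""

    def __init__(self, n_features, actions, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.n_features = n_features
        # Flat parameter vector, matching the gist's use of `policy.theta`.
        self.theta = [0.0] * (n_features * len(self.actions))
        self.rng = random.Random(seed)

    def features(self, state, action):
        """Copy the state into the slot of `action`; zeros elsewhere."""
        phi = [0.0] * len(self.theta)
        start = self.actions.index(action) * self.n_features
        phi[start:start + self.n_features] = list(state)
        return phi

    def qfunc(self, state, action):
        """Q(state, action) as the dot product of theta and the features."""
        phi = self.features(state, action)
        return sum(w * x for w, x in zip(self.theta, phi))

    def greedy(self, state):
        """Action with the highest estimated value."""
        return max(self.actions, key=lambda a: self.qfunc(state, a))

    def epsilon_greedy(self, state):
        """Random action with probability epsilon, greedy otherwise."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return self.greedy(state)


def update(policy, state, action, target, alpha):
    """One semi-gradient step moving Q(state, action) toward `target`."""
    td_error = target - policy.qfunc(state, action)
    phi = policy.features(state, action)
    for i, x in enumerate(phi):
        policy.theta[i] += alpha * td_error * x
```

For CartPole, which has a 4-dimensional observation and two discrete actions, this sketch would be constructed as `LinearPolicy(n_features=4, actions=[0, 1])` and passed to `qlearning` together with the environment.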

sksq96 commented Dec 30, 2016

Based on PEP 8, you should avoid unnecessary whitespace.

Avoid more than one space around an assignment (or other) operator to align it with another.

Just a suggestion though.
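
A small illustration of the PEP 8 point raised in this comment. The variable names are borrowed from the gist, but the values and the alignment shown here are made up for the example.

```python
# Discouraged by PEP 8: extra spaces added only to line up the `=` signs.
t      = False
count  = 0
alpha  = 0.1

# Preferred: exactly one space on each side of the assignment operator.
t = False
count = 0
alpha = 0.1
```

Both forms run identically; the difference is purely one of style.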
