Last active May 2, 2018 18:55
Reinforcement Learning Tutorial 2 (Cart Pole problem)
For this line:

```python
tGrad,bggg = sess.run([newGrads,begining],feed_dict={observations: epx, input_y: epy, advantages: discounted_epr})
```

the variable `begining` is not defined.

And for these lines:

```python
for ix,grad in enumerate(tGrad):
    gradBuffer[ix] += grad
```

`grad` is a list containing two arrays, while `gradBuffer[ix]` is an array with shape (4, 10). I made it work by changing the line to `gradBuffer[ix] += grad[0]`.
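To make the shape mismatch concrete, here is a minimal NumPy sketch of the gradient-buffer accumulation pattern described above. The shapes (4, 10) and (10, 1) match the tutorial's two weight matrices; the exact nesting of `tGrad` is an assumption based on the commenter's description that each `grad` is a list whose first element is the matching gradient array.

```python
import numpy as np

# Buffers for accumulating gradients across episodes; shapes follow the
# tutorial's weight matrices (W1: 4x10, W2: 10x1).
gradBuffer = [np.zeros((4, 10)), np.zeros((10, 1))]

# Assumed structure of tGrad as returned by sess.run here: each entry is
# itself a list whose first element is the gradient array.
tGrad = [[np.ones((4, 10))], [np.ones((10, 1))]]

for ix, grad in enumerate(tGrad):
    # grad is a list, not an array; grad[0] is the gradient whose shape
    # matches gradBuffer[ix], so unwrap it before accumulating.
    gradBuffer[ix] += grad[0]

print(gradBuffer[0].shape)  # (4, 10)
```

Adding `grad` directly would attempt to broadcast a list against a (4, 10) array, which is why unwrapping with `grad[0]` fixes the accumulation.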
For this line:

```python
tGrad,bggg = sess.run([newGrads,begining],feed_dict={observations: epx, input_y: epy, advantages: discounted_epr})
```

the variable `begining` is not defined. I changed the code to:

```python
tGrad = sess.run(newGrads, feed_dict={observations: epx, input_y: epy, advantages: discounted_epr})
```

and it works well.
`log(P(y|x)) = (1 - input_y) * log(probability) + input_y * log(1 - probability)`, and the loss function in the code above happens to give the same result as this maximum likelihood.
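This equivalence can be checked numerically. The sketch below (my own check, not from the gist) compares the tutorial's `loglik` expression, `log(y*(y - p) + (1 - y)*(y + p))`, against the log-likelihood form stated above for `y` in {0, 1}:

```python
import numpy as np

def loglik_tutorial(y, p):
    # The form used in the tutorial's loss; for y=1 it reduces to
    # log(1 - p), and for y=0 it reduces to log(p).
    return np.log(y * (y - p) + (1 - y) * (y + p))

def loglik_stated(y, p):
    # The maximum-likelihood form stated above.
    return (1 - y) * np.log(p) + y * np.log(1 - p)

p = np.array([0.1, 0.5, 0.9])
for y in (0.0, 1.0):
    # Both forms agree for both label values.
    assert np.allclose(loglik_tutorial(y, p), loglik_stated(y, p))
print("both forms agree for y in {0, 1}")
```

The agreement holds because the tutorial's expression collapses to `log(1 - p)` when `y = 1` and to `log(p)` when `y = 0`, exactly the two branches of the stated likelihood.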
In the while loop, the `loss` value is always `nan`.
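One common cause of a `nan` loss in this kind of setup (an assumption on my part, not confirmed in the gist) is that the sigmoid output `probability` saturates at exactly 0.0 or 1.0, so the `log` inside the loss returns `-inf`, which turns into `nan` once multiplied through. A standard guard is to clip the probability before taking the log, sketched here in NumPy:

```python
import numpy as np

# Probabilities that include the saturated endpoints 0.0 and 1.0,
# where log() would return -inf.
probability = np.array([0.0, 0.5, 1.0])

# Clip away from the endpoints before taking the log.
eps = 1e-7
safe_p = np.clip(probability, eps, 1.0 - eps)
loglik = np.log(safe_p)  # finite everywhere after clipping

print(np.isfinite(loglik).all())  # True
```

In the TensorFlow code itself the same guard can be applied with `tf.clip_by_value` around `probability` before it enters the log.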