Reinforcement Learning Tutorial 2 (Cart Pole problem)
@muik

muik commented Oct 16, 2016

In the while loop, the loss value is always nan.
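A common cause of a nan loss in this script is taking the log of a value that reaches zero once probability saturates at 0 or 1; that is a guess, not a confirmed diagnosis. A minimal guard, assuming the gist computes loglik the same way as the tutorial it accompanies, is to clip the argument away from zero before the log (the clip bounds below are illustrative, not from the original code):

loglik = tf.log(tf.clip_by_value(input_y*(input_y - probability) + (1 - input_y)*(input_y + probability), 1e-10, 1.0))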

@yjmade

yjmade commented Nov 4, 2016

For this line:

tGrad,bggg = sess.run([newGrads,begining],feed_dict={observations: epx, input_y: epy, advantages: discounted_epr})

the variable begining is not defined.

@yjmade

yjmade commented Nov 4, 2016

And for this line:

for ix,grad in enumerate(tGrad):
     gradBuffer[ix] += grad

grad is a list containing two arrays, while gradBuffer[ix] is an array with shape (4, 10).
I made it work by changing the line to gradBuffer[ix] += grad[0].
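For context, a minimal sketch of the accumulation step with that fix applied, assuming sess, newGrads, and the placeholders are defined as in the gist, and that tvars (an assumed name here) is the list of trainable variables the gradients are taken over:

gradBuffer = sess.run(tvars)  # buffers shaped like the trainable variables
for ix, grad in enumerate(gradBuffer):
    gradBuffer[ix] = grad * 0  # zero the buffers before accumulating

tGrad = sess.run(newGrads, feed_dict={observations: epx, input_y: epy, advantages: discounted_epr})
for ix, grad in enumerate(tGrad):
    gradBuffer[ix] += grad[0]  # take the inner array, per the fix above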

@Heeseok

Heeseok commented Nov 17, 2016

tGrad,bggg = sess.run([newGrads,begining],feed_dict={observations: epx, input_y: epy, advantages: discounted_epr})

The variable begining is not defined.

I changed the code like this:

tGrad = sess.run(newGrads, feed_dict={observations: epx, input_y: epy, advantages: discounted_epr})

And it works well.

@LoveDLWujing

LoveDLWujing commented Nov 3, 2017

log(P(y|x)) = (1 - input_y) * log(probability) + input_y * log(1 - probability), and the loss function in the code above happens to give the same result as this maximum-likelihood expression.
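A quick numerical check of that equivalence, assuming the loss term in the code is loglik = tf.log(input_y*(input_y - probability) + (1 - input_y)*(input_y + probability)) with input_y taking values 0 or 1:

import numpy as np

probability = np.random.uniform(0.01, 0.99, size=1000)  # network output, kept strictly inside (0, 1)
input_y = np.random.randint(0, 2, size=1000).astype(float)  # fake 0/1 labels

# loglik as written in the code
loglik_code = np.log(input_y * (input_y - probability) + (1 - input_y) * (input_y + probability))

# the maximum-likelihood form from the comment above
loglik_ml = (1 - input_y) * np.log(probability) + input_y * np.log(1 - probability)

assert np.allclose(loglik_code, loglik_ml)  # the two agree elementwise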
