Reinforcement Learning Tutorial 2 (Cart Pole problem)
@awjuliani, last active May 2, 2018 18:55
@muik commented Oct 16, 2016

In the while loop, the loss value is always nan.
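A common cause of nan in this loss is tf.log receiving an exact 0 once the sigmoid output saturates. A minimal sketch of one workaround, assuming the tutorial's network shape (4 observations, 10 hidden units) and placeholder names; the clipping epsilon is an illustrative choice, not part of the original gist:

import tensorflow as tf

observations = tf.placeholder(tf.float32, [None, 4])
W1 = tf.get_variable("W1", shape=[4, 10])
layer1 = tf.nn.relu(tf.matmul(observations, W1))
W2 = tf.get_variable("W2", shape=[10, 1])
probability = tf.nn.sigmoid(tf.matmul(layer1, W2))

input_y = tf.placeholder(tf.float32, [None, 1])
advantages = tf.placeholder(tf.float32, [None, 1])

# Clip so tf.log never sees exactly 0, which is what produces nan.
safe_prob = tf.clip_by_value(probability, 1e-7, 1.0 - 1e-7)
loglik = tf.log(input_y * (input_y - safe_prob) +
                (1 - input_y) * (input_y + safe_prob))
loss = -tf.reduce_mean(loglik * advantages)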

@yjmade commented Nov 4, 2016

For this line:

tGrad,bggg = sess.run([newGrads,begining],feed_dict={observations: epx, input_y: epy, advantages: discounted_epr})

the variable begining is not defined.

@yjmade commented Nov 4, 2016

And for this line:

for ix,grad in enumerate(tGrad):
    gradBuffer[ix] += grad

grad is a list of two arrays, while gradBuffer[ix] is an array with shape (4, 10).
I made it work by changing the line to gradBuffer[ix] += grad[0].
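For context, a sketch of the surrounding accumulation step with that indexing change applied; gradBuffer, tvars, tGrad, and sess are assumed from the tutorial's training loop:

# gradBuffer holds one running-sum array per trainable variable and is
# zeroed between batch updates, mirroring the tutorial's setup.
gradBuffer = sess.run(tvars)
for ix, grad in enumerate(gradBuffer):
    gradBuffer[ix] = grad * 0

# Accumulate each episode's gradients; grad[0] unwraps the extra
# nesting reported above.
for ix, grad in enumerate(tGrad):
    gradBuffer[ix] += grad[0]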

@Heeseok commented Nov 17, 2016

tGrad,bggg = sess.run([newGrads,begining],feed_dict={observations: epx, input_y: epy, advantages: discounted_epr})

The variable begining is not defined.

I changed the code to this:

tGrad = sess.run(newGrads, feed_dict={observations: epx, input_y: epy, advantages: discounted_epr})

And it works well.
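For completeness, the corrected tGrad then feeds the batched weight update later in the loop; a sketch assuming the gist's gradient placeholders and update op keep their usual names (W1Grad, W2Grad, updateGrads), which are not shown in this thread:

# Apply the accumulated gradients once enough episodes are collected,
# then reset the buffer for the next batch.
sess.run(updateGrads, feed_dict={W1Grad: gradBuffer[0],
                                 W2Grad: gradBuffer[1]})
for ix, grad in enumerate(gradBuffer):
    gradBuffer[ix] = grad * 0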

@LoveDLWujing

log(P(y|x)) = (1 - input_y)*log(probability) + input_y*log(1 - probability), and the loss function in the code above happens to give the same result as this maximum likelihood.
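A quick numeric check of this identity, assuming the gist's loglik expression is tf.log(input_y*(input_y - probability) + (1 - input_y)*(input_y + probability)):

import numpy as np

# Verify the two expressions agree for both label values (illustrative only).
for y in (0.0, 1.0):
    p = 0.7  # an arbitrary action probability
    tutorial = np.log(y * (y - p) + (1 - y) * (y + p))
    bernoulli = (1 - y) * np.log(p) + y * np.log(1 - p)
    assert np.isclose(tutorial, bernoulli)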
