Reinforcement Learning Tutorial in Tensorflow: Model-based RL

@krystynak commented Oct 18, 2016

The actions array can end up with shape (0,), i.e. empty, which raises a ValueError at
state_prevs = np.hstack([state_prevs, actions])

I see this error most of the time, and am not sure what to do when actions is empty.

World Perf: Episode 106.000000. Reward 15.333333. action: 0.000000. mean reward 24.393341.
actions shape: (1, 1)
state_prevs shape: (1, 4)
actions shape: (0,)
state_prevs shape: (0, 4)
Traceback (most recent call last):
File "ReLearn/", line 191, in
state_prevs = np.hstack([state_prevs,actions])
File "/Users/john/Work/TF/lib/python2.7/site-packages/numpy/core/", line 280, in hstack
return _nx.concatenate(arrs, 1)
ValueError: all the input arrays must have same number of dimensions
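For reference, the dimension mismatch in the log can be reproduced and worked around in a few lines (a minimal sketch using the shapes printed above, not the actual tutorial script):

```python
import numpy as np

# state_prevs is 2-D with shape (0, 4) while actions is 1-D with shape (0,),
# so np.hstack cannot concatenate them along axis 1.
state_prevs = np.empty((0, 4))
actions = np.empty((0,))

try:
    np.hstack([state_prevs, actions])
except ValueError as e:
    print("ValueError:", e)

# Reshaping actions into a column vector makes the stack succeed even when empty:
stacked = np.hstack([state_prevs, actions.reshape(-1, 1)])
print(stacked.shape)  # (0, 5)
```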


@breeko commented Dec 27, 2016

Your actions array has to have a second dimension. For instance, the following works and results in an empty (0, 5) array:

np.hstack([np.empty(0).reshape(0,4), np.empty(0).reshape(0,1)])

While the following, where actions is left 1-D, gives you your error:

np.hstack([np.empty(0).reshape(0,4), np.empty(0)])

I would reshape actions to (-1, 1), or initialize it as np.empty(0).reshape(0,1) and append using np.vstack.
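A minimal sketch of that suggestion (variable names assumed to mirror the notebook's):

```python
import numpy as np

# Keep actions 2-D from the start, so np.hstack with the (N, 4) state array
# never sees a 1-D operand, even before any actions have been collected.
actions = np.empty(0).reshape(0, 1)      # shape (0, 1), not (0,)
state_prevs = np.empty(0).reshape(0, 4)  # shape (0, 4)

# Append each new action as a 1x1 row with np.vstack:
for a in [0, 1, 1]:
    actions = np.vstack([actions, [[a]]])

print(actions.shape)  # (3, 1)
# hstack now works whether or not any actions were collected:
print(np.hstack([np.zeros((3, 4)), actions]).shape)  # (3, 5)
```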


@Santara commented Feb 21, 2017

Hi Arthur,

Thank you very much for making these tutorials! They are awesome!

However, there seem to be a number of incompatibilities/bugs in this notebook. I had to make the following modifications to get the notebook running on Tensorflow 1.0.0:

  1. I had to comment out the line from modelAny import *, because no script named modelAny was provided, and nothing from it was needed by the rest of the code.
  2. rnn_cell seems to have been removed from tensorflow.python.ops in the current release. It was also never used in the rest of the code, so I commented out from tensorflow.python.ops import rnn_cell.
  3. tf.concat() has a different syntax now. I had to make the following modification:
    predicted_state = tf.concat([predicted_observation,predicted_reward,predicted_done],1)
  4. tf.mul() had to be replaced by tf.multiply() as follows:
    done_loss = tf.multiply(predicted_done, true_done) + tf.multiply(1-predicted_done, 1-true_done)
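As a side note, the tf.multiply expression in point 4 is just element-wise p*t + (1-p)*(1-t); here is a quick NumPy check of it (a numerical illustration only, not the actual TensorFlow graph):

```python
import numpy as np

# Numerical check of the done_loss expression, with NumPy's element-wise *
# standing in for tf.multiply:
predicted_done = np.array([0.9, 0.2, 0.5])
true_done = np.array([1.0, 0.0, 1.0])

# High where the prediction agrees with the label, low where it disagrees:
done_loss = predicted_done * true_done + (1 - predicted_done) * (1 - true_done)
print(done_loss)  # [0.9 0.8 0.5]
```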

And everything executed as expected :)
Thank you


@yangliu2 commented Nov 24, 2018

I get RuntimeWarning: overflow encountered in multiply x = um.multiply(x, x, out=x), after which the rewards blow up to huge values like 11062986271742011518222336.000000.
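That warning means a value is being squared past the floating-point range. One common guard (an assumption on my part, not something from the original notebook) is to clip the model's predicted rewards to a sane range before they feed back into training:

```python
import numpy as np

# Clip runaway model predictions before they compound; the bounds here
# (-100, 100) are an illustrative choice, not values from the tutorial.
predicted_reward = np.array([1.1e20, 3.0, -5.0e18], dtype=np.float32)
clipped = np.clip(predicted_reward, -100.0, 100.0)
print(clipped)  # [ 100.    3. -100.]
```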
