Reinforcement Learning Tutorial in Tensorflow: Model-based RL
@awjuliani, last active March 24, 2021

@krystynak

The actions array's shape can be (0,), i.e. empty, in which case there is a ValueError at

state_prevs = np.hstack([state_prevs, actions])

I see this error most of the time and am not sure what to do when actions is empty.

World Perf: Episode 106.000000. Reward 15.333333. action: 0.000000. mean reward 24.393341.
actions shape: (1, 1)
state_prevs shape: (1, 4)
actions shape: (0,)
state_prevs shape: (0, 4)
Traceback (most recent call last):
File "ReLearn/rltut3.py", line 191, in
state_prevs = np.hstack([state_prevs,actions])
File "/Users/john/Work/TF/lib/python2.7/site-packages/numpy/core/shape_base.py", line 280, in hstack
return _nx.concatenate(arrs, 1)
ValueError: all the input arrays must have same number of dimensions
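
For reference, the mismatch can be reproduced in isolation; the shapes below mirror the log output above (4 observation columns, as in CartPole):

import numpy as np

state_prevs = np.zeros((0, 4))     # 2-D but empty, as in the log above
actions = np.zeros((0,))           # 1-D, so there is no second axis to join on
np.hstack([state_prevs, actions])  # raises the ValueError shown above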

@breeko

breeko commented Dec 27, 2016

@krystynak
Your actions array has to have a second dimension. For instance, the line below works and results in a (0, 5) empty array:

np.hstack([np.empty(0).reshape(0, 4), np.empty(0).reshape(0, 1)])

while the line below gives you your error:

np.hstack([np.empty(0).reshape(0, 4), np.empty(0).reshape(0,)])

I would reshape actions to (-1, 1), or initialize it as np.empty(0).reshape(0, 1) and append to it with np.vstack, as in the sketch below.
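
For example, a minimal sketch of that initialize-and-vstack pattern (the dummy values and shapes are just for illustration):

import numpy as np

actions = np.empty(0).reshape(0, 1)        # start 2-D: shape (0, 1)
for a in [0, 1, 1]:                        # dummy actions
    actions = np.vstack([actions, [[a]]])  # stays 2-D: grows to (3, 1)

state_prevs = np.zeros((3, 4))                   # dummy states
state_prevs = np.hstack([state_prevs, actions])  # shape (3, 5), no ValueError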

@Santara

Santara commented Feb 21, 2017

Hi Arthur,

Thank you very much for making these tutorials! They are awesome!

However, there seem to be a number of incompatibilities/bugs in this notebook. I had to make the following modifications to get the notebook running on Tensorflow 1.0.0 (a combined sketch of fixes 3 and 4 follows the list):

  1. I had to comment out the line from modelAny import *, because no script named modelAny was provided, and nothing from it was needed by the rest of the code.
  2. rnn_cell seems to have been removed from tensorflow.python.ops in the current release. It was also never used in the rest of the code, so I commented out from tensorflow.python.ops import rnn_cell.
  3. tf.concat() has a different argument order now (the values come first, then the axis). I had to make the following modification:
    predicted_state = tf.concat([predicted_observation, predicted_reward, predicted_done], 1)
  4. tf.mul() had to be replaced by tf.multiply() as follows:
    done_loss = tf.multiply(predicted_done, true_done) + tf.multiply(1 - predicted_done, 1 - true_done)
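
For reference, a minimal sketch combining fixes 3 and 4 under the TF 1.0 API; the placeholder shapes are assumptions (4 observation dimensions, as in CartPole), not taken from the notebook:

import tensorflow as tf  # TensorFlow 1.0

# Placeholders standing in for the notebook's tensors (shapes assumed).
predicted_observation = tf.placeholder(tf.float32, [None, 4])
predicted_reward = tf.placeholder(tf.float32, [None, 1])
predicted_done = tf.placeholder(tf.float32, [None, 1])
true_done = tf.placeholder(tf.float32, [None, 1])

# Fix 3: tf.concat now takes (values, axis) rather than (axis, values).
predicted_state = tf.concat([predicted_observation, predicted_reward, predicted_done], 1)

# Fix 4: tf.mul was renamed tf.multiply.
done_loss = tf.multiply(predicted_done, true_done) + tf.multiply(1 - predicted_done, 1 - true_done)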

And everything executed as expected :)
Thank you
Anirban

@yangliu2

I get RuntimeWarning: overflow encountered in multiply at x = um.multiply(x, x, out=x), and then the reward starts to take on huge values like 11062986271742011518222336.000000.
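
The warning itself is plain floating-point overflow: squaring a value whose square exceeds the float range yields inf, for example:

import numpy as np

x = np.array([1e200])
x = np.multiply(x, x, out=x)  # RuntimeWarning: overflow encountered in multiply
print(x)                      # [inf]

One common mitigation (an assumption here, not something the tutorial prescribes) is to clip the model's predicted observations and rewards, e.g. with np.clip(pred, -10.0, 10.0), so runaway predictions cannot feed back into training.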
