
@awjuliani
Last active July 14, 2019 16:24
Implementation of Double Dueling Deep-Q Network
@mphielipp

I'm getting this error:

```
----> 2 mainQN = Qnetwork(h_size)
---> 16 self.AW = tf.Variable(tf.random_normal([h_size/2,env.actions]))
---> 77 seed2=seed2)
--> 189 name=name)
--> 582 _Attr(op_def, input_arg.type_attr))
lib\site-packages\tensorflow\python\framework\op_def_library.py in _SatisfiesTypeConstraint(dtype, attr_def)
     58     "DataType %s for attr '%s' not in list of allowed values: %s" %
     59     (dtypes.as_dtype(dtype).name, attr_def.name,
---> 60     ", ".join(dtypes.as_dtype(x).name for x in allowed_list)))

TypeError: DataType float32 for attr 'T' not in list of allowed values: int32, int64
```

@irwenqiang

@mphielipp
I think you should check your version of TF first:

```
python -c 'import tensorflow as tf; print(tf.__version__)'
```

The version should be 0.12.x.

@Xan-Kun

Xan-Kun commented Feb 13, 2017

@mphielipp Did it work for you? I installed the latest TensorFlow, and pip show tensorflow says 0.12.1, but I still get the same error as you.

@tropical32

tropical32 commented Feb 13, 2017

@mphielipp Replace that line with:

```
self.AW = tf.Variable(tf.random_normal([h_size // 2, env.actions]))
```

The shape argument expects integers, and under Python 3 h_size / 2 produces a float; // is floor (integer) division.
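For anyone landing here, a minimal sketch of the distinction (TF 1.x style; h_size and n_actions are example stand-ins, not the notebook's actual values):

```python
import tensorflow as tf

h_size = 512    # example value; the notebook defines its own
n_actions = 4   # stand-in for env.actions

# Under Python 3, h_size / 2 is a float, and shape entries must be integers:
# tf.random_normal([h_size / 2, n_actions])  # -> the TypeError above

# Floor division keeps the shape entry an integer:
AW = tf.Variable(tf.random_normal([h_size // 2, n_actions]))
```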

@nathanin

nathanin commented Sep 8, 2017

Hi! First, thanks so much for your detailed write-ups and commented implementations. I have been working through them while developing my own RL environment outside of Gym.

I have a few questions regarding the implementation for Double-DQN here:

  • The Double-DQN paper (https://arxiv.org/pdf/1511.06581.pdf) describes updating \theta at every step t. The implementation here updates \theta every update_freq steps, and updates \theta- immediately afterwards. Is there something I'm not understanding? I guess when to perform these updates ends up being a heuristic decision; I'm just wondering what your intuition is for the \theta / \theta- update cycle.

  • Second is your nice TensorFlow hack for updating the target-Q weights. Does it rely on the order of variable initialization? Might there be a more verbose but explicit way to do it, maybe storing the target-Q ops by name in a dictionary (see the sketch after this list)?

  • Last, is there a reason for not using a nonlinearity/activation in the network?
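For reference, a minimal sketch of the name-based alternative suggested above, assuming the main and target networks are built under variable scopes named main and target (the gist itself does not use scopes; the scope names and the make_target_update_ops helper are hypothetical):

```python
import tensorflow as tf

def make_target_update_ops(tau=0.001):
    """Pair main/target variables by name instead of by creation order."""
    main_vars = {v.name.replace('main/', '', 1): v
                 for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                            scope='main')}
    target_vars = {v.name.replace('target/', '', 1): v
                   for v in tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES,
                                              scope='target')}
    ops = []
    for name, t_var in target_vars.items():
        m_var = main_vars[name]  # KeyError here means the two graphs differ
        # Soft update: theta_target <- tau * theta_main + (1 - tau) * theta_target
        ops.append(t_var.assign(tau * m_var + (1.0 - tau) * t_var))
    return tf.group(*ops)
```

The trade-off is an explicit dependency on the scope names rather than an implicit one on the order in which tf.trainable_variables() returns the weights.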

@samsenyang

I would like to ask a question: do we have to split the inputs in order to achieve a dueling DQN? Why can't I just feed all of the inputs into both the value layer and the advantage layer?
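For context, a minimal sketch of the two wirings the question contrasts (TF 1.x style; features, h_size, and n_actions are placeholder names, not taken from the gist):

```python
import tensorflow as tf

h_size, n_actions = 512, 4                       # example sizes
features = tf.placeholder(tf.float32, [None, h_size])

# Variant 1 (as in the gist): split the features so each stream gets
# its own half of the representation.
streamA, streamV = tf.split(features, 2, axis=1)
AW = tf.Variable(tf.random_normal([h_size // 2, n_actions]))
VW = tf.Variable(tf.random_normal([h_size // 2, 1]))
advantage = tf.matmul(streamA, AW)
value = tf.matmul(streamV, VW)

# Variant 2 (what the question asks about): feed the full feature
# vector into both streams instead of splitting it.
advantage_full = tf.matmul(features, tf.Variable(tf.random_normal([h_size, n_actions])))
value_full = tf.matmul(features, tf.Variable(tf.random_normal([h_size, 1])))

# Either way, the dueling aggregation is the same:
# Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a))
q_out = value + (advantage - tf.reduce_mean(advantage, axis=1, keep_dims=True))
```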
