@ppwwyyxx
Last active May 23, 2018 09:29
placeholder for OpenAI Gym submission

Uses A3C (asynchronous advantage actor-critic) written in TensorFlow. Training code, model & evaluation code are at this repo.

Gist doesn't have notifications; please use the repo's issues to discuss.

@congling

Sorry, my mistake: I was training MsPacman and used "neon" instead of TensorFlow as the training framework.

@congling

Hi @ppwwyyxx,
I've tried your sample in tensorpack/examples/Atari2600, running on my GTX 1080 with 32 GB of memory. It has run about 1,230,000 steps, but the score is still very low. Is there something wrong with my configuration?

I'm running your program with the following command; because my machine restarted once, I continued training from the model generated before the restart.
python2 ./DQN.py --rom breakout.bin --gpu 0 --load train_log/DQN/model-690000

Thank you for your code; it helped me a lot in understanding the DDQN network.

Here are my logs.

[0830 16:06:07 @stat.py:81] conv0/W/rms: 0.047264
[0830 16:06:07 @stat.py:81] conv1/W/rms: 0.035405
[0830 16:06:07 @stat.py:81] conv2/W/rms: 0.036001
[0830 16:06:07 @stat.py:81] conv3/W/rms: 0.041562
[0830 16:06:07 @stat.py:81] cost: 0.24421
[0830 16:06:07 @stat.py:81] expreplay/max_score: 4
[0830 16:06:07 @stat.py:81] expreplay/mean_score: 1.046
[0830 16:06:07 @stat.py:81] fc0/W/rms: 0.017889
[0830 16:06:07 @stat.py:81] fct/W/rms: 0.0079109
[0830 16:06:07 @stat.py:81] learning_rate: 0.001
[0830 16:06:07 @stat.py:81] max_score: 3
[0830 16:06:07 @stat.py:81] mean_score: 2.22
[0830 16:06:07 @stat.py:81] predict_reward: 0.24851
[0830 16:06:07 @group.py:95] Callbacks took 11.449 sec in total. Periodic-Evaluator: 11.166sec
Epoch 54, global_step=1230000 finished, time=522.31sec.

@ppwwyyxx
Author

ppwwyyxx commented Sep 2, 2016

I ran python2 ./DQN.py --rom breakout.bin --gpu 0 today for about 8 hours. At global_step=360000 it had already reached a score of 40. This is roughly what I got before, so it's unlikely to be a bug I introduced recently.
Did you modify the code in some way?
Also, someone had issues with GTX 1080 + CUDA 8.0 before: tensorflow/tensorflow#3068, tensorpack/tensorpack#8. Maybe it's related.

@congling

congling commented Sep 2, 2016

Thank you for your reply. My colleague ran your sample successfully in the same environment. I'll try again later.
Thanks.

@acarticm

Hi @ppwwyyxx,
I tried to run a pretrained Atari model from examples/OpenAIGym and got an error about a "malformed environment ID". The full traceback is copied below. Do you have any suggestions on how to avoid this issue?
I would really appreciate any suggestions.
Thanks.

ENV=Breakout-v0 ./run-atari.py --load "$ENV".tfmodel --env "$ENV"
..
[2016-10-11 19:41:53,285] Making new env:
Traceback (most recent call last):
File "./run-atari.py", line 87, in <module>
p = get_player(); del p # set NUM_ACTIONS
File "./run-atari.py", line 28, in get_player
pl = GymEnv(ENV_NAME, dumpdir=dumpdir, auto_restart=False)
File "/home/user/tensorpack/tensorpack/RL/gymenv.py", line 30, in __init__
self.gymenv = gym.make(name)
File "/home/user/gym/gym/envs/registration.py", line 126, in make
return registry.make(id)
File "/home/user/gym/gym/envs/registration.py", line 90, in make
spec = self.spec(id)
File "/home/user/gym/gym/envs/registration.py", line 99, in spec
raise error.Error('Attempted to look up malformed environment ID: {}. (Currently all IDs must be of the form {}.)'.format(id.encode('utf-8'), env_id_re.pattern))
gym.error.Error: Attempted to look up malformed environment ID: . (Currently all IDs must be of the form ^([\w:-]+)-v(\d+)$.)

@ppwwyyxx
Author

ppwwyyxx commented Oct 19, 2016

@acarticm
For some reason I never got notified about the discussions here.
ENV should be an environment variable, so it should be (note the semicolon)

ENV=Breakout-v0; ./run-atari.py --load "$ENV".tfmodel --env "$ENV"

I'll correct this in the readme.

// OK, it looks like Gist doesn't have notifications at all: issue
// Future visitors, please use issues in my code repo so I can see your comments.

@richardxiong

@ppwwyyxx
Hello! I have been studying Tutankham using your code. Could you tell me how you plotted the "training curve on Breakout"? I hope to plot a similar figure for Tutankham in order to monitor the training process. Thanks!

@ppwwyyxx
Author

After you start training, all the statistics will be in train_log/some_dir/stat.json.
You can parse the JSON and plot it with your own tools, open the directory with TensorBoard, or plot it with my plotting tools:
cat train_log/some_directory/stat.json | jq '.[] | .mean_score // empty' | scripts/plot-point.py
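If you don't have jq handy, the same extraction can be sketched in plain Python. This is a minimal sketch: the stat.json layout below (a JSON array of per-epoch dicts, where some epochs carry a "mean_score" key) is assumed from the jq filter above, and the sample data is made up for illustration.

```python
import json

# Assumed stat.json shape: a JSON array of per-epoch dicts; some epochs
# record a "mean_score" key (inferred from the jq command above).
sample = ('[{"epoch_num": 1, "mean_score": 1.0},'
          ' {"epoch_num": 2},'
          ' {"epoch_num": 3, "mean_score": 2.2}]')

stats = json.loads(sample)
# Keep only epochs that recorded a mean_score, like `// empty` in jq.
scores = [(s["epoch_num"], s["mean_score"]) for s in stats if "mean_score" in s]
print(scores)  # [(1, 1.0), (3, 2.2)]
```

In practice you would `json.load()` the real train_log/some_dir/stat.json and feed `scores` to matplotlib or any plotting tool.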

@dylanthomas

I am trying to train your A3C from scratch, but got the following error. Can you point me in the right direction?
A million thanks in advance,

(py35) ➜ OpenAIGym git:(master) ✗ ./train-atari.py --env Breakout-v0 --gpu 0
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
[2016-11-25 11:06:13,976] Making new env: Breakout-v0
Traceback (most recent call last):
File "./train-atari.py", line 247, in <module>
train_tower = range(nr_gpu)[:-nr_gpu/2] or [0]
TypeError: slice indices must be integers or None or have an index method

@ppwwyyxx
Author

@dylanthomas Sorry, that's a Python 3 compatibility problem. You need to replace nr_gpu/2 with nr_gpu//2. I just fixed it in the project.
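For context: in Python 3, `/` between two ints returns a float, which is not a valid slice index, while `//` is floor division and stays an int in both Python versions. A minimal sketch of the fixed expression (the variable name follows the traceback above; the values of nr_gpu are illustrative):

```python
# Python 3: 3 / 2 == 1.5 (float) -> TypeError as a slice index,
# but 3 // 2 == 1 (int) works in both Python 2 and 3.
nr_gpu = 3
train_tower = list(range(nr_gpu))[:-(nr_gpu // 2)] or [0]
print(train_tower)  # [0, 1]

nr_gpu = 1
# 1 // 2 == 0, so the slice is empty and `or [0]` provides the fallback.
train_tower = list(range(nr_gpu))[:-(nr_gpu // 2)] or [0]
print(train_tower)  # [0]
```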

@dylanthomas

Thank you, but now I am getting
Traceback (most recent call last):
File "./train-atari.py", line 255, in <module>
config = get_config()
File "./train-atari.py", line 184, in get_config
procs = [MySimulatorWorker(k, namec2s, names2c) for k in range(SIMULATOR_PROC)]
File "./train-atari.py", line 184, in <listcomp>
procs = [MySimulatorWorker(k, namec2s, names2c) for k in range(SIMULATOR_PROC)]
File "/home/john/dev/tensorpack/tensorpack/RL/simulator.py", line 70, in __init__
super(SimulatorProcessStateExchange, self).__init__(idx)
File "/home/john/dev/tensorpack/tensorpack/RL/simulator.py", line 52, in __init__
self.name = self.identity = u'simulator-{}'.format(self.idx).encode('utf-8')
File "/home/john/anaconda3/envs/py35/lib/python3.5/multiprocessing/process.py", line 143, in name
assert isinstance(name, str), 'name must be a string'
AssertionError: name must be a string

Another compatibility problem, maybe?

@ppwwyyxx
Author

ppwwyyxx commented Nov 25, 2016

Yes, it is a unicode/str compatibility issue. I just pushed another fix. I don't have a Python 3 environment for testing right now, but hopefully it'll work.
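The root cause visible in the traceback: `multiprocessing.Process.name` must be `str`, but `.encode('utf-8')` produces `bytes` under Python 3. A hedged sketch of one possible shape of the fix (keep the ZMQ identity as bytes, since ZMQ socket identities are byte strings, but give the process a plain-str name; this is an illustration, not necessarily the exact patch that was pushed):

```python
import multiprocessing

idx = 0
# ZMQ socket identities should stay bytes...
identity = u'simulator-{}'.format(idx).encode('utf-8')
# ...but Process.name must be str in Python 3, so don't encode it.
proc = multiprocessing.Process(name=u'simulator-{}'.format(idx))
print(proc.name)  # simulator-0
print(identity)   # b'simulator-0'
```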

@dylanthomas

It works!!! Many thanks!!

@dylanthomas

Over the weekend, I trained your A3C for 390 epochs, and related to that, can I ask you two questions?

First,
the mean score went up to around 500, but it stayed there. That is, it did not get near 700 as in your results. Can you guess why? Learning rate not selected optimally? Initialization not optimal?

Second,
your A3C looks like A3C-FF. Am I correct? Have you also implemented A3C-LSTM?

@ppwwyyxx
Author

ppwwyyxx commented Nov 28, 2016

The 700 one was trained with the DeepMind settings, not the Gym settings. For Gym, my average score is 625.
I don't have many clues about your question on the score. One guess is that I actually trained the submission model with 4 GPUs (two for training and two for simulation). In that case: 1. the learning rate is divided by 2 inside AsyncMultiGPUTrainer; and 2. two training threads asynchronously update the parameters, which should improve the model.

Yes, I have an A3C-LSTM implementation which can reach a similar score on Breakout. But I didn't run a lot of experiments and am not sure whether my implementation is better than A3C-FF (as in the paper), so I didn't release it.

@dylanthomas

That helps. But what do you mean by the DeepMind settings? ALE + 4-frame skip, instead of Gym with k={2, 3, 4}?

@ppwwyyxx
Author

Yes; apart from other minor differences, the random frame skip is probably the most relevant to performance.

@dylanthomas

The number of actions appears to be different... For Breakout, in ALE it is 3, but in Gym it's 6. Wouldn't this matter? Did you just use ALE with the DeepMind settings, or did you adjust Gym somehow to act like ALE?

@ppwwyyxx
Author

Yes, I mentioned these differences. The number of actions also makes it harder in Gym.
For the results here I used the DeepMind settings, and for the Gym submissions I used Gym.

@dylanthomas

Wonderful. Thank YOU!

@Nhorning

Hey, Kangaroo-v0 seems to get stuck in the corner, trying to catch things that fall until it gets killed. Is maximum session time already a training parameter, and if not, do you think that could help in this case?

@lululun20

Hey,
I just want to ask a very dumb question: I have read the A3C paper, in which they kind of boasted about their good performance when running on a 16-core CPU. How come here we are talking about GPUs?
Thank you in advance!

@ppwwyyxx
Author

It has better performance on GPU.

@cyrsis

cyrsis commented Dec 1, 2017

[1201 10:47:55 @monitor.py:363] max_score: 863
[1201 10:47:55 @monitor.py:363] mean_score: 590.14

This is my first workout with Gym.
It ran for 2 days and was stable; pretty good with a single 1070 with 8 GB of RAM.

It's still running.

When I do

./train-atari.py --task gen_submit --load Breakout-v0.npy --env Breakout-v0 --output output_dir

it says

AssertionError: Breakout-v0.npy"

Do I need to wait for the training to finish to get Breakout-v0.npy?

@pablosjb

Hello, I hope I am not bothering you by asking this here. I am kind of new here and I would like help with the following:

  • I am trying to solve the game "Tennis-v0", in which the observation is the image (an RGB 3D array), and I first want to extract features such as the players' positions, the ball position, and the score.
  • For the score, I am thinking about applying a text recognition algorithm to the region where the score is displayed.
  • The problem is locating the items (players and ball). Can anyone tell me which way to go?

Additionally, I am preparing a dataset of the players in different poses, to then paste them onto the field (with the original players erased) to build a labeled dataset. What do you think about this? Thank you and regards.
