@ppwwyyxx
Last active May 23, 2018 09:29
placeholder for OpenAI Gym submission

Uses A3C (asynchronous advantage actor-critic), written in TensorFlow. Training code, model, and evaluation code are at this repo

Gist doesn't have notifications; please use repo issues to discuss.

@Sohojoe

Sohojoe commented Aug 12, 2016

Nice! Which algorithm was the Breakout 625 scored with? (Your repo mentions both DoubleDQN and BatchA3C.)

@ppwwyyxx
Author

They're all BatchA3C. On Atari games, A3C is faster to train and usually results in higher scores than DQN as well, according to DeepMind.

@congling

It's a great model. I tried your model; the max score is 864.
After all the bricks have been cleared at the second level, the ball keeps going and the game does not start a new level. Perhaps it's a bug.

BTW, how long did it take to train the model? I used your training algorithm to train a new model and found that it gets a lower average reward than the DeepMind model, and the training speed is lower too.
Will your model perform better in the long run?

Thanks

Here are the statistics of the two models:

Steps: 2,500,000

DeepMind model
(8×8×64, stride 4) → (4×4×64, stride 2) → (3×3×64, stride 1) → 512 → actions
avg reward: 1037.006803
avg steps per second: 295

Your model
(5×5×32) → Pool(2) → (5×5×32) → Pool(2) → (4×4×64) → Pool(2) → (3×3×64) → 512 (PReLU(0.001)) → actions (correct me if I'm wrong)
avg reward: 898.745981
avg steps per second: 196
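
For concreteness, here is a rough sketch of the two architectures above in Keras-style TensorFlow (a much later API than this thread used); the 84×84×4 stacked-frame input shape is an assumption from the standard DQN setup, not something stated in the thread:

import tensorflow as tf

NUM_ACTIONS = 4  # hypothetical; depends on the game and emulator

# "DeepMind model": three strided convolutions, no pooling.
deepmind_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(64, 8, strides=4, activation='relu',
                           input_shape=(84, 84, 4)),  # assumed input shape
    tf.keras.layers.Conv2D(64, 4, strides=2, activation='relu'),
    tf.keras.layers.Conv2D(64, 3, strides=1, activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512, activation='relu'),
    tf.keras.layers.Dense(NUM_ACTIONS),
])

# The other model as described above: unstrided convolutions with
# max-pooling, and a PReLU after the 512-unit dense layer.
tensorpack_style_model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 5, activation='relu',
                           input_shape=(84, 84, 4)),
    tf.keras.layers.MaxPool2D(2),
    tf.keras.layers.Conv2D(32, 5, activation='relu'),
    tf.keras.layers.MaxPool2D(2),
    tf.keras.layers.Conv2D(64, 4, activation='relu'),
    tf.keras.layers.MaxPool2D(2),
    tf.keras.layers.Conv2D(64, 3, activation='relu'),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(512),
    tf.keras.layers.PReLU(),
    tf.keras.layers.Dense(NUM_ACTIONS),
])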

@ppwwyyxx
Copy link
Author

Are you using your own code? What game are you training on? An average score of 1037 seems too high for Breakout, and I wonder how you did it.
A3C on Breakout takes me 1~2 days to roughly converge.

@congling

Sorry, my mistake: I was training MsPacman and using "neon" instead of TensorFlow as the training framework.

@congling

Hi @ppwwyyxx,
I've tried your sample in tensorpack/examples/Atari2600, running on my GTX 1080 with 32 GB of memory. It has run about 1,230,000 steps, but the score is still very low. Is there something wrong with my configuration?

I'm running your program with the following command; because my machine restarted once, I continued training from the model generated before the restart.
python2 ./DQN.py --rom breakout.bin --gpu 0 --load train_log/DQN/model-690000

Thank you for your code; it has helped me a lot in understanding the DDQN network.

Here are my logs.

[0830 16:06:07 @stat.py:81] conv0/W/rms: 0.047264
[0830 16:06:07 @stat.py:81] conv1/W/rms: 0.035405
[0830 16:06:07 @stat.py:81] conv2/W/rms: 0.036001
[0830 16:06:07 @stat.py:81] conv3/W/rms: 0.041562
[0830 16:06:07 @stat.py:81] cost: 0.24421
[0830 16:06:07 @stat.py:81] expreplay/max_score: 4
[0830 16:06:07 @stat.py:81] expreplay/mean_score: 1.046
[0830 16:06:07 @stat.py:81] fc0/W/rms: 0.017889
[0830 16:06:07 @stat.py:81] fct/W/rms: 0.0079109
[0830 16:06:07 @stat.py:81] learning_rate: 0.001
[0830 16:06:07 @stat.py:81] max_score: 3
[0830 16:06:07 @stat.py:81] mean_score: 2.22
[0830 16:06:07 @stat.py:81] predict_reward: 0.24851
[0830 16:06:07 @group.py:95] Callbacks took 11.449 sec in total. Periodic-Evaluator: 11.166sec
Epoch 54, global_step=1230000 finished, time=522.31sec.

@ppwwyyxx
Author

ppwwyyxx commented Sep 2, 2016

I ran python2 ./DQN.py --rom breakout.bin --gpu 0 today for about 8 hours. At global_step=360000 it had already reached a score of 40. This is roughly what I had before, so it's unlikely to be a bug I introduced recently.
Did you modify the code in some way?
Also, someone had issues with GTX 1080 + CUDA 8.0 before: tensorflow/tensorflow#3068, tensorpack/tensorpack#8. Maybe it's related.

@congling

congling commented Sep 2, 2016

Thank you for your reply. My colleague ran your sample successfully in the same environment. I'll try again later.
Thanks.

@acarticm

Hi @ppwwyyxx,
I tried to run a pretrained Atari model from examples/OpenAIGym and got an error about a "malformed environment ID"; the full traceback is copied below. Do you have any suggestions on how to avoid this issue? I would really appreciate any help.
Thanks.

ENV=Breakout-v0 ./run-atari.py --load "$ENV".tfmodel --env "$ENV"
..
[2016-10-11 19:41:53,285] Making new env:
Traceback (most recent call last):
  File "./run-atari.py", line 87, in <module>
    p = get_player(); del p # set NUM_ACTIONS
  File "./run-atari.py", line 28, in get_player
    pl = GymEnv(ENV_NAME, dumpdir=dumpdir, auto_restart=False)
  File "/home/user/tensorpack/tensorpack/RL/gymenv.py", line 30, in __init__
    self.gymenv = gym.make(name)
  File "/home/user/gym/gym/envs/registration.py", line 126, in make
    return registry.make(id)
  File "/home/user/gym/gym/envs/registration.py", line 90, in make
    spec = self.spec(id)
  File "/home/user/gym/gym/envs/registration.py", line 99, in spec
    raise error.Error('Attempted to look up malformed environment ID: {}. (Currently all IDs must be of the form {}.)'.format(id.encode('utf-8'), env_id_re.pattern))
gym.error.Error: Attempted to look up malformed environment ID: . (Currently all IDs must be of the form ^([\w:-]+)-v(\d+)$.)

@ppwwyyxx
Author

ppwwyyxx commented Oct 19, 2016

@acarticm
For some reason I never got notified about the discussions here.
ENV should be a shell variable, so the command should be (note the semicolon):

ENV=Breakout-v0; ./run-atari.py --load "$ENV".tfmodel --env "$ENV"

I'll correct this in the readme.

// OK, it looks like Gist doesn't have notifications at all: issue
// Future visitors, please use issues in my code repo so I can see you.

@richardxiong

@ppwwyyxx
Hello! I have been studying Tutankham using your code. Could you tell me how you plotted the "training curve on Breakout"? I hope to plot a similar figure for Tutankham in order to monitor training progress. Thanks!

@ppwwyyxx
Author

After you start training, all the statistics will be in train_log/some_dir/stat.json.
You can parse the JSON and plot it with your own tools, open the directory with TensorBoard, or plot it with my plotting tools:
cat train_log/some_directory/stat.json | jq '.[] | .mean_score // empty' | scripts/plot-point.py
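
If you'd rather not use jq, a minimal Python equivalent is below (assuming stat.json is a JSON array of per-epoch dicts, as the jq filter above implies, and that matplotlib is installed):

import json
import matplotlib.pyplot as plt

with open('train_log/some_directory/stat.json') as f:
    stats = json.load(f)

# Keep only the entries that recorded a mean_score (the jq '// empty'
# part of the pipeline above), then plot them in order.
scores = [s['mean_score'] for s in stats if 'mean_score' in s]
plt.plot(scores)
plt.xlabel('evaluation')
plt.ylabel('mean_score')
plt.show()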

@dylanthomas

I am trying to train your A3C from scratch but got the following error. Can you point me in the right direction?
A million thanks in advance.

(py35) ➜ OpenAIGym git:(master) ✗ ./train-atari.py --env Breakout-v0 --gpu 0
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
[2016-11-25 11:06:13,976] Making new env: Breakout-v0
Traceback (most recent call last):
  File "./train-atari.py", line 247, in <module>
    train_tower = range(nr_gpu)[:-nr_gpu/2] or [0]
TypeError: slice indices must be integers or None or have an __index__ method

@ppwwyyxx
Author

@dylanthomas Sorry, that's a Python 3 compatibility problem. You need to replace nr_gpu/2 with nr_gpu//2. I just fixed it in the project.
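
For anyone puzzled by the error, a toy Python 3 illustration of why the one-character change matters:

nr_gpu = 4
print(nr_gpu / 2)    # 2.0 -- in Python 3, '/' is true division and returns a float
print(nr_gpu // 2)   # 2   -- '//' is floor division and returns an int
print(range(nr_gpu)[:-nr_gpu // 2])   # range(0, 2) -- ints are valid slice indices
# range(nr_gpu)[:-nr_gpu / 2] raises:
# TypeError: slice indices must be integers or None or have an __index__ method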

@dylanthomas

Thank you, but now I am getting
Traceback (most recent call last):
  File "./train-atari.py", line 255, in <module>
    config = get_config()
  File "./train-atari.py", line 184, in get_config
    procs = [MySimulatorWorker(k, namec2s, names2c) for k in range(SIMULATOR_PROC)]
  File "./train-atari.py", line 184, in <listcomp>
    procs = [MySimulatorWorker(k, namec2s, names2c) for k in range(SIMULATOR_PROC)]
  File "/home/john/dev/tensorpack/tensorpack/RL/simulator.py", line 70, in __init__
    super(SimulatorProcessStateExchange, self).__init__(idx)
  File "/home/john/dev/tensorpack/tensorpack/RL/simulator.py", line 52, in __init__
    self.name = self.identity = u'simulator-{}'.format(self.idx).encode('utf-8')
  File "/home/john/anaconda3/envs/py35/lib/python3.5/multiprocessing/process.py", line 143, in name
    assert isinstance(name, str), 'name must be a string'
AssertionError: name must be a string

Another compatibility problem, maybe?

@ppwwyyxx
Author

ppwwyyxx commented Nov 25, 2016

Yes, it's a unicode/str compatibility issue. I just pushed another fix. I don't have a Python 3 environment for testing right now, but hopefully it'll work.
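
The shape of the fix is roughly the following (a sketch of the idea, not the actual commit):

# In Python 3, .encode('utf-8') yields bytes, but multiprocessing.Process
# requires its name to be str. Keep the name as text and encode only
# where bytes are really needed (e.g. the ZMQ socket identity).
idx = 0  # example index
name = u'simulator-{}'.format(idx)   # unicode on Python 2, str on Python 3
identity = name.encode('utf-8')      # bytes
# i.e. assign `name` to self.name and `identity` to self.identity,
# instead of assigning the encoded bytes to both as the failing line did.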

@dylanthomas

It works!!! Many thanks!!

@dylanthomas

Over the weekend I trained your A3C for 390 epochs, and related to that, can I ask you two questions?

First,
The mean score went up to around 500 but stayed there; that is, it did not get near 700 as in your results. Can you guess why? Learning rate not selected optimally? Initialization not optimal?

Second,
Your A3C looks like A3C-FF. Am I correct? Have you also implemented A3C-LSTM?

@ppwwyyxx
Author

ppwwyyxx commented Nov 28, 2016

The 700 one was trained with DeepMind settings, not Gym settings. For Gym my average score is 625.
I don't have many clues about your score. One guess is that I actually trained the submission model with 4 GPUs (two for training and two for simulation). In that case: 1. the learning rate is divided by 2 inside AsyncMultiGPUTrainer; and 2. two training threads asynchronously update the parameters, which should improve the model.

Yes, I have an a3c-lstm implementation which can reach a similar score on Breakout. But I didn't run a lot of experiments, and I'm not sure whether my implementation is better than a3c-ff (as in the paper), so I didn't release it.

@dylanthomas

That helps. But what do you mean by DeepMind settings? ALE with a fixed frame skip of 4, instead of Gym with k = {2, 3, 4}?

@ppwwyyxx
Author

Yes. Apart from other minor differences, the random frame skip is probably the most relevant to performance.
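
To make the skip difference concrete, a toy sketch (not tensorpack or gym code; the single-frame env_step function is made up):

import random

def step_with_skip(env_step, action, skip):
    # Repeat `action` for `skip` emulator frames, summing the reward.
    total_reward = 0.0
    for _ in range(skip):
        obs, reward, done = env_step(action)
        total_reward += reward
        if done:
            break
    return obs, total_reward, done

# DeepMind/ALE setting: a fixed skip of 4 every step.
fixed_skip = 4
# Gym's Breakout-v0: the skip is resampled from {2, 3, 4} each step,
# which adds timing noise the agent cannot control.
random_skip = random.choice([2, 3, 4])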

@dylanthomas

The number of actions appears to be different... For Breakout it is 3 in ALE but 6 in Gym. Wouldn't this matter? Did you just use ALE with the DeepMind settings, or did you adjust Gym somehow to act like ALE?

@ppwwyyxx
Author

Yes, I mentioned these differences. The larger number of actions also makes it harder in Gym.
For the results here I used the DeepMind settings, and for the Gym submissions I used Gym.

@dylanthomas

Wonderful. Thank YOU!

@Nhorning

Hey, Kangaroo-v0 seems to get stuck over in the corner trying to catch falling things until it gets killed. Is max session time already a training parameter, and if not, do you think it could help in this case?

@lululun20

Hey,
I just want to ask a very dumb question: I have read the A3C paper, in which they kind of boast about their good performance running on a 16-core CPU. How come here we are talking about GPUs?
Thank you in advance!

@ppwwyyxx
Author

It has better performance on GPU.

@cyrsis

cyrsis commented Dec 1, 2017

[1201 10:47:55 @monitor.py:363] max_score: 863
[1201 10:47:55 @monitor.py:363] mean_score: 590.14

This is my first workout with Gym.
It ran for 2 days and was stable; pretty good on a single 1070 with 8 GB RAM.

It's still running.

When I do

./train-atari.py --task gen_submit --load Breakout-v0.npy --env Breakout-v0 --output output_dir

it said

AssertionError: Breakout-v0.npy

Do I need to wait for training to finish to get Breakout-v0.npy?

@pablosjb

Hello, I hope I am not bothering you by asking this here. I am kind of new to this and would like help with the following:

  • I am trying to solve the game "Tennis-v0", in which the observation is an image (an RGB 3D array), and I first want to extract features such as the players' positions, the ball position, and the score.
  • For the score, I am thinking about applying a text-recognition algorithm to the region where the score is shown.
  • The problem is locating the items (players and ball). Can anyone tell me which approach to take?

Additionally, I am preparing a dataset of the players in different poses, to paste into the field (with the original players erased) so as to have a labeled dataset. What do you think about this? Thank you and regards.
