
@ppwwyyxx
Last active May 23, 2018 09:29
placeholder for OpenAI Gym submission

Uses A3C (asynchronous advantage actor-critic), written in TensorFlow. Training code, model, and evaluation code are in this repo.

Gist doesn't send notifications, so please use the repo's issues for discussion.

@richardxiong

@ppwwyyxx
Hello! I have been studying Tutankham using your code. Could you tell me how you plotted the "training curve on Breakout"? I would like to plot a similar figure for Tutankham to monitor the training process. Thanks!

@ppwwyyxx
Author

After you start training, all the statistics will be in train_log/some_dir/stat.json
You can parse the JSON and plot it with your own tools, open the directory with TensorBoard, or plot it with my plotting tools:
cat train_log/some_dir/stat.json | jq '.[] | .mean_score // empty' | scripts/plot-point.py
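If you prefer to parse the JSON yourself, here is a minimal Python sketch. It assumes stat.json holds a JSON list of per-epoch dicts (which is what the jq filter `.[]` iterates over) and that matplotlib is installed. The helpers `extract_scores` and `plot_scores` are hypothetical, not part of the repo or of scripts/plot-point.py.

```python
import json


def extract_scores(stats, key="mean_score"):
    """Collect `key` from each record in stat.json, skipping records that lack it.

    Roughly equivalent to the jq filter `.[] | .mean_score // empty`.
    """
    return [rec[key] for rec in stats if rec.get(key) is not None]


def plot_scores(path, key="mean_score"):
    """Load a stat.json file and plot the values of `key` in file order."""
    import matplotlib.pyplot as plt  # assumed to be installed; imported lazily

    with open(path) as f:
        scores = extract_scores(json.load(f), key)
    plt.plot(range(len(scores)), scores)
    plt.xlabel("record index")
    plt.ylabel(key)
    plt.show()
```

Usage: `plot_scores("train_log/some_dir/stat.json")`. Passing a different `key` plots any other statistic recorded in the file.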

@dylanthomas

I am trying to train your A3C from scratch, but got the following error. Can you point me in the right direction?
Many thanks in advance,

(py35) ➜ OpenAIGym git:(master) ✗ ./train-atari.py --env Breakout-v0 --gpu 0
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:111] successfully opened CUDA library libcurand.so locally
[2016-11-25 11:06:13,976] Making new env: Breakout-v0
Traceback (most recent call last):
  File "./train-atari.py", line 247, in <module>
    train_tower = range(nr_gpu)[:-nr_gpu/2] or [0]
TypeError: slice indices must be integers or None or have an __index__ method

@ppwwyyxx
Author

@dylanthomas Sorry, that's a Python 3 compatibility problem. You need to replace nr_gpu/2 with nr_gpu//2. I just fixed it in the project.
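Why the fix works: in Python 3, `/` always returns a float, and a float slice index raises the TypeError above, while `//` is floor division and keeps an int. A small sketch of the fixed expression (the helper name `split_train_towers` is hypothetical) follows. Note that `-nr_gpu // 2` parses as `(-nr_gpu) // 2`, which matches what Python 2's integer `/` computed.

```python
def split_train_towers(nr_gpu):
    """Return the GPU indices used for training, following the fixed line
    `train_tower = range(nr_gpu)[:-nr_gpu//2] or [0]`."""
    # (-nr_gpu) // 2 is an int, so it is a valid slice index in Python 3.
    # An empty slice (e.g. a single GPU) falls back to GPU 0.
    return list(range(nr_gpu)[:-nr_gpu // 2]) or [0]


def float_index_fails(nr_gpu=4):
    """Reproduce the original Python 3 error: a float slice index."""
    try:
        range(nr_gpu)[:-nr_gpu / 2]
    except TypeError:
        return True
    return False
```

For example, `split_train_towers(4)` gives `[0, 1]` and `split_train_towers(1)` gives `[0]`.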

@dylanthomas

Thank you, but now I am getting
Traceback (most recent call last):
  File "./train-atari.py", line 255, in <module>
    config = get_config()
  File "./train-atari.py", line 184, in get_config
    procs = [MySimulatorWorker(k, namec2s, names2c) for k in range(SIMULATOR_PROC)]
  File "./train-atari.py", line 184, in <listcomp>
    procs = [MySimulatorWorker(k, namec2s, names2c) for k in range(SIMULATOR_PROC)]
  File "/home/john/dev/tensorpack/tensorpack/RL/simulator.py", line 70, in __init__
    super(SimulatorProcessStateExchange, self).__init__(idx)
  File "/home/john/dev/tensorpack/tensorpack/RL/simulator.py", line 52, in __init__
    self.name = self.identity = u'simulator-{}'.format(self.idx).encode('utf-8')
  File "/home/john/anaconda3/envs/py35/lib/python3.5/multiprocessing/process.py", line 143, in name
    assert isinstance(name, str), 'name must be a string'
AssertionError: name must be a string

Another compatibility problem, maybe?

@ppwwyyxx
Author

ppwwyyxx commented Nov 25, 2016

Yes, it is a unicode/str compatibility issue. I just pushed another fix. I don't have a Python 3 environment for testing right now, but hopefully it'll work.
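The traceback shows the root cause: on Python 3, `multiprocessing.Process.name` must be a `str`, but the original line assigned the same `.encode('utf-8')` bytes to both `name` and `identity`. One way to satisfy both is to keep a `str` for the process name and a separate bytes copy for the identity. This is a hypothetical minimal stand-in (`SimulatorWorker` is not the repo's class), not the fix that was actually pushed.

```python
import multiprocessing as mp


class SimulatorWorker(mp.Process):  # hypothetical stand-in, not tensorpack's class
    def __init__(self, idx):
        super(SimulatorWorker, self).__init__()
        self.idx = idx
        name = 'simulator-{}'.format(idx)
        # Process.name asserts isinstance(name, str) on Python 3.
        self.name = name
        # Keep a bytes copy, since the original code used an encoded identity.
        self.identity = name.encode('utf-8')
```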

@dylanthomas

It works !!! Many thanks !!

@dylanthomas

Over the weekend, I trained your A3C for 390 epochs. Can I ask you two questions about it?

First,
The mean score went up to around 500 but stayed there; it never got near 700 as in your results. Can you guess why? Is the learning rate not optimally selected? Is the initialization not optimal?

Second,
Your A3C looks like A3C-FF. Am I correct? Have you also implemented A3C-LSTM?

@ppwwyyxx
Copy link
Author

ppwwyyxx commented Nov 28, 2016

The 700 one was trained with DeepMind settings, not Gym settings. On Gym my average score is 625.
I don't have many clues about your score. One guess is that I actually trained the submission model with 4 GPUs (two for training and two for simulation). In that case, (1) the learning rate is divided by 2 inside AsyncMultiGPUTrainer, and (2) two training threads asynchronously update the parameters, which should improve the model.
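To illustrate the two effects described above, here is a toy sketch of asynchronous SGD: several worker threads read shared parameters, compute gradients, and apply updates without waiting for each other, each using a reduced learning rate. This is not tensorpack's AsyncMultiGPUTrainer. The objective f(w) = (w - 3)^2 and the rule "divide the learning rate by the number of workers" are assumptions that generalize the two-GPU case.

```python
import threading


def async_sgd(base_lr=0.1, n_workers=2, steps_per_worker=200, target=3.0):
    """Toy asynchronous SGD on f(w) = (w - target)^2 with worker threads."""
    # Assumption: scale the learning rate by 1/n_workers
    # (generalizing "divided by 2" for two training GPUs).
    lr = base_lr / n_workers
    shared = {"w": 0.0}  # parameters shared by all workers, updated without a lock

    def worker():
        for _ in range(steps_per_worker):
            w = shared["w"]              # snapshot may be stale when the update lands
            grad = 2.0 * (w - target)    # gradient of (w - target)^2
            shared["w"] -= lr * grad     # apply immediately, no synchronization

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["w"]
```

Each worker sees parameters that other workers may have changed in the meantime. A smaller per-worker learning rate keeps these stale updates stable.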

Yes, I have an A3C-LSTM implementation that can reach a similar score on Breakout. But I didn't run many experiments and am not sure whether it is better than A3C-FF (as in the paper), so I didn't release it.

@dylanthomas

That helps. But what do you mean by DeepMind settings? ALE with a fixed frame skip of 4, instead of Gym with a random frame skip k in {2, 3, 4}?

@ppwwyyxx
Author

Yes. Apart from other minor differences, the random frame skip is probably the most relevant to performance.
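To make the frame-skip difference concrete, here is a minimal sketch of action repeat. It works with either a fixed skip of 4 (the DeepMind setting) or a skip sampled uniformly from {2, 3, 4} on every step (the Gym behavior described above). `step_fn` is a hypothetical callable that returns the `(obs, reward, done, info)` tuple used by Gym at the time. This is not code from the repo.

```python
import random


def skip_frames(step_fn, action, frameskip, rng=random):
    """Repeat `action` for k frames and sum the rewards.

    frameskip: an int for a fixed skip (e.g. 4), or a tuple of candidates
    (e.g. (2, 3, 4)) from which k is drawn uniformly on every call.
    """
    k = frameskip if isinstance(frameskip, int) else rng.choice(frameskip)
    total_reward, obs, done, info = 0.0, None, False, {}
    for _ in range(k):
        obs, reward, done, info = step_fn(action)
        total_reward += reward
        if done:  # stop early if the episode ends mid-skip
            break
    return obs, total_reward, done, info
```

With a random k, the agent cannot know exactly how many frames each action will last. This is one plausible reason why Gym scores can be lower than scores under the fixed-skip setting.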

@dylanthomas

The number of actions appears to be different. For Breakout, it is 3 in ALE but 6 in Gym. Wouldn't this matter? Did you just use ALE with the DeepMind settings, or did you adjust Gym somehow to act like ALE?

@ppwwyyxx
Author

Yes, I mentioned these differences. The larger number of actions also makes it harder in Gym.
For the result here I used DeepMind settings, and for the Gym submissions I used Gym.

@dylanthomas

Wonderful. Thank YOU !

@Nhorning

Hey, Kangaroo-v0 seems to get stuck in the corner trying to catch falling objects until it gets killed. Is max session time already a training parameter? If not, do you think it could help in this case?

@lululun20

Hey,
I just want to ask a very dumb question: I have read the A3C paper, in which they boasted about good performance on a 16-core CPU. Why are we talking about GPUs here?
Thank you in advance!

@ppwwyyxx
Author

It has better performance on GPU.

@cyrsis

cyrsis commented Dec 1, 2017

[1201 10:47:55 @monitor.py:363] max_score: 863
[1201 10:47:55 @monitor.py:363] mean_score: 590.14

This is my first workout with Gym.
It has run for 2 days and is stable. Pretty good on a single GTX 1070 with 8 GB RAM.

It is still running.

When I run

./train-atari.py --task gen_submit --load Breakout-v0.npy --env Breakout-v0 --output output_dir

it says

AssertionError: Breakout-v0.npy

Do I need to wait for training to finish to get Breakout-v0.npy?

@pablosjb

Hello, I hope I am not bothering anyone by asking this here. I am fairly new and would like some advice on the following:

  • I am trying to solve the game Tennis-v0, in which the observation is an image (an RGB 3D array), and I first want to extract features such as the players' positions, the ball position, and the score.
  • For the score, I am thinking about applying a text recognition algorithm to the region where the score is displayed.
  • The problem is locating the items (players and ball). Can anyone suggest which approach to take?

Additionally, I am preparing a dataset of the players in different shapes, which I paste onto the field (after erasing the original players) to get a labeled dataset. What do you think about this? Thank you and regards.
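For locating the players and the ball, a simple baseline worth trying before training a detector is color thresholding. Atari sprites are usually drawn in a small set of flat colors, so the pixels of one exact color can be masked and their centroid taken. This is a sketch under that assumption. The color in the example is a placeholder, not Tennis-v0's real palette, and `locate_color` is a hypothetical helper.

```python
import numpy as np


def locate_color(frame, rgb, tol=0):
    """Return the (row, col) centroid of pixels within `tol` of `rgb`, or None.

    frame: H x W x 3 uint8 array, e.g. a Gym Atari observation.
    rgb: the object's color (look it up in an actual frame; placeholder here).
    """
    # Use a signed type so the subtraction does not wrap around.
    diff = np.abs(frame.astype(np.int16) - np.array(rgb, dtype=np.int16))
    mask = np.all(diff <= tol, axis=-1)  # True where all 3 channels match
    if not mask.any():
        return None
    rows, cols = np.nonzero(mask)
    return float(rows.mean()), float(cols.mean())
```

If two objects share a color, this returns the centroid of both, so you would need connected-component labeling to separate them.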
