

@ppwwyyxx
Last active May 23, 2018 09:29
Placeholder for an OpenAI Gym submission

Uses A3C (asynchronous advantage actor-critic), written in TensorFlow. The training code, model and evaluation code are in this repo.
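For readers new to A3C, the per-rollout objective can be sketched in NumPy as below. This is a generic illustration of n-step bootstrapped returns plus the policy, value and entropy terms, not the code in the repo. The function names, the discount gamma, the 0.5 weight on the value loss and entropy_beta=0.01 are assumptions chosen as common defaults.

```python
import numpy as np

def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Discounted returns R_t = r_t + gamma * R_{t+1}, bootstrapped from V(s_T)."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    running = bootstrap_value
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def a3c_losses(logits, actions, values, returns, entropy_beta=0.01):
    """Policy, value and entropy terms of an A3C-style objective for one rollout."""
    # Numerically stable log-softmax over the action dimension.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    probs = np.exp(log_probs)
    # Advantage A_t = R_t - V(s_t); in a real graph it is treated as a constant
    # (no gradient) in the policy term.
    advantages = returns - values
    chosen_log_probs = log_probs[np.arange(len(actions)), actions]
    policy_loss = -np.sum(chosen_log_probs * advantages)
    value_loss = 0.5 * np.sum((returns - values) ** 2)
    entropy = -np.sum(probs * log_probs)
    # The entropy bonus is subtracted, so minimizing the total encourages exploration.
    total = policy_loss + value_loss - entropy_beta * entropy
    return policy_loss, value_loss, entropy, total
```

For example, n_step_returns([0.0, 0.0, 1.0], bootstrap_value=0.5, gamma=0.9) gives approximately [1.1745, 1.305, 1.45].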

Gists don't send notifications; please use the repo's issues to discuss.

@dylanthomas

It works!!! Many thanks!!

@dylanthomas

Over the weekend I trained your A3C for 390 epochs. Can I ask you two questions about that?

First,
The mean score went up to around 500 but plateaued there; it never got near the 700 in your results. Can you guess why? Is the learning rate not chosen optimally? Is the initialization not optimal?

Second,
Your A3C looks like A3C.FF. Am I correct? Have you also implemented A3C.LSTM?

@ppwwyyxx
Author

ppwwyyxx commented Nov 28, 2016

The 700 result was trained with DeepMind settings, not Gym settings. With Gym, my average score is 625.
I don't have many clues about your score. One guess is that I actually trained the submission model with 4 GPUs (two for training and two for simulation). In that case 1. the learning rate is divided by 2 inside AsyncMultiGPUTrainer, and 2. two training threads asynchronously update the parameters, which should improve the model.
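The two effects described in the reply above can be illustrated with a toy sketch: the learning rate is scaled down by the number of training towers, and each tower thread updates shared parameters without an explicit lock. This does not reproduce tensorpack's AsyncMultiGPUTrainer internals; the function async_train, the quadratic toy loss and the assumption that the divisor equals the number of training towers (2 in the setup above) are all hypothetical.

```python
import threading
import numpy as np

def async_train(num_towers=2, base_lr=0.1, steps_per_tower=200, target=3.0):
    """Toy asynchronous SGD: tower threads update one shared array without locking."""
    params = np.zeros(1)            # shared parameter vector
    lr = base_lr / num_towers       # assumed: lr divided by the number of training towers

    def tower():
        for _ in range(steps_per_tower):
            # Gradient of the toy loss 0.5 * (w - target)^2, computed from a
            # read that another thread may have made stale in the meantime.
            grad = params - target
            params[:] -= lr * grad  # in-place update of the shared array

    threads = [threading.Thread(target=tower) for _ in range(num_towers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return float(params[0])
```

With the defaults, async_train() ends very close to the target 3.0, because the two towers together apply 400 small updates.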

Yes, I have an A3C-LSTM implementation that reaches a similar score on Breakout. But I haven't run many experiments and I'm not sure whether my implementation is better than A3C-FF (as it is in the paper), so I didn't release it.

@dylanthomas

That helps. But what do you mean by DeepMind settings? ALE with a fixed 4-frame skip, instead of Gym with k={2, 3, 4}?

@ppwwyyxx
Author

Yes. Apart from other minor differences, the random frame skip is probably the most relevant to performance.
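The frame-skip difference discussed above can be sketched as follows: the agent's action is repeated for k emulator frames and the rewards are summed, with k fixed at 4 in the DeepMind-style setting and drawn uniformly from {2, 3, 4} in the Gym-style setting. CountingEnv is a hypothetical stand-in for an emulator, and the other differences between the two setups are deliberately left out.

```python
import random

class CountingEnv:
    """Hypothetical stand-in for an emulator: counts frames, reward 1 per frame."""
    def __init__(self):
        self.frame = 0

    def step_one_frame(self, action):
        self.frame += 1
        return 1.0

def skip_step(env, action, skip):
    """Repeat `action` for `skip` frames and return the summed reward."""
    total_reward = 0.0
    for _ in range(skip):
        total_reward += env.step_one_frame(action)
    return total_reward

def deepmind_skip(rng):
    return 4                        # fixed 4-frame skip

def gym_v0_skip(rng):
    return rng.randint(2, 4)        # uniform over {2, 3, 4}; randint is inclusive
```

With the random skip, the same policy sees a varying number of frames between decisions, which is one plausible reason scores under the two settings differ.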

@dylanthomas

The number of actions appears to be different: for Breakout it is 3 in ALE but 6 in Gym. Wouldn't this matter? Did you just use ALE with the DeepMind settings, or did you adjust Gym somehow to act like ALE?

@ppwwyyxx
Author

Yes, I mentioned these differences. The larger number of actions also makes it harder in Gym.
For the result here I used DeepMind settings, and for the Gym submissions I used Gym.

@dylanthomas

Wonderful. Thank YOU!

@Nhorning

Hey, Kangaroo-v0 seems to get stuck in the corner trying to catch things that fall until it gets killed. Is a maximum episode time already a training parameter, and if not, do you think it could help in this case?
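A maximum episode length like the one asked about above can be sketched as a small wrapper that forces done=True after a fixed number of agent steps. This assumes the old Gym 4-tuple step API (obs, reward, done, info); StepLimit, IdleEnv and the "truncated" info key are hypothetical names for illustration. Gym itself ships a similar gym.wrappers.TimeLimit.

```python
class StepLimit:
    """Ends an episode after `max_steps` agent steps, whatever the game state."""
    def __init__(self, env, max_steps):
        self.env = env
        self.max_steps = max_steps
        self.steps = 0

    def reset(self):
        self.steps = 0
        return self.env.reset()

    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        self.steps += 1
        if self.steps >= self.max_steps:
            done = True
            # Mark the cut so a learner can tell it apart from a real game over.
            info = dict(info, truncated=True)
        return obs, reward, done, info

class IdleEnv:
    """Hypothetical environment that never ends an episode on its own."""
    def reset(self):
        return 0

    def step(self, action):
        return 0, 0.0, False, {}
```

For example, StepLimit(IdleEnv(), max_steps=3) reports done=True on the third step even though IdleEnv never terminates.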

@lululun20

Hey,
I just want to ask a very dumb question: I have read the A3C paper, in which they boasted of good performance running on a 16-core CPU. How come we are talking about GPUs here?
Thank you in advance!

@ppwwyyxx
Author

It performs better on a GPU.

@cyrsis

cyrsis commented Dec 1, 2017

[1201 10:47:55 @monitor.py:363] max_score: 863
[1201 10:47:55 @monitor.py:363] mean_score: 590.14

This is my first workout with Gym.
It ran for 2 days and was stable; pretty good with a single 1070 with 8 GB RAM.

It is still running.

When I run

./train-atari.py --task gen_submit --load Breakout-v0.npy --env Breakout-v0 --output output_dir

it says

AssertionError: Breakout-v0.npy

Do I need to wait for training to finish to get Breakout-v0.npy?

@pablosjb

Hello, I hope I am not bothering anyone by asking this here. I am fairly new and would like help with the following:

  • I am trying to solve the game "Tennis-v0", in which the observation is the image (an RGB 3D array), and I first want to extract features such as the players' positions, the ball position and the score.
  • For the score, I am thinking about applying a text recognition algorithm to the region where the score is shown.
  • The problem is locating the objects (players and ball). Can anyone tell me which approach to take?

Additionally, I am preparing a dataset of the players in different poses, which I then paste onto the court (after erasing the original players) to get a labeled dataset. What do you think of this? Thank you and regards.
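For the object-location question above, one simple starting point on Atari frames is a color mask: select pixels close to an object's color and take their centroid. This is a sketch under assumptions. The function locate_color, the tolerance tol and the example color are hypothetical, and the real Tennis-v0 player and ball colors would have to be sampled from actual frames. Objects that share a color with the background or court lines would also need a cropped region or another cue, such as frame differencing.

```python
import numpy as np

def locate_color(frame, rgb, tol=10):
    """Return the (row, col) centroid of pixels within `tol` of `rgb`, or None if absent."""
    # Use a signed type so the subtraction cannot wrap around like uint8 would.
    diff = np.abs(frame.astype(np.int16) - np.array(rgb, dtype=np.int16))
    mask = np.all(diff <= tol, axis=-1)   # True where every channel is close
    if not mask.any():
        return None
    rows, cols = np.nonzero(mask)
    return float(rows.mean()), float(cols.mean())
```

For example, on a black 10x10 frame with a 2x2 patch of color (200, 72, 72) at rows 2-3 and columns 5-6, locate_color returns (2.5, 5.5).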
