@jcoreyes
Last active April 28, 2016 22:27
Simple-DQN Writeup
./train.sh Breakout-v0 --environment gym --screen_width 40 --screen_height 52
python src/test_gym.py Breakout-v0 <output_folder> --load_weights <saved_model_pkl>

The code used to obtain these results is available at https://github.com/NervanaSystems/simple_dqn, commit 31a92a9, and runs with Neon commit 344372b. Training and test scripts are included in scripts.txt above. Note that for training, the screen width and screen height must be specified as (40, 52). Default training parameters are used, as set in src/main.py. The model was trained for 77 epochs, which takes roughly 15 hours on a Titan X GPU.

  • learning_rate=0.00025: Learning rate
  • discount_rate=0.99: Discount rate for future rewards
  • batch_size=32: Batch size for neural network
  • optimizer=rmsprop: Network optimization algorithm
  • decay_rate=0.95: Decay rate for RMSProp algorithm
  • clip_error=1: Clip error term in update
  • train_steps=250000: How many training steps per epoch
  • epochs=77: How many epochs to run
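As a rough sketch, the settings listed above can be collected in a plain Python dictionary to work out the total number of training steps. The name `params` and this layout are illustrative assumptions; they are not taken from src/main.py.

```python
# Hypothetical container for the settings listed above.
# The values are copied from the writeup; the dictionary itself is an assumption.
params = {
    "learning_rate": 0.00025,
    "discount_rate": 0.99,
    "batch_size": 32,
    "optimizer": "rmsprop",
    "decay_rate": 0.95,
    "clip_error": 1,
    "train_steps": 250000,
    "epochs": 77,
}

# Total training steps over the whole run: steps per epoch times epochs.
total_steps = params["train_steps"] * params["epochs"]
print(total_steps)  # 19250000
```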

karpathy commented Apr 27, 2016

Few notes:

The quotes in the command inside scripts.txt, that is ./train.sh “Breakout-v0” --environment gym --screen_width 40 --screen_height 52, are funny utf-8 quotes and will cause a crash if copy-pasted verbatim. (We are adding this common failure case on our end, though.)
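One way to guard against this when copying commands is to replace typographic quotes with plain ASCII ones before running them. The helper below is only a sketch; `straighten_quotes` and `QUOTE_MAP` are hypothetical names, not part of simple_dqn or Gym.

```python
# Map the common typographic quote characters to their ASCII equivalents.
QUOTE_MAP = str.maketrans({
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
})


def straighten_quotes(command: str) -> str:
    """Return the command with curly quotes replaced by straight quotes."""
    return command.translate(QUOTE_MAP)


pasted = "./train.sh \u201cBreakout-v0\u201d --environment gym --screen_width 40 --screen_height 52"
fixed = straighten_quotes(pasted)
print(fixed)
```

With plain double quotes the shell removes them during parsing, so the script receives the bare environment name.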

The rlgym instructions that are pointed to only contain explicit code for the base install, which does not include the ATARI envs. To run on ATARI envs, you also need to install them explicitly: pip install -e .[atari]

Please also kindly provide approximate runtime for anyone who might wish to reproduce.

@karpathy

I finished reproducing this result (Gym link).
My machine has 16 Intel(R) Xeon(R) CPU E5-2630 v3 @ 2.40GHz, 64GB memory, on Ubuntu 14.04 (trusty), with a Titan Z and NVIDIA driver 352.39. One epoch of training on this machine takes 27 minutes, so 77 epochs of training add up to about 35 hours.
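The runtime estimate is simple arithmetic and can be checked as below. The variable names are illustrative. The per-epoch figure for the Titan X is derived here from the "roughly 15 hours" quoted in the writeup; it was not measured.

```python
epochs = 77
minutes_per_epoch = 27  # measured on the Titan Z machine described above

# Total wall-clock time for the full run.
total_minutes = epochs * minutes_per_epoch
total_hours = total_minutes / 60
print(f"{total_minutes} minutes = {total_hours:.2f} hours")

# Implied per-epoch time for the roughly 15 hours quoted in the writeup (Titan X).
titan_x_minutes_per_epoch = 15 * 60 / epochs
print(f"about {titan_x_minutes_per_epoch:.1f} minutes per epoch")
```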

Here is my output:

$ python src/test_gym.py Breakout-v0 gymresults --load_weights snapshots/Breakout-v0_77.pkl
[...truncated...]
Episode 1 finished after 1639 timesteps with reward 65.0
Episode 2 finished after 1906 timesteps with reward 57.0
Episode 3 finished after 1381 timesteps with reward 42.0
Episode 4 finished after 1773 timesteps with reward 69.0
Episode 5 finished after 1768 timesteps with reward 73.0
Episode 6 finished after 1584 timesteps with reward 44.0
Episode 7 finished after 1416 timesteps with reward 51.0
Episode 8 finished after 1321 timesteps with reward 64.0
Episode 9 finished after 1569 timesteps with reward 47.0
Episode 10 finished after 1337 timesteps with reward 83.0
Episode 11 finished after 808 timesteps with reward 20.0
Episode 12 finished after 2055 timesteps with reward 148.0
Episode 13 finished after 1683 timesteps with reward 66.0
Episode 14 finished after 1460 timesteps with reward 53.0
Episode 15 finished after 1701 timesteps with reward 71.0
Episode 16 finished after 1516 timesteps with reward 60.0
Episode 17 finished after 2396 timesteps with reward 110.0
Episode 18 finished after 1797 timesteps with reward 74.0
Episode 19 finished after 2021 timesteps with reward 65.0
Episode 20 finished after 2201 timesteps with reward 203.0
Episode 21 finished after 1807 timesteps with reward 53.0
Episode 22 finished after 1037 timesteps with reward 22.0
Episode 23 finished after 1573 timesteps with reward 59.0
Episode 24 finished after 1625 timesteps with reward 47.0
Episode 25 finished after 1625 timesteps with reward 42.0
Episode 26 finished after 1486 timesteps with reward 52.0
Episode 27 finished after 1997 timesteps with reward 72.0
Episode 28 finished after 918 timesteps with reward 16.0
Episode 29 finished after 1501 timesteps with reward 52.0
Episode 30 finished after 1599 timesteps with reward 37.0
Episode 31 finished after 1345 timesteps with reward 37.0
Episode 32 finished after 1879 timesteps with reward 101.0
Episode 33 finished after 1816 timesteps with reward 83.0
Episode 34 finished after 1936 timesteps with reward 99.0
Episode 35 finished after 1609 timesteps with reward 58.0
Episode 36 finished after 2104 timesteps with reward 105.0
Episode 37 finished after 1777 timesteps with reward 75.0
Episode 38 finished after 1559 timesteps with reward 52.0
Episode 39 finished after 1495 timesteps with reward 91.0
Episode 40 finished after 2136 timesteps with reward 91.0
Episode 41 finished after 1663 timesteps with reward 84.0
Episode 42 finished after 1872 timesteps with reward 67.0
Episode 43 finished after 2001 timesteps with reward 64.0
Episode 44 finished after 2026 timesteps with reward 78.0
Episode 45 finished after 998 timesteps with reward 21.0
Episode 46 finished after 1917 timesteps with reward 99.0
Episode 47 finished after 2078 timesteps with reward 112.0
Episode 48 finished after 1167 timesteps with reward 42.0
Episode 49 finished after 2082 timesteps with reward 299.0
Episode 50 finished after 1582 timesteps with reward 64.0
Episode 51 finished after 998 timesteps with reward 38.0
Episode 52 finished after 1868 timesteps with reward 74.0
Episode 53 finished after 1728 timesteps with reward 53.0
Episode 54 finished after 1000 timesteps with reward 22.0
Episode 55 finished after 2079 timesteps with reward 196.0
Episode 56 finished after 1828 timesteps with reward 79.0
Episode 57 finished after 1771 timesteps with reward 69.0
Episode 58 finished after 1716 timesteps with reward 64.0
Episode 59 finished after 1561 timesteps with reward 61.0
Episode 60 finished after 1917 timesteps with reward 118.0
Episode 61 finished after 1960 timesteps with reward 92.0
Episode 62 finished after 1651 timesteps with reward 55.0
Episode 63 finished after 1661 timesteps with reward 55.0
Episode 64 finished after 1732 timesteps with reward 50.0
Episode 65 finished after 1222 timesteps with reward 45.0
Episode 66 finished after 839 timesteps with reward 18.0
Episode 67 finished after 1203 timesteps with reward 30.0
Episode 68 finished after 1583 timesteps with reward 104.0
Episode 69 finished after 2497 timesteps with reward 118.0
Episode 70 finished after 2043 timesteps with reward 79.0
Episode 71 finished after 1031 timesteps with reward 49.0
Episode 72 finished after 2320 timesteps with reward 128.0
Episode 73 finished after 1493 timesteps with reward 42.0
Episode 74 finished after 1598 timesteps with reward 85.0
Episode 75 finished after 2235 timesteps with reward 154.0
Episode 76 finished after 1299 timesteps with reward 31.0
Episode 77 finished after 1594 timesteps with reward 65.0
Episode 78 finished after 2028 timesteps with reward 71.0
Episode 79 finished after 2078 timesteps with reward 115.0
Episode 80 finished after 1445 timesteps with reward 35.0
Episode 81 finished after 1380 timesteps with reward 65.0
Episode 82 finished after 1756 timesteps with reward 103.0
Episode 83 finished after 1703 timesteps with reward 53.0
Episode 84 finished after 1785 timesteps with reward 140.0
Episode 85 finished after 2402 timesteps with reward 101.0
Episode 86 finished after 2007 timesteps with reward 87.0
Episode 87 finished after 1860 timesteps with reward 57.0
Episode 88 finished after 2023 timesteps with reward 87.0
Episode 89 finished after 1946 timesteps with reward 76.0
Episode 90 finished after 1907 timesteps with reward 74.0
Episode 91 finished after 1603 timesteps with reward 60.0
Episode 92 finished after 1552 timesteps with reward 59.0
Episode 93 finished after 1852 timesteps with reward 71.0
Episode 94 finished after 1483 timesteps with reward 41.0
Episode 95 finished after 1130 timesteps with reward 28.0
Episode 96 finished after 2432 timesteps with reward 137.0
Episode 97 finished after 2318 timesteps with reward 99.0
Episode 98 finished after 1696 timesteps with reward 103.0
Episode 99 finished after 1461 timesteps with reward 44.0
Episode 100 finished after 1698 timesteps with reward 46.0
Avg reward 73.65

After uploading to Gym we also get the standard deviation: the best 100-episode average reward was 73.65 ± 4.58.
This is slightly lower than the 86.95 ± 5.10 reported by the original submission.
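For anyone who wants to recompute the average from a saved log, here is a minimal sketch. It assumes the log lines have exactly the format shown in the output above; `average_reward` is a hypothetical helper, not part of src/test_gym.py. How Gym derives the ± figure is not shown here, so only the mean is computed.

```python
import re

# Matches lines such as "Episode 1 finished after 1639 timesteps with reward 65.0".
EPISODE_RE = re.compile(
    r"Episode (\d+) finished after (\d+) timesteps with reward (-?\d+(?:\.\d+)?)"
)


def average_reward(log_text: str) -> float:
    """Return the mean reward over all episode lines found in the log text."""
    rewards = [float(m.group(3)) for m in EPISODE_RE.finditer(log_text)]
    if not rewards:
        raise ValueError("no episode lines found")
    return sum(rewards) / len(rewards)


# The first three lines of the output above, used as a small sample.
sample = """\
Episode 1 finished after 1639 timesteps with reward 65.0
Episode 2 finished after 1906 timesteps with reward 57.0
Episode 3 finished after 1381 timesteps with reward 42.0
"""
sample_avg = average_reward(sample)
print(f"Avg reward {sample_avg:.2f}")
```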

I also ran the code until epoch 98 instead of the reported 77. With that model checkpoint I obtain an average reward of 119.8 (submission here).
