@Itsukara
Last active October 23, 2016 03:25
How to reproduce Montezuma's Revenge score by Itsukara
# See the following for an explanation of this algorithm
# * https://github.com/Itsukara/async_deep_reinforce/blob/master/README.md (English)
# * http://www.slideshare.net/ItsukaraIitsuka/drl-challenge-on-montezumas-revenge (English)
# * http://www.slideshare.net/ItsukaraIitsuka/deepmind20166-unifying-countbased-exploration-and-intrinsic-motivation-pseudocount-montezumas-revenge (Japanese)
# Preparation of environment
sudo apt-get install git
git clone https://github.com/Itsukara/async_deep_reinforce.git
mkdir Download
cp async_deep_reinforce/gcp-install/gcp-install-a3c-env.tgz Download/
cd Download/
tar zxvf gcp-install-a3c-env.tgz
bash -x install-Anaconda.sh
. ~/.bashrc
bash -x install-tensorflow.sh
bash -x install-opencv3.sh
bash -x install-ALE.sh
bash -x install-additionals.sh
cd ../async_deep_reinforce
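# Optional sanity check before training: a minimal sketch, not part of the original steps.
# It only reports whether some Python modules can be imported. The module names
# (tensorflow, cv2, gym) are assumptions based on the install script names above;
# adjust the list to whatever the install scripts actually provide.

```shell
# Report whether a Python module can be imported by the current interpreter.
check_module() {
  if python -c "import $1" >/dev/null 2>&1; then
    echo "OK      $1"
  else
    echo "MISSING $1"
  fi
}

# Assumed module names (hypothetical list, edit as needed).
for m in tensorflow cv2 gym; do
  check_module "$m"
done
```

# A MISSING line means the import failed in the current shell; re-running ". ~/.bashrc"
# or the corresponding install script is the first thing to try.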
# Training
./run-option-gym montezuma-j-tes30-b0020-ff-fs2 &> log.montezuma-j-tes30-b0020-ff-fs2
# You can stop training with Ctrl-C.
# In that case, the latest checkpoint is not necessarily the one with the best average score.
# You have to find the checkpoint with the best average score in log.montezuma-j-tes30-b0020-ff-fs2
# You can resume training with the command above ( ./run-option-gym ... )
# The following command displays the training curve during or after training
python plot.py log.montezuma-j-tes30-b0020-ff-fs2
# The training curve is refreshed periodically (every 10 seconds)
# plot.py has many options. The following one is useful for analyzing training status:
# the -i {r,lives,s,tes,v,pr} option selects which information to display
# The following command creates movies of high-score plays during or after training
./run-avconv-and-rm screen.new-record
# The following command creates movies of visits to new rooms during or after training
./run-avconv-and-rm screen.new-room
# The following command outputs the visited rooms
python rooms.py log.montezuma-j-tes30-b0020-ff-fs2
# The following command periodically (every 5 minutes) writes various reports to /tmp
report log.montezuma-j-tes30-b0020-ff-fs2
# This is useful when you run training on many machines.
# Just run "nohup report ..." on every machine, and you can watch all the training curves
# by periodically copying "/tmp/*.png" to your local machine with scp. (I use a shell script to do this.)
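# The periodic copy described above could be scripted roughly as follows. This is a
# minimal sketch, not the author's actual script. The host names (train1, train2), the
# destination directory, and the DRY_RUN switch are all hypothetical; it assumes
# passwordless ssh access to each training machine.

```shell
# Hypothetical host names and local destination; replace with your own.
HOSTS="${HOSTS:-train1 train2}"
DEST="${DEST:-./curves}"

# Print the scp command that fetches the report images of one host.
fetch_cmd() {
  host="$1"
  printf 'scp %s:/tmp/*.png %s/%s/\n' "$host" "$DEST" "$host"
}

# Fetch once from every host into a per-host directory.
# With DRY_RUN set, only print the commands instead of running them.
fetch_all() {
  for host in $HOSTS; do
    mkdir -p "$DEST/$host"
    if [ -n "${DRY_RUN:-}" ]; then
      fetch_cmd "$host"
    else
      # The remote path is quoted so the glob is expanded on the remote side.
      scp "$host:/tmp/*.png" "$DEST/$host/" || echo "fetch from $host failed" >&2
    fi
  done
}

# Usage (runs until interrupted; 300 s matches the 5-minute report interval):
#   while true; do fetch_all; sleep 300; done
```

# Keeping one directory per host avoids overwriting images that have the same file name
# on different machines.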