I tested various approaches and found that a properly tuned DQN plus a cross-entropy pool solves
this problem the fastest.
By DQN+CE I mean the common DQN technique, except that the batches sampled for experience replay
are selected proportionally to how good their episode was compared to
the worst one, which has a total reward of -200.
In common cross-entropy we basically select the best episodes and train the network
to correctly predict the action based on those steps. This drops the experience from the
wrong/non-existing steps and actions, which might be good to learn from too.
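The sampling rule described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the original implementation: the class name `EpisodeWeightedReplay`, the transition tuple layout `(state, action, reward, next_state, done)`, and the small `eps` floor that keeps the worst episodes sampleable are all hypothetical choices. Only the -200 baseline comes from the text.

```python
import random

WORST_RETURN = -200.0  # total reward of the worst episode, as stated in the text


class EpisodeWeightedReplay:
    """Hypothetical replay pool: each transition is sampled in proportion to
    how much its episode's total reward exceeds WORST_RETURN."""

    def __init__(self, eps=1e-3):
        self.transitions = []  # (state, action, reward, next_state, done) tuples
        self.weights = []      # one sampling weight per stored transition
        self.eps = eps         # assumed floor so the worst episodes are not dropped

    def add_episode(self, steps):
        # The episode's total reward decides the weight of all of its steps.
        total = sum(step[2] for step in steps)
        weight = max(total - WORST_RETURN, 0.0) + self.eps
        self.transitions.extend(steps)
        self.weights.extend([weight] * len(steps))

    def sample(self, batch_size, rng=random):
        # Weighted sampling with replacement over all stored transitions.
        return rng.choices(self.transitions, weights=self.weights, k=batch_size)
```

Unlike plain cross-entropy, which would discard the weaker episodes outright, this keeps every transition in the pool and only makes the weaker ones less likely to be drawn. The sampled batch would then feed an ordinary DQN update.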
#!/usr/bin/env bash
set -x
apt-get update
apt-get install -y git-core devscripts gcc g++ equivs gdb
BASE_DIR="$(pwd)"
ulimit -c unlimited
{
  "id": "8a5f4640935...",
  "csum": "a15cf7eee7ba4fd90f...",
  "filename": "/tmp/blob3/data-0.0",
  "size": 9,
  "offset-within-data-file": 144,
  "mtime": {
    "time": "2013-12-05 MSK 19:40:35.731166",
    "time-raw": "1386258035.731166"
  },