

bioothod / Mountain Car solution
Last active April 21, 2017 20:05
MountainCar solution
I tested various approaches and found that a properly tuned DQN combined with a cross-entropy pool
solves this problem the fastest.
By DQN+CE I mean the usual DQN technique, except that the batches sampled for experience replay
are selected in proportion to how good their episode was compared to the worst possible
episode, which has a total reward of -200.
In the usual cross-entropy method we basically select the best episodes and train the network
to predict the action taken at each of their steps. This discards the experience from
wrong or never-taken steps and actions, which might also be worth learning from.
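The sampling rule above can be sketched as follows. This is a minimal illustration, not the gist's actual code: `Episode`, `EpisodeWeightedReplay` and `cross_entropy_elite` are hypothetical names. It assumes each replayed step is weighted by its episode's total reward minus the worst return (-200), plus a small epsilon (an assumption, so that episodes which hit the 200-step limit can still be sampled). For contrast, `cross_entropy_elite` shows plain cross-entropy selection, which keeps only the top fraction of episodes (the 30% default is also an assumption).

```python
import math
import random
from collections import deque
from dataclasses import dataclass, field
from typing import List, Tuple

WORST_RETURN = -200.0  # worst possible MountainCar episode total
EPSILON = 1e-3         # assumption: keeps failed (-200) episodes sampleable

# (state, action, reward, next_state, done)
Transition = Tuple[object, int, float, object, bool]


@dataclass
class Episode:
    transitions: List[Transition] = field(default_factory=list)
    total_reward: float = 0.0


def episode_weight(episode: Episode) -> float:
    """How much better this episode was than the worst one."""
    return episode.total_reward - WORST_RETURN + EPSILON


class EpisodeWeightedReplay:
    """Hypothetical DQN+CE replay: a step's sampling weight is its episode's weight."""

    def __init__(self, capacity_episodes: int = 500):
        # Oldest episodes are dropped automatically once capacity is reached.
        self.episodes = deque(maxlen=capacity_episodes)

    def add_episode(self, episode: Episode) -> None:
        self.episodes.append(episode)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        steps, weights = [], []
        for ep in self.episodes:
            w = episode_weight(ep)
            for t in ep.transitions:
                steps.append(t)
                weights.append(w)
        # Sampling with replacement, probability proportional to episode quality.
        return rng.choices(steps, weights=weights, k=batch_size)


def cross_entropy_elite(episodes: List[Episode], elite_fraction: float = 0.3) -> List[Episode]:
    """Plain cross-entropy: keep only the best episodes, discard the rest."""
    n_keep = max(1, math.ceil(len(episodes) * elite_fraction))
    return sorted(episodes, key=lambda ep: ep.total_reward, reverse=True)[:n_keep]


if __name__ == "__main__":
    rng = random.Random(0)
    bad = Episode([(("bad", i), 0, -1.0, ("bad", i + 1), False) for i in range(200)], -200.0)
    good = Episode([(("good", i), 2, -1.0, ("good", i + 1), False) for i in range(120)], -120.0)
    buf = EpisodeWeightedReplay()
    buf.add_episode(bad)
    buf.add_episode(good)
    batch = buf.sample(1000, rng)
    print(sum(t[0][0] == "good" for t in batch) / len(batch))
```

Unlike `cross_entropy_elite`, the weighted buffer never drops the failed episode entirely. Its steps are just sampled far less often, which is the point made above about still learning from wrong actions.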
bioothod / vagrant_elliptics
Created September 17, 2014 01:50
Vagrant file to build elliptics
#!/usr/bin/env bash
set -x
apt-get update
apt-get install -y git-core devscripts gcc g++ equivs gdb
BASE_DIR="$(pwd)"
ulimit -c unlimited
{
"id": "8a5f4640935...",
"csum": "a15cf7eee7ba4fd90f...",
"filename": "/tmp/blob3/data-0.0",
"size": 9,
"offset-within-data-file": 144,
"mtime": {
"time": "2013-12-05 MSK 19:40:35.731166",
"time-raw": "1386258035.731166"
},