
"""A training script of Soft Actor-Critic on OpenAI Gym Mujoco environments.
This script follows the settings of https://arxiv.org/abs/1812.05905 as much
as possible.
"""
import argparse
from distutils.version import LooseVersion
import functools
import logging
import sys
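The reference the script follows (arXiv:1812.05905) tunes the entropy temperature α automatically by minimizing J(α) = E[−α(log π(a|s) + H̄)], where H̄ is the target entropy. A minimal NumPy sketch of one such update step (an illustration only, not the script's actual Chainer code; the sample log-probabilities and learning rate are made up):

```python
import numpy as np

def temperature_update(log_alpha, log_pi, target_entropy, lr=3e-4):
    """One gradient step on J(alpha) = E[-alpha * (log_pi + target_entropy)].

    Optimizing log_alpha (rather than alpha) keeps the temperature positive.
    """
    alpha = np.exp(log_alpha)
    # dJ/d(log_alpha) = -alpha * mean(log_pi + target_entropy)
    grad = -alpha * np.mean(log_pi + target_entropy)
    return log_alpha - lr * grad

# Hypothetical batch of log-probabilities: the policy entropy (~0.4) exceeds
# the target of -1.0, so the update pushes the temperature down.
new_log_alpha = temperature_update(0.0, np.array([-0.5, -0.3]), target_entropy=-1.0)
```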
muupan / how_to_build_deepmind_lab_on_macos_mojave.md
Last active May 14, 2020 12:05
How to build DeepMind Lab on macOS Mojave (as of 2019/02/01)
muupan / get_krishna_probability_foundations.sh
Created May 4, 2018 06:50
Download all lecture notes of "Probability Foundations for Electrical Engineers" by Krishna Jagannathan. videos: https://www.youtube.com/playlist?list=PLVhKOwOM3oudtpQG7jf6WrS1GqxTskXsP notes: http://www.ee.iitm.ac.in/~krishnaj/ee5110notes.htm
#!/bin/sh
set -e
# Download all the pdfs
wget -nc http://www.ee.iitm.ac.in/~krishnaj/EE5110_files/notes/lecture1_set_theory.pdf
wget -nc http://www.ee.iitm.ac.in/~krishnaj/EE5110_files/notes/lecture2_Realanalysis.pdf
wget -nc http://www.ee.iitm.ac.in/~krishnaj/EE5110_files/notes/lecture3_cardinality.pdf
wget -nc http://www.ee.iitm.ac.in/~krishnaj/EE5110_files/notes/lecture4_probability_spaces.pdf
wget -nc http://www.ee.iitm.ac.in/~krishnaj/EE5110_files/notes/lecture5_properties%20of%20prob%20measures.pdf
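The `wget -nc` calls above boil down to "fetch each PDF unless it is already on disk". A rough Python equivalent, hedged as an illustration (`download_if_missing` is a made-up helper name; only the five URLs shown above are used):

```python
import os
import urllib.request

# The five lecture-note URLs listed in the shell script above.
URLS = [
    "http://www.ee.iitm.ac.in/~krishnaj/EE5110_files/notes/lecture1_set_theory.pdf",
    "http://www.ee.iitm.ac.in/~krishnaj/EE5110_files/notes/lecture2_Realanalysis.pdf",
    "http://www.ee.iitm.ac.in/~krishnaj/EE5110_files/notes/lecture3_cardinality.pdf",
    "http://www.ee.iitm.ac.in/~krishnaj/EE5110_files/notes/lecture4_probability_spaces.pdf",
    "http://www.ee.iitm.ac.in/~krishnaj/EE5110_files/notes/lecture5_properties%20of%20prob%20measures.pdf",
]

def download_if_missing(url, directory="."):
    """Like `wget -nc`: skip the download when the target file already exists."""
    path = os.path.join(directory, url.rsplit("/", 1)[-1])
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path

if __name__ == "__main__":
    for url in URLS:
        print(download_if_missing(url))
```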
from timeit import default_timer as timer
import chainer
from chainer import cuda
from chainer import function
import chainer.functions as F
from chainer import utils
from chainer.utils import type_check
import cupy
muupan / gist:66b42e3a3f755b5c35d3419276c1008e
Created July 24, 2016 10:21
ICML2016 reinforcement-learning-related papers
Guided Cost Learning: Deep Inverse Optimal Control via Policy Optimization
Doubly Robust Off-policy Value Evaluation for Reinforcement Learning
Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
Learning Simple Algorithms from Examples
Stability of Controllers for Gaussian Process Forward Models
Smooth Imitation Learning for Online Sequence Prediction
On the Analysis of Complex Backup Strategies in Monte Carlo Tree Search
Benchmarking Deep Reinforcement Learning for Continuous Control
Cumulative Prospect Theory Meets Reinforcement Learning: Prediction and Control
Why Most Decisions Are Easy in Tetris—And Perhaps in Other Sequential Decision Problems, As Well
muupan / gist:546c239982dc06967436
Last active September 9, 2015 08:22
byobu/tmux configuration; add these lines to ~/.byobu/keybindings.tmux
# Move the current window left/right by Ctrl+Shift+Left/Right
bind-key -n C-S-Left swap-window -t -1
bind-key -n C-S-Right swap-window -t +1
# Move to the prev/next window by Shift+Left/Right
bind-key -n S-Left prev
bind-key -n S-Right next
muupan / CMakeLists.txt
Created March 8, 2015 01:29
CMakeLists.txt that makes all the header and source files visible from Qt Creator
# CMakeLists.txt that makes all the header and source files visible from Qt Creator
cmake_minimum_required(VERSION 2.8)
execute_process(
COMMAND find . -name *.cpp -or -name *.hpp -or -name *.h -or -name *.cc -or -name *.hh
COMMAND tr "\n" ";"
OUTPUT_VARIABLE QTCREATOR_SRCS
OUTPUT_STRIP_TRAILING_WHITESPACE
)
add_custom_target(qtcreator SOURCES ${QTCREATOR_SRCS})
muupan / roulette.cpp
Created December 22, 2014 09:51
roulette wheel selection using std::discrete_distribution
#include <array>
#include <iostream>
#include <random>
int main() {
  constexpr std::array<double, 3> fitness_values = {1, 1.5, 2};
  // Constructor is O(n)
  std::discrete_distribution<std::size_t> dist(
      fitness_values.begin(), fitness_values.end());
  std::array<std::size_t, 3> table;
  table.fill(0);
  // Draw many samples; counts end up roughly proportional to fitness
  std::mt19937 gen(std::random_device{}());
  for (int i = 0; i < 9000; ++i) ++table[dist(gen)];
  for (std::size_t i = 0; i < table.size(); ++i)
    std::cout << i << ": " << table[i] << '\n';
}
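For comparison, the same roulette-wheel selection can be sketched in Python, where `random.choices` accepts a `weights` argument analogous to the fitness values passed to `std::discrete_distribution` (the sample count of 9000 is arbitrary):

```python
import random
from collections import Counter

# Same fitness values as in roulette.cpp
fitness_values = [1, 1.5, 2]

def roulette_select(rng, n):
    """Draw n indices with probability proportional to fitness."""
    return rng.choices(range(len(fitness_values)), weights=fitness_values, k=n)

# Expected proportions: 1/4.5, 1.5/4.5, 2/4.5
counts = Counter(roulette_select(random.Random(0), 9000))
```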
muupan / dqn.prototxt
Created October 20, 2014 23:22
A Deep Q-Network definition for Caffe
layers {
  name: "frames_input_layer"
  type: MEMORY_DATA
  top: "frames"
  top: "dummy1"
  memory_data_param {
    batch_size: 32
    channels: 4
    height: 84
    width: 84
  }
}
#include <iostream>
#include <memory>
#include <random>
#include <caffe/caffe.hpp>
#include <glog/logging.h>
int main(int argc, char** argv) {
  // Initialize glog
  google::InitGoogleLogging(argv[0]);