@floodsung
floodsung / CEM.py
Created June 2, 2016 08:02
This solution is based mainly on John Schulman's Deep Reinforcement Learning lab script from the Machine Learning Summer School 2016.
import numpy as np
import gym
from gym.spaces import Discrete,Box
# -------------------------------------------
# Policies
# -------------------------------------------
class DeterministicDiscreteActionLinearPolicy(object):
require 'midilib'
seq = MIDI::Sequence.new()
File.open(ARGV[0], 'rb') { |file| seq.read(file) }
events = [ ]
id = 0
seq.tracks.each do |track|
@sorenbouma
sorenbouma / cem.py
Last active April 17, 2018 05:54
This is a basic Python implementation of the Cross-Entropy Method for reinforcement learning on OpenAI Gym's CartPole environment.
import gym
import numpy as np
import matplotlib.pyplot as plt
env = gym.make('CartPole-v0')
env.render(close=True)
#vector of means (mu) and standard deviations (sigma) for each parameter
mu=np.random.uniform(size=state.shape)
sigma=np.random.uniform(low=0.001,size=state.shape)
@zsal
zsal / CEMgym.py
Created June 27, 2016 02:57
John Schulman MLSS Lab 1: CartPole-v0
#Most code from John Schulman's MLSS talk on Deep Reinforcement Learning
#http://rl-gym-doc.s3-website-us-west-2.amazonaws.com/mlss/lab1.html#szitalorincz06
import numpy as np
import gym
from gym.spaces import Discrete, Box
# ================================================================
# Policies
# ================================================================
@fnurl
fnurl / docsearch-pageindexer.py
Last active January 15, 2020 07:07
A script that produces a JSON page index file for markdown files (extension `.md`) in a directory and its subdirectories (e.g. a Hugo site's (https://gohugo.io/) `content` directory) for use with Algolia Docsearch (https://github.com/algolia/docsearch).
import os
import sys
import yaml
import json
# base url to use
base_url = "http://localhost:1313"
# The attribute mapping for docsearch.
#
@kashif
kashif / cem.md
Last active November 7, 2023 12:56
Cross Entropy Method

How do we solve the policy optimization problem, that is, how do we maximize the total reward obtained by some parametrized policy?

Discounted future reward

To begin with, the total reward for an episode is the sum of all the rewards received during it. If our environment is stochastic, we can never be sure that we will get the same rewards the next time we perform the same actions, so the further we look into the future, the more the total future reward may diverge. For that reason it is common to use the discounted future reward instead, in which each later reward is scaled down by a further factor of discount. This parameter is called the discount factor and lies between 0 and 1.
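As a minimal sketch of the discounted future reward described above, the helper below sums one episode's rewards with each later reward scaled by a further factor of the discount. The function name `discounted_return`, the default value of `discount`, and the three-step reward list are assumptions made for illustration; none of them come from the gist.

```python
def discounted_return(rewards, discount=0.99):
    """Sum of discount**t * rewards[t] over one episode (illustrative helper)."""
    total = 0.0
    # Walk the episode backwards so every step is one multiply and one add.
    for reward in reversed(rewards):
        total = reward + discount * total
    return total


# Hypothetical episode: a reward of 1 at each of three time steps.
episode_rewards = [1.0, 1.0, 1.0]

# A discount of 1.0 recovers the plain (undiscounted) total reward.
undiscounted = discounted_return(episode_rewards, discount=1.0)

# A discount of 0.5 weights the rewards by 1, 0.5 and 0.25.
discounted = discounted_return(episode_rewards, discount=0.5)
```

A discount close to 1 makes the agent care about distant rewards almost as much as immediate ones, while a discount close to 0 makes it short-sighted.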

A good strategy for an agent would be to always choose the action that maximizes the (discounted) future reward. In other words, we want to maximize the expected reward per episode.
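The excerpt stops before describing the method named in its title, so the following is only a rough sketch of a generic cross-entropy method, not the author's implementation. It assumes a diagonal Gaussian over the policy parameters and uses a toy quadratic `toy_score` as a stand-in for "expected reward per episode"; in practice the score would come from running episodes in an environment. All names and hyperparameters (`cross_entropy_method`, `elite_frac`, `target`, and so on) are hypothetical.

```python
import numpy as np


def cross_entropy_method(score, dim, n_iter=30, batch_size=50,
                         elite_frac=0.2, seed=0):
    """Maximize score(theta) by repeatedly refitting a diagonal Gaussian."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(dim)       # assumed starting mean
    sigma = np.ones(dim)     # assumed starting standard deviation
    n_elite = max(1, int(round(batch_size * elite_frac)))
    for _ in range(n_iter):
        # Sample candidate parameter vectors from the current Gaussian.
        thetas = mu + sigma * rng.standard_normal((batch_size, dim))
        scores = np.array([score(theta) for theta in thetas])
        # Keep the highest-scoring candidates and refit the Gaussian to them.
        elite = thetas[np.argsort(scores)[-n_elite:]]
        mu = elite.mean(axis=0)
        # Small floor (an assumption) so sigma never collapses to exactly zero.
        sigma = elite.std(axis=0) + 1e-3
    return mu


# Toy stand-in for the expected reward: highest when theta equals target.
target = np.array([0.5, -1.0, 0.25])


def toy_score(theta):
    return -np.sum((theta - target) ** 2)


best = cross_entropy_method(toy_score, dim=3)
```

To apply the same loop to a control task, `score` would be replaced by a function that runs one or more episodes with the policy parametrized by `theta` and returns the (discounted) total reward.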

@squiter
squiter / install_ruby
Created October 7, 2016 22:26
How to install Ruby with Tcl and Tk for Coursera Programming Languages Part C
#!/bin/bash
set -eou pipefail
version=8.6.4.1
patchinfo=299124-linux-x86_64-threaded
dir=ActiveTcl$version.$patchinfo
package=$dir.tar.gz
url=http://downloads.activestate.com/ActiveTcl/releases/$version/$package
@eph2795
eph2795 / cartpole.py
Created February 18, 2017 19:50
CartPole
import gym
from tqdm import tqdm_notebook
import numpy as np
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')
def get_random_policy():
    return np.random.choice(n_actions, tuple(bins))
#
# add to bashrc
#
# files
alias sampletree='mkdir -p sample/{train,test,valid}'
lsn(){ matchdir=`pwd`/$2; find $matchdir -type f | grep -v sample | shuf -n $1 | awk -F`pwd` '{print "."$NF}' ; }
# shuffle mv/cp
cpn(){ matchdir=`pwd`/$2; find $matchdir -type f | grep -v sample | shuf -n $1 | awk -F`pwd` '{print "."$NF" sample"$NF}' | xargs -t -n2 cp ; }
mvn(){ matchdir=`pwd`/$2; todir=`pwd`/$3; find $matchdir -type f | grep -v sample | shuf -n $1 | awk -F`pwd` -v todir="$todir" '{print $0" "todir}' | xargs -t -n2 mv ; }
@pat-coady
pat-coady / racetrack_sarsa.ipynb
Last active April 21, 2024 21:01
Sutton and Barto Racetrack: Sarsa