To apply (deep) reinforcement learning to real-world problems (e.g. energy management, traffic optimization), you first need a simulation environment in which to train an autonomous agent.
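As a minimal illustration of what such a simulation environment looks like, here is a sketch of a toy environment with a Gym-style `reset`/`step` interface. The class name and dynamics (a 1-D corridor) are invented for illustration; with the real OpenAI Gym you would instead call `gym.make(...)` and use the returned environment the same way.

```python
import random

class CorridorEnv:
    """Toy simulation environment with a Gym-like interface (hypothetical).

    The agent starts at position 0 on a line and must reach position `goal`.
    Actions: 0 = step left, 1 = step right. Reward is 1.0 on reaching the
    goal, 0.0 otherwise; the episode also ends after `max_steps` steps.
    """

    def __init__(self, goal=5, max_steps=50):
        self.goal = goal
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        self.pos = 0
        self.steps = 0
        return self.pos  # initial observation

    def step(self, action):
        self.pos += 1 if action == 1 else -1
        self.steps += 1
        done = self.pos == self.goal or self.steps >= self.max_steps
        reward = 1.0 if self.pos == self.goal else 0.0
        return self.pos, reward, done, {}  # observation, reward, done, info

# A random-agent rollout, mirroring the usual Gym training loop:
env = CorridorEnv()
obs = env.reset()
done = False
while not done:
    action = random.randint(0, 1)
    obs, reward, done, info = env.step(action)
```

Once an agent can interact with an environment through this loop, the environment can be swapped out (Pong, an energy-management simulator, a traffic model) without changing the training code.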
""" Trains an agent with (stochastic) Policy Gradients on Pong. Uses OpenAI Gym. """ | |
import numpy as np | |
import cPickle as pickle | |
import gym | |
# hyperparameters | |
H = 200 # number of hidden layer neurons | |
batch_size = 10 # every how many episodes to do a param update? | |
learning_rate = 1e-4 | |
gamma = 0.99 # discount factor for reward |
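The `gamma` hyperparameter above is used to compute discounted returns for each episode. The following is a sketch of that discounting step in the style of the original script; the function name and the Pong-specific reset of the running sum at nonzero rewards follow Karpathy's pg-pong.py, but treat this as an illustrative reimplementation rather than the verbatim source.

```python
import numpy as np

def discount_rewards(r, gamma=0.99):
    """Compute discounted returns from a 1-D array of per-step rewards.

    In Pong a nonzero reward marks the end of a point, so the running sum
    is reset there (a Pong-specific detail).
    """
    discounted = np.zeros_like(r, dtype=np.float64)
    running_add = 0.0
    for t in reversed(range(len(r))):
        if r[t] != 0:
            running_add = 0.0  # reset the sum at a game boundary (Pong-specific)
        running_add = running_add * gamma + r[t]
        discounted[t] = running_add
    return discounted

# e.g. a rally that ends with a win: rewards [0, 0, 1]
# yield discounted returns [gamma^2, gamma, 1] = [0.9801, 0.99, 1.0]
returns = discount_rewards(np.array([0.0, 0.0, 1.0]))
```

Each action thus gets credit for all future rewards in its point, decayed by `gamma` per step, which is what makes the sparse Pong reward usable as a per-action learning signal.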
## Preliminary Updates and Installations

(Based on http://markus.com/install-theano-on-aws/)

- Launch the g2.2xlarge instance, selecting a security group with an SSH rule
- ssh to the instance:

  ssh -i [pem file] ubuntu@[Public DNS of the instance]

- Update the locale:

  export LC_ALL="en_US.UTF-8"
  export LC_CTYPE="en_US.UTF-8"
  sudo dpkg-reconfigure locales

- Update the installed packages:

  sudo apt-get update
  sudo apt-get -y dist-upgrade