This is a submission of the second training epoch of Flappy Bird using Deep Q-Learning, built with Python and TensorFlow.
The first epoch was trained with the following settings:
Replay buffer pre-filled with 1000 random iterations
training episodes: 20000
learning rate for Adam Optimizer: 1e-5
Used a sinusoidal epsilon function with parameters:
starting epsilon: 1.0
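The gist does not include the sinusoidal epsilon function itself, so the sketch below is one hypothetical shape consistent with a starting epsilon of 1.0: a linear decay modulated by a sine wave, so exploration periodically rises again during training. The function name, the final epsilon floor, and the cycle count are all assumptions, not values from the gist.

```python
import math

def sinusoidal_epsilon(step, total_steps, eps_start=1.0, eps_end=0.05, cycles=5):
    """Hypothetical sinusoidal epsilon schedule: linear decay from
    eps_start to eps_end, modulated by a sine wave whose amplitude
    shrinks as training progresses."""
    frac = min(step / total_steps, 1.0)
    base = eps_start + (eps_end - eps_start) * frac          # linear decay
    wave = 0.5 * (1 - frac) * math.sin(2 * math.pi * cycles * frac)
    # Clamp so epsilon always stays within [eps_end, eps_start].
    return max(eps_end, min(eps_start, base + wave))
```

One appeal of a sinusoidal schedule over plain decay is that the agent gets recurring bursts of exploration late in training, which can help it escape locally optimal flapping policies.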
@chuchro3
chuchro3 / frozenlake
Created February 24, 2017 21:14
FrozenLake-v0
Model free Q-Learning in an MDP style environment
Utilized code from Berkeley's CS188 Reinforcement Learning project
Introduced an epsilon decay to offer a transition between early exploration and late exploitation
Q-Learning parameters:
alpha = 0.1
epsilon = 1.0
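As a rough illustration of the ingredients listed above (model-free tabular Q-Learning, alpha = 0.1, epsilon starting at 1.0 with a decay from early exploration toward late exploitation), here is a minimal sketch. The Gym-style `env_reset`/`env_step` callables, gamma, and the decay constants are assumptions for illustration, not values from the gist.

```python
import random
from collections import defaultdict

def q_learning(env_step, env_reset, n_actions, episodes=2000,
               alpha=0.1, gamma=0.9, eps_start=1.0, eps_end=0.01,
               eps_decay=0.999):
    """Tabular Q-learning with epsilon-greedy action selection and
    multiplicative epsilon decay after each episode."""
    Q = defaultdict(float)  # maps (state, action) -> value
    eps = eps_start
    for _ in range(episodes):
        s, done = env_reset(), False
        while not done:
            # Epsilon-greedy: explore with probability eps, else act greedily.
            if random.random() < eps:
                a = random.randrange(n_actions)
            else:
                a = max(range(n_actions), key=lambda x: Q[(s, x)])
            s2, r, done = env_step(s, a)
            best_next = max(Q[(s2, x)] for x in range(n_actions))
            target = r if done else r + gamma * best_next
            Q[(s, a)] += alpha * (target - Q[(s, a)])
            s = s2
        eps = max(eps_end, eps * eps_decay)  # decay exploration each episode
    return Q
```

On FrozenLake-v0 the same loop would wrap `env.reset()` and `env.step(a)` from Gym; the toy interface here just keeps the sketch self-contained.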
@chuchro3
chuchro3 / cartpole
Last active February 24, 2017 21:07
CartPole-v1
Tabular Q-Learning on CartPole-v1:
Utilized code from Berkeley's CS188 Q-Learning project
Discretized the state space from continuous values
Bins per dimension (cart_x, cart_velocity, pole_theta, pole_velocity): [5, 10, 20, 10]
Introduced an epsilon decay to offer a transition between early exploration and late exploitation
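The discretization step above can be sketched as follows. The bin counts [5, 10, 20, 10] come from the gist; the clipping bounds for each continuous dimension are assumptions (chosen near CartPole's observation limits), and the function name is hypothetical.

```python
# Bin counts from the gist: cart_x, cart_velocity, pole_theta, pole_velocity
BINS = (5, 10, 20, 10)
# Assumed clipping ranges for each continuous dimension (not from the gist)
BOUNDS = ((-2.4, 2.4), (-3.0, 3.0), (-0.21, 0.21), (-3.5, 3.5))

def discretize(obs, bins=BINS, bounds=BOUNDS):
    """Map a continuous CartPole observation to a tuple of bin indices,
    usable as a key in a tabular Q-function."""
    state = []
    for x, n, (lo, hi) in zip(obs, bins, bounds):
        x = min(max(x, lo), hi)               # clip to the assumed range
        idx = int((x - lo) / (hi - lo) * n)   # uniform-width bins
        state.append(min(idx, n - 1))         # top edge falls in the last bin
    return tuple(state)
```

With these bin counts the table has 5 × 10 × 20 × 10 = 10,000 states, small enough for tabular Q-Learning to cover with epsilon-greedy exploration.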