@patrickthompson
Created October 7, 2016 20:13
To replicate my evaluation result for Taxi-v0, do the following in order:
Import Libraries:
import the gym library
import the sys library
import the random library
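In Python, a minimal sketch of these imports looks like this (sys is imported in the gist but is not strictly needed for the sketches below):

```python
import gym     # OpenAI Gym environments, including Taxi
import sys     # imported in the gist; not used in these sketches
import random  # used for the epsilon-greedy exploration rolls
```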
Set constants:
Set the current episode count to zero
Set gamma to .15
Set starting epsilon to .5
Set the epsilon decay rate to .999
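As a sketch, with variable names of my own choosing:

```python
episode = 0            # current episode count
gamma = 0.15           # discount factor for the Q update
epsilon = 0.5          # starting exploration probability
epsilon_decay = 0.999  # multiplicative decay applied to epsilon
```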
Set global variables:
Set Q to an empty array
Set the environment variable to gym.make("Taxi-v1")
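A sketch of the globals; the gist mentions both Taxi-v0 and "Taxi-V1", and gym environment IDs are case-sensitive, so use whichever ID your gym version actually registers (I assume "Taxi-v1" here):

```python
Q = []                     # Q table, filled in below
env = gym.make("Taxi-v1")  # environment ID is an assumption; adjust to your gym version
```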
Populate Q array:
Append to the Q array one empty array for every available state in env.observation_space
Append to each of those arrays one 0 list item for every available action in env.action_space (each row should look like a [0,0,0,..] list)
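For the Taxi environment this gives one row of zeros per state, roughly:

```python
# One row per state, one zero per action, e.g. [0, 0, 0, 0, 0, 0] for Taxi.
for _ in range(env.observation_space.n):
    Q.append([0] * env.action_space.n)
```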
Execute the learning loop (a full Python sketch follows these steps):
While the episode count is less than 10,000, loop the following:
Set the [previous state] variable to zero
Set the [previous action] variable to zero
Set the [observation] variable to env.reset()
Set the [test] step count to zero
While the test count is less than 10,000, loop the following:
Set the next action:
If the maximum of all actions in the current state is 0, then select a random action from all available actions
Otherwise,
If a random number between 0 and 1 is less than epsilon (starting at .5), then select a random action
Otherwise, choose a "greedy" action:
Sort all of the action values for the current state in reverse order (largest first in the list)
Loop through all of the available actions
If the value of an action matches the first (largest) item in the sorted list, select that action id as our next action
Set the [previous state] to the current state
Set the [previous action] to the chosen action
Step the environment with the chosen action: set the new [observation], [reward], and [done] from env.step(action)
Update the Q matrix:
Set the value of the Q matrix for the old state and action (q[previous state][previous action]) equal to the current reward + gamma times the maximum of Q over the new state's row
If we are 'done':
Set epsilon to epsilon * epsilon_decay (effectively reducing the odds of a random roll) and end the episode (exit the inner loop)
Increment the test variable
Increment the episode variable
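Putting the loop together, a rough Python sketch of the steps above might look like the following. The env.step call, the break on 'done', and the variable names are my own additions where the write-up leaves them implicit:

```python
while episode < 10000:
    prev_state = 0
    prev_action = 0
    observation = env.reset()
    test = 0  # per-episode step counter (name assumed)
    while test < 10000:
        # Choose the next action.
        if max(Q[observation]) == 0:
            # Nothing learned for this state yet: pick a random action.
            action = random.randint(0, env.action_space.n - 1)
        elif random.random() < epsilon:
            # Exploration roll with probability epsilon (decays over time).
            action = random.randint(0, env.action_space.n - 1)
        else:
            # Greedy action: the id of the largest Q value for this state.
            best = sorted(Q[observation], reverse=True)[0]
            for a, value in enumerate(Q[observation]):
                if value == best:
                    action = a
                    break
        prev_state = observation
        prev_action = action
        # Step the environment with the chosen action (implicit in the write-up).
        observation, reward, done, info = env.step(action)
        # Q update: reward plus gamma times the best value from the new state.
        Q[prev_state][prev_action] = reward + gamma * max(Q[observation])
        if done:
            epsilon = epsilon * epsilon_decay  # fewer random rolls over time
            break  # assumed: end the episode once the environment reports done
        test += 1
    episode += 1
```

Note this is the literal update described above (no learning rate, the old value is overwritten in place), not a textbook Q-learning rule with a step size.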
Crack open a beer! Is anyone reading this? I doubt it. Prove me wrong: If you have read this, message me.
Patrick
guillaumeBellec commented Oct 12, 2016

I read your post.
Very simple Q-learning algorithm; I am happy it is working here.
I will try it out. Thanks
