Last active: January 25, 2018 17:19
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "# Q-Table Learning in FrozenLake (Dummy)"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "from OpenAI Gym"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# import packages\nimport gym\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom gym.envs.registration import register\nimport random as pr",
"execution_count": 1,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "register(\n    id = 'FrozenLake-v3',\n    entry_point = 'gym.envs.toy_text:FrozenLakeEnv',\n    kwargs={'map_name': '4x4',\n            'is_slippery': False}\n)",
"execution_count": 2,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "env = gym.make('FrozenLake-v3')",
"execution_count": 3,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "FrozenLake environment review:\n\n* Structure : a 4 x 4 grid of blocks.\n* Block types : Start / Goal / Safe (frozen) / Dangerous (hole).\n* Objective : have the agent learn to navigate from the Start block to the Goal block without stepping onto a hole.\n* The catch : __none here__ (with is_slippery=False, every move is deterministic)."
},
{
"metadata": {},
"cell_type": "markdown",
"source": "# Q-Table Learning Algorithm"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Initialize table with all zeros\nQ = np.zeros([env.observation_space.n, env.action_space.n])",
"execution_count": 4,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "Q",
"execution_count": 5,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 5,
"data": {
"text/plain": "array([[ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.]])"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "The Q-Table for the FrozenLake environment has shape 16 x 4:\n(one row per block : 16) x (one column per action : 4) // left, down, right or up"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "From the result above we can read off the action encoding:\n* action 0 : Left\n* action 1 : Down\n* action 2 : Right\n* action 3 : Up"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "def rargmax(vector):\n    \"\"\"Argmax that breaks ties randomly among equal maxima.\"\"\"\n    m = np.amax(vector)                   # maximum value in the vector\n    indices = np.nonzero(vector == m)[0]  # all indices achieving it\n    return pr.choice(indices)             # pick one of them at random",
"execution_count": 6,
"outputs": []
},
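`rargmax` is an argmax that breaks ties randomly; plain `np.argmax` always returns the first maximal index, so an agent starting from an all-zero Q-table would always pick action 0 and never explore. A pure-Python sketch of the same idea (the function name here is mine, not from the notebook):

```python
import random

def rargmax_py(vector):
    """Return the index of a maximum element, breaking ties uniformly at random."""
    m = max(vector)
    indices = [i for i, v in enumerate(vector) if v == m]
    return random.choice(indices)

# On an all-zero row (the initial Q-table) every action ties,
# so the choice is effectively a random exploratory move.
print(rargmax_py([0.0, 0.0, 0.0, 0.0]))  # any of 0, 1, 2, 3
```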
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Set learning parameters\nnum_episodes = 2000\n\n# create a list to contain the total reward per episode\nrList = []  # reward list\n\nfor i in range(num_episodes):\n    s = env.reset()  # Reset environment and get first new observation\n    rAll = 0         # total reward for this episode\n    d = False        # done flag: True when the episode ends\n\n    # The Q-Table learning algorithm\n    while not d:\n        a = rargmax(Q[s,:])\n        s1, r, d, info = env.step(a)\n\n        # Update Q-Table with new knowledge (= reward)\n        Q[s,a] = r + np.max(Q[s1,:])\n\n        rAll += r  # accumulate reward\n        s = s1     # move to the next state\n\n    rList.append(rAll)",
"execution_count": 7,
"outputs": []
},
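The update rule above, `Q[s,a] = r + np.max(Q[s1,:])`, is the undiscounted, deterministic special case of the Q-learning target. For reference, a sketch of the same update with a discount factor gamma (a hypothetical variant, not what this notebook runs) would make the agent prefer shorter paths to the goal:

```python
def q_update(Q, s, a, r, s1, gamma=0.9):
    """Deterministic tabular update: Q[s][a] = r + gamma * max_a' Q[s1][a'].
    With gamma = 1.0 this reduces to the notebook's update rule."""
    Q[s][a] = r + gamma * max(Q[s1])

# Tiny 2-state, 2-action example: state 1 already has value 1.0 for action 0.
Q = [[0.0, 0.0], [1.0, 0.0]]
q_update(Q, s=0, a=1, r=0.0, s1=1, gamma=0.9)
print(Q[0][1])  # 0.9
```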
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "print(\"Score over time: \" + str(sum(rList)/num_episodes))",
"execution_count": 8,
"outputs": [
{
"output_type": "stream",
"text": "Score over time: 0.9325\n",
"name": "stdout"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "print(\"Final Q-Table Values\")\nprint(np.round(Q, 3))",
"execution_count": 9,
"outputs": [
{
"output_type": "stream",
"text": "Final Q-Table Values\n[[ 0. 0. 1. 0.]\n [ 0. 0. 1. 0.]\n [ 0. 1. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 1. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 1. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 0. 1. 0.]\n [ 0. 0. 0. 0.]]\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Let's check this in more detail.\nI'll take the index of the maximum Q-value in every row, then translate it into an action."
},
{
"metadata": {},
"cell_type": "markdown",
"source": "# Test"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "num_episodes_test = 5\n\n# create lists to contain total rewards and the path per episode\nrList_test = []  # reward list\npath = []        # state-action path in each episode\n\nfor i in range(num_episodes_test):\n    s = env.reset()  # Reset environment and get first new observation\n    rAll = 0         # total reward for this episode\n    d = False        # done flag: True when the episode ends\n    sDict = {}       # state-action dict\n\n    # Follow the learned (greedy) policy\n    while not d:\n        a = np.argmax(Q[s,:])\n        sDict[s] = a\n\n        s1, r, d, info = env.step(a)\n\n        rAll += r  # accumulate reward\n        s = s1     # move to the next state\n\n    path.append(sDict)\n    rList_test.append(rAll)",
"execution_count": 10,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "path",
"execution_count": 11,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 11,
"data": {
"text/plain": "[{0: 2, 1: 2, 2: 1, 6: 1, 10: 1, 14: 2},\n {0: 2, 1: 2, 2: 1, 6: 1, 10: 1, 14: 2},\n {0: 2, 1: 2, 2: 1, 6: 1, 10: 1, 14: 2},\n {0: 2, 1: 2, 2: 1, 6: 1, 10: 1, 14: 2},\n {0: 2, 1: 2, 2: 1, 6: 1, 10: 1, 14: 2}]"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# path\naction_set = ['Left', 'Down', 'Right', 'Up']\nprint(\"----- Result of what the agent did -----\\n\")\nfor i in path[0].keys():\n    action = path[0][i]\n    print('At state %d, the agent moves %s' % (i, action_set[action]))",
"execution_count": 13,
"outputs": [
{
"output_type": "stream",
"text": "----- Result of what the agent did -----\n\nAt state 0, the agent moves Right\nAt state 1, the agent moves Right\nAt state 2, the agent moves Down\nAt state 6, the agent moves Down\nAt state 10, the agent moves Down\nAt state 14, the agent moves Right\n",
"name": "stdout"
}
]
}
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python [default]",
"language": "python"
},
"language_info": {
"nbconvert_exporter": "python",
"file_extension": ".py",
"codemirror_mode": {
"version": 3,
"name": "ipython"
},
"name": "python",
"mimetype": "text/x-python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}