{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "# Q-Table Learning in FrozenLake(Dummy)"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "from OpenAI gymm"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# import package\nimport gym\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom gym.envs.registration import register\nimport random as pr",
"execution_count": 1,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "register(\n id = 'FrozenLake-v3',\n entry_point = 'gym.envs.toy_text:FrozenLakeEnv',\n kwargs={'map_name': '4x4',\n 'is_slippery' : False }\n)",
"execution_count": 2,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "env = gym.make('FrozenLake-v3')",
"execution_count": 3,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "FrozenLake env review)\n\n* Structure : a 4 X 4 grid of blocks.\n* Blocks of State : Start / Goal / Safe-Frozen / Dangerous hole.\n* Objective : To have an agent learn to navigate from Start Block to Goal Block without moving onto a hole.\n* The Catch : __do not exist__"
},
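{
"metadata": {},
"cell_type": "markdown",
"source": "As a quick sanity check (not part of the original run), we can render the map once to see the Start / Frozen / Hole / Goal layout. This assumes the classic gym 0.x text-rendering API used throughout this notebook, where env.render() prints the grid to stdout."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Render the 4x4 map once to see the layout (S=Start, F=Frozen, H=Hole, G=Goal).\n# Assumes the classic gym 0.x API used above: env.render() prints the grid to stdout.\nenv.reset()\nenv.render()",
"execution_count": null,
"outputs": []
},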
{
"metadata": {},
"cell_type": "markdown",
"source": "# Q-Table Learning Algorithm"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "#Initialize table with all zeros\nQ = np.zeros([env.observation_space.n, env.action_space.n])",
"execution_count": 4,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "Q",
"execution_count": 5,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 5,
"data": {
"text/plain": "array([[ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.]])"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Q -Table of FrozenLake env. has a 16 X 4 grid of shape. \n(one for each block : 16) X (Action : 4) // up, down, left or right"
},
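{
"metadata": {},
"cell_type": "markdown",
"source": "A small sketch (not in the original notebook) to confirm those dimensions directly from the environment:"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# The Q-Table has one row per state and one column per action\nprint(Q.shape)                  # expected: (16, 4)\nprint(env.observation_space.n)  # 16 states\nprint(env.action_space.n)       # 4 actions",
"execution_count": null,
"outputs": []
},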
{
"metadata": {},
"cell_type": "markdown",
"source": "as above result, we can know\n* action 0 : Left\n* action 1 : Down\n* action 2 : Right\n* action 3 : UP"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "def rargmax(vector):\n m = np.amax(vector)\n indices = np.nonzero(vector == m)[0]\n return pr.choice(indices)",
"execution_count": 6,
"outputs": []
},
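{
"metadata": {},
"cell_type": "markdown",
"source": "Why rargmax instead of np.argmax? On an all-zero row, np.argmax always returns index 0 (Left), so the untrained agent would never explore. rargmax breaks ties at random, which is the only source of exploration in this dummy setup. A minimal sketch (not in the original notebook) illustrating the difference:"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# On a tied (all-zero) row, np.argmax is deterministic while rargmax is not\nrow = np.zeros(4)\nprint(np.argmax(row))                     # always 0\nprint([rargmax(row) for _ in range(10)])  # a random mix of 0..3",
"execution_count": null,
"outputs": []
},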
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Set learning parameters\nnum_episodes = 2000\n\n#create lists to contain total rewards and steps per episode\nrList = [] # reword list\n\nfor i in range(num_episodes):\n s = env.reset() # Reset environment and get first new observation\n rAll = 0 # total reward\n d = False # end of precess\n\n #The Q-Table learning algorithm\n while not d:\n a = rargmax(Q[s,:]) \n s1, r, d, info = env.step(a)\n\n # Update Q-Table with new knowledge(=reward)\n Q[s,a] = r + np.max(Q[s1,:])\n\n rAll += r # add reward \n s = s1 # move to next state\n\n rList.append(rAll)",
"execution_count": 7,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "print (\"Score over time: \" + str(sum(rList)/num_episodes))",
"execution_count": 8,
"outputs": [
{
"output_type": "stream",
"text": "Score over time: 0.9325\n",
"name": "stdout"
}
]
},
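{
"metadata": {},
"cell_type": "markdown",
"source": "matplotlib was imported above but never used; below is a minimal sketch (not in the original notebook) that plots the per-episode reward, so you can see when the agent starts reaching the goal consistently:"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Plot the reward (0 or 1) obtained in each training episode\nplt.bar(range(len(rList)), rList, color='blue')\nplt.xlabel('episode')\nplt.ylabel('reward')\nplt.show()",
"execution_count": null,
"outputs": []
},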
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "print (\"Final Q-Table Values\")\nprint (np.round(Q,3))",
"execution_count": 9,
"outputs": [
{
"output_type": "stream",
"text": "Final Q-Table Values\n[[ 0. 0. 1. 0.]\n [ 0. 0. 1. 0.]\n [ 0. 1. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 1. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 1. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 0. 1. 0.]\n [ 0. 0. 0. 0.]]\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Let me check more detail.\nI'll get the index of the maximum prob. in every row, then translate to action."
},
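{
"metadata": {},
"cell_type": "markdown",
"source": "A minimal sketch (not in the original notebook) of that idea, reading the greedy action straight out of the table for every state. The action_names list here is just for illustration; rows that are still all zero default to Left."
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Greedy action per state: index of the maximum Q-value in each row\naction_names = ['Left', 'Down', 'Right', 'Up']\ngreedy = np.argmax(Q, axis=1)\nfor state, a in enumerate(greedy):\n    print('state %2d -> %s' % (state, action_names[a]))",
"execution_count": null,
"outputs": []
},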
{
"metadata": {},
"cell_type": "markdown",
"source": "# Test"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "num_episodes_test = 5\n\n#create lists to contain total rewards and steps per episode\nrList_test = [] # reword list\npath=[] # path in each episoed\nfor i in range(num_episodes_test):\n s = env.reset() # Reset environment and get first new observation\n rAll = 0 # total reward\n d = False # end of precess\n sDict = {} # state-action dict\n #The Q-Table learning algorithm\n while not d:\n a = np.argmax(Q[s,:])\n sDict[s] = a\n \n s1, r, d, info = env.step(a)\n\n rAll += r # add reward \n s = s1 # move to next state\n path.append(sDict)\n rList_test.append(rAll)",
"execution_count": 10,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "path",
"execution_count": 11,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 11,
"data": {
"text/plain": "[{0: 2, 1: 2, 2: 1, 6: 1, 10: 1, 14: 2},\n {0: 2, 1: 2, 2: 1, 6: 1, 10: 1, 14: 2},\n {0: 2, 1: 2, 2: 1, 6: 1, 10: 1, 14: 2},\n {0: 2, 1: 2, 2: 1, 6: 1, 10: 1, 14: 2},\n {0: 2, 1: 2, 2: 1, 6: 1, 10: 1, 14: 2}]"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# path\naction_set=['Left', 'Down', 'Right', 'Up']\nprint(\"-----Result of what the agent did-------\\n\")\nfor i in path[0].keys():\n action = path[0][i]\n print('At state %d, the agent move %s' %(i, action_set[action]))",
"execution_count": 13,
"outputs": [
{
"output_type": "stream",
"text": "-----Result of what the agent did-------\n\nAt state 0, the agent move Right\nAt state 1, the agent move Right\nAt state 2, the agent move Down\nAt state 6, the agent move Down\nAt state 10, the agent move Down\nAt state 14, the agent move Right\n",
"name": "stdout"
}
]
}
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python [default]",
"language": "python"
},
"language_info": {
"nbconvert_exporter": "python",
"file_extension": ".py",
"codemirror_mode": {
"version": 3,
"name": "ipython"
},
"name": "python",
"mimetype": "text/x-python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}