Last active: January 25, 2018 17:19
{
"cells": [
{
"metadata": {},
"cell_type": "markdown",
"source": "# Q-Table Learning in FrozenLake (Dummy)"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "from OpenAI Gym"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# import packages\nimport gym\nimport numpy as np\nimport matplotlib.pyplot as plt\nfrom gym.envs.registration import register\nimport random as pr",
"execution_count": 1,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "register(\n    id = 'FrozenLake-v3',\n    entry_point = 'gym.envs.toy_text:FrozenLakeEnv',\n    kwargs={'map_name': '4x4',\n            'is_slippery': False}\n)",
"execution_count": 2,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "env = gym.make('FrozenLake-v3')",
"execution_count": 3,
"outputs": []
},
{
"metadata": {},
"cell_type": "markdown",
"source": "FrozenLake environment review:\n\n* Structure : a 4 x 4 grid of blocks.\n* Block types : Start / Goal / Safe (frozen) / Dangerous (hole).\n* Objective : have the agent learn to navigate from the Start block to the Goal block without stepping onto a hole.\n* The catch : __none here__ (with is_slippery=False, every move is deterministic)."
},
{
"metadata": {},
"cell_type": "markdown",
"source": "# Q-Table Learning Algorithm"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Initialize table with all zeros\nQ = np.zeros([env.observation_space.n, env.action_space.n])",
"execution_count": 4,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "Q",
"execution_count": 5,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 5,
"data": {
"text/plain": "array([[ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.],\n [ 0., 0., 0., 0.]])"
},
"metadata": {}
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "The Q-Table for the FrozenLake environment has shape 16 x 4:\n(one row per block : 16) x (one column per action : 4) // left, down, right or up"
},
{
"metadata": {},
"cell_type": "markdown",
"source": "From the result above we can read off the action encoding:\n* action 0 : Left\n* action 1 : Down\n* action 2 : Right\n* action 3 : Up"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "def rargmax(vector):\n    \"\"\"Argmax that breaks ties randomly among equal maxima.\"\"\"\n    m = np.amax(vector)                   # maximum value in the vector\n    indices = np.nonzero(vector == m)[0]  # all indices achieving it\n    return pr.choice(indices)             # pick one of them at random",
"execution_count": 6,
"outputs": []
},
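`rargmax` is an argmax that breaks ties randomly; plain `np.argmax` always returns the first maximal index, so an agent starting from an all-zero Q-table would always pick action 0 and never explore. A pure-Python sketch of the same idea (the function name here is mine, not from the notebook):

```python
import random

def rargmax_py(vector):
    """Return the index of a maximum element, breaking ties uniformly at random."""
    m = max(vector)
    indices = [i for i, v in enumerate(vector) if v == m]
    return random.choice(indices)

# On an all-zero row (the initial Q-table) every action ties,
# so the choice is effectively a random exploratory move.
print(rargmax_py([0.0, 0.0, 0.0, 0.0]))  # any of 0, 1, 2, 3
```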
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# Set learning parameters\nnum_episodes = 2000\n\n# create a list to contain the total reward per episode\nrList = []  # reward list\n\nfor i in range(num_episodes):\n    s = env.reset()  # Reset environment and get first new observation\n    rAll = 0         # total reward for this episode\n    d = False        # done flag: True when the episode ends\n\n    # The Q-Table learning algorithm\n    while not d:\n        a = rargmax(Q[s,:])\n        s1, r, d, info = env.step(a)\n\n        # Update Q-Table with new knowledge (= reward)\n        Q[s,a] = r + np.max(Q[s1,:])\n\n        rAll += r  # accumulate reward\n        s = s1     # move to the next state\n\n    rList.append(rAll)",
"execution_count": 7,
"outputs": []
},
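The update rule above, `Q[s,a] = r + np.max(Q[s1,:])`, is the undiscounted, deterministic special case of the Q-learning target. For reference, a sketch of the same update with a discount factor gamma (a hypothetical variant, not what this notebook runs) would make the agent prefer shorter paths to the goal:

```python
def q_update(Q, s, a, r, s1, gamma=0.9):
    """Deterministic tabular update: Q[s][a] = r + gamma * max_a' Q[s1][a'].
    With gamma = 1.0 this reduces to the notebook's update rule."""
    Q[s][a] = r + gamma * max(Q[s1])

# Tiny 2-state, 2-action example: state 1 already has value 1.0 for action 0.
Q = [[0.0, 0.0], [1.0, 0.0]]
q_update(Q, s=0, a=1, r=0.0, s1=1, gamma=0.9)
print(Q[0][1])  # 0.9
```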
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "print(\"Score over time: \" + str(sum(rList)/num_episodes))",
"execution_count": 8,
"outputs": [
{
"output_type": "stream",
"text": "Score over time: 0.9325\n",
"name": "stdout"
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "print(\"Final Q-Table Values\")\nprint(np.round(Q, 3))",
"execution_count": 9,
"outputs": [
{
"output_type": "stream",
"text": "Final Q-Table Values\n[[ 0. 0. 1. 0.]\n [ 0. 0. 1. 0.]\n [ 0. 1. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 1. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 1. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 0. 0. 0.]\n [ 0. 0. 1. 0.]\n [ 0. 0. 0. 0.]]\n",
"name": "stdout"
}
]
},
{
"metadata": {},
"cell_type": "markdown",
"source": "Let's check this in more detail.\nI'll take the index of the maximum Q-value in every row, then translate it into an action."
},
{
"metadata": {},
"cell_type": "markdown",
"source": "# Test"
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "num_episodes_test = 5\n\n# create lists to contain total rewards and the path per episode\nrList_test = []  # reward list\npath = []        # state-action path in each episode\n\nfor i in range(num_episodes_test):\n    s = env.reset()  # Reset environment and get first new observation\n    rAll = 0         # total reward for this episode\n    d = False        # done flag: True when the episode ends\n    sDict = {}       # state-action dict\n\n    # Follow the learned (greedy) policy\n    while not d:\n        a = np.argmax(Q[s,:])\n        sDict[s] = a\n\n        s1, r, d, info = env.step(a)\n\n        rAll += r  # accumulate reward\n        s = s1     # move to the next state\n\n    path.append(sDict)\n    rList_test.append(rAll)",
"execution_count": 10,
"outputs": []
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "path",
"execution_count": 11,
"outputs": [
{
"output_type": "execute_result",
"execution_count": 11,
"data": {
"text/plain": "[{0: 2, 1: 2, 2: 1, 6: 1, 10: 1, 14: 2},\n {0: 2, 1: 2, 2: 1, 6: 1, 10: 1, 14: 2},\n {0: 2, 1: 2, 2: 1, 6: 1, 10: 1, 14: 2},\n {0: 2, 1: 2, 2: 1, 6: 1, 10: 1, 14: 2},\n {0: 2, 1: 2, 2: 1, 6: 1, 10: 1, 14: 2}]"
},
"metadata": {}
}
]
},
{
"metadata": {
"trusted": true
},
"cell_type": "code",
"source": "# path\naction_set = ['Left', 'Down', 'Right', 'Up']\nprint(\"----- Result of what the agent did -----\\n\")\nfor i in path[0].keys():\n    action = path[0][i]\n    print('At state %d, the agent moves %s' % (i, action_set[action]))",
"execution_count": 13,
"outputs": [
{
"output_type": "stream",
"text": "----- Result of what the agent did -----\n\nAt state 0, the agent moves Right\nAt state 1, the agent moves Right\nAt state 2, the agent moves Down\nAt state 6, the agent moves Down\nAt state 10, the agent moves Down\nAt state 14, the agent moves Right\n",
"name": "stdout"
}
]
}
],
"metadata": {
"kernelspec": {
"name": "python3",
"display_name": "Python [default]",
"language": "python"
},
"language_info": {
"nbconvert_exporter": "python",
"file_extension": ".py",
"codemirror_mode": {
"version": 3,
"name": "ipython"
},
"name": "python",
"mimetype": "text/x-python",
"pygments_lexer": "ipython3",
"version": "3.5.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}