{"nbformat":4,"nbformat_minor":0,"metadata":{"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.8.8-final"},"orig_nbformat":2,"kernelspec":{"name":"python3","display_name":"Python 3.8.8 64-bit (conda)","metadata":{"interpreter":{"hash":"828b2d2e90c34517c0f3db151137e4812c232f0241304c27576ea4a0014563c4"}}},"colab":{"name":"IITM_RL_DP_ASSIGNMENT_v1","provenance":[{"file_id":"1LGjo_UabJCcQl41FwuOahjeJe7Lp2kHk","timestamp":1614721014962},{"file_id":"14fC9FsOpJ6tJHZ5sa9s2EILP9BLIxnlw","timestamp":1614718819970}],"collapsed_sections":[],"toc_visible":true}},"cells":[{"cell_type":"markdown","metadata":{"id":"VXoLHrjbjNJp"},"source":["<div style=\"text-align: center\">\n"," <a href=\"\"><img alt=\"AIcrowd\" src=\"\"></a>\n","</div>"]},{"cell_type":"markdown","metadata":{"id":"_rBlqB-7jSaG"},"source":["# What is the notebook about?\n","\n","## Problem - DP Algorithm\n","This problem deals with a taxi driver with multiple actions in different cities. The tasks you have to do are:\n","- Implement DP Algorithm to find the optimal sequence for the taxi driver\n","- Find optimal policies for sequences of varying lengths\n","- Explain a variation on the policy\n","\n","# How to use this notebook? 📝\n","\n","- This is a shared template and any edits you make here will not be saved. **You\n","should make a copy in your own drive**. Click the \"File\" menu (top-left), then \"Save a Copy in Drive\". You will be working in your copy however you like.\n","\n","<p style=\"text-align: center\"><img src=\"\" alt=\"notebook overview\" style=\"width: 650px;\"/></p>\n","\n","- **Update the config parameters**. You can define the common variables here\n","\n","Variable | Description\n","--- | ---\n","`AICROWD_DATASET_PATH` | Path to the file containing test data. 
This should be an absolute path.\n","`AICROWD_RESULTS_DIR` | Path to write the output to.\n","`AICROWD_ASSETS_DIR` | If your notebook needs additional files (such as model weights), add them to a directory and specify the path to that directory here (please specify a relative path). The contents of this directory will be sent to AIcrowd for evaluation.\n","`AICROWD_API_KEY` | In order to submit your code to AIcrowd, you need to provide your account's API key. This key is available at\n","\n","- **Installing packages**. Please use the [Install packages 🗃](#install-packages-) section to install any packages you need."]},{"cell_type":"markdown","metadata":{"id":"zr2_Itu2jmYu"},"source":["# Setup AIcrowd Utilities 🛠\n","\n","We use this to bundle the files for submission and create a submission on AIcrowd. Do not edit this block."]},{"cell_type":"code","metadata":{"id":"kriIY9ntvLQD"},"source":["!pip install -U git+ > /dev/null "],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"ecFpLP6avMok"},"source":["%load_ext aicrowd.magic "],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"9tpalq6zjo3X"},"source":["# AIcrowd Runtime Configuration 🧷\n","\n","Define configuration parameters. Please include any files needed for the notebook to run under `ASSETS_DIR`. 
We will copy the contents of this directory to your final submission file 🙂"]},{"cell_type":"code","metadata":{"id":"Up8mk6fhvOub"},"source":["import os\n","\n","AICROWD_DATASET_PATH = os.getenv(\"DATASET_PATH\", os.getcwd()+\"/\")\n","AICROWD_RESULTS_DIR = os.getenv(\"OUTPUTS_DIR\", \"results\")\n","API_KEY = \"\" #Get your API key from"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"O2w1nVPDj9Uj"},"source":["# Download dataset files 📲"]},{"cell_type":"code","metadata":{"id":"ZV9xReqhvVNt"},"source":["!aicrowd login --api-key $API_KEY\n","!aicrowd dataset download -c rl-taxi"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"MTYP6MFKvXCM"},"source":["!unzip -q $AICROWD_DATASET_PATH"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"vijz0kfaKzmw"},"source":["DATASET_DIR = 'hw2_q1/'\n","!mkdir {DATASET_DIR}results/"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"gTeWFlWukTob"},"source":["# Install packages 🗃\n","\n","Please add all package installations in this section"]},{"cell_type":"code","metadata":{"id":"KV5fXPkYkUxz"},"source":[""],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"yJbm42p_kWRG"},"source":["# Import packages 💻"]},{"cell_type":"code","metadata":{"id":"geEOnXHeK4oQ"},"source":["import numpy as np\n","import os\n","# ADD ANY IMPORTS YOU WANT HERE"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"oL3_9MAk5cqv"},"source":["import numpy as np\n","\n","class TaxiEnv_HW2:\n","    def __init__(self, states, actions, probabilities, rewards):\n","        self.possible_states = states\n","        self._possible_actions = {st: ac for st, ac in zip(states, actions)}\n","        self._ride_probabilities = {st: pr for st, pr in zip(states, probabilities)}\n","        self._ride_rewards = {st: rw for st, rw in zip(states, rewards)}\n","        self._verify()\n","\n","    def _check_state(self, state):\n","        assert state in 
self.possible_states, \"State %s is not a valid state\" % state\n","\n","    def _verify(self):\n","        \"\"\"\n","        Verify that the data conditions are met:\n","          - The probability and reward arrays for each state have shape (num_actions, num_states)\n","          - Every probability distribution adds up to 1\n","        \"\"\"\n","        ns = len(self.possible_states)\n","        for state in self.possible_states:\n","            ac = self._possible_actions[state]\n","            na = len(ac)\n","\n","            rp = self._ride_probabilities[state]\n","            assert np.all(rp.shape == (na, ns)), \"Probabilities shape mismatch\"\n","\n","            rr = self._ride_rewards[state]\n","            assert np.all(rr.shape == (na, ns)), \"Rewards shape mismatch\"\n","\n","            assert np.allclose(rp.sum(axis=1), 1), \"Probabilities don't add up to 1\"\n","\n","    def possible_actions(self, state):\n","        \"\"\" Return all possible actions from a given state \"\"\"\n","        self._check_state(state)\n","        return self._possible_actions[state]\n","\n","    def ride_probabilities(self, state, action):\n","        \"\"\"\n","        Return the ride probabilities from a state for a given action.\n","        The values are returned in the same order as self.possible_states.\n","        \"\"\"\n","        actions = self.possible_actions(state)\n","        ac_idx = actions.index(action)\n","        return self._ride_probabilities[state][ac_idx]\n","\n","    def ride_rewards(self, state, action):\n","        \"\"\" Return the ride rewards from a state for a given action, in the same order as self.possible_states \"\"\"\n","        actions = self.possible_actions(state)\n","        ac_idx = actions.index(action)\n","        return self._ride_rewards[state][ac_idx]"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"APNUuTHK5cqw"},"source":["# Examples of using the environment functions"]},{"cell_type":"code","metadata":{"id":"U8BwZY8Z5cqw"},"source":["def check_taxienv():\n","    # These are the values used in the PDF, but they may be changed during submission, so do not hardcode anything\n","\n","    states = ['A', 'B', 'C']\n","\n","    actions = [['1','2','3'], ['1','2'], ['1','2','3']]\n","\n","    probs = [np.array([[1/2, 1/4, 1/4],\n","                       [1/16, 3/4, 
3/16],\n","                       [1/4, 1/8, 5/8]]),\n","\n","             np.array([[1/2, 0, 1/2],\n","                       [1/16, 7/8, 1/16]]),\n","\n","             np.array([[1/4, 1/4, 1/2],\n","                       [1/8, 3/4, 1/8],\n","                       [3/4, 1/16, 3/16]]),]\n","\n","    rewards = [np.array([[10, 4, 8],\n","                         [ 8, 2, 4],\n","                         [ 4, 6, 4]]),\n","\n","               np.array([[14, 0, 18],\n","                         [ 8, 16, 8]]),\n","\n","               np.array([[10, 2, 8],\n","                         [6, 4, 2],\n","                         [4, 0, 8]]),]\n","\n","\n","    env = TaxiEnv_HW2(states, actions, probs, rewards)\n","    print(\"All possible states\", env.possible_states)\n","    print(\"All possible actions from state B\", env.possible_actions('B'))\n","    print(\"Ride probabilities from state A with action 2\", env.ride_probabilities('A', '2'))\n","    print(\"Ride rewards from state C with action 3\", env.ride_rewards('C', '3'))\n","\n","check_taxienv()"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"uh6g_1u05cqx"},"source":["# Task 1 - DP Algorithm implementation\n","Implement a DP algorithm that computes, for each starting state and each sequence length up to N,\n","the optimal expected reward and the corresponding policy"]},{"cell_type":"code","metadata":{"id":"IMYcbfVq5cqx"},"source":["def dp_solve(taxienv):\n","    ## Implement the DP algorithm for the taxienv\n","    states = taxienv.possible_states\n","    values = {s: 0 for s in states}\n","    policy = {s: '0' for s in states}\n","    all_values = [] # Append the \"values\" dictionary to this after each update\n","    all_policies = [] # Append the \"policy\" dictionary to this after each update\n","    # Note: The sequence length is always N=10\n","\n","    # ADD YOUR CODE BELOW - DO NOT EDIT ABOVE THIS LINE\n","\n","    # DO NOT EDIT BELOW THIS LINE\n","    results = {\"Expected Reward\": all_values, \"Polcies\": all_policies}\n","    return results"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"us8AYNVXISv1"},"source":["## Here is an example of what the \"results\" output from the `dp_solve` function should look like\n","\n","Of course, the actual values won't be all zeros.\n","``` python 
\n","{'Expected Reward': [{'A': 0, 'B': 0, 'C': 0},\n","                     {'A': 0, 'B': 0, 'C': 0},\n","                     {'A': 0, 'B': 0, 'C': 0},\n","                     {'A': 0, 'B': 0, 'C': 0},\n","                     {'A': 0, 'B': 0, 'C': 0},\n","                     {'A': 0, 'B': 0, 'C': 0},\n","                     {'A': 0, 'B': 0, 'C': 0},\n","                     {'A': 0, 'B': 0, 'C': 0},\n","                     {'A': 0, 'B': 0, 'C': 0},\n","                     {'A': 0, 'B': 0, 'C': 0}],\n"," 'Polcies': [{'A': '0', 'B': '0', 'C': '0'},\n","             {'A': '0', 'B': '0', 'C': '0'},\n","             {'A': '0', 'B': '0', 'C': '0'},\n","             {'A': '0', 'B': '0', 'C': '0'},\n","             {'A': '0', 'B': '0', 'C': '0'},\n","             {'A': '0', 'B': '0', 'C': '0'},\n","             {'A': '0', 'B': '0', 'C': '0'},\n","             {'A': '0', 'B': '0', 'C': '0'},\n","             {'A': '0', 'B': '0', 'C': '0'},\n","             {'A': '0', 'B': '0', 'C': '0'}]}\n","\n","```"]},{"cell_type":"code","metadata":{"id":"5Ct5_WU1meeo"},"source":["if not os.path.exists(AICROWD_RESULTS_DIR):\n","    os.mkdir(AICROWD_RESULTS_DIR)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"AblFN9zNIjwV"},"source":["# DO NOT EDIT THIS CELL, DURING EVALUATION THE DATASET DIR WILL CHANGE\n","input_dir = os.path.join(DATASET_DIR, 'inputs')\n","for params_file in os.listdir(input_dir):\n","    kwargs = np.load(os.path.join(input_dir, params_file), allow_pickle=True).item()\n","\n","    env = TaxiEnv_HW2(**kwargs)\n","\n","    results = dp_solve(env)\n","    idx = params_file.split('_')[-1][:-4]\n",", 'results_' + idx), results)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"fswnLXrXL2wh"},"source":["## Modify this code to show the results for the policy and expected rewards properly\n","print(results)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"Qsne4xm4YTi-"},"source":["# Task 2 - Tabulate the optimal policy & optimal value for each state in each round for N=10\n","\n","Modify this cell and add your answer"]},{"cell_type":"markdown","metadata":{"id":"vKsA1jrCKszh"},"source":["# Question - Consider a policy that always forces the driver to go to the nearest taxi stand, 
irrespective of the state. Is it optimal? Justify your answer.\n","\n"]},{"cell_type":"markdown","metadata":{"id":"Yl502NaJHaPE"},"source":["Modify this cell and add your answer"]},{"cell_type":"markdown","metadata":{"id":"NiAS3hQPkiXS"},"source":["# Submit to AIcrowd 🚀\n","\n","**NOTE: PLEASE SAVE THE NOTEBOOK BEFORE SUBMITTING IT (Ctrl + S)**"]},{"cell_type":"code","metadata":{"id":"LfpXzdeTvjJ7"},"source":["!DATASET_PATH=$AICROWD_DATASET_PATH aicrowd notebook submit -c rl-taxi -a assets"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"VfwpjWPr3yMm"},"source":[""],"execution_count":null,"outputs":[]}]}
