Created March 3, 2021 10:34
{"nbformat":4,"nbformat_minor":0,"metadata":{"language_info":{"codemirror_mode":{"name":"ipython","version":3},"file_extension":".py","mimetype":"text/x-python","name":"python","nbconvert_exporter":"python","pygments_lexer":"ipython3","version":"3.8.8-final"},"orig_nbformat":2,"kernelspec":{"name":"python3","display_name":"Python 3.8.8 64-bit (conda)","metadata":{"interpreter":{"hash":"828b2d2e90c34517c0f3db151137e4812c232f0241304c27576ea4a0014563c4"}}},"colab":{"name":"IITM_RL_DP_ASSIGNMENT_v1","provenance":[{"file_id":"1LGjo_UabJCcQl41FwuOahjeJe7Lp2kHk","timestamp":1614721014962},{"file_id":"14fC9FsOpJ6tJHZ5sa9s2EILP9BLIxnlw","timestamp":1614718819970}],"collapsed_sections":[],"toc_visible":true}},"cells":[{"cell_type":"markdown","metadata":{"id":"VXoLHrjbjNJp"},"source":["<div style=\"text-align: center\">\n"," <a href=\"https://www.aicrowd.com/challenges/rl-taxi\"><img alt=\"AIcrowd\" src=\"https://images.aicrowd.com/raw_images/challenges/banner_file/759/d9540ebbd506b68a5ff2.jpg\"></a>\n","</div>"]},{"cell_type":"markdown","metadata":{"id":"_rBlqB-7jSaG"},"source":["# What is the notebook about?\n","\n","## Problem - DP Algorithm\n","This problem deals with a taxi driver with multiple actions in different cities. The tasks you have to do are:\n","- Implement DP Algorithm to find the optimal sequence for the taxi driver\n","- Find optimal policies for sequences of varying lengths\n","- Explain a variation on the policy\n","\n","# How to use this notebook? 📝\n","\n","- This is a shared template and any edits you make here will not be saved. **You\n","should make a copy in your own drive**. Click the \"File\" menu (top-left), then \"Save a Copy in Drive\". 
You can work in your copy however you like.\n","\n","<p style=\"text-align: center\"><img src=\"https://gitlab.aicrowd.com/aicrowd/assets/-/raw/master/notebook/aicrowd_notebook_submission_flow.png?inline=false\" alt=\"notebook overview\" style=\"width: 650px;\"/></p>\n","\n","- **Update the config parameters**. You can define the common variables here.\n","\n","Variable | Description\n","--- | ---\n","`AICROWD_DATASET_PATH` | Path to the file containing test data. This should be an absolute path.\n","`AICROWD_RESULTS_DIR` | Path to write the output to.\n","`AICROWD_ASSETS_DIR` | In case your notebook needs additional files (like model weights, etc.), you can add them to a directory and specify the path to the directory here (please specify a relative path). The contents of this directory will be sent to AIcrowd for evaluation.\n","`AICROWD_API_KEY` | In order to submit your code to AIcrowd, you need to provide your account's API key. This key is available at https://www.aicrowd.com/participants/me\n","\n","- **Installing packages**. Please use the [Install packages 🗃](#install-packages-) section to install the packages"]},{"cell_type":"markdown","metadata":{"id":"zr2_Itu2jmYu"},"source":["# Setup AIcrowd Utilities 🛠\n","\n","We use this to bundle the files for submission and create a submission on AIcrowd. Do not edit this block."]},{"cell_type":"code","metadata":{"id":"kriIY9ntvLQD"},"source":["!pip install -U git+https://gitlab.aicrowd.com/aicrowd/aicrowd-cli.git@notebook-submission-v2 > /dev/null "],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"ecFpLP6avMok"},"source":["%load_ext aicrowd.magic "],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"9tpalq6zjo3X"},"source":["# AIcrowd Runtime Configuration 🧷\n","\n","Define configuration parameters. Please include any files needed for the notebook to run under `ASSETS_DIR`. 
We will copy the contents of this directory to your final submission file 🙂"]},{"cell_type":"code","metadata":{"id":"Up8mk6fhvOub"},"source":["import os\n","\n","AICROWD_DATASET_PATH = os.getenv(\"DATASET_PATH\", os.getcwd()+\"/40746340-4151-4921-8496-be10b3f8f5cf_hw2_q1.zip\")\n","AICROWD_RESULTS_DIR = os.getenv(\"OUTPUTS_DIR\", \"results\")\n","API_KEY = \"\" #Get your API key from https://www.aicrowd.com/participants/me"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"O2w1nVPDj9Uj"},"source":["# Download dataset files 📲"]},{"cell_type":"code","metadata":{"id":"ZV9xReqhvVNt"},"source":["!aicrowd login --api-key $API_KEY\n","!aicrowd dataset download -c rl-taxi"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"MTYP6MFKvXCM"},"source":["!unzip -q $AICROWD_DATASET_PATH"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"vijz0kfaKzmw"},"source":["DATASET_DIR = 'hw2_q1/'\n","!mkdir {DATASET_DIR}results/"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"gTeWFlWukTob"},"source":["# Install packages 🗃\n","\n","Please add all package installations in this section"]},{"cell_type":"code","metadata":{"id":"KV5fXPkYkUxz"},"source":[""],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"yJbm42p_kWRG"},"source":["# Import packages 💻"]},{"cell_type":"code","metadata":{"id":"geEOnXHeK4oQ"},"source":["import numpy as np\n","import os\n","# ADD ANY IMPORTS YOU WANT HERE"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"oL3_9MAk5cqv"},"source":["import numpy as np\n","\n","class TaxiEnv_HW2:\n"," def __init__(self, states, actions, probabilities, rewards):\n"," self.possible_states = states\n"," self._possible_actions = {st: ac for st, ac in zip(states, actions)}\n"," self._ride_probabilities = {st: pr for st, pr in zip(states, probabilities)}\n"," self._ride_rewards = {st: rw for st, rw in zip(states, 
rewards)}\n"," self._verify()\n","\n"," def _check_state(self, state):\n"," assert state in self.possible_states, \"State %s is not a valid state\" % state\n","\n"," def _verify(self):\n"," \"\"\" \n"," Verify that the data conditions are met:\n"," The number of actions matches the shape of the probability and reward arrays\n"," Every probability distribution adds up to 1 \n"," \"\"\"\n"," ns = len(self.possible_states)\n"," for state in self.possible_states:\n"," ac = self._possible_actions[state]\n"," na = len(ac)\n","\n"," rp = self._ride_probabilities[state]\n"," assert np.all(rp.shape == (na, ns)), \"Probabilities shape mismatch\"\n"," \n"," rr = self._ride_rewards[state]\n"," assert np.all(rr.shape == (na, ns)), \"Rewards shape mismatch\"\n","\n"," assert np.allclose(rp.sum(axis=1), 1), \"Probabilities don't add up to 1\"\n","\n"," def possible_actions(self, state):\n"," \"\"\" Return all possible actions from a given state \"\"\"\n"," self._check_state(state)\n"," return self._possible_actions[state]\n","\n"," def ride_probabilities(self, state, action):\n"," \"\"\" \n"," Returns all possible ride probabilities from a state for a given action\n"," The probabilities are returned as a list, with values in the same order as self.possible_states\n"," \"\"\"\n"," actions = self.possible_actions(state)\n"," ac_idx = actions.index(action)\n"," return self._ride_probabilities[state][ac_idx]\n","\n"," def ride_rewards(self, state, action):\n"," actions = self.possible_actions(state)\n"," ac_idx = actions.index(action)\n"," return self._ride_rewards[state][ac_idx]"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"APNUuTHK5cqw"},"source":["# Examples of using the environment functions"]},{"cell_type":"code","metadata":{"id":"U8BwZY8Z5cqw"},"source":["def check_taxienv():\n"," # These are the values as used in the pdf, but they may be changed during submission, so do not hardcode anything\n","\n"," states = ['A', 'B', 'C']\n","\n"," actions = [['1','2','3'], 
['1','2'], ['1','2','3']]\n","\n"," probs = [np.array([[1/2, 1/4, 1/4],\n"," [1/16, 3/4, 3/16],\n"," [1/4, 1/8, 5/8]]),\n","\n"," np.array([[1/2, 0, 1/2],\n"," [1/16, 7/8, 1/16]]),\n","\n"," np.array([[1/4, 1/4, 1/2],\n"," [1/8, 3/4, 1/8],\n"," [3/4, 1/16, 3/16]]),]\n","\n"," rewards = [np.array([[10, 4, 8],\n"," [ 8, 2, 4],\n"," [ 4, 6, 4]]),\n"," \n"," np.array([[14, 0, 18],\n"," [ 8, 16, 8]]),\n"," \n"," np.array([[10, 2, 8],\n"," [6, 4, 2],\n"," [4, 0, 8]]),]\n","\n","\n"," env = TaxiEnv_HW2(states, actions, probs, rewards)\n"," print(\"All possible states\", env.possible_states)\n"," print(\"All possible actions from state B\", env.possible_actions('B'))\n"," print(\"Ride probabilities from state A with action 2\", env.ride_probabilities('A', '2'))\n"," print(\"Ride rewards from state C with action 3\", env.ride_rewards('C', '3'))\n","\n","check_taxienv()"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"uh6g_1u05cqx"},"source":["# Task 1 - DP Algorithm implementation\n","Implement your DP algorithm that takes the starting state and sequence length\n","and returns the expected reward for the policy"]},{"cell_type":"code","metadata":{"id":"IMYcbfVq5cqx"},"source":["def dp_solve(taxienv):\n"," ## Implement the DP algorithm for the taxienv\n"," states = taxienv.possible_states\n"," values = {s: 0 for s in states}\n"," policy = {s: '0' for s in states}\n"," all_values = [] # Append the \"values\" dictionary to this after each update\n"," all_policies = [] # Append the \"policy\" dictionary to this after each update\n"," # Note: The sequence length is always N=10\n"," \n"," # ADD YOUR CODE BELOW - DO NOT EDIT ABOVE THIS LINE\n","\n"," # DO NOT EDIT BELOW THIS LINE\n"," results = {\"Expected Reward\": all_values, \"Polcies\": all_policies}\n"," return results"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"us8AYNVXISv1"},"source":["## Here is an example of what the \"results\" output from the dp_solve 
function should look like\n","\n","Of course, it won't be all zeros.\n","``` python \n","{'Expected Reward': [{'A': 0, 'B': 0, 'C': 0},\n"," {'A': 0, 'B': 0, 'C': 0},\n"," {'A': 0, 'B': 0, 'C': 0},\n"," {'A': 0, 'B': 0, 'C': 0},\n"," {'A': 0, 'B': 0, 'C': 0},\n"," {'A': 0, 'B': 0, 'C': 0},\n"," {'A': 0, 'B': 0, 'C': 0},\n"," {'A': 0, 'B': 0, 'C': 0},\n"," {'A': 0, 'B': 0, 'C': 0},\n"," {'A': 0, 'B': 0, 'C': 0}],\n"," 'Polcies': [{'A': '0', 'B': '0', 'C': '0'},\n"," {'A': '0', 'B': '0', 'C': '0'},\n"," {'A': '0', 'B': '0', 'C': '0'},\n"," {'A': '0', 'B': '0', 'C': '0'},\n"," {'A': '0', 'B': '0', 'C': '0'},\n"," {'A': '0', 'B': '0', 'C': '0'},\n"," {'A': '0', 'B': '0', 'C': '0'},\n"," {'A': '0', 'B': '0', 'C': '0'},\n"," {'A': '0', 'B': '0', 'C': '0'},\n"," {'A': '0', 'B': '0', 'C': '0'}]}\n","\n"," ```"]},{"cell_type":"code","metadata":{"id":"5Ct5_WU1meeo"},"source":["if not os.path.exists(AICROWD_RESULTS_DIR):\n"," os.mkdir(AICROWD_RESULTS_DIR)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"AblFN9zNIjwV"},"source":["# DO NOT EDIT THIS CELL, DURING EVALUATION THE DATASET DIR WILL CHANGE\n","input_dir = os.path.join(DATASET_DIR, 'inputs')\n","for params_file in os.listdir(input_dir):\n"," kwargs = np.load(os.path.join(input_dir, params_file), allow_pickle=True).item()\n","\n"," env = TaxiEnv_HW2(**kwargs)\n","\n"," results = dp_solve(env)\n"," idx = params_file.split('_')[-1][:-4]\n"," np.save(os.path.join(AICROWD_RESULTS_DIR, 'results_' + idx), results)"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"fswnLXrXL2wh"},"source":["## Modify this code to show the results for the policy and expected rewards properly\n","print(results)"],"execution_count":null,"outputs":[]},{"cell_type":"markdown","metadata":{"id":"Qsne4xm4YTi-"},"source":["# Task 2 - Tabulate the optimal policy & optimal value for each state in each round for N=10\n","\n","Modify this cell and add your 
answer"]},{"cell_type":"markdown","metadata":{"id":"vKsA1jrCKszh"},"source":["# Question - Consider a policy that always forces the driver to go to the nearest taxi stand, irrespective of the state. Is it optimal? Justify your answer.\n","\n"]},{"cell_type":"markdown","metadata":{"id":"Yl502NaJHaPE"},"source":["Modify this cell and add your answer"]},{"cell_type":"markdown","metadata":{"id":"NiAS3hQPkiXS"},"source":["# Submit to AIcrowd 🚀\n","\n","**NOTE: PLEASE SAVE THE NOTEBOOK BEFORE SUBMITTING IT (Ctrl + S)**"]},{"cell_type":"code","metadata":{"id":"LfpXzdeTvjJ7"},"source":["!DATASET_PATH=$AICROWD_DATASET_PATH aicrowd notebook submit -c rl-taxi -a assets"],"execution_count":null,"outputs":[]},{"cell_type":"code","metadata":{"id":"VfwpjWPr3yMm"},"source":[""],"execution_count":null,"outputs":[]}]} |