{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "b64717e9bcdc4ab796bc1d0fdbb12c4b",
"grade": false,
"grade_id": "cell-f64cdc46435b25eb",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"source": [
"# Assignment 3 - Completing the Parameter Study\n",
"\n",
"Welcome to Course 4 Programming Assignment 3. In the previous assignments, you completed the implementation of the Lunar Lander environment and implemented an agent with neural networks and the Adam optimizer. As you may remember, we discussed a number of key meta-parameters that affect the performance of the agent (e.g. the step-size, the temperature parameter for the softmax policy, the capacity of the replay buffer). We can use rules of thumb for picking reasonable values for these meta-parameters. However, we can also study the impact of these meta-parameters on the performance of the agent to gain insight.\n",
"\n",
"In this assignment, you will conduct a careful experiment analyzing performance of an agent, under different values of the step-size parameter.\n",
"\n",
"**In this assignment, you will:**\n",
"\n",
"- write a script to run your agent and environment on a set of parameters, to determine performance across these parameters.\n",
"- gain insight into the impact of the step-size parameter on agent performance by examining its parameter sensitivity curve."
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "a5e3743388aba0f744da57d8af1c5270",
"grade": false,
"grade_id": "cell-4359dc74745ffb31",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"source": [
"## Packages\n",
"\n",
"- [numpy](www.numpy.org) : Fundamental package for scientific computing with Python.\n",
"- [matplotlib](http://matplotlib.org) : Library for plotting graphs in Python.\n",
"- [RL-Glue](http://www.jmlr.org/papers/v10/tanner09a.html) : Library for reinforcement learning experiments.\n",
"- [tqdm](https://tqdm.github.io/) : A package to display progress bar when running experiments"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "501d238ebcf7d6e116e6849af15dbb07",
"grade": false,
"grade_id": "cell-e55836e566b8c01d",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [],
"source": [
"# Import necessary libraries\n",
"# DO NOT IMPORT OTHER LIBRARIES - This will break the autograder.\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"%matplotlib inline\n",
"\n",
"import os\n",
"from tqdm import tqdm\n",
"\n",
"from rl_glue import RLGlue\n",
"from environment import BaseEnvironment\n",
"from agent import BaseAgent\n",
"from dummy_environment import DummyEnvironment\n",
"from dummy_agent import DummyAgent"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "3eaa22ac9a78ff78d9599150ff83877d",
"grade": false,
"grade_id": "cell-c3d5d347d1726775",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"source": [
"## Section 1: Write Parameter Study Script\n",
"\n",
"In this section, you will write a script for performing parameter studies. You will implement the `run_experiment()` function. This function takes an environment and agent and performs a parameter study on the step-size and termperature parameters."
]
},
{
"cell_type": "code",
"execution_count": 43,
"metadata": {
"deletable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "74fc83f73918b97072a2c3dffa909bdd",
"grade": false,
"grade_id": "cell-e53c85e6098a975b",
"locked": false,
"schema_version": 3,
"solution": true,
"task": false
}
},
"outputs": [],
"source": [
"# -----------\n",
"# Graded Cell\n",
"# -----------\n",
"\n",
"def run_experiment(environment, agent, environment_parameters, agent_parameters, experiment_parameters):\n",
" \n",
" \"\"\"\n",
" Assume environment_parameters dict contains:\n",
" {\n",
" input_dim: integer,\n",
" num_actions: integer,\n",
" discount_factor: float\n",
" }\n",
" \n",
" Assume agent_parameters dict contains:\n",
" {\n",
" step_size: 1D numpy array of floats,\n",
" tau: 1D numpy array of floats\n",
" }\n",
" \n",
" Assume experiment_parameters dict contains:\n",
" {\n",
" num_runs: integer,\n",
" num_episodes: integer\n",
" } \n",
" \"\"\"\n",
" \n",
" ### Instantiate rl_glue from RLGlue \n",
" rl_glue = RLGlue(environment, agent)\n",
"\n",
" os.system('sleep 1') # to prevent tqdm printing out-of-order\n",
" \n",
" ### START CODE HERE\n",
" \n",
" ### Initialize agent_sum_reward to zero in the form of a numpy array \n",
" # with shape (number of values for tau, number of step-sizes, number of runs, number of episodes)\n",
" \n",
" tau = agent_parameters[\"tau\"]\n",
" step_size = agent_parameters[\"step_size\"]\n",
" num_runs = experiment_parameters[\"num_runs\"]\n",
" num_episodes = experiment_parameters[\"num_episodes\"]\n",
" agent_sum_reward = np.zeros((len(tau), (len(step_size)), num_runs, num_episodes))\n",
" ### END CODE HERE\n",
" # your code here\n",
" \n",
" \n",
" \n",
" \n",
" ### Replace the Nones with the correct values in the rest of the code\n",
"\n",
" # for loop over different values of tau\n",
" # tqdm is used to show a progress bar for completing the parameter study\n",
" # for i in tqdm(range(TODO)):\n",
" # your code here\n",
" \n",
" \n",
" for i in tqdm(range(len(tau))):\n",
" \n",
" # for loop over different values of the step-size\n",
" # for j in range(TODO):\n",
" # your code here\n",
" for j in range(len(step_size)):\n",
"\n",
" ### Specify env_info \n",
" env_info = {}\n",
"\n",
" ### Specify agent_info\n",
" agent_info = {\"num_actions\": environment_parameters[\"num_actions\"],\n",
" \"input_dim\": environment_parameters[\"input_dim\"],\n",
" \"discount_factor\": environment_parameters[\"discount_factor\"],\n",
" \"tau\": agent_parameters[\"tau\"][i],\n",
" \"step_size\": agent_parameters[\"step_size\"][j]\n",
" }\n",
" \n",
" # your code here\n",
" \n",
"\n",
" # for loop over runs\n",
" # for run in range(TODO): \n",
" # your code here\n",
" for run in range(num_runs):\n",
" # Set the seed\n",
" agent_info[\"seed\"] = agent_parameters[\"seed\"] * experiment_parameters[\"num_runs\"] + run\n",
" \n",
" # Beginning of the run \n",
" rl_glue.rl_init(agent_info, env_info)\n",
"\n",
" # for episode in range(TODO):\n",
" # your code here\n",
" for episode in range(num_episodes):\n",
" \n",
" # Run episode\n",
" rl_glue.rl_episode(0) # no step limit\n",
"\n",
" ### Store sum of reward\n",
" # agent_sum_reward[None, None, None, None] = rl_glue.rl_agent_message(\"get_sum_reward\")\n",
" # your code here\n",
" agent_sum_reward[i, j, run, episode] = rl_glue.rl_agent_message(\"get_sum_reward\")\n",
"\n",
" if not os.path.exists('results'):\n",
" os.makedirs('results')\n",
"\n",
" save_name = \"{}\".format(rl_glue.agent.name).replace('.','')\n",
"\n",
" # save sum reward\n",
" np.save(\"results/sum_reward_{}\".format(save_name), agent_sum_reward) "
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "3e49955e441ad30ea64396c178bcef79",
"grade": false,
"grade_id": "cell-b5e0bf5f2c8ed098",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"source": [
"Run the following code to test your implementation of `run_experiment()` given a dummy agent and a dummy environment for 100 runs, 100 episodes, 12 values of the step-size, and 4 values of $\\tau$:"
]
},
{
"cell_type": "code",
"execution_count": 44,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 4/4 [00:11<00:00, 2.85s/it]"
]
},
{
"name": "stdout",
"output_type": "stream",
"text": [
"Passed the assert!\n"
]
},
{
"name": "stderr",
"output_type": "stream",
"text": [
"\n"
]
}
],
"source": [
"# --------------\n",
"# Debugging Cell\n",
"# --------------\n",
"# Feel free to make any changes to this cell to debug your code\n",
"\n",
"# Experiment parameters\n",
"experiment_parameters = {\n",
" \"num_runs\" : 100,\n",
" \"num_episodes\" : 100,\n",
"}\n",
"\n",
"# Environment parameters\n",
"environment_parameters = {\n",
" \"input_dim\" : 8,\n",
" \"num_actions\": 4, \n",
" \"discount_factor\" : 0.99\n",
"}\n",
"\n",
"agent_parameters = {\n",
" \"step_size\": 3e-5 * np.power(2.0, np.array([-6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5])),\n",
" \"tau\": np.array([0.001, 0.01, 0.1, 1.0]),\n",
" \"seed\": 0\n",
"}\n",
"\n",
"test_env = DummyEnvironment\n",
"test_agent = DummyAgent\n",
"\n",
"run_experiment(test_env, \n",
" test_agent, \n",
" environment_parameters, \n",
" agent_parameters, \n",
" experiment_parameters)\n",
"\n",
"sum_reward_dummy_agent = np.load(\"results/sum_reward_dummy_agent.npy\")\n",
"sum_reward_dummy_agent_answer = np.load(\"asserts/sum_reward_dummy_agent.npy\")\n",
"assert(np.allclose(sum_reward_dummy_agent, sum_reward_dummy_agent_answer))\n",
"\n",
"print(\"Passed the assert!\")\n"
]
},
{
"cell_type": "code",
"execution_count": 45,
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "code",
"checksum": "74063f15ad25a724f198b4be2f3f38d7",
"grade": true,
"grade_id": "cell-2284e3bbbcf29936",
"locked": true,
"points": 10,
"schema_version": 3,
"solution": false,
"task": false
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 4/4 [00:11<00:00, 2.82s/it]\n"
]
}
],
"source": [
"# -----------\n",
"# Tested Cell\n",
"# -----------\n",
"# The contents of the cell will be tested by the autograder.\n",
"# If they do not pass here, they will not pass there.\n",
"\n",
"# Experiment parameters\n",
"experiment_parameters = {\n",
" \"num_runs\" : 100,\n",
" \"num_episodes\" : 100,\n",
"}\n",
"\n",
"# Environment parameters\n",
"environment_parameters = {\n",
" \"input_dim\" : 8,\n",
" \"num_actions\": 4, \n",
" \"discount_factor\" : 0.99\n",
"}\n",
"\n",
"agent_parameters = {\n",
" \"step_size\": 3e-5 * np.power(2.0, np.array([-6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5])),\n",
" \"tau\": np.array([0.001, 0.01, 0.1, 1.0]),\n",
" \"seed\": 0\n",
"}\n",
"\n",
"test_env = DummyEnvironment\n",
"test_agent = DummyAgent\n",
"\n",
"run_experiment(test_env, \n",
" test_agent, \n",
" environment_parameters, \n",
" agent_parameters, \n",
" experiment_parameters)\n",
"\n",
"sum_reward_dummy_agent = np.load(\"results/sum_reward_dummy_agent.npy\")\n",
"sum_reward_dummy_agent_answer = np.load(\"asserts/sum_reward_dummy_agent.npy\")\n",
"assert(np.allclose(sum_reward_dummy_agent, sum_reward_dummy_agent_answer))"
]
},
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "43eb94b7507dc4c618e6bdc6a89dd471",
"grade": false,
"grade_id": "cell-ee4685363ad498e2",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"source": [
"## Section 2: The Parameter Study for the Agent with Neural Network and Adam Optimizer\n",
"\n",
"Now that you implemented run_experiment() for a dummy agent, let’s examine the performance of the agent that you implemented in Assignment 2 for different values of the step-size parameter. To do so, we can use parameter sensitivity curves. As you know, in parameter sensitivity curves, on the y-axis, we have our performance measure and on the x-axis, we have the values of the parameter we are testing. We will use the average of returns over episodes, averaged over 30 runs as our performance metric.\n",
"\n",
"Recall that we used a step-size of 10^{-3}$ in Assignment 2 and got reasonable performance. We can use this value to construct a sensible set of step-sizes for our parameter study by multiplying it with powers of two:\n",
"\n",
"$10^{-3} \\times 2^x$ where $x \\in \\{-9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3\\}$\n",
"\n",
"We use powers of two because doing so produces smaller increments in the step-size for smaller step-size values and larger jumps for larger step-sizes. \n",
"\n",
"Let’s take a look at the results for this set of step-sizes.\n",
"\n",
"<img src=\"parameter_study.png\" alt=\"Drawing\" style=\"width: 500px;\"/>\n",
"\n",
"Observe that the best performance is achieved for step-sizes in range $[10^{-4}, 10^{-3}]$. This includes the step-size that we used in Assignment 2! The performance degrades for higher and lower step-size values. Since the range of step-sizes for which the agent performs well is not broad, choosing a good step-size is challenging for this problem.\n",
"\n",
"As we mentioned above, we used the average of returns over episodes, averaged over 30 runs as our performance metric. This metric gives an overall estimation of the agent's performance over the episodes. If we want to study the effect of the step-size parameter on the agent's early performance or final performance, we should use different metrics. For example, to study the effect of the step-size parameter on early performance, we could use the average of returns over the first 100 episodes, averaged over 30 runs. When conducting a parameter study, you may consider these for defining your performance metric!"
]
},
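{
"cell_type": "markdown",
"metadata": {},
"source": [
"The cell below is a minimal illustrative sketch (not part of the graded assignment) of how such a parameter sensitivity curve could be computed and plotted from the array saved by `run_experiment()`. It assumes the `results/sum_reward_<agent name>.npy` naming pattern and the array shape (number of tau values, number of step-sizes, number of runs, number of episodes) used above; the dummy-agent results produced by the test cells are loaded purely for illustration."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# ------------------------------------------------------------------\n",
"# Illustrative sketch (not graded): computing and plotting a\n",
"# parameter sensitivity curve from the array saved by run_experiment().\n",
"# Relies on np and plt imported in the Packages cell above, and on the\n",
"# dummy-agent results file produced by the test cells.\n",
"# ------------------------------------------------------------------\n",
"\n",
"# The step-size set from the text would be built as:\n",
"#   1e-3 * np.power(2.0, np.arange(-9, 4))\n",
"# Here we use the dummy-agent values so the shapes match the saved file.\n",
"step_sizes = 3e-5 * np.power(2.0, np.array([-6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5]))\n",
"taus = np.array([0.001, 0.01, 0.1, 1.0])\n",
"\n",
"# Shape: (len(taus), len(step_sizes), num_runs, num_episodes)\n",
"sum_reward = np.load(\"results/sum_reward_dummy_agent.npy\")\n",
"\n",
"# Overall metric: return averaged over all episodes and all runs\n",
"overall_perf = sum_reward.mean(axis=(2, 3))\n",
"\n",
"# Early-learning metric: return averaged over the first 100 episodes only\n",
"early_perf = sum_reward[:, :, :, :100].mean(axis=(2, 3))\n",
"\n",
"# One sensitivity curve per value of tau, with the step-size on a log scale\n",
"for k, tau in enumerate(taus):\n",
"    plt.plot(step_sizes, overall_perf[k], marker='o', label=\"tau = {}\".format(tau))\n",
"plt.xscale('log')\n",
"plt.xlabel('step-size')\n",
"plt.ylabel('average return over runs and episodes')\n",
"plt.legend()\n",
"plt.show()"
]
},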
{
"cell_type": "markdown",
"metadata": {
"deletable": false,
"editable": false,
"nbgrader": {
"cell_type": "markdown",
"checksum": "16533ce7f50f0d22dd5774ccb53f50de",
"grade": false,
"grade_id": "cell-a682d7f91cd82cc3",
"locked": true,
"schema_version": 3,
"solution": false,
"task": false
}
},
"source": [
"### **Wrapping up!** \n",
"\n",
"Congratulations, you have completed the Capstone project! In Assignment 1 (Module 1), you designed the reward function for the Lunar Lander environment. In Assignment 2 (Module 4), you implemented your Expected Sarsa agent with a neural network and Adam optimizer. In Assignment 3 (Module 5), you conducted a careful parameter study and examined the effect of changing the step size parameter on the performance of the agent. Thanks for sticking with us throughout the specialization! At this point, you should have a solid foundation for formulating your own reinforcement learning problems, understanding advanced topics in reinforcement learning, and even pursuing graduate studies."
]
}
],
"metadata": {
"coursera": {
"course_slug": "complete-reinforcement-learning-system",
"graded_item_id": "h4ZLq",
"launcher_item_id": "rbt6a"
},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 2
}