@qazwsxal
Last active January 28, 2023 13:56
OpenAI Gym Lab Demo (Durham Students Click Here!)
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "OpenAI Gym Demo.ipynb",
"provenance": [],
"collapsed_sections": [],
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/qazwsxal/6cc1c5cf16a23ae6ea8d5c369828fa80/gym-demo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "QTNU1mwGB1ZD"
},
"source": [
"**Initialise**"
]
},
{
"cell_type": "code",
"metadata": {
"id": "1tfkgPuF3MbI",
"outputId": "c38d0a71-0619-418c-ebaf-197bf56aa336",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 833
}
},
"source": [
"%%bash\n",
"# This Cell is only used to make sure cartpole runs \n",
"\n",
"# install required system dependencies\n",
"apt-get install -y xvfb x11-utils\n",
"\n",
"# install required python dependencies (might need to install additional gym extras depending)\n",
"pip install pyvirtualdisplay PyOpenGL PyOpenGL-accelerate\n"
],
"execution_count": 1,
"outputs": [
{
"output_type": "stream",
"text": [
"Reading package lists...\n",
"Building dependency tree...\n",
"Reading state information...\n",
"The following additional packages will be installed:\n",
" libxxf86dga1\n",
"Suggested packages:\n",
" mesa-utils\n",
"The following NEW packages will be installed:\n",
" libxxf86dga1 x11-utils xvfb\n",
"0 upgraded, 3 newly installed, 0 to remove and 21 not upgraded.\n",
"Need to get 993 kB of archives.\n",
"After this operation, 2,977 kB of additional disk space will be used.\n",
"Get:1 http://archive.ubuntu.com/ubuntu bionic/main amd64 libxxf86dga1 amd64 2:1.1.4-1 [13.7 kB]\n",
"Get:2 http://archive.ubuntu.com/ubuntu bionic/main amd64 x11-utils amd64 7.7+3build1 [196 kB]\n",
"Get:3 http://archive.ubuntu.com/ubuntu bionic-updates/universe amd64 xvfb amd64 2:1.19.6-1ubuntu4.7 [783 kB]\n",
"Fetched 993 kB in 2s (518 kB/s)\n",
"Selecting previously unselected package libxxf86dga1:amd64.\r\n",
"(Reading database ... \r(Reading database ... 5%\r(Reading database ... 10%\r(Reading database ... 15%\r(Reading database ... 20%\r(Reading database ... 25%\r(Reading database ... 30%\r(Reading database ... 35%\r(Reading database ... 40%\r(Reading database ... 45%\r(Reading database ... 50%\r(Reading database ... 55%\r(Reading database ... 60%\r(Reading database ... 65%\r(Reading database ... 70%\r(Reading database ... 75%\r(Reading database ... 80%\r(Reading database ... 85%\r(Reading database ... 90%\r(Reading database ... 95%\r(Reading database ... 100%\r(Reading database ... 144611 files and directories currently installed.)\r\n",
"Preparing to unpack .../libxxf86dga1_2%3a1.1.4-1_amd64.deb ...\r\n",
"Unpacking libxxf86dga1:amd64 (2:1.1.4-1) ...\r\n",
"Selecting previously unselected package x11-utils.\r\n",
"Preparing to unpack .../x11-utils_7.7+3build1_amd64.deb ...\r\n",
"Unpacking x11-utils (7.7+3build1) ...\r\n",
"Selecting previously unselected package xvfb.\r\n",
"Preparing to unpack .../xvfb_2%3a1.19.6-1ubuntu4.7_amd64.deb ...\r\n",
"Unpacking xvfb (2:1.19.6-1ubuntu4.7) ...\r\n",
"Setting up xvfb (2:1.19.6-1ubuntu4.7) ...\r\n",
"Setting up libxxf86dga1:amd64 (2:1.1.4-1) ...\r\n",
"Setting up x11-utils (7.7+3build1) ...\r\n",
"Processing triggers for man-db (2.8.3-2ubuntu0.1) ...\r\n",
"Processing triggers for libc-bin (2.27-3ubuntu1.2) ...\r\n",
"/sbin/ldconfig.real: /usr/local/lib/python3.6/dist-packages/ideep4py/lib/libmkldnn.so.0 is not a symbolic link\r\n",
"\r\n",
"Collecting pyvirtualdisplay\n",
" Downloading https://files.pythonhosted.org/packages/d0/8a/643043cc70791367bee2d19eb20e00ed1a246ac48e5dbe57bbbcc8be40a9/PyVirtualDisplay-1.3.2-py2.py3-none-any.whl\n",
"Requirement already satisfied: PyOpenGL in /usr/local/lib/python3.6/dist-packages (3.1.5)\n",
"Collecting PyOpenGL-accelerate\n",
" Downloading https://files.pythonhosted.org/packages/a2/3c/f42a62b7784c04b20f8b88d6c8ad04f4f20b0767b721102418aad94d8389/PyOpenGL-accelerate-3.1.5.tar.gz (538kB)\n",
"Collecting EasyProcess\n",
" Downloading https://files.pythonhosted.org/packages/48/3c/75573613641c90c6d094059ac28adb748560d99bd27ee6f80cce398f404e/EasyProcess-0.3-py2.py3-none-any.whl\n",
"Building wheels for collected packages: PyOpenGL-accelerate\n",
" Building wheel for PyOpenGL-accelerate (setup.py): started\n",
" Building wheel for PyOpenGL-accelerate (setup.py): finished with status 'done'\n",
" Created wheel for PyOpenGL-accelerate: filename=PyOpenGL_accelerate-3.1.5-cp36-cp36m-linux_x86_64.whl size=1593646 sha256=4dc2b8873280c38ddd9560bb7a7781c52009ff635150efde0954fb4ebbeeb191\n",
" Stored in directory: /root/.cache/pip/wheels/bd/21/77/99670ceca25fddb3c2b60a7ae44644b8253d1006e8ec417bcc\n",
"Successfully built PyOpenGL-accelerate\n",
"Installing collected packages: EasyProcess, pyvirtualdisplay, PyOpenGL-accelerate\n",
"Successfully installed EasyProcess-0.3 PyOpenGL-accelerate-3.1.5 pyvirtualdisplay-1.3.2\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "uKXoZoJ93Wsd"
},
"source": [
"# This Cell is only used to make sure cartpole runs \n",
"\n",
"import pyvirtualdisplay\n",
"\n",
"\n",
"_display = pyvirtualdisplay.Display(visible=False, # use False with Xvfb\n",
" size=(1400, 900))\n",
"_ = _display.start()"
],
"execution_count": 2,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "_TZefME0MTvA"
},
"source": [
"# this is a Deep Q Learning (DQN) agent including replay memory and a target network \n",
"# you can write a brief 8-10 line abstract detailing your submission and experiments here\n",
"# the code is based on https://github.com/seungeunrho/minimalRL/blob/master/dqn.py, which is released under the MIT licesne\n",
"# make sure you reference any code you have studied as above, with one comment line per reference\n",
"\n",
"# imports\n",
"import gym\n",
"import collections\n",
"import random\n",
"import numpy as np\n",
"import torch\n",
"import torch.nn as nn\n",
"import torch.nn.functional as F\n",
"import torch.optim as optim\n",
"import matplotlib.pyplot as plt\n",
"\n",
"# hyperparameters\n",
"learning_rate = 0.0005\n",
"gamma = 0.98\n",
"buffer_limit = 50000\n",
"batch_size = 32\n",
"video_every = 25\n",
"print_every = 5\n",
"\n",
"class ReplayBuffer():\n",
" def __init__(self):\n",
" self.buffer = collections.deque(maxlen=buffer_limit)\n",
" \n",
" def put(self, transition):\n",
" self.buffer.append(transition)\n",
" \n",
" def sample(self, n):\n",
" mini_batch = random.sample(self.buffer, n)\n",
" s_lst, a_lst, r_lst, s_prime_lst, done_mask_lst = [], [], [], [], []\n",
" \n",
" for transition in mini_batch:\n",
" s, a, r, s_prime, done_mask = transition\n",
" s_lst.append(s)\n",
" a_lst.append([a])\n",
" r_lst.append([r])\n",
" s_prime_lst.append(s_prime)\n",
" done_mask_lst.append([done_mask])\n",
"\n",
" return torch.tensor(s_lst, dtype=torch.float), torch.tensor(a_lst), \\\n",
" torch.tensor(r_lst), torch.tensor(s_prime_lst, dtype=torch.float), \\\n",
" torch.tensor(done_mask_lst)\n",
" \n",
" def size(self):\n",
" return len(self.buffer)\n",
"\n",
"class QNetwork(nn.Module):\n",
" def __init__(self, insize, outsize):\n",
" super(QNetwork, self).__init__()\n",
" self.fc1 = nn.Linear(insize, 256)\n",
" self.fc2 = nn.Linear(256, 84)\n",
" self.fc3 = nn.Linear(84, outsize)\n",
"\n",
" def forward(self, x):\n",
" x = x.view(x.size(0),-1)\n",
" x = F.relu(self.fc1(x))\n",
" x = F.relu(self.fc2(x))\n",
" x = self.fc3(x)\n",
" return x\n",
" \n",
" def sample_action(self, obs, epsilon):\n",
" out = self.forward(obs)\n",
" coin = random.random()\n",
" if coin < epsilon:\n",
" return random.randint(0,1)\n",
" else : \n",
" return out.argmax().item()\n",
" \n",
"def train(q, q_target, memory, optimizer):\n",
" for i in range(10):\n",
" s,a,r,s_prime,done_mask = memory.sample(batch_size)\n",
"\n",
" q_out = q(s)\n",
" q_a = q_out.gather(1,a)\n",
" max_q_prime = q_target(s_prime).max(1)[0].unsqueeze(1)\n",
" target = r + gamma * max_q_prime * done_mask\n",
" loss = F.smooth_l1_loss(q_a, target)\n",
" # Q(s,a) = R(s,a) + γ*Q_targ(s_prime)*done_mask\n",
" optimizer.zero_grad()\n",
" loss.backward()\n",
" optimizer.step()\n"
],
"execution_count": 3,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "4ck-chjFdScJ"
},
"source": [
"**Train**\n",
"\n",
"← You can download the videos from the videos folder in the files on the left"
]
},
{
"cell_type": "code",
"metadata": {
"id": "q6MTebC0u_wI"
},
"source": [
"# setup the environment, and record a video every 50 episodes.\n",
"env = gym.make('CartPole-v0')\n",
"env = gym.wrappers.Monitor(env, \"./video\", video_callable=lambda episode_id: (episode_id%video_every)==0,force=True)"
],
"execution_count": 4,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "SrlpvIidvJxh",
"outputId": "8f97c21b-a432-4f8f-a363-5a113a0034f7",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 34
}
},
"source": [
"env.reset()"
],
"execution_count": 5,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"array([-0.03431689, 0.04510229, -0.02069988, -0.0475874 ])"
]
},
"metadata": {
"tags": []
},
"execution_count": 5
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "2rdPwFaivOuf",
"outputId": "8659a719-3541-4745-fb1a-7caadf56c11f",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 286
}
},
"source": [
"plt.imshow(env.render(mode='rgb_array'))"
],
"execution_count": 6,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"<matplotlib.image.AxesImage at 0x7fcbb20fbc88>"
]
},
"metadata": {
"tags": []
},
"execution_count": 6
},
{
"output_type": "display_data",
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAW4AAAD8CAYAAABXe05zAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4yLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+WH4yJAAATPUlEQVR4nO3df6zddZ3n8eerP2gBHUvlWjpt2TLaDYOzazFXxOgkDMYZZCcLk7gGdheJQ9LZBBNNzO7CbLKjyWJm4o64Zkd2OwHB1RWZUaQhuNpBNhM3EShYEagMVylLuy0tSIGCVnr73j/ut3iAlnvur55+7nk+kpPz/b6/n+857084fXHu537PPakqJEntWDDoBiRJU2NwS1JjDG5JaozBLUmNMbglqTEGtyQ1Zs6CO8kFSR5JMpbkqrl6HkkaNpmL67iTLAT+AfgAsAO4F7i0qh6e9SeTpCEzV++4zwHGqupnVfUr4Gbgojl6LkkaKovm6HFXAU/07O8A3n20waeeemqtXbt2jlqRpPZs376dp556Kkc6NlfBPakkG4ANAKeffjpbtmwZVCuSdNwZHR096rG5WirZCazp2V/d1V5WVRurarSqRkdGRuaoDUmaf+YquO8F1iU5I8kJwCXApjl6LkkaKnOyVFJVB5N8DPgOsBC4oaoemovnkqRhM2dr3FV1B3DHXD2+JA0rPzkpSY0xuCWpMQa3JDXG4JakxhjcktQYg1uSGmNwS1JjDG5JaozBLUmNMbglqTEGtyQ1xuCWpMYY3JLUGINbkhpjcEtSYwxuSWqMwS1JjTG4JakxM/rqsiTbgeeBceBgVY0mWQ58HVgLbAc+XFXPzKxNSdJhs/GO+/eqan1VjXb7VwF3VtU64M5uX5I0S+ZiqeQi4KZu+ybg4jl4DkkaWjMN7gK+m+S+JBu62oqq2tVt7wZWzPA5JEk9ZrTGDbyvqnYmeQuwOclPeg9WVSWpI53YBf0GgNNPP32GbUjS8JjRO+6q2tnd7wFuBc4BnkyyEqC733OUczdW1WhVjY6MjMykDUkaKtMO7iQnJ3nj4W3g94EHgU3A5d2wy4HbZtqkJOnXZrJUsgK4Ncnhx/mfVfW/ktwL3JLkCuBx4MMzb1OSdNi0g7uqfga84wj1p4H3z6QpSdLR+clJSWqMwS1JjTG4JakxBrckNcbglqTGGNyS1BiDW5IaY3BLUmMMbklqjMEtSY0xuCWpMQa3JDXG4JakxhjcktQYg1uSGmNwS1JjDG5JaozBLUmNMbglqTGTBneSG5LsSfJgT215ks1JHu3uT+nqSfKFJGNJHkjyzrlsXpKGUT/vuG8ELnhV7SrgzqpaB9zZ7QN8EFjX3TYA181Om5KkwyYN7qr6e+DnrypfBNzUbd8EXNxT/3JN+AGwLMnK2WpWkjT9Ne4VVbWr294NrOi2VwFP9Izb0dVeI8mGJFuSbNm7d+8025Ck4TPjX05WVQE1jfM2VtVoVY2OjIzMtA1JGhrTDe4nDy+BdPd7uvpOYE3PuNVdTZI0S6Yb3JuAy7vty4Hbeuof6a4uORd4tmdJRZI0CxZNNiDJ14DzgFOT7AD+DPhz4JYkVwCPAx/uht8BXAiMAS8CH52DniVpqE0a3FV16VEOvf8IYwu4cqZNSZKOzk9OSlJjDG5JaozBLUmNMbglqTEGtyQ1xuCWpMYY3JLUGINbkhpjcEtSYwxuSWqMwS1JjTG4JakxBrckNcbglqTGGNyS1BiDW5IaY3BLUmMMbklqzKTBneSGJHuSPNhT+1SSnUm2drcLe45dnWQsySNJ/mCuGpekYdXPO+4bgQuOUL+2qtZ3tzsAkpwFXAK8vTvni0kWzlazkqQ+gruq/h74eZ+PdxFwc1UdqKrHmPi293Nm0J8k6VVmssb9sSQPdEspp3S1VcATPWN2dLXXSLIhyZYkW/bu3TuDNiRpuEw3uK8D3gqsB3YBfznVB6iqjV
U1WlWjIyMj02xDkobPtIK7qp6sqvGqOgT8Nb9eDtkJrOkZurqrSZJmybSCO8nKnt0/Ag5fcbIJuCTJkiRnAOuAe2bWoiSp16LJBiT5GnAecGqSHcCfAeclWQ8UsB34E4CqeijJLcDDwEHgyqoan5vWJWk4TRrcVXXpEcrXv874a4BrZtKUJOno/OSkJDXG4JakxhjcktQYg1uSGmNwS1JjJr2qRJrvXtj7OOO/+gVLl53GCScvG3Q70qQMbg2dgwdeZPv/vpEafwmAF/ZsZ/xXL3L67/5rRn77dwfcnTQ5g1tDp8YP8vzObRw6+KtBtyJNi2vcktQYg1uSGmNwS1JjDG5JaozBLUmNMbg1dBaecCJvOv2fvKa+77H7OdRdIigdzwxuDZ0Fixaz9JTffE39xad3UIcODaAjaWoMbklqjMEtSY0xuCWpMZMGd5I1Se5K8nCSh5J8vKsvT7I5yaPd/SldPUm+kGQsyQNJ3jnXk5CkYdLPO+6DwCer6izgXODKJGcBVwF3VtU64M5uH+CDTHy7+zpgA3DdrHctSUNs0uCuql1VdX+3/TywDVgFXATc1A27Cbi4274I+HJN+AGwLMnKWe9ckobUlNa4k6wFzgbuBlZU1a7u0G5gRbe9Cnii57QdXe3Vj7UhyZYkW/bu3TvFtiVpePUd3EneAHwD+ERVPdd7rKoKqKk8cVVtrKrRqhodGRmZyqmSNNT6Cu4ki5kI7a9W1Te78pOHl0C6+z1dfSewpuf01V1NkjQL+rmqJMD1wLaq+lzPoU3A5d325cBtPfWPdFeXnAs827OkIkmaoX6+Aee9wGXAj5Ns7Wp/Cvw5cEuSK4DHgQ93x+4ALgTGgBeBj85qx5I05CYN7qr6PpCjHH7/EcYXcOUM+5IkHYWfnJSkxhjcktQYg1uSGmNwS1JjDG5JaozBLUmNMbglqTEGtyQ1xuCWpMYY3JLUGINbkhpjcEtSYwxuDaWT37KWBYtOeEXt0MEDvLDnsQF1JPXP4NZQesNp61iweOkraodeOsD+3Y8OqCOpfwa3JDXG4JakxhjcktQYg1uSGtPPlwWvSXJXkoeTPJTk4139U0l2Jtna3S7sOefqJGNJHknyB3M5AUkaNv18WfBB4JNVdX+SNwL3JdncHbu2qv5z7+AkZwGXAG8HfhP4uyT/uKrGZ7NxSRpWk77jrqpdVXV/t/08sA1Y9TqnXATcXFUHquoxJr7t/ZzZaFaSNMU17iRrgbOBu7vSx5I8kOSGJKd0tVXAEz2n7eD1g16SNAV9B3eSNwDfAD5RVc8B1wFvBdYDu4C/nMoTJ9mQZEuSLXv37p3KqZI01PoK7iSLmQjtr1bVNwGq6smqGq+qQ8Bf8+vlkJ3Amp7TV3e1V6iqjVU1WlWjIyMjM5mDJA2Vfq4qCXA9sK2qPtdTX9kz7I+AB7vtTcAlSZYkOQNYB9wzey1L0nDr56qS9wKXAT9OsrWr/SlwaZL1QAHbgT8BqKqHktwCPMzEFSlXekWJJM2eSYO7qr4P5AiH7nidc64BrplBX5Kko/CTk5LUGINbkhpjcEtSYwxuSWqMwS1JjTG4JakxBrckNcbglqTGGNyS1BiDW5IaY3BLUmMMbg2lLFjA0mWnvaZ+4Nm9HBp/aQAdSf0zuDWUFixczPK3vfYb9fZt/yHjB14cQEdS//r5s65SM+69914+85nP9DX27DVL+We/8xuvqP3ylwf46B//MS8cODTp+cuXL+eLX/wiS5YsmVav0nQZ3JpXnnzySb71rW/1N/h9Z3LB28/j4KHDwVuMj+/n29/+Nj9/7heTnr5y5UrGx/1T8zr2DG4NrWIBP3nu3fzfX5wJwMK8xJknfnfAXUmTM7g1tP7fL97K9hffTnW/6hmvxTz+4lkcKv9Z6PjmLyc1tMZr4cuhfdjeA2t46dDiAXUk9aefLwtemuSeJD9K8lCST3f1M5LcnWQsydeTnNDVl3T7Y93xtXM7BWl6liz4BQs4+Iraqh
PHOGHBgQF1JPWnn3fcB4Dzq+odwHrggiTnAn8BXFtVbwOeAa7oxl8BPNPVr+3GScedFUsf57d/425OXriPF57fyTNPP8qC/f8Hv9tax7t+viy4gP3d7uLuVsD5wL/s6jcBnwKuAy7qtgH+FvivSdI9jnTc2Dq2m9z63yjgnm072fX0fkJxyJeqjnN9/RYmyULgPuBtwF8BPwX2VdXhnzN3AKu67VXAEwBVdTDJs8CbgaeO9vi7d+/ms5/97LQmIPXatm1b32O3797H9t37XlGbSmTv37+fz3/+8yxe7Jq4Zt/u3buPeqyv4K6Jnx3XJ1kG3AqcOdOmkmwANgCsWrWKyy67bKYPKbF582a+9KUvHZPnOumkk7j00ks58cQTj8nzabh85StfOeqxKV33VFX7ktwFvAdYlmRR9657NbCzG7YTWAPsSLIIeBPw9BEeayOwEWB0dLROO+21fzdCmqpTTjnlmD3XggULWLFiBSeddNIxe04Nj9f7Sa6fq0pGunfaJDkR+ACwDbgL+FA37HLgtm57U7dPd/x7rm9L0uzp5x33SuCmbp17AXBLVd2e5GHg5iT/CfghcH03/nrgfyQZA34OXDIHfUvS0OrnqpIHgLOPUP8Z8Jo/r1ZVvwT+xax0J0l6DT85KUmNMbglqTH+NR3NKytWrODiiy8+Js+1fPlyFi5ceEyeS+plcGteede73sWtt9466DakOeVSiSQ1xuCWpMYY3JLUGINbkhpjcEtSYwxuSWqMwS1JjTG4JakxBrckNcbglqTGGNyS1BiDW5IaY3BLUmMMbklqTD9fFrw0yT1JfpTkoSSf7uo3Jnksydbutr6rJ8kXkowleSDJO+d6EpI0TPr5e9wHgPOran+SxcD3k3y7O/Zvq+pvXzX+g8C67vZu4LruXpI0CyZ9x10T9ne7i7tbvc4pFwFf7s77AbAsycqZtypJgj7XuJMsTLIV2ANsrqq7u0PXdMsh1yZZ0tVWAU/0nL6jq0mSZkFfwV1V41W1HlgNnJPkd4CrgTOBdwHLgX8/lSdOsiHJliRb9u7dO8W2JWl4TemqkqraB9wFXFBVu7rlkAPAl4BzumE7gTU9p63uaq9+rI1VNVpVoyMjI9PrXpKGUD9XlYwkWdZtnwh8APjJ4XXrJAEuBh7sTtkEfKS7uuRc4Nmq2jUn3UvSEOrnqpKVwE1JFjIR9LdU1e1JvpdkBAiwFfg33fg7gAuBMeBF4KOz37YkDa9Jg7uqHgDOPkL9/KOML+DKmbcmSToSPzkpSY0xuCWpMQa3JDXG4JakxhjcktQYg1uSGmNwS1JjDG5JaozBLUmNMbglqTEGtyQ1xuCWpMYY3JLUGINbkhpjcEtSYwxuSWqMwS1JjTG4JakxBrckNcbglqTGGNyS1BiDW5Iak6oadA8keR54ZNB9zJFTgacG3cQcmK/zgvk7N+fVln9UVSNHOrDoWHdyFI9U1eigm5gLSbbMx7nN13nB/J2b85o/XCqRpMYY3JLUmOMluDcOuoE5NF/nNl/nBfN3bs5rnjgufjkpSerf8fKOW5LUp4EHd5ILkjySZCzJVYPuZ6qS3JBkT5IHe2rLk2xO8mh3f0pXT5IvdHN9IMk7B9f560uyJsldSR5O8lCSj3f1pueWZGmSe5L8qJvXp7v6GUnu7vr/epITuvqSbn+sO752kP1PJsnCJD9Mcnu3P1/mtT3Jj5NsTbKlqzX9WpyJgQZ3koXAXwEfBM4CLk1y1iB7moYbgQteVbsKuLOq1gF3dvswMc913W0DcN0x6nE6DgKfrKqzgHOBK7v/Nq3P7QBwflW9A1gPXJDkXOAvgGur6m3AM8AV3fgrgGe6+rXduOPZx4FtPfvzZV4Av1dV63su/Wv9tTh9VTWwG/Ae4Ds9+1cDVw+yp2nOYy3wYM/+I8DKbnslE9epA/x34NIjjTveb8BtwAfm09yAk4D7gXcz8QGORV395dcl8B3gPd32om5cBt37UeazmokAOx+4Hc
h8mFfX43bg1FfV5s1rcaq3QS+VrAKe6Nnf0dVat6KqdnXbu4EV3XaT8+1+jD4buJt5MLduOWErsAfYDPwU2FdVB7shvb2/PK/u+LPAm49tx337PPDvgEPd/puZH/MCKOC7Se5LsqGrNf9anK7j5ZOT81ZVVZJmL91J8gbgG8Anquq5JC8fa3VuVTUOrE+yDLgVOHPALc1Ykj8E9lTVfUnOG3Q/c+B9VbUzyVuAzUl+0nuw1dfidA36HfdOYE3P/uqu1ronk6wE6O73dPWm5ptkMROh/dWq+mZXnhdzA6iqfcBdTCwhLEty+I1Mb+8vz6s7/ibg6WPcaj/eC/zzJNuBm5lYLvkvtD8vAKpqZ3e/h4n/2Z7DPHotTtWgg/teYF33m+8TgEuATQPuaTZsAi7vti9nYn34cP0j3W+9zwWe7flR77iSibfW1wPbqupzPYeanluSke6dNklOZGLdfhsTAf6hbtir53V4vh8CvlfdwunxpKqurqrVVbWWiX9H36uqf0Xj8wJIcnKSNx7eBn4feJDGX4szMuhFduBC4B+YWGf8D4PuZxr9fw3YBbzExFraFUysFd4JPAr8HbC8GxsmrqL5KfBjYHTQ/b/OvN7HxLriA8DW7nZh63MD/inww25eDwL/sav/FnAPMAb8DbCkqy/t9se647816Dn0McfzgNvny7y6Ofyouz10OCdafy3O5OYnJyWpMYNeKpEkTZHBLUmNMbglqTEGtyQ1xuCWpMYY3JLUGINbkhpjcEtSY/4/1YKMefUVpcIAAAAASUVORK5CYII=\n",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"tags": [],
"needs_background": "light"
}
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "5stHkFq4UztI",
"outputId": "c7a6a422-905b-4510-e2f5-368ee8dc9439",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
}
},
"source": [
"# reproducible environment and action spaces, do not change lines 6-11 here (tools > settings > editor > show line numbers)\n",
"seed = 742\n",
"torch.manual_seed(seed)\n",
"env.seed(seed)\n",
"random.seed(seed)\n",
"np.random.seed(seed)\n",
"env.action_space.seed(seed)\n",
"\n",
"q = QNetwork(np.array(env.observation_space.shape).prod(), env.action_space.n)\n",
"q_target = QNetwork(np.array(env.observation_space.shape).prod(), env.action_space.n)\n",
"q_target.load_state_dict(q.state_dict())\n",
"memory = ReplayBuffer()\n",
"\n",
"score = 0.0\n",
"marking = []\n",
"optimizer = optim.Adam(q.parameters(), lr=learning_rate)\n",
"\n",
"for n_episode in range(1001):\n",
" epsilon = max(0.01, 0.08 - 0.01*(n_episode/200)) # linear annealing from 8% to 1%\n",
" s = env.reset()\n",
" done = False\n",
" score = 0.0\n",
"\n",
" while True:\n",
"\n",
" a = q.sample_action(torch.from_numpy(s).float().unsqueeze(0), epsilon)\n",
" s_prime, r, done, info = env.step(a)\n",
" done_mask = 0.0 if done else 1.0\n",
" memory.put((s,a,r/100.0,s_prime, done_mask))\n",
" s = s_prime\n",
"\n",
" score += r\n",
" if done:\n",
" break\n",
" \n",
" if memory.size()>2000:\n",
" train(q, q_target, memory, optimizer)\n",
"\n",
" # do not change lines 44-48 here, they are for marking the submission log\n",
" marking.append(score)\n",
" if n_episode%100 == 0:\n",
" print(\"marking, episode: {}, score: {:.1f}, mean_score: {:.2f}, std_score: {:.2f}\".format(\n",
" n_episode, score, np.array(marking).mean(), np.array(marking).std()))\n",
" marking = []\n",
"\n",
" # you can change this part, and print any data you like (so long as it doesn't start with \"marking\")\n",
" if n_episode%print_every==0 and n_episode!=0:\n",
" q_target.load_state_dict(q.state_dict())\n",
" print(\"episode: {}, score: {:.1f}, epsilon: {:.2f}\".format(n_episode, score, epsilon))"
],
"execution_count": 7,
"outputs": [
{
"output_type": "stream",
"text": [
"marking, episode: 0, score: 9.0, mean_score: 9.00, std_score: 0.00\n",
"episode: 5, score: 10.0, epsilon: 0.08\n",
"episode: 10, score: 9.0, epsilon: 0.08\n",
"episode: 15, score: 8.0, epsilon: 0.08\n",
"episode: 20, score: 9.0, epsilon: 0.08\n",
"episode: 25, score: 10.0, epsilon: 0.08\n",
"episode: 30, score: 10.0, epsilon: 0.08\n",
"episode: 35, score: 9.0, epsilon: 0.08\n",
"episode: 40, score: 8.0, epsilon: 0.08\n",
"episode: 45, score: 10.0, epsilon: 0.08\n",
"episode: 50, score: 8.0, epsilon: 0.08\n",
"episode: 55, score: 11.0, epsilon: 0.08\n",
"episode: 60, score: 10.0, epsilon: 0.08\n",
"episode: 65, score: 11.0, epsilon: 0.08\n",
"episode: 70, score: 8.0, epsilon: 0.08\n",
"episode: 75, score: 10.0, epsilon: 0.08\n",
"episode: 80, score: 9.0, epsilon: 0.08\n",
"episode: 85, score: 10.0, epsilon: 0.08\n",
"episode: 90, score: 9.0, epsilon: 0.08\n",
"episode: 95, score: 10.0, epsilon: 0.08\n",
"marking, episode: 100, score: 8.0, mean_score: 9.61, std_score: 1.05\n",
"episode: 100, score: 8.0, epsilon: 0.07\n",
"episode: 105, score: 10.0, epsilon: 0.07\n",
"episode: 110, score: 9.0, epsilon: 0.07\n",
"episode: 115, score: 12.0, epsilon: 0.07\n",
"episode: 120, score: 8.0, epsilon: 0.07\n",
"episode: 125, score: 11.0, epsilon: 0.07\n",
"episode: 130, score: 12.0, epsilon: 0.07\n",
"episode: 135, score: 10.0, epsilon: 0.07\n",
"episode: 140, score: 9.0, epsilon: 0.07\n",
"episode: 145, score: 10.0, epsilon: 0.07\n",
"episode: 150, score: 10.0, epsilon: 0.07\n",
"episode: 155, score: 8.0, epsilon: 0.07\n",
"episode: 160, score: 10.0, epsilon: 0.07\n",
"episode: 165, score: 10.0, epsilon: 0.07\n",
"episode: 170, score: 8.0, epsilon: 0.07\n",
"episode: 175, score: 12.0, epsilon: 0.07\n",
"episode: 180, score: 9.0, epsilon: 0.07\n",
"episode: 185, score: 9.0, epsilon: 0.07\n",
"episode: 190, score: 10.0, epsilon: 0.07\n",
"episode: 195, score: 10.0, epsilon: 0.07\n",
"marking, episode: 200, score: 10.0, mean_score: 9.56, std_score: 0.99\n",
"episode: 200, score: 10.0, epsilon: 0.07\n",
"episode: 205, score: 8.0, epsilon: 0.07\n",
"episode: 210, score: 11.0, epsilon: 0.07\n",
"episode: 215, score: 8.0, epsilon: 0.07\n",
"episode: 220, score: 9.0, epsilon: 0.07\n",
"episode: 225, score: 10.0, epsilon: 0.07\n",
"episode: 230, score: 18.0, epsilon: 0.07\n",
"episode: 235, score: 11.0, epsilon: 0.07\n",
"episode: 240, score: 11.0, epsilon: 0.07\n",
"episode: 245, score: 12.0, epsilon: 0.07\n",
"episode: 250, score: 10.0, epsilon: 0.07\n",
"episode: 255, score: 40.0, epsilon: 0.07\n",
"episode: 260, score: 31.0, epsilon: 0.07\n",
"episode: 265, score: 28.0, epsilon: 0.07\n",
"episode: 270, score: 36.0, epsilon: 0.07\n",
"episode: 275, score: 50.0, epsilon: 0.07\n",
"episode: 280, score: 136.0, epsilon: 0.07\n",
"episode: 285, score: 84.0, epsilon: 0.07\n",
"episode: 290, score: 200.0, epsilon: 0.07\n",
"episode: 295, score: 179.0, epsilon: 0.07\n",
"marking, episode: 300, score: 177.0, mean_score: 50.69, std_score: 62.06\n",
"episode: 300, score: 177.0, epsilon: 0.07\n",
"episode: 305, score: 113.0, epsilon: 0.06\n",
"episode: 310, score: 123.0, epsilon: 0.06\n",
"episode: 315, score: 164.0, epsilon: 0.06\n",
"episode: 320, score: 104.0, epsilon: 0.06\n",
"episode: 325, score: 186.0, epsilon: 0.06\n",
"episode: 330, score: 115.0, epsilon: 0.06\n",
"episode: 335, score: 134.0, epsilon: 0.06\n",
"episode: 340, score: 156.0, epsilon: 0.06\n",
"episode: 345, score: 200.0, epsilon: 0.06\n",
"episode: 350, score: 123.0, epsilon: 0.06\n",
"episode: 355, score: 200.0, epsilon: 0.06\n",
"episode: 360, score: 200.0, epsilon: 0.06\n",
"episode: 365, score: 200.0, epsilon: 0.06\n",
"episode: 370, score: 200.0, epsilon: 0.06\n",
"episode: 375, score: 200.0, epsilon: 0.06\n",
"episode: 380, score: 200.0, epsilon: 0.06\n",
"episode: 385, score: 170.0, epsilon: 0.06\n",
"episode: 390, score: 168.0, epsilon: 0.06\n",
"episode: 395, score: 173.0, epsilon: 0.06\n",
"marking, episode: 400, score: 179.0, mean_score: 167.21, std_score: 39.24\n",
"episode: 400, score: 179.0, epsilon: 0.06\n",
"episode: 405, score: 200.0, epsilon: 0.06\n",
"episode: 410, score: 200.0, epsilon: 0.06\n",
"episode: 415, score: 158.0, epsilon: 0.06\n",
"episode: 420, score: 200.0, epsilon: 0.06\n",
"episode: 425, score: 186.0, epsilon: 0.06\n",
"episode: 430, score: 200.0, epsilon: 0.06\n",
"episode: 435, score: 151.0, epsilon: 0.06\n",
"episode: 440, score: 165.0, epsilon: 0.06\n",
"episode: 445, score: 166.0, epsilon: 0.06\n",
"episode: 450, score: 175.0, epsilon: 0.06\n",
"episode: 455, score: 190.0, epsilon: 0.06\n",
"episode: 460, score: 191.0, epsilon: 0.06\n",
"episode: 465, score: 119.0, epsilon: 0.06\n",
"episode: 470, score: 200.0, epsilon: 0.06\n",
"episode: 475, score: 200.0, epsilon: 0.06\n",
"episode: 480, score: 200.0, epsilon: 0.06\n",
"episode: 485, score: 159.0, epsilon: 0.06\n",
"episode: 490, score: 133.0, epsilon: 0.06\n",
"episode: 495, score: 93.0, epsilon: 0.06\n",
"marking, episode: 500, score: 163.0, mean_score: 176.26, std_score: 25.89\n",
"episode: 500, score: 163.0, epsilon: 0.06\n",
"episode: 505, score: 154.0, epsilon: 0.05\n",
"episode: 510, score: 200.0, epsilon: 0.05\n",
"episode: 515, score: 200.0, epsilon: 0.05\n",
"episode: 520, score: 171.0, epsilon: 0.05\n",
"episode: 525, score: 182.0, epsilon: 0.05\n",
"episode: 530, score: 200.0, epsilon: 0.05\n",
"episode: 535, score: 200.0, epsilon: 0.05\n",
"episode: 540, score: 164.0, epsilon: 0.05\n",
"episode: 545, score: 160.0, epsilon: 0.05\n",
"episode: 550, score: 198.0, epsilon: 0.05\n",
"episode: 555, score: 176.0, epsilon: 0.05\n",
"episode: 560, score: 200.0, epsilon: 0.05\n",
"episode: 565, score: 178.0, epsilon: 0.05\n",
"episode: 570, score: 200.0, epsilon: 0.05\n",
"episode: 575, score: 197.0, epsilon: 0.05\n",
"episode: 580, score: 200.0, epsilon: 0.05\n",
"episode: 585, score: 164.0, epsilon: 0.05\n",
"episode: 590, score: 166.0, epsilon: 0.05\n",
"episode: 595, score: 126.0, epsilon: 0.05\n",
"marking, episode: 600, score: 151.0, mean_score: 174.15, std_score: 22.91\n",
"episode: 600, score: 151.0, epsilon: 0.05\n",
"episode: 605, score: 200.0, epsilon: 0.05\n",
"episode: 610, score: 150.0, epsilon: 0.05\n",
"episode: 615, score: 174.0, epsilon: 0.05\n",
"episode: 620, score: 147.0, epsilon: 0.05\n",
"episode: 625, score: 191.0, epsilon: 0.05\n",
"episode: 630, score: 161.0, epsilon: 0.05\n",
"episode: 635, score: 160.0, epsilon: 0.05\n",
"episode: 640, score: 169.0, epsilon: 0.05\n",
"episode: 645, score: 162.0, epsilon: 0.05\n",
"episode: 650, score: 170.0, epsilon: 0.05\n",
"episode: 655, score: 189.0, epsilon: 0.05\n",
"episode: 660, score: 151.0, epsilon: 0.05\n",
"episode: 665, score: 154.0, epsilon: 0.05\n",
"episode: 670, score: 166.0, epsilon: 0.05\n",
"episode: 675, score: 149.0, epsilon: 0.05\n",
"episode: 680, score: 166.0, epsilon: 0.05\n",
"episode: 685, score: 183.0, epsilon: 0.05\n",
"episode: 690, score: 193.0, epsilon: 0.05\n",
"episode: 695, score: 200.0, epsilon: 0.05\n",
"marking, episode: 700, score: 160.0, mean_score: 169.23, std_score: 22.15\n",
"episode: 700, score: 160.0, epsilon: 0.04\n",
"episode: 705, score: 200.0, epsilon: 0.04\n",
"episode: 710, score: 197.0, epsilon: 0.04\n",
"episode: 715, score: 153.0, epsilon: 0.04\n",
"episode: 720, score: 200.0, epsilon: 0.04\n",
"episode: 725, score: 149.0, epsilon: 0.04\n",
"episode: 730, score: 196.0, epsilon: 0.04\n",
"episode: 735, score: 197.0, epsilon: 0.04\n",
"episode: 740, score: 200.0, epsilon: 0.04\n",
"episode: 745, score: 155.0, epsilon: 0.04\n",
"episode: 750, score: 200.0, epsilon: 0.04\n",
"episode: 755, score: 200.0, epsilon: 0.04\n",
"episode: 760, score: 190.0, epsilon: 0.04\n",
"episode: 765, score: 173.0, epsilon: 0.04\n",
"episode: 770, score: 200.0, epsilon: 0.04\n",
"episode: 775, score: 200.0, epsilon: 0.04\n",
"episode: 780, score: 200.0, epsilon: 0.04\n",
"episode: 785, score: 200.0, epsilon: 0.04\n",
"episode: 790, score: 200.0, epsilon: 0.04\n",
"episode: 795, score: 200.0, epsilon: 0.04\n",
"marking, episode: 800, score: 168.0, mean_score: 181.54, std_score: 25.20\n",
"episode: 800, score: 168.0, epsilon: 0.04\n",
"episode: 805, score: 200.0, epsilon: 0.04\n",
"episode: 810, score: 200.0, epsilon: 0.04\n",
"episode: 815, score: 200.0, epsilon: 0.04\n",
"episode: 820, score: 200.0, epsilon: 0.04\n",
"episode: 825, score: 147.0, epsilon: 0.04\n",
"episode: 830, score: 198.0, epsilon: 0.04\n",
"episode: 835, score: 200.0, epsilon: 0.04\n",
"episode: 840, score: 200.0, epsilon: 0.04\n",
"episode: 845, score: 200.0, epsilon: 0.04\n",
"episode: 850, score: 200.0, epsilon: 0.04\n",
"episode: 855, score: 200.0, epsilon: 0.04\n",
"episode: 860, score: 200.0, epsilon: 0.04\n",
"episode: 865, score: 200.0, epsilon: 0.04\n",
"episode: 870, score: 200.0, epsilon: 0.04\n",
"episode: 875, score: 154.0, epsilon: 0.04\n",
"episode: 880, score: 189.0, epsilon: 0.04\n",
"episode: 885, score: 200.0, epsilon: 0.04\n",
"episode: 890, score: 192.0, epsilon: 0.04\n",
"episode: 895, score: 200.0, epsilon: 0.04\n",
"marking, episode: 900, score: 200.0, mean_score: 188.28, std_score: 20.29\n",
"episode: 900, score: 200.0, epsilon: 0.04\n",
"episode: 905, score: 192.0, epsilon: 0.03\n",
"episode: 910, score: 190.0, epsilon: 0.03\n",
"episode: 915, score: 200.0, epsilon: 0.03\n",
"episode: 920, score: 200.0, epsilon: 0.03\n",
"episode: 925, score: 158.0, epsilon: 0.03\n",
"episode: 930, score: 179.0, epsilon: 0.03\n",
"episode: 935, score: 121.0, epsilon: 0.03\n",
"episode: 940, score: 116.0, epsilon: 0.03\n",
"episode: 945, score: 180.0, epsilon: 0.03\n",
"episode: 950, score: 146.0, epsilon: 0.03\n",
"episode: 955, score: 188.0, epsilon: 0.03\n",
"episode: 960, score: 155.0, epsilon: 0.03\n",
"episode: 965, score: 139.0, epsilon: 0.03\n",
"episode: 970, score: 172.0, epsilon: 0.03\n",
"episode: 975, score: 200.0, epsilon: 0.03\n",
"episode: 980, score: 185.0, epsilon: 0.03\n",
"episode: 985, score: 177.0, epsilon: 0.03\n",
"episode: 990, score: 140.0, epsilon: 0.03\n",
"episode: 995, score: 19.0, epsilon: 0.03\n",
"marking, episode: 1000, score: 140.0, mean_score: 175.90, std_score: 28.52\n",
"episode: 1000, score: 140.0, epsilon: 0.03\n"
],
"name": "stdout"
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "iMGV0rM4To6A"
},
"source": [
"# Custom environment\n",
"# \"Play your cards right\"\n",
"drawn_count = 7\n",
"highest_card = 9\n",
"# Construct our action space\n",
"obv_space = gym.spaces.MultiBinary([drawn_count, highest_card+1])\n",
"act_space = gym.spaces.Discrete(2)\n",
"\n",
"class Brucey(gym.Env):\n",
" reward_range = (0,1)\n",
" action_space = act_space\n",
" observation_space = obv_space\n",
" _max_episode_steps = drawn_count-1\n",
" deck = [x for x in range(1,highest_card+1)]\n",
" cards = \"-123456789\"\n",
" guess='LH'\n",
"\n",
" def __init__(self):\n",
" super().__init__()\n",
" \n",
" def reset(self):\n",
" self.steps=1\n",
" self.hidden_cards = random.sample(self.deck, drawn_count)\n",
" self.guesses=[]\n",
" obs = [0] * drawn_count\n",
" obs[:self.steps] = self.hidden_cards[:self.steps]\n",
" obs = np.array(obs)\n",
" # Fancy onehot encoding\n",
" onehot = np.zeros((obs.size, highest_card+1))\n",
" onehot[np.arange(obs.size),obs] = 1\n",
" return onehot\n",
"\n",
" def step(self, action):\n",
" self.guesses.append(action)\n",
" info = dict()\n",
" cardhigher = self.hidden_cards[self.steps] > self.hidden_cards[self.steps-1] \n",
" self.steps += 1 \n",
" obs = [0] * drawn_count\n",
" obs[:self.steps] = self.hidden_cards[:self.steps]\n",
" obs = np.array(obs)\n",
" # Fancy onehot encoding\n",
" onehot = np.zeros((obs.size, highest_card+1))\n",
" onehot[np.arange(obs.size),obs] = 1\n",
" if cardhigher == action:\n",
" reward = 1\n",
" if self.steps != len(self.hidden_cards):\n",
" done = False\n",
" else:\n",
" done = True\n",
" else:\n",
" reward = 0\n",
" done = True\n",
" \n",
"\n",
" return onehot, reward, done, info\n",
"\n",
" def render(self):\n",
" obs = [0] * drawn_count\n",
" obs[:self.steps] = self.hidden_cards[:self.steps]\n",
" rend = \"{}\\n{}\\n{}\".format(\n",
" ''.join(self.cards[i] for i in obs),\n",
" ''.join(self.guess[i] for i in self.guesses),\n",
" ''.join(self.cards[i] for i in self.hidden_cards),\n",
" )\n",
" print(rend)\n",
" return rend\n",
" \n",
"\n"
],
"execution_count": 14,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "1wG7Ya0zXCFO",
"outputId": "3e8de38b-804a-4f29-f905-f377df01d97e",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 86
}
},
"source": [
"env = Brucey()\n",
"env.reset()\n",
"env.render()\n"
],
"execution_count": 16,
"outputs": [
{
"output_type": "stream",
"text": [
"6------\n",
"\n",
"6327198\n"
],
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
},
"text/plain": [
"'6------\\n\\n6327198'"
]
},
"metadata": {
"tags": []
},
"execution_count": 16
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "knroQcFT0Ldl",
"outputId": "3ff036bf-f621-44d5-8599-297bda1ae0d5",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 136
}
},
"source": [
"env.step(0)"
],
"execution_count": 17,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(array([[0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],\n",
" [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],\n",
" [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
" [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
" [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
" [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
" [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]), 1, False, {})"
]
},
"metadata": {
"tags": []
},
"execution_count": 17
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "xD1Hnh-S0PwF",
"outputId": "d7d96d74-83da-4b9d-eab9-2fb17d00310b",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 136
}
},
"source": [
"env.step(1)"
],
"execution_count": 18,
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
"(array([[0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],\n",
" [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],\n",
" [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],\n",
" [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
" [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
" [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],\n",
" [1., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]), 0, True, {})"
]
},
"metadata": {
"tags": []
},
"execution_count": 18
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "JSrSc_qa0ReY",
"outputId": "75b685b1-78df-4144-9a5a-88ad2cd8fe23",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 86
}
},
"source": [
"env.render()"
],
"execution_count": 19,
"outputs": [
{
"output_type": "stream",
"text": [
"632----\n",
"LH\n",
"6327198\n"
],
"name": "stdout"
},
{
"output_type": "execute_result",
"data": {
"application/vnd.google.colaboratory.intrinsic+json": {
"type": "string"
},
"text/plain": [
"'632----\\nLH\\n6327198'"
]
},
"metadata": {
"tags": []
},
"execution_count": 19
}
]
},
{
"cell_type": "code",
"metadata": {
"id": "zvjqta75drdA",
"outputId": "57808a42-ea6f-4e8f-a35e-6557cb03ee75",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 1000
}
},
"source": [
"seed = 742\n",
"torch.manual_seed(seed)\n",
"env.seed(seed)\n",
"random.seed(seed)\n",
"np.random.seed(seed)\n",
"env.action_space.seed(seed)\n",
"\n",
"q = QNetwork(drawn_count * (highest_card+1), 2)\n",
"q_target = QNetwork(drawn_count * (highest_card+1), 2)\n",
"q_target.load_state_dict(q.state_dict())\n",
"memory = ReplayBuffer()\n",
"\n",
"score = 0.0\n",
"scores = []\n",
"marking = []\n",
"optimizer = optim.Adam(q.parameters(), lr=learning_rate)\n",
"\n",
"for n_episode in range(10001):\n",
" epsilon = 0.08\n",
" s = env.reset()\n",
" done = False\n",
" score = 0.0\n",
"\n",
" while True:\n",
"\n",
" a = q.sample_action(torch.from_numpy(s).float().unsqueeze(0), epsilon)\n",
" s_prime, r, done, info = env.step(a)\n",
" done_mask = 0.0 if done else 1.0\n",
" memory.put((s,a,r/100.0,s_prime, done_mask))\n",
" s = s_prime\n",
"\n",
" score += r\n",
" if done:\n",
" break\n",
" \n",
" if memory.size()>2000:\n",
" train(q, q_target, memory, optimizer)\n",
" scores.append(score)\n",
" # do not change lines 44-48 here, they are for marking the submission log\n",
" marking.append(score)\n",
" if n_episode%100 == 0:\n",
" print(\"marking, episode: {}, score: {:.1f}, mean_score: {:.2f}, std_score: {:.2f}\".format(\n",
" n_episode, score, np.array(marking).mean(), np.array(marking).std()))\n",
" marking = []\n",
"\n",
" # you can change this part, and print any data you like (so long as it doesn't start with \"marking\")\n",
" if n_episode%print_every==0 and n_episode!=0:\n",
" q_target.load_state_dict(q.state_dict())\n",
" print(\"episode: {}, score: {:.1f}, epsilon: {:.2f}\".format(n_episode, score, epsilon))"
],
"execution_count": 20,
"outputs": [
{
"output_type": "stream",
"text": [
"marking, episode: 0, score: 2.0, mean_score: 2.00, std_score: 0.00\n",
"episode: 5, score: 0.0, epsilon: 0.08\n",
"episode: 10, score: 0.0, epsilon: 0.08\n",
"episode: 15, score: 0.0, epsilon: 0.08\n",
"episode: 20, score: 0.0, epsilon: 0.08\n",
"episode: 25, score: 1.0, epsilon: 0.08\n",
"episode: 30, score: 0.0, epsilon: 0.08\n",
"episode: 35, score: 0.0, epsilon: 0.08\n",
"episode: 40, score: 1.0, epsilon: 0.08\n",
"episode: 45, score: 3.0, epsilon: 0.08\n",
"episode: 50, score: 1.0, epsilon: 0.08\n",
"episode: 55, score: 1.0, epsilon: 0.08\n",
"episode: 60, score: 0.0, epsilon: 0.08\n",
"episode: 65, score: 0.0, epsilon: 0.08\n",
"episode: 70, score: 0.0, epsilon: 0.08\n",
"episode: 75, score: 1.0, epsilon: 0.08\n",
"episode: 80, score: 2.0, epsilon: 0.08\n",
"episode: 85, score: 0.0, epsilon: 0.08\n",
"episode: 90, score: 0.0, epsilon: 0.08\n",
"episode: 95, score: 1.0, epsilon: 0.08\n",
"marking, episode: 100, score: 2.0, mean_score: 0.81, std_score: 1.09\n",
"episode: 100, score: 2.0, epsilon: 0.08\n",
"episode: 105, score: 1.0, epsilon: 0.08\n",
"episode: 110, score: 0.0, epsilon: 0.08\n",
"episode: 115, score: 1.0, epsilon: 0.08\n",
"episode: 120, score: 2.0, epsilon: 0.08\n",
"episode: 125, score: 0.0, epsilon: 0.08\n",
"episode: 130, score: 2.0, epsilon: 0.08\n",
"episode: 135, score: 0.0, epsilon: 0.08\n",
"episode: 140, score: 0.0, epsilon: 0.08\n",
"episode: 145, score: 0.0, epsilon: 0.08\n",
"episode: 150, score: 0.0, epsilon: 0.08\n",
"episode: 155, score: 0.0, epsilon: 0.08\n",
"episode: 160, score: 0.0, epsilon: 0.08\n",
"episode: 165, score: 2.0, epsilon: 0.08\n",
"episode: 170, score: 1.0, epsilon: 0.08\n",
"episode: 175, score: 1.0, epsilon: 0.08\n",
"episode: 180, score: 0.0, epsilon: 0.08\n",
"episode: 185, score: 2.0, epsilon: 0.08\n",
"episode: 190, score: 4.0, epsilon: 0.08\n",
"episode: 195, score: 1.0, epsilon: 0.08\n",
"marking, episode: 200, score: 0.0, mean_score: 0.79, std_score: 1.00\n",
"episode: 200, score: 0.0, epsilon: 0.08\n",
"episode: 205, score: 0.0, epsilon: 0.08\n",
"episode: 210, score: 0.0, epsilon: 0.08\n",
"episode: 215, score: 0.0, epsilon: 0.08\n",
"episode: 220, score: 4.0, epsilon: 0.08\n",
"episode: 225, score: 1.0, epsilon: 0.08\n",
"episode: 230, score: 0.0, epsilon: 0.08\n",
"episode: 235, score: 2.0, epsilon: 0.08\n",
"episode: 240, score: 0.0, epsilon: 0.08\n",
"episode: 245, score: 1.0, epsilon: 0.08\n",
"episode: 250, score: 0.0, epsilon: 0.08\n",
"episode: 255, score: 0.0, epsilon: 0.08\n",
"episode: 260, score: 2.0, epsilon: 0.08\n",
"episode: 265, score: 2.0, epsilon: 0.08\n",
"episode: 270, score: 0.0, epsilon: 0.08\n",
"episode: 275, score: 0.0, epsilon: 0.08\n",
"episode: 280, score: 1.0, epsilon: 0.08\n",
"episode: 285, score: 0.0, epsilon: 0.08\n",
"episode: 290, score: 0.0, epsilon: 0.08\n",
"episode: 295, score: 0.0, epsilon: 0.08\n",
"marking, episode: 300, score: 1.0, mean_score: 0.71, std_score: 1.03\n",
"episode: 300, score: 1.0, epsilon: 0.08\n",
"episode: 305, score: 3.0, epsilon: 0.08\n",
"episode: 310, score: 0.0, epsilon: 0.08\n",
"episode: 315, score: 0.0, epsilon: 0.08\n",
"episode: 320, score: 1.0, epsilon: 0.08\n",
"episode: 325, score: 2.0, epsilon: 0.08\n",
"episode: 330, score: 0.0, epsilon: 0.08\n",
"episode: 335, score: 0.0, epsilon: 0.08\n",
"episode: 340, score: 0.0, epsilon: 0.08\n",
"episode: 345, score: 1.0, epsilon: 0.08\n",
"episode: 350, score: 4.0, epsilon: 0.08\n",
"episode: 355, score: 2.0, epsilon: 0.08\n",
"episode: 360, score: 2.0, epsilon: 0.08\n",
"episode: 365, score: 1.0, epsilon: 0.08\n",
"episode: 370, score: 4.0, epsilon: 0.08\n",
"episode: 375, score: 0.0, epsilon: 0.08\n",
"episode: 380, score: 1.0, epsilon: 0.08\n",
"episode: 385, score: 0.0, epsilon: 0.08\n",
"episode: 390, score: 0.0, epsilon: 0.08\n",
"episode: 395, score: 0.0, epsilon: 0.08\n",
"marking, episode: 400, score: 0.0, mean_score: 0.88, std_score: 1.12\n",
"episode: 400, score: 0.0, epsilon: 0.08\n",
"episode: 405, score: 1.0, epsilon: 0.08\n",
"episode: 410, score: 1.0, epsilon: 0.08\n",
"episode: 415, score: 0.0, epsilon: 0.08\n",
"episode: 420, score: 0.0, epsilon: 0.08\n",
"episode: 425, score: 0.0, epsilon: 0.08\n",
"episode: 430, score: 0.0, epsilon: 0.08\n",
"episode: 435, score: 0.0, epsilon: 0.08\n",
"episode: 440, score: 1.0, epsilon: 0.08\n",
"episode: 445, score: 3.0, epsilon: 0.08\n",
"episode: 450, score: 2.0, epsilon: 0.08\n",
"episode: 455, score: 3.0, epsilon: 0.08\n",
"episode: 460, score: 0.0, epsilon: 0.08\n",
"episode: 465, score: 1.0, epsilon: 0.08\n",
"episode: 470, score: 0.0, epsilon: 0.08\n",
"episode: 475, score: 0.0, epsilon: 0.08\n",
"episode: 480, score: 0.0, epsilon: 0.08\n",
"episode: 485, score: 1.0, epsilon: 0.08\n",
"episode: 490, score: 1.0, epsilon: 0.08\n",
"episode: 495, score: 0.0, epsilon: 0.08\n",
"marking, episode: 500, score: 1.0, mean_score: 0.69, std_score: 0.98\n",
"episode: 500, score: 1.0, epsilon: 0.08\n",
"episode: 505, score: 0.0, epsilon: 0.08\n",
"episode: 510, score: 1.0, epsilon: 0.08\n",
"episode: 515, score: 2.0, epsilon: 0.08\n",
"episode: 520, score: 0.0, epsilon: 0.08\n",
"episode: 525, score: 2.0, epsilon: 0.08\n",
"episode: 530, score: 2.0, epsilon: 0.08\n",
"episode: 535, score: 0.0, epsilon: 0.08\n",
"episode: 540, score: 2.0, epsilon: 0.08\n",
"episode: 545, score: 1.0, epsilon: 0.08\n",
"episode: 550, score: 0.0, epsilon: 0.08\n",
"episode: 555, score: 0.0, epsilon: 0.08\n",
"episode: 560, score: 0.0, epsilon: 0.08\n",
"episode: 565, score: 0.0, epsilon: 0.08\n",
"episode: 570, score: 1.0, epsilon: 0.08\n",
"episode: 575, score: 1.0, epsilon: 0.08\n",
"episode: 580, score: 1.0, epsilon: 0.08\n",
"episode: 585, score: 3.0, epsilon: 0.08\n",
"episode: 590, score: 0.0, epsilon: 0.08\n",
"episode: 595, score: 0.0, epsilon: 0.08\n",
"marking, episode: 600, score: 1.0, mean_score: 0.73, std_score: 0.82\n",
"episode: 600, score: 1.0, epsilon: 0.08\n",
"episode: 605, score: 0.0, epsilon: 0.08\n",
"episode: 610, score: 0.0, epsilon: 0.08\n",
"episode: 615, score: 1.0, epsilon: 0.08\n",
"episode: 620, score: 1.0, epsilon: 0.08\n",
"episode: 625, score: 0.0, epsilon: 0.08\n",
"episode: 630, score: 1.0, epsilon: 0.08\n",
"episode: 635, score: 1.0, epsilon: 0.08\n",
"episode: 640, score: 0.0, epsilon: 0.08\n",
"episode: 645, score: 0.0, epsilon: 0.08\n",
"episode: 650, score: 0.0, epsilon: 0.08\n",
"episode: 655, score: 1.0, epsilon: 0.08\n",
"episode: 660, score: 1.0, epsilon: 0.08\n",
"episode: 665, score: 0.0, epsilon: 0.08\n",
"episode: 670, score: 1.0, epsilon: 0.08\n",
"episode: 675, score: 0.0, epsilon: 0.08\n",
"episode: 680, score: 1.0, epsilon: 0.08\n",
"episode: 685, score: 2.0, epsilon: 0.08\n",
"episode: 690, score: 2.0, epsilon: 0.08\n",
"episode: 695, score: 1.0, epsilon: 0.08\n",
"marking, episode: 700, score: 2.0, mean_score: 0.77, std_score: 0.96\n",
"episode: 700, score: 2.0, epsilon: 0.08\n",
"episode: 705, score: 0.0, epsilon: 0.08\n",
"episode: 710, score: 3.0, epsilon: 0.08\n",
"episode: 715, score: 0.0, epsilon: 0.08\n",
"episode: 720, score: 0.0, epsilon: 0.08\n",
"episode: 725, score: 0.0, epsilon: 0.08\n",
"episode: 730, score: 0.0, epsilon: 0.08\n",
"episode: 735, score: 2.0, epsilon: 0.08\n",
"episode: 740, score: 1.0, epsilon: 0.08\n",
"episode: 745, score: 0.0, epsilon: 0.08\n",
"episode: 750, score: 1.0, epsilon: 0.08\n",
"episode: 755, score: 0.0, epsilon: 0.08\n",
"episode: 760, score: 0.0, epsilon: 0.08\n",
"episode: 765, score: 1.0, epsilon: 0.08\n",
"episode: 770, score: 1.0, epsilon: 0.08\n",
"episode: 775, score: 0.0, epsilon: 0.08\n",
"episode: 780, score: 1.0, epsilon: 0.08\n",
"episode: 785, score: 3.0, epsilon: 0.08\n",
"episode: 790, score: 0.0, epsilon: 0.08\n",
"episode: 795, score: 0.0, epsilon: 0.08\n",
"marking, episode: 800, score: 3.0, mean_score: 0.76, std_score: 0.97\n",
"episode: 800, score: 3.0, epsilon: 0.08\n",
"episode: 805, score: 1.0, epsilon: 0.08\n",
"episode: 810, score: 1.0, epsilon: 0.08\n",
"episode: 815, score: 1.0, epsilon: 0.08\n",
"episode: 820, score: 0.0, epsilon: 0.08\n",
"episode: 825, score: 3.0, epsilon: 0.08\n",
"episode: 830, score: 1.0, epsilon: 0.08\n",
"episode: 835, score: 0.0, epsilon: 0.08\n",
"episode: 840, score: 0.0, epsilon: 0.08\n",
"episode: 845, score: 2.0, epsilon: 0.08\n",
"episode: 850, score: 0.0, epsilon: 0.08\n",
"episode: 855, score: 0.0, epsilon: 0.08\n",
"episode: 860, score: 1.0, epsilon: 0.08\n",
"episode: 865, score: 0.0, epsilon: 0.08\n",
"episode: 870, score: 0.0, epsilon: 0.08\n",
"episode: 875, score: 2.0, epsilon: 0.08\n",
"episode: 880, score: 0.0, epsilon: 0.08\n",
"episode: 885, score: 1.0, epsilon: 0.08\n",
"episode: 890, score: 2.0, epsilon: 0.08\n",
"episode: 895, score: 0.0, epsilon: 0.08\n",
"marking, episode: 900, score: 0.0, mean_score: 0.71, std_score: 0.84\n",
"episode: 900, score: 0.0, epsilon: 0.08\n",
"episode: 905, score: 1.0, epsilon: 0.08\n",
"episode: 910, score: 1.0, epsilon: 0.08\n",
"episode: 915, score: 1.0, epsilon: 0.08\n",
"episode: 920, score: 2.0, epsilon: 0.08\n",
"episode: 925, score: 1.0, epsilon: 0.08\n",
"episode: 930, score: 1.0, epsilon: 0.08\n",
"episode: 935, score: 0.0, epsilon: 0.08\n",
"episode: 940, score: 1.0, epsilon: 0.08\n",
"episode: 945, score: 0.0, epsilon: 0.08\n",
"episode: 950, score: 1.0, epsilon: 0.08\n",
"episode: 955, score: 1.0, epsilon: 0.08\n",
"episode: 960, score: 0.0, epsilon: 0.08\n",
"episode: 965, score: 2.0, epsilon: 0.08\n",
"episode: 970, score: 1.0, epsilon: 0.08\n",
"episode: 975, score: 0.0, epsilon: 0.08\n",
"episode: 980, score: 1.0, epsilon: 0.08\n",
"episode: 985, score: 2.0, epsilon: 0.08\n",
"episode: 990, score: 1.0, epsilon: 0.08\n",
"episode: 995, score: 1.0, epsilon: 0.08\n",
"marking, episode: 1000, score: 0.0, mean_score: 0.90, std_score: 0.90\n",
"episode: 1000, score: 0.0, epsilon: 0.08\n",
"episode: 1005, score: 0.0, epsilon: 0.08\n",
"episode: 1010, score: 2.0, epsilon: 0.08\n",
"episode: 1015, score: 0.0, epsilon: 0.08\n",
"episode: 1020, score: 0.0, epsilon: 0.08\n",
"episode: 1025, score: 2.0, epsilon: 0.08\n",
"episode: 1030, score: 0.0, epsilon: 0.08\n",
"episode: 1035, score: 2.0, epsilon: 0.08\n",
"episode: 1040, score: 1.0, epsilon: 0.08\n",
"episode: 1045, score: 5.0, epsilon: 0.08\n",
"episode: 1050, score: 0.0, epsilon: 0.08\n",
"episode: 1055, score: 2.0, epsilon: 0.08\n",
"episode: 1060, score: 3.0, epsilon: 0.08\n",
"episode: 1065, score: 0.0, epsilon: 0.08\n",
"episode: 1070, score: 0.0, epsilon: 0.08\n",
"episode: 1075, score: 1.0, epsilon: 0.08\n",
"episode: 1080, score: 3.0, epsilon: 0.08\n",
"episode: 1085, score: 0.0, epsilon: 0.08\n",
"episode: 1090, score: 0.0, epsilon: 0.08\n",
"episode: 1095, score: 2.0, epsilon: 0.08\n",
"marking, episode: 1100, score: 0.0, mean_score: 0.93, std_score: 1.07\n",
"episode: 1100, score: 0.0, epsilon: 0.08\n",
"episode: 1105, score: 2.0, epsilon: 0.08\n",
"episode: 1110, score: 5.0, epsilon: 0.08\n",
"episode: 1115, score: 0.0, epsilon: 0.08\n",
"episode: 1120, score: 0.0, epsilon: 0.08\n",
"episode: 1125, score: 0.0, epsilon: 0.08\n",
"episode: 1130, score: 3.0, epsilon: 0.08\n",
"episode: 1135, score: 0.0, epsilon: 0.08\n",
"episode: 1140, score: 0.0, epsilon: 0.08\n",
"episode: 1145, score: 2.0, epsilon: 0.08\n",
"episode: 1150, score: 6.0, epsilon: 0.08\n",
"episode: 1155, score: 2.0, epsilon: 0.08\n",
"episode: 1160, score: 2.0, epsilon: 0.08\n",
"episode: 1165, score: 2.0, epsilon: 0.08\n",
"episode: 1170, score: 1.0, epsilon: 0.08\n",
"episode: 1175, score: 2.0, epsilon: 0.08\n",
"episode: 1180, score: 5.0, epsilon: 0.08\n",
"episode: 1185, score: 1.0, epsilon: 0.08\n",
"episode: 1190, score: 1.0, epsilon: 0.08\n",
"episode: 1195, score: 0.0, epsilon: 0.08\n",
"marking, episode: 1200, score: 3.0, mean_score: 1.44, std_score: 1.68\n",
"episode: 1200, score: 3.0, epsilon: 0.08\n",
"episode: 1205, score: 2.0, epsilon: 0.08\n",
"episode: 1210, score: 0.0, epsilon: 0.08\n",
"episode: 1215, score: 0.0, epsilon: 0.08\n",
"episode: 1220, score: 2.0, epsilon: 0.08\n",
"episode: 1225, score: 1.0, epsilon: 0.08\n",
"episode: 1230, score: 2.0, epsilon: 0.08\n",
"episode: 1235, score: 0.0, epsilon: 0.08\n",
"episode: 1240, score: 1.0, epsilon: 0.08\n",
"episode: 1245, score: 3.0, epsilon: 0.08\n",
"episode: 1250, score: 6.0, epsilon: 0.08\n",
"episode: 1255, score: 2.0, epsilon: 0.08\n",
"episode: 1260, score: 0.0, epsilon: 0.08\n",
"episode: 1265, score: 1.0, epsilon: 0.08\n",
"episode: 1270, score: 4.0, epsilon: 0.08\n",
"episode: 1275, score: 1.0, epsilon: 0.08\n",
"episode: 1280, score: 3.0, epsilon: 0.08\n",
"episode: 1285, score: 1.0, epsilon: 0.08\n",
"episode: 1290, score: 0.0, epsilon: 0.08\n",
"episode: 1295, score: 0.0, epsilon: 0.08\n",
"marking, episode: 1300, score: 6.0, mean_score: 1.66, std_score: 1.77\n",
"episode: 1300, score: 6.0, epsilon: 0.08\n",
"episode: 1305, score: 1.0, epsilon: 0.08\n",
"episode: 1310, score: 3.0, epsilon: 0.08\n",
"episode: 1315, score: 2.0, epsilon: 0.08\n",
"episode: 1320, score: 1.0, epsilon: 0.08\n",
"episode: 1325, score: 0.0, epsilon: 0.08\n",
"episode: 1330, score: 2.0, epsilon: 0.08\n",
"episode: 1335, score: 3.0, epsilon: 0.08\n",
"episode: 1340, score: 1.0, epsilon: 0.08\n",
"episode: 1345, score: 1.0, epsilon: 0.08\n",
"episode: 1350, score: 5.0, epsilon: 0.08\n",
"episode: 1355, score: 0.0, epsilon: 0.08\n",
"episode: 1360, score: 3.0, epsilon: 0.08\n",
"episode: 1365, score: 0.0, epsilon: 0.08\n",
"episode: 1370, score: 1.0, epsilon: 0.08\n",
"episode: 1375, score: 3.0, epsilon: 0.08\n",
"episode: 1380, score: 5.0, epsilon: 0.08\n",
"episode: 1385, score: 0.0, epsilon: 0.08\n",
"episode: 1390, score: 1.0, epsilon: 0.08\n",
"episode: 1395, score: 0.0, epsilon: 0.08\n",
"marking, episode: 1400, score: 1.0, mean_score: 1.87, std_score: 1.81\n",
"episode: 1400, score: 1.0, epsilon: 0.08\n",
"episode: 1405, score: 3.0, epsilon: 0.08\n",
"episode: 1410, score: 3.0, epsilon: 0.08\n",
"episode: 1415, score: 0.0, epsilon: 0.08\n",
"episode: 1420, score: 5.0, epsilon: 0.08\n",
"episode: 1425, score: 0.0, epsilon: 0.08\n",
"episode: 1430, score: 5.0, epsilon: 0.08\n",
"episode: 1435, score: 2.0, epsilon: 0.08\n",
"episode: 1440, score: 0.0, epsilon: 0.08\n",
"episode: 1445, score: 4.0, epsilon: 0.08\n",
"episode: 1450, score: 0.0, epsilon: 0.08\n",
"episode: 1455, score: 5.0, epsilon: 0.08\n",
"episode: 1460, score: 2.0, epsilon: 0.08\n",
"episode: 1465, score: 0.0, epsilon: 0.08\n",
"episode: 1470, score: 4.0, epsilon: 0.08\n",
"episode: 1475, score: 4.0, epsilon: 0.08\n",
"episode: 1480, score: 2.0, epsilon: 0.08\n",
"episode: 1485, score: 2.0, epsilon: 0.08\n",
"episode: 1490, score: 0.0, epsilon: 0.08\n",
"episode: 1495, score: 1.0, epsilon: 0.08\n",
"marking, episode: 1500, score: 1.0, mean_score: 2.37, std_score: 2.00\n",
"episode: 1500, score: 1.0, epsilon: 0.08\n",
"episode: 1505, score: 5.0, epsilon: 0.08\n",
"episode: 1510, score: 0.0, epsilon: 0.08\n",
"episode: 1515, score: 1.0, epsilon: 0.08\n",
"episode: 1520, score: 2.0, epsilon: 0.08\n",
"episode: 1525, score: 3.0, epsilon: 0.08\n",
"episode: 1530, score: 1.0, epsilon: 0.08\n",
"episode: 1535, score: 1.0, epsilon: 0.08\n",
"episode: 1540, score: 1.0, epsilon: 0.08\n",
"episode: 1545, score: 4.0, epsilon: 0.08\n",
"episode: 1550, score: 1.0, epsilon: 0.08\n",
"episode: 1555, score: 1.0, epsilon: 0.08\n",
"episode: 1560, score: 5.0, epsilon: 0.08\n",
"episode: 1565, score: 0.0, epsilon: 0.08\n",
"episode: 1570, score: 3.0, epsilon: 0.08\n",
"episode: 1575, score: 1.0, epsilon: 0.08\n",
"episode: 1580, score: 1.0, epsilon: 0.08\n",
"episode: 1585, score: 2.0, epsilon: 0.08\n",
"episode: 1590, score: 2.0, epsilon: 0.08\n",
"episode: 1595, score: 6.0, epsilon: 0.08\n",
"marking, episode: 1600, score: 1.0, mean_score: 2.13, std_score: 1.86\n",
"episode: 1600, score: 1.0, epsilon: 0.08\n",
"episode: 1605, score: 5.0, epsilon: 0.08\n",
"episode: 1610, score: 6.0, epsilon: 0.08\n",
"episode: 1615, score: 2.0, epsilon: 0.08\n",
"episode: 1620, score: 6.0, epsilon: 0.08\n",
"episode: 1625, score: 1.0, epsilon: 0.08\n",
"episode: 1630, score: 3.0, epsilon: 0.08\n",
"episode: 1635, score: 3.0, epsilon: 0.08\n",
"episode: 1640, score: 0.0, epsilon: 0.08\n",
"episode: 1645, score: 1.0, epsilon: 0.08\n",
"episode: 1650, score: 2.0, epsilon: 0.08\n",
"episode: 1655, score: 1.0, epsilon: 0.08\n",
"episode: 1660, score: 1.0, epsilon: 0.08\n",
"episode: 1665, score: 5.0, epsilon: 0.08\n",
"episode: 1670, score: 0.0, epsilon: 0.08\n",
"episode: 1675, score: 0.0, epsilon: 0.08\n",
"episode: 1680, score: 0.0, epsilon: 0.08\n",
"episode: 1685, score: 2.0, epsilon: 0.08\n",
"episode: 1690, score: 1.0, epsilon: 0.08\n",
"episode: 1695, score: 0.0, epsilon: 0.08\n",
"marking, episode: 1700, score: 4.0, mean_score: 2.11, std_score: 1.97\n",
"episode: 1700, score: 4.0, epsilon: 0.08\n",
"episode: 1705, score: 1.0, epsilon: 0.08\n",
"episode: 1710, score: 5.0, epsilon: 0.08\n",
"episode: 1715, score: 2.0, epsilon: 0.08\n",
"episode: 1720, score: 2.0, epsilon: 0.08\n",
"episode: 1725, score: 2.0, epsilon: 0.08\n",
"episode: 1730, score: 3.0, epsilon: 0.08\n",
"episode: 1735, score: 3.0, epsilon: 0.08\n",
"episode: 1740, score: 1.0, epsilon: 0.08\n",
"episode: 1745, score: 6.0, epsilon: 0.08\n",
"episode: 1750, score: 0.0, epsilon: 0.08\n",
"episode: 1755, score: 0.0, epsilon: 0.08\n",
"episode: 1760, score: 6.0, epsilon: 0.08\n",
"episode: 1765, score: 0.0, epsilon: 0.08\n",
"episode: 1770, score: 1.0, epsilon: 0.08\n",
"episode: 1775, score: 1.0, epsilon: 0.08\n",
"episode: 1780, score: 2.0, epsilon: 0.08\n",
"episode: 1785, score: 5.0, epsilon: 0.08\n",
"episode: 1790, score: 4.0, epsilon: 0.08\n",
"episode: 1795, score: 2.0, epsilon: 0.08\n",
"marking, episode: 1800, score: 2.0, mean_score: 1.88, std_score: 1.73\n",
"episode: 1800, score: 2.0, epsilon: 0.08\n",
"episode: 1805, score: 2.0, epsilon: 0.08\n",
"episode: 1810, score: 5.0, epsilon: 0.08\n",
"episode: 1815, score: 3.0, epsilon: 0.08\n",
"episode: 1820, score: 0.0, epsilon: 0.08\n",
"episode: 1825, score: 2.0, epsilon: 0.08\n",
"episode: 1830, score: 6.0, epsilon: 0.08\n",
"episode: 1835, score: 0.0, epsilon: 0.08\n",
"episode: 1840, score: 0.0, epsilon: 0.08\n",
"episode: 1845, score: 6.0, epsilon: 0.08\n",
"episode: 1850, score: 3.0, epsilon: 0.08\n",
"episode: 1855, score: 0.0, epsilon: 0.08\n",
"episode: 1860, score: 3.0, epsilon: 0.08\n",
"episode: 1865, score: 1.0, epsilon: 0.08\n",
"episode: 1870, score: 2.0, epsilon: 0.08\n",
"episode: 1875, score: 1.0, epsilon: 0.08\n",
"episode: 1880, score: 0.0, epsilon: 0.08\n",
"episode: 1885, score: 3.0, epsilon: 0.08\n",
"episode: 1890, score: 0.0, epsilon: 0.08\n",
"episode: 1895, score: 0.0, epsilon: 0.08\n",
"marking, episode: 1900, score: 3.0, mean_score: 2.26, std_score: 1.98\n",
"episode: 1900, score: 3.0, epsilon: 0.08\n",
"episode: 1905, score: 2.0, epsilon: 0.08\n",
"episode: 1910, score: 1.0, epsilon: 0.08\n",
"episode: 1915, score: 0.0, epsilon: 0.08\n",
"episode: 1920, score: 6.0, epsilon: 0.08\n",
"episode: 1925, score: 3.0, epsilon: 0.08\n",
"episode: 1930, score: 0.0, epsilon: 0.08\n",
"episode: 1935, score: 3.0, epsilon: 0.08\n",
"episode: 1940, score: 1.0, epsilon: 0.08\n",
"episode: 1945, score: 4.0, epsilon: 0.08\n",
"episode: 1950, score: 0.0, epsilon: 0.08\n",
"episode: 1955, score: 1.0, epsilon: 0.08\n",
"episode: 1960, score: 6.0, epsilon: 0.08\n",
"episode: 1965, score: 1.0, epsilon: 0.08\n",
"episode: 1970, score: 1.0, epsilon: 0.08\n",
"episode: 1975, score: 0.0, epsilon: 0.08\n",
"episode: 1980, score: 0.0, epsilon: 0.08\n",
"episode: 1985, score: 4.0, epsilon: 0.08\n",
"episode: 1990, score: 0.0, epsilon: 0.08\n",
"episode: 1995, score: 4.0, epsilon: 0.08\n",
"marking, episode: 2000, score: 6.0, mean_score: 2.03, std_score: 1.79\n",
"episode: 2000, score: 6.0, epsilon: 0.08\n",
"episode: 2005, score: 3.0, epsilon: 0.08\n",
"episode: 2010, score: 3.0, epsilon: 0.08\n",
"episode: 2015, score: 0.0, epsilon: 0.08\n",
"episode: 2020, score: 5.0, epsilon: 0.08\n",
"episode: 2025, score: 6.0, epsilon: 0.08\n",
"episode: 2030, score: 3.0, epsilon: 0.08\n",
"episode: 2035, score: 1.0, epsilon: 0.08\n",
"episode: 2040, score: 1.0, epsilon: 0.08\n",
"episode: 2045, score: 2.0, epsilon: 0.08\n",
"episode: 2050, score: 6.0, epsilon: 0.08\n",
"episode: 2055, score: 1.0, epsilon: 0.08\n",
"episode: 2060, score: 4.0, epsilon: 0.08\n",
"episode: 2065, score: 4.0, epsilon: 0.08\n",
"episode: 2070, score: 0.0, epsilon: 0.08\n",
"episode: 2075, score: 1.0, epsilon: 0.08\n",
"episode: 2080, score: 3.0, epsilon: 0.08\n",
"episode: 2085, score: 2.0, epsilon: 0.08\n",
"episode: 2090, score: 1.0, epsilon: 0.08\n",
"episode: 2095, score: 1.0, epsilon: 0.08\n",
"marking, episode: 2100, score: 4.0, mean_score: 2.23, std_score: 2.02\n",
"episode: 2100, score: 4.0, epsilon: 0.08\n",
"episode: 2105, score: 5.0, epsilon: 0.08\n",
"episode: 2110, score: 1.0, epsilon: 0.08\n",
"episode: 2115, score: 3.0, epsilon: 0.08\n",
"episode: 2120, score: 6.0, epsilon: 0.08\n",
"episode: 2125, score: 1.0, epsilon: 0.08\n",
"episode: 2130, score: 4.0, epsilon: 0.08\n",
"episode: 2135, score: 1.0, epsilon: 0.08\n",
"episode: 2140, score: 1.0, epsilon: 0.08\n",
"episode: 2145, score: 2.0, epsilon: 0.08\n",
"episode: 2150, score: 0.0, epsilon: 0.08\n",
"episode: 2155, score: 0.0, epsilon: 0.08\n",
"episode: 2160, score: 3.0, epsilon: 0.08\n",
"episode: 2165, score: 1.0, epsilon: 0.08\n",
"episode: 2170, score: 2.0, epsilon: 0.08\n",
"episode: 2175, score: 0.0, epsilon: 0.08\n",
"episode: 2180, score: 6.0, epsilon: 0.08\n",
"episode: 2185, score: 1.0, epsilon: 0.08\n",
"episode: 2190, score: 1.0, epsilon: 0.08\n",
"episode: 2195, score: 1.0, epsilon: 0.08\n",
"marking, episode: 2200, score: 2.0, mean_score: 2.03, std_score: 2.03\n",
"episode: 2200, score: 2.0, epsilon: 0.08\n",
"episode: 2205, score: 2.0, epsilon: 0.08\n",
"episode: 2210, score: 3.0, epsilon: 0.08\n",
"episode: 2215, score: 4.0, epsilon: 0.08\n",
"episode: 2220, score: 5.0, epsilon: 0.08\n",
"episode: 2225, score: 2.0, epsilon: 0.08\n",
"episode: 2230, score: 4.0, epsilon: 0.08\n",
"episode: 2235, score: 0.0, epsilon: 0.08\n",
"episode: 2240, score: 0.0, epsilon: 0.08\n",
"episode: 2245, score: 3.0, epsilon: 0.08\n",
"episode: 2250, score: 3.0, epsilon: 0.08\n",
"episode: 2255, score: 2.0, epsilon: 0.08\n",
"episode: 2260, score: 5.0, epsilon: 0.08\n",
"episode: 2265, score: 2.0, epsilon: 0.08\n",
"episode: 2270, score: 4.0, epsilon: 0.08\n",
"episode: 2275, score: 6.0, epsilon: 0.08\n",
"episode: 2280, score: 5.0, epsilon: 0.08\n",
"episode: 2285, score: 1.0, epsilon: 0.08\n",
"episode: 2290, score: 6.0, epsilon: 0.08\n",
"episode: 2295, score: 6.0, epsilon: 0.08\n",
"marking, episode: 2300, score: 6.0, mean_score: 2.27, std_score: 2.03\n",
"episode: 2300, score: 6.0, epsilon: 0.08\n",
"episode: 2305, score: 1.0, epsilon: 0.08\n",
"episode: 2310, score: 6.0, epsilon: 0.08\n",
"episode: 2315, score: 2.0, epsilon: 0.08\n",
"episode: 2320, score: 3.0, epsilon: 0.08\n",
"episode: 2325, score: 2.0, epsilon: 0.08\n",
"episode: 2330, score: 0.0, epsilon: 0.08\n",
"episode: 2335, score: 2.0, epsilon: 0.08\n",
"episode: 2340, score: 1.0, epsilon: 0.08\n",
"episode: 2345, score: 1.0, epsilon: 0.08\n",
"episode: 2350, score: 0.0, epsilon: 0.08\n",
"episode: 2355, score: 1.0, epsilon: 0.08\n",
"episode: 2360, score: 6.0, epsilon: 0.08\n",
"episode: 2365, score: 2.0, epsilon: 0.08\n",
"episode: 2370, score: 1.0, epsilon: 0.08\n",
"episode: 2375, score: 3.0, epsilon: 0.08\n",
"episode: 2380, score: 0.0, epsilon: 0.08\n",
"episode: 2385, score: 4.0, epsilon: 0.08\n",
"episode: 2390, score: 6.0, epsilon: 0.08\n",
"episode: 2395, score: 0.0, epsilon: 0.08\n",
"marking, episode: 2400, score: 6.0, mean_score: 2.21, std_score: 2.04\n",
"episode: 2400, score: 6.0, epsilon: 0.08\n",
"episode: 2405, score: 0.0, epsilon: 0.08\n",
"episode: 2410, score: 0.0, epsilon: 0.08\n",
"episode: 2415, score: 2.0, epsilon: 0.08\n",
"episode: 2420, score: 1.0, epsilon: 0.08\n",
"episode: 2425, score: 3.0, epsilon: 0.08\n",
"episode: 2430, score: 2.0, epsilon: 0.08\n",
"episode: 2435, score: 5.0, epsilon: 0.08\n",
"episode: 2440, score: 0.0, epsilon: 0.08\n",
"episode: 2445, score: 4.0, epsilon: 0.08\n",
"episode: 2450, score: 5.0, epsilon: 0.08\n",
"episode: 2455, score: 0.0, epsilon: 0.08\n",
"episode: 2460, score: 0.0, epsilon: 0.08\n",
"episode: 2465, score: 4.0, epsilon: 0.08\n",
"episode: 2470, score: 1.0, epsilon: 0.08\n",
"episode: 2475, score: 6.0, epsilon: 0.08\n",
"episode: 2480, score: 2.0, epsilon: 0.08\n",
"episode: 2485, score: 5.0, epsilon: 0.08\n",
"episode: 2490, score: 3.0, epsilon: 0.08\n",
"episode: 2495, score: 3.0, epsilon: 0.08\n",
"marking, episode: 2500, score: 3.0, mean_score: 2.40, std_score: 2.11\n",
"episode: 2500, score: 3.0, epsilon: 0.08\n",
"episode: 2505, score: 0.0, epsilon: 0.08\n",
"episode: 2510, score: 0.0, epsilon: 0.08\n",
"episode: 2515, score: 6.0, epsilon: 0.08\n",
"episode: 2520, score: 2.0, epsilon: 0.08\n",
"episode: 2525, score: 0.0, epsilon: 0.08\n",
"episode: 2530, score: 6.0, epsilon: 0.08\n",
"episode: 2535, score: 0.0, epsilon: 0.08\n",
"episode: 2540, score: 0.0, epsilon: 0.08\n",
"episode: 2545, score: 1.0, epsilon: 0.08\n",
"episode: 2550, score: 0.0, epsilon: 0.08\n",
"episode: 2555, score: 0.0, epsilon: 0.08\n",
"episode: 2560, score: 5.0, epsilon: 0.08\n",
"episode: 2565, score: 0.0, epsilon: 0.08\n",
"episode: 2570, score: 6.0, epsilon: 0.08\n",
"episode: 2575, score: 2.0, epsilon: 0.08\n",
"episode: 2580, score: 0.0, epsilon: 0.08\n",
"episode: 2585, score: 6.0, epsilon: 0.08\n",
"episode: 2590, score: 0.0, epsilon: 0.08\n",
"episode: 2595, score: 6.0, epsilon: 0.08\n",
"marking, episode: 2600, score: 0.0, mean_score: 2.31, std_score: 2.16\n",
"episode: 2600, score: 0.0, epsilon: 0.08\n",
"episode: 2605, score: 0.0, epsilon: 0.08\n",
"episode: 2610, score: 0.0, epsilon: 0.08\n",
"episode: 2615, score: 0.0, epsilon: 0.08\n",
"episode: 2620, score: 1.0, epsilon: 0.08\n",
"episode: 2625, score: 0.0, epsilon: 0.08\n",
"episode: 2630, score: 4.0, epsilon: 0.08\n",
"episode: 2635, score: 1.0, epsilon: 0.08\n",
"episode: 2640, score: 1.0, epsilon: 0.08\n",
"episode: 2645, score: 2.0, epsilon: 0.08\n",
"episode: 2650, score: 3.0, epsilon: 0.08\n",
"episode: 2655, score: 0.0, epsilon: 0.08\n",
"episode: 2660, score: 0.0, epsilon: 0.08\n",
"episode: 2665, score: 1.0, epsilon: 0.08\n",
"episode: 2670, score: 5.0, epsilon: 0.08\n",
"episode: 2675, score: 3.0, epsilon: 0.08\n",
"episode: 2680, score: 6.0, epsilon: 0.08\n",
"episode: 2685, score: 3.0, epsilon: 0.08\n",
"episode: 2690, score: 3.0, epsilon: 0.08\n",
"episode: 2695, score: 1.0, epsilon: 0.08\n",
"marking, episode: 2700, score: 4.0, mean_score: 1.81, std_score: 1.72\n",
"episode: 2700, score: 4.0, epsilon: 0.08\n",
"episode: 2705, score: 1.0, epsilon: 0.08\n",
"episode: 2710, score: 0.0, epsilon: 0.08\n",
"episode: 2715, score: 6.0, epsilon: 0.08\n",
"episode: 2720, score: 2.0, epsilon: 0.08\n",
"episode: 2725, score: 0.0, epsilon: 0.08\n",
"episode: 2730, score: 0.0, epsilon: 0.08\n",
"episode: 2735, score: 0.0, epsilon: 0.08\n",
"episode: 2740, score: 1.0, epsilon: 0.08\n",
"episode: 2745, score: 0.0, epsilon: 0.08\n",
"episode: 2750, score: 0.0, epsilon: 0.08\n",
"episode: 2755, score: 4.0, epsilon: 0.08\n",
"episode: 2760, score: 1.0, epsilon: 0.08\n",
"episode: 2765, score: 0.0, epsilon: 0.08\n",
"episode: 2770, score: 3.0, epsilon: 0.08\n",
"episode: 2775, score: 0.0, epsilon: 0.08\n",
"episode: 2780, score: 2.0, epsilon: 0.08\n",
"episode: 2785, score: 0.0, epsilon: 0.08\n",
"episode: 2790, score: 0.0, epsilon: 0.08\n",
"episode: 2795, score: 3.0, epsilon: 0.08\n",
"marking, episode: 2800, score: 0.0, mean_score: 2.11, std_score: 2.05\n",
"episode: 2800, score: 0.0, epsilon: 0.08\n",
"episode: 2805, score: 5.0, epsilon: 0.08\n",
"episode: 2810, score: 3.0, epsilon: 0.08\n",
"episode: 2815, score: 3.0, epsilon: 0.08\n",
"episode: 2820, score: 2.0, epsilon: 0.08\n",
"episode: 2825, score: 1.0, epsilon: 0.08\n",
"episode: 2830, score: 1.0, epsilon: 0.08\n",
"episode: 2835, score: 2.0, epsilon: 0.08\n",
"episode: 2840, score: 2.0, epsilon: 0.08\n",
"episode: 2845, score: 2.0, epsilon: 0.08\n",
"episode: 2850, score: 5.0, epsilon: 0.08\n",
"episode: 2855, score: 2.0, epsilon: 0.08\n",
"episode: 2860, score: 5.0, epsilon: 0.08\n",
"episode: 2865, score: 1.0, epsilon: 0.08\n",
"episode: 2870, score: 1.0, epsilon: 0.08\n",
"episode: 2875, score: 2.0, epsilon: 0.08\n",
"episode: 2880, score: 3.0, epsilon: 0.08\n",
"episode: 2885, score: 3.0, epsilon: 0.08\n",
"episode: 2890, score: 5.0, epsilon: 0.08\n",
"episode: 2895, score: 5.0, epsilon: 0.08\n",
"marking, episode: 2900, score: 1.0, mean_score: 2.44, std_score: 1.86\n",
"episode: 2900, score: 1.0, epsilon: 0.08\n",
"episode: 2905, score: 0.0, epsilon: 0.08\n",
"episode: 2910, score: 1.0, epsilon: 0.08\n",
"episode: 2915, score: 1.0, epsilon: 0.08\n",
"episode: 2920, score: 0.0, epsilon: 0.08\n",
"episode: 2925, score: 1.0, epsilon: 0.08\n",
"episode: 2930, score: 0.0, epsilon: 0.08\n",
"episode: 2935, score: 3.0, epsilon: 0.08\n",
"episode: 2940, score: 2.0, epsilon: 0.08\n",
"episode: 2945, score: 6.0, epsilon: 0.08\n",
"episode: 2950, score: 2.0, epsilon: 0.08\n",
"episode: 2955, score: 3.0, epsilon: 0.08\n",
"episode: 2960, score: 0.0, epsilon: 0.08\n",
"episode: 2965, score: 3.0, epsilon: 0.08\n",
"episode: 2970, score: 3.0, epsilon: 0.08\n",
"episode: 2975, score: 2.0, epsilon: 0.08\n",
"episode: 2980, score: 1.0, epsilon: 0.08\n",
"episode: 2985, score: 4.0, epsilon: 0.08\n",
"episode: 2990, score: 1.0, epsilon: 0.08\n",
"episode: 2995, score: 0.0, epsilon: 0.08\n",
"marking, episode: 3000, score: 2.0, mean_score: 2.40, std_score: 2.05\n",
"episode: 3000, score: 2.0, epsilon: 0.08\n",
"episode: 3005, score: 2.0, epsilon: 0.08\n",
"episode: 3010, score: 0.0, epsilon: 0.08\n",
"episode: 3015, score: 3.0, epsilon: 0.08\n",
"episode: 3020, score: 0.0, epsilon: 0.08\n",
"episode: 3025, score: 3.0, epsilon: 0.08\n",
"episode: 3030, score: 0.0, epsilon: 0.08\n",
"episode: 3035, score: 2.0, epsilon: 0.08\n",
"episode: 3040, score: 0.0, epsilon: 0.08\n",
"episode: 3045, score: 2.0, epsilon: 0.08\n",
"episode: 3050, score: 3.0, epsilon: 0.08\n",
"episode: 3055, score: 5.0, epsilon: 0.08\n",
"episode: 3060, score: 0.0, epsilon: 0.08\n",
"episode: 3065, score: 1.0, epsilon: 0.08\n",
"episode: 3070, score: 6.0, epsilon: 0.08\n",
"episode: 3075, score: 1.0, epsilon: 0.08\n",
"episode: 3080, score: 0.0, epsilon: 0.08\n",
"episode: 3085, score: 6.0, epsilon: 0.08\n",
"episode: 3090, score: 3.0, epsilon: 0.08\n",
"episode: 3095, score: 0.0, epsilon: 0.08\n",
"marking, episode: 3100, score: 1.0, mean_score: 2.22, std_score: 2.04\n",
"episode: 3100, score: 1.0, epsilon: 0.08\n",
"episode: 3105, score: 1.0, epsilon: 0.08\n",
"episode: 3110, score: 2.0, epsilon: 0.08\n",
"episode: 3115, score: 1.0, epsilon: 0.08\n",
"episode: 3120, score: 0.0, epsilon: 0.08\n",
"episode: 3125, score: 1.0, epsilon: 0.08\n",
"episode: 3130, score: 4.0, epsilon: 0.08\n",
"episode: 3135, score: 1.0, epsilon: 0.08\n",
"episode: 3140, score: 0.0, epsilon: 0.08\n",
"episode: 3145, score: 2.0, epsilon: 0.08\n",
"episode: 3150, score: 0.0, epsilon: 0.08\n",
"episode: 3155, score: 1.0, epsilon: 0.08\n",
"episode: 3160, score: 1.0, epsilon: 0.08\n",
"episode: 3165, score: 0.0, epsilon: 0.08\n",
"episode: 3170, score: 0.0, epsilon: 0.08\n",
"episode: 3175, score: 3.0, epsilon: 0.08\n",
"episode: 3180, score: 0.0, epsilon: 0.08\n",
"episode: 3185, score: 5.0, epsilon: 0.08\n",
"episode: 3190, score: 1.0, epsilon: 0.08\n",
"episode: 3195, score: 6.0, epsilon: 0.08\n",
"marking, episode: 3200, score: 5.0, mean_score: 1.90, std_score: 2.07\n",
"episode: 3200, score: 5.0, epsilon: 0.08\n",
"episode: 3205, score: 0.0, epsilon: 0.08\n",
"episode: 3210, score: 2.0, epsilon: 0.08\n",
"episode: 3215, score: 1.0, epsilon: 0.08\n",
"episode: 3220, score: 1.0, epsilon: 0.08\n",
"episode: 3225, score: 6.0, epsilon: 0.08\n",
"episode: 3230, score: 0.0, epsilon: 0.08\n",
"episode: 3235, score: 3.0, epsilon: 0.08\n",
"episode: 3240, score: 6.0, epsilon: 0.08\n",
"episode: 3245, score: 0.0, epsilon: 0.08\n",
"episode: 3250, score: 0.0, epsilon: 0.08\n",
"episode: 3255, score: 1.0, epsilon: 0.08\n",
"episode: 3260, score: 1.0, epsilon: 0.08\n",
"episode: 3265, score: 1.0, epsilon: 0.08\n",
"episode: 3270, score: 0.0, epsilon: 0.08\n",
"episode: 3275, score: 5.0, epsilon: 0.08\n",
"episode: 3280, score: 2.0, epsilon: 0.08\n",
"episode: 3285, score: 4.0, epsilon: 0.08\n",
"episode: 3290, score: 6.0, epsilon: 0.08\n",
"episode: 3295, score: 0.0, epsilon: 0.08\n",
"marking, episode: 3300, score: 6.0, mean_score: 1.91, std_score: 2.04\n",
"episode: 3300, score: 6.0, epsilon: 0.08\n",
"episode: 3305, score: 6.0, epsilon: 0.08\n",
"episode: 3310, score: 6.0, epsilon: 0.08\n",
"episode: 3315, score: 0.0, epsilon: 0.08\n",
"episode: 3320, score: 0.0, epsilon: 0.08\n",
"episode: 3325, score: 1.0, epsilon: 0.08\n",
"episode: 3330, score: 3.0, epsilon: 0.08\n",
"episode: 3335, score: 2.0, epsilon: 0.08\n",
"episode: 3340, score: 2.0, epsilon: 0.08\n",
"episode: 3345, score: 0.0, epsilon: 0.08\n",
"episode: 3350, score: 3.0, epsilon: 0.08\n",
"episode: 3355, score: 2.0, epsilon: 0.08\n",
"episode: 3360, score: 3.0, epsilon: 0.08\n",
"episode: 3365, score: 2.0, epsilon: 0.08\n",
"episode: 3370, score: 3.0, epsilon: 0.08\n",
"episode: 3375, score: 4.0, epsilon: 0.08\n",
"episode: 3380, score: 6.0, epsilon: 0.08\n",
"episode: 3385, score: 1.0, epsilon: 0.08\n",
"episode: 3390, score: 1.0, epsilon: 0.08\n",
"episode: 3395, score: 0.0, epsilon: 0.08\n",
"marking, episode: 3400, score: 6.0, mean_score: 2.17, std_score: 2.13\n",
"episode: 3400, score: 6.0, epsilon: 0.08\n",
"episode: 3405, score: 1.0, epsilon: 0.08\n",
"episode: 3410, score: 6.0, epsilon: 0.08\n",
"episode: 3415, score: 1.0, epsilon: 0.08\n",
"episode: 3420, score: 3.0, epsilon: 0.08\n",
"episode: 3425, score: 3.0, epsilon: 0.08\n",
"episode: 3430, score: 1.0, epsilon: 0.08\n",
"episode: 3435, score: 2.0, epsilon: 0.08\n",
"episode: 3440, score: 0.0, epsilon: 0.08\n",
"episode: 3445, score: 3.0, epsilon: 0.08\n",
"episode: 3450, score: 1.0, epsilon: 0.08\n",
"episode: 3455, score: 5.0, epsilon: 0.08\n",
"episode: 3460, score: 2.0, epsilon: 0.08\n",
"episode: 3465, score: 0.0, epsilon: 0.08\n",
"episode: 3470, score: 1.0, epsilon: 0.08\n",
"episode: 3475, score: 6.0, epsilon: 0.08\n",
"episode: 3480, score: 2.0, epsilon: 0.08\n",
"episode: 3485, score: 1.0, epsilon: 0.08\n",
"episode: 3490, score: 0.0, epsilon: 0.08\n",
"episode: 3495, score: 4.0, epsilon: 0.08\n",
"marking, episode: 3500, score: 0.0, mean_score: 2.18, std_score: 1.97\n",
"episode: 3500, score: 0.0, epsilon: 0.08\n",
"episode: 3505, score: 0.0, epsilon: 0.08\n",
"episode: 3510, score: 3.0, epsilon: 0.08\n",
"episode: 3515, score: 2.0, epsilon: 0.08\n",
"episode: 3520, score: 0.0, epsilon: 0.08\n",
"episode: 3525, score: 1.0, epsilon: 0.08\n",
"episode: 3530, score: 2.0, epsilon: 0.08\n",
"episode: 3535, score: 2.0, epsilon: 0.08\n",
"episode: 3540, score: 6.0, epsilon: 0.08\n",
"episode: 3545, score: 0.0, epsilon: 0.08\n",
"episode: 3550, score: 4.0, epsilon: 0.08\n",
"episode: 3555, score: 1.0, epsilon: 0.08\n",
"episode: 3560, score: 1.0, epsilon: 0.08\n",
"episode: 3565, score: 1.0, epsilon: 0.08\n",
"episode: 3570, score: 0.0, epsilon: 0.08\n",
"episode: 3575, score: 6.0, epsilon: 0.08\n",
"episode: 3580, score: 6.0, epsilon: 0.08\n",
"episode: 3585, score: 2.0, epsilon: 0.08\n",
"episode: 3590, score: 1.0, epsilon: 0.08\n",
"episode: 3595, score: 0.0, epsilon: 0.08\n",
"marking, episode: 3600, score: 1.0, mean_score: 2.49, std_score: 2.05\n",
"episode: 3600, score: 1.0, epsilon: 0.08\n",
"episode: 3605, score: 6.0, epsilon: 0.08\n",
"episode: 3610, score: 4.0, epsilon: 0.08\n",
"episode: 3615, score: 6.0, epsilon: 0.08\n",
"episode: 3620, score: 6.0, epsilon: 0.08\n",
"episode: 3625, score: 0.0, epsilon: 0.08\n",
"episode: 3630, score: 3.0, epsilon: 0.08\n",
"episode: 3635, score: 0.0, epsilon: 0.08\n",
"episode: 3640, score: 0.0, epsilon: 0.08\n",
"episode: 3645, score: 0.0, epsilon: 0.08\n",
"episode: 3650, score: 5.0, epsilon: 0.08\n",
"episode: 3655, score: 0.0, epsilon: 0.08\n",
"episode: 3660, score: 0.0, epsilon: 0.08\n",
"episode: 3665, score: 0.0, epsilon: 0.08\n",
"episode: 3670, score: 0.0, epsilon: 0.08\n",
"episode: 3675, score: 3.0, epsilon: 0.08\n",
"episode: 3680, score: 1.0, epsilon: 0.08\n",
"episode: 3685, score: 0.0, epsilon: 0.08\n",
"episode: 3690, score: 0.0, epsilon: 0.08\n",
"episode: 3695, score: 2.0, epsilon: 0.08\n",
"marking, episode: 3700, score: 2.0, mean_score: 2.33, std_score: 2.06\n",
"episode: 3700, score: 2.0, epsilon: 0.08\n",
"episode: 3705, score: 4.0, epsilon: 0.08\n",
"episode: 3710, score: 2.0, epsilon: 0.08\n",
"episode: 3715, score: 1.0, epsilon: 0.08\n",
"episode: 3720, score: 2.0, epsilon: 0.08\n",
"episode: 3725, score: 2.0, epsilon: 0.08\n",
"episode: 3730, score: 0.0, epsilon: 0.08\n",
"episode: 3735, score: 0.0, epsilon: 0.08\n",
"episode: 3740, score: 0.0, epsilon: 0.08\n",
"episode: 3745, score: 6.0, epsilon: 0.08\n",
"episode: 3750, score: 0.0, epsilon: 0.08\n",
"episode: 3755, score: 1.0, epsilon: 0.08\n",
"episode: 3760, score: 3.0, epsilon: 0.08\n",
"episode: 3765, score: 6.0, epsilon: 0.08\n",
"episode: 3770, score: 2.0, epsilon: 0.08\n",
"episode: 3775, score: 2.0, epsilon: 0.08\n",
"episode: 3780, score: 3.0, epsilon: 0.08\n",
"episode: 3785, score: 1.0, epsilon: 0.08\n",
"episode: 3790, score: 0.0, epsilon: 0.08\n",
"episode: 3795, score: 6.0, epsilon: 0.08\n",
"marking, episode: 3800, score: 2.0, mean_score: 2.37, std_score: 2.11\n",
"episode: 3800, score: 2.0, epsilon: 0.08\n",
"episode: 3805, score: 0.0, epsilon: 0.08\n",
"episode: 3810, score: 0.0, epsilon: 0.08\n",
"episode: 3815, score: 4.0, epsilon: 0.08\n",
"episode: 3820, score: 6.0, epsilon: 0.08\n",
"episode: 3825, score: 3.0, epsilon: 0.08\n",
"episode: 3830, score: 3.0, epsilon: 0.08\n",
"episode: 3835, score: 5.0, epsilon: 0.08\n",
"episode: 3840, score: 0.0, epsilon: 0.08\n",
"episode: 3845, score: 0.0, epsilon: 0.08\n",
"episode: 3850, score: 1.0, epsilon: 0.08\n",
"episode: 3855, score: 6.0, epsilon: 0.08\n",
"episode: 3860, score: 0.0, epsilon: 0.08\n",
"episode: 3865, score: 4.0, epsilon: 0.08\n",
"episode: 3870, score: 1.0, epsilon: 0.08\n",
"episode: 3875, score: 0.0, epsilon: 0.08\n",
"episode: 3880, score: 4.0, epsilon: 0.08\n",
"episode: 3885, score: 1.0, epsilon: 0.08\n",
"episode: 3890, score: 3.0, epsilon: 0.08\n",
"episode: 3895, score: 1.0, epsilon: 0.08\n",
"marking, episode: 3900, score: 3.0, mean_score: 2.16, std_score: 1.94\n",
"episode: 3900, score: 3.0, epsilon: 0.08\n",
"episode: 3905, score: 1.0, epsilon: 0.08\n",
"episode: 3910, score: 1.0, epsilon: 0.08\n",
"episode: 3915, score: 1.0, epsilon: 0.08\n",
"episode: 3920, score: 2.0, epsilon: 0.08\n",
"episode: 3925, score: 1.0, epsilon: 0.08\n",
"episode: 3930, score: 1.0, epsilon: 0.08\n",
"episode: 3935, score: 1.0, epsilon: 0.08\n",
"episode: 3940, score: 6.0, epsilon: 0.08\n",
"episode: 3945, score: 2.0, epsilon: 0.08\n",
"episode: 3950, score: 1.0, epsilon: 0.08\n",
"episode: 3955, score: 2.0, epsilon: 0.08\n",
"episode: 3960, score: 6.0, epsilon: 0.08\n",
"episode: 3965, score: 2.0, epsilon: 0.08\n",
"episode: 3970, score: 1.0, epsilon: 0.08\n",
"episode: 3975, score: 6.0, epsilon: 0.08\n",
"episode: 3980, score: 3.0, epsilon: 0.08\n",
"episode: 3985, score: 0.0, epsilon: 0.08\n",
"episode: 3990, score: 0.0, epsilon: 0.08\n",
"episode: 3995, score: 6.0, epsilon: 0.08\n",
"marking, episode: 4000, score: 4.0, mean_score: 2.64, std_score: 2.19\n",
"episode: 4000, score: 4.0, epsilon: 0.08\n",
"episode: 4005, score: 6.0, epsilon: 0.08\n",
"episode: 4010, score: 1.0, epsilon: 0.08\n",
"episode: 4015, score: 2.0, epsilon: 0.08\n",
"episode: 4020, score: 0.0, epsilon: 0.08\n",
"episode: 4025, score: 1.0, epsilon: 0.08\n",
"episode: 4030, score: 5.0, epsilon: 0.08\n",
"episode: 4035, score: 1.0, epsilon: 0.08\n",
"episode: 4040, score: 2.0, epsilon: 0.08\n",
"episode: 4045, score: 0.0, epsilon: 0.08\n",
"episode: 4050, score: 0.0, epsilon: 0.08\n",
"episode: 4055, score: 6.0, epsilon: 0.08\n",
"episode: 4060, score: 3.0, epsilon: 0.08\n",
"episode: 4065, score: 1.0, epsilon: 0.08\n",
"episode: 4070, score: 0.0, epsilon: 0.08\n",
"episode: 4075, score: 0.0, epsilon: 0.08\n",
"episode: 4080, score: 2.0, epsilon: 0.08\n",
"episode: 4085, score: 1.0, epsilon: 0.08\n",
"episode: 4090, score: 2.0, epsilon: 0.08\n",
"episode: 4095, score: 1.0, epsilon: 0.08\n",
"marking, episode: 4100, score: 6.0, mean_score: 2.20, std_score: 1.89\n",
"episode: 4100, score: 6.0, epsilon: 0.08\n",
"episode: 4105, score: 4.0, epsilon: 0.08\n",
"episode: 4110, score: 6.0, epsilon: 0.08\n",
"episode: 4115, score: 2.0, epsilon: 0.08\n",
"episode: 4120, score: 3.0, epsilon: 0.08\n",
"episode: 4125, score: 2.0, epsilon: 0.08\n",
"episode: 4130, score: 0.0, epsilon: 0.08\n",
"episode: 4135, score: 2.0, epsilon: 0.08\n",
"episode: 4140, score: 0.0, epsilon: 0.08\n",
"episode: 4145, score: 0.0, epsilon: 0.08\n",
"episode: 4150, score: 1.0, epsilon: 0.08\n",
"episode: 4155, score: 4.0, epsilon: 0.08\n",
"episode: 4160, score: 3.0, epsilon: 0.08\n",
"episode: 4165, score: 6.0, epsilon: 0.08\n",
"episode: 4170, score: 0.0, epsilon: 0.08\n",
"episode: 4175, score: 2.0, epsilon: 0.08\n",
"episode: 4180, score: 4.0, epsilon: 0.08\n",
"episode: 4185, score: 3.0, epsilon: 0.08\n",
"episode: 4190, score: 6.0, epsilon: 0.08\n",
"episode: 4195, score: 4.0, epsilon: 0.08\n",
"marking, episode: 4200, score: 1.0, mean_score: 2.55, std_score: 2.17\n",
"episode: 4200, score: 1.0, epsilon: 0.08\n",
"episode: 4205, score: 1.0, epsilon: 0.08\n",
"episode: 4210, score: 0.0, epsilon: 0.08\n",
"episode: 4215, score: 0.0, epsilon: 0.08\n",
"episode: 4220, score: 2.0, epsilon: 0.08\n",
"episode: 4225, score: 6.0, epsilon: 0.08\n",
"episode: 4230, score: 1.0, epsilon: 0.08\n",
"episode: 4235, score: 4.0, epsilon: 0.08\n",
"episode: 4240, score: 4.0, epsilon: 0.08\n",
"episode: 4245, score: 1.0, epsilon: 0.08\n",
"episode: 4250, score: 1.0, epsilon: 0.08\n",
"episode: 4255, score: 0.0, epsilon: 0.08\n",
"episode: 4260, score: 0.0, epsilon: 0.08\n",
"episode: 4265, score: 2.0, epsilon: 0.08\n",
"episode: 4270, score: 6.0, epsilon: 0.08\n",
"episode: 4275, score: 2.0, epsilon: 0.08\n",
"episode: 4280, score: 1.0, epsilon: 0.08\n",
"episode: 4285, score: 0.0, epsilon: 0.08\n",
"episode: 4290, score: 4.0, epsilon: 0.08\n",
"episode: 4295, score: 3.0, epsilon: 0.08\n",
"marking, episode: 4300, score: 5.0, mean_score: 2.40, std_score: 2.05\n",
"episode: 4300, score: 5.0, epsilon: 0.08\n",
"episode: 4305, score: 3.0, epsilon: 0.08\n",
"episode: 4310, score: 0.0, epsilon: 0.08\n",
"episode: 4315, score: 0.0, epsilon: 0.08\n",
"episode: 4320, score: 6.0, epsilon: 0.08\n",
"episode: 4325, score: 2.0, epsilon: 0.08\n",
"episode: 4330, score: 0.0, epsilon: 0.08\n",
"episode: 4335, score: 2.0, epsilon: 0.08\n",
"episode: 4340, score: 0.0, epsilon: 0.08\n",
"episode: 4345, score: 1.0, epsilon: 0.08\n",
"episode: 4350, score: 0.0, epsilon: 0.08\n",
"episode: 4355, score: 3.0, epsilon: 0.08\n",
"episode: 4360, score: 6.0, epsilon: 0.08\n",
"episode: 4365, score: 6.0, epsilon: 0.08\n",
"episode: 4370, score: 0.0, epsilon: 0.08\n",
"episode: 4375, score: 6.0, epsilon: 0.08\n",
"episode: 4380, score: 2.0, epsilon: 0.08\n",
"episode: 4385, score: 6.0, epsilon: 0.08\n",
"episode: 4390, score: 3.0, epsilon: 0.08\n",
"episode: 4395, score: 1.0, epsilon: 0.08\n",
"marking, episode: 4400, score: 1.0, mean_score: 2.31, std_score: 1.97\n",
"episode: 4400, score: 1.0, epsilon: 0.08\n",
"episode: 4405, score: 1.0, epsilon: 0.08\n",
"episode: 4410, score: 1.0, epsilon: 0.08\n",
"episode: 4415, score: 2.0, epsilon: 0.08\n",
"episode: 4420, score: 0.0, epsilon: 0.08\n",
"episode: 4425, score: 1.0, epsilon: 0.08\n",
"episode: 4430, score: 6.0, epsilon: 0.08\n",
"episode: 4435, score: 0.0, epsilon: 0.08\n",
"episode: 4440, score: 2.0, epsilon: 0.08\n",
"episode: 4445, score: 2.0, epsilon: 0.08\n",
"episode: 4450, score: 0.0, epsilon: 0.08\n",
"episode: 4455, score: 2.0, epsilon: 0.08\n",
"episode: 4460, score: 6.0, epsilon: 0.08\n",
"episode: 4465, score: 6.0, epsilon: 0.08\n",
"episode: 4470, score: 4.0, epsilon: 0.08\n",
"episode: 4475, score: 1.0, epsilon: 0.08\n",
"episode: 4480, score: 1.0, epsilon: 0.08\n",
"episode: 4485, score: 0.0, epsilon: 0.08\n",
"episode: 4490, score: 5.0, epsilon: 0.08\n",
"episode: 4495, score: 3.0, epsilon: 0.08\n",
"marking, episode: 4500, score: 1.0, mean_score: 2.48, std_score: 2.06\n",
"episode: 4500, score: 1.0, epsilon: 0.08\n",
"episode: 4505, score: 2.0, epsilon: 0.08\n",
"episode: 4510, score: 6.0, epsilon: 0.08\n",
"episode: 4515, score: 3.0, epsilon: 0.08\n",
"episode: 4520, score: 1.0, epsilon: 0.08\n",
"episode: 4525, score: 3.0, epsilon: 0.08\n",
"episode: 4530, score: 5.0, epsilon: 0.08\n",
"episode: 4535, score: 3.0, epsilon: 0.08\n",
"episode: 4540, score: 4.0, epsilon: 0.08\n",
"episode: 4545, score: 0.0, epsilon: 0.08\n",
"episode: 4550, score: 1.0, epsilon: 0.08\n",
"episode: 4555, score: 6.0, epsilon: 0.08\n",
"episode: 4560, score: 2.0, epsilon: 0.08\n",
"episode: 4565, score: 6.0, epsilon: 0.08\n",
"episode: 4570, score: 1.0, epsilon: 0.08\n",
"episode: 4575, score: 5.0, epsilon: 0.08\n",
"episode: 4580, score: 6.0, epsilon: 0.08\n",
"episode: 4585, score: 6.0, epsilon: 0.08\n",
"episode: 4590, score: 6.0, epsilon: 0.08\n",
"episode: 4595, score: 1.0, epsilon: 0.08\n",
"marking, episode: 4600, score: 6.0, mean_score: 2.18, std_score: 2.05\n",
"episode: 4600, score: 6.0, epsilon: 0.08\n",
"episode: 4605, score: 1.0, epsilon: 0.08\n",
"episode: 4610, score: 2.0, epsilon: 0.08\n",
"episode: 4615, score: 1.0, epsilon: 0.08\n",
"episode: 4620, score: 0.0, epsilon: 0.08\n",
"episode: 4625, score: 2.0, epsilon: 0.08\n",
"episode: 4630, score: 0.0, epsilon: 0.08\n",
"episode: 4635, score: 0.0, epsilon: 0.08\n",
"episode: 4640, score: 6.0, epsilon: 0.08\n",
"episode: 4645, score: 5.0, epsilon: 0.08\n",
"episode: 4650, score: 5.0, epsilon: 0.08\n",
"episode: 4655, score: 6.0, epsilon: 0.08\n",
"episode: 4660, score: 6.0, epsilon: 0.08\n",
"episode: 4665, score: 2.0, epsilon: 0.08\n",
"episode: 4670, score: 4.0, epsilon: 0.08\n",
"episode: 4675, score: 5.0, epsilon: 0.08\n",
"episode: 4680, score: 0.0, epsilon: 0.08\n",
"episode: 4685, score: 4.0, epsilon: 0.08\n",
"episode: 4690, score: 0.0, epsilon: 0.08\n",
"episode: 4695, score: 4.0, epsilon: 0.08\n",
"marking, episode: 4700, score: 2.0, mean_score: 2.02, std_score: 2.02\n",
"episode: 4700, score: 2.0, epsilon: 0.08\n",
"episode: 4705, score: 1.0, epsilon: 0.08\n",
"episode: 4710, score: 5.0, epsilon: 0.08\n",
"episode: 4715, score: 0.0, epsilon: 0.08\n",
"episode: 4720, score: 4.0, epsilon: 0.08\n",
"episode: 4725, score: 6.0, epsilon: 0.08\n",
"episode: 4730, score: 1.0, epsilon: 0.08\n",
"episode: 4735, score: 1.0, epsilon: 0.08\n",
"episode: 4740, score: 3.0, epsilon: 0.08\n",
"episode: 4745, score: 0.0, epsilon: 0.08\n",
"episode: 4750, score: 3.0, epsilon: 0.08\n",
"episode: 4755, score: 1.0, epsilon: 0.08\n",
"episode: 4760, score: 3.0, epsilon: 0.08\n",
"episode: 4765, score: 1.0, epsilon: 0.08\n",
"episode: 4770, score: 1.0, epsilon: 0.08\n",
"episode: 4775, score: 1.0, epsilon: 0.08\n",
"episode: 4780, score: 1.0, epsilon: 0.08\n",
"episode: 4785, score: 3.0, epsilon: 0.08\n",
"episode: 4790, score: 4.0, epsilon: 0.08\n",
"episode: 4795, score: 4.0, epsilon: 0.08\n",
"marking, episode: 4800, score: 0.0, mean_score: 1.92, std_score: 1.82\n",
"episode: 4800, score: 0.0, epsilon: 0.08\n",
"episode: 4805, score: 6.0, epsilon: 0.08\n",
"episode: 4810, score: 5.0, epsilon: 0.08\n",
"episode: 4815, score: 0.0, epsilon: 0.08\n",
"episode: 4820, score: 6.0, epsilon: 0.08\n",
"episode: 4825, score: 2.0, epsilon: 0.08\n",
"episode: 4830, score: 6.0, epsilon: 0.08\n",
"episode: 4835, score: 0.0, epsilon: 0.08\n",
"episode: 4840, score: 6.0, epsilon: 0.08\n",
"episode: 4845, score: 0.0, epsilon: 0.08\n",
"episode: 4850, score: 5.0, epsilon: 0.08\n",
"episode: 4855, score: 3.0, epsilon: 0.08\n",
"episode: 4860, score: 0.0, epsilon: 0.08\n",
"episode: 4865, score: 0.0, epsilon: 0.08\n",
"episode: 4870, score: 6.0, epsilon: 0.08\n",
"episode: 4875, score: 1.0, epsilon: 0.08\n",
"episode: 4880, score: 3.0, epsilon: 0.08\n",
"episode: 4885, score: 6.0, epsilon: 0.08\n",
"episode: 4890, score: 0.0, epsilon: 0.08\n",
"episode: 4895, score: 4.0, epsilon: 0.08\n",
"marking, episode: 4900, score: 2.0, mean_score: 2.49, std_score: 2.07\n",
"episode: 4900, score: 2.0, epsilon: 0.08\n",
"episode: 4905, score: 0.0, epsilon: 0.08\n",
"episode: 4910, score: 5.0, epsilon: 0.08\n",
"episode: 4915, score: 6.0, epsilon: 0.08\n",
"episode: 4920, score: 0.0, epsilon: 0.08\n",
"episode: 4925, score: 0.0, epsilon: 0.08\n",
"episode: 4930, score: 3.0, epsilon: 0.08\n",
"episode: 4935, score: 0.0, epsilon: 0.08\n",
"episode: 4940, score: 0.0, epsilon: 0.08\n",
"episode: 4945, score: 3.0, epsilon: 0.08\n",
"episode: 4950, score: 1.0, epsilon: 0.08\n",
"episode: 4955, score: 0.0, epsilon: 0.08\n",
"episode: 4960, score: 2.0, epsilon: 0.08\n",
"episode: 4965, score: 0.0, epsilon: 0.08\n",
"episode: 4970, score: 6.0, epsilon: 0.08\n",
"episode: 4975, score: 1.0, epsilon: 0.08\n",
"episode: 4980, score: 1.0, epsilon: 0.08\n",
"episode: 4985, score: 4.0, epsilon: 0.08\n",
"episode: 4990, score: 3.0, epsilon: 0.08\n",
"episode: 4995, score: 1.0, epsilon: 0.08\n",
"marking, episode: 5000, score: 5.0, mean_score: 1.51, std_score: 1.74\n",
"episode: 5000, score: 5.0, epsilon: 0.08\n",
"episode: 5005, score: 1.0, epsilon: 0.08\n",
"episode: 5010, score: 0.0, epsilon: 0.08\n",
"episode: 5015, score: 2.0, epsilon: 0.08\n",
"episode: 5020, score: 2.0, epsilon: 0.08\n",
"episode: 5025, score: 5.0, epsilon: 0.08\n",
"episode: 5030, score: 3.0, epsilon: 0.08\n",
"episode: 5035, score: 1.0, epsilon: 0.08\n",
"episode: 5040, score: 0.0, epsilon: 0.08\n",
"episode: 5045, score: 3.0, epsilon: 0.08\n",
"episode: 5050, score: 1.0, epsilon: 0.08\n",
"episode: 5055, score: 6.0, epsilon: 0.08\n",
"episode: 5060, score: 0.0, epsilon: 0.08\n",
"episode: 5065, score: 5.0, epsilon: 0.08\n",
"episode: 5070, score: 6.0, epsilon: 0.08\n",
"episode: 5075, score: 1.0, epsilon: 0.08\n",
"episode: 5080, score: 0.0, epsilon: 0.08\n",
"episode: 5085, score: 5.0, epsilon: 0.08\n",
"episode: 5090, score: 2.0, epsilon: 0.08\n",
"episode: 5095, score: 6.0, epsilon: 0.08\n",
"marking, episode: 5100, score: 1.0, mean_score: 2.85, std_score: 2.23\n",
"episode: 5100, score: 1.0, epsilon: 0.08\n",
"episode: 5105, score: 0.0, epsilon: 0.08\n",
"episode: 5110, score: 3.0, epsilon: 0.08\n",
"episode: 5115, score: 1.0, epsilon: 0.08\n",
"episode: 5120, score: 1.0, epsilon: 0.08\n",
"episode: 5125, score: 1.0, epsilon: 0.08\n",
"episode: 5130, score: 0.0, epsilon: 0.08\n",
"episode: 5135, score: 4.0, epsilon: 0.08\n",
"episode: 5140, score: 3.0, epsilon: 0.08\n",
"episode: 5145, score: 4.0, epsilon: 0.08\n",
"episode: 5150, score: 0.0, epsilon: 0.08\n",
"episode: 5155, score: 3.0, epsilon: 0.08\n",
"episode: 5160, score: 0.0, epsilon: 0.08\n",
"episode: 5165, score: 0.0, epsilon: 0.08\n",
"episode: 5170, score: 3.0, epsilon: 0.08\n",
"episode: 5175, score: 0.0, epsilon: 0.08\n",
"episode: 5180, score: 2.0, epsilon: 0.08\n",
"episode: 5185, score: 0.0, epsilon: 0.08\n",
"episode: 5190, score: 0.0, epsilon: 0.08\n",
"episode: 5195, score: 1.0, epsilon: 0.08\n",
"marking, episode: 5200, score: 3.0, mean_score: 2.12, std_score: 2.01\n",
"episode: 5200, score: 3.0, epsilon: 0.08\n",
"episode: 5205, score: 3.0, epsilon: 0.08\n",
"episode: 5210, score: 3.0, epsilon: 0.08\n",
"episode: 5215, score: 1.0, epsilon: 0.08\n",
"episode: 5220, score: 3.0, epsilon: 0.08\n",
"episode: 5225, score: 1.0, epsilon: 0.08\n",
"episode: 5230, score: 4.0, epsilon: 0.08\n",
"episode: 5235, score: 0.0, epsilon: 0.08\n",
"episode: 5240, score: 2.0, epsilon: 0.08\n",
"episode: 5245, score: 0.0, epsilon: 0.08\n",
"episode: 5250, score: 0.0, epsilon: 0.08\n",
"episode: 5255, score: 3.0, epsilon: 0.08\n",
"episode: 5260, score: 3.0, epsilon: 0.08\n",
"episode: 5265, score: 6.0, epsilon: 0.08\n",
"episode: 5270, score: 0.0, epsilon: 0.08\n",
"episode: 5275, score: 4.0, epsilon: 0.08\n",
"episode: 5280, score: 3.0, epsilon: 0.08\n",
"episode: 5285, score: 2.0, epsilon: 0.08\n",
"episode: 5290, score: 0.0, epsilon: 0.08\n",
"episode: 5295, score: 3.0, epsilon: 0.08\n",
"marking, episode: 5300, score: 6.0, mean_score: 2.60, std_score: 2.03\n",
"episode: 5300, score: 6.0, epsilon: 0.08\n",
"episode: 5305, score: 2.0, epsilon: 0.08\n",
"episode: 5310, score: 2.0, epsilon: 0.08\n",
"episode: 5315, score: 6.0, epsilon: 0.08\n",
"episode: 5320, score: 6.0, epsilon: 0.08\n",
"episode: 5325, score: 0.0, epsilon: 0.08\n",
"episode: 5330, score: 1.0, epsilon: 0.08\n",
"episode: 5335, score: 2.0, epsilon: 0.08\n",
"episode: 5340, score: 2.0, epsilon: 0.08\n",
"episode: 5345, score: 6.0, epsilon: 0.08\n",
"episode: 5350, score: 1.0, epsilon: 0.08\n",
"episode: 5355, score: 0.0, epsilon: 0.08\n",
"episode: 5360, score: 5.0, epsilon: 0.08\n",
"episode: 5365, score: 2.0, epsilon: 0.08\n",
"episode: 5370, score: 2.0, epsilon: 0.08\n",
"episode: 5375, score: 3.0, epsilon: 0.08\n",
"episode: 5380, score: 2.0, epsilon: 0.08\n",
"episode: 5385, score: 2.0, epsilon: 0.08\n",
"episode: 5390, score: 3.0, epsilon: 0.08\n",
"episode: 5395, score: 5.0, epsilon: 0.08\n",
"marking, episode: 5400, score: 1.0, mean_score: 2.39, std_score: 1.95\n",
"episode: 5400, score: 1.0, epsilon: 0.08\n",
"episode: 5405, score: 2.0, epsilon: 0.08\n",
"episode: 5410, score: 2.0, epsilon: 0.08\n",
"episode: 5415, score: 5.0, epsilon: 0.08\n",
"episode: 5420, score: 5.0, epsilon: 0.08\n",
"episode: 5425, score: 0.0, epsilon: 0.08\n",
"episode: 5430, score: 3.0, epsilon: 0.08\n",
"episode: 5435, score: 1.0, epsilon: 0.08\n",
"episode: 5440, score: 3.0, epsilon: 0.08\n",
"episode: 5445, score: 0.0, epsilon: 0.08\n",
"episode: 5450, score: 3.0, epsilon: 0.08\n",
"episode: 5455, score: 1.0, epsilon: 0.08\n",
"episode: 5460, score: 1.0, epsilon: 0.08\n",
"episode: 5465, score: 2.0, epsilon: 0.08\n",
"episode: 5470, score: 2.0, epsilon: 0.08\n",
"episode: 5475, score: 1.0, epsilon: 0.08\n",
"episode: 5480, score: 3.0, epsilon: 0.08\n",
"episode: 5485, score: 1.0, epsilon: 0.08\n",
"episode: 5490, score: 1.0, epsilon: 0.08\n",
"episode: 5495, score: 2.0, epsilon: 0.08\n",
"marking, episode: 5500, score: 0.0, mean_score: 2.23, std_score: 1.83\n",
"episode: 5500, score: 0.0, epsilon: 0.08\n",
"episode: 5505, score: 3.0, epsilon: 0.08\n",
"episode: 5510, score: 5.0, epsilon: 0.08\n",
"episode: 5515, score: 0.0, epsilon: 0.08\n",
"episode: 5520, score: 0.0, epsilon: 0.08\n",
"episode: 5525, score: 2.0, epsilon: 0.08\n",
"episode: 5530, score: 0.0, epsilon: 0.08\n",
"episode: 5535, score: 6.0, epsilon: 0.08\n",
"episode: 5540, score: 0.0, epsilon: 0.08\n",
"episode: 5545, score: 2.0, epsilon: 0.08\n",
"episode: 5550, score: 0.0, epsilon: 0.08\n",
"episode: 5555, score: 1.0, epsilon: 0.08\n",
"episode: 5560, score: 0.0, epsilon: 0.08\n",
"episode: 5565, score: 3.0, epsilon: 0.08\n",
"episode: 5570, score: 6.0, epsilon: 0.08\n",
"episode: 5575, score: 1.0, epsilon: 0.08\n",
"episode: 5580, score: 2.0, epsilon: 0.08\n",
"episode: 5585, score: 6.0, epsilon: 0.08\n",
"episode: 5590, score: 1.0, epsilon: 0.08\n",
"episode: 5595, score: 5.0, epsilon: 0.08\n",
"marking, episode: 5600, score: 0.0, mean_score: 1.88, std_score: 1.94\n",
"episode: 5600, score: 0.0, epsilon: 0.08\n",
"episode: 5605, score: 0.0, epsilon: 0.08\n",
"episode: 5610, score: 4.0, epsilon: 0.08\n",
"episode: 5615, score: 6.0, epsilon: 0.08\n",
"episode: 5620, score: 4.0, epsilon: 0.08\n",
"episode: 5625, score: 0.0, epsilon: 0.08\n",
"episode: 5630, score: 3.0, epsilon: 0.08\n",
"episode: 5635, score: 6.0, epsilon: 0.08\n",
"episode: 5640, score: 1.0, epsilon: 0.08\n",
"episode: 5645, score: 2.0, epsilon: 0.08\n",
"episode: 5650, score: 5.0, epsilon: 0.08\n",
"episode: 5655, score: 1.0, epsilon: 0.08\n",
"episode: 5660, score: 2.0, epsilon: 0.08\n",
"episode: 5665, score: 0.0, epsilon: 0.08\n",
"episode: 5670, score: 1.0, epsilon: 0.08\n",
"episode: 5675, score: 2.0, epsilon: 0.08\n",
"episode: 5680, score: 0.0, epsilon: 0.08\n",
"episode: 5685, score: 3.0, epsilon: 0.08\n",
"episode: 5690, score: 2.0, epsilon: 0.08\n",
"episode: 5695, score: 3.0, epsilon: 0.08\n",
"marking, episode: 5700, score: 4.0, mean_score: 1.90, std_score: 1.82\n",
"episode: 5700, score: 4.0, epsilon: 0.08\n",
"episode: 5705, score: 1.0, epsilon: 0.08\n",
"episode: 5710, score: 1.0, epsilon: 0.08\n",
"episode: 5715, score: 2.0, epsilon: 0.08\n",
"episode: 5720, score: 2.0, epsilon: 0.08\n",
"episode: 5725, score: 2.0, epsilon: 0.08\n",
"episode: 5730, score: 4.0, epsilon: 0.08\n",
"episode: 5735, score: 0.0, epsilon: 0.08\n",
"episode: 5740, score: 0.0, epsilon: 0.08\n",
"episode: 5745, score: 3.0, epsilon: 0.08\n",
"episode: 5750, score: 0.0, epsilon: 0.08\n",
"episode: 5755, score: 0.0, epsilon: 0.08\n",
"episode: 5760, score: 2.0, epsilon: 0.08\n",
"episode: 5765, score: 4.0, epsilon: 0.08\n",
"episode: 5770, score: 3.0, epsilon: 0.08\n",
"episode: 5775, score: 3.0, epsilon: 0.08\n",
"episode: 5780, score: 0.0, epsilon: 0.08\n",
"episode: 5785, score: 6.0, epsilon: 0.08\n",
"episode: 5790, score: 2.0, epsilon: 0.08\n",
"episode: 5795, score: 1.0, epsilon: 0.08\n",
"marking, episode: 5800, score: 1.0, mean_score: 2.20, std_score: 1.98\n",
"episode: 5800, score: 1.0, epsilon: 0.08\n",
"episode: 5805, score: 1.0, epsilon: 0.08\n",
"episode: 5810, score: 6.0, epsilon: 0.08\n",
"episode: 5815, score: 0.0, epsilon: 0.08\n",
"episode: 5820, score: 6.0, epsilon: 0.08\n",
"episode: 5825, score: 0.0, epsilon: 0.08\n",
"episode: 5830, score: 6.0, epsilon: 0.08\n",
"episode: 5835, score: 1.0, epsilon: 0.08\n",
"episode: 5840, score: 6.0, epsilon: 0.08\n",
"episode: 5845, score: 0.0, epsilon: 0.08\n",
"episode: 5850, score: 4.0, epsilon: 0.08\n",
"episode: 5855, score: 6.0, epsilon: 0.08\n",
"episode: 5860, score: 1.0, epsilon: 0.08\n",
"episode: 5865, score: 2.0, epsilon: 0.08\n",
"episode: 5870, score: 1.0, epsilon: 0.08\n",
"episode: 5875, score: 5.0, epsilon: 0.08\n",
"episode: 5880, score: 2.0, epsilon: 0.08\n",
"episode: 5885, score: 6.0, epsilon: 0.08\n",
"episode: 5890, score: 2.0, epsilon: 0.08\n",
"episode: 5895, score: 0.0, epsilon: 0.08\n",
"marking, episode: 5900, score: 1.0, mean_score: 2.44, std_score: 2.16\n",
"episode: 5900, score: 1.0, epsilon: 0.08\n",
"episode: 5905, score: 0.0, epsilon: 0.08\n",
"episode: 5910, score: 1.0, epsilon: 0.08\n",
"episode: 5915, score: 3.0, epsilon: 0.08\n",
"episode: 5920, score: 6.0, epsilon: 0.08\n",
"episode: 5925, score: 0.0, epsilon: 0.08\n",
"episode: 5930, score: 1.0, epsilon: 0.08\n",
"episode: 5935, score: 6.0, epsilon: 0.08\n",
"episode: 5940, score: 1.0, epsilon: 0.08\n",
"episode: 5945, score: 0.0, epsilon: 0.08\n",
"episode: 5950, score: 1.0, epsilon: 0.08\n",
"episode: 5955, score: 0.0, epsilon: 0.08\n",
"episode: 5960, score: 0.0, epsilon: 0.08\n",
"episode: 5965, score: 2.0, epsilon: 0.08\n",
"episode: 5970, score: 0.0, epsilon: 0.08\n",
"episode: 5975, score: 1.0, epsilon: 0.08\n",
"episode: 5980, score: 2.0, epsilon: 0.08\n",
"episode: 5985, score: 4.0, epsilon: 0.08\n",
"episode: 5990, score: 1.0, epsilon: 0.08\n",
"episode: 5995, score: 6.0, epsilon: 0.08\n",
"marking, episode: 6000, score: 1.0, mean_score: 1.94, std_score: 1.85\n",
"episode: 6000, score: 1.0, epsilon: 0.08\n",
"episode: 6005, score: 1.0, epsilon: 0.08\n",
"episode: 6010, score: 3.0, epsilon: 0.08\n",
"episode: 6015, score: 0.0, epsilon: 0.08\n",
"episode: 6020, score: 1.0, epsilon: 0.08\n",
"episode: 6025, score: 1.0, epsilon: 0.08\n",
"episode: 6030, score: 5.0, epsilon: 0.08\n",
"episode: 6035, score: 4.0, epsilon: 0.08\n",
"episode: 6040, score: 1.0, epsilon: 0.08\n",
"episode: 6045, score: 2.0, epsilon: 0.08\n",
"episode: 6050, score: 3.0, epsilon: 0.08\n",
"episode: 6055, score: 0.0, epsilon: 0.08\n",
"episode: 6060, score: 6.0, epsilon: 0.08\n",
"episode: 6065, score: 1.0, epsilon: 0.08\n",
"episode: 6070, score: 5.0, epsilon: 0.08\n",
"episode: 6075, score: 4.0, epsilon: 0.08\n",
"episode: 6080, score: 0.0, epsilon: 0.08\n",
"episode: 6085, score: 0.0, epsilon: 0.08\n",
"episode: 6090, score: 2.0, epsilon: 0.08\n",
"episode: 6095, score: 6.0, epsilon: 0.08\n",
"marking, episode: 6100, score: 0.0, mean_score: 2.40, std_score: 2.12\n",
"episode: 6100, score: 0.0, epsilon: 0.08\n",
"episode: 6105, score: 1.0, epsilon: 0.08\n",
"episode: 6110, score: 6.0, epsilon: 0.08\n",
"episode: 6115, score: 4.0, epsilon: 0.08\n",
"episode: 6120, score: 1.0, epsilon: 0.08\n",
"episode: 6125, score: 2.0, epsilon: 0.08\n",
"episode: 6130, score: 2.0, epsilon: 0.08\n",
"episode: 6135, score: 0.0, epsilon: 0.08\n",
"episode: 6140, score: 1.0, epsilon: 0.08\n",
"episode: 6145, score: 2.0, epsilon: 0.08\n",
"episode: 6150, score: 2.0, epsilon: 0.08\n",
"episode: 6155, score: 2.0, epsilon: 0.08\n",
"episode: 6160, score: 4.0, epsilon: 0.08\n",
"episode: 6165, score: 0.0, epsilon: 0.08\n",
"episode: 6170, score: 0.0, epsilon: 0.08\n",
"episode: 6175, score: 6.0, epsilon: 0.08\n",
"episode: 6180, score: 1.0, epsilon: 0.08\n",
"episode: 6185, score: 1.0, epsilon: 0.08\n",
"episode: 6190, score: 6.0, epsilon: 0.08\n",
"episode: 6195, score: 2.0, epsilon: 0.08\n",
"marking, episode: 6200, score: 1.0, mean_score: 2.15, std_score: 2.01\n",
"episode: 6200, score: 1.0, epsilon: 0.08\n",
"episode: 6205, score: 0.0, epsilon: 0.08\n",
"episode: 6210, score: 0.0, epsilon: 0.08\n",
"episode: 6215, score: 0.0, epsilon: 0.08\n",
"episode: 6220, score: 6.0, epsilon: 0.08\n",
"episode: 6225, score: 6.0, epsilon: 0.08\n",
"episode: 6230, score: 0.0, epsilon: 0.08\n",
"episode: 6235, score: 4.0, epsilon: 0.08\n",
"episode: 6240, score: 0.0, epsilon: 0.08\n",
"episode: 6245, score: 6.0, epsilon: 0.08\n",
"episode: 6250, score: 3.0, epsilon: 0.08\n",
"episode: 6255, score: 6.0, epsilon: 0.08\n",
"episode: 6260, score: 4.0, epsilon: 0.08\n",
"episode: 6265, score: 1.0, epsilon: 0.08\n",
"episode: 6270, score: 1.0, epsilon: 0.08\n",
"episode: 6275, score: 6.0, epsilon: 0.08\n",
"episode: 6280, score: 1.0, epsilon: 0.08\n",
"episode: 6285, score: 2.0, epsilon: 0.08\n",
"episode: 6290, score: 3.0, epsilon: 0.08\n",
"episode: 6295, score: 2.0, epsilon: 0.08\n",
"marking, episode: 6300, score: 2.0, mean_score: 2.37, std_score: 2.12\n",
"episode: 6300, score: 2.0, epsilon: 0.08\n",
"episode: 6305, score: 1.0, epsilon: 0.08\n",
"episode: 6310, score: 0.0, epsilon: 0.08\n",
"episode: 6315, score: 3.0, epsilon: 0.08\n",
"episode: 6320, score: 3.0, epsilon: 0.08\n",
"episode: 6325, score: 0.0, epsilon: 0.08\n",
"episode: 6330, score: 6.0, epsilon: 0.08\n",
"episode: 6335, score: 4.0, epsilon: 0.08\n",
"episode: 6340, score: 0.0, epsilon: 0.08\n",
"episode: 6345, score: 6.0, epsilon: 0.08\n",
"episode: 6350, score: 1.0, epsilon: 0.08\n",
"episode: 6355, score: 1.0, epsilon: 0.08\n",
"episode: 6360, score: 6.0, epsilon: 0.08\n",
"episode: 6365, score: 0.0, epsilon: 0.08\n",
"episode: 6370, score: 4.0, epsilon: 0.08\n",
"episode: 6375, score: 0.0, epsilon: 0.08\n",
"episode: 6380, score: 0.0, epsilon: 0.08\n",
"episode: 6385, score: 1.0, epsilon: 0.08\n",
"episode: 6390, score: 0.0, epsilon: 0.08\n",
"episode: 6395, score: 3.0, epsilon: 0.08\n",
"marking, episode: 6400, score: 6.0, mean_score: 2.23, std_score: 2.06\n",
"episode: 6400, score: 6.0, epsilon: 0.08\n",
"episode: 6405, score: 1.0, epsilon: 0.08\n",
"episode: 6410, score: 1.0, epsilon: 0.08\n",
"episode: 6415, score: 0.0, epsilon: 0.08\n",
"episode: 6420, score: 6.0, epsilon: 0.08\n",
"episode: 6425, score: 0.0, epsilon: 0.08\n",
"episode: 6430, score: 3.0, epsilon: 0.08\n",
"episode: 6435, score: 4.0, epsilon: 0.08\n",
"episode: 6440, score: 5.0, epsilon: 0.08\n",
"episode: 6445, score: 6.0, epsilon: 0.08\n",
"episode: 6450, score: 6.0, epsilon: 0.08\n",
"episode: 6455, score: 2.0, epsilon: 0.08\n",
"episode: 6460, score: 0.0, epsilon: 0.08\n",
"episode: 6465, score: 0.0, epsilon: 0.08\n",
"episode: 6470, score: 2.0, epsilon: 0.08\n",
"episode: 6475, score: 4.0, epsilon: 0.08\n",
"episode: 6480, score: 3.0, epsilon: 0.08\n",
"episode: 6485, score: 3.0, epsilon: 0.08\n",
"episode: 6490, score: 4.0, epsilon: 0.08\n",
"episode: 6495, score: 1.0, epsilon: 0.08\n",
"marking, episode: 6500, score: 0.0, mean_score: 2.43, std_score: 2.08\n",
"episode: 6500, score: 0.0, epsilon: 0.08\n",
"episode: 6505, score: 6.0, epsilon: 0.08\n",
"episode: 6510, score: 0.0, epsilon: 0.08\n",
"episode: 6515, score: 1.0, epsilon: 0.08\n",
"episode: 6520, score: 1.0, epsilon: 0.08\n",
"episode: 6525, score: 2.0, epsilon: 0.08\n",
"episode: 6530, score: 1.0, epsilon: 0.08\n",
"episode: 6535, score: 4.0, epsilon: 0.08\n",
"episode: 6540, score: 0.0, epsilon: 0.08\n",
"episode: 6545, score: 2.0, epsilon: 0.08\n",
"episode: 6550, score: 4.0, epsilon: 0.08\n",
"episode: 6555, score: 2.0, epsilon: 0.08\n",
"episode: 6560, score: 5.0, epsilon: 0.08\n",
"episode: 6565, score: 5.0, epsilon: 0.08\n",
"episode: 6570, score: 1.0, epsilon: 0.08\n",
"episode: 6575, score: 0.0, epsilon: 0.08\n",
"episode: 6580, score: 0.0, epsilon: 0.08\n",
"episode: 6585, score: 1.0, epsilon: 0.08\n",
"episode: 6590, score: 2.0, epsilon: 0.08\n",
"episode: 6595, score: 2.0, epsilon: 0.08\n",
"marking, episode: 6600, score: 4.0, mean_score: 2.34, std_score: 2.13\n",
"episode: 6600, score: 4.0, epsilon: 0.08\n",
"episode: 6605, score: 1.0, epsilon: 0.08\n",
"episode: 6610, score: 2.0, epsilon: 0.08\n",
"episode: 6615, score: 0.0, epsilon: 0.08\n",
"episode: 6620, score: 3.0, epsilon: 0.08\n",
"episode: 6625, score: 0.0, epsilon: 0.08\n",
"episode: 6630, score: 2.0, epsilon: 0.08\n",
"episode: 6635, score: 0.0, epsilon: 0.08\n",
"episode: 6640, score: 1.0, epsilon: 0.08\n",
"episode: 6645, score: 6.0, epsilon: 0.08\n",
"episode: 6650, score: 2.0, epsilon: 0.08\n",
"episode: 6655, score: 2.0, epsilon: 0.08\n",
"episode: 6660, score: 0.0, epsilon: 0.08\n",
"episode: 6665, score: 6.0, epsilon: 0.08\n",
"episode: 6670, score: 4.0, epsilon: 0.08\n",
"episode: 6675, score: 0.0, epsilon: 0.08\n",
"episode: 6680, score: 0.0, epsilon: 0.08\n",
"episode: 6685, score: 3.0, epsilon: 0.08\n",
"episode: 6690, score: 0.0, epsilon: 0.08\n",
"episode: 6695, score: 0.0, epsilon: 0.08\n",
"marking, episode: 6700, score: 6.0, mean_score: 2.18, std_score: 2.03\n",
"episode: 6700, score: 6.0, epsilon: 0.08\n",
"episode: 6705, score: 0.0, epsilon: 0.08\n",
"episode: 6710, score: 3.0, epsilon: 0.08\n",
"episode: 6715, score: 3.0, epsilon: 0.08\n",
"episode: 6720, score: 6.0, epsilon: 0.08\n",
"episode: 6725, score: 3.0, epsilon: 0.08\n",
"episode: 6730, score: 0.0, epsilon: 0.08\n",
"episode: 6735, score: 0.0, epsilon: 0.08\n",
"episode: 6740, score: 4.0, epsilon: 0.08\n",
"episode: 6745, score: 0.0, epsilon: 0.08\n",
"episode: 6750, score: 5.0, epsilon: 0.08\n",
"episode: 6755, score: 2.0, epsilon: 0.08\n",
"episode: 6760, score: 1.0, epsilon: 0.08\n",
"episode: 6765, score: 0.0, epsilon: 0.08\n",
"episode: 6770, score: 6.0, epsilon: 0.08\n",
"episode: 6775, score: 1.0, epsilon: 0.08\n",
"episode: 6780, score: 3.0, epsilon: 0.08\n",
"episode: 6785, score: 0.0, epsilon: 0.08\n",
"episode: 6790, score: 6.0, epsilon: 0.08\n",
"episode: 6795, score: 0.0, epsilon: 0.08\n",
"marking, episode: 6800, score: 0.0, mean_score: 2.70, std_score: 2.16\n",
"episode: 6800, score: 0.0, epsilon: 0.08\n",
"episode: 6805, score: 1.0, epsilon: 0.08\n",
"episode: 6810, score: 2.0, epsilon: 0.08\n",
"episode: 6815, score: 0.0, epsilon: 0.08\n",
"episode: 6820, score: 0.0, epsilon: 0.08\n",
"episode: 6825, score: 0.0, epsilon: 0.08\n",
"episode: 6830, score: 0.0, epsilon: 0.08\n",
"episode: 6835, score: 0.0, epsilon: 0.08\n",
"episode: 6840, score: 3.0, epsilon: 0.08\n",
"episode: 6845, score: 4.0, epsilon: 0.08\n",
"episode: 6850, score: 2.0, epsilon: 0.08\n",
"episode: 6855, score: 1.0, epsilon: 0.08\n",
"episode: 6860, score: 0.0, epsilon: 0.08\n",
"episode: 6865, score: 1.0, epsilon: 0.08\n",
"episode: 6870, score: 1.0, epsilon: 0.08\n",
"episode: 6875, score: 2.0, epsilon: 0.08\n",
"episode: 6880, score: 6.0, epsilon: 0.08\n",
"episode: 6885, score: 6.0, epsilon: 0.08\n",
"episode: 6890, score: 6.0, epsilon: 0.08\n",
"episode: 6895, score: 1.0, epsilon: 0.08\n",
"marking, episode: 6900, score: 0.0, mean_score: 2.40, std_score: 2.19\n",
"episode: 6900, score: 0.0, epsilon: 0.08\n",
"episode: 6905, score: 1.0, epsilon: 0.08\n",
"episode: 6910, score: 1.0, epsilon: 0.08\n",
"episode: 6915, score: 1.0, epsilon: 0.08\n",
"episode: 6920, score: 4.0, epsilon: 0.08\n",
"episode: 6925, score: 3.0, epsilon: 0.08\n",
"episode: 6930, score: 0.0, epsilon: 0.08\n",
"episode: 6935, score: 3.0, epsilon: 0.08\n",
"episode: 6940, score: 6.0, epsilon: 0.08\n",
"episode: 6945, score: 6.0, epsilon: 0.08\n",
"episode: 6950, score: 6.0, epsilon: 0.08\n",
"episode: 6955, score: 0.0, epsilon: 0.08\n",
"episode: 6960, score: 2.0, epsilon: 0.08\n",
"episode: 6965, score: 0.0, epsilon: 0.08\n",
"episode: 6970, score: 6.0, epsilon: 0.08\n",
"episode: 6975, score: 4.0, epsilon: 0.08\n",
"episode: 6980, score: 3.0, epsilon: 0.08\n",
"episode: 6985, score: 0.0, epsilon: 0.08\n",
"episode: 6990, score: 1.0, epsilon: 0.08\n",
"episode: 6995, score: 0.0, epsilon: 0.08\n",
"marking, episode: 7000, score: 6.0, mean_score: 2.42, std_score: 2.09\n",
"episode: 7000, score: 6.0, epsilon: 0.08\n",
"episode: 7005, score: 1.0, epsilon: 0.08\n",
"episode: 7010, score: 3.0, epsilon: 0.08\n",
"episode: 7015, score: 4.0, epsilon: 0.08\n",
"episode: 7020, score: 0.0, epsilon: 0.08\n",
"episode: 7025, score: 2.0, epsilon: 0.08\n",
"episode: 7030, score: 0.0, epsilon: 0.08\n",
"episode: 7035, score: 3.0, epsilon: 0.08\n",
"episode: 7040, score: 2.0, epsilon: 0.08\n",
"episode: 7045, score: 2.0, epsilon: 0.08\n",
"episode: 7050, score: 3.0, epsilon: 0.08\n",
"episode: 7055, score: 5.0, epsilon: 0.08\n",
"episode: 7060, score: 1.0, epsilon: 0.08\n",
"episode: 7065, score: 0.0, epsilon: 0.08\n",
"episode: 7070, score: 1.0, epsilon: 0.08\n",
"episode: 7075, score: 3.0, epsilon: 0.08\n",
"episode: 7080, score: 0.0, epsilon: 0.08\n",
"episode: 7085, score: 6.0, epsilon: 0.08\n",
"episode: 7090, score: 0.0, epsilon: 0.08\n",
"episode: 7095, score: 1.0, epsilon: 0.08\n",
"marking, episode: 7100, score: 6.0, mean_score: 2.52, std_score: 2.03\n",
"episode: 7100, score: 6.0, epsilon: 0.08\n",
"episode: 7105, score: 0.0, epsilon: 0.08\n",
"episode: 7110, score: 6.0, epsilon: 0.08\n",
"episode: 7115, score: 1.0, epsilon: 0.08\n",
"episode: 7120, score: 0.0, epsilon: 0.08\n",
"episode: 7125, score: 2.0, epsilon: 0.08\n",
"episode: 7130, score: 0.0, epsilon: 0.08\n",
"episode: 7135, score: 1.0, epsilon: 0.08\n",
"episode: 7140, score: 0.0, epsilon: 0.08\n",
"episode: 7145, score: 5.0, epsilon: 0.08\n",
"episode: 7150, score: 6.0, epsilon: 0.08\n",
"episode: 7155, score: 6.0, epsilon: 0.08\n",
"episode: 7160, score: 0.0, epsilon: 0.08\n",
"episode: 7165, score: 6.0, epsilon: 0.08\n",
"episode: 7170, score: 6.0, epsilon: 0.08\n",
"episode: 7175, score: 0.0, epsilon: 0.08\n",
"episode: 7180, score: 5.0, epsilon: 0.08\n",
"episode: 7185, score: 6.0, epsilon: 0.08\n",
"episode: 7190, score: 1.0, epsilon: 0.08\n",
"episode: 7195, score: 1.0, epsilon: 0.08\n",
"marking, episode: 7200, score: 0.0, mean_score: 2.45, std_score: 2.21\n",
"episode: 7200, score: 0.0, epsilon: 0.08\n",
"episode: 7205, score: 6.0, epsilon: 0.08\n",
"episode: 7210, score: 1.0, epsilon: 0.08\n",
"episode: 7215, score: 1.0, epsilon: 0.08\n",
"episode: 7220, score: 1.0, epsilon: 0.08\n",
"episode: 7225, score: 1.0, epsilon: 0.08\n",
"episode: 7230, score: 1.0, epsilon: 0.08\n",
"episode: 7235, score: 6.0, epsilon: 0.08\n",
"episode: 7240, score: 0.0, epsilon: 0.08\n",
"episode: 7245, score: 0.0, epsilon: 0.08\n",
"episode: 7250, score: 3.0, epsilon: 0.08\n",
"episode: 7255, score: 1.0, epsilon: 0.08\n",
"episode: 7260, score: 1.0, epsilon: 0.08\n",
"episode: 7265, score: 0.0, epsilon: 0.08\n",
"episode: 7270, score: 3.0, epsilon: 0.08\n",
"episode: 7275, score: 0.0, epsilon: 0.08\n",
"episode: 7280, score: 1.0, epsilon: 0.08\n",
"episode: 7285, score: 6.0, epsilon: 0.08\n",
"episode: 7290, score: 1.0, epsilon: 0.08\n",
"episode: 7295, score: 6.0, epsilon: 0.08\n",
"marking, episode: 7300, score: 1.0, mean_score: 2.20, std_score: 2.11\n",
"episode: 7300, score: 1.0, epsilon: 0.08\n",
"episode: 7305, score: 2.0, epsilon: 0.08\n",
"episode: 7310, score: 1.0, epsilon: 0.08\n",
"episode: 7315, score: 2.0, epsilon: 0.08\n",
"episode: 7320, score: 0.0, epsilon: 0.08\n",
"episode: 7325, score: 5.0, epsilon: 0.08\n",
"episode: 7330, score: 6.0, epsilon: 0.08\n",
"episode: 7335, score: 2.0, epsilon: 0.08\n",
"episode: 7340, score: 0.0, epsilon: 0.08\n",
"episode: 7345, score: 0.0, epsilon: 0.08\n",
"episode: 7350, score: 1.0, epsilon: 0.08\n",
"episode: 7355, score: 0.0, epsilon: 0.08\n",
"episode: 7360, score: 0.0, epsilon: 0.08\n",
"episode: 7365, score: 1.0, epsilon: 0.08\n",
"episode: 7370, score: 6.0, epsilon: 0.08\n",
"episode: 7375, score: 0.0, epsilon: 0.08\n",
"episode: 7380, score: 3.0, epsilon: 0.08\n",
"episode: 7385, score: 0.0, epsilon: 0.08\n",
"episode: 7390, score: 1.0, epsilon: 0.08\n",
"episode: 7395, score: 3.0, epsilon: 0.08\n",
"marking, episode: 7400, score: 6.0, mean_score: 1.91, std_score: 1.93\n",
"episode: 7400, score: 6.0, epsilon: 0.08\n",
"episode: 7405, score: 0.0, epsilon: 0.08\n",
"episode: 7410, score: 4.0, epsilon: 0.08\n",
"episode: 7415, score: 2.0, epsilon: 0.08\n",
"episode: 7420, score: 0.0, epsilon: 0.08\n",
"episode: 7425, score: 4.0, epsilon: 0.08\n",
"episode: 7430, score: 0.0, epsilon: 0.08\n",
"episode: 7435, score: 0.0, epsilon: 0.08\n",
"episode: 7440, score: 6.0, epsilon: 0.08\n",
"episode: 7445, score: 2.0, epsilon: 0.08\n",
"episode: 7450, score: 5.0, epsilon: 0.08\n",
"episode: 7455, score: 2.0, epsilon: 0.08\n",
"episode: 7460, score: 6.0, epsilon: 0.08\n",
"episode: 7465, score: 6.0, epsilon: 0.08\n",
"episode: 7470, score: 5.0, epsilon: 0.08\n",
"episode: 7475, score: 6.0, epsilon: 0.08\n",
"episode: 7480, score: 6.0, epsilon: 0.08\n",
"episode: 7485, score: 0.0, epsilon: 0.08\n",
"episode: 7490, score: 0.0, epsilon: 0.08\n",
"episode: 7495, score: 1.0, epsilon: 0.08\n",
"marking, episode: 7500, score: 2.0, mean_score: 2.49, std_score: 2.21\n",
"episode: 7500, score: 2.0, epsilon: 0.08\n",
"episode: 7505, score: 6.0, epsilon: 0.08\n",
"episode: 7510, score: 3.0, epsilon: 0.08\n",
"episode: 7515, score: 5.0, epsilon: 0.08\n",
"episode: 7520, score: 6.0, epsilon: 0.08\n",
"episode: 7525, score: 0.0, epsilon: 0.08\n",
"episode: 7530, score: 2.0, epsilon: 0.08\n",
"episode: 7535, score: 2.0, epsilon: 0.08\n",
"episode: 7540, score: 2.0, epsilon: 0.08\n",
"episode: 7545, score: 0.0, epsilon: 0.08\n",
"episode: 7550, score: 5.0, epsilon: 0.08\n",
"episode: 7555, score: 3.0, epsilon: 0.08\n",
"episode: 7560, score: 6.0, epsilon: 0.08\n",
"episode: 7565, score: 0.0, epsilon: 0.08\n",
"episode: 7570, score: 1.0, epsilon: 0.08\n",
"episode: 7575, score: 4.0, epsilon: 0.08\n",
"episode: 7580, score: 1.0, epsilon: 0.08\n",
"episode: 7585, score: 0.0, epsilon: 0.08\n",
"episode: 7590, score: 2.0, epsilon: 0.08\n",
"episode: 7595, score: 3.0, epsilon: 0.08\n",
"marking, episode: 7600, score: 2.0, mean_score: 2.29, std_score: 1.98\n",
"episode: 7600, score: 2.0, epsilon: 0.08\n",
"episode: 7605, score: 1.0, epsilon: 0.08\n",
"episode: 7610, score: 1.0, epsilon: 0.08\n",
"episode: 7615, score: 1.0, epsilon: 0.08\n",
"episode: 7620, score: 0.0, epsilon: 0.08\n",
"episode: 7625, score: 1.0, epsilon: 0.08\n",
"episode: 7630, score: 0.0, epsilon: 0.08\n",
"episode: 7635, score: 0.0, epsilon: 0.08\n",
"episode: 7640, score: 4.0, epsilon: 0.08\n",
"episode: 7645, score: 1.0, epsilon: 0.08\n",
"episode: 7650, score: 2.0, epsilon: 0.08\n",
"episode: 7655, score: 2.0, epsilon: 0.08\n",
"episode: 7660, score: 3.0, epsilon: 0.08\n",
"episode: 7665, score: 1.0, epsilon: 0.08\n",
"episode: 7670, score: 6.0, epsilon: 0.08\n",
"episode: 7675, score: 6.0, epsilon: 0.08\n",
"episode: 7680, score: 0.0, epsilon: 0.08\n",
"episode: 7685, score: 1.0, epsilon: 0.08\n",
"episode: 7690, score: 0.0, epsilon: 0.08\n",
"episode: 7695, score: 0.0, epsilon: 0.08\n",
"marking, episode: 7700, score: 1.0, mean_score: 2.47, std_score: 2.01\n",
"episode: 7700, score: 1.0, epsilon: 0.08\n",
"episode: 7705, score: 4.0, epsilon: 0.08\n",
"episode: 7710, score: 3.0, epsilon: 0.08\n",
"episode: 7715, score: 3.0, epsilon: 0.08\n",
"episode: 7720, score: 1.0, epsilon: 0.08\n",
"episode: 7725, score: 0.0, epsilon: 0.08\n",
"episode: 7730, score: 3.0, epsilon: 0.08\n",
"episode: 7735, score: 0.0, epsilon: 0.08\n",
"episode: 7740, score: 0.0, epsilon: 0.08\n",
"episode: 7745, score: 2.0, epsilon: 0.08\n",
"episode: 7750, score: 1.0, epsilon: 0.08\n",
"episode: 7755, score: 0.0, epsilon: 0.08\n",
"episode: 7760, score: 3.0, epsilon: 0.08\n",
"episode: 7765, score: 6.0, epsilon: 0.08\n",
"episode: 7770, score: 1.0, epsilon: 0.08\n",
"episode: 7775, score: 2.0, epsilon: 0.08\n",
"episode: 7780, score: 2.0, epsilon: 0.08\n",
"episode: 7785, score: 3.0, epsilon: 0.08\n",
"episode: 7790, score: 4.0, epsilon: 0.08\n",
"episode: 7795, score: 0.0, epsilon: 0.08\n",
"marking, episode: 7800, score: 3.0, mean_score: 2.25, std_score: 1.88\n",
"episode: 7800, score: 3.0, epsilon: 0.08\n",
"episode: 7805, score: 1.0, epsilon: 0.08\n",
"episode: 7810, score: 1.0, epsilon: 0.08\n",
"episode: 7815, score: 2.0, epsilon: 0.08\n",
"episode: 7820, score: 3.0, epsilon: 0.08\n",
"episode: 7825, score: 2.0, epsilon: 0.08\n",
"episode: 7830, score: 1.0, epsilon: 0.08\n",
"episode: 7835, score: 1.0, epsilon: 0.08\n",
"episode: 7840, score: 1.0, epsilon: 0.08\n",
"episode: 7845, score: 0.0, epsilon: 0.08\n",
"episode: 7850, score: 0.0, epsilon: 0.08\n",
"episode: 7855, score: 2.0, epsilon: 0.08\n",
"episode: 7860, score: 0.0, epsilon: 0.08\n",
"episode: 7865, score: 2.0, epsilon: 0.08\n",
"episode: 7870, score: 2.0, epsilon: 0.08\n",
"episode: 7875, score: 6.0, epsilon: 0.08\n",
"episode: 7880, score: 3.0, epsilon: 0.08\n",
"episode: 7885, score: 6.0, epsilon: 0.08\n",
"episode: 7890, score: 4.0, epsilon: 0.08\n",
"episode: 7895, score: 0.0, epsilon: 0.08\n",
"marking, episode: 7900, score: 0.0, mean_score: 2.13, std_score: 1.96\n",
"episode: 7900, score: 0.0, epsilon: 0.08\n",
"episode: 7905, score: 1.0, epsilon: 0.08\n",
"episode: 7910, score: 5.0, epsilon: 0.08\n",
"episode: 7915, score: 1.0, epsilon: 0.08\n",
"episode: 7920, score: 0.0, epsilon: 0.08\n",
"episode: 7925, score: 0.0, epsilon: 0.08\n",
"episode: 7930, score: 4.0, epsilon: 0.08\n",
"episode: 7935, score: 1.0, epsilon: 0.08\n",
"episode: 7940, score: 3.0, epsilon: 0.08\n",
"episode: 7945, score: 3.0, epsilon: 0.08\n",
"episode: 7950, score: 6.0, epsilon: 0.08\n",
"episode: 7955, score: 3.0, epsilon: 0.08\n",
"episode: 7960, score: 0.0, epsilon: 0.08\n",
"episode: 7965, score: 2.0, epsilon: 0.08\n",
"episode: 7970, score: 2.0, epsilon: 0.08\n",
"episode: 7975, score: 6.0, epsilon: 0.08\n",
"episode: 7980, score: 6.0, epsilon: 0.08\n",
"episode: 7985, score: 6.0, epsilon: 0.08\n",
"episode: 7990, score: 3.0, epsilon: 0.08\n",
"episode: 7995, score: 2.0, epsilon: 0.08\n",
"marking, episode: 8000, score: 0.0, mean_score: 2.37, std_score: 2.02\n",
"episode: 8000, score: 0.0, epsilon: 0.08\n",
"episode: 8005, score: 3.0, epsilon: 0.08\n",
"episode: 8010, score: 0.0, epsilon: 0.08\n",
"episode: 8015, score: 0.0, epsilon: 0.08\n",
"episode: 8020, score: 3.0, epsilon: 0.08\n",
"episode: 8025, score: 5.0, epsilon: 0.08\n",
"episode: 8030, score: 6.0, epsilon: 0.08\n",
"episode: 8035, score: 1.0, epsilon: 0.08\n",
"episode: 8040, score: 5.0, epsilon: 0.08\n",
"episode: 8045, score: 5.0, epsilon: 0.08\n",
"episode: 8050, score: 0.0, epsilon: 0.08\n",
"episode: 8055, score: 4.0, epsilon: 0.08\n",
"episode: 8060, score: 3.0, epsilon: 0.08\n",
"episode: 8065, score: 5.0, epsilon: 0.08\n",
"episode: 8070, score: 0.0, epsilon: 0.08\n",
"episode: 8075, score: 0.0, epsilon: 0.08\n",
"episode: 8080, score: 3.0, epsilon: 0.08\n",
"episode: 8085, score: 6.0, epsilon: 0.08\n",
"episode: 8090, score: 1.0, epsilon: 0.08\n",
"episode: 8095, score: 1.0, epsilon: 0.08\n",
"marking, episode: 8100, score: 1.0, mean_score: 2.16, std_score: 2.00\n",
"episode: 8100, score: 1.0, epsilon: 0.08\n",
"episode: 8105, score: 0.0, epsilon: 0.08\n",
"episode: 8110, score: 0.0, epsilon: 0.08\n",
"episode: 8115, score: 2.0, epsilon: 0.08\n",
"episode: 8120, score: 1.0, epsilon: 0.08\n",
"episode: 8125, score: 1.0, epsilon: 0.08\n",
"episode: 8130, score: 1.0, epsilon: 0.08\n",
"episode: 8135, score: 0.0, epsilon: 0.08\n",
"episode: 8140, score: 3.0, epsilon: 0.08\n",
"episode: 8145, score: 1.0, epsilon: 0.08\n",
"episode: 8150, score: 3.0, epsilon: 0.08\n",
"episode: 8155, score: 0.0, epsilon: 0.08\n",
"episode: 8160, score: 0.0, epsilon: 0.08\n",
"episode: 8165, score: 2.0, epsilon: 0.08\n",
"episode: 8170, score: 1.0, epsilon: 0.08\n",
"episode: 8175, score: 2.0, epsilon: 0.08\n",
"episode: 8180, score: 2.0, epsilon: 0.08\n",
"episode: 8185, score: 0.0, epsilon: 0.08\n",
"episode: 8190, score: 1.0, epsilon: 0.08\n",
"episode: 8195, score: 0.0, epsilon: 0.08\n",
"marking, episode: 8200, score: 0.0, mean_score: 2.14, std_score: 1.90\n",
"episode: 8200, score: 0.0, epsilon: 0.08\n",
"episode: 8205, score: 3.0, epsilon: 0.08\n",
"episode: 8210, score: 1.0, epsilon: 0.08\n",
"episode: 8215, score: 0.0, epsilon: 0.08\n",
"episode: 8220, score: 6.0, epsilon: 0.08\n",
"episode: 8225, score: 6.0, epsilon: 0.08\n",
"episode: 8230, score: 2.0, epsilon: 0.08\n",
"episode: 8235, score: 6.0, epsilon: 0.08\n",
"episode: 8240, score: 3.0, epsilon: 0.08\n",
"episode: 8245, score: 1.0, epsilon: 0.08\n",
"episode: 8250, score: 0.0, epsilon: 0.08\n",
"episode: 8255, score: 0.0, epsilon: 0.08\n",
"episode: 8260, score: 5.0, epsilon: 0.08\n",
"episode: 8265, score: 0.0, epsilon: 0.08\n",
"episode: 8270, score: 2.0, epsilon: 0.08\n",
"episode: 8275, score: 4.0, epsilon: 0.08\n",
"episode: 8280, score: 0.0, epsilon: 0.08\n",
"episode: 8285, score: 0.0, epsilon: 0.08\n",
"episode: 8290, score: 2.0, epsilon: 0.08\n",
"episode: 8295, score: 5.0, epsilon: 0.08\n",
"marking, episode: 8300, score: 0.0, mean_score: 2.32, std_score: 2.12\n",
"episode: 8300, score: 0.0, epsilon: 0.08\n",
"episode: 8305, score: 6.0, epsilon: 0.08\n",
"episode: 8310, score: 1.0, epsilon: 0.08\n",
"episode: 8315, score: 3.0, epsilon: 0.08\n",
"episode: 8320, score: 0.0, epsilon: 0.08\n",
"episode: 8325, score: 0.0, epsilon: 0.08\n",
"episode: 8330, score: 2.0, epsilon: 0.08\n",
"episode: 8335, score: 6.0, epsilon: 0.08\n",
"episode: 8340, score: 0.0, epsilon: 0.08\n",
"episode: 8345, score: 0.0, epsilon: 0.08\n",
"episode: 8350, score: 2.0, epsilon: 0.08\n",
"episode: 8355, score: 0.0, epsilon: 0.08\n",
"episode: 8360, score: 1.0, epsilon: 0.08\n",
"episode: 8365, score: 6.0, epsilon: 0.08\n",
"episode: 8370, score: 2.0, epsilon: 0.08\n",
"episode: 8375, score: 3.0, epsilon: 0.08\n",
"episode: 8380, score: 5.0, epsilon: 0.08\n",
"episode: 8385, score: 3.0, epsilon: 0.08\n",
"episode: 8390, score: 0.0, epsilon: 0.08\n",
"episode: 8395, score: 1.0, epsilon: 0.08\n",
"marking, episode: 8400, score: 1.0, mean_score: 1.97, std_score: 1.92\n",
"episode: 8400, score: 1.0, epsilon: 0.08\n",
"episode: 8405, score: 0.0, epsilon: 0.08\n",
"episode: 8410, score: 5.0, epsilon: 0.08\n",
"episode: 8415, score: 5.0, epsilon: 0.08\n",
"episode: 8420, score: 0.0, epsilon: 0.08\n",
"episode: 8425, score: 0.0, epsilon: 0.08\n",
"episode: 8430, score: 2.0, epsilon: 0.08\n",
"episode: 8435, score: 1.0, epsilon: 0.08\n",
"episode: 8440, score: 0.0, epsilon: 0.08\n",
"episode: 8445, score: 2.0, epsilon: 0.08\n",
"episode: 8450, score: 4.0, epsilon: 0.08\n",
"episode: 8455, score: 3.0, epsilon: 0.08\n",
"episode: 8460, score: 0.0, epsilon: 0.08\n",
"episode: 8465, score: 3.0, epsilon: 0.08\n",
"episode: 8470, score: 2.0, epsilon: 0.08\n",
"episode: 8475, score: 1.0, epsilon: 0.08\n",
"episode: 8480, score: 0.0, epsilon: 0.08\n",
"episode: 8485, score: 1.0, epsilon: 0.08\n",
"episode: 8490, score: 1.0, epsilon: 0.08\n",
"episode: 8495, score: 3.0, epsilon: 0.08\n",
"marking, episode: 8500, score: 0.0, mean_score: 2.28, std_score: 2.08\n",
"episode: 8500, score: 0.0, epsilon: 0.08\n",
"episode: 8505, score: 2.0, epsilon: 0.08\n",
"episode: 8510, score: 1.0, epsilon: 0.08\n",
"episode: 8515, score: 0.0, epsilon: 0.08\n",
"episode: 8520, score: 6.0, epsilon: 0.08\n",
"episode: 8525, score: 2.0, epsilon: 0.08\n",
"episode: 8530, score: 2.0, epsilon: 0.08\n",
"episode: 8535, score: 2.0, epsilon: 0.08\n",
"episode: 8540, score: 0.0, epsilon: 0.08\n",
"episode: 8545, score: 0.0, epsilon: 0.08\n",
"episode: 8550, score: 1.0, epsilon: 0.08\n",
"episode: 8555, score: 3.0, epsilon: 0.08\n",
"episode: 8560, score: 0.0, epsilon: 0.08\n",
"episode: 8565, score: 1.0, epsilon: 0.08\n",
"episode: 8570, score: 0.0, epsilon: 0.08\n",
"episode: 8575, score: 3.0, epsilon: 0.08\n",
"episode: 8580, score: 6.0, epsilon: 0.08\n",
"episode: 8585, score: 3.0, epsilon: 0.08\n",
"episode: 8590, score: 1.0, epsilon: 0.08\n",
"episode: 8595, score: 6.0, epsilon: 0.08\n",
"marking, episode: 8600, score: 1.0, mean_score: 2.30, std_score: 2.17\n",
"episode: 8600, score: 1.0, epsilon: 0.08\n",
"episode: 8605, score: 2.0, epsilon: 0.08\n",
"episode: 8610, score: 1.0, epsilon: 0.08\n",
"episode: 8615, score: 1.0, epsilon: 0.08\n",
"episode: 8620, score: 0.0, epsilon: 0.08\n",
"episode: 8625, score: 0.0, epsilon: 0.08\n",
"episode: 8630, score: 1.0, epsilon: 0.08\n",
"episode: 8635, score: 2.0, epsilon: 0.08\n",
"episode: 8640, score: 1.0, epsilon: 0.08\n",
"episode: 8645, score: 6.0, epsilon: 0.08\n",
"episode: 8650, score: 6.0, epsilon: 0.08\n",
"episode: 8655, score: 4.0, epsilon: 0.08\n",
"episode: 8660, score: 3.0, epsilon: 0.08\n",
"episode: 8665, score: 1.0, epsilon: 0.08\n",
"episode: 8670, score: 0.0, epsilon: 0.08\n",
"episode: 8675, score: 0.0, epsilon: 0.08\n",
"episode: 8680, score: 1.0, epsilon: 0.08\n",
"episode: 8685, score: 2.0, epsilon: 0.08\n",
"episode: 8690, score: 0.0, epsilon: 0.08\n",
"episode: 8695, score: 0.0, epsilon: 0.08\n",
"marking, episode: 8700, score: 1.0, mean_score: 2.33, std_score: 2.14\n",
"episode: 8700, score: 1.0, epsilon: 0.08\n",
"episode: 8705, score: 4.0, epsilon: 0.08\n",
"episode: 8710, score: 4.0, epsilon: 0.08\n",
"episode: 8715, score: 4.0, epsilon: 0.08\n",
"episode: 8720, score: 2.0, epsilon: 0.08\n",
"episode: 8725, score: 6.0, epsilon: 0.08\n",
"episode: 8730, score: 6.0, epsilon: 0.08\n",
"episode: 8735, score: 0.0, epsilon: 0.08\n",
"episode: 8740, score: 4.0, epsilon: 0.08\n",
"episode: 8745, score: 6.0, epsilon: 0.08\n",
"episode: 8750, score: 2.0, epsilon: 0.08\n",
"episode: 8755, score: 4.0, epsilon: 0.08\n",
"episode: 8760, score: 1.0, epsilon: 0.08\n",
"episode: 8765, score: 6.0, epsilon: 0.08\n",
"episode: 8770, score: 6.0, epsilon: 0.08\n",
"episode: 8775, score: 1.0, epsilon: 0.08\n",
"episode: 8780, score: 3.0, epsilon: 0.08\n",
"episode: 8785, score: 3.0, epsilon: 0.08\n",
"episode: 8790, score: 3.0, epsilon: 0.08\n",
"episode: 8795, score: 0.0, epsilon: 0.08\n",
"marking, episode: 8800, score: 1.0, mean_score: 2.42, std_score: 2.07\n",
"episode: 8800, score: 1.0, epsilon: 0.08\n",
"episode: 8805, score: 2.0, epsilon: 0.08\n",
"episode: 8810, score: 3.0, epsilon: 0.08\n",
"episode: 8815, score: 2.0, epsilon: 0.08\n",
"episode: 8820, score: 0.0, epsilon: 0.08\n",
"episode: 8825, score: 1.0, epsilon: 0.08\n",
"episode: 8830, score: 3.0, epsilon: 0.08\n",
"episode: 8835, score: 2.0, epsilon: 0.08\n",
"episode: 8840, score: 5.0, epsilon: 0.08\n",
"episode: 8845, score: 1.0, epsilon: 0.08\n",
"episode: 8850, score: 1.0, epsilon: 0.08\n",
"episode: 8855, score: 5.0, epsilon: 0.08\n",
"episode: 8860, score: 6.0, epsilon: 0.08\n",
"episode: 8865, score: 0.0, epsilon: 0.08\n",
"episode: 8870, score: 5.0, epsilon: 0.08\n",
"episode: 8875, score: 0.0, epsilon: 0.08\n",
"episode: 8880, score: 2.0, epsilon: 0.08\n",
"episode: 8885, score: 6.0, epsilon: 0.08\n",
"episode: 8890, score: 2.0, epsilon: 0.08\n",
"episode: 8895, score: 0.0, epsilon: 0.08\n",
"marking, episode: 8900, score: 0.0, mean_score: 2.23, std_score: 1.82\n",
"episode: 8900, score: 0.0, epsilon: 0.08\n",
"episode: 8905, score: 0.0, epsilon: 0.08\n",
"episode: 8910, score: 3.0, epsilon: 0.08\n",
"episode: 8915, score: 6.0, epsilon: 0.08\n",
"episode: 8920, score: 0.0, epsilon: 0.08\n",
"episode: 8925, score: 1.0, epsilon: 0.08\n",
"episode: 8930, score: 6.0, epsilon: 0.08\n",
"episode: 8935, score: 1.0, epsilon: 0.08\n",
"episode: 8940, score: 4.0, epsilon: 0.08\n",
"episode: 8945, score: 1.0, epsilon: 0.08\n",
"episode: 8950, score: 2.0, epsilon: 0.08\n",
"episode: 8955, score: 2.0, epsilon: 0.08\n",
"episode: 8960, score: 4.0, epsilon: 0.08\n",
"episode: 8965, score: 3.0, epsilon: 0.08\n",
"episode: 8970, score: 2.0, epsilon: 0.08\n",
"episode: 8975, score: 3.0, epsilon: 0.08\n",
"episode: 8980, score: 0.0, epsilon: 0.08\n",
"episode: 8985, score: 1.0, epsilon: 0.08\n",
"episode: 8990, score: 0.0, epsilon: 0.08\n",
"episode: 8995, score: 3.0, epsilon: 0.08\n",
"marking, episode: 9000, score: 2.0, mean_score: 2.30, std_score: 2.07\n",
"episode: 9000, score: 2.0, epsilon: 0.08\n",
"episode: 9005, score: 1.0, epsilon: 0.08\n",
"episode: 9010, score: 5.0, epsilon: 0.08\n",
"episode: 9015, score: 1.0, epsilon: 0.08\n",
"episode: 9020, score: 1.0, epsilon: 0.08\n",
"episode: 9025, score: 1.0, epsilon: 0.08\n",
"episode: 9030, score: 2.0, epsilon: 0.08\n",
"episode: 9035, score: 1.0, epsilon: 0.08\n",
"episode: 9040, score: 6.0, epsilon: 0.08\n",
"episode: 9045, score: 2.0, epsilon: 0.08\n",
"episode: 9050, score: 1.0, epsilon: 0.08\n",
"episode: 9055, score: 0.0, epsilon: 0.08\n",
"episode: 9060, score: 2.0, epsilon: 0.08\n",
"episode: 9065, score: 1.0, epsilon: 0.08\n",
"episode: 9070, score: 6.0, epsilon: 0.08\n",
"episode: 9075, score: 1.0, epsilon: 0.08\n",
"episode: 9080, score: 3.0, epsilon: 0.08\n",
"episode: 9085, score: 0.0, epsilon: 0.08\n",
"episode: 9090, score: 3.0, epsilon: 0.08\n",
"episode: 9095, score: 0.0, epsilon: 0.08\n",
"marking, episode: 9100, score: 1.0, mean_score: 2.40, std_score: 2.01\n",
"episode: 9100, score: 1.0, epsilon: 0.08\n",
"episode: 9105, score: 3.0, epsilon: 0.08\n",
"episode: 9110, score: 5.0, epsilon: 0.08\n",
"episode: 9115, score: 2.0, epsilon: 0.08\n",
"episode: 9120, score: 6.0, epsilon: 0.08\n",
"episode: 9125, score: 3.0, epsilon: 0.08\n",
"episode: 9130, score: 2.0, epsilon: 0.08\n",
"episode: 9135, score: 4.0, epsilon: 0.08\n",
"episode: 9140, score: 0.0, epsilon: 0.08\n",
"episode: 9145, score: 6.0, epsilon: 0.08\n",
"episode: 9150, score: 1.0, epsilon: 0.08\n",
"episode: 9155, score: 6.0, epsilon: 0.08\n",
"episode: 9160, score: 0.0, epsilon: 0.08\n",
"episode: 9165, score: 1.0, epsilon: 0.08\n",
"episode: 9170, score: 0.0, epsilon: 0.08\n",
"episode: 9175, score: 5.0, epsilon: 0.08\n",
"episode: 9180, score: 2.0, epsilon: 0.08\n",
"episode: 9185, score: 0.0, epsilon: 0.08\n",
"episode: 9190, score: 2.0, epsilon: 0.08\n",
"episode: 9195, score: 0.0, epsilon: 0.08\n",
"marking, episode: 9200, score: 1.0, mean_score: 2.70, std_score: 2.26\n",
"episode: 9200, score: 1.0, epsilon: 0.08\n",
"episode: 9205, score: 1.0, epsilon: 0.08\n",
"episode: 9210, score: 0.0, epsilon: 0.08\n",
"episode: 9215, score: 2.0, epsilon: 0.08\n",
"episode: 9220, score: 6.0, epsilon: 0.08\n",
"episode: 9225, score: 6.0, epsilon: 0.08\n",
"episode: 9230, score: 0.0, epsilon: 0.08\n",
"episode: 9235, score: 4.0, epsilon: 0.08\n",
"episode: 9240, score: 2.0, epsilon: 0.08\n",
"episode: 9245, score: 0.0, epsilon: 0.08\n",
"episode: 9250, score: 0.0, epsilon: 0.08\n",
"episode: 9255, score: 1.0, epsilon: 0.08\n",
"episode: 9260, score: 0.0, epsilon: 0.08\n",
"episode: 9265, score: 2.0, epsilon: 0.08\n",
"episode: 9270, score: 3.0, epsilon: 0.08\n",
"episode: 9275, score: 0.0, epsilon: 0.08\n",
"episode: 9280, score: 0.0, epsilon: 0.08\n",
"episode: 9285, score: 0.0, epsilon: 0.08\n",
"episode: 9290, score: 2.0, epsilon: 0.08\n",
"episode: 9295, score: 0.0, epsilon: 0.08\n",
"marking, episode: 9300, score: 0.0, mean_score: 2.12, std_score: 2.09\n",
"episode: 9300, score: 0.0, epsilon: 0.08\n",
"episode: 9305, score: 5.0, epsilon: 0.08\n",
"episode: 9310, score: 2.0, epsilon: 0.08\n",
"episode: 9315, score: 2.0, epsilon: 0.08\n",
"episode: 9320, score: 6.0, epsilon: 0.08\n",
"episode: 9325, score: 6.0, epsilon: 0.08\n",
"episode: 9330, score: 2.0, epsilon: 0.08\n",
"episode: 9335, score: 6.0, epsilon: 0.08\n",
"episode: 9340, score: 1.0, epsilon: 0.08\n",
"episode: 9345, score: 4.0, epsilon: 0.08\n",
"episode: 9350, score: 3.0, epsilon: 0.08\n",
"episode: 9355, score: 0.0, epsilon: 0.08\n",
"episode: 9360, score: 6.0, epsilon: 0.08\n",
"episode: 9365, score: 6.0, epsilon: 0.08\n",
"episode: 9370, score: 1.0, epsilon: 0.08\n",
"episode: 9375, score: 1.0, epsilon: 0.08\n",
"episode: 9380, score: 6.0, epsilon: 0.08\n",
"episode: 9385, score: 5.0, epsilon: 0.08\n",
"episode: 9390, score: 2.0, epsilon: 0.08\n",
"episode: 9395, score: 5.0, epsilon: 0.08\n",
"marking, episode: 9400, score: 4.0, mean_score: 2.68, std_score: 2.31\n",
"episode: 9400, score: 4.0, epsilon: 0.08\n",
"episode: 9405, score: 0.0, epsilon: 0.08\n",
"episode: 9410, score: 2.0, epsilon: 0.08\n",
"episode: 9415, score: 2.0, epsilon: 0.08\n",