Created
September 20, 2020 04:02
Bandido-Multibrazo.ipynb
{ | |
"nbformat": 4, | |
"nbformat_minor": 0, | |
"metadata": { | |
"colab": { | |
"name": "Bandido-Multibrazo.ipynb", | |
"provenance": [], | |
"authorship_tag": "ABX9TyNg6301tSATcvbVaS28GAAc", | |
"include_colab_link": true | |
}, | |
"kernelspec": { | |
"name": "python3", | |
"display_name": "Python 3" | |
} | |
}, | |
"cells": [ | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "view-in-github", | |
"colab_type": "text" | |
}, | |
"source": [ | |
"<a href=\"https://colab.research.google.com/gist/Kemquiros/cbaeb23a1c99b80815be8411ac98e30e/bandido-multibrazo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>" | |
] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "TvFZ77jxZnG3", | |
"colab_type": "text" | |
}, | |
"source": [ | |
"# Introduction to Reinforcement Learning\n", | |
"## Lab 1\n", | |
"Multi-Armed Bandit with 10 Machines" | |
] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "bAi9xhQ-ZZKz", | |
"colab_type": "code", | |
"colab": {} | |
}, | |
"source": [ | |
"import numpy as np\n", | |
"import numpy.random as rnd" | |
], | |
"execution_count": 62, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "K4lmTogkZltZ", | |
"colab_type": "code", | |
"colab": {} | |
}, | |
"source": [ | |
"SEED = 2020\n", | |
"LIMITE_MONEDAS = 1000\n", | |
"NUMERO_MAQUINAS = 10\n", | |
"\n", | |
"class Maquina:\n", | |
"    # Slot machine k: its payouts are drawn up front from a normal(mu, sigma)\n", | |
"\n", | |
"    def __init__(self, k, mu, sigma):\n", | |
"        self.k = k\n", | |
"        self.resultados = rnd.normal(mu, sigma, LIMITE_MONEDAS)\n", | |
"\n", | |
"    def insertar_moneda(self):\n", | |
"        # Take the last pre-drawn payout; an empty array means the machine is exhausted\n", | |
"        try:\n", | |
"            valor, self.resultados = self.resultados[-1], self.resultados[:-1]\n", | |
"        except IndexError:\n", | |
"            raise ValueError('Machine %d is out of service' % self.k)\n", | |
"        return valor\n", | |
"\n", | |
"class Casino:\n", | |
"\n", | |
"    def __init__(self, numero_maquinas):\n", | |
"        # Seed once here: reseeding inside every Maquina gave machines 2..N\n", | |
"        # the same mu, sigma and payouts\n", | |
"        rnd.seed(SEED)\n", | |
"        self.numero_maquinas = numero_maquinas\n", | |
"        self.maquinas = [Maquina(i+1, rnd.normal(), rnd.uniform(1,10)) for i in range(numero_maquinas)]\n", | |
"\n", | |
"    def apostar(self, maquina):\n", | |
"        # Machines are numbered from 1 to numero_maquinas\n", | |
"        if maquina < 1 or maquina > self.numero_maquinas:\n", | |
"            raise ValueError('The casino does not have that machine')\n", | |
"        return self.maquinas[maquina-1].insertar_moneda()" | |
], | |
"execution_count": 63, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "code", | |
"metadata": { | |
"id": "cNs2e52ocNoh", | |
"colab_type": "code", | |
"colab": {} | |
}, | |
"source": [ | |
"# The casino has 10 machines\n", | |
"# Machines are numbered from 1 to 10\n", | |
"# Each machine supports at most 1000 bets\n", | |
"casino = Casino(NUMERO_MAQUINAS)\n", | |
"\n", | |
"# Example:\n", | |
"# To bet on machine 1\n", | |
"# print(casino.apostar(1))" | |
], | |
"execution_count": 64, | |
"outputs": [] | |
}, | |
{ | |
"cell_type": "markdown", | |
"metadata": { | |
"id": "Ap2VpXjHmdwd", | |
"colab_type": "text" | |
}, | |
"source": [ | |
"# Exercise\n", | |
"Given that you have only 1000 coins and that each bet costs 1 coin:  \n", | |
"Compute the estimated value ***Q(a)*** of each machine in order to make the best investment decision in the casino.\n", | |
"\n", | |
"Repeat the algorithm using the epsilon-greedy method, with epsilon equal to:\n", | |
"\n", | |
"* e = 0\n", | |
"* e = 0.01\n", | |
"* e = 0.1\n", | |
"* e = 0.3\n", | |
"* e = 0.9\n", | |
"\n", | |
"Plot the distribution of each machine using its mean (estimated value) and standard deviation.\n", | |
"\n", | |
"Plot the average reward over time, comparing each value of epsilon.\n" | |
] | |
} | |
] | |
} |
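The exercise in the notebook asks for an epsilon-greedy estimate of Q(a) for each machine. A minimal sketch of one way to do that follows; it is not the author's solution. The function name `epsilon_greedy`, its parameters, and the stand-in `apostar_demo` are hypothetical names introduced here for illustration. The sketch assumes Q(a) is the running sample mean of the payouts seen on machine a, and it uses its own NumPy generator for the agent's choices so it does not disturb the global random state the casino relies on.

```python
import numpy as np

def epsilon_greedy(apostar, n_maquinas, n_monedas, epsilon, seed=0):
    """Spend n_monedas coins on machines 1..n_maquinas with an epsilon-greedy policy.

    apostar(maquina) must return the payout of a single bet (for example casino.apostar).
    Returns (Q, N, ganancias): value estimates, bet counts, and the payout of each bet.
    """
    rng = np.random.default_rng(seed)      # generator used only for the agent's choices
    Q = np.zeros(n_maquinas)               # Q[a]: running mean payout of machine a + 1
    N = np.zeros(n_maquinas, dtype=int)    # N[a]: number of bets placed on machine a + 1
    ganancias = np.empty(n_monedas)
    for t in range(n_monedas):
        if rng.random() < epsilon:
            a = int(rng.integers(n_maquinas))        # explore: uniformly random machine
        else:
            mejores = np.flatnonzero(Q == Q.max())   # exploit: best estimate so far
            a = int(rng.choice(mejores))             # ties are broken at random
        r = apostar(a + 1)                 # machines are numbered from 1
        N[a] += 1
        Q[a] += (r - Q[a]) / N[a]          # incremental sample-average update
        ganancias[t] = r
    return Q, N, ganancias

# Hypothetical stand-in for casino.apostar so the sketch runs on its own:
# three machines with assumed means 0.0, 0.5 and 2.0 and unit standard deviation.
_demo_rng = np.random.default_rng(1)
_medias = np.array([0.0, 0.5, 2.0])

def apostar_demo(maquina):
    return _demo_rng.normal(_medias[maquina - 1], 1.0)

Q, N, ganancias = epsilon_greedy(apostar_demo, n_maquinas=3, n_monedas=1000, epsilon=0.1)
print("Q estimates:", Q)
print("bets per machine:", N)
```

In the notebook itself, the same function could be called once per epsilon with a fresh casino, for example `epsilon_greedy(Casino(NUMERO_MAQUINAS).apostar, NUMERO_MAQUINAS, LIMITE_MONEDAS, e)`, so that every run starts with full machines. The running average `np.cumsum(ganancias) / np.arange(1, len(ganancias) + 1)` gives the average reward over time for the comparison plot.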