@Kemquiros
Created September 20, 2020 04:02
Bandido-Multibrazo.ipynb
{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Bandido-Multibrazo.ipynb",
"provenance": [],
"authorship_tag": "ABX9TyNg6301tSATcvbVaS28GAAc",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/Kemquiros/cbaeb23a1c99b80815be8411ac98e30e/bandido-multibrazo.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "TvFZ77jxZnG3",
"colab_type": "text"
},
"source": [
"# Introduction to Reinforcement Learning\n",
"## Practice #1\n",
"Multi-Armed Bandit with 10 Machines"
]
},
{
"cell_type": "code",
"metadata": {
"id": "bAi9xhQ-ZZKz",
"colab_type": "code",
"colab": {}
},
"source": [
"import numpy as np\n",
"import numpy.random as rnd"
],
"execution_count": 62,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "K4lmTogkZltZ",
"colab_type": "code",
"colab": {}
},
"source": [
"SEED = 2020\n",
"LIMITE_MONEDAS = 1000\n",
"NUMERO_MAQUINAS = 10\n",
"\n",
"class Maquina:\n",
"\n",
"    def __init__(self, k, mu, sigma):\n",
"        self.k = k\n",
"        # Pre-draw every payout this machine can ever return.\n",
"        self.resultados = rnd.normal(mu, sigma, LIMITE_MONEDAS)\n",
"\n",
"    def insertar_moneda(self):\n",
"        # Indexing an empty array raises IndexError once all payouts are used.\n",
"        try:\n",
"            valor, self.resultados = self.resultados[-1], self.resultados[:-1]\n",
"        except IndexError:\n",
"            raise ValueError('Machine %d is out of service' % self.k)\n",
"        return valor\n",
"\n",
"class Casino:\n",
"\n",
"    def __init__(self, numero_maquinas):\n",
"        # Seed once here. Reseeding inside each Maquina would give every\n",
"        # machine after the first the same mu, sigma and payouts.\n",
"        rnd.seed(SEED)\n",
"        self.numero_maquinas = numero_maquinas\n",
"        self.maquinas = [Maquina(i+1, rnd.normal(), rnd.uniform(1,10)) for i in range(0,numero_maquinas)]\n",
"\n",
"    def apostar(self, maquina):\n",
"        if maquina < 1 or maquina > self.numero_maquinas:\n",
"            raise ValueError('The casino does not have that machine')\n",
"        return self.maquinas[maquina-1].insertar_moneda()"
],
"execution_count": 63,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "cNs2e52ocNoh",
"colab_type": "code",
"colab": {}
},
"source": [
"# The casino has 10 machines\n",
"# The first machine is number 1 and the last is number 10\n",
"# Each machine supports at most 1000 bets\n",
"casino = Casino(NUMERO_MAQUINAS)\n",
"\n",
"# Example:\n",
"# To bet on machine 1\n",
"#print (casino.apostar(1))"
],
"execution_count": 64,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "Ap2VpXjHmdwd",
"colab_type": "text"
},
"source": [
"# Exercise\n",
"Given that you have only 1000 coins and that each bet costs 1 coin:  \n",
"Compute the estimated value ***Q(a)*** of each machine, in order to make the best investment decision in the casino.\n",
"\n",
"Repeat the algorithm using the epsilon-greedy method, with epsilon equal to:\n",
"\n",
"\n",
"* e = 0\n",
"* e = 0.01\n",
"* e = 0.1\n",
"* e = 0.3\n",
"* e = 0.9\n",
"\n",
"\n",
"Plot the distribution of each machine using its mean (the estimated value) and its standard deviation.\n",
"\n",
"Plot the average reward over time, comparing each value of epsilon.\n",
"\n"
]
}
]
}
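
One possible way to approach the exercise is sketched below. It is not the author's solution. It assumes a sample-average estimate of Q(a) updated incrementally, initial estimates of zero, and ties broken toward the lowest machine number. The function `epsilon_greedy`, the helper `apostar_demo`, and the stand-in machine parameters are hypothetical names introduced here so the block runs on its own; with the notebook, `casino.apostar` could be passed in place of `apostar_demo`.

```python
import numpy as np


def epsilon_greedy(apostar, numero_maquinas, monedas, epsilon, rng):
    """Estimate Q(a) for each machine with an epsilon-greedy policy.

    `apostar` is any callable that takes a 1-based machine number and
    returns a payout (for example `casino.apostar` from the notebook).
    """
    q = np.zeros(numero_maquinas)             # estimated value Q(a) per machine
    n = np.zeros(numero_maquinas, dtype=int)  # number of bets per machine
    ganancia_promedio = np.empty(monedas)     # running average payout
    total = 0.0
    for t in range(monedas):
        if rng.random() < epsilon:
            a = int(rng.integers(numero_maquinas))  # explore: random machine
        else:
            a = int(np.argmax(q))                   # exploit: best estimate so far
        r = apostar(a + 1)
        n[a] += 1
        q[a] += (r - q[a]) / n[a]                   # incremental sample mean
        total += r
        ganancia_promedio[t] = total / (t + 1)
    return q, n, ganancia_promedio


# Hypothetical stand-in for the casino, used only to make the sketch runnable.
demo_rng = np.random.default_rng(0)
medias = demo_rng.normal(size=10)
sigmas = demo_rng.uniform(1, 10, size=10)


def apostar_demo(maquina):
    return demo_rng.normal(medias[maquina - 1], sigmas[maquina - 1])


# One run of 1000 coins per epsilon value listed in the exercise.
resultados = {}
for eps in (0, 0.01, 0.1, 0.3, 0.9):
    resultados[eps] = epsilon_greedy(
        apostar_demo, 10, 1000, eps, np.random.default_rng(1)
    )
```

Each entry of `resultados` holds the estimates, the bet counts, and the running average payout, which is the curve the last plotting task asks for. Because each notebook machine pre-draws only 1000 payouts, a fresh `Casino` would probably be needed for every epsilon value when using `casino.apostar`.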