{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "Adversarial Scaling (Supplementary Material)",
"provenance": [],
"authorship_tag": "ABX9TyPecIpeIXb7No6wRjkWOOMY",
"include_colab_link": true
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "view-in-github",
"colab_type": "text"
},
"source": [
"<a href=\"https://colab.research.google.com/gist/renatoppl/70d21d7ddfb38c9ffc392b91a3ba82d7/adversarial-scaling-supplementary-material.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"colab_type": "text",
"id": "6TDs4UYndFhA"
},
"source": [
"# Adversarial Scaling (Supplementary Material)\n",
"\n",
"We provide below the implementation of the following stochastic bandit algorithms:\n",
"\n",
"* [Active Arm Elimination](http://jmlr.csail.mit.edu/papers/volume7/evendar06a/evendar06a.pdf) (AAE)\n",
"* [Upper Confidence Bound](https://www.sciencedirect.com/science/article/pii/0196885885900028) (UCB)\n",
"* [Thompson Sampling](https://arxiv.org/abs/1111.1797) (TS)\n",
"* Active Arm Elimination with Adversarial Scaling (AAEAS)\n",
"* [Barrier-Regularized with Optimism and Adaptivity Online Mirror Descent](https://arxiv.org/abs/1801.03265) (BROAD)\n",
"* [Mirror Descent Regularized with Tsallis Entropy](https://arxiv.org/abs/1807.07623) (Tsallis)\n",
"* [Mirror Descent Regularized with Hybrid Regularizer](https://arxiv.org/abs/1901.08779) (Hybrid)\n",
"* [EXP3++](http://proceedings.mlr.press/v32/seldinb14.html)\n",
"\n",
"All algorithms provide $\\log(T)/\\Delta$ guarantees for stochastic bandits, where $T$ is the time horizon and $\\Delta$ is the gap between the means of best and second best arm.. The algorithms are then compared in a $\\{0,1\\}$-setting where there are $n$ arms with means $(\\mu_1, .., \\mu_n)$. Each algorithm takes as input the following:\n",
"\n",
"* a vector $\\mu$ of arm means\n",
"* a function $q(t)$ which specifies the quality of each round\n",
"* the time horizon $T$\n",
"* cold start parameter $t_0$ specifying the first period where $q(t) > 0$.\n",
"\n",
"Each algorithm produces the vector of losses in each period. We first present the implementation of each algorithm and next we present the code used to produce the figures in the paper.\n",
"\n",
"\n",
"**Note about mirror descent implementation:** In the Mirror Descent based algorithms (Tsallis, BROAD and Hybrid), we use the\n",
"Lagrangian formulation of the convex problem solved in each period to obtain\n",
"a closed form formula given the Lagrange multiplier. Then we use binary\n",
"search to find the correct Lagrange multiplier. "
]
},
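{
"cell_type": "markdown",
"metadata": {},
"source": [
"Concretely, writing $a_t$ for the arm pulled in round $t$, the per-round loss recorded by every algorithm and the cumulative loss plotted in the figures are\n",
"\n",
"$$\\ell_t = \\Bigl(\\max_i \\mu_i - \\mu_{a_t}\\Bigr)\\, q(t), \\qquad \\sum_{t=1}^{T} \\ell_t,$$\n",
"\n",
"where the per-round losses are averaged over independent runs before taking the running sum (see `RunAlgo` below)."
]
},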
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "eCmRbrv6dFhG",
"colab": {}
},
"source": [
"# Importing python modules\n",
"import matplotlib.pyplot as plt\n",
"import math\n",
"import sys, time\n",
"import random\n",
"from numpy import cumsum, arange, median, mean\n",
"from numpy.random import beta, choice"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "wPIhOYkQdFhN",
"colab": {}
},
"source": [
"# Active Arm Elimination\n",
"def AAE(mu, q, T, t0=0):\n",
" A = list(range(len(mu))) # active arms\n",
" r = [0 for i in A] # total rewards\n",
" n = [t0 / len(mu) for i in A] # number of arm pulls\n",
" l = []\n",
" for t in range(T):\n",
" a = random.sample(A, 1)[0] # sample a random active arm\n",
" n[a] += float(1) # increase the count of number of pulls\n",
" r[a] += float(random.random() < mu[a] * q(t)) # update its reward\n",
" l.append((max(mu) - mu[a]) * q(t))\n",
" mr = 0 # maximum reward\n",
" ao = 0 # optimal arm so far\n",
" for a in A:\n",
" if r[a] > mr:\n",
" (ao, mr) = (a, r[a])\n",
"\n",
" for a in A:\n",
" if r[a] < mr - math.sqrt(2 * math.log(T) * n[a]) - math.sqrt(2 * math.log(T) * n[ao]):\n",
" A.remove(a)\n",
"\n",
" return l"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "iLzum1iDdFhR",
"colab": {}
},
"source": [
"# Upper Confidence Bound algorithm\n",
"def UCB(mu, q, T, t0=0):\n",
" A = list(range(len(mu))) # active arms\n",
" r = [float(0) for i in A] # total rewards\n",
" n = [float(t0 / len(mu)) for i in A] # number of arm pulls\n",
" l = []\n",
"\n",
" def upper_bound(r,n):\n",
" return (float(r) / n) + math.sqrt(2 * math.log(T) / float(n)) \n",
"\n",
" def lower_bound(r,n):\n",
" return (float(r) / n) - math.sqrt(2 * math.log(T) / float(n)) \n",
"\n",
" for t in range(T):\n",
" a = A[0]\n",
" if len(A) > 1:\n",
" if t < 2:\n",
" a = t\n",
" elif upper_bound(r[1], n[1]) > upper_bound(r[0], n[0]):\n",
" a = 1\n",
" n[a] += float(1) # increase the count of number of pulls\n",
" r[a] += float(random.random() < mu[a] * q(t)) # update its reward\n",
" l.append((max(mu) - mu[a]) * q(t))\n",
" mr = 0 # maximum reward\n",
" ao = 0 # optimal arm so far\n",
" for a in A:\n",
" if r[a] > mr:\n",
" (ao, mr) = (a, r[a])\n",
"\n",
" for a in A:\n",
" if t > 2 and upper_bound(r[a], n[a]) < lower_bound(r[ao], n[ao]):\n",
" A.remove(a)\n",
"\n",
" return l"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "PdOdnTbrdFhW",
"colab": {}
},
"source": [
"# Thompson Sampling\n",
"def TS(mu, q, T, t0=0):\n",
" l = []\n",
" A = list(range(len(mu)))\n",
" S = [0 for i in A]\n",
" F = [t0 / len(mu) for i in A]\n",
" for t in range(T):\n",
" theta = [beta(1+S[i], 1+F[i]) for i in A]\n",
" a = 0\n",
" for i in A:\n",
" if theta[i] > theta[a]:\n",
" a = i\n",
" ra = float(random.random() < mu[a] * q(t)) # update its reward\n",
" l.append((max(mu) - mu[a]) * q(t))\n",
" if ra > 0.99: \n",
" S[a] += 1\n",
" else:\n",
" F[a] += 1\n",
" return l"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "2f3zRE7hdFha",
"colab": {}
},
"source": [
"# Active Arm Elimination with Adversarial Scaling\n",
"def AAEAS(mu, q, T, t0=0):\n",
" A = list(range(len(mu))) # active arms\n",
" r = [0 for i in A] # total rewards\n",
" s = 0 # total reward across all arms\n",
" l = []\n",
" for t in range(T):\n",
" a = random.sample(A, 1)[0] # sample a random active arm\n",
" rt = float((random.random() < mu[a] * q(t)))\n",
" r[a] += rt\n",
" s += rt\n",
" l.append((max(mu) - mu[a]) * q(t))\n",
" mr = 0 # maximum reward\n",
" ao = 0 # optimal arm so far\n",
" for a in A:\n",
" if r[a] > mr:\n",
" (ao, mr) = (a, r[a])\n",
" \n",
" for a in A:\n",
" if r[a] < mr - math.sqrt(max(8 * s * math.log(T), 32 * math.log(T) * math.log(T))):\n",
" A.remove(a)\n",
" return l"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "ru3HJF4adFhe",
"colab": {}
},
"source": [
"# Barrier-Regularized with Optimism and Adaptivity Online Mirror Descent\n",
"def BROAD(mu, q, T, t0=0):\n",
" n = len(mu)\n",
" w = [1.0/n for i in range(n)];\n",
" l = [];\n",
" eta = 0.5;\n",
" estimator = 0\n",
"\n",
" for t in range(T):\n",
" a = choice(range(n), p = w)\n",
" r = float(random.random() < mu[a] * q(t)) # reward of arm i\n",
" cost = -r;\n",
" l.append((max(mu) - mu[a]) * q(t))\n",
"\n",
" estimator += (w[a]**2) * (r / w[a] - r)**2 + (w[1-a] * r)**2;\n",
" \n",
" def GammaLowerBound():\n",
" gamma = (-1- eta * cost)/w[a]\n",
" for i in range(n):\n",
" if i != a:\n",
" gamma = max(gamma, -1/w[i] )\n",
" return gamma + 1e-6\n",
"\n",
" def UpdatedWeight(gamma):\n",
" uw = [w[i] / (1 + gamma * w[i]) for i in range(n)];\n",
" uw[a] = w[a] / (1+ eta * cost + gamma * w[a])\n",
" return uw;\n",
"\n",
" def UpdatedWeightSum(gamma):\n",
" return sum(UpdatedWeight(gamma));\n",
"\n",
" def FindGamma():\n",
" low = GammaLowerBound();\n",
" high = 10.0 / w[a];\n",
" assert(UpdatedWeightSum(low) >= 1-1e-6);\n",
" assert(UpdatedWeightSum(high) < 1);\n",
" while (high - low > 1e-6):\n",
" mid = 0.5 * (high + low);\n",
" if UpdatedWeightSum(mid) > 1 :\n",
" low = mid;\n",
" else:\n",
" high = mid;\n",
" return high\n",
" \n",
" if estimator > math.log(T) / (eta**2):\n",
" eta = eta/2;\n",
" w = [1.0/n for i in range(n)];\n",
" estimator = 0;\n",
" else:\n",
" nw = UpdatedWeight(FindGamma())\n",
" tw = sum(nw)\n",
" # At this point the total weight should be 1 +/- 1e-6\n",
" # We re-normalize to make exactly equal to 1 otherwise the sampling\n",
" # procedure returns an error\n",
" w = [nw[i] / tw for i in range(n)];\n",
"\n",
" return l"
],
"execution_count": 0,
"outputs": []
},
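{
"cell_type": "markdown",
"metadata": {},
"source": [
"*Sketch of the closed form used in `UpdatedWeight` above* (for exposition; it assumes the standard log-barrier mirror descent step with regularizer $\\psi(w) = \\tfrac{1}{\\eta}\\sum_i \\ln \\tfrac{1}{w_i}$, which matches the code). With the importance-weighted loss estimate $\\hat{\\ell}_a = \\mathrm{cost}/w_a$ and $\\hat{\\ell}_i = 0$ for $i \\neq a$, introducing a Lagrange multiplier $\\lambda$ for the constraint $\\sum_i w'_i = 1$ and setting the gradient of the update objective to zero gives\n",
"\n",
"$$\\frac{1}{w'_i} = \\frac{1}{w_i} + \\eta\\,(\\hat{\\ell}_i + \\lambda) \\quad\\Longrightarrow\\quad w'_i = \\frac{w_i}{1 + \\gamma w_i}\\ (i \\neq a), \\qquad w'_a = \\frac{w_a}{1 + \\eta\\,\\mathrm{cost} + \\gamma w_a},$$\n",
"\n",
"with $\\gamma = \\eta\\lambda$. Each $w'_i$ is decreasing in $\\gamma$, so $\\sum_i w'_i = 1$ has a unique root, which `FindGamma` brackets and locates by bisection."
]
},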
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "EJcuK6t5dFhj",
"colab": {}
},
"source": [
"# Mirror Descent Regularized with Tsallis Entropy\n",
"def Tsallis(mu, q, T, t0=0):\n",
" n = len(mu)\n",
" w = [1.0/n for i in range(n)];\n",
" L = [0 for i in range(n)]\n",
" l = [];\n",
"\n",
" def eta(t):\n",
" return 2 * math.sqrt((1-1.0/math.sqrt(2))*(1-1.0/math.sqrt((t+2)))/(t+2));\n",
"\n",
" for dt in range(T):\n",
" t = t0 + dt;\n",
" a = choice(range(n), p = w)\n",
" r = float(random.random() < mu[a] * q(t)) # reward of arm i\n",
" L[a] += (-r)/w[a];\n",
" l.append((max(mu) - mu[a]) * q(t))\n",
" \n",
" def GammaLowerBound():\n",
" return -min(L) - 2.0 / eta(t) + 1e-6\n",
" \n",
" def GammaUpperBound():\n",
" return -min(L) + 2.0 * n / eta(t)\n",
"\n",
" def UpdatedWeight(gamma):\n",
" return [(2/(2 + eta(t) * (gamma + L[i])))**2 for i in range(n)];\n",
"\n",
" def UpdatedWeightSum(gamma):\n",
" return sum(UpdatedWeight(gamma));\n",
"\n",
" def FindGamma():\n",
" low = GammaLowerBound();\n",
" high = GammaUpperBound();\n",
" assert(UpdatedWeightSum(low) >= 1-1e-6);\n",
" assert(UpdatedWeightSum(high) < 1);\n",
" while (high - low > 1e-6):\n",
" mid = 0.5 * (high + low);\n",
" if UpdatedWeightSum(mid) > 1 :\n",
" low = mid;\n",
" else:\n",
" high = mid;\n",
" return high\n",
" \n",
"\n",
" nw = UpdatedWeight(FindGamma())\n",
" tw = sum(nw)\n",
" # At this point the total weight should be 1 +/- 1e-6\n",
" # We re-normalize to make exactly equal to 1 otherwise the sampling\n",
" # procedure returns an error\n",
" w = [nw[i] / tw for i in range(n)];\n",
" return l"
],
"execution_count": 0,
"outputs": []
},
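{
"cell_type": "markdown",
"metadata": {},
"source": [
"*The role of the binary search above* (an explanatory sketch, analogous to the BROAD note). Here $L_i$ is the cumulative importance-weighted loss estimate of arm $i$ and $\\gamma$ plays the role of the Lagrange multiplier for the simplex constraint $\\sum_i w_i = 1$ in the Tsallis-entropy-regularized step. The first-order condition gives the closed form implemented in `UpdatedWeight`,\n",
"\n",
"$$w_i(\\gamma) = \\left(\\frac{2}{2 + \\eta_t\\,(L_i + \\gamma)}\\right)^{2},$$\n",
"\n",
"and each $w_i(\\gamma)$ is strictly decreasing in $\\gamma$ on the feasible range, so $\\sum_i w_i(\\gamma) = 1$ has a unique solution; `FindGamma` brackets it between `GammaLowerBound` and `GammaUpperBound` and locates it by bisection to tolerance $10^{-6}$."
]
},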
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "wqZtDa1JdFhn",
"colab": {}
},
"source": [
"# Mirror Descent Regularized with Hybrid Regularizer\n",
"def Hybrid(mu, q, T, t0=0):\n",
" n = len(mu)\n",
" w = [1.0/n for i in range(n)];\n",
" L = [0 for i in range(n)]\n",
" l = [];\n",
"\n",
" def eta(t):\n",
" return math.sqrt(1.0/t);\n",
"\n",
" for dt in range(T):\n",
" t = 1 + t0 + dt;\n",
" a = choice(range(n), p = w)\n",
" r = float(random.random() < mu[a] * q(t)) # reward of arm i\n",
" L[a] += (-r)/w[a];\n",
" l.append((max(mu) - mu[a]) * q(t))\n",
" \n",
" def g(w):\n",
" if (w >= 1):\n",
" return -float(\"inf\");\n",
" elif (w <= 0):\n",
" return float(\"inf\");\n",
" else:\n",
" return 1/(2*math.sqrt(w)) + math.log(1.0-w);\n",
" \n",
" def GammaLowerBound():\n",
" return g(0.9)/eta(t) - max(L)\n",
" \n",
" def GammaUpperBound():\n",
" return g(0.1)/eta(t) - min(L)\n",
"\n",
" def SolveForW(a, gamma):\n",
" # Solve the equation eta * (L[a] + gamma) + 1 = 1/(2sqrt(w)) + log(1-w) \n",
" low = 0\n",
" high = 1\n",
"\n",
" def f(w):\n",
" return g(w) - eta(t) * (L[a] + gamma)\n",
" \n",
" assert(f(low) > 0);\n",
" assert(f(high) < 0);\n",
" while (high - low > 1e-6):\n",
" mid = 0.5 * (high + low);\n",
" if f(mid) > 0 :\n",
" low = mid;\n",
" else:\n",
" high = mid;\n",
" return high \n",
"\n",
" def UpdatedWeight(gamma):\n",
" return [SolveForW(a,gamma) for a in range(n)];\n",
"\n",
" def UpdatedWeightSum(gamma):\n",
" return sum(UpdatedWeight(gamma));\n",
"\n",
" def FindGamma():\n",
" low = GammaLowerBound();\n",
" high = GammaUpperBound();\n",
" assert(UpdatedWeightSum(low) > 1);\n",
" assert(UpdatedWeightSum(high) < 1);\n",
" while (high - low > 1e-6):\n",
" mid = 0.5 * (high + low);\n",
" if UpdatedWeightSum(mid) > 1 :\n",
" low = mid;\n",
" else:\n",
" high = mid;\n",
" return high\n",
" \n",
"\n",
" nw = UpdatedWeight(FindGamma())\n",
" tw = sum(nw)\n",
" # At this point the total weight should be 1 +/- 1e-6\n",
" # We re-normalize to make exactly equal to 1 otherwise the sampling\n",
" # procedure returns an error\n",
" w = [nw[i] / tw for i in range(n)];\n",
" return l"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "F32oSDCHdFhr",
"colab": {}
},
"source": [
"# EXP3++\n",
"def EXP3pp(mu, q, T, t0=0):\n",
" n = len(mu)\n",
" L = [0 for i in range(n)]\n",
" l = [];\n",
"\n",
" def eta(t):\n",
" return 0.5 * math.sqrt(math.log(n) / (t * n));\n",
"\n",
" def delta(t, a):\n",
" return min(1, (1.0/t) * (L[a] - min(L)))\n",
"\n",
" def xi(t, a):\n",
" if (delta(t,a) <= 0):\n",
" return float(\"inf\");\n",
" else:\n",
" return math.log(t)**2 / (t * delta(t,a)**2);\n",
"\n",
" def epsilon(t, a):\n",
" return min(eta(t), 1.0/(2*n), xi(t,a));\n",
"\n",
" for dt in range(T):\n",
" t = 1 + t0 + dt;\n",
" w = [math.exp(eta(t)* L[a]) for a in range(n)]\n",
" sw = sum(w)\n",
" w = [x / sw for x in w];\n",
" sum_epsilon = sum([epsilon(t,a) for a in range(n)])\n",
" wt = [(1-sum_epsilon) * w[a] + epsilon(t,a) for a in range(n)];\n",
" swt = sum(wt)\n",
" wt = [x / swt for x in wt];\n",
" a = choice(range(n), p = wt)\n",
" r = float(random.random() < mu[a] * q(t)) # reward of arm i\n",
" L[a] -= (-r)/wt[a];\n",
" l.append((max(mu) - mu[a]) * q(t))\n",
" return l"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "wKQPYb8JeY94",
"colab_type": "text"
},
"source": [
"## Utilities to Run Bandit Algorithms"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "jmZbJTatdFhv",
"colab": {}
},
"source": [
"def RunAlgo(algo, mu, T, q, t0, samples):\n",
" print(\"Running \" + algo.__name__)\n",
" def do_algo():\n",
" sys.stdout.write('.')\n",
" sys.stdout.flush()\n",
" return algo(mu, q, T, t0);\n",
"\n",
" start_time = time.time()\n",
" L = [do_algo() for r in range(samples)];\n",
" end_time = time.time()\n",
" print(\"\\ntime = \" + str(end_time - start_time) + \"sec\")\n",
" Lm = [mean([ L[s][t] for s in range(samples) ],0) for t in range(T)]\n",
" return cumsum(Lm)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "0XfPqxfmdFhz",
"colab": {}
},
"source": [
"def RunAllAlgos(mu, T, q, t0, samples):\n",
" LUCB = RunAlgo(UCB, mu, T, q, t0, samples)\n",
" LTS = RunAlgo(TS, mu, T, q, t0, samples)\n",
" LAAE = RunAlgo(AAE, mu, T, q, t0, samples)\n",
" LAAEAS = RunAlgo(AAEAS, mu, T, q, t0, samples)\n",
" LBROAD = RunAlgo(BROAD, mu, T, q, t0, samples)\n",
" LTsallis = RunAlgo(Tsallis, mu, T, q, t0, samples) \n",
" LEXP3pp = RunAlgo(EXP3pp, mu, T, q, t0, samples)\n",
" return [LUCB, LTS, LAAE, LAAEAS, LBROAD, LTsallis, LEXP3pp]"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "BYPM-nDkdFh3",
"colab": {}
},
"source": [
"def PlotAlgos(I, LUCB, LTS, LAAE, LAAEAS, LBROAD, LTsallis, LEXP3pp):\n",
" plt.xlabel('round')\n",
" plt.ylabel('cummulative loss')\n",
" plt.plot(I, LUCB, label=\"UCB\")\n",
" plt.plot(I, LTS, label=\"TS\")\n",
" plt.plot(I, LAAE, label=\"AAE\")\n",
" plt.plot(I, LAAEAS, label=\"AAEAS\")\n",
" plt.plot(I, LBROAD, label=\"BROAD\")\n",
" plt.plot(I, LTsallis, label=\"Tsallis\")\n",
" plt.plot(I, LEXP3pp, label=\"EXP3++\")\n",
" plt.legend(loc='upper left')"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "5X6czDeOeoCS",
"colab_type": "text"
},
"source": [
"## Figure 1"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "EGXFe4d7dFh7",
"colab": {}
},
"source": [
"mu = [.5,.8]\n",
"T = 10000\n",
"samples = 100\n",
"t0 = 0\n",
"I = range(T)\n",
"q = lambda t: 1\n",
"\n",
"[LUCB, LTS, LAAE, LAAEAS, LBROAD, LTsallis, LEXP3pp] = RunAllAlgos(mu, T, q, t0, samples);"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "sCEE9pB3dFiA",
"colab": {}
},
"source": [
"PlotAlgos(I, LUCB, LTS, LAAE, LAAEAS, LBROAD, LTsallis, LEXP3pp)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "eHmtlbXie06Z",
"colab_type": "text"
},
"source": [
"## Figure 2"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "uKsinerfe5kx",
"colab": {}
},
"source": [
"mu = [.005,.001]\n",
"T = 200000\n",
"samples = 100\n",
"t0 = 0\n",
"I = range(T)\n",
"q = lambda t: 1\n",
"\n",
"[LUCB1, LTS1, LAAE1, LAAEAS1, LBROAD1, LTsallis1, LEXP3pp1] = RunAllAlgos(mu, T, q, t0, samples);"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "skCPGBm8e5k7",
"colab": {}
},
"source": [
"PlotAlgos(I, LUCB1, LTS1, LAAE1, LAAEAS1, LBROAD1, LTsallis1, LEXP3pp1)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "MMu_5QMOe8pY",
"colab_type": "text"
},
"source": [
"## Figure 3"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "JiT7NiX1dFiU",
"colab": {}
},
"source": [
"mu = [.5,.8]\n",
"T = 20000\n",
"samples = 100\n",
"t0 = 1000000\n",
"I = [t+t0 for t in range(T)]\n",
"q = lambda t: 1\n",
"\n",
"[LUCB3, LTS3, LAAE3, LAAEAS3, LBROAD3, LTsallis3, LEXP3pp3] = RunAllAlgos(mu, T, q, t0, samples);"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "iKpk-iZndFiY",
"colab": {}
},
"source": [
"PlotAlgos(I, LUCB3, LTS3, LAAE3, LAAEAS3, LBROAD3, LTsallis3, LEXP3pp3)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "XwOHXE8ffIrx",
"colab_type": "text"
},
"source": [
"## Figure 4"
]
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "3fvI_J7ldFit",
"colab": {}
},
"source": [
"mu = [.5,.8]\n",
"T = 40000\n",
"samples = 200\n",
"t0 = 25\n",
"I = [t for t in range(T)]\n",
"q = lambda t: 0 if t < t0 else 1;\n",
"\n",
"LTS_A = RunAlgo(TS, mu, T, q, t0, samples)\n",
"LAAEAS_A = RunAlgo(AAEAS, mu, T, q, t0, samples)\n",
"LBROAD_A = RunAlgo(BROAD, mu, T, q, t0, samples)"
],
"execution_count": 0,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"colab_type": "code",
"id": "evpWiQawdFix",
"colab": {}
},
"source": [
"plt.xlabel('round')\n",
"plt.ylabel('cummulative loss')\n",
"I = [t for t in range(15000)]\n",
"plt.plot(I, LTS_A[I], label=\"TS\", color='#ff7f0e')\n",
"plt.plot(I, LAAEAS_A[I], label=\"AAEAS\", color='#d62728')\n",
"plt.plot(I, LBROAD_A[I], label=\"BROAD\", color='#9467bd')\n",
"plt.legend(loc='upper left')"
],
"execution_count": 0,
"outputs": []
}
]
}