mdagost/The Adding Problem.ipynb

## The Adding Problem.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "# Baseline MSE of the Adding Problem\n",
    "**Michelangelo D'Agostino**<br>\n",
    "**mdagost@gmail.com**<br>\n",
    "**August 2018**<br>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "In [\"An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling\"](https://arxiv.org/abs/1803.01271), Bai et al. use the \"adding problem\" as a canonical test task.  They describe the problem like this:\n",
    "\n",
    "**In this task, each input consists of a length-n sequence of depth 2, with all values randomly chosen in [0, 1], and the second dimension being all zeros except for two elements that are marked by 1. The objective is to sum the two random values whose second dimensions are marked by 1. Simply predicting the sum to be 1 should give an MSE of about 0.1767. First introduced by Hochreiter & Schmidhuber (1997), the adding problem has been used repeatedly as a stress test for sequence models.**\n",
    "\n",
    "I found the same formulation of the adding problem described by [Le and Hinton](https://arxiv.org/abs/1504.00941), who also say that the MSE of a prediction of 1 should be around 0.1767.\n",
    "\n",
    "It seemed like a simple-but-interesting-enough probability problem to derive that value of the MSE on my train ride home.  In doing so, I *think* that I get a very slightly different answer than these papers do."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "deletable": true,
    "editable": true
   },
   "source": [
    "Let's denote the two uniform random variables as $x_1$ and $x_2$ and expectation values by brackets.  Then:\n",
    "\n",
    "$$\n",
    "\\begin{equation}\n",
    "\\begin{split}\n",
    "MSE & = < (1 - (x_1 + x_2))^2> \\\\\n",
    " & = < (1 - x_1 - x_2)(1 - x_1 - x_2) > \\\\\n",
    " & = < 1 - x_1 - x_2 - x_1 + x_1^2 + x_1x_2 - x_2 + x_2x_1 + x_2^2 > \\\\\n",
    " & = 1 - 2< x_1 > - 2< x_2 > + 2<x_1x_2> + <x_1^2> + <x_2^2> \\\\\n",
    " & = 1 - 2*\\frac{1}{2} - 2*\\frac{1}{2} + 2*\\frac{1}{4} + \\frac{1}{3} + \\frac{1}{3} \\\\\n",
    " & = \\frac{1}{6} \\\\\n",
    " & \\approx 0.1667\n",
    "\\end{split}\n",
    "\\end{equation}\n",
    "$$\n",
    "\n",
    "**Note that this is not  0.1767 as the papers claim**.\n",
    "\n",
    "Let's do a simple simulation to confirm this result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": false,
    "deletable": true,
    "editable": true
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MSE=0.16655704660644613\n",
      "CPU times: user 4min 7s, sys: 41.3 s, total: 4min 48s\n",
      "Wall time: 5min 28s\n"
     ]
    }
   ],
   "source": [
    "%%time\n",
    "\n",
    "n = 100\n",
    "sim_trials = int(1e7)\n",
    "\n",
    "# make a matrix where rows are trials and columns are our n uniform random numbers on [0, 1]\n",
    "M = np.random.rand(sim_trials, n)\n",
    "\n",
    "# for each row/trial, choose two random incides to sum\n",
    "cols_to_sum_indexer = np.array([np.random.choice(n, size=2, replace=False) for i in range(sim_trials)])\n",
    "assert cols_to_sum_indexer.shape == (sim_trials, 2)\n",
    "\n",
    "# we're going index into M, so make a row indexer that is just the row number with shape (sim_trials, 2)\n",
    "rows_indexer = np.repeat(np.arange(sim_trials).reshape(sim_trials, 1), 2, axis=1)\n",
    "assert rows_indexer.shape == (sim_trials, 2)\n",
    "\n",
    "# for each row/trial, pick out the two random numbers and sum them\n",
    "sums = M[rows_indexer, cols_to_sum_indexer].sum(axis=1)\n",
    "\n",
    "# calculate the MSE across rows/trials\n",
    "Z = (1-sums)**2\n",
    "MSE = Z.mean()\n",
    "\n",
    "print(\"MSE={}\".format(MSE))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {
	"deletable": true,
	"editable": true
	},
	"source": [
	"# Baseline MSE of the Adding Problem\n",
	"Michelangelo D'Agostino<br>\n",
	"mdagost@gmail.com<br>\n",
	"August 2018<br>"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"deletable": true,
	"editable": true
	},
	"source": [
	"In [\"An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling\"](https://arxiv.org/abs/1803.01271), Bai et al. use the \"adding problem\" as a canonical test task. They describe the problem like this:\n",
	"\n",
	"In this task, each input consists of a length-n sequence of depth 2, with all values randomly chosen in [0, 1], and the second dimension being all zeros except for two elements that are marked by 1. The objective is to sum the two random values whose second dimensions are marked by 1. Simply predicting the sum to be 1 should give an MSE of about 0.1767. First introduced by Hochreiter & Schmidhuber (1997), the adding problem has been used repeatedly as a stress test for sequence models.\n",
	"\n",
	"I found the same formulation of the adding problem described by [Le and Hinton](https://arxiv.org/abs/1504.00941), who also say that the MSE of a prediction of 1 should be around 0.1767.\n",
	"\n",
	"It seemed like a simple-but-interesting-enough probability problem to derive that value of the MSE on my train ride home. In doing so, I think that I get a very slightly different answer than these papers do."
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {
	"deletable": true,
	"editable": true
	},
	"source": [
	"Let's denote the two uniform random variables as $x_1$ and $x_2$ and expectation values by brackets. Then:\n",
	"\n",
	"$$\n",
	"\\begin{equation}\n",
	"\\begin{split}\n",
	"MSE & = < (1 - (x_1 + x_2))^2> \\\\\n",
	" & = < (1 - x_1 - x_2)(1 - x_1 - x_2) > \\\\\n",
	" & = < 1 - x_1 - x_2 - x_1 + x_1^2 + x_1x_2 - x_2 + x_2x_1 + x_2^2 > \\\\\n",
	" & = 1 - 2< x_1 > - 2< x_2 > + 2<x_1x_2> + <x_1^2> + <x_2^2> \\\\\n",
	" & = 1 - 2\\frac{1}{2} - 2\\frac{1}{2} + 2*\\frac{1}{4} + \\frac{1}{3} + \\frac{1}{3} \\\\\n",
	" & = \\frac{1}{6} \\\\\n",
	" & \\approx 0.1667\n",
	"\\end{split}\n",
	"\\end{equation}\n",
	"$$\n",
	"\n",
	"Note that this is not 0.1767 as the papers claim.\n",
	"\n",
	"Let's do a simple simulation to confirm this result."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 1,
	"metadata": {
	"collapsed": true,
	"deletable": true,
	"editable": true
	},
	"outputs": [],
	"source": [
	"import numpy as np"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 2,
	"metadata": {
	"collapsed": false,
	"deletable": true,
	"editable": true
	},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"MSE=0.16655704660644613\n",
	"CPU times: user 4min 7s, sys: 41.3 s, total: 4min 48s\n",
	"Wall time: 5min 28s\n"
	]
	}
	],
	"source": [
	"%%time\n",
	"\n",
	"n = 100\n",
	"sim_trials = int(1e7)\n",
	"\n",
	"# make a matrix where rows are trials and columns are our n uniform random numbers on [0, 1]\n",
	"M = np.random.rand(sim_trials, n)\n",
	"\n",
	"# for each row/trial, choose two random incides to sum\n",
	"cols_to_sum_indexer = np.array([np.random.choice(n, size=2, replace=False) for i in range(sim_trials)])\n",
	"assert cols_to_sum_indexer.shape == (sim_trials, 2)\n",
	"\n",
	"# we're going index into M, so make a row indexer that is just the row number with shape (sim_trials, 2)\n",
	"rows_indexer = np.repeat(np.arange(sim_trials).reshape(sim_trials, 1), 2, axis=1)\n",
	"assert rows_indexer.shape == (sim_trials, 2)\n",
	"\n",
	"# for each row/trial, pick out the two random numbers and sum them\n",
	"sums = M[rows_indexer, cols_to_sum_indexer].sum(axis=1)\n",
	"\n",
	"# calculate the MSE across rows/trials\n",
	"Z = (1-sums)**2\n",
	"MSE = Z.mean()\n",
	"\n",
	"print(\"MSE={}\".format(MSE))"
	]
	},
	{
	"cell_type": "code",
	"execution_count": null,
	"metadata": {
	"collapsed": true,
	"deletable": true,
	"editable": true
	},
	"outputs": [],
	"source": []
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.6.5"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 2
	}