mglerner/Optimizing MSD calculations.ipynb

## Optimizing MSD calculations.ipynb
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Writing a faster mean-squared-displacement function\n",
    "\n",
    "I'm trying to calculate the Mean Squared Displacement (MSD) of a particle trajectory. In reality, I have an array of positions for `numtrj` particles and `numpts` timepoints and `dim` dimensions: `pos[numtrj, numpts, dim]`. I think my question has the same answer if I just have the `x` trajectory of a single particle, though.\n",
    "\n",
    "In case you haven't done MSD calculations before, there's one cute way in which you get extra information out of a trajectory. Say I have just five time points, and my positions are\n",
    "\n",
    "```python\n",
    "In [83]: x\n",
    "\n",
    "Out[83]: array([ 0.        ,  1.74528704,  1.59639865,  2.59976219,  3.70852457])\n",
    "```\n",
    "\n",
    "You could just get squared displacement by looking at x\\*\\*2. However, you could also say that you have 4 different values for the displacement at one timestep:\n",
    "\n",
    "```python\n",
    "x[1] - x[0], x[2] - x[1], x[3] - x[2], x[4] - x[3]\n",
    "```\n",
    "\n",
    "Similarly, three values for displacement at two timesteps: x[2:] - x[:-2]. Etc. So, the way I'm calculating MSD at the moment is:\n",
    "\n",
    "```python\n",
    "def msd_1d(x):\n",
    "    result = np.zeros_like(x)\n",
    "    for i in range(1,len(x)):\n",
    "        result[i] = np.average((x[i:] - x[:-i])**2)\n",
    "    return result\n",
    "```\n",
    "or\n",
    "```python\n",
    "def get_msd_traj(pos):\n",
    "    result = np.zeros_like(pos)\n",
    "    for i in range(1,pos.shape[1]):\n",
    "        result[:,i,:] = np.average((pos[:,i:,:] - pos[:,:-i,:])**2,axis=1)\n",
    "    return result\n",
    "```\n",
    "\n",
    "(side note: often the data comes indexed like `pos[numpts, numtrj, dim]` for molecular dynamics trajectories, but that doesn't change anything here)\n",
    "\n",
    "So, I asked [Joshua Adelman](https://twitter.com/synapticarbors) if he had any quick thoughts.\n",
    "\n",
    "Josh cleared up some basic misconceptions for me and gave a couple of suggestions:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## From Josh\n",
    "\n",
    "> Hi Michael,\n",
    "\n",
    "> In general, numba is not going to speed up calculations that involve numpy built-in methods (unless they are things like np.exp, np.sin, np.cos) or array slicing. It works best if you have unrolled any vectorization into explicited nested loops and operate on arrays element-by-element. If you go that route, check to see if you can get the code to compile using @jit(nopython=True), which would be indicitive of numba being able to translate the code to llvm IR without using python objects (which tend to slow things down). I've also had a lot of success lately parallelizing numba code using threading if you aren't dealing with any python objects. If you can compile in nopython mode, then you can also specify:\n",
    "\n",
    "> ```python\n",
    "@jit(nopython=True, nogil=True)\n",
    "```\n",
    "\n",
    "> and use a ThreadPool to chop up the work, using something like:\n",
    "\n",
    "> http://numba.pydata.org/numba-doc/0.19.1/user/examples.html?highlight=nogil#multi-threading\n",
    "\n",
    "> it's undocumented, but for simple stuff you can use:\n",
    "\n",
    "> ```python\n",
    "> from multiprocessing.pool import ThreadPool\n",
    "> pool = ThreadPool(processes=nthreads)\n",
    "> results = pool.map(func, arg)\n",
    "> ```\n",
    "> It has the same API as the multiprocessing pool, and for embarassingly parallel tasks that don't have possible race conditions, it works quite nicely depending on the overhead involved in spawning off the task relative to the cost of the task. \n",
    "\n",
    "> Also, make sure you're using the latest version of numba since it's getting better rapidly. I'm not sure if you'll be able to call np.zeros_like in a numba-ized method in nopython mode, although you might. If not, you can pass in the results array as an argument. \n",
    "\n",
    "> Hope that helps. I can't think of any sneaky vectorization tricks in pure-numpy off the top of my head to make your calculation faster. Let me know if you have any other questions.\n",
    "\n",
    "> Josh"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Back to me\n",
    "\n",
    "So, I tried several things. You have to do some fiddling around to get `nopython=True` to work. In addition to not knowing about `np.average`, things like `x[1:] - x[:-1]` are doomed to failure. However, that last part isn't really trouble, because one of Josh's points was that I should write out all of my loops explicitly. Here are two versions of the 1D version:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from numba import jit"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def msd_1d(x):\n",
    "    result = np.zeros_like(x)\n",
    "    for i in range(1,len(x)):\n",
    "        result[i] = np.average((x[i:] - x[:-i])**2)\n",
    "    return result\n",
    "\n",
    "@jit(nopython = True)\n",
    "def msd_1d_nb1(x):\n",
    "    result = np.zeros_like(x)\n",
    "    for delta in range(1,len(x)):\n",
    "        thisresult = 0\n",
    "        for i in range(delta,len(x)):\n",
    "            thisresult += (x[i] - x[i-delta])**2\n",
    "        result[delta] = thisresult / (len(x) - delta)\n",
    "    return result\n",
    "    \n",
    "@jit(nopython = True)\n",
    "def msd_1d_nb2(x):\n",
    "    result = np.zeros_like(x)\n",
    "    for delta in range(1,len(x)):\n",
    "        for i in range(delta,len(x)):\n",
    "            result[delta] += (x[i] - x[i-delta])**2\n",
    "        result[delta] = result[delta] / (len(x) - delta)\n",
    "    return result"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Note that the above *definitions* will work almost no matter what you do. You don't get the `numba` issues until you try to actually run the functions. Here are some timings:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The slowest run took 4.14 times longer than the fastest. This could mean that an intermediate result is being cached \n",
      "10000 loops, best of 3: 114 µs per loop\n",
      "The slowest run took 11972.39 times longer than the fastest. This could mean that an intermediate result is being cached \n",
      "100000 loops, best of 3: 7.02 µs per loop\n",
      "The slowest run took 14872.73 times longer than the fastest. This could mean that an intermediate result is being cached \n",
      "100000 loops, best of 3: 6.94 µs per loop\n"
     ]
    }
   ],
   "source": [
    "%timeit msd_1d(np.random.randn(10))\n",
    "%timeit msd_1d_nb1(np.random.randn(10))\n",
    "%timeit msd_1d_nb2(np.random.randn(10))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1000 loops, best of 3: 1.19 ms per loop\n",
      "100000 loops, best of 3: 15.8 µs per loop\n",
      "100000 loops, best of 3: 18.5 µs per loop\n"
     ]
    }
   ],
   "source": [
    "%timeit msd_1d(np.random.randn(100))\n",
    "%timeit msd_1d_nb1(np.random.randn(100))\n",
    "%timeit msd_1d_nb2(np.random.randn(100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "100 loops, best of 3: 13.3 ms per loop\n",
      "1000 loops, best of 3: 540 µs per loop\n",
      "1000 loops, best of 3: 594 µs per loop\n"
     ]
    }
   ],
   "source": [
    "%timeit msd_1d(np.random.randn(1000))\n",
    "%timeit msd_1d_nb1(np.random.randn(1000))\n",
    "%timeit msd_1d_nb2(np.random.randn(1000))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So, pretty big win there! Approximately two orders of magnitude. It looks like it scales well too."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The larger version\n",
    "\n",
    "So, it looks like there's not much of a difference between version 1 and 2. But it does look like version 1 is a bit faster."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "def get_msd_traj(pos):\n",
    "    result = np.zeros_like(pos)\n",
    "    for i in range(1,pos.shape[1]):\n",
    "        result[:,i,:] = np.average((pos[:,i:,:] - pos[:,:-i,:])**2,axis=1)\n",
    "    return result\n",
    "\n",
    "@jit(nopython = True)\n",
    "def get_msd_traj_nb1(pos):\n",
    "    result = np.zeros_like(pos)\n",
    "    deltastop = pos.shape[1]\n",
    "    for traj in range(pos.shape[0]):\n",
    "        for dim in range(pos.shape[2]):\n",
    "            for delta in range(1,deltastop):\n",
    "                thisresult = 0\n",
    "                for i in range(delta,deltastop):\n",
    "                    thisresult += (pos[traj,i,dim] - pos[traj,i-delta,dim])**2\n",
    "                result[traj,delta,dim] = thisresult / (deltastop - delta)\n",
    "    return result\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, let's do some due diligence and make sure the results are equivalent. Then, timings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "a = np.random.randn(5,3,2)\n",
    "np.all(get_msd_traj(a) == get_msd_traj_nb1(a))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1000 loops, best of 3: 200 µs per loop\n",
      "100000 loops, best of 3: 14.6 µs per loop\n"
     ]
    }
   ],
   "source": [
    "%timeit get_msd_traj(np.random.randn(10,10,2))\n",
    "%timeit get_msd_traj_nb1(np.random.randn(10,10,2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "10 loops, best of 3: 12.3 ms per loop\n",
      "1000 loops, best of 3: 1.84 ms per loop\n"
     ]
    }
   ],
   "source": [
    "%timeit get_msd_traj(np.random.randn(100,100,2))\n",
    "%timeit get_msd_traj_nb1(np.random.randn(100,100,2))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "collapsed": false
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1 loops, best of 3: 27.8 s per loop\n",
      "1 loops, best of 3: 2.47 s per loop\n"
     ]
    }
   ],
   "source": [
    "%timeit get_msd_traj(np.random.randn(512,2001,2))\n",
    "%timeit get_msd_traj_nb1(np.random.randn(512,2001,2))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "So, for the actual data sizes I care about, I'm probably down to \"only\" an order of magnitude speedup. That's still pretty awesome."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.4.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
	{
	"cells": [
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"# Writing a faster mean-squared-displacement function\n",
	"\n",
	"I'm trying to calculate the Mean Squared Displacement (MSD) of a particle trajectory. In reality, I have an array of positions for `numtrj` particles and `numpts` timepoints and `dim` dimensions: `pos[numtrj, numpts, dim]`. I think my question has the same answer if I just have the `x` trajectory of a single particle, though.\n",
	"\n",
	"In case you haven't done MSD calculations before, there's one cute way in which you get extra information out of a trajectory. Say I have just five time points, and my positions are\n",
	"\n",
	"```python\n",
	"In [83]: x\n",
	"\n",
	"Out[83]: array([ 0. , 1.74528704, 1.59639865, 2.59976219, 3.70852457])\n",
	"```\n",
	"\n",
	"You could just get squared displacement by looking at x\\\\2. However, you could also say that you have 4 different values for the displacement at one timestep:\n",
	"\n",
	"```python\n",
	"x[1] - x[0], x[2] - x[1], x[3] - x[2], x[4] - x[3]\n",
	"```\n",
	"\n",
	"Similarly, three values for displacement at two timesteps: x[2:] - x[:-2]. Etc. So, the way I'm calculating MSD at the moment is:\n",
	"\n",
	"```python\n",
	"def msd_1d(x):\n",
	" result = np.zeros_like(x)\n",
	" for i in range(1,len(x)):\n",
	" result[i] = np.average((x[i:] - x[:-i])**2)\n",
	" return result\n",
	"```\n",
	"or\n",
	"```python\n",
	"def get_msd_traj(pos):\n",
	" result = np.zeros_like(pos)\n",
	" for i in range(1,pos.shape[1]):\n",
	" result[:,i,:] = np.average((pos[:,i:,:] - pos[:,:-i,:])**2,axis=1)\n",
	" return result\n",
	"```\n",
	"\n",
	"(side note: often the data comes indexed like `pos[numpts, numtrj, dim]` for molecular dynamics trajectories, but that doesn't change anything here)\n",
	"\n",
	"So, I asked [Joshua Adelman](https://twitter.com/synapticarbors) if he had any quick thoughts.\n",
	"\n",
	"Josh cleared up some basic misconceptions for me and gave a couple of suggestions:"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## From Josh\n",
	"\n",
	"> Hi Michael,\n",
	"\n",
	"> In general, numba is not going to speed up calculations that involve numpy built-in methods (unless they are things like np.exp, np.sin, np.cos) or array slicing. It works best if you have unrolled any vectorization into explicited nested loops and operate on arrays element-by-element. If you go that route, check to see if you can get the code to compile using @jit(nopython=True), which would be indicitive of numba being able to translate the code to llvm IR without using python objects (which tend to slow things down). I've also had a lot of success lately parallelizing numba code using threading if you aren't dealing with any python objects. If you can compile in nopython mode, then you can also specify:\n",
	"\n",
	"> ```python\n",
	"@jit(nopython=True, nogil=True)\n",
	"```\n",
	"\n",
	"> and use a ThreadPool to chop up the work, using something like:\n",
	"\n",
	"> http://numba.pydata.org/numba-doc/0.19.1/user/examples.html?highlight=nogil#multi-threading\n",
	"\n",
	"> it's undocumented, but for simple stuff you can use:\n",
	"\n",
	"> ```python\n",
	"> from multiprocessing.pool import ThreadPool\n",
	"> pool = ThreadPool(processes=nthreads)\n",
	"> results = pool.map(func, arg)\n",
	"> ```\n",
	"> It has the same API as the multiprocessing pool, and for embarassingly parallel tasks that don't have possible race conditions, it works quite nicely depending on the overhead involved in spawning off the task relative to the cost of the task. \n",
	"\n",
	"> Also, make sure you're using the latest version of numba since it's getting better rapidly. I'm not sure if you'll be able to call np.zeros_like in a numba-ized method in nopython mode, although you might. If not, you can pass in the results array as an argument. \n",
	"\n",
	"> Hope that helps. I can't think of any sneaky vectorization tricks in pure-numpy off the top of my head to make your calculation faster. Let me know if you have any other questions.\n",
	"\n",
	"> Josh"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## Back to me\n",
	"\n",
	"So, I tried several things. You have to do some fiddling around to get `nopython=True` to work. In addition to not knowing about `np.average`, things like `x[1:] - x[:-1]` are doomed to failure. However, that last part isn't really trouble, because one of Josh's points was that I should write out all of my loops explicitly. Here are two versions of the 1D version:"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 1,
	"metadata": {
	"collapsed": true
	},
	"outputs": [],
	"source": [
	"import numpy as np\n",
	"from numba import jit"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 2,
	"metadata": {
	"collapsed": true
	},
	"outputs": [],
	"source": [
	"def msd_1d(x):\n",
	" result = np.zeros_like(x)\n",
	" for i in range(1,len(x)):\n",
	" result[i] = np.average((x[i:] - x[:-i])**2)\n",
	" return result\n",
	"\n",
	"@jit(nopython = True)\n",
	"def msd_1d_nb1(x):\n",
	" result = np.zeros_like(x)\n",
	" for delta in range(1,len(x)):\n",
	" thisresult = 0\n",
	" for i in range(delta,len(x)):\n",
	" thisresult += (x[i] - x[i-delta])**2\n",
	" result[delta] = thisresult / (len(x) - delta)\n",
	" return result\n",
	" \n",
	"@jit(nopython = True)\n",
	"def msd_1d_nb2(x):\n",
	" result = np.zeros_like(x)\n",
	" for delta in range(1,len(x)):\n",
	" for i in range(delta,len(x)):\n",
	" result[delta] += (x[i] - x[i-delta])**2\n",
	" result[delta] = result[delta] / (len(x) - delta)\n",
	" return result"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Note that the above definitions will work almost no matter what you do. You don't get the `numba` issues until you try to actually run the functions. Here are some timings:"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 3,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"The slowest run took 4.14 times longer than the fastest. This could mean that an intermediate result is being cached \n",
	"10000 loops, best of 3: 114 µs per loop\n",
	"The slowest run took 11972.39 times longer than the fastest. This could mean that an intermediate result is being cached \n",
	"100000 loops, best of 3: 7.02 µs per loop\n",
	"The slowest run took 14872.73 times longer than the fastest. This could mean that an intermediate result is being cached \n",
	"100000 loops, best of 3: 6.94 µs per loop\n"
	]
	}
	],
	"source": [
	"%timeit msd_1d(np.random.randn(10))\n",
	"%timeit msd_1d_nb1(np.random.randn(10))\n",
	"%timeit msd_1d_nb2(np.random.randn(10))"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 4,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"1000 loops, best of 3: 1.19 ms per loop\n",
	"100000 loops, best of 3: 15.8 µs per loop\n",
	"100000 loops, best of 3: 18.5 µs per loop\n"
	]
	}
	],
	"source": [
	"%timeit msd_1d(np.random.randn(100))\n",
	"%timeit msd_1d_nb1(np.random.randn(100))\n",
	"%timeit msd_1d_nb2(np.random.randn(100))"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 5,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"100 loops, best of 3: 13.3 ms per loop\n",
	"1000 loops, best of 3: 540 µs per loop\n",
	"1000 loops, best of 3: 594 µs per loop\n"
	]
	}
	],
	"source": [
	"%timeit msd_1d(np.random.randn(1000))\n",
	"%timeit msd_1d_nb1(np.random.randn(1000))\n",
	"%timeit msd_1d_nb2(np.random.randn(1000))"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"So, pretty big win there! Approximately two orders of magnitude. It looks like it scales well too."
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"## The larger version\n",
	"\n",
	"So, it looks like there's not much of a difference between version 1 and 2. But it does look like version 1 is a bit faster."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 6,
	"metadata": {
	"collapsed": false
	},
	"outputs": [],
	"source": [
	"def get_msd_traj(pos):\n",
	" result = np.zeros_like(pos)\n",
	" for i in range(1,pos.shape[1]):\n",
	" result[:,i,:] = np.average((pos[:,i:,:] - pos[:,:-i,:])**2,axis=1)\n",
	" return result\n",
	"\n",
	"@jit(nopython = True)\n",
	"def get_msd_traj_nb1(pos):\n",
	" result = np.zeros_like(pos)\n",
	" deltastop = pos.shape[1]\n",
	" for traj in range(pos.shape[0]):\n",
	" for dim in range(pos.shape[2]):\n",
	" for delta in range(1,deltastop):\n",
	" thisresult = 0\n",
	" for i in range(delta,deltastop):\n",
	" thisresult += (pos[traj,i,dim] - pos[traj,i-delta,dim])**2\n",
	" result[traj,delta,dim] = thisresult / (deltastop - delta)\n",
	" return result\n"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"First, let's do some due diligence and make sure the results are equivalent. Then, timings."
	]
	},
	{
	"cell_type": "code",
	"execution_count": 7,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"data": {
	"text/plain": [
	"True"
	]
	},
	"execution_count": 7,
	"metadata": {},
	"output_type": "execute_result"
	}
	],
	"source": [
	"a = np.random.randn(5,3,2)\n",
	"np.all(get_msd_traj(a) == get_msd_traj_nb1(a))"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 8,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"1000 loops, best of 3: 200 µs per loop\n",
	"100000 loops, best of 3: 14.6 µs per loop\n"
	]
	}
	],
	"source": [
	"%timeit get_msd_traj(np.random.randn(10,10,2))\n",
	"%timeit get_msd_traj_nb1(np.random.randn(10,10,2))"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 9,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"10 loops, best of 3: 12.3 ms per loop\n",
	"1000 loops, best of 3: 1.84 ms per loop\n"
	]
	}
	],
	"source": [
	"%timeit get_msd_traj(np.random.randn(100,100,2))\n",
	"%timeit get_msd_traj_nb1(np.random.randn(100,100,2))"
	]
	},
	{
	"cell_type": "code",
	"execution_count": 10,
	"metadata": {
	"collapsed": false
	},
	"outputs": [
	{
	"name": "stdout",
	"output_type": "stream",
	"text": [
	"1 loops, best of 3: 27.8 s per loop\n",
	"1 loops, best of 3: 2.47 s per loop\n"
	]
	}
	],
	"source": [
	"%timeit get_msd_traj(np.random.randn(512,2001,2))\n",
	"%timeit get_msd_traj_nb1(np.random.randn(512,2001,2))"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"So, for the actual data sizes I care about, I'm probably down to \"only\" an order of magnitude speedup. That's still pretty awesome."
	]
	}
	],
	"metadata": {
	"kernelspec": {
	"display_name": "Python 3",
	"language": "python",
	"name": "python3"
	},
	"language_info": {
	"codemirror_mode": {
	"name": "ipython",
	"version": 3
	},
	"file_extension": ".py",
	"mimetype": "text/x-python",
	"name": "python",
	"nbconvert_exporter": "python",
	"pygments_lexer": "ipython3",
	"version": "3.4.3"
	}
	},
	"nbformat": 4,
	"nbformat_minor": 0
	}