moble/MakingNumpyFaster.ipynb

## MakingNumpyFaster.ipynb
{
 "metadata": {
  "name": "",
  "signature": "sha256:b99ad5d473337f218c8e6aae148bb4d3552bb1392306fbf35697573ee93818d7"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "Making numpy faster"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The following is the original code in in [this stackoverflow question](http://stackoverflow.com/q/28357897/1194883).  I have just added one line\n",
      "\n",
      "    np.random.seed(123)\n",
      "\n",
      "which ensures that the results are reproducible.  Also, I've added the cell magic `%%time` to see how long it takes."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%time\n",
      "\n",
      "from __future__ import division\n",
      "import numpy as np\n",
      "\n",
      "ngridx = 128\n",
      "ngridy = 128    \n",
      "ngridz = 128\n",
      "\n",
      "maxK = max(ngridx,ngridy,ngridz)\n",
      "\n",
      "#making input file\n",
      "f = np.zeros((ngridx*ngridy*ngridz,4))\n",
      "\n",
      "np.random.seed(123)\n",
      "i = 0\n",
      "for i in np.arange(len(f)):\n",
      "    f[i][0] = int(i/(ngridy*ngridz))\n",
      "    f[i][1] = int((i/ngridz))%ngridy\n",
      "    f[i][2] = int(i%ngridz)\n",
      "    f[i][3] = np.random.rand(1)\n",
      "#     if i%1000000 ==0:\n",
      "#         print i\n",
      "#This takes forever\n",
      "#end making input file\n",
      "\n",
      "#Thanks to Mike,\n",
      "a = f[:,3].reshape(ngridx,ngridy,ngridz)\n",
      "\n",
      "avg =np.sum(f[:,3])/len(f)\n",
      "a /= avg\n",
      "p = np.fft.fftn(a)\n",
      "#This part is much much faster than before (Original Post).\n",
      "\n",
      "#Keeping track of corresponding wavenumbers (k_x, k_y,k_z) for each element in p\n",
      "#This is just a convension on fourier transformation so you can ignore this part\n",
      "kValx = np.fft.fftfreq( ngridx , (1.0 / ngridx ) )\n",
      "kValy = np.fft.fftfreq( ngridy , (1.0 / ngridy ) )\n",
      "kValz = np.fft.fftfreq( ngridz , (1.0 / ngridz ) )\n",
      "kx = np.zeros((ngridx,ngridy,ngridz))\n",
      "ky = np.zeros((ngridx,ngridy,ngridz))\n",
      "kz = np.zeros((ngridx,ngridy,ngridz))\n",
      "rangecolx = np.arange(ngridx)\n",
      "rangecoly = np.arange(ngridy)\n",
      "rangecolz = np.arange(ngridz)\n",
      "for row in np.arange(ngridx):\n",
      "    for column in np.arange(ngridy):\n",
      "        for height in np.arange(ngridz):\n",
      "            kx[row][column][height] = (kValx[row])\n",
      "            ky[row][column][height] = (kValy[column])\n",
      "            kz[row][column][height] = (kValz[height])\n",
      "#     if row%10 == 0:\n",
      "#         print row\n",
      "# print 'wavenumber generate complete!'\n",
      "\n",
      "#Calculating the average powerspectrum in terms of fixed K (Distance from origin to a point in fourier space)\n",
      "#by taking the spherical shell of thickness 1 and averaging out the values inside it.\n",
      "#I am sure that this process can be optimised somehow, but I gave up.\n",
      "\n",
      "qlen = maxK/2 #Nyquist frequency\n",
      "q = np.zeros(((qlen),4),dtype=complex)\n",
      "#q is a four column array with length maxK/2.\n",
      "#q[:,0] is integer wavenumber (K, which is the distance from the origin = sqrt(kx^2+ky^2+kz^2))\n",
      "#q[:,1] is the sum of square of the fourier transformed value \n",
      "#q[:,2] is the sum of the fourier transformed value, \n",
      "#and q[:,3] is the total number of samples with K=q[:,0]\n",
      "\n",
      "for i in  np.arange(len(q)):\n",
      "    q[i][0] = i\n",
      "i = 0\n",
      "for i in np.arange(len(p)):\n",
      "    for r in np.arange(len(p[0])):\n",
      "        for s in np.arange(len(p[0,0])):\n",
      "            K = np.around(np.sqrt(kx[i,r,s]**2+ky[i,r,s]**2+kz[i,r,s]**2))\n",
      "            if K < qlen:\n",
      "                q[K][1]=q[K][1]+np.abs(p[i,r,s])**2\n",
      "                q[K][2]=q[K][2]+p[i,r,s]\n",
      "                q[K][3]=q[K][3]+1   \n",
      "#     if i%10 ==0:\n",
      "#         print 'i = ',i,' !'\n",
      "# print q"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "CPU times: user 32 s, sys: 213 ms, total: 32.2 s\n",
        "Wall time: 32.2 s\n"
       ]
      }
     ],
     "prompt_number": 1
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "As we can see, it takes a little over 30 seconds.\n",
      "\n",
      "I'll make a copy of the main result, `q`, so we can compare it to the result from my version of the code and ensure that I haven't screwed anything up."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "q2 = np.copy(q)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [],
     "prompt_number": 2
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now, here's my somewhat optimized version of the same code."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "%%time\n",
      "\n",
      "from __future__ import division\n",
      "import numpy as np\n",
      "\n",
      "ngridx = 128\n",
      "ngridy = 128\n",
      "ngridz = 128\n",
      "\n",
      "maxK = max(ngridx,ngridy,ngridz)\n",
      "\n",
      "#making input file\n",
      "f = np.zeros((ngridx*ngridy*ngridz,4))\n",
      "np.random.seed(123)\n",
      "# Make `i` into an array, then use numpy to operate on\n",
      "# the whole array at once.  This is about 35 times\n",
      "# faster than the loop method.\n",
      "i = np.arange(len(f))\n",
      "f[:,0] = i//(ngridy*ngridz)\n",
      "f[:,1] = i//(ngridz) % ngridy\n",
      "f[:,2] = i%ngridz\n",
      "f[:,3] = np.random.rand(f.shape[0])\n",
      "#end making input file\n",
      "\n",
      "a = f[:,3].reshape(ngridx,ngridy,ngridz)\n",
      "\n",
      "avg = np.sum(f[:,3])/len(f)\n",
      "a /= avg\n",
      "p = np.fft.fftn(a)\n",
      "\n",
      "#Keeping track of corresponding wavenumbers (k_x, k_y,k_z) for each element in p\n",
      "#This is just a convension on fourier transformation so you can ignore this part\n",
      "kValx = np.fft.fftfreq( ngridx , (1.0 / ngridx ) )\n",
      "kValy = np.fft.fftfreq( ngridy , (1.0 / ngridy ) )\n",
      "kValz = np.fft.fftfreq( ngridz , (1.0 / ngridz ) )\n",
      "kx = np.zeros((ngridx,ngridy,ngridz))\n",
      "ky = np.zeros((ngridx,ngridy,ngridz))\n",
      "kz = np.zeros((ngridx,ngridy,ngridz))\n",
      "# Get rid of as many loops as you can.  This is about 10 times\n",
      "# faster than the nested loops for this number of grid points,\n",
      "# but scales as N instead of N^3 (where N=ngridx*ngridy*ngridz).\n",
      "for row in range(ngridx):\n",
      "    kx[row,:,:] = kValx[row]\n",
      "for column in range(ngridy):\n",
      "    ky[:,column,:] = kValy[column]\n",
      "for height in range(ngridz):\n",
      "    kz[:,:,height] = kValz[height]\n",
      "\n",
      "#Calculating the average powerspectrum in terms of fixed K (Distance from origin to a point in fourier space)\n",
      "#by taking the spherical shell of thickness 1 and averaging out the values inside it.\n",
      "#I am sure that this process can be optimised somehow, but I gave up.\n",
      "\n",
      "qlen = maxK//2 #Nyquist frequency\n",
      "q = np.zeros(((qlen),4),dtype=complex)\n",
      "#q is a four column array with length maxK/2.\n",
      "#q[:,0] is integer wavenumber (K, which is the distance from the origin = sqrt(kx^2+ky^2+kz^2))\n",
      "#q[:,1] is the sum of square of the fourier transformed value \n",
      "#q[:,2] is the sum of the fourier transformed value, \n",
      "#and q[:,3] is the total number of samples with K=q[:,0]\n",
      "\n",
      "# Again, we get rid of nested loops, and get a large improvement in speed and scaling\n",
      "K = np.around(np.sqrt(kx**2+ky**2+kz**2)).astype(int)\n",
      "for i in range(qlen):\n",
      "    indices = (K==i)\n",
      "    q[i,0] = i\n",
      "    q[i,1] = sum(np.abs(p[indices])**2)\n",
      "    q[i,2] = sum(p[indices])\n",
      "    q[i,3] = sum(indices)\n",
      "#print q"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "output_type": "stream",
       "stream": "stdout",
       "text": [
        "CPU times: user 805 ms, sys: 113 ms, total: 918 ms\n",
        "Wall time: 916 ms\n"
       ]
      }
     ],
     "prompt_number": 3
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "This version takes just under 1 second.  We can check that the 0 and 3 columns of `q` and `q2` are identical (they're just integers, so the math will be exact):"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "np.array_equal(q[:,0::3], q2[:,0::3])"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 4,
       "text": [
        "True"
       ]
      }
     ],
     "prompt_number": 4
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We also check that all elements of `q` and `q2` are within roughly roundoff of each other.  (They're not identical because the operations are done in slightly different orders, but they're as close as the computer knows how to get.)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "np.allclose(q, q2, rtol=1e-12, atol=1e-16)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": [
      {
       "metadata": {},
       "output_type": "pyout",
       "prompt_number": 5,
       "text": [
        "True"
       ]
      }
     ],
     "prompt_number": 5
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "We can glean two key lessons about making numpy faster:\n",
      "\n",
      "  1. Avoid loops as much as possible -- especially loops within loops.  The main reason numpy exists is to move loops inside C code, so you don't have to loop in python.  It has all sorts of functions like `sum`, `abs`, `sqrt`, etc., which operate on entire arrays at once.\n",
      "  2. Avoid copying data.  Just use [slices](http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html) of numpy arrays unless you need to change the data."
     ]
    }
   ],
   "metadata": {}
  }
 ]
}
	{
	"metadata": {
	"name": "",
	"signature": "sha256:b99ad5d473337f218c8e6aae148bb4d3552bb1392306fbf35697573ee93818d7"
	},
	"nbformat": 3,
	"nbformat_minor": 0,
	"worksheets": [
	{
	"cells": [
	{
	"cell_type": "heading",
	"level": 1,
	"metadata": {},
	"source": [
	"Making numpy faster"
	]
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"The following is the original code in in [this stackoverflow question](http://stackoverflow.com/q/28357897/1194883). I have just added one line\n",
	"\n",
	" np.random.seed(123)\n",
	"\n",
	"which ensures that the results are reproducible. Also, I've added the cell magic `%%time` to see how long it takes."
	]
	},
	{
	"cell_type": "code",
	"collapsed": false,
	"input": [
	"%%time\n",
	"\n",
	"from __future__ import division\n",
	"import numpy as np\n",
	"\n",
	"ngridx = 128\n",
	"ngridy = 128 \n",
	"ngridz = 128\n",
	"\n",
	"maxK = max(ngridx,ngridy,ngridz)\n",
	"\n",
	"#making input file\n",
	"f = np.zeros((ngridxngridyngridz,4))\n",
	"\n",
	"np.random.seed(123)\n",
	"i = 0\n",
	"for i in np.arange(len(f)):\n",
	" f[i][0] = int(i/(ngridy*ngridz))\n",
	" f[i][1] = int((i/ngridz))%ngridy\n",
	" f[i][2] = int(i%ngridz)\n",
	" f[i][3] = np.random.rand(1)\n",
	"# if i%1000000 ==0:\n",
	"# print i\n",
	"#This takes forever\n",
	"#end making input file\n",
	"\n",
	"#Thanks to Mike,\n",
	"a = f[:,3].reshape(ngridx,ngridy,ngridz)\n",
	"\n",
	"avg =np.sum(f[:,3])/len(f)\n",
	"a /= avg\n",
	"p = np.fft.fftn(a)\n",
	"#This part is much much faster than before (Original Post).\n",
	"\n",
	"#Keeping track of corresponding wavenumbers (k_x, k_y,k_z) for each element in p\n",
	"#This is just a convension on fourier transformation so you can ignore this part\n",
	"kValx = np.fft.fftfreq( ngridx , (1.0 / ngridx ) )\n",
	"kValy = np.fft.fftfreq( ngridy , (1.0 / ngridy ) )\n",
	"kValz = np.fft.fftfreq( ngridz , (1.0 / ngridz ) )\n",
	"kx = np.zeros((ngridx,ngridy,ngridz))\n",
	"ky = np.zeros((ngridx,ngridy,ngridz))\n",
	"kz = np.zeros((ngridx,ngridy,ngridz))\n",
	"rangecolx = np.arange(ngridx)\n",
	"rangecoly = np.arange(ngridy)\n",
	"rangecolz = np.arange(ngridz)\n",
	"for row in np.arange(ngridx):\n",
	" for column in np.arange(ngridy):\n",
	" for height in np.arange(ngridz):\n",
	" kx[row][column][height] = (kValx[row])\n",
	" ky[row][column][height] = (kValy[column])\n",
	" kz[row][column][height] = (kValz[height])\n",
	"# if row%10 == 0:\n",
	"# print row\n",
	"# print 'wavenumber generate complete!'\n",
	"\n",
	"#Calculating the average powerspectrum in terms of fixed K (Distance from origin to a point in fourier space)\n",
	"#by taking the spherical shell of thickness 1 and averaging out the values inside it.\n",
	"#I am sure that this process can be optimised somehow, but I gave up.\n",
	"\n",
	"qlen = maxK/2 #Nyquist frequency\n",
	"q = np.zeros(((qlen),4),dtype=complex)\n",
	"#q is a four column array with length maxK/2.\n",
	"#q[:,0] is integer wavenumber (K, which is the distance from the origin = sqrt(kx^2+ky^2+kz^2))\n",
	"#q[:,1] is the sum of square of the fourier transformed value \n",
	"#q[:,2] is the sum of the fourier transformed value, \n",
	"#and q[:,3] is the total number of samples with K=q[:,0]\n",
	"\n",
	"for i in np.arange(len(q)):\n",
	" q[i][0] = i\n",
	"i = 0\n",
	"for i in np.arange(len(p)):\n",
	" for r in np.arange(len(p[0])):\n",
	" for s in np.arange(len(p[0,0])):\n",
	" K = np.around(np.sqrt(kx[i,r,s]2+ky[i,r,s]2+kz[i,r,s]**2))\n",
	" if K < qlen:\n",
	" q[K][1]=q[K][1]+np.abs(p[i,r,s])**2\n",
	" q[K][2]=q[K][2]+p[i,r,s]\n",
	" q[K][3]=q[K][3]+1 \n",
	"# if i%10 ==0:\n",
	"# print 'i = ',i,' !'\n",
	"# print q"
	],
	"language": "python",
	"metadata": {},
	"outputs": [
	{
	"output_type": "stream",
	"stream": "stdout",
	"text": [
	"CPU times: user 32 s, sys: 213 ms, total: 32.2 s\n",
	"Wall time: 32.2 s\n"
	]
	}
	],
	"prompt_number": 1
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"As we can see, it takes a little over 30 seconds.\n",
	"\n",
	"I'll make a copy of the main result, `q`, so we can compare it to the result from my version of the code and ensure that I haven't screwed anything up."
	]
	},
	{
	"cell_type": "code",
	"collapsed": false,
	"input": [
	"q2 = np.copy(q)"
	],
	"language": "python",
	"metadata": {},
	"outputs": [],
	"prompt_number": 2
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"Now, here's my somewhat optimized version of the same code."
	]
	},
	{
	"cell_type": "code",
	"collapsed": false,
	"input": [
	"%%time\n",
	"\n",
	"from __future__ import division\n",
	"import numpy as np\n",
	"\n",
	"ngridx = 128\n",
	"ngridy = 128\n",
	"ngridz = 128\n",
	"\n",
	"maxK = max(ngridx,ngridy,ngridz)\n",
	"\n",
	"#making input file\n",
	"f = np.zeros((ngridxngridyngridz,4))\n",
	"np.random.seed(123)\n",
	"# Make `i` into an array, then use numpy to operate on\n",
	"# the whole array at once. This is about 35 times\n",
	"# faster than the loop method.\n",
	"i = np.arange(len(f))\n",
	"f[:,0] = i//(ngridy*ngridz)\n",
	"f[:,1] = i//(ngridz) % ngridy\n",
	"f[:,2] = i%ngridz\n",
	"f[:,3] = np.random.rand(f.shape[0])\n",
	"#end making input file\n",
	"\n",
	"a = f[:,3].reshape(ngridx,ngridy,ngridz)\n",
	"\n",
	"avg = np.sum(f[:,3])/len(f)\n",
	"a /= avg\n",
	"p = np.fft.fftn(a)\n",
	"\n",
	"#Keeping track of corresponding wavenumbers (k_x, k_y,k_z) for each element in p\n",
	"#This is just a convension on fourier transformation so you can ignore this part\n",
	"kValx = np.fft.fftfreq( ngridx , (1.0 / ngridx ) )\n",
	"kValy = np.fft.fftfreq( ngridy , (1.0 / ngridy ) )\n",
	"kValz = np.fft.fftfreq( ngridz , (1.0 / ngridz ) )\n",
	"kx = np.zeros((ngridx,ngridy,ngridz))\n",
	"ky = np.zeros((ngridx,ngridy,ngridz))\n",
	"kz = np.zeros((ngridx,ngridy,ngridz))\n",
	"# Get rid of as many loops as you can. This is about 10 times\n",
	"# faster than the nested loops for this number of grid points,\n",
	"# but scales as N instead of N^3 (where N=ngridxngridyngridz).\n",
	"for row in range(ngridx):\n",
	" kx[row,:,:] = kValx[row]\n",
	"for column in range(ngridy):\n",
	" ky[:,column,:] = kValy[column]\n",
	"for height in range(ngridz):\n",
	" kz[:,:,height] = kValz[height]\n",
	"\n",
	"#Calculating the average powerspectrum in terms of fixed K (Distance from origin to a point in fourier space)\n",
	"#by taking the spherical shell of thickness 1 and averaging out the values inside it.\n",
	"#I am sure that this process can be optimised somehow, but I gave up.\n",
	"\n",
	"qlen = maxK//2 #Nyquist frequency\n",
	"q = np.zeros(((qlen),4),dtype=complex)\n",
	"#q is a four column array with length maxK/2.\n",
	"#q[:,0] is integer wavenumber (K, which is the distance from the origin = sqrt(kx^2+ky^2+kz^2))\n",
	"#q[:,1] is the sum of square of the fourier transformed value \n",
	"#q[:,2] is the sum of the fourier transformed value, \n",
	"#and q[:,3] is the total number of samples with K=q[:,0]\n",
	"\n",
	"# Again, we get rid of nested loops, and get a large improvement in speed and scaling\n",
	"K = np.around(np.sqrt(kx2+ky2+kz**2)).astype(int)\n",
	"for i in range(qlen):\n",
	" indices = (K==i)\n",
	" q[i,0] = i\n",
	" q[i,1] = sum(np.abs(p[indices])**2)\n",
	" q[i,2] = sum(p[indices])\n",
	" q[i,3] = sum(indices)\n",
	"#print q"
	],
	"language": "python",
	"metadata": {},
	"outputs": [
	{
	"output_type": "stream",
	"stream": "stdout",
	"text": [
	"CPU times: user 805 ms, sys: 113 ms, total: 918 ms\n",
	"Wall time: 916 ms\n"
	]
	}
	],
	"prompt_number": 3
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"This version takes just under 1 second. We can check that the 0 and 3 columns of `q` and `q2` are identical (they're just integers, so the math will be exact):"
	]
	},
	{
	"cell_type": "code",
	"collapsed": false,
	"input": [
	"np.array_equal(q[:,0::3], q2[:,0::3])"
	],
	"language": "python",
	"metadata": {},
	"outputs": [
	{
	"metadata": {},
	"output_type": "pyout",
	"prompt_number": 4,
	"text": [
	"True"
	]
	}
	],
	"prompt_number": 4
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"We also check that all elements of `q` and `q2` are within roughly roundoff of each other. (They're not identical because the operations are done in slightly different orders, but they're as close as the computer knows how to get.)"
	]
	},
	{
	"cell_type": "code",
	"collapsed": false,
	"input": [
	"np.allclose(q, q2, rtol=1e-12, atol=1e-16)"
	],
	"language": "python",
	"metadata": {},
	"outputs": [
	{
	"metadata": {},
	"output_type": "pyout",
	"prompt_number": 5,
	"text": [
	"True"
	]
	}
	],
	"prompt_number": 5
	},
	{
	"cell_type": "markdown",
	"metadata": {},
	"source": [
	"We can glean two key lessons about making numpy faster:\n",
	"\n",
	" 1. Avoid loops as much as possible -- especially loops within loops. The main reason numpy exists is to move loops inside C code, so you don't have to loop in python. It has all sorts of functions like `sum`, `abs`, `sqrt`, etc., which operate on entire arrays at once.\n",
	" 2. Avoid copying data. Just use [slices](http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html) of numpy arrays unless you need to change the data."
	]
	}
	],
	"metadata": {}
	}
	]
	}