Skip to content

Instantly share code, notes, and snippets.

@moble
Created February 7, 2015 19:07
Show Gist options
  • Save moble/820999c8901968bbcd29 to your computer and use it in GitHub Desktop.
Save moble/820999c8901968bbcd29 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"metadata": {
"name": "",
"signature": "sha256:b99ad5d473337f218c8e6aae148bb4d3552bb1392306fbf35697573ee93818d7"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Making numpy faster"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The following is the original code in in [this stackoverflow question](http://stackoverflow.com/q/28357897/1194883). I have just added one line\n",
"\n",
" np.random.seed(123)\n",
"\n",
"which ensures that the results are reproducible. Also, I've added the cell magic `%%time` to see how long it takes."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%time\n",
"\n",
"from __future__ import division\n",
"import numpy as np\n",
"\n",
"ngridx = 128\n",
"ngridy = 128 \n",
"ngridz = 128\n",
"\n",
"maxK = max(ngridx,ngridy,ngridz)\n",
"\n",
"#making input file\n",
"f = np.zeros((ngridx*ngridy*ngridz,4))\n",
"\n",
"np.random.seed(123)\n",
"i = 0\n",
"for i in np.arange(len(f)):\n",
" f[i][0] = int(i/(ngridy*ngridz))\n",
" f[i][1] = int((i/ngridz))%ngridy\n",
" f[i][2] = int(i%ngridz)\n",
" f[i][3] = np.random.rand(1)\n",
"# if i%1000000 ==0:\n",
"# print i\n",
"#This takes forever\n",
"#end making input file\n",
"\n",
"#Thanks to Mike,\n",
"a = f[:,3].reshape(ngridx,ngridy,ngridz)\n",
"\n",
"avg =np.sum(f[:,3])/len(f)\n",
"a /= avg\n",
"p = np.fft.fftn(a)\n",
"#This part is much much faster than before (Original Post).\n",
"\n",
"#Keeping track of corresponding wavenumbers (k_x, k_y,k_z) for each element in p\n",
"#This is just a convension on fourier transformation so you can ignore this part\n",
"kValx = np.fft.fftfreq( ngridx , (1.0 / ngridx ) )\n",
"kValy = np.fft.fftfreq( ngridy , (1.0 / ngridy ) )\n",
"kValz = np.fft.fftfreq( ngridz , (1.0 / ngridz ) )\n",
"kx = np.zeros((ngridx,ngridy,ngridz))\n",
"ky = np.zeros((ngridx,ngridy,ngridz))\n",
"kz = np.zeros((ngridx,ngridy,ngridz))\n",
"rangecolx = np.arange(ngridx)\n",
"rangecoly = np.arange(ngridy)\n",
"rangecolz = np.arange(ngridz)\n",
"for row in np.arange(ngridx):\n",
" for column in np.arange(ngridy):\n",
" for height in np.arange(ngridz):\n",
" kx[row][column][height] = (kValx[row])\n",
" ky[row][column][height] = (kValy[column])\n",
" kz[row][column][height] = (kValz[height])\n",
"# if row%10 == 0:\n",
"# print row\n",
"# print 'wavenumber generate complete!'\n",
"\n",
"#Calculating the average powerspectrum in terms of fixed K (Distance from origin to a point in fourier space)\n",
"#by taking the spherical shell of thickness 1 and averaging out the values inside it.\n",
"#I am sure that this process can be optimised somehow, but I gave up.\n",
"\n",
"qlen = maxK/2 #Nyquist frequency\n",
"q = np.zeros(((qlen),4),dtype=complex)\n",
"#q is a four column array with length maxK/2.\n",
"#q[:,0] is integer wavenumber (K, which is the distance from the origin = sqrt(kx^2+ky^2+kz^2))\n",
"#q[:,1] is the sum of square of the fourier transformed value \n",
"#q[:,2] is the sum of the fourier transformed value, \n",
"#and q[:,3] is the total number of samples with K=q[:,0]\n",
"\n",
"for i in np.arange(len(q)):\n",
" q[i][0] = i\n",
"i = 0\n",
"for i in np.arange(len(p)):\n",
" for r in np.arange(len(p[0])):\n",
" for s in np.arange(len(p[0,0])):\n",
" K = np.around(np.sqrt(kx[i,r,s]**2+ky[i,r,s]**2+kz[i,r,s]**2))\n",
" if K < qlen:\n",
" q[K][1]=q[K][1]+np.abs(p[i,r,s])**2\n",
" q[K][2]=q[K][2]+p[i,r,s]\n",
" q[K][3]=q[K][3]+1 \n",
"# if i%10 ==0:\n",
"# print 'i = ',i,' !'\n",
"# print q"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"CPU times: user 32 s, sys: 213 ms, total: 32.2 s\n",
"Wall time: 32.2 s\n"
]
}
],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As we can see, it takes a little over 30 seconds.\n",
"\n",
"I'll make a copy of the main result, `q`, so we can compare it to the result from my version of the code and ensure that I haven't screwed anything up."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"q2 = np.copy(q)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now, here's my somewhat optimized version of the same code."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%%time\n",
"\n",
"from __future__ import division\n",
"import numpy as np\n",
"\n",
"ngridx = 128\n",
"ngridy = 128\n",
"ngridz = 128\n",
"\n",
"maxK = max(ngridx,ngridy,ngridz)\n",
"\n",
"#making input file\n",
"f = np.zeros((ngridx*ngridy*ngridz,4))\n",
"np.random.seed(123)\n",
"# Make `i` into an array, then use numpy to operate on\n",
"# the whole array at once. This is about 35 times\n",
"# faster than the loop method.\n",
"i = np.arange(len(f))\n",
"f[:,0] = i//(ngridy*ngridz)\n",
"f[:,1] = i//(ngridz) % ngridy\n",
"f[:,2] = i%ngridz\n",
"f[:,3] = np.random.rand(f.shape[0])\n",
"#end making input file\n",
"\n",
"a = f[:,3].reshape(ngridx,ngridy,ngridz)\n",
"\n",
"avg = np.sum(f[:,3])/len(f)\n",
"a /= avg\n",
"p = np.fft.fftn(a)\n",
"\n",
"#Keeping track of corresponding wavenumbers (k_x, k_y,k_z) for each element in p\n",
"#This is just a convension on fourier transformation so you can ignore this part\n",
"kValx = np.fft.fftfreq( ngridx , (1.0 / ngridx ) )\n",
"kValy = np.fft.fftfreq( ngridy , (1.0 / ngridy ) )\n",
"kValz = np.fft.fftfreq( ngridz , (1.0 / ngridz ) )\n",
"kx = np.zeros((ngridx,ngridy,ngridz))\n",
"ky = np.zeros((ngridx,ngridy,ngridz))\n",
"kz = np.zeros((ngridx,ngridy,ngridz))\n",
"# Get rid of as many loops as you can. This is about 10 times\n",
"# faster than the nested loops for this number of grid points,\n",
"# but scales as N instead of N^3 (where N=ngridx*ngridy*ngridz).\n",
"for row in range(ngridx):\n",
" kx[row,:,:] = kValx[row]\n",
"for column in range(ngridy):\n",
" ky[:,column,:] = kValy[column]\n",
"for height in range(ngridz):\n",
" kz[:,:,height] = kValz[height]\n",
"\n",
"#Calculating the average powerspectrum in terms of fixed K (Distance from origin to a point in fourier space)\n",
"#by taking the spherical shell of thickness 1 and averaging out the values inside it.\n",
"#I am sure that this process can be optimised somehow, but I gave up.\n",
"\n",
"qlen = maxK//2 #Nyquist frequency\n",
"q = np.zeros(((qlen),4),dtype=complex)\n",
"#q is a four column array with length maxK/2.\n",
"#q[:,0] is integer wavenumber (K, which is the distance from the origin = sqrt(kx^2+ky^2+kz^2))\n",
"#q[:,1] is the sum of square of the fourier transformed value \n",
"#q[:,2] is the sum of the fourier transformed value, \n",
"#and q[:,3] is the total number of samples with K=q[:,0]\n",
"\n",
"# Again, we get rid of nested loops, and get a large improvement in speed and scaling\n",
"K = np.around(np.sqrt(kx**2+ky**2+kz**2)).astype(int)\n",
"for i in range(qlen):\n",
" indices = (K==i)\n",
" q[i,0] = i\n",
" q[i,1] = sum(np.abs(p[indices])**2)\n",
" q[i,2] = sum(p[indices])\n",
" q[i,3] = sum(indices)\n",
"#print q"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"CPU times: user 805 ms, sys: 113 ms, total: 918 ms\n",
"Wall time: 916 ms\n"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"This version takes just under 1 second. We can check that the 0 and 3 columns of `q` and `q2` are identical (they're just integers, so the math will be exact):"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"np.array_equal(q[:,0::3], q2[:,0::3])"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 4,
"text": [
"True"
]
}
],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We also check that all elements of `q` and `q2` are within roughly roundoff of each other. (They're not identical because the operations are done in slightly different orders, but they're as close as the computer knows how to get.)"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"np.allclose(q, q2, rtol=1e-12, atol=1e-16)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 5,
"text": [
"True"
]
}
],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can glean two key lessons about making numpy faster:\n",
"\n",
" 1. Avoid loops as much as possible -- especially loops within loops. The main reason numpy exists is to move loops inside C code, so you don't have to loop in python. It has all sorts of functions like `sum`, `abs`, `sqrt`, etc., which operate on entire arrays at once.\n",
" 2. Avoid copying data. Just use [slices](http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html) of numpy arrays unless you need to change the data."
]
}
],
"metadata": {}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment