@minrk
Created February 19, 2013 02:55
{
"metadata": {
"name": "parallel_tweaking3"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Diagnosing Slow Parallel Inner Products"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from IPython.parallel import Client, require, interactive"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"rc = Client()\n",
"dv = rc.direct_view()\n",
"lv = rc.load_balanced_view()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"with dv.sync_imports():\n",
" import numpy"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"importing numpy on engine(s)\n"
]
}
],
"prompt_number": 3
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"mat = numpy.random.random_sample((800, 800))\n",
"mat = numpy.asfortranarray(mat)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def simple_inner(i):\n",
" column = mat[:, i]\n",
" # have to use a list comprehension to prevent closure\n",
" return sum([numpy.inner(column, mat[:, j]) for j in xrange(i + 1, mat.shape[1])])"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 5
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Local, serial performance."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit sum(simple_inner(i) for i in xrange(mat.shape[1] - 1))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 loops, best of 3: 1.44 s per loop\n"
]
}
],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dv.push(dict(mat=mat), block=True);"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Parallel implementation using a `DirectView`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit sum(dv.map(simple_inner, range(mat.shape[1] - 1), block=False))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 loops, best of 3: 3.34 s per loop\n"
]
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Parallel implementation using a `LoadBalancedView` with a large `chunksize` and unordered results."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit sum(lv.map(simple_inner, range(mat.shape[1] - 1), ordered=False, chunksize=(mat.shape[1] - 1) // len(lv), block=False))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 loops, best of 3: 2.79 s per loop\n"
]
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Both parallel runs are slower than the 1.44 s serial run. Why?"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"amr = dv.map(simple_inner, range(mat.shape[1] - 1), block=False)\n",
"amr.get()\n",
"s = sum(amr)\n"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 11
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print \"serial time: %.3f\" % amr.serial_time\n",
"print \" wall time: %.3f\" % amr.wall_time"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"serial time: 10.576\n",
" wall time: 4.898\n"
]
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That is odd: `serial_time`, the compute time summed over all tasks, is over ten seconds, while the whole job took 1.44 s locally.\n",
"That suggests the computation itself is slow on the engines for some reason.\n",
"\n",
"Let's run *exactly* the same timing code on one of the engines."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"e0 = rc[0]\n",
"e0.block = True\n",
"e0.activate('0') # for %px0 magic\n",
"e0.push(dict(simple_inner=simple_inner));"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 15
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"# execute the timeit line on engine zero, *exactly* as we typed it above\n",
"%px0 %timeit sum(simple_inner(i) for i in xrange(mat.shape[1] - 1))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 loops, best of 3: 11.4 s per loop\n"
]
}
],
"prompt_number": 16
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"That is far slower than the 1.44 s local run, even though the code is identical.\n",
"IPython.parallel is not getting in the way here: this is a plain `%timeit` executed on the engine,\n",
"so something must be different about the data.\n",
"\n",
"The only optimization we have made is the `asfortranarray` call, so let's check `mat.flags`."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print 'local'\n",
"print mat.flags\n",
"print 'engine 0:'\n",
"%px0 print mat.flags"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"local\n",
" C_CONTIGUOUS : False\n",
" F_CONTIGUOUS : True\n",
" OWNDATA : True\n",
" WRITEABLE : True\n",
" ALIGNED : True\n",
" UPDATEIFCOPY : False\n",
"engine 0:\n",
" C_CONTIGUOUS : False\n",
" F_CONTIGUOUS : True\n",
" OWNDATA : True\n",
" WRITEABLE : True\n",
" ALIGNED : True\n",
" UPDATEIFCOPY : False\n"
]
}
],
"prompt_number": 22
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
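"Aha! `mat` on the engines was somehow not Fortran-contiguous after the push.\n",
"(The flags output above carries prompt number 22, so it appears to have been recorded after the fix below at prompt 19, which would explain why engine 0 reports `F_CONTIGUOUS : True` there.)\n",
"Maybe we will get our performance back if we re-apply the transformation on the engines\n",
"after the push."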
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%px mat = numpy.asfortranarray(mat)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 19
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And re-run the timings, to check:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit sum(dv.map(simple_inner, range(mat.shape[1] - 1), block=False))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 loops, best of 3: 470 ms per loop\n"
]
}
],
"prompt_number": 20
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%timeit sum(lv.map(simple_inner, range(mat.shape[1] - 1), ordered=False, chunksize=(mat.shape[1] - 1) // len(lv), block=False))"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"1 loops, best of 3: 375 ms per loop\n"
]
}
],
"prompt_number": 21
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Yes, 470 ms and 375 ms are much more sensible than the 3.34 s and 2.79 s measured before the fix, and both now beat the 1.44 s serial run."
]
}
],
"metadata": {}
}
]
}
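
The notebook's fix amounts to "check the memory layout after a transfer and restore it if it was lost". Here is a minimal sketch of that check using plain NumPy, without a cluster. `lossy_round_trip` and `ensure_fortran` are hypothetical helpers, not part of IPython.parallel, and the round trip is only an assumed stand-in for whatever the push did to the array: it keeps the values but rebuilds them in C order.

```python
import numpy


def ensure_fortran(arr):
    """Return arr in Fortran (column-major) order, copying only if needed."""
    if arr.flags['F_CONTIGUOUS']:
        return arr
    return numpy.asfortranarray(arr)


def lossy_round_trip(arr):
    """Hypothetical stand-in for a transfer that keeps values but not layout.

    tobytes() emits the data in C order by default, so the rebuilt array is
    C-contiguous even when the input was Fortran-contiguous.
    """
    flat = numpy.frombuffer(arr.tobytes(), dtype=arr.dtype)
    return flat.reshape(arr.shape)


# Small matrix for illustration; the notebook uses 800 x 800.
mat = numpy.asfortranarray(numpy.random.random_sample((8, 8)))
received = lossy_round_trip(mat)     # same values, C order
restored = ensure_fortran(received)  # same values, Fortran order again

print("sent F-contiguous:     %s" % mat.flags['F_CONTIGUOUS'])
print("received F-contiguous: %s" % received.flags['F_CONTIGUOUS'])
print("restored F-contiguous: %s" % restored.flags['F_CONTIGUOUS'])
```

In the notebook, the equivalent of `ensure_fortran` is the `%px mat = numpy.asfortranarray(mat)` cell, run on every engine after `dv.push`. Column slices such as `mat[:, i]` are contiguous only in Fortran order, which is why the layout matters for `simple_inner`.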