An IPython Notebook as an Interactive Parallel Computing Tutorial
{
"metadata": {
"name": "Interactive IPython Parallel Computing Tutorial"
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Interactive IPython Parallel Computing Tutorial\n",
"============================\n",
"Introduction\n",
"-----------\n",
"This tutorial shows some a basic examples on how to use the powerful Parallel Computing functionality of [IPython](http://ipython.org/).\n",
"\n",
"The tutorial itself is written in IPython Notebook of which you are reading is a static HTML representation. To execute this notebook on your computer download it and drag&drop the file on the Notebook Dashboard. You can find a \"Download Notebook\" link in the upper right part of this page.\n",
"\n",
"Other interesting resources:\n",
" \n",
"- [Running Code in the IPython Notebook](http://nbviewer.ipython.org/urls/github.com/ipython/ipython/raw/master/examples/notebooks/Part%25201%2520-%2520Running%2520Code.ipynb)\n",
"- [Offician IPython Documentation](http://ipython.org/documentation.html)\n",
"\n",
"\n",
"**DISCLAIMER**: Part of this tutorial is shamelessly copied from the [Official IPython Parallel Computing Documentation](http://ipython.org/ipython-doc/stable/parallel/index.html).\n",
"\n",
"Installation Requirements\n",
"-----------\n",
"To run this tutorial you have to install a recent version of [IPython](http://ipython.org/). Some commands will also require Numpy. \n",
"\n",
"It is recommended however to install a complete scientific python environment.\n",
"\n",
"On Windows, my favorite scientific python distribution is [WinPython](http://code.google.com/p/winpython/): it has 64bit support, includes a wonderful IDE called [Spyder](http://code.google.com/p/spyderlib/), and the installation folder can be moved anywhere.\n",
"\n",
"After installing WinPython, to launch the IPython Notebook click on **WinPython Command Prompt** and type:\n",
"\n",
" ipython notebook --pylab inline\n",
"\n",
"At this point a web browser should automagically open showing the **IPython Notebook Dashboard**."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Starting the the cluster\n",
"------------------\n",
"\n",
"For the purposes this tutorial you can start a cluster on your local machine. Just go to the \"Notebook Dashboard\" tab in your browser, click on the *Cluster* tab and specify the number of \"parallel python sessions\" (called **ipengines**) to start. A good number is the number of your cores. After clicking **Start** your local cluster should be running.\n",
"\n",
"**NOTE:** To setup a more complex cluster you can follow the official IPython documentation [here](http://ipython.org/ipython-doc/stable/parallel/parallel_process.html). \n",
"\n",
"Once the cluster is started (doesn't matter if locally or on remote machines) the following tutorial can be followed and re-executed."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Starting a parallel session\n",
"--------------------\n",
"\n",
"Once the cluster is started we ca oper a new ipython notebook and run this command to connect to the running engines:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from IPython.parallel import Client"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 1
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"rc = Client()"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `rc` variable will contain all the running enignes. With the `.ids` attribute we can see the ID associated with each engine. If the list is empty no engine is running. In our case:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"rc.ids"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 3,
"text": [
"[0, 1]"
]
}
],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To use the engine we first have to select them. The selection is done through python indexing or slicing. For example to select all the running engines just do:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dview = rc[:]"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 4
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now `dview` contains a DirectView object that can be used to send/receive code and data back an forth between our session and the engines."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Execute code on the enignes\n",
"---------------------------\n",
"Our enignes are basically multiple ipython process running in parallel. To run a command an all our engines we can use the **`%px`** magic command. Let see some examples:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%px print 'ciao'"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[stdout:0] ciao\n",
"[stdout:1] ciao\n"
]
}
],
"prompt_number": 5
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%px import os\n",
"%px print os.getpid()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[stdout:0] 3375\n",
"[stdout:1] 3376\n"
]
}
],
"prompt_number": 6
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"%px from numpy.random import randint\n",
"%px a = rand(5)\n",
"%px print a"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"[stdout:0] [ 0.15690218 0.56782216 0.92297292 0.19870273 0.39490221]\n",
"[stdout:1] [ 0.06350091 0.94723982 0.21775028 0.2323376 0.19959411]\n"
]
}
],
"prompt_number": 7
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**NOTE:** Under the hood the **%px** command uses the method [`dview.execute()`](http://ipython.org/ipython-doc/stable/api/generated/IPython.parallel.client.view.html?highlight=view.execute#IPython.parallel.client.view.DirectView.execute) to run the command. This method returns an [AsyncResult object](http://ipython.org/ipython-doc/stable/parallel/asyncresult.html) that is used to see the output. The magic **%px** convenientily shows the output right away."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Transferring data: Push/Pull\n",
"-----------------------------\n",
"\n",
"With the last command we created a variable **a** on each engine. To transfer it to our local session, we **pull** it using the **`dview`** object:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dview['a']"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 8,
"text": [
"[array([ 0.15690218, 0.56782216, 0.92297292, 0.19870273, 0.39490221]),\n",
" array([ 0.06350091, 0.94723982, 0.21775028, 0.2323376 , 0.19959411])]"
]
}
],
"prompt_number": 8
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Basically, the python dictionary syntax is used on the dview object to **pull** data from the engines. We see that the command return a list in which each element is the requested object (in this case a numpy array).\n",
"\n",
"Similarly, in order to **push** data to the remote engines we can use the dictionary assignment syntax:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dview['b'] = 3"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 9
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**NOTE:** The same functionality can be obtained through the methods `dview.push()` and `dview.pull()`. See [here](http://ipython.org/ipython-doc/stable/parallel/parallel_multiengine.html#moving-python-objects-around) for more details."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"###Push/Pull Numpy arrays\n",
"\n",
"\n",
"When moving Numpy arrays we must be aware that the data at destination is always read-only. To modify the array we must make a copy.\n",
"\n",
"See [Details of Parallel Computing with IPython](ipython.org/ipython-doc/stable/parallel/parallel_details.html) for more information."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Parallel map: map_syn(), map_asynch()\n",
"------------\n",
"As a first example we use the parallel map that returns a list.\n",
"This example apply the scatter/gather method to split an array/list, send the fragments to the engines (all apply the same function but on different data), and finally recollect (gather) the result in a single list."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"parallel_result = dview.map_sync(lambda x: x**10, arange(32))"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 10
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"serial_result = map(lambda x:x**10, arange(32))"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 11
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"(parallel_result == serial_result)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 12,
"text": [
"True"
]
}
],
"prompt_number": 12
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Execute functions on the engines: apply*\n",
"-------------------------------\n",
"This will call the same function on all the engines"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dview.block=True\n",
"dview['a'] = 5\n",
"dview['b'] = 10\n",
"dview.apply(lambda x: a+b+x, 27)"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 13,
"text": [
"[42, 42]"
]
}
],
"prompt_number": 13
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ar = dview.apply_async(lambda x: a+b+x, 33)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 14
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"ar.get()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 15,
"text": [
"[48, 48]"
]
}
],
"prompt_number": 15
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"We can send different functions to different engines using the target property:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dview.targets = 0\n",
"ar0 = dview.apply_async(lambda x: a+b+x, 27)\n",
"dview.targets = 1\n",
"ar1 = dview.apply_async(lambda x: a+b+x, 33)"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 16
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"print ar0.get(), ar1.get()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"42 48\n"
]
}
],
"prompt_number": 17
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Alternatively one can create different DirectViews (dview) by slicing rc and apply a different function to each of them."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Scatter/Gather\n",
"-------------"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dview.targets = [0,1]\n",
"dview.scatter('a',arange(16))"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 18
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dview.gather('a')"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 19,
"text": [
"array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15])"
]
}
],
"prompt_number": 19
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"dview['a']"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "pyout",
"prompt_number": 20,
"text": [
"[array([0, 1, 2, 3, 4, 5, 6, 7]), array([ 8, 9, 10, 11, 12, 13, 14, 15])]"
]
}
],
"prompt_number": 20
},
{
"cell_type": "code",
"collapsed": false,
"input": [],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 20
}
],
"metadata": {}
}
]
}