Skip to content

Instantly share code, notes, and snippets.

@minrk
Created August 11, 2013 02:55
Show Gist options
  • Star 0 You must be signed in to star a gist
  • Fork 0 You must be signed in to fork a gist
  • Save minrk/6203173 to your computer and use it in GitHub Desktop.
Save minrk/6203173 to your computer and use it in GitHub Desktop.
Display the source blob
Display the rendered blob
Raw
{
"metadata": {
"gist_id": "6203173",
"name": ""
},
"nbformat": 3,
"nbformat_minor": 0,
"worksheets": [
{
"cells": [
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"IPython.parallel and generators"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Sometimes you have a generator for tasks,\n",
"but don't want to load all tasks from the generator all at once.\n",
"Reasons for this could include tasks involving lots of memory,\n",
"or taking a long time to load.\n",
"\n",
"In some of these cases, you can stop submitting tasks once you see a certain result.\n",
"For demonstration purposes, we will use a dumb implementation of determining whether a number is prime."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"import IPython\n",
"print IPython.sys_info()"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"{'codename': 'An Afternoon Hack',\n",
" 'commit_hash': 'a886d6e',\n",
" 'commit_source': 'repository',\n",
" 'default_encoding': 'UTF-8',\n",
" 'ipython_path': '/Users/minrk/dev/ip/mine/IPython',\n",
" 'ipython_version': '2.0.0-dev',\n",
" 'os_name': 'posix',\n",
" 'platform': 'Darwin-12.4.0-x86_64-i386-64bit',\n",
" 'sys_executable': '/usr/bin/python',\n",
" 'sys_platform': 'darwin',\n",
" 'sys_version': '2.7.2 (default, Oct 11 2012, 20:14:37) \\n[GCC 4.2.1 Compatible Apple Clang 4.0 (tags/Apple/clang-418.0.60)]'}\n"
]
}
],
"prompt_number": 1
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Our example 'data generator' is the possible factors for a given number.\n",
"This generator yields the number 2, and every odd integer \u2264 $\\sqrt{N}$."
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from math import sqrt\n",
"\n",
"def generate_possible_factors(N):\n",
" \"\"\"generator for iterating through possible factors for N\n",
" \n",
" yields 2, every odd integer <= sqrt(N)\n",
" \"\"\"\n",
" if N <= 3:\n",
" return\n",
" yield 2\n",
" f = 3\n",
" last = int(sqrt(N))\n",
" while f <= last:\n",
" yield f\n",
" f += 2\n",
"\n",
"for f in generate_possible_factors(300):\n",
" print f"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"2\n",
"3\n",
"5\n",
"7\n",
"9\n",
"11\n",
"13\n",
"15\n",
"17\n"
]
}
],
"prompt_number": 2
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now our trivial function that we will use as a task with `IPython.parallel`"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def is_factor(f, N):\n",
" \"\"\"is f a factor of N?\"\"\"\n",
" return (N % f) == 0"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 3
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"And a complete implementation of prime check using the generator and our factor function:"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def dumb_prime(N):\n",
" \"\"\"dumb implementation of is N prime?\"\"\"\n",
" for f in generate_possible_factors(N):\n",
" if is_factor(f, N):\n",
" return False\n",
" return True\n",
"\n",
"for N in range(900,1000):\n",
" if dumb_prime(N):\n",
" print N"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"907\n",
"911\n",
"919\n",
"929\n",
"937\n",
"941\n",
"947\n",
"953\n",
"967\n",
"971\n",
"977\n",
"983\n",
"991\n",
"997\n"
]
}
],
"prompt_number": 4
},
{
"cell_type": "heading",
"level": 1,
"metadata": {},
"source": [
"Now in Parallel"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"from IPython import parallel\n",
"\n",
"rc = parallel.Client()\n",
"dv = rc[:]\n",
"view = rc.load_balanced_view()\n",
"dv"
],
"language": "python",
"metadata": {},
"outputs": [
{
"metadata": {},
"output_type": "pyout",
"prompt_number": 6,
"text": [
"<DirectView [0, 1, 2, 3]>"
]
}
],
"prompt_number": 6
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we have a function that does prime checking in parallel,\n",
"with a limited number of tasks loaded from the generator at any given point in time.\n",
"\n",
"The logic of the function:\n",
"\n",
"1. submit up to `max_outstanding` tasks\n",
"2. if no more tasks to submit, stop submitting, skip to 7\n",
"3. wait a little bit, and check if any tasks are done.\n",
"4. if any are done, check if one found a factor for N\n",
"5. if we found a factor, we are done. Stop submitting and return False.\n",
"6. if we haven't found a factor yet, go back to 1. and submit more tasks to replace the once that have finished.\n",
"7. if we have submitted every task, wait for the last few to finish, and see if any found a factor\n",
"8. if no task found a factor, N is prime: return True"
]
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"def parallel_dumb_prime(N, v, max_outstanding=10, dt=0.1):\n",
" \"\"\"dumb_prime where each factor is checked remotely\n",
" \n",
" Up to `max_outstanding` factors will be checked in parallel.\n",
" \n",
" Submission will halt as soon as we know that N is not prime.\n",
" \"\"\"\n",
" tasks = set()\n",
" # factors is a generator\n",
" factors = generate_possible_factors(N)\n",
" while True:\n",
" try:\n",
" # submit a batch of tasks, with a maximum of `max_outstanding`\n",
" for i in range(max_outstanding-len(tasks)):\n",
" f = factors.next()\n",
" tasks.add(v.apply_async(is_factor, f, N))\n",
" except StopIteration:\n",
" # no more factors to test, stop submitting\n",
" break\n",
" # get the tasks that are done\n",
" ready = set(task for task in tasks if task.ready())\n",
" while not ready:\n",
" # wait a little bit for some tasks to finish\n",
" v.wait(tasks, timeout=dt)\n",
" ready = set(task for task in tasks if task.ready())\n",
" \n",
" for t in ready:\n",
" # get the result - if True, N is not prime, we are done\n",
" if t.get():\n",
" return False\n",
" # update tasks to only those that are still pending,\n",
" # and submit the next batch\n",
" tasks.difference_update(ready)\n",
" # check the last few outstanding tasks\n",
" for task in tasks:\n",
" if t.get():\n",
" return False\n",
" # checked all candidates, none are factors, so N is prime\n",
" return True\n"
],
"language": "python",
"metadata": {},
"outputs": [],
"prompt_number": 7
},
{
"cell_type": "code",
"collapsed": false,
"input": [
"for N in range(900,1000):\n",
" if parallel_dumb_prime(N, view, 10):\n",
" print N"
],
"language": "python",
"metadata": {},
"outputs": [
{
"output_type": "stream",
"stream": "stdout",
"text": [
"907\n",
"911"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"919"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"929"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"937"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"941"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"943"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"947"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"953"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"961"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"967"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"971"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"977"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"983"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"989"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"991"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n",
"997"
]
},
{
"output_type": "stream",
"stream": "stdout",
"text": [
"\n"
]
}
],
"prompt_number": 8
}
],
"metadata": {}
}
]
}
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment