Created
February 5, 2015 15:41
-
-
Save joezuntz/7c590a8652a1da6dc4c9 to your computer and use it in GitHub Desktop.
Simplest possible mpi4py job splitting
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
#Suppose you have a collection of tasks, which in this example I'll assume is just running a function f. | |
#If these tasks are completely separate and independent the most then you can parallelize them easily. | |
#In this gist I'll show the simplest possible way to do this using mpi4py. | |
#There are better ways to do this, in particular if the tasks vary significantly in time taken to run. | |
import mpi4py.MPI | |
def f(i): | |
"A fake task - in this case let just open a file and write a number to it" | |
#open file with name based on task number | |
f=open("%d.txt"%i,"w") | |
#write some info to it | |
f.write("%d * 10 = %d\n"%(i, 10*i)) | |
#close the file | |
f.close() | |
#A list of all the tasks to do. In your case you will probably build this task list in a more complex way. | |
#You don't even need to build it in advance for this approach to work | |
task_list = range(100) | |
#main program loop. This is the unparallelized verion, for comparison | |
for task in task_list: | |
f(task) | |
#And now moving on the parallel version | |
#mpi4py has the notion of a "communicator" - a collection of processors | |
#all operating together, usually on the same program. Each processor | |
#in the communicator is identified by a number, its rank, We'll use that | |
#number to split the tasks | |
#find out which number processor this particular instance is, | |
#and how many there are in total | |
rank = mpi4py.MPI.COMM_WORLD.Get_rank() | |
size = mpi4py.MPI.COMM_WORLD.Get_size() | |
#parallelized version | |
#the enumerate function gives us a number i in addition | |
#to the task. (In this specific case i is the same as task! But that's | |
#not true usually) | |
for i,task in enumerate(task_list): | |
#This is how we split up the jobs. | |
#The % sign is a modulus, and the "continue" means | |
#"skip the rest of this bit and go to the next time | |
#through the loop" | |
# If we had e.g. 4 processors, this would mean | |
# that proc zero did tasks 0, 4, 8, 12, 16, ... | |
# and proc one did tasks 1, 5, 9, 13, 17, ... | |
# and do on. | |
if i%size!=rank: continue | |
print "Task number %d (%d) being done by processor %d of %d" % (i, task, rank, size) | |
f(task) |
this helped me today. Thanks!
Hi, if my function returns a array and the input is the list of array if I want to return following the same order as the input, like, suppose we have 20 task, 4 processor. 1, 2,3, 4, 5, tasks are done by first processor, 6, 7, 8, 9, 10 are done by second one, and so on. the final result is list of the returned array.
how can I do that? sry, maybe its dumb question, I'm new to the mpi.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Thanks for that nice little script!