@brantfaircloth
Created April 13, 2012 19:43
mpi4py "file-passing" scatter/gather example
import os
import tempfile

from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()
rank = comm.Get_rank()
mode = MPI.MODE_RDONLY

if rank == 0:
    # get a list of the files to scatter; the list must have exactly one
    # entry per MPI process, so run this example with 2 processes
    #for f in glob.glob(args.alignments):
    work = ['test1.txt', 'test2.txt']
else:
    work = None

# scatter work units to nodes
unit = comm.scatter(work, root=0)

# ===================================
# This should be node-related work
# ===================================
# open the file on a node; File.Open is collective over the communicator it
# is given, so use COMM_SELF when each rank opens a different file
f = MPI.File.Open(MPI.COMM_SELF, unit, mode)
# create a buffer for the data of size f.Get_size()
ba = bytearray(f.Get_size())
# read the contents into the byte array (blocking; Iread would return a
# Request that has to be waited on before the buffer is usable)
f.Read(ba)
# close the file
f.Close()
# write buffer to a tempfile
descriptor, path = tempfile.mkstemp(suffix='mpi.txt')
print path
tf = os.fdopen(descriptor, 'wb')
tf.write(ba)
tf.close()
# get contents of tempfile, tagged with the rank that handled it
contents = open(path, 'rU').read() + str(comm.Get_rank())
os.remove(path)
# ===================================
# End of node-related work
# ===================================
# gather results back onto the root (gather returns None on non-root ranks)
result = comm.gather(contents, root=0)
# do something with result
if rank == 0:
    print result
else:
    result = None
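
To run this, both input files need to exist in the working directory and the process count has to match the length of the work list, so the invocation looks something like the following (the script name is just a placeholder):

mpiexec -n 2 python scatter_gather_example.py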

@brantfaircloth (Author):
Yes, mpi4py works well and encapsulates most of the spec, but non-trivial examples of using it are very hard to come by. I'm still not sure this works well across multiple nodes without the files being shared over NFS or similar; I'll be trying that out soon, since we need to port the Phyluce code to work on MPI clusters sooner rather than later.

I'd also like to gin up an mpi4py version of the cloudforest code to run using StarCluster.
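
For what it's worth, one way to sidestep the shared-filesystem question would be to have rank 0 read each file and ship the raw bytes to the other ranks as MPI messages, so only the root needs to see the paths. A minimal sketch along those lines (filenames reused from the example above; the rest is illustrative, not tested across nodes):

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # root reads each file and pairs its name with its raw bytes;
    # one entry per MPI process, as in the scatter example above
    files = ['test1.txt', 'test2.txt']
    payload = [(name, open(name, 'rb').read()) for name in files]
else:
    payload = None

# every rank gets a (name, bytes) pair, no matter which node it is on
name, data = comm.scatter(payload, root=0)
summary = '%s handled by rank %d (%d bytes)' % (name, rank, len(data))
results = comm.gather(summary, root=0)
if rank == 0:
    print results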

@brantfaircloth (Author):
The scatter-gather approach works fine as long as you divide jobs equally amongst nodes. What I really wanted was an MPI equivalent of map() or Pool.map() (from multiprocessing). You can find a lightweight version of that here:

https://github.com/twiecki/mpi4py_map
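
I won't reproduce mpi4py_map's exact API here, but the underlying pattern is a master/worker map: rank 0 deals out items one at a time and collects results as workers finish, so unequal job counts stop being a problem. A rough sketch of that pattern, with function and message details of my own invention rather than mpi4py_map's:

from mpi4py import MPI

def mpi_map(func, items):
    # master/worker map: rank 0 deals out items, workers apply func to them
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()
    if rank == 0:
        results = [None] * len(items)
        next_item = 0
        active = 0
        # prime each worker with one item (or a stop signal if too few items)
        for worker in range(1, size):
            if next_item < len(items):
                comm.send((next_item, items[next_item]), dest=worker)
                next_item += 1
                active += 1
            else:
                comm.send(None, dest=worker)
        # collect results; keep feeding whichever worker just finished
        while active > 0:
            status = MPI.Status()
            index, value = comm.recv(source=MPI.ANY_SOURCE, status=status)
            results[index] = value
            worker = status.Get_source()
            if next_item < len(items):
                comm.send((next_item, items[next_item]), dest=worker)
                next_item += 1
            else:
                comm.send(None, dest=worker)
                active -= 1
        return results
    else:
        # workers loop until they receive the stop signal (None)
        while True:
            task = comm.recv(source=0)
            if task is None:
                break
            index, item = task
            comm.send((index, func(item)), dest=0)
        return None

if __name__ == '__main__':
    # needs at least 2 processes, since rank 0 only coordinates
    squares = mpi_map(lambda x: x * x, range(10))
    if MPI.COMM_WORLD.Get_rank() == 0:
        print squares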

@brantfaircloth (Author):
In addition to mpi4py_map, see

http://code.google.com/p/deap/

in particular, another map() implementation that's a bit fancier:

http://deap.gel.ulaval.ca/doc/0.8/api/dtm.html#module-deap.dtm

@brantfaircloth (Author):
Also, mpi4py_map() example here:

https://gist.github.com/2606999
