"""
Creates an HDF5 file with a single dataset of shape (channels, n),
filled with random numbers.

Writing to the different channels (rows) is parallelized using MPI.

Usage:
    mpirun -np 8 python demo.py

Small shell script to run timings with different numbers of MPI processes:

    for np in 1 2 4 8 12 16 20 24 28 32; do
        echo -n "$np ";
        /usr/bin/time --format="%e" mpirun -np $np python demo.py;
    done

"""

from mpi4py import MPI
import h5py
import numpy as np


n = 100000000
channels = 32
num_processes = MPI.COMM_WORLD.size
rank = MPI.COMM_WORLD.rank  # The process ID (integer 0-3 for 4-process run)

np.random.seed(746574366 + rank)

f = h5py.File('parallel_test.hdf5', 'w', driver='mpio', comm=MPI.COMM_WORLD)
dset = f.create_dataset('test', (channels, n), dtype='f')

for i in range(channels):
    if i % num_processes == rank:
        #print("rank = {}, i = {}".format(rank, i))
        data = np.random.uniform(size=n)
        dset[i] = data

f.close()

"""
Some example timings on my workstation (32 cores), as elapsed seconds
for four runs at each number of MPI processes:

 1    61.98   70.05   64.61   63.47
 2    33.22   33.53   34.85   33.45
 4    44.6    20.38   20.3    19
 8    13.3    13.76   14.5    13.55
12    14.62   14.98   12.75   33.24
16    12      13.19   14.76   13.68
20    14.75   14.82   14.46   14.33
24    16.69   15.81   16.94   15.98
28    17.61   18      17.56   17.78
32    35.31   35.7    16.16   39.88

"""
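
The timings above are easier to compare once each row is reduced to a single number. The following is a small sketch, not part of the original script, that takes the median of the four runs for each process count and divides the single-process median by it to get a speedup. Using the median is my own choice, made because a few runs (44.6 s at 4 processes, 33.24 s at 12) look like outliers. The names `timings`, `median_time` and `speedup` are made up for this example.

```python
from statistics import median

# Elapsed seconds copied from the timings above: {MPI processes: [four runs]}
timings = {
    1: [61.98, 70.05, 64.61, 63.47],
    2: [33.22, 33.53, 34.85, 33.45],
    4: [44.6, 20.38, 20.3, 19],
    8: [13.3, 13.76, 14.5, 13.55],
    12: [14.62, 14.98, 12.75, 33.24],
    16: [12, 13.19, 14.76, 13.68],
    20: [14.75, 14.82, 14.46, 14.33],
    24: [16.69, 15.81, 16.94, 15.98],
    28: [17.61, 18, 17.56, 17.78],
    32: [35.31, 35.7, 16.16, 39.88],
}

# One representative time per process count (median is robust to outliers).
median_time = {p: median(runs) for p, runs in timings.items()}

# Speedup relative to the single-process median.
baseline = median_time[1]
speedup = {p: baseline / t for p, t in median_time.items()}

for p in timings:
    print(f"{p:>2}  median {median_time[p]:6.2f} s  speedup {speedup[p]:5.2f}x")
```

On these numbers the median run time stops improving somewhere between 8 and 16 processes, at a bit under 5x the single-process speed, and gets worse again beyond that.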
I'm not an h5py or mpi4py expert; I mostly posted this here as an aide-memoire for myself, so you would probably be better off reading the documentation for those projects and/or experimenting. My guess, however, is that it doesn't matter how many datasets you have, e.g. dset1, dset2.
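
A minimal sketch of what that guess might look like with two datasets, under some assumptions: the dataset names `test` and `test_2` come from the question in this thread; the helpers `rows_for_rank`, `fill_datasets` and `main` are hypothetical names invented for the example; and `n` is shrunk to 1000 so it runs quickly. As far as I understand the h5py parallel documentation, creating a dataset modifies file metadata and so has to be done by every process, which is why all ranks make the same `create_dataset` calls before each writes only its own rows.

```python
import numpy as np


def rows_for_rank(channels, rank, num_processes):
    """Rows this rank writes: the same round-robin rule as `i % num_processes == rank`."""
    return range(rank, channels, num_processes)


def fill_datasets(dsets, rank, num_processes, rng):
    """Write this rank's rows of every dataset in `dsets`.

    Each item only needs `.shape` and row assignment, so h5py datasets and
    NumPy arrays both work (the latter is handy for a serial dry run).
    """
    for dset in dsets:
        channels, n = dset.shape
        for i in rows_for_rank(channels, rank, num_processes):
            dset[i] = rng.uniform(size=n)


def main(filename="parallel_test.hdf5", names=("test", "test_2"),
         channels=32, n=1000):
    # Imported here so the helpers above can be tried without an MPI build of h5py.
    from mpi4py import MPI
    import h5py

    comm = MPI.COMM_WORLD
    rng = np.random.default_rng(746574366 + comm.rank)  # different stream per rank
    with h5py.File(filename, "w", driver="mpio", comm=comm) as f:
        # Assumption: every rank makes the same create_dataset calls, in the same order.
        dsets = [f.create_dataset(name, (channels, n), dtype="f") for name in names]
        fill_datasets(dsets, comm.rank, comm.size, rng)
```

To try it, add a call to `main()` at the bottom of the file and launch it under `mpirun` in the same way as the original script. The row-assignment logic can be checked without MPI by passing plain NumPy arrays to `fill_datasets` and looping over pretend ranks.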
I am very new to h5py/mpi4py. I am trying to write some data to a single .h5 file using 2 processes, so that rank 0 writes the positive test data (positive dataset) and rank 1 writes the negative test data (negative dataset). But when I try to run it with mpiexec -n 2 python parallel_exec.py I get IOError: Unable to create file (unable to lock file, errno = 11, error message = 'Resource temporarily unavailable'). Could you suggest anything that might help me out? Thanks in advance.
I'm sorry I don't have any idea how to fix that problem. Maybe ask on Stack Overflow?
What if I have more than one create_dataset call, e.g. dset = f.create_dataset('test', (channels, n), dtype='f') in line 42 and a second one, f.create_dataset('test_2', (channels, n), dtype='f')? In that case, how should I modify the code?