import h5py
import numpy as np

from acme import ParallelMap

# Consider the following function
def arr_func(x, y, z=3):
    return x * (y + z)

# We want to evaluate `arr_func` for three different values of `x` and the same `y`:
xList = [np.array([1, 2, 3]), np.array([-1, -2, -3]), np.array([-1, 2, -3])]
y = np.array([4, 5, 6])

# Firing off `ParallelMap` and collecting results in memory yields
with ParallelMap(arr_func, xList, y, z=0.5, write_worker_results=False) as pmap:
    res = pmap.compute()

>>> res
[array([ 7, 14, 21]), array([ -8, -16, -24]), array([ -9, 18, -27])]

# Now, with `write_worker_results=True` (the default) ACME currently
# creates `n_inputs` HDF5 containers:
with ParallelMap(arr_func, xList, y, z=0.5) as pmap:
    res = pmap.compute()

>>> res
['/cs/home/username/ACME_20220906-1135-448825/arr_func_0.h5',
 '/cs/home/username/ACME_20220906-1135-448825/arr_func_1.h5',
 '/cs/home/username/ACME_20220906-1135-448825/arr_func_2.h5']

# How about: as a new default, ACME (additionally) generates a single HDF5
# container that collects all worker results:
>>> res
'/cs/home/username/ACME_20220906-1135-448825/arr_func.h5'

# Each worker result can be accessed via its own HDF5 group:
>>> h5f = h5py.File('/cs/home/username/ACME_20220906-1135-448825/arr_func.h5', 'r')
>>> h5f.keys()
<KeysViewHDF5 ['worker_0', 'worker_1', 'worker_2']>
>>> h5f['worker_0'].keys()
<KeysViewHDF5 ['result_0']>
>>> h5f['worker_1'].keys()
<KeysViewHDF5 ['result_0']>
>>> h5f['worker_2'].keys()
<KeysViewHDF5 ['result_0']>

# The actual results can then be read directly from the single container
>>> h5f['worker_0']['result_0'][()]
array([ 7, 14, 21])
>>> h5f['worker_1']['result_0'][()]
array([ -8, -16, -24])
>>> h5f['worker_2']['result_0'][()]
array([ -9, 18, -27])
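The proposed by-worker layout can be sketched with plain h5py (a hypothetical illustration of the file structure shown above, not ACME's actual implementation; the file name is made up for the demo):

```python
import h5py
import numpy as np

# The three worker results from the example above
results = [np.array([7, 14, 21]),
           np.array([-8, -16, -24]),
           np.array([-9, 18, -27])]

# One group per worker, one dataset per returned value
with h5py.File("arr_func_demo.h5", "w") as h5f:
    for k, res in enumerate(results):
        h5f.create_group(f"worker_{k}").create_dataset("result_0", data=res)

# Reading mirrors the access pattern shown above
with h5py.File("arr_func_demo.h5", "r") as h5f:
    keys = sorted(h5f.keys())           # ['worker_0', 'worker_1', 'worker_2']
    second = h5f["worker_1"]["result_0"][()]
```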
# In addition, ACME could (optionally) support an `outShape` keyword to not
# store results in separate HDF5 groups/datasets but instead collect everything
# in one array-like structure. By specifying `outShape=(None, 3)` you could tell
# ACME to stack all resulting 3-element arrays along the first dimension:
with ParallelMap(arr_func, xList, y, z=0.5, outShape=(None, 3)) as pmap:
    res = pmap.compute()

>>> h5f = h5py.File('/cs/home/username/ACME_20220906-1135-448825/arr_func.h5', 'r')
>>> h5f['result_0'][()]
array([[  7,  14,  21],
       [ -8, -16, -24],
       [ -9,  18, -27]])
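One way `outShape = (None, 3)` could be realized internally is sketched below, with the `None` entry standing in for `n_inputs` (a hypothetical sketch, not ACME's actual code; names are made up for the demo):

```python
import h5py
import numpy as np

n_inputs = 3
out_shape = (None, 3)
# Replace the `None` placeholder by the number of inputs -> (3, 3)
shape = tuple(n_inputs if s is None else s for s in out_shape)

results = [np.array([7, 14, 21]),
           np.array([-8, -16, -24]),
           np.array([-9, 18, -27])]

# Preallocate one dataset; each computation fills "its" row
with h5py.File("arr_func_stacked_demo.h5", "w") as h5f:
    dset = h5f.create_dataset("result_0", shape=shape, dtype="int64")
    for k, res in enumerate(results):
        dset[k, :] = res

with h5py.File("arr_func_stacked_demo.h5", "r") as h5f:
    stacked = h5f["result_0"][()]   # shape (3, 3), rows in worker order
```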
Great! Thanks for the quick feedback!
Haha, yes I agree it's all so confusing... How about calling them computations instead? `comp_0`, `comp_1`, etc
Sold! `comp_*` it is :)
Sorry I'm a bit late to reply here, I really like the idea! `comp_*` seems like a good naming convention too. Just a small worry: what will happen if not all the computations complete? Will the whole file be corrupted or only the parts that didn't return?
Hi Katharine! Thanks for taking a look at this and the feedback!
In the prototype implementation (`results_collection` branch in the repo) crash handling is more or less unchanged: if some computations crash, the "symlinks" in the results file pointing to the incomplete calculations simply point to nothing. If your "emergency pickling" mechanism kicks in, the results file is removed (since HDF5 files cannot contain links to pickled data) and it's basically back to the current state (ACME generates a bunch of `.pickle` and `.h5` files).
Only the case `single_file=True` (i.e., all workers really write to the same file, so the results container really is just one regular file, not a collection of symlinks) might run into trouble: if a worker dies while writing to the file, it might end up corrupted and unreadable. I've included a warning message for this case.
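For the default (linked) layout, the "points to nothing" behavior can be illustrated with plain h5py: an external link to a per-worker file that was never written raises a `KeyError` on access, while the collector file itself stays readable (file and dataset names below are made up for the demo):

```python
import h5py

# Collector file containing one link to a worker file that does not exist
with h5py.File("collector_demo.h5", "w") as h5f:
    h5f["comp_0"] = h5py.ExternalLink("never_written_worker.h5", "/result_0")

# The collector opens fine; only accessing the dangling link fails
with h5py.File("collector_demo.h5", "r") as h5f:
    try:
        h5f["comp_0"]
        dangling = False
    except KeyError:
        dangling = True
```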
> Haha, yes I agree it's all so confusing... How about calling them computations instead? `comp_0`, `comp_1`, etc
Great! Then I think it makes a lot of sense to throw them all in 1 container! I still like the `single_file=True` option though, just to be flexible. I'm excited about the new features, sounds all great!