Make sure you're using the latest spack from the `develop` branch and the latest mochi-margo from the `main` branch. Note: the mochi-margo package is contained in the default spack repo, but we can't use that one. We must use the version at https://github.com/mochi-hpc/mochi-spack-packages; that is the only way to get versions of margo and mercury new enough to have the GPU capabilities we're testing. For reference, the GPU capabilities were added to Mercury in this PR, and wrappers for the feature were added to margo in this PR.
Use the system (preinstalled) MPI instead of letting spack build it. See the spack documentation on System packages and External packages. You can try the `spack external find` command to auto-populate some external packages, although it doesn't seem to work for MPI.
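As a sketch, an external MPI can be declared by hand in spack's `packages.yaml`; the spec version and prefix below are placeholders, so substitute whatever MPI ThetaGPU actually provides:

```yaml
packages:
  openmpi:
    externals:
    - spec: openmpi@4.1.1    # placeholder version; match the system install
      prefix: /opt/openmpi   # placeholder path to the system MPI
    buildable: false         # forbid spack from building its own MPI
```

Setting `buildable: false` is what guarantees spack will error out rather than silently build MPI from source.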
git clone https://github.com/spack/spack
source spack/share/spack/setup-env.sh
git clone https://github.com/mochi-hpc/mochi-spack-packages
spack repo add mochi-spack-packages
spack install mochi-margo@develop
# mochi-ssg is needed for the margo-p2p-bw benchmark (see below)
# Pass whatever MPI implementation ThetaGPU has as a dependency to this command.
# Spack prefers OpenMPI by default.
# Check the log and make sure spack doesn't try to build MPI.
spack install mochi-ssg@develop ^<mpich | openmpi | ...>
Here is a basic tutorial. Make sure you can get that working on ThetaGPU.
Here is the libfabric example code that Jerome recommended. That should be all the CUDA necessary for us.
The `margo-p2p-bw` benchmark can be used as a starting point for the GPU test.
Hi Chris,
Sorry for the delay, but I just landed the margo PR to add support for the bulk
attr function present in Mercury 2.2.0rc1:
https://github.com/mochi-hpc/mochi-margo/pull/185
Here is the existing margo point-to-point bandwidth benchmark that you might
want to consider as a starting point (it doesn't have to be, but it could be
helpful at least for boilerplate on the Margo initialization side of things):
https://github.com/mochi-hpc-experiments/mochi-tests/blob/main/perf-regression/margo-p2p-bw.c
That repository is intended to be configured and built manually outside of spack
(because it also serves as a sanity check that packages like Margo and Mercury
that are built with spack are presented properly to user applications). It
requires at least mochi-margo, mochi-ssg, and mpi to build the margo benchmarks.
There are subdirectories under "perf-regression" with scripts for a few
platforms, but we'll need a new one for theta-gpu. To get the GPU support you'll
further need mercury@2.2.0rc1 and mochi-margo@main, rather than the default
versions.
Let me know who needs theta-gpu access for testing and I'll help get them
through the account request process. To recap, that's our main production system
with GPUs in it (Nvidia), but we haven't done significant testing there yet. I'm
happy to help if there are any problems generally getting anything up and
running, system or test program related.
thanks,
-Phil
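To make the manual (outside-of-spack) build concrete, the steps are roughly as follows; the bootstrap and configure details are assumptions, so check the mochi-tests README for the actual procedure:

```shell
# Rough sketch; bootstrap/configure flow is an assumption -- see the repo's README.
spack load mochi-margo mochi-ssg          # make the spack-built deps visible
git clone https://github.com/mochi-hpc-experiments/mochi-tests
cd mochi-tests/perf-regression
./prepare.sh                              # assumed autotools bootstrap script
./configure CC=mpicc                      # build against the system MPI wrapper
make                                      # builds margo-p2p-bw among others
```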
I would start by getting this benchmark working on ThetaGPU before modifying it for GPU testing.
- Move data from GPU memory of node A to main memory of node B. Verify correctness.
- Move data from main memory of node A to GPU memory of node B. Verify correctness.
- Use `cudaMalloc` and `cudaMemcpy` to set up GPU memory.
- Modify `margo-p2p-bw` to use the `margo_bulk_create_attr` function.
- Need to pass an `hg_bulk_attr` to `margo_bulk_create_attr` with `mem_type = HG_MEM_TYPE_CUDA` and `device = <CUDA device ID>`. I'm not entirely positive how to get the device ID, but I assume there is a CUDA API call.
- Add a `thetaGPU` folder in `mochi-tests/perf-regression` with scripts similar to those in `mochi-tests/perf-regression/theta` that run your GPU test.
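A minimal sketch of the bulk-registration change, assuming the `hg_bulk_attr` fields from Mercury 2.2.0rc1 and the `margo_bulk_create_attr` signature added in the margo PR above; field and argument names should be checked against the installed headers. On the device ID question: `cudaGetDevice()` returns the currently active device.

```c
#include <stdint.h>
#include <cuda_runtime.h>
#include <margo.h>

/* Sketch only: register a cudaMalloc'd buffer for RDMA through margo.
 * Assumes margo_bulk_create_attr() mirrors margo_bulk_create() with an
 * extra hg_bulk_attr argument, per mochi-margo PR #185. */
static hg_bulk_t register_gpu_buffer(margo_instance_id mid, size_t len)
{
    void *gpu_buf = NULL;
    cudaMalloc(&gpu_buf, len);            /* allocate device memory */

    int dev = 0;
    cudaGetDevice(&dev);                  /* current CUDA device ID */

    struct hg_bulk_attr attr = {0};
    attr.mem_type = HG_MEM_TYPE_CUDA;     /* tell Mercury this is GPU memory */
    attr.device   = (uint32_t) dev;

    hg_bulk_t bulk = HG_BULK_NULL;
    hg_size_t size = len;
    margo_bulk_create_attr(mid, 1, &gpu_buf, &size,
                           HG_BULK_READWRITE, &attr, &bulk);
    return bulk;
}
```

The correctness checks in the first two bullets can then be done by `cudaMemcpy`-ing the received region back to host memory and comparing it against the original pattern.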
Set up a nightly cron job to run the GPU tests from step 4.
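For the nightly run, a crontab entry along these lines would do it; the script path is hypothetical and should point at whatever wrapper ends up in the `thetaGPU` scripts folder:

```
# Hypothetical: submit the GPU benchmark run every night at 2:00 AM
0 2 * * * /path/to/mochi-tests/perf-regression/thetagpu/run-gpu-tests.sh
```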