ChristopherHogan/mercury_gpu_test_plan.md

## mercury_gpu_test_plan.md

      
### 1. Install mochi components

- Mochi Documentation
- Make sure you're using the latest spack from the develop branch and the
  latest mochi-margo from the main branch. Note: the default spack repo also
  contains a mochi-margo package, but we can't use that one. We must use the
  version at https://github.com/mochi-hpc/mochi-spack-packages. That is the
  only way to get a margo and mercury new enough to have the GPU capabilities
  we're testing. For reference, the GPU capabilities were added to Mercury in
  this PR, and wrappers for the feature were added to margo in this PR.
- Use the system (preinstalled) MPI instead of letting spack build it. See the
  documentation on System packages and External packages.
- You can try the `spack external find` command to auto-populate some external
  packages, although it doesn't seem to work for MPI.
```bash
git clone https://github.com/spack/spack
source spack/share/spack/setup-env.sh
git clone https://github.com/mochi-hpc/mochi-spack-packages
spack repo add mochi-spack-packages
spack install mochi-margo@develop

# mochi-ssg is needed for the margo-p2p-bw benchmark (see below).
# Pass whatever MPI implementation ThetaGPU has as a dependency to this command.
# Spack prefers OpenMPI by default.
# Check the log and make sure spack doesn't try to build MPI.
spack install mochi-ssg@develop ^<mpich | openmpi | ...>
```
### 2. Run a Basic CUDA Program on ThetaGPU

Here is a basic tutorial.
Make sure you can get that working on ThetaGPU.
Here is the libfabric example code
that Jerome recommended. That should be all the CUDA necessary for us.
### 3. Build and run margo-p2p-bw

The margo-p2p-bw benchmark can be used as a starting point for the GPU test.
Email from Phil

Hi Chris,

Sorry for the delay, but I just landed the margo PR to add support for the bulk
attr function present in Mercury 2.2.0rc1:

https://github.com/mochi-hpc/mochi-margo/pull/185

Here is the existing margo point to point bandwidth benchmark that you might
want to consider as a starting point (it doesn't have to be, but it could be
helpful at least for boilerplate on the Margo initialization side of things):

https://github.com/mochi-hpc-experiments/mochi-tests/blob/main/perf-regression/margo-p2p-bw.c

That repository is intended to be configured and built manually outside of spack
(because it also serves as a sanity check that packages like Margo and Mercury
that are built with spack are presented properly to user applications). It
requires at least mochi-margo, mochi-ssg, and mpi to build the margo benchmarks.
There are subdirectories under "perf-regression" with scripts for a few
platforms, but we'll need a new one for theta-gpu. To get the GPU support you'll
further need mercury@2.2.0rc1 and mochi-margo@main, rather than the default
versions.

Let me know who needs theta-gpu access for testing and I'll help get them
through the account request process. To recap, that's our main production system
with GPUs in it (Nvidia), but we haven't done significant testing there yet. I'm
happy to help if there are any problems generally getting anything up and
running, system or test program related.

thanks,

-Phil

I would start by getting this benchmark working on ThetaGPU before modifying it
for GPU testing.
### 4. Implement the GPU tests

Two basic tests (requires two nodes):

- Move data from the GPU memory of node A to the main memory of node B. Verify correctness.
- Move data from the main memory of node A to the GPU memory of node B. Verify correctness.
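
For the "verify correctness" part of both tests, one option is to fill the source buffer with a deterministic, position-dependent pattern and recompute it on the receiving side. The following is a minimal host-only sketch. The names `pattern_byte`, `fill_pattern`, and `verify_pattern` and the multiplier constant are illustrative choices, not part of margo-p2p-bw.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Expected byte at offset i. The value depends on both the offset and the
 * seed, so shifted, truncated, or zero-filled transfers are likely to show
 * up as mismatches. The multiplier is an arbitrary odd constant. */
static unsigned char pattern_byte(size_t i, uint32_t seed)
{
    uint32_t x = ((uint32_t)i + seed) * 2654435761u;
    return (unsigned char)(x >> 24);
}

/* Fill a host buffer with the pattern before the transfer. */
static void fill_pattern(unsigned char *buf, size_t len, uint32_t seed)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++)
        buf[i] = pattern_byte(i, seed);
}

/* Check a host buffer after the transfer.
 * Returns the offset of the first mismatching byte, or -1 if all match. */
static ptrdiff_t verify_pattern(const unsigned char *buf, size_t len,
                                uint32_t seed)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != pattern_byte(i, seed))
            return (ptrdiff_t)i;
    }
    return -1;
}
```

When the source or destination is GPU memory, the pattern can be generated or checked in a host staging buffer and moved with cudaMemcpy, so the same two functions would cover both tests.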

Details


- Use cudaMalloc and cudaMemcpy to set up GPU memory.
- Modify margo-p2p-bw to use the margo_bulk_create_attr function.
- Pass an hg_bulk_attr to margo_bulk_create_attr with mem_type = HG_MEM_TYPE_CUDA
  and device = <CUDA device ID>. The device ID should be obtainable with
  cudaGetDevice, which returns the device currently in use by the calling host
  thread.
- Add a thetaGPU folder in mochi-tests/perf-regression with scripts similar
  to those in mochi-tests/perf-regression/theta that run your GPU test.
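
The first three items above could be combined roughly as follows. This is a sketch, not tested on ThetaGPU. It assumes that margo_bulk_create_attr takes the same arguments as margo_bulk_create plus a pointer to the attributes, that a single contiguous buffer is registered, and that the CUDA toolkit and margo headers are on the include path. The helper name `create_gpu_bulk` is hypothetical. Whether the resulting handle can actually be used for a transfer also depends on the network provider supporting CUDA memory registration.

```c
#include <stdio.h>
#include <cuda_runtime_api.h>
#include <margo.h>

/* Hypothetical helper: allocate `size` bytes on the current CUDA device,
 * copy `host_src` into it, and register the device buffer with Margo as
 * CUDA memory. On success returns 0 and sets *dev_buf and *handle; the
 * caller later releases them with margo_bulk_free() and cudaFree(). */
static int create_gpu_bulk(margo_instance_id mid, const void *host_src,
                           hg_size_t size, void **dev_buf, hg_bulk_t *handle)
{
    int device = 0;
    *dev_buf = NULL;

    /* Look up the device in use, then stage the data in GPU memory. */
    cudaError_t cerr = cudaGetDevice(&device);
    if (cerr == cudaSuccess)
        cerr = cudaMalloc(dev_buf, size);
    if (cerr == cudaSuccess)
        cerr = cudaMemcpy(*dev_buf, host_src, size, cudaMemcpyHostToDevice);
    if (cerr != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(cerr));
        cudaFree(*dev_buf); /* no-op when the pointer is still NULL */
        *dev_buf = NULL;
        return -1;
    }

    /* Tell Mercury that this buffer lives in CUDA device memory. */
    struct hg_bulk_attr attr = {0};
    attr.mem_type = HG_MEM_TYPE_CUDA;
    attr.device = (hg_uint64_t)device;

    hg_return_t hret = margo_bulk_create_attr(mid, 1, dev_buf, &size,
                                              HG_BULK_READWRITE, &attr,
                                              handle);
    if (hret != HG_SUCCESS) {
        fprintf(stderr, "margo_bulk_create_attr failed: %d\n", (int)hret);
        cudaFree(*dev_buf);
        *dev_buf = NULL;
        return -1;
    }
    return 0;
}
```

For the opposite direction (host memory on node A, GPU memory on node B), the same helper would be called on the receiving side, with the cudaMemcpy going from device to host after the transfer completes so the result can be checked.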

### 5. Add ThetaGPU folder to mochi-tests/perf-regression

Set up a nightly cron job to run the GPU tests from step 4.