@ChristopherHogan
Last active August 5, 2022 15:41
Mercury GPU Test Plan

1. Install mochi components

Mochi Documentation

Make sure you're using the latest spack from the develop branch and the latest mochi-margo from the main branch. Note: the default spack repo contains a mochi-margo package, but we can't use that one. We must use the package from https://github.com/mochi-hpc/mochi-spack-packages, which is the only way to get versions of margo and mercury new enough to have the GPU capabilities we're testing. For reference, the GPU capabilities were added to Mercury in this PR, and wrappers for the feature were added to margo in this PR.

Use the system (preinstalled) MPI instead of letting spack build it. See the documentation on System packages and External packages. You can try the spack external find command to auto-populate some external packages, although it doesn't seem to work for MPI.

git clone https://github.com/spack/spack
source spack/share/spack/setup-env.sh
git clone https://github.com/mochi-hpc/mochi-spack-packages
spack repo add mochi-spack-packages
spack install mochi-margo@develop

# mochi-ssg is needed for the margo-p2p-bw benchmark (see below)
# Pass whatever MPI implementation ThetaGPU has as a dependency to this command.
# Spack prefers OpenMPI by default.
# Check the log and make sure spack doesn't try to build MPI.
spack install mochi-ssg@develop ^<mpich | openmpi | ...>

2. Run a basic CUDA program on ThetaGPU

Here is a basic tutorial. Make sure you can get that working on ThetaGPU.

Here is the libfabric example code that Jerome recommended. That should be all the CUDA necessary for us.

3. Build and run margo-p2p-bw

The margo-p2p-bw benchmark can be used as a starting point for the GPU test.

Email from Phil

Hi Chris,

Sorry for the delay, but I just landed the margo PR to add support for the bulk
attr function present in Mercury 2.2.0rc1:

https://github.com/mochi-hpc/mochi-margo/pull/185

Here is the existing margo point to point bandwidth benchmark that you might
want to consider as a starting point (it doesn't have to be, but it could be
helpful at least for boilerplate on the Margo initialization side of things):

https://github.com/mochi-hpc-experiments/mochi-tests/blob/main/perf-regression/margo-p2p-bw.c

That repository is intended to be configured and built manually outside of spack
(because it also serves as a sanity check that packages like Margo and Mercury
that are built with spack are presented properly to user applications). It
requires at least mochi-margo, mochi-ssg, and mpi to build the margo benchmarks.
There are subdirectories under "perf-regression" with scripts for a few
platforms, but we'll need a new one for theta-gpu. To get the GPU support you'll
further need mercury@2.2.0rc1 and mochi-margo@main, rather than the default
versions.

Let me know who needs theta-gpu access for testing and I'll help get them
through the account request process. To recap, that's our main production system
with GPUs in it (Nvidia), but we haven't done significant testing there yet. I'm
happy to help if there are any problems generally getting anything up and
running, system or test program related.

thanks,

-Phil

I would start by getting this benchmark working on ThetaGPU before modifying it for GPU testing.

4. Implement the GPU tests

Two basic tests (both require 2 nodes)

  1. Move data from GPU memory of node A to main memory of node B. Verify correctness.
  2. Move data from main memory of node A to GPU memory of node B. Verify correctness.
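One way to handle the "verify correctness" part of both tests is to fill the source buffer with a deterministic pattern and have the receiving side recompute that pattern and compare. The sketch below is a minimal host-side version in plain C. The helper names fill_pattern and first_mismatch, the pattern itself, and the seed parameter are all hypothetical and not part of margo-p2p-bw. It assumes that a GPU-resident buffer is copied to or from a host staging buffer with cudaMemcpy around these calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill buf with a deterministic byte pattern derived
 * from the byte offset and a seed, so the receiver can recompute it without
 * needing a copy of the original data. */
static void fill_pattern(uint8_t *buf, size_t len, uint32_t seed)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++)
        buf[i] = (uint8_t)((i * 31u + seed) & 0xffu);
}

/* Hypothetical helper: return the index of the first byte that does not
 * match the expected pattern, or len if the whole buffer matches. */
static size_t first_mismatch(const uint8_t *buf, size_t len, uint32_t seed)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != (uint8_t)((i * 31u + seed) & 0xffu))
            return i;
    }
    return len;
}
```

For test 1, node A would call fill_pattern on a host staging buffer, copy it into GPU memory, and expose it for the bulk transfer. Node B would then call first_mismatch on its main-memory buffer after the transfer and treat any return value other than len as a failure. Test 2 is the mirror image, with node B copying its GPU buffer back to the host before checking. Using a different seed per iteration helps catch stale data left over from a previous transfer.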

Details

  • Use cudaMalloc and cudaMemcpy to set up GPU memory.
  • Modify margo-p2p-bw to use the margo_bulk_create_attr function.
  • Pass an hg_bulk_attr to margo_bulk_create_attr with mem_type = HG_MEM_TYPE_CUDA and device = <CUDA device ID>. The device ID should be obtainable with the CUDA runtime call cudaGetDevice, which returns the device currently in use by the calling host thread.
  • Add a thetaGPU folder in mochi-tests/perf-regression with scripts similar to those in mochi-tests/perf-regression/theta that run your GPU test.

5. Add ThetaGPU folder to mochi-tests/perf-regression

Set up a nightly cron job to run the GPU tests from step 4.
