@betatim
Created February 29, 2024 14:53

Building for bare metal

Check out https://github.com/rapidsai/gpu-xb-ai:

git clone https://github.com/rapidsai/gpu-xb-ai

Create a conda environment from conda/environments/gpu-xb-ai-legate-all.yaml:

conda env create -f conda/environments/gpu-xb-ai-legate-all.yaml

This environment should contain all the dependencies needed. If not, please report what is missing.

Make sure you can access the following repositories (if needed, ask in #swrapids-legate to get access):

You will need to check out a copy of each of these repositories, pin each one to a particular commit, and then build it. The commit ID for each repository is listed in the Dockerfile.legate file. One way to check out a repository and set it to a particular commit is:

mkdir legate.core.internal
cd legate.core.internal
git init
git remote add origin <GitHub URL of repo>
git fetch origin <commitID>
git reset --hard FETCH_HEAD
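
Since the same steps have to be repeated for every repository, they can be wrapped in a small shell function. The following is only a sketch: the function name fetch_at_commit is made up for illustration, and the directory, URL, and commit ID are whatever you pass in.

```shell
# Sketch: wraps the init/fetch/reset steps above into one reusable function.
# Arguments: target directory, repository URL, commit ID (all supplied by you).
fetch_at_commit() {
    local dir="$1" url="$2" commit="$3"
    mkdir -p "$dir" || return 1
    (
        # Run in a subshell so the caller's working directory is unchanged.
        cd "$dir" || exit 1
        git init --quiet &&
        git remote add origin "$url" &&
        git fetch --quiet origin "$commit" &&
        git reset --quiet --hard FETCH_HEAD
    )
}

# Example call (placeholders left unfilled, as in the steps above):
# fetch_at_commit legate.core.internal "<GitHub URL of repo>" "<commitID>"
```

The function returns a non-zero status if any step fails, so it can be chained with && when checking out several repositories in a row.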

After checking out each repository you will have to build it. The best order to build them in is the order in which they are listed above. For the exact command used to build each project, look at Dockerfile.legate after line 165; this is how things are built inside the Docker image.
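
Two small helpers can make it quicker to pull this information out of the Dockerfile. This is a sketch under assumptions: it assumes the commits are written as full 40-character hex hashes (shortened hashes or tags would not match), and the function names are made up for illustration.

```shell
# Sketch: list lines (with line numbers) that contain something that looks
# like a full commit hash. Assumption: commits are pinned as 40 hex characters.
list_commit_ids() {
    grep -nE '[0-9a-f]{40}' "$1"
}

# Print a file from a given line number to the end.
show_from_line() {
    sed -n "${2},\$p" "$1"
}

# Possible usage, run from the directory that contains Dockerfile.legate:
# list_commit_ids Dockerfile.legate
# show_from_line Dockerfile.legate 165
```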

Make sure to activate the gpu-xb-ai-legate-all conda environment before building.
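
To avoid building in the wrong environment by accident, a guard like the one below can be run before each build. It is a sketch: require_conda_env is a made-up helper name, and it relies on conda setting the CONDA_DEFAULT_ENV variable to the name of the active environment.

```shell
# Sketch: fail early if the expected conda environment is not active.
# CONDA_DEFAULT_ENV is set by `conda activate` to the active environment's name.
require_conda_env() {
    if [ "${CONDA_DEFAULT_ENV:-}" != "$1" ]; then
        echo "error: activate the '$1' conda environment first" >&2
        return 1
    fi
}

# Possible usage:
# conda activate gpu-xb-ai-legate-all
# require_conda_env gpu-xb-ai-legate-all && <build command>
```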

Once everything is built you should be able to run the benchmark with

legate use_cases/uc10/legate.py --workdir /tmp --stage=training --iterations=4 /opt/gpu-xb-ai/data/sf-800.0/uc10/train/ /lustre/fsw/nvr_legate/sebastianb/gpu-xb-ai/data/sf-800.0/uc10/train/

The last argument is the directory that contains the input data. On EOS you can use /lustre/fsw/nvr_legate/sebastianb/gpu-xb-ai/data/sf-800.0/uc10/train/; on any other system you need to place the data somewhere you can read it and pass that path instead.
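
Before launching a long run it can help to confirm that the data directory is actually there. The check below is a sketch only: check_data_dir and DATA_DIR are made-up names, and it only tests that the directory exists and is not empty, not that the files inside are valid.

```shell
# Sketch: succeed only if the given path is a directory with at least one entry.
check_data_dir() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# DATA_DIR is a hypothetical variable name; set it to your copy of the data.
# DATA_DIR=/lustre/fsw/nvr_legate/sebastianb/gpu-xb-ai/data/sf-800.0/uc10/train/
# check_data_dir "$DATA_DIR" || echo "no input data in $DATA_DIR" >&2
```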
