@anjohan
Created December 3, 2020 02:37
Build Clang with OpenMP Target Offloading for NVIDIA GPUs

Note to self

This works with Clang 12 on Manjaro, and probably on other Linux distributions. CUDA 11.1 is installed with pacman in /opt/cuda.

Download

git clone https://github.com/llvm/llvm-project.git

First build

This uses whatever C and C++ compilers you have on your system (I have GCC 10.2).

cd llvm-project
mkdir build
cd build
cmake ../llvm/ -DCMAKE_BUILD_TYPE=Release -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_61 -DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=61 -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" -DLLVM_ENABLE_PROJECTS="clang;openmp"
make -j$(nproc)

Replace 61 (in both sm_61 and the compute capabilities list) with the compute capability of your GPU (e.g. 70 for a V100). Note that the build takes a very long time.

The resulting Clang binary and OpenMP library can be used, but compiling with offloading will give a warning like

clang-12: warning: No library 'libomptarget-nvptx-sm_61.bc' found in the default clang lib directory or in LIBRARY_PATH. Expect degraded performance due to no inlining of runtime functions on target devices. [-Wopenmp-target]

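This is why I do the next step, called "bootstrapping": as far as I can tell, the missing device runtime library is only built as bitcode when LLVM itself is compiled with Clang.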

Second build

Repeat the build procedure, but now using the compilers built in the previous step.

cd ..
mkdir build2
cd build2
CC=../build/bin/clang CXX=../build/bin/clang++ cmake ../llvm/ -DCMAKE_BUILD_TYPE=Release -DCLANG_OPENMP_NVPTX_DEFAULT_ARCH=sm_61 -DLIBOMPTARGET_NVPTX_COMPUTE_CAPABILITIES=61 -DLLVM_TARGETS_TO_BUILD="X86;NVPTX" -DLLVM_ENABLE_PROJECTS="clang;openmp" -DCMAKE_INSTALL_PREFIX=$HOME/.local
make -j$(nproc)
make install

Again, remember to replace 61. Also, pick an install prefix whose bin directory is in your PATH and whose lib directory is in your LD_LIBRARY_PATH. If you have other Clangs lying around, e.g. from pacman/apt, the install prefix should be prepended to these paths so that the new build is found first.

Example

A stupid example (saved as run.cpp):

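#include <cstdio>

int main(){
  int N = 1e8;
  float *x = new float[N];
  float *y = new float[N];

  // new[] leaves the arrays uninitialized, so fill the input on the host.
  for(int i = 0; i < N; i++){
    x[i] = 1.0f;
  }

  // x is copied to the GPU before the loop, y is copied back afterwards.
  #pragma omp target teams distribute parallel for map(to: x[0:N]) map(from: y[0:N])
  for(int i = 0; i < N; i++){
    // y is mapped "from", so its device copy starts out uninitialized.
    // Accumulate in a local variable and write the result once.
    float sum = 0;
    for(int j = 0; j < 1000; j++){
      sum += 3*x[i];
    }
    y[i] = sum;
  }

  printf("%g\n", y[5]);

  delete [] x;
  delete [] y;
}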

Compile for GPU:

clang++ -fopenmp -fopenmp-targets=nvptx64 -O3 run.cpp && time ./a.out

Compile for CPU (threads):

clang++ -fopenmp -O3 run.cpp && time ./a.out

This example runs roughly 6 times faster on my P2000 than on my i7-8850H. The difference increases with the number of inner repetitions, as the data transfer then takes up a smaller share of the total run time.
