Script to build gcc with OpenMP offloading to Nvidia devices (via nvptx)
#!/bin/bash
#
# Build GCC with support for offloading to NVIDIA GPUs.
#
set -o nounset -o errexit
# Location of the installed CUDA toolkit
cuda=/usr/local/cuda
# directory of this script
MYDIR="$( cd -P "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
work_dir=$MYDIR/gcc-offload
install_dir=$work_dir/install
rm -rf $work_dir
# Build assembler and linking tools
mkdir -p $work_dir
cd $work_dir
git clone https://github.com/MentorEmbedded/nvptx-tools
cd nvptx-tools
./configure \
    --with-cuda-driver-include=$cuda/include \
    --with-cuda-driver-lib=$cuda/lib64 \
    --prefix=$install_dir
make
make install
cd ..
# Set up the GCC source tree
git clone https://github.com/MentorEmbedded/nvptx-newlib
wget -c http://gnu.mirror.globo.tech/gcc/gcc-7.3.0/gcc-7.3.0.tar.gz
tar xf gcc-7.3.0.tar.gz
cd gcc-7.3.0
contrib/download_prerequisites
ln -s ../nvptx-newlib/newlib newlib
target=$(./config.guess)
cd ..
# Build nvptx GCC
mkdir build-nvptx-gcc
cd build-nvptx-gcc
../gcc-7.3.0/configure \
    --target=nvptx-none \
    --with-build-time-tools=$install_dir/nvptx-none/bin \
    --enable-as-accelerator-for=$target \
    --disable-sjlj-exceptions \
    --enable-newlib-io-long-long \
    --enable-languages="c,c++,fortran,lto" \
    --prefix=$install_dir
make -j4
make install
cd ..
# Build host GCC
mkdir build-host-gcc
cd build-host-gcc
../gcc-7.3.0/configure \
    --enable-offload-targets=nvptx-none \
    --with-cuda-driver-include=$cuda/include \
    --with-cuda-driver-lib=$cuda/lib64 \
    --disable-bootstrap \
    --disable-multilib \
    --enable-languages="c,c++,fortran,lto" \
    --prefix=$install_dir
make -j4
make install
cd ..
@pguthrey pguthrey commented May 9, 2019

Matthias,

This script seems like just the thing I need to get OpenMP GPU computing to work on my HPC cluster. I was wondering if you had additional instructions? So far I have run this script, which seemed to get all the way through the install process. Then I tried compiling my program by modifying my makefile to use

CXX = /mnt/home/guthreyp/gcc-offload/install/bin/x86_64-pc-linux-gnu-accel-nvptx-none-gcc -foffload=nvptx-none

LINK = time $(CXX) -fopenmp -fopenmp-targets=nvptx64-nvidia-cuda

And I have the following error:
x86_64-pc-linux-gnu-accel-nvptx-none-gcc: fatal error: GCC is not configured to support nvptx-none as offload target

@matthiasdiener matthiasdiener commented Nov 7, 2019

Hi @pguthrey ,

I just saw your message now. The way I compile offloading code with this is by just specifying -fopenmp to the main gcc executable, i.e. something like CXX=gcc-offload/install/bin/gcc -fopenmp. I think there is no need for -foffload=nvptx-none or -fopenmp-targets=nvptx64-nvidia-cuda. I've got a small test project for offloading set up that might be helpful: https://github.com/matthiasdiener/omptest/

@nagendraverma nagendraverma commented Nov 28, 2019

Hello Matthiasdiener,

Thanks for your script; I used your revised script for OpenMP offloading to NVIDIA devices.
The build was successful.
Before testing, I exported the library path:
export LD_LIBRARY_PATH=/home/nagg/gcc-9.1/offload-gcc-7.3/gcc-offload/install/lib64:$LD_LIBRARY_PATH
In my test program, omp_get_num_devices() prints 4, which is correct (I have 4 GPUs in my system). But when I profile with nvprof to check whether the loop runs on the GPU, it shows "No kernels were profiled", meaning the code is still running on the CPU.
My test program is below; could you please check why it is not running on the GPU as intended?

$gcc vector.c -o vector -fopenmp -foffload=nvptx-none
$nvprof ./vector

==291177== NVPROF is profiling process 291177, command: ./vector
num devices=4
==291177== Profiling application: ./vector
==291177== Profiling result:
No kernels were profiled.

==291177== API calls:
Time(%) Time Calls Avg Min Max Name
44.79% 3.2770us 16 204ns 114ns 310ns cuDeviceGetAttribute
27.76% 2.0310us 3 677ns 131ns 1.6140us cuDeviceGetCount
27.46% 2.0090us 8 251ns 118ns 508ns cuDeviceGet

$cat vector.c

#include <omp.h>
#include <stdio.h>

#define N 1000
#define CHUNKSIZE 100

int main(int argc, char *argv[]) {
    int i, chunk;
    float a[N], b[N], c[N];
    int num = -1;

    num = omp_get_num_devices();
    printf("num devices=%d\n", num);

    for (i = 0; i < N; i++)
        a[i] = b[i] = i * 1.0;
    chunk = CHUNKSIZE;

    #pragma acc kernels loop gang(32), vector (16)
    for (i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    return 0;
}

$ /home/nagg/gcc-9.1/offload-gcc-7.3/gcc-offload/install/bin/gcc -v
Using built-in specs.
COLLECT_GCC=/home/nagg/gcc-9.1/offload-gcc-7.3/gcc-offload/install/bin/gcc
COLLECT_LTO_WRAPPER=/home/nagg/gcc-9.1/offload-gcc-7.3/gcc-offload/install/libexec/gcc/x86_64-pc-linux-gnu/7.3.0/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-7.3.0/configure --enable-offload-targets=nvptx-none --with-cuda-driver-include=/usr/local/cuda-8.0/include --with-cuda-driver-lib=/usr/local/cuda-8.0/lib64 --disable-bootstrap --disable-multilib --enable-languages=c,c++,fortran,lto --prefix=/nishome/nag/gcc-9.1/offload-gcc-7.3/gcc-offload/install
Thread model: posix
gcc version 7.3.0 (GCC)

Thank You

@matthiasdiener matthiasdiener commented Nov 28, 2019

@nagendraverma

#pragma acc kernels loop gang(32), vector (16)
[...]

I don't think you can mix OpenMP and OpenACC in the same application (at the very minimum, you would need to add -fopenacc to your compilation options). Try to offload an OpenMP kernel instead (#pragma omp target).

@nagendraverma nagendraverma commented Nov 28, 2019

Hello Matthiasdiener,

I tried
$ gcc vector.c -o vector -fopenacc
gcc: warning: ‘-x lto’ after last input file has no effect
gcc: fatal error: no input files
compilation terminated.
lto-wrapper: fatal error: gcc returned 1 exit status
compilation terminated.
collect2: fatal error: lto-wrapper returned 1 exit status
compilation terminated.

Then I added -flto:
$ gcc vector.c -o vector -fopenacc -flto
$ ./vector
num devices=4

libgomp: target function wasn't mapped

+++++++++++++++++++++++++++++++++++
I also tried target. I changed the above code like this:

#pragma omp target map(to:a,b) map(from:c)
#pragma omp parallel for
for (i = 0; i < N; i++)
    c[i] = a[i] + b[i];

$ gcc vector_v2.c -o vector_v2 -fopenmp -foffload=nvptx-none
gcc: warning: ‘-x lto’ after last input file has no effect
gcc: fatal error: no input files
compilation terminated.
lto-wrapper: fatal error: gcc returned 1 exit status
compilation terminated.
collect2: fatal error: lto-wrapper returned 1 exit status
compilation terminated.
$ gcc vector_v2.c -o vector_v2 -fopenmp -foffload=nvptx-none -flto
$ ./vector_v2
num devices=4

$ nvprof ./vector_v2
==304053== NVPROF is profiling process 304053, command: ./vector_v2
num devices=4
==304053== Profiling application: ./vector_v2
==304053== Profiling result:
No kernels were profiled.

==304053== API calls:
Time(%) Time Calls Avg Min Max Name
76.12% 277.32ms 1 277.32ms 277.32ms 277.32ms cuCtxCreate
23.35% 85.088ms 1 85.088ms 85.088ms 85.088ms cuCtxDestroy
0.29% 1.0624ms 1 1.0624ms 1.0624ms 1.0624ms cuMemAllocHost
0.13% 483.10us 1 483.10us 483.10us 483.10us cuMemFreeHost
0.10% 368.14us 27 13.634us 118ns 362.71us cuDeviceGetAttribute
0.00% 2.1580us 2 1.0790us 577ns 1.5810us cuCtxGetDevice
0.00% 2.1110us 4 527ns 127ns 1.5160us cuDeviceGetCount
0.00% 2.0500us 9 227ns 118ns 433ns cuDeviceGet
0.00% 1.9560us 1 1.9560us 1.9560us 1.9560us cuMemHostGetDevicePointer
0.00% 648ns 1 648ns 648ns 648ns cuInit
0.00% 205ns 1 205ns 205ns 205ns cuCtxGetCurrent

It is still not running on the GPU, only on the CPU.
It seems related to LTO.
Could you please have a look?

Thank you

@matthiasdiener matthiasdiener commented Nov 28, 2019

Try

#pragma omp target teams distribute parallel for map(tofrom:a[:N],b[:N],c[:N])
for (i=0; i < N; i++)
  c[i] = a[i] + b[i];

The LTO error might indicate that your binutils are too old.

@nagendraverma nagendraverma commented Nov 28, 2019

I modified it with:

#pragma omp target distribute parallel for map(tofrom:a[:N],b[:N],c[:N])
for (i = 0; i < N; i++)
    c[i] = a[i] + b[i];

$ gcc vector_v2.c -o vector_v2 -fopenmp -foffload=nvptx-none -flto

vector_v2.c: In function ‘main’:
vector_v2.c:34:23: error: expected ‘#pragma omp’ clause before ‘distribute’
#pragma omp target distribute parallel for map(tofrom:a[:N],b[:N],c[:N])
^~~~~~~~~~

I modified the code a bit again, and then compilation was successful:

#pragma omp target map(tofrom:a[:N],b[:N],c[:N])
#pragma omp teams
#pragma omp distribute parallel for
for (i = 0; i < N; i++)
    c[i] = a[i] + b[i];

$ gcc vector_v2.c -o vector_v2 -fopenmp -foffload=nvptx-none -flto
$ nvprof ./vector_v2
==306421== NVPROF is profiling process 306421, command: ./vector_v2
num devices=4
==306421== Profiling application: ./vector_v2
==306421== Profiling result:
No kernels were profiled.

==306421== API calls:
Time(%) Time Calls Avg Min Max Name
75.08% 262.94ms 1 262.94ms 262.94ms 262.94ms cuCtxCreate
.....
.....
but it is still not running on the GPU.

I checked my ld version:
$ ld -v
GNU ld version 2.20.51.0.2-5.48.el6 20100205

Do I need to upgrade my binutils?

@matthiasdiener matthiasdiener commented Nov 28, 2019

#pragma omp target distribute parallel for map(tofrom:a[:N],b[:N],c[:N])

Sorry, I meant target teams distribute...

I checked my ld version:
$ ld -v
GNU ld version 2.20.51.0.2-5.48.el6 20100205

Do I need to upgrade my binutils?

Yeah, that is unlikely to work, but upgrading binutils will possibly not be enough. You need to update your OS...

@nagendraverma nagendraverma commented Nov 28, 2019

$ lsb_release -a
LSB Version: :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: CentOS
Description: CentOS release 6.10 (Final)
Release: 6.10
Codename: Final

Do I really need to upgrade my OS? Or am I missing something else, like linking something?

@nagendraverma nagendraverma commented Nov 28, 2019

Could you please try the same code on your machine and verify with nvprof that it is really running on the GPU?

@matthiasdiener matthiasdiener commented Nov 28, 2019

Do I really need to upgrade my OS? Or am I missing something else, like linking something?

CentOS v6 is 10 years old.... Any modern OS (released in the previous 12 months or so) will likely have a package for offloading so that you don't need to build the compiler yourself (Ubuntu has for example https://packages.ubuntu.com/search?keywords=gcc-7-offload-nvptx). If you really want to stick with CentOS 6, try at least upgrading binutils (maybe with spack).

Could you please try the same code on your machine and verify with nvprof that it is really running on the GPU?

Sorry, I can't. I haven't used this in a long time. The code is correct, so the issue is somewhere else.

@nagendraverma nagendraverma commented Nov 29, 2019

Hello Matthiasdiener,

I have used the script to build gcc-offload on another machine with CentOS Linux release 7.6.1810 (Core), and now I can run on the GPU.
This confirms the problem was due to the old OS.

Thanks

@dhidas dhidas commented Mar 18, 2020

Hi Matthias,

Thank you very much for this. I was able to compile binaries that offload correctly for nvidia gpus. I was wondering if you have had any success building a shared library capable of offloading? As soon as I introduce the -shared flag, although it will compile and run, it no longer offloads. Do you have any suggestions in that regard?

Thanks again

@ochubar ochubar commented Mar 18, 2020

Hello Matthiasdiener,

Can the gcc built using your build-gcc-offload-nvptx.sh script compile a shared library / Python extension written in C++ (rather than a stand-alone application) where we make use of OMP targeting GPU?

Thanks for any help / comment / example !

@matthiasdiener matthiasdiener commented Mar 19, 2020

@dhidas and @ochubar: As far as I know, the offloaded code must be statically linked into the main binary to be actually running on a device. This seems to be a limitation across all compilers I have tried (not only gcc) and is unfortunately not well-publicized (I have not seen this limitation in any compiler documentation whatsoever, and it is not mentioned in the OpenMP standard).

Note that your binary can link to shared libraries, just the offloaded code needs to be statically linked to the binary itself.

Also note that putting the offloaded code into a static library (i.e., .a), and then linking that library statically to your binary seems to work fine.
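
As an illustration of that static-library route (the file, function, and library names here are hypothetical), the offloaded code can live in its own translation unit, be archived with `ar`, and be linked statically into the final binary:

```c
/* saxpy_offload.c -- offload code destined for a static library (illustrative).
 * Hypothetical build steps:
 *   gcc -fopenmp -c saxpy_offload.c
 *   ar rcs libsaxpy.a saxpy_offload.o
 *   gcc -fopenmp main.c libsaxpy.a -o app   # static link keeps the offload
 *                                           # tables inside the binary itself
 * Without -fopenmp the pragma is ignored and the loop runs on the host,
 * producing the same results.
 */
void saxpy(int n, float alpha, const float *x, float *y) {
#pragma omp target teams distribute parallel for map(to: x[:n]) map(tofrom: y[:n])
    for (int i = 0; i < n; i++)
        y[i] = alpha * x[i] + y[i];
}
```

The key point is the final link step: because libsaxpy.a is linked statically, the embedded nvptx code and its registration tables end up in the executable, which is what allows libgomp to map the target function at startup.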

@ofmla ofmla commented Dec 15, 2020

Hi @matthiasdiener, I got the following error while building Host GCC with your script

configure: error: CUDA driver package required for nvptx support
make[1]: *** [configure-target-libgomp] Error 1
make[1]: Leaving directory `/scratch/oscarm/gcc-offload/build-host-gcc'

I am using CUDA 10.1, and my search for solutions (with Google) was unlucky. Do you have any idea what is going on? Any help is appreciated!

@matthiasdiener matthiasdiener commented Dec 15, 2020

Hi @matthiasdiener, I got the following error while building Host GCC with your script

configure: error: CUDA driver package required for nvptx support
make[1]: *** [configure-target-libgomp] Error 1
make[1]: Leaving directory `/scratch/oscarm/gcc-offload/build-host-gcc'

I am using CUDA 10.1, and my search for solutions (with Google) was unlucky. Do you have any idea what is going on? Any help is appreciated!

Have you adjusted the path to your cuda installation (line 10)?

@ofmla ofmla commented Dec 15, 2020

Thanks for your quick reply. Yes, I set the path as cuda=/usr/local/cuda-10.1

@matthiasdiener matthiasdiener commented Dec 16, 2020

Thanks for your quick reply. Yes, I set the path as cuda=/usr/local/cuda-10.1

Hmm, then I'm not sure. Have you checked that /usr/local/cuda-10.1/include and /usr/local/cuda-10.1/lib64 contain the CUDA files?

@ofmla ofmla commented Dec 17, 2020

After using the ls command, I realized that there is a folder named stubs inside /usr/local/cuda-10.1/lib64 where libcuda.so is located, so I changed the line --with-cuda-driver-lib=$cuda/lib64 to --with-cuda-driver-lib=$cuda/lib64/stubs and the installation was successful. The question now is whether it works as expected. Thanks.

@matthiasdiener matthiasdiener commented Dec 18, 2020

After using the ls command, I realized that there is a folder named stubs inside /usr/local/cuda-10.1/lib64 where libcuda.so is located, so I changed the line --with-cuda-driver-lib=$cuda/lib64 to --with-cuda-driver-lib=$cuda/lib64/stubs and the installation was successful. The question now is whether it works as expected. Thanks.

That's great. You can try running any of the examples on this gist to see if it works (maybe check with nvprof that code is actually offloaded).

@ofmla ofmla commented Dec 29, 2020

After using the ls command, I realized that there is a folder named stubs inside /usr/local/cuda-10.1/lib64 where libcuda.so is located, so I changed the line --with-cuda-driver-lib=$cuda/lib64 to --with-cuda-driver-lib=$cuda/lib64/stubs and the installation was successful. The question now is whether it works as expected. Thanks.

That's great. You can try running any of the examples on this gist to see if it works (maybe check with nvprof that code is actually offloaded).

It works properly! Thanks for sharing the script and the example codes :D

@changseok changseok commented Mar 8, 2021

Hello @matthiasdiener, I am following your script to use OpenMP. I am new to programming, so there is something I am not sure about...
Currently, I am using WSL2 Ubuntu, and I have already installed gcc (version 9.3.0). I was wondering if I should remove this GCC first and then follow your script.

@matthiasdiener matthiasdiener commented Mar 8, 2021

Hello @matthiasdiener, I am following your script to use OpenMP. I am new to programming, so there is something I am not sure about...
Currently, I am using WSL2 Ubuntu, and I have already installed gcc (version 9.3.0). I was wondering if I should remove this GCC first and then follow your script.

I don't have experience with WSL, so I'm not sure if this is going to work. I would probably recommend not removing the other gcc.

@Panjaksli Panjaksli commented Apr 23, 2021

I'm kind of really new to Linux and building compilers...
Is the script supposed to end with
make: *** [Makefile:906: all] Error 2 ?

@matthiasdiener matthiasdiener commented Apr 23, 2021

I'm kind of really new to Linux and building compilers...
Is the script supposed to end with
make: *** [Makefile:906: all] Error 2 ?

No, this means the compilation failed. You need to check for other error messages before that.

@Panjaksli Panjaksli commented Apr 24, 2021

I'm kind of really new to Linux and building compilers...
Is the script supposed to end with
make: *** [Makefile:906: all] Error 2 ?

No, this means the compilation failed. You need to check for other error messages before that.

Well, it was something along the lines of "couldn't change directory"...
Anyway, Ubuntu is not exactly my cup of tea; I've wiped it and I'm going to install another distro and try it there.

@kalasagarb kalasagarb commented Apr 30, 2021

I want to install GCC 10 with offloading to the nvptx-none target. How do I do it? Please help me.

@matthiasdiener matthiasdiener commented Apr 30, 2021

I want to install GCC 10 with offloading to the nvptx-none target. How do I do it? Please help me.

Just follow the script above, replacing the version numbers with your desired version. Or use a package provided by your distro.

@kalasagarb kalasagarb commented Apr 30, 2021

I am getting the following error while using your script:

/bin/sh: line 3: cd: x86_64-pc-linux-gnu/libstdc++-v3: No such file or directory
make[1]: *** [install-target-libstdc++-v3] Error 1

What is the problem?

@matthiasdiener matthiasdiener commented Apr 30, 2021

I am getting the following error while using your script:

/bin/sh: line 3: cd: x86_64-pc-linux-gnu/libstdc++-v3: No such file or directory
make[1]: *** [install-target-libstdc++-v3] Error 1

What is the problem?

Sorry, no idea.

@kalasagarb kalasagarb commented May 6, 2021

Is GPU offloading possible with MPI + OpenMP 4.5+?

@matthiasdiener matthiasdiener commented May 6, 2021

Is GPU offloading possible with MPI + OpenMP 4.5+?

I think so.
