CUDA Compilers

In general, check the crt/host_config.h file to find out which host compiler versions are supported. Sometimes it is possible to hack the requirements there to get some newer versions working, too :)
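For example, the gate in crt/host_config.h looks roughly like this (a sketch modeled on the CUDA 8 era check for g++ <= 5.3; the exact version numbers differ per release). Relaxing or removing the #error is the "hack" mentioned above:

```cpp
// Sketch of the host compiler version gate found in crt/host_config.h
// (illustrative, CUDA 8 era; real releases use different bounds):
#if defined(__GNUC__)
  #if __GNUC__ > 5 || (__GNUC__ == 5 && __GNUC_MINOR__ > 3)
    #error -- unsupported GNU version! gcc versions later than 5.3 are not supported!
  #endif
#endif
```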

Thrust version can be found in $CUDA_ROOT/include/thrust/version.h.
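A minimal way to print it at build/run time, using the version macros from that header:

```cpp
#include <thrust/version.h>
#include <cstdio>

int main() {
  // THRUST_VERSION encodes major * 100000 + minor * 100 + subminor.
  printf("Thrust %d.%d.%d\n", THRUST_MAJOR_VERSION,
         THRUST_MINOR_VERSION, THRUST_SUBMINOR_VERSION);
  return 0;
}
```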

Download Archives: https://developer.nvidia.com/cuda-toolkit-archive

Release notes for CUDA:

nvcc

Latest official compiler requirements: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

| CUDA version | SM Arch | g++ | icpc | pgc++ | xlC | MSVC | clang++ | Linux driver | thrust | note |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1.0 | 1.0-1.1 | ? | ? | | | | | ? | | |
| 1.1 | 1.0-1.1 | ? | ? | | | | | ? | | |
| 2.0 | 1.0-1.1 | ? | ? | | | | | ? | | |
| 2.1 | 1.0-1.3 | ? | ? | | | | | ? | | |
| 2.3.1 | 1.0-1.3 | ? | ? | | | | | ? | | |
| 3.0 | 1.0-2.0 | ? | ? | | | | | ? | | |
| 3.1 | 1.0-2.0 | ? | ? | | | | | ? | | |
| 3.2 | 1.0-2.1 | ? | 11.1 | | | | | ? | | |
| 4.0 | 1.0-2.1 | <=4.4 | 11.1 | | | | | ? | | |
| 4.1 | 1.0-2.1 | <=4.5 | 11.1 | | | | | ? | | |
| 4.2 | 1.0-2.1 | <=4.6 | 11.1 | | | | | ? | | |
| 5.0 | 1.0-3.? | <=4.6 | 11.1 | | | ? | | ? | 1.5.3 | |
| 5.5 | 1.0-3.? | <=4.8 | 12.1 | | | ? | | ? | 1.7.0 | C++11 on host side supported; ICC fixed to build 20110811 |
| 6.0 | 1.0-5.0 | <=4.8 | 13.1 | | | ? | | 331.62 | 1.7.1 | |
| 6.5 | 1.1-5.X | <=4.8 | 14.0 | | ? | ? | | ? | 1.7.2 | experimental device-side C++11 support; up to and including this version, <thrust/sort.h> screws up __CUDA_ARCH__ (must be undefined on host); deprecation of SM 11-13 (10 removed) |
| 7.0.17 (RC) | see below | <=4.9 | 15.0 | >=14.9 | 13.1.1 | ? | | 346.29 | 1.8.0 | first official PGI support, first xlC string found; powerpc64 with little endian supported |
| 7.0.27 | 2.0-5.X | <=4.9 | 15.0 | >=14.9 | 13.1.1 | 2010-13 | | 346.46 | 1.8.1 | official C++11 support on device side |
| 7.5 | | <=4.9 | 15.0 | 15.4 | 13.1 | 2010-13 | 3.5-3.6 | 352.41? | 1.8.2 | clang (host) on Linux supported, __CUDACC_VER__ macros added |
| 7.5.18 | 2.0-5.X | <=4.9 | 15.0 | 15.4 | 13.1 | 2010-13 | | 352.39 | 1.8.2 | |
| 8.0.44 | 2.0-6.X | <=5.3 | 15.0(.4)-16.0 | 16(.3)+ | 13.1(.2) | 2012-15 | 3.8-3.9 | 367.48 | 1.8.3-patch2 | sm_60 (Pascal) support added |
| 8.0.61 | 2.0-6.X | <=5.3 | 15.0(.4)-17.0 | 16(.3)+ | 13.1(.2) | 2012-15 | 3.8-3.9 | 375.26 | 1.8.3-patch2 | nvcc 8 is incompatible with std::tuple in gcc 5.4+ |
| 9.0.69 (RC) | 3.0-7.0 | <=5.5 (<=6) | 15.0(.4)-17.0 | 17 | 13.1(.2) | 2012-17 | 3.8-3.9 | ???.?? | 1.9.0-patch4 | device-side C++14; __CUDACC_VER__ deprecated in favor of __CUDACC_VER_MAJOR/MINOR/BUILD__ |
| 9.0.103 (RC) | 3.0-7.0 | <=5.5 (<=6) | 15.0(.4)-17.0 | 17 | 13.1(.2) | 2012-17 | 3.8-3.9 | 384.59 | 1.9.0-patch4 | same as above, but __CUDACC_VER__ is defined as an #error, rendering it fully broken |
| 9.0.176 | 3.0-7.0 | <=5.5 (<=6) | (15.0-)17.0 | 17.1 | 13.1(.5) | 2012-17 | (3.8-)3.9 | 384.81 | 1.9.0-patch5 | same as above |
| 9.1.85 | 3.0-7.2 | <=5.5 (<=6) | (15.0-)17.0 | 17.X | 13.1(.6) | 2012-17 | (3.8-)4.0 | 390.46 | 1.9.1-patch2 | math_functions.hpp moved to crt/ |
| 9.1.85.1 | | | | | | | | | | cuBLAS 9.1.128: Volta GEMM kernels optimized |
| 9.1.85.2 | | | | | | | | | | ptxas: fix address calculations using large immediate operands |
| 9.1.85.3 | | | | | | | | | | cuBLAS: fixes to GEMM optimizations for convolutional sequence-to-sequence (seq2seq) models |
| 9.0-9.1 | | | | | | | | | | nvcc 9.0-9.1 is incompatible with std::tuple in gcc 6+ |
| 9.2.88 | 3.0-7.2 | <=7.3.0 (<=7) | (15.0-)17.0 | 17-18.X | 13.1(.6), 16.1 | 2012-17 | (3.8-)5.0 | 396.26 | 1.9.2 | CUTLASS 1.0 added; std::tuple fixed (prior gcc 6 issues) |
| 9.2.148 | | | | | | | | 396.37 | 1.9.2 | |
| 10.0.130 | 3.0-7.5 | <=7 | (15.0-)18.0 | 17-18.X | 13.1, 16.1 | 2013-17 | (3.8-)6.0 | 410.48 | 1.9.3 | |
| 10.1.105 | 3.0-7.5 | <=8 | (15.0-)19.0 | 17-19.X | | 2013-19 | (3.8-)7.0 | 418.39 | 1.9.4 | |
| 10.1.168 | | | | | | | (3.8-)8.0 | 418.67 | | 10.1 "Update 1" |
| 10.1.243 | | | | | | | | 418.87 | | 10.1 "Update 2" |
| 10.2.89 | 3.0-7.5 | <=8 | (15.0-)19.0 | 18-19.X | 13.1, 16.1 | 2015-19 | (3.3-)8.* | 440.33.01 | 1.9.7 | sm_30,35,37,50 deprecated |
| 11.0.1 (RC) NVCC:11.0.167 | 3.5-8.0 | (5-)9.* | (15.0-)19.1 | 18-20.1 | 13.1, 16.1 | 2015-19 | 3.2-9.0.0 | 450.36.06 | 1.9.9 | macOS dropped; libs drop pre-C++11 and deprecate pre-C++14 host compilers (gcc < 5, clang < 6, MSVC < 2017); Arm C/C++ 19.2 support |
| 11.0.2-1 NVCC:11.0.194 | | | | | | | (6-)9.0.0 | 450.51.05 | | adds --Wext-lambda-captures-this |
| 11.0.3 NVCC:11.0.221 | ? | ? | ? | ? | ? | ? | ? | 450.51.06 | ? | 11.0 "Update 1"; nvcc: --forward-unknown-to-host-compiler and --forward-unknown-to-host-linker flags |
| 11.1.0 NVCC:11.1.74 | 3.5-8.6 | 3.5-10.0 | ? | ? | ? | ? | (6-)10.0.0 | 455.23.05 | ? | Ubuntu@ppc64le deprecated |

Note: empty cells generally mean "same as above" for readability.
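The __CUDACC_VER__ notes in the table matter for portable version checks. A minimal sketch that works across nvcc 7.5 through 11.x (MY_NVCC_VERSION is a hypothetical helper macro; the encodings follow the macro definitions named in the table notes):

```cpp
// Detect the nvcc version without relying on the broken/deprecated __CUDACC_VER__.
// MY_NVCC_VERSION encodes major * 100 + minor.
#if defined(__CUDACC_VER_MAJOR__)   // CUDA 9.0+: split macros
  #define MY_NVCC_VERSION (__CUDACC_VER_MAJOR__ * 100 + __CUDACC_VER_MINOR__)
#elif defined(__CUDACC_VER__)       // CUDA 7.5-8.x: major*10000 + minor*100 + build
  #define MY_NVCC_VERSION (__CUDACC_VER__ / 100)
#endif
```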

As of CUDA 7.0, clang seems to be the only supported host compiler on OS X (but no version check was found). CUDA 10.1.243 adds support for Xcode 10.2.

Compilers such as pgc++, icpc, and xlC are only supported on Linux: icpc and pgc++ on x86, xlC on little-endian powerpc64.

Dynamic parallelism was added with sm_35 and CUDA 5.0.
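A minimal sketch of what that enables, i.e. a kernel launching another kernel (requires relocatable device code and linking cudadevrt; file name and build line are illustrative):

```cpp
// Dynamic parallelism: device-side kernel launch (sm_35+, CUDA 5.0+).
// Build (illustrative): nvcc -arch=sm_35 -rdc=true nested.cu -lcudadevrt
#include <cstdio>
#include <cuda_runtime.h>

__global__ void child(int* out) { out[threadIdx.x] = threadIdx.x; }

__global__ void parent(int* out) {
  child<<<1, 32>>>(out);   // launched from the device, not the host
}

int main() {
  int* d;
  cudaMalloc(&d, 32 * sizeof(int));
  parent<<<1, 1>>>(d);
  cudaDeviceSynchronize();
  int h[32];
  cudaMemcpy(h, d, sizeof(h), cudaMemcpyDeviceToHost);
  printf("h[31] = %d\n", h[31]);   // expect 31
  cudaFree(d);
}
```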

clang++ -x cuda

clang++ can compile CUDA C++ to ptx as well. Give it a whirl!

| clang++ | supported CUDA release | supported SMs |
| --- | --- | --- |
| 3.9-5.0 | 7.0-8.0 | 2.0-(5.0)6.0 |
| 6.0 | 7.0-9.0 | (2.0)3.0-7.0 |
| 7.0 | 7.0-9.2 | (2.0)3.0-7.2 |
| 8.0 | 7.0-10.0 | (2.0)3.0-7.5 |
| 9.0 | 7.0-10.1 | (2.0)3.0-7.5 |
| 10.0 | 7.0-10.1 | (2.0)3.0-7.5 |
| trunk | 7.0-11.0 | (2.0)3.0-8.0 |

https://llvm.org/docs/CompileCudaWithLLVM.html
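For a quick smoke test, a minimal kernel like the following can be compiled directly with clang. The flags follow the LLVM docs above; treat the exact command as a sketch, since --cuda-gpu-arch and the CUDA library path must match your local toolkit install:

```cpp
// axpy.cu -- minimal CUDA source for a clang smoke test.
// Example invocation (adjust arch and paths to your setup):
//   clang++ -x cuda axpy.cu -o axpy --cuda-gpu-arch=sm_60 \
//       -L/usr/local/cuda/lib64 -lcudart
#include <cstdio>
#include <cuda_runtime.h>

__global__ void axpy(float a, const float* x, float* y) {
  y[threadIdx.x] = a * x[threadIdx.x];
}

int main() {
  const int n = 4;
  float hx[n] = {1, 2, 3, 4}, hy[n] = {};
  float *dx, *dy;
  cudaMalloc(&dx, n * sizeof(float));
  cudaMalloc(&dy, n * sizeof(float));
  cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
  axpy<<<1, n>>>(2.0f, dx, dy);
  cudaMemcpy(hy, dy, n * sizeof(float), cudaMemcpyDeviceToHost);
  for (float v : hy) printf("%g\n", v);   // expect 2 4 6 8
  cudaFree(dx);
  cudaFree(dy);
}
```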

Device-Side C++ Standard Support

C++ core language features:

| compiler | supported C++ standard | notes |
| --- | --- | --- |
| nvcc <=6.0 | c++03 | |
| nvcc 6.5 | c++03, exp. c++11 | undocumented |
| nvcc 7.0-8.0 | c++03, 11 | only c++11 switch |
| nvcc 9.0-10.2 | c++03, 11, 14 | 10.2 adds libcu++ (atomics) |
| nvcc 11.0.167+ | c++03, 11, 14, 17 | C++11 host compiler needed for math libs; ships a C++11-compatible backport of the C++20 synchronization library; device LTO added; starting with CUDA Toolkit 11.0.1, nvcc and CUDA Toolkit versions are no longer equivalent |
| clang 5+ | c++03, 11, 14, 17 | |
| clang 6+ | c++03, 11, 14, 17, 2a | |
| clang 10+ | c++03, 11, 14, 17, 20 | |
| clang trunk | c++03, 11, 14, 17, 20 | status |
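As a concrete illustration of the standard levels above: a generic lambda in device code requires device-side C++14, i.e. nvcc 9.0+ per the table, compiled e.g. with nvcc -std=c++14:

```cpp
#include <cstdio>
#include <cuda_runtime.h>

__global__ void kernel() {
  // Generic lambda (auto parameter) needs device-side C++14.
  auto square = [](auto x) { return x * x; };
  printf("square(%d) = %d\n", (int)threadIdx.x, square((int)threadIdx.x));
}

int main() {
  kernel<<<1, 4>>>();
  cudaDeviceSynchronize();
}
```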

CUDA-enabled C++ standard library libcu++ (based on LLVM's libc++):

| introduced | components | notes |
| --- | --- | --- |
| CUDA 10.2 | <atomic> (SM6.0+), <type_traits> | introduction of libcu++ |
| CUDA 11.0 | atomic<T>::wait/notify, <barrier>, <latch>, <counting_semaphore> (SM7.0+), <chrono>, <ratio>, <functional> w/o function | anticipated with GTC 2020 slides |

Incremental libcu++ release goals (GTC 2020):

  • Version 1 (CUDA 10.2): <atomic>(SM6.0+), <type_traits>.
  • Version 2 (CUDA next): atomic<T>::wait/notify, <barrier>, <latch>, <counting_semaphore> (SM7.0+), <chrono>, <ratio>, <functional> minus function.
  • Future priorities: atomic_ref<T>, <complex>, <tuple>, <array>, <utility>, <cmath>, string processing, ...
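A minimal sketch of the Version 1 functionality, cuda::std::atomic from <cuda/std/atomic> (CUDA 10.2+, SM 6.0+ per the table above):

```cpp
#include <cuda/std/atomic>
#include <cstdio>
#include <new>
#include <cuda_runtime.h>

__global__ void count(cuda::std::atomic<int>* counter) {
  // A heterogeneous atomic, usable from device and host code alike.
  counter->fetch_add(1, cuda::std::memory_order_relaxed);
}

int main() {
  cuda::std::atomic<int>* c;
  cudaMallocManaged(&c, sizeof(*c));
  new (c) cuda::std::atomic<int>(0);   // construct in managed memory
  count<<<4, 256>>>(c);
  cudaDeviceSynchronize();
  printf("%d\n", c->load());           // expect 1024
  cudaFree(c);
}
```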

NVC++

NVC++ is a unified C++ compiler with a GPU-accelerated STL for the CUDA platform. It also seems to support OpenACC. NVC++ currently does not support the CUDA C++ language.

| compiler | supported C++ standard | notes |
| --- | --- | --- |
| nvc++ 11.0 | ..., c++17 | initial release, ships a C++11-compatible backport of the C++20 synchronization library |
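The "GPU-accelerated STL" refers to offloading C++17 parallel algorithms. A sketch, assuming the -stdpar flag of NVIDIA's HPC SDK (e.g. nvc++ -stdpar saxpy.cpp):

```cpp
#include <algorithm>
#include <cstdio>
#include <execution>
#include <vector>

int main() {
  std::vector<float> x(1 << 20, 1.0f), y(1 << 20, 2.0f);
  // With nvc++ -stdpar, parallel-policy algorithms can be offloaded to the GPU.
  std::transform(std::execution::par_unseq, x.begin(), x.end(), y.begin(),
                 y.begin(), [](float xi, float yi) { return 2.0f * xi + yi; });
  printf("y[0] = %g\n", y[0]);   // expect 4
}
```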


@Artem-B Artem-B commented Nov 28, 2018

@ax3l ax3l commented Dec 10, 2018

Thanks! They seem to change erratically between releases. Likely because they upload them manually to some CMS, by the looks of it.

@boris-kolpackov boris-kolpackov commented Mar 12, 2020

Thanks for collecting all this information. We are currently in the process of deciding which approach (NVCC or Clang) is the future and which we should support in the build2 build system. The Clang way definitely seems saner from the build system's POV but a bit of googling suggests NVCC is still the predominantly used approach while the Clang CUDA page hasn't seen any updates in a while. Is my impression accurate?

@ax3l ax3l commented Mar 18, 2020

Ideally a build system should support both. Nvcc's approach is the current (03-2020) official one and significantly harder to support than clang's, e.g. when propagating compilation options to dependent projects. Clang usually catches up a few months after a CUDA release; so far only 10.2 has been lagging a bit longer. Maybe @Artem-B knows more on this? We definitely use clang-cuda downstream for direct compilation as well as for CUDA JIT in Cling.

@Artem-B Artem-B commented Mar 18, 2020

I've just updated Clang's docs a bit. My guess is that both NVCC and clang will stay around.
As it happens, CMake is about to add support for clang to its CUDA compilation, so they are going to support both clang and nvcc.

NVCC has all the bells and whistles, and will always be ahead of clang in terms of support for new GPU architectures. On the other hand, for large projects like Tensorflow, nvcc is a rather heavy maintenance burden. We are constantly fighting all sorts of corner cases that pop up due to quirks of NVCC's front-end, the host compiler used by NVCC, and multiple source code projects with various degrees of compiler portability. It's hard enough to make code portable for one compiler on multiple platforms. Making it portable for all combinations of {clang | nvcc×{clang, gcc, msvc}} × {windows, linux} is a constant game of whack-a-mole. Reducing it to clang everywhere makes things simpler and much more robust.

Clang only targets essential CUDA functionality, so for things like textures, __managed__, etc. one will need nvcc (though AMD folks have just sent a patch to improve surface/texture support: https://reviews.llvm.org/D76365). Things like nvcc -rdc are not integrated into the compiler and have to be done by the build system (which is conceptually the place to do it, but it's currently a burden on the end-user).
On the positive side, in addition to simplifying maintenance, Clang also has a huge advantage of being open-source. If there are bugs, they are possible to fix. Our round-trip time for detect-fix-integrate ends up being as short as O(days). Being able to piggy-back on all the latest C++ features is also a plus. NVCC and MSVC are lagging behind in that respect.

@ax3l ax3l commented Mar 18, 2020

Thanks, I agree with all points and appreciate the insights. Thank you for advancing Clang-CUDA; we value this effort a lot, for exactly the reasons mentioned.

@boris-kolpackov boris-kolpackov commented Mar 19, 2020

@ax3l, @Artem-B thanks for the feedback.

I agree with Artem's points, Clang's compilation model looks a lot saner from the build system's POV. With NVCC we will most likely need to decompose all the steps that it performs under the hood (like invoking the host compiler) and perform them ourselves if we have any chance of having proper header dependency extraction with support for auto-generated headers (and thinking about C++20 modules in this context just makes my head hurt). With CUDA-Clang it seems like we could just use our standard logic that we use for the Vanilla-Clang. Unfortunately, however, the feedback I am hearing from the potential users is that they have to use NVCC for various reasons (see the build2 issue I linked above for details).

One question about the "Reducing it to clang everywhere makes things simpler and much more robust" remark: the documentation page says that "Compilation on MacOS and Windows may or may not work and currently have no maintainers". Is this still accurate?

@Artem-B Artem-B commented Mar 19, 2020

One question about the "Reducing it to clang everywhere makes things simpler and much more robust" remark: the documentation page says that "Compilation on MacOS and Windows may or may not work and currently have no maintainers". Is this still accurate?

Yes.

CUDA-10.2 is the last release to support MacOS, so it's probably the end of the road for it in clang, too.

Clang on windows is largely driven by the Chrome team, but it only covers C++ compilation. If/when Tensorflow switches to clang, we'll likely put more resources into CUDA compilation on windows, too, but at the moment nobody's in charge.

@mabraham mabraham commented Apr 24, 2020

so it's probably the end of the road for it in clang, too

I doubt it. Apple didn't drive LLVM supporting a PTX back end, Google did, precisely so that they are not dependent on nvcc.


@ax3l ax3l commented Apr 24, 2020

Hi Mark, let me rephrase what the clang-cuda (formerly gpucc) author and maintainer from Google whom you quote wrote: as of today, this is the end for any newer CUDA on macOS. You cannot run or even build a CUDA software stack with a compiler alone on macOS; clang-cuda integrates with several parts of the CUDA Toolkit. And for years there have been no new Apple computers with an Nvidia GPU to run it on.

Just to correct the history: Nvidia provided the PTX backend for LLVM, Google the CUDA frontend.

@ax3l ax3l commented Apr 24, 2020

Good news everyone, CMake's integration of Clang as a CUDA compiler is moving forward.

@Artem-B Artem-B commented Apr 24, 2020

@mabraham PTX is just text assembly. At the very least you need ptxas, libcuda and libcudart in order to compile it to the actual GPU binary and launch it. libcuda comes with the driver where NVIDIA supports CUDA, and it is likely to go away once NVIDIA drops CUDA support on mac. ptxas and libcudart come with CUDA, so they will also be gone. The end result is that even though clang will still be able to generate PTX, it will be nearly useless. MacOS will end up in the same situation FreeBSD is in now -- they have clang, they've had NVIDIA drivers for a pretty long time, they can run Linux's CUDA apps, but there's no native libcuda or libcudart there (Linux's ptxas could be run under Linux emulation), so they can't use clang to create native CUDA apps. :-(

Just to correct the history: Nvidia provided the PTX backend for LLVM, Google the CUDA frontend.

Google has been the major contributor to NVPTX back-end development in LLVM, too. NVIDIA itself has been conspicuously missing.

@ax3l ax3l commented Apr 24, 2020

Thank you for the details and clarification, really appreciate your work and insights.

@mabraham mabraham commented Apr 24, 2020

Agree CUDA on Mac is dead sometime soon. Compiling CUDA with clang is a viable proposition on at least Linux now and moving forward :-)
