@samcom12
Forked from ax3l/CUDA_Compilers.md
Created September 1, 2021 07:33
CUDA Compilers

In general, check the crt/host_config.h file to find out which versions are supported. Sometimes it is possible to hack the requirements there to get some newer versions working, too :)
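To see which checks the header enforces, it can help to list every `#error` guard together with the preprocessor condition above it. The following is a minimal sketch: `find_version_checks` is a hypothetical helper, and `SAMPLE` is an assumed excerpt written in the style of `crt/host_config.h`, not text copied from any particular CUDA release.

```python
# Hypothetical excerpt in the style of crt/host_config.h. The real wording
# and conditions differ between CUDA releases, so this text is an assumption.
SAMPLE = """
#if __GNUC__ > 8
#error -- unsupported GNU version! gcc versions later than 8 are not supported!
#endif
#if _MSC_VER < 1700 || _MSC_VER >= 1930
#error -- unsupported Microsoft Visual Studio version!
#endif
"""


def find_version_checks(text):
    """Return (condition, message) pairs for each '#error ... unsupported' line."""
    checks = []
    last_condition = None
    for line in text.splitlines():
        stripped = line.strip()
        # Remember the most recent #if/#elif so it can be paired with the #error.
        if stripped.startswith("#if") or stripped.startswith("#elif"):
            last_condition = stripped
        elif stripped.startswith("#error") and "unsupported" in stripped:
            checks.append((last_condition, stripped))
    return checks


for condition, message in find_version_checks(SAMPLE):
    print(condition, "->", message)
```

To run this against a real installation, read the header from the toolkit's include directory and pass its contents to `find_version_checks`; the exact location of the file inside the toolkit may vary by release.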

The Thrust version can be found in $CUDA_ROOT/include/thrust/version.h.
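The header stores the version as a single integer in `THRUST_VERSION`, encoded as major * 100000 + minor * 100 + subminor. A small sketch for decoding it follows; `decode_thrust_version` is a hypothetical helper name and `SAMPLE` is an assumed two-line excerpt, not a full copy of the header.

```python
import re


def decode_thrust_version(header_text):
    """Extract (major, minor, subminor) from the text of thrust/version.h."""
    match = re.search(
        r"^\s*#\s*define\s+THRUST_VERSION\s+(\d+)", header_text, re.MULTILINE
    )
    if match is None:
        return None
    value = int(match.group(1))
    major = value // 100000
    minor = (value // 100) % 1000
    subminor = value % 100
    return major, minor, subminor


# Assumed excerpt; 100907 corresponds to Thrust 1.9.7.
SAMPLE = (
    "#define THRUST_VERSION 100907\n"
    "#define THRUST_MAJOR_VERSION (THRUST_VERSION / 100000)\n"
)

print(decode_thrust_version(SAMPLE))  # (1, 9, 7)
```

In practice the input would be the contents of `$CUDA_ROOT/include/thrust/version.h` read from disk.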

Download Archives: https://developer.nvidia.com/cuda-toolkit-archive

Release notes for CUDA Toolkit (CTK):

Version notes Nvidia HPC SDK:

Compatibility Guarantees

Quote:

  • CUDA 10.0: First introduced in CUDA 10, the CUDA Forward Compatible Upgrade is designed to allow users to get access to new CUDA features and run applications built with new CUDA releases on systems with older installations of the NVIDIA datacenter GPU driver.
  • CUDA 11.1: First introduced in CUDA 11.1, CUDA Enhanced Compatibility provides two benefits:
    • By leveraging semantic versioning across components in the CUDA Toolkit, an application can be built for one CUDA minor release (such as 11.1) and work across all future minor releases within the major family (such as 11.x).
    • CUDA has relaxed the minimum driver version check and thus no longer requires a driver upgrade with minor releases of the CUDA Toolkit.
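As a rough illustration of the first Enhanced Compatibility bullet only, the rule "built for one minor release, works on later minor releases of the same major family" can be written as a version comparison. This is a deliberately simplified sketch: the function names are hypothetical, the relaxed driver check from the second bullet is not modelled, and NVIDIA's compatibility documentation remains the authority on what is actually guaranteed.

```python
def parse_version(text):
    """Turn 'major.minor[.patch]' into a (major, minor) tuple of ints."""
    major, minor = text.split(".")[:2]
    return int(major), int(minor)


def covered_by_enhanced_compatibility(built_with, installed):
    """Simplified reading of the quoted rule, first bullet only.

    True if the application was built with CUDA 11.1 or newer and the
    installed toolkit is the same or a later minor release of the same major.
    """
    built = parse_version(built_with)
    have = parse_version(installed)
    if built < (11, 1):
        # The quote says the guarantee was first introduced in CUDA 11.1.
        return False
    return have[0] == built[0] and have[1] >= built[1]


print(covered_by_enhanced_compatibility("11.1", "11.4"))  # True
print(covered_by_enhanced_compatibility("11.1", "12.0"))  # False
print(covered_by_enhanced_compatibility("10.2", "10.2"))  # False
```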

nvcc

Latest official compiler requirements: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html

CUDA version SM Arch g++ icpc pgc++ xlC MSVC clang++ Linux driver thrust note
1.0 1.0-1.1 ? ? ?
1.1 1.0-1.1 ? ? ?
2.0 1.0-1.1 ? ? ?
2.1 1.0-1.3 ? ? ?
2.3.1 1.0-1.3 ? ? ?
3.0 1.0-2.0 ? ? ?
3.1 1.0-2.0 ? ? ?
3.2 1.0-2.1 ? 11.1 ?
4.0 1.0-2.1 <=4.4 11.1 ?
4.1 1.0-2.1 <=4.5 11.1 ?
4.2 1.0-2.1 <=4.6 11.1 ?
5.0 1.0-3.? <=4.6 11.1 ? ? 1.5.3
5.5 1.0-3.? <=4.8 12.1 ? ? 1.7.0 C++11 on host side supported; ICC fixed to build 20110811
6.0 1.0-5.0 <=4.8 13.1 ? 331.62 1.7.1
6.5 1.1-5.X <=4.8 14.0 ? ? ? 1.7.2 experimental device side C++11 support; including this version, <thrust/sort.h> screws up __CUDA_ARCH__ (must be undefined on host); deprecation of SM 11-13 (10 removed)
7.0.17 (RC) s. below <=4.9 15.0 >=14.9 13.1.1 ? 346.29 1.8.0 first official PGI support, first xlc string found; powerpc64 w. little endian supported
7.0.27 2.0-5.X <=4.9 15.0 >=14.9 13.1.1 2010-13 346.46 1.8.1 official C++11 support on device side
7.5 <=4.9 15.0 15.4 13.1 2010-13 3.5-3.6 352.41? 1.8.2 clang (host) on linux supported, __CUDACC_VER__ macros added
7.5.18 2.0-5.X <=4.9 15.0 15.4 13.1 2010-13 352.39 1.8.2
8.0.44 2.0-6.X <=5.3 15.0(.4)-16.0 16(.3)+ 13.1(.2) 2012-15 3.8-3.9 367.48 1.8.3-patch2 sm_60 (pascal) support added
8.0.61 2.0-6.X <=5.3 15.0(.4)-17.0 16(.3)+ 13.1(.2) 2012-15 3.8-3.9 375.26 1.8.3-patch2 nvcc 8 is incompatible with std::tuple in gcc 5.4+
9.0.69 (RC) 3.0-7.0 <=5.5 (<=6) 15.0(.4)-17.0 17 13.1(.2) 2012-17 3.8-3.9 ???.?? 1.9.0-patch4 device-side C++14; __CUDACC_VER__ deprecated for __CUDACC_VER_MAJOR/MINOR/BUILD__
9.0.103 (RC) 3.0-7.0 <=5.5 (<=6) 15.0(.4)-17.0 17 13.1(.2) 2012-17 3.8-3.9 384.59 1.9.0-patch4 same as above, __CUDACC_VER__ defined as #error rendering it fully broken
9.0.176 3.0-7.0 <=5.5 (<=6) (15.0-)17.0 17.1 13.1(.5) 2012-17 (3.8-)3.9 384.81 1.9.0-patch5 same as above
9.1.85 3.0-7.2 <=5.5 (<=6) (15.0-)17.0 17.X 13.1(.6) 2012-17 (3.8-)4.0 390.46 1.9.1-patch2 math_functions.hpp moved to crt/
9.1.85.1 cuBLAS 9.1.128: Volta GEMM kernels optimized
9.1.85.2 ptxas: fix address calculations using large immediate operands
9.1.85.3 cuBLAS: fixes to GEMM optimizations for convolutional sequence to sequence (seq2seq) models.
9.0-9.1 nvcc 9.0-9.1 is incompatible with std::tuple in gcc 6+
9.2.88 3.0-7.2 <=7.3.0 (<=7) (15.0-)17.0 17-18.X 13.1(.6),16.1 2012-17 (3.8-)5.0 396.26 1.9.2 CUTLASS 1.0 added; std::tuple fixed (prior GCC 6 issues)
9.2.148 396.37 1.9.2
10.0.130 3.0-7.5 <=7 (15.0-)18.0 17-18.X 13.1, 16.1 2013-17 (3.8-)6.0 410.48 1.9.3 CUDA Forward Compatible Upgrade
10.1.105 3.0-7.5 <=8 (15.0-)19.0 17-19.X 2013-19 (3.8-)7.0 418.39 1.9.4
10.1.168 (3.8-)8.0 418.67 10.1 "Update 1"
10.1.243 418.87 10.1 "Update 2"
10.2.89 3.0-7.5 <=8 (15.0-)19.0 18-19.X 13.1, 16.1 2015-19 (3.3-)8.* 440.33.01 1.9.7 sm_30,35,37,50 deprecated; nvcc: -allow-unsupported-compiler
11.0.1 (RC) NVCC:11.0.167 3.5-8.0 (5-)6-9.* (15.0-)19.1 18-20.1 13.1, 16.1 2015-19 3.2-9.0.0 450.36.06 1.9.9 macOS dropped; libs drop pre-C++11, deprecate pre-C++14 (GCC < 5, Clang < 6, and MSVC < 2017); Arm C/C++ 19.2 support
11.0.2-1 NVCC:11.0.194 (3.3/)6-9.0.0 450.51.05 nvcc: --Wext-lambda-captures-this
11.0.3 NVCC:11.0.221 ? ? ? ? ? ? ? 450.51.06 ? 11.0 "Update 1"; nvcc: --forward-unknown-to-host-compiler, --forward-unknown-to-host-linker flags
11.1.0 NVCC:11.1.74 3.5-8.6 (5-)6-10.0 (15.0-)19.1 18-20.1 13.1, 16.1 2017-19 (3.3/)6-10.0.0 455.23.05 1.9.10-1 Ubuntu@ppc64le deprecated; CUDA Enhanced Compatibility
11.1.1 NVCC:11.1.? ? ? ?
11.2.0 NVCC:11.2.67 460.27.04 1.10.0
11.2.1 NVCC:....... 460.32.03 ? "Update 1"
11.2.2 NVCC:....... 460.32.03 ? "Update 2"
11.3.0 NVCC:.... 465.19.01 ? cu++flt added, Python Driver/RT bindings, alloca()
11.4.0 NVCC:11.4.48 6.0-... 470.42.01 ? sm30,32 and Ubuntu 16.04 dropped, C++11 stdlib for math
11.4.1 NVCC:11.4.100 6.0-11.0 ...-12.0 470.57.02 ? 11.4 "Update 1", fix g++ 10 issues with chrono headers of libstdc++; Ubuntu 16.04 dropped (x86)

Note: empty cells generally mean "same as above" for readability.
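When processing the table programmatically, the "same as above" convention amounts to a fill-down over empty cells. The sketch below uses the driver and Thrust values of the three 10.1 rows; `fill_down` is a hypothetical helper, and because the note says "generally", the result is a heuristic and not a guarantee for every cell.

```python
# Driver and Thrust columns of the 10.1 rows; empty strings are empty cells.
ROWS = [
    {"cuda": "10.1.105", "driver": "418.39", "thrust": "1.9.4"},
    {"cuda": "10.1.168", "driver": "418.67", "thrust": ""},
    {"cuda": "10.1.243", "driver": "418.87", "thrust": ""},
]


def fill_down(rows):
    """Return new rows where each empty cell inherits the value from the row above."""
    filled = []
    previous = {}
    for row in rows:
        current = {}
        for key, value in row.items():
            current[key] = value if value else previous.get(key, "")
        filled.append(current)
        previous = current
    return filled


for row in fill_down(ROWS):
    print(row["cuda"], row["driver"], row["thrust"])
```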

macOS: As of 7.0, clang seems to be the only supported compiler on OSX (but no version check found). CUDA 10.1.243 adds support for Xcode 10.2. CUDA 11.0 dropped macOS support.

Compilers such as pgC, icc, xlC are only supported on x86 Linux and little-endian systems.

Dynamic parallelism was added with sm_35 and CUDA 5.0.

Newer CUDA releases have a per-release support matrix for compilers, which also lists supported kernel and glibc versions: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements

clang++ -x cuda

clang++ can compile CUDA C++ to ptx as well. Give it a whirl!

| clang++ | supported CUDA release | supported SMs |
|---------|------------------------|---------------|
| 3.9-5.0 | 7.0-8.0 | 2.0-(5.0)6.0 |
| 6.0 | 7.0-9.0 | (2.0)3.0-7.0 |
| 7.0 | 7.0-9.2 | (2.0)3.0-7.2 |
| 8.0 | 7.0-10.0 | (2.0)3.0-7.5 |
| 9.0 | 7.0-10.1 | (2.0)3.0-7.5 |
| 10.0 | 7.0-10.1 | (2.0)3.0-7.5 |
| 11.0 | 7.0-11.0 | (2.0)3.0-8.0 |
| 12.0rc5 | 7.0-11.0 | (2.0)3.0-8.0 |
| main | 7.0-11.2 | (2.0)3.0-8.6 |

https://llvm.org/docs/CompileCudaWithLLVM.html
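For trying out the PTX path, a command line can be assembled from the flags described in the LLVM documentation linked above (`-x cuda`, `--cuda-path`, `--cuda-gpu-arch`, `--cuda-device-only`). This is only a sketch: `clang_cuda_ptx_command` is a hypothetical helper, and the file names and the `/usr/local/cuda` path are assumptions to adapt to the local setup.

```python
import shlex


def clang_cuda_ptx_command(source, sm, cuda_path, output):
    """Build a clang++ invocation that compiles only the device side to PTX."""
    return [
        "clang++",
        "-x", "cuda",                 # treat the input as CUDA C++
        f"--cuda-path={cuda_path}",   # CUDA toolkit installation to use
        f"--cuda-gpu-arch=sm_{sm}",   # target SM architecture
        "--cuda-device-only",         # skip host compilation
        "-S",                         # emit assembly, which is PTX for the device
        source,
        "-o", output,
    ]


# Hypothetical file names and install path.
cmd = clang_cuda_ptx_command("axpy.cu", 60, "/usr/local/cuda", "axpy.ptx")
print(shlex.join(cmd))
```

The printed line can be pasted into a shell; the chosen SM and CUDA release need to fall within the ranges listed for the installed clang++ version.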

Device-Side C++ Standard Support

C++ core language features:

| compiler | supported C++ standard | notes |
|----------|------------------------|-------|
| nvcc <=6.0 | c++03 | |
| nvcc 6.5 | c++03, exp. c++11 | undocumented |
| nvcc 7.0-8.0 | c++03,11 | only c++11 switch |
| nvcc 9.0-10.2 | c++03,11,14 | 10.2 adds libcu++ (atomics); open repository: https://github.com/NVIDIA/libcudacxx/releases |
| nvcc 11.0.167+ | c++03,11,14,17 | C++11 host compiler needed for math libs; ships C++11-compatible backport of the C++20 synchronization library; device LTO added; starting with CUDA Toolkit 11.0.1, nvcc and CUDA Toolkit versions are not equivalent anymore |
| clang 5+ | c++03,11,14,17 | |
| clang 6+ | c++03,11,14,17,2a | |
| clang 10+ | c++03,11,14,17,20 | |
| clang trunk | c++03,11,14,17,20 | status |
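The nvcc rows of this table can be encoded as a small lookup, for example to decide which `-std` level a build script may request for device code. A minimal sketch, with `nvcc_device_standards` as a hypothetical helper; the experimental, undocumented C++11 mode of nvcc 6.5 is intentionally not reported.

```python
def nvcc_device_standards(major, minor=0):
    """Return the device-side C++ standards listed for a given nvcc version."""
    version = (major, minor)
    if version >= (11, 0):
        return ["c++03", "c++11", "c++14", "c++17"]
    if version >= (9, 0):
        return ["c++03", "c++11", "c++14"]
    if version >= (7, 0):
        return ["c++03", "c++11"]
    # Up to and including 6.5; the experimental C++11 support of 6.5 is ignored.
    return ["c++03"]


print(nvcc_device_standards(10, 2))  # ['c++03', 'c++11', 'c++14']
print(nvcc_device_standards(11, 0))  # ['c++03', 'c++11', 'c++14', 'c++17']
```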

CUDA-enabled C++ standard library libcu++, based on LLVM's libc++ (docs):

| release | introduced components | notes |
|---------|-----------------------|-------|
| CUDA 10.2 | `<atomic>` (SM6.0+), `<type_traits>` | introduction of libcu++ |
| CUDA 11.0 | `atomic<T>::wait/notify`, `<barrier>`, `<latch>`, `<counting_semaphore>` (SM7.0+), `<chrono>`, `<ratio>`, `<functional>` w/o `function` | anticipated with GTC 2020 slides |
| CUDA 11.2 | `cuda::std::tuple`, `pair` | notes |
| CUDA next | `cuda::std::complex`, backports: chrono, type_traits | notes |
| newer | see the release notes and api docs | all open source now |

Incremental libcu++ release goals (GTC 2020):

  • Version 1 (CUDA 10.2): <atomic>(SM6.0+), <type_traits>.
  • Version 2 (CUDA next): atomic<T>::wait/notify, <barrier>, <latch>, <counting_semaphore>(SM7.0+), <chrono>, <ratio>, <functional> minus function.
  • Future priorities: atomic_ref<T>, <complex>, <tuple>, <array>, <utility>, <cmath>, string processing, ...

NVC++

NVC++ is a unified C++ compiler and GPU-accelerated STL for the CUDA platform. It also seems to support OpenACC. NVC++ does not currently support the CUDA C++ language.

| compiler | supported C++ standard | notes |
|----------|------------------------|-------|
| nvc++ 11.0 | ...,c++17 | initial release, ships C++11-compatible backport of the C++20 synchronization library |

All GPU compilers are cheese.
