samcom12/CUDA_Compilers.md

## CUDA_Compilers.md

      
In general, check the `crt/host_config.h` file to find out which host compiler versions are supported.
Sometimes it is possible to hack the requirements there to get some newer versions working, too :)
The Thrust version can be found in `$CUDA_ROOT/include/thrust/version.h`.
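
As a rough illustration of reading that header: `thrust/version.h` exposes a packed integer macro `THRUST_VERSION`. The sketch below assumes the usual packing (`major * 100000 + minor * 100 + subminor`); the `ThrustVersion` struct and the helper names are hypothetical and not part of Thrust.

```cpp
// Sketch: decode a packed THRUST_VERSION value such as 100907.
// Assumption: THRUST_VERSION = major * 100000 + minor * 100 + subminor.
#include <cstdio>

// Hypothetical helper type, not part of Thrust.
struct ThrustVersion {
    int major_version;
    int minor_version;
    int subminor_version;
};

// Split the packed integer into its three components.
constexpr ThrustVersion decode_thrust_version(int packed) {
    return ThrustVersion{packed / 100000, (packed / 100) % 1000, packed % 100};
}

// 100907 decodes to 1.9.7, the Thrust version the table lists for CUDA 10.2.89.
static_assert(decode_thrust_version(100907).major_version == 1, "major");
static_assert(decode_thrust_version(100907).minor_version == 9, "minor");
static_assert(decode_thrust_version(100907).subminor_version == 7, "subminor");

// Print a packed version in dotted form.
inline void print_thrust_version(int packed) {
    const ThrustVersion v = decode_thrust_version(packed);
    std::printf("Thrust %d.%d.%d\n", v.major_version, v.minor_version, v.subminor_version);
}
```

In a real translation unit one would include `<thrust/version.h>` and call `print_thrust_version(THRUST_VERSION)`; the header is left out here so the sketch compiles without a CUDA installation.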
Download Archives: https://developer.nvidia.com/cuda-toolkit-archive
Release notes for CUDA Toolkit (CTK):

11.4:     https://docs.nvidia.com/cuda/cuda-toolkit-release-notes/index.html
11.3:     https://docs.nvidia.com/cuda/archive/11.3.0/index.html
11.2:     https://docs.nvidia.com/cuda/archive/11.2.2/index.html
11.1:     https://docs.nvidia.com/cuda/archive/11.1.1/index.html
11.0:     https://docs.nvidia.com/cuda/archive/11.0/cuda-toolkit-release-notes/index.html
10.2:     https://developer.download.nvidia.com/compute/cuda/10.2/Prod/docs/sidebar/CUDA_Toolkit_Release_Notes.pdf
10.1:     https://developer.download.nvidia.com/compute/cuda/10.1/Prod/docs/sidebar/CUDA_Toolkit_Release_Notes.pdf
10.0:     https://developer.download.nvidia.com/compute/cuda/10.0/Prod/docs/sidebar/CUDA_Toolkit_Release_Notes.pdf
9.2:      https://developer.download.nvidia.com/compute/cuda/9.2/Prod/docs/sidebar/CUDA_Toolkit_Release_Notes.pdf
9.1:      https://developer.download.nvidia.com/compute/cuda/9.1/Prod/docs/sidebar/CUDA_Toolkit_Release_Notes.pdf
9.0:      https://developer.download.nvidia.com/compute/cuda/9.0/Prod/docs/sidebar/CUDA_Toolkit_Release_Notes.pdf
8.0:      https://developer.nvidia.com/compute/cuda/8.0/Prod2/docs/sidebar/CUDA_Toolkit_Release_Notes-pdf
7.5:      http://developer.download.nvidia.com/compute/cuda/7.5/Prod/docs/sidebar/CUDA_Toolkit_Release_Notes.pdf
7.0:      http://developer.download.nvidia.com/compute/cuda/7_0/Prod/doc/CUDA_Toolkit_Release_Notes.pdf
6.5:      http://developer.download.nvidia.com/compute/cuda/6_5/rel/docs/CUDA_Toolkit_Release_Notes.pdf
6.0:      http://developer.download.nvidia.com/compute/cuda/6_0/rel/docs/CUDA_Toolkit_Release_Notes.pdf
5.5:      http://developer.download.nvidia.com/compute/cuda/5_5/rel/docs/CUDA_Toolkit_Release_Notes.pdf

Release notes for the NVIDIA HPC SDK:

https://docs.nvidia.com/hpc-sdk/hpc-sdk-release-notes/index.html

## Compatibility Guarantees

Quote:

CUDA 10.0: First introduced in CUDA 10, the CUDA Forward Compatible Upgrade is designed to allow users to get access to new CUDA features and run applications built with new CUDA releases on systems with older installations of the NVIDIA datacenter GPU driver.
CUDA 11.1: First introduced in CUDA 11.1, CUDA Enhanced Compatibility provides two benefits:

By leveraging semantic versioning across components in the CUDA Toolkit, an application can be built for one CUDA minor release (such as 11.1) and work across all future minor releases within the major family (such as 11.x).
CUDA has relaxed the minimum driver version check and thus no longer requires a driver upgrade with minor releases of the CUDA Toolkit.
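
To make the quoted rules a bit more concrete, here is a minimal sketch. It assumes the packed version encoding used by `CUDART_VERSION`, `cudaRuntimeGetVersion()` and `cudaDriverGetVersion()` (`1000 * major + 10 * minor`, e.g. `11040` for 11.4). The `expected_to_run` predicate is a hypothetical, deliberately simplified reading of the quote: it ignores PTX JIT limitations, specific minimum driver builds, and the Forward Compatible Upgrade packages.

```cpp
// Sketch: decode packed CUDA versions and apply a simplified compatibility rule.
// Assumption: packed = 1000 * major + 10 * minor (as in CUDART_VERSION).

// Hypothetical helper type, not part of the CUDA API.
struct CudaVersion {
    int major_version;
    int minor_version;
};

constexpr CudaVersion decode_cuda_version(int packed) {
    return CudaVersion{packed / 1000, (packed % 1000) / 10};
}

// Simplified reading of the quoted guarantees (an assumption, not NVIDIA's check):
//  - classic rule: the driver supports a CUDA version >= the one built against;
//  - enhanced compatibility: same major family, for major >= 11.
constexpr bool expected_to_run(int built_with, int driver_supports) {
    return driver_supports >= built_with ||
           (decode_cuda_version(built_with).major_version >= 11 &&
            decode_cuda_version(built_with).major_version ==
                decode_cuda_version(driver_supports).major_version);
}

static_assert(decode_cuda_version(11040).major_version == 11, "11.4 major");
static_assert(decode_cuda_version(11040).minor_version == 4, "11.4 minor");
static_assert(expected_to_run(11040, 11010), "11.4 build on an 11.1 driver");
static_assert(!expected_to_run(10020, 10010), "10.2 build on a 10.1 driver");
```

With a CUDA installation available, the two arguments would typically come from `cudaRuntimeGetVersion()` and `cudaDriverGetVersion()`.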


## nvcc

Latest official compiler requirements: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html


| CUDA version | SM Arch | g++ | icpc | pgc++ | xlC | MSVC | clang++ | Linux driver | thrust | note |
|---|---|---|---|---|---|---|---|---|---|---|
| 1.0 | 1.0-1.1 | ? | ? | ? |  |  |  |  |  |  |
| 1.1 | 1.0-1.1 | ? | ? | ? |  |  |  |  |  |  |
| 2.0 | 1.0-1.1 | ? | ? | ? |  |  |  |  |  |  |
| 2.1 | 1.0-1.3 | ? | ? | ? |  |  |  |  |  |  |
| 2.3.1 | 1.0-1.3 | ? | ? | ? |  |  |  |  |  |  |
| 3.0 | 1.0-2.0 | ? | ? | ? |  |  |  |  |  |  |
| 3.1 | 1.0-2.0 | ? | ? | ? |  |  |  |  |  |  |
| 3.2 | 1.0-2.1 | ? | 11.1 | ? |  |  |  |  |  |  |
| 4.0 | 1.0-2.1 | <=4.4 | 11.1 | ? |  |  |  |  |  |  |
| 4.1 | 1.0-2.1 | <=4.5 | 11.1 | ? |  |  |  |  |  |  |
| 4.2 | 1.0-2.1 | <=4.6 | 11.1 | ? |  |  |  |  |  |  |
| 5.0 | 1.0-3.? | <=4.6 | 11.1 | ? |  |  |  | ? | 1.5.3 |  |
| 5.5 | 1.0-3.? | <=4.8 | 12.1 | ? |  |  |  | ? | 1.7.0 | C++11 on host side supported; ICC fixed to build `20110811` |
| 6.0 | 1.0-5.0 | <=4.8 | 13.1 | ? |  |  |  | 331.62 | 1.7.1 |  |
| 6.5 | 1.1-5.X | <=4.8 | 14.0 | ? |  | ? |  | ? | 1.7.2 | experimental device-side C++11 support; including this version, `<thrust/sort.h>` screws up `__CUDA_ARCH__` (must be undefined on host); deprecation of SM 11-13 (10 removed) |
| 7.0.17 (RC) | see below | <=4.9 | 15.0 | >=14.9 | 13.1.1 | ? |  | 346.29 | 1.8.0 | first official PGI support, first xlc string found; powerpc64 with little endian supported |
| 7.0.27 | 2.0-5.X | <=4.9 | 15.0 | >=14.9 | 13.1.1 | 2010-13 |  | 346.46 | 1.8.1 | official C++11 support on device side |
| 7.5 |  | <=4.9 | 15.0 | 15.4 | 13.1 | 2010-13 | 3.5-3.6 | 352.41? | 1.8.2 | clang (host) on Linux supported, `__CUDACC_VER__` macros added |
| 7.5.18 | 2.0-5.X | <=4.9 | 15.0 | 15.4 | 13.1 | 2010-13 |  | 352.39 | 1.8.2 |  |
| 8.0.44 | 2.0-6.X | <=5.3 | 15.0(.4)-16.0 | 16(.3)+ | 13.1(.2) | 2012-15 | 3.8-3.9 | 367.48 | 1.8.3-patch2 | sm_60 (Pascal) support added |
| 8.0.61 | 2.0-6.X | <=5.3 | 15.0(.4)-17.0 | 16(.3)+ | 13.1(.2) | 2012-15 | 3.8-3.9 | 375.26 | 1.8.3-patch2 | nvcc 8 is incompatible with `std::tuple` in gcc 5.4+ |
| 9.0.69 (RC) | 3.0-7.0 | <=5.5 (<=6) | 15.0(.4)-17.0 | 17 | 13.1(.2) | 2012-17 | 3.8-3.9 | ???.?? | 1.9.0-patch4 | device-side C++14; `__CUDACC_VER__` deprecated for `__CUDACC_VER_MAJOR/MINOR/BUILD__` |
| 9.0.103 (RC) | 3.0-7.0 | <=5.5 (<=6) | 15.0(.4)-17.0 | 17 | 13.1(.2) | 2012-17 | 3.8-3.9 | 384.59 | 1.9.0-patch4 | same as above, `__CUDACC_VER__` defined as `#error` rendering it fully broken |
| 9.0.176 | 3.0-7.0 | <=5.5 (<=6) | (15.0-)17.0 | 17.1 | 13.1(.5) | 2012-17 | (3.8-)3.9 | 384.81 | 1.9.0-patch5 | same as above |
| 9.1.85 | 3.0-7.2 | <=5.5 (<=6) | (15.0-)17.0 | 17.X | 13.1(.6) | 2012-17 | (3.8-)4.0 | 390.46 | 1.9.1-patch2 | `math_functions.hpp` moved to `crt/` |
| 9.1.85.1 |  |  |  |  |  |  |  |  |  | cuBLAS 9.1.128: Volta GEMM kernels optimized |
| 9.1.85.2 |  |  |  |  |  |  |  |  |  | ptxas: fix address calculations using large immediate operands |
| 9.1.85.3 |  |  |  |  |  |  |  |  |  | cuBLAS: fixes to GEMM optimizations for convolutional sequence to sequence (seq2seq) models |
| 9.0-9.1 |  |  |  |  |  |  |  |  |  | nvcc 9.0-9.1 is incompatible with `std::tuple` in gcc 6+ |
| 9.2.88 | 3.0-7.2 | <=7.3.0 (<=7) | (15.0-)17.0 | 17-18.X | 13.1(.6),16.1 | 2012-17 | (3.8-)5.0 | 396.26 | 1.9.2 | CUTLASS 1.0 added; `std::tuple` fixed (prior GCC 6 issues) |
| 9.2.148 |  |  |  |  |  |  |  | 396.37 | 1.9.2 |  |
| 10.0.130 | 3.0-7.5 | <=7 | (15.0-)18.0 | 17-18.X | 13.1, 16.1 | 2013-17 | (3.8-)6.0 | 410.48 | 1.9.3 | CUDA Forward Compatible Upgrade |
| 10.1.105 | 3.0-7.5 | <=8 | (15.0-)19.0 | 17-19.X |  | 2013-19 | (3.8-)7.0 | 418.39 | 1.9.4 |  |
| 10.1.168 |  |  |  |  |  |  | (3.8-)8.0 | 418.67 |  | 10.1 "Update 1" |
| 10.1.243 |  |  |  |  |  |  |  | 418.87 |  | 10.1 "Update 2" |
| 10.2.89 | 3.0-7.5 | <=8 | (15.0-)19.0 | 18-19.X | 13.1, 16.1 | 2015-19 | (3.3-)8.* | 440.33.01 | 1.9.7 | sm_30,35,37,50 deprecated; `nvcc`: `-allow-unsupported-compiler` |
| 11.0.1 (RC) NVCC:11.0.167 | 3.5-8.0 | (5-)6-9.* | (15.0-)19.1 | 18-20.1 | 13.1, 16.1 | 2015-19 | 3.2-9.0.0 | 450.36.06 | 1.9.9 | macOS dropped; libs drop pre-C++11, deprecate pre-C++14 (GCC < 5, Clang < 6, and MSVC < 2017); Arm C/C++ 19.2 support |
| 11.0.2-1 NVCC:11.0.194 |  |  |  |  |  |  | (3.3/)6-9.0.0 | 450.51.05 |  | `nvcc`: `--Wext-lambda-captures-this` |
| 11.0.3 NVCC:11.0.221 | ? | ? | ? | ? | ? | ? | ? | 450.51.06 | ? | 11.0 "Update 1"; `nvcc`: `--forward-unknown-to-host-compiler`, `--forward-unknown-to-host-linker` flags |
| 11.1.0 NVCC:11.1.74 | 3.5-8.6 | (5-)6-10.0 | (15.0-)19.1 | 18-20.1 | 13.1, 16.1 | 2017-19 | (3.3/)6-10.0.0 | 455.23.05 | 1.9.10-1 | Ubuntu@ppc64le deprecated; CUDA Enhanced Compatibility |
| 11.1.1 NVCC:11.1.? |  |  |  |  |  |  |  | ? | ? | ? |
| 11.2.0 NVCC:11.2.67 |  |  |  |  |  |  |  | 460.27.04 | 1.10.0 |  |
| 11.2.1 NVCC:....... |  |  |  |  |  |  |  | 460.32.03 | ? | "Update 1" |
| 11.2.2 NVCC:....... |  |  |  |  |  |  |  | 460.32.03 | ? | "Update 2" |
| 11.3.0 NVCC:.... |  |  |  |  |  |  |  | 465.19.01 | ? | `cu++filt` added, Python Driver/RT bindings, `alloca()` |
| 11.4.0 NVCC:11.4.48 |  | 6.0-... |  |  |  |  |  | 470.42.01 | ? | sm_30,32 and Ubuntu 16.04 dropped, C++11 stdlib for math |
| 11.4.1 NVCC:11.4.100 |  | 6.0-11.0 |  |  |  |  | ...-12.0 | 470.57.02 | ? | 11.4 "Update 1", fix g++ 10 issues with chrono headers of libstdc++; Ubuntu 16.04 dropped (x86) |


Note: empty cells generally mean "same as above" for readability.
macOS: As of 7.0, clang seems to be the only supported compiler on macOS (but no version check found).
CUDA 10.1.243 adds support for Xcode 10.2. CUDA 11.0 dropped macOS support.
Compilers such as pgc++, icpc, and xlC are only supported on x86 Linux and little-endian powerpc64.
Dynamic parallelism was added with sm_35 and CUDA 5.0.
Newer CUDA releases have a per-release support matrix for compilers, which also lists supported kernel and glibc versions: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#system-requirements
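
Since the table notes that `__CUDACC_VER__` was deprecated in favor of `__CUDACC_VER_MAJOR__`, `__CUDACC_VER_MINOR__` and `__CUDACC_VER_BUILD__`, a version guard could look roughly like the sketch below. The packing scheme and the helper names (`pack_nvcc_version`, `nvcc_version_code`, `nvcc_at_least`) are made up for illustration; only the three macros come from nvcc.

```cpp
// Sketch: build a comparable nvcc version code from the split macros.
// The packing (major * 1000000 + minor * 10000 + build) is an arbitrary
// choice for this example, not an NVIDIA convention.

constexpr long pack_nvcc_version(int major, int minor, int build) {
    return major * 1000000L + minor * 10000L + build;
}

#if defined(__CUDACC_VER_MAJOR__) && defined(__CUDACC_VER_MINOR__) && \
    defined(__CUDACC_VER_BUILD__)
// The macros are defined: report the compiler's own version.
constexpr long nvcc_version_code() {
    return pack_nvcc_version(__CUDACC_VER_MAJOR__, __CUDACC_VER_MINOR__,
                             __CUDACC_VER_BUILD__);
}
#else
// Macros not defined (e.g. a plain host compiler): report 0.
constexpr long nvcc_version_code() { return 0; }
#endif

// True if this translation unit is compiled by nvcc >= major.minor.build.
constexpr bool nvcc_at_least(int major, int minor, int build) {
    return nvcc_version_code() >= pack_nvcc_version(major, minor, build);
}

// Example from the table: 9.0.176 packs to 9000176.
static_assert(pack_nvcc_version(9, 0, 176) == 9000176L, "packing example");
```

A workaround for one of the `std::tuple` incompatibilities listed above could then be wrapped in a condition such as `!nvcc_at_least(9, 2, 88)`, combined with a check that nvcc is in use at all.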

## clang++ -x cuda

clang++ can compile CUDA C++ to PTX as well.
Give it a whirl!


| clang++ | supported CUDA release | supported SMs |
|---|---|---|
| 3.9-5.0 | 7.0-8.0 | 2.0-(5.0)6.0 |
| 6.0 | 7.0-9.0 | (2.0)3.0-7.0 |
| 7.0 | 7.0-9.2 | (2.0)3.0-7.2 |
| 8.0 | 7.0-10.0 | (2.0)3.0-7.5 |
| 9.0 | 7.0-10.1 | (2.0)3.0-7.5 |
| 10.0 | 7.0-10.1 | (2.0)3.0-7.5 |
| 11.0 | 7.0-11.0 | (2.0)3.0-8.0 |
| 12.0rc5 | 7.0-11.0 | (2.0)3.0-8.0 |
| main | 7.0-11.2 | (2.0)3.0-8.6 |

https://llvm.org/docs/CompileCudaWithLLVM.html

## Device-Side C++ Standard Support

C++ core language features:


|  | supported C++ standard | notes |
|---|---|---|
| nvcc <=6.0 | c++03 |  |
| nvcc 6.5 | c++03, exp. c++11 | undocumented |
| nvcc 7.0-8.0 | c++03,11 | only c++11 switch |
| nvcc 9.0-10.2 | c++03,11,14 | 10.2 adds `libcu++` (atomics); open repository: https://github.com/NVIDIA/libcudacxx/releases |
| nvcc 11.0.167+ | c++03,11,14,17 | C++11 host compiler needed for math libs; ships C++11-compatible backport of the C++20 synchronization library; device LTO added; starting with CUDA Toolkit 11.0.1, `nvcc` and CUDA Toolkit versions are not equivalent anymore |
| clang 5+ | c++03,11,14,17 |  |
| clang 6+ | c++03,11,14,17,2a |  |
| clang 10+ | c++03,11,14,17,20 |  |
| clang trunk | c++03,11,14,17,20 | status |
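
To check which of these standards a given build actually uses, one option is to inspect the standard `__cplusplus` macro. The following is a small sketch; the function name `cxx_standard_from_macro` is hypothetical, and draft values (such as those reported for `c++2a`) are simply rounded down to the last published standard.

```cpp
// Sketch: map a __cplusplus value to a short standard number (3, 11, 14, 17, 20).
// Values between two published standards map to the older one.
constexpr int cxx_standard_from_macro(long value) {
    return value >= 202002L ? 20
         : value >= 201703L ? 17
         : value >= 201402L ? 14
         : value >= 201103L ? 11
         : 3;
}

// Standard in effect for this translation unit.
constexpr int kActiveCxxStandard = cxx_standard_from_macro(__cplusplus);

static_assert(cxx_standard_from_macro(199711L) == 3, "C++98/03");
static_assert(cxx_standard_from_macro(201402L) == 14, "C++14");
```

Device code that needs C++14 could then use `static_assert(kActiveCxxStandard >= 14, "...")`. Note that MSVC reports `199711L` for `__cplusplus` unless `/Zc:__cplusplus` is passed, so this check may be misleading with that host compiler.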


CUDA-enabled C++ standard library libcu++, based on LLVM's libc++ (docs):


|  | introduced components | notes |
|---|---|---|
| CUDA 10.2 | `<atomic>` (SM6.0+), `<type_traits>` | introduction of `libcu++` |
| CUDA 11.0 | `atomic<T>::wait/notify`, `<barrier>`, `<latch>`, `<counting_semaphore>` (SM7.0+), `<chrono>`, `<ratio>`, `<functional>` w/o `function` | anticipated with GTC 2020 slides |
| CUDA 11.2 | `cuda::std::tuple`, `pair` | notes |
| CUDA next | `cuda::std::complex`, backports: `chrono`, `type_traits` | notes |
| newer | see the release notes and api docs | all open source now |


Incremental libcu++ release goals (GTC 2020):

Version 1 (CUDA 10.2): `<atomic>` (SM6.0+), `<type_traits>`.
Version 2 (CUDA next): `atomic<T>::wait/notify`, `<barrier>`, `<latch>`, `<counting_semaphore>` (SM7.0+), `<chrono>`, `<ratio>`, `<functional>` minus `function`.
Future priorities: `atomic_ref<T>`, `<complex>`, `<tuple>`, `<array>`, `<utility>`, `<cmath>`, string processing, ...

## NVC++

NVC++ is a unified C++ compiler and GPU-accelerated STL for the CUDA platform.
It also seems to support OpenACC.
NVC++ currently does not support the CUDA C++ language.


|  | supported C++ standard | notes |
|---|---|---|
| nvc++ 11.0 | ...,c++17 | initial release, ships C++11-compatible backport of the C++20 synchronization library |


All GPU compilers are cheese.