fabiencastan/CUDA_Compilers.md

## CUDA_Compilers.md

      
In general, check the `host_config.h` file to find out which host compiler versions are supported.
Sometimes it is possible to hack the requirements there to get some newer versions working, too :)
The Thrust version can be found in `$CUDA_ROOT/include/thrust/version.h`.
Release notes for CUDA >= 5.5 are stored under the following URL, with `X_Y` replaced by the release number (e.g. `7_5`): http://developer.download.nvidia.com/compute/cuda/X_Y/rel/docs/CUDA_Toolkit_Release_Notes.pdf
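
The two lookups above can be scripted. The following is a minimal sketch, not an official tool: `thrust_version_string` and `release_notes_url` are hypothetical helper names, and the decoding assumes the `THRUST_VERSION` integer in `thrust/version.h` is encoded as `major * 100000 + minor * 100 + subminor`.

```cpp
#include <cassert>
#include <string>

// Decode an integer THRUST_VERSION value into "major.minor.subminor".
// Assumption: the encoding is major * 100000 + minor * 100 + subminor.
std::string thrust_version_string(int thrust_version) {
    const int major = thrust_version / 100000;
    const int minor = (thrust_version / 100) % 1000;
    const int subminor = thrust_version % 100;
    return std::to_string(major) + "." + std::to_string(minor) + "." +
           std::to_string(subminor);
}

// Fill in the X_Y placeholder of the release notes URL pattern (CUDA >= 5.5).
std::string release_notes_url(int cuda_major, int cuda_minor) {
    return "http://developer.download.nvidia.com/compute/cuda/" +
           std::to_string(cuda_major) + "_" + std::to_string(cuda_minor) +
           "/rel/docs/CUDA_Toolkit_Release_Notes.pdf";
}

// Small usage example; the values are illustrative.
void example_usage() {
    assert(thrust_version_string(100803) == "1.8.3");
    assert(release_notes_url(7, 5) ==
           "http://developer.download.nvidia.com/compute/cuda/7_5/rel/docs/"
           "CUDA_Toolkit_Release_Notes.pdf");
}
```

When the Thrust headers are available, passing the `THRUST_VERSION` macro itself to `thrust_version_string` should give the installed version.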
## nvcc

Latest official compiler requirements: http://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html


| CUDA version | SM Arch | g++ | icpc | pgc++ | xlC | MSC | clang++ | driver | thrust | note |
|---|---|---|---|---|---|---|---|---|---|---|
| 3.2 | 1.0-? | ? | 11.1 | ? | | | | | | |
| 4.0 | 1.0-? | <=4.4 | 11.1 | ? | | | | | | |
| 4.1 | 1.0-? | <=4.5 | 11.1 | ? | | | | | | |
| 4.2 | 1.0-? | <=4.6 | 11.1 | ? | | | | | | |
| 5.0 | 1.0-3.X | <=4.6 | 11.1 | ? | | | | ? | 1.5.3 | |
| 5.5 | 1.0-? | <=4.8 | 12.1 | ? | | | | ? | 1.7.0 | C++11 on host side supported; ICC fixed to build `20110811` |
| 6.0 | 1.0-5.X | <=4.8 | 13.1 | ? | | | | 331.62 | 1.7.1 | |
| 6.5 | 1.1-5.X | <=4.8 | 14.0 | ? | | ? | | ? | 1.7.2 | experimental device-side C++11 support; including this version, `<thrust/sort.h>` screws up `__CUDA_ARCH__` (must be undefined on host) |
| 7.0.17 (RC) | ? | <=4.9 | 15.0 | >=14.9 | 13.1.1 | ? | | 346.29 | 1.8.0 | first official PGI support, first xlc string found; powerpc64 with little endian supported |
| 7.0.27 | 2.0-5.X | <=4.9 | 15.0 | >=14.9 | 13.1.1 | 2010-13 | | 346.46 | 1.8.1 | official C++11 support on device side |
| 7.5 | | <=4.9 | 15.0 | 15.4 | 13.1 | 2010-13 | 3.5-3.6 | 352.41? | 1.8.2 | clang (host) on Linux supported, `__CUDACC_VER__` macros added |
| 7.5.18 | 2.0-5.X | <=4.9 | 15.0 | 15.4 | 13.1 | 2010-13 | | 352.39 | 1.8.2 | |
| 8.0.44 | 2.0-6.X | <=5.3 | 15.0(.4)-16.0 | 16(.3)+ | 13.1(.2) | 2012-15 | 3.8-3.9 | 367.48 | 1.8.3-patch2 | sm_60 (Pascal) support added |
| 8.0.61 | 2.0-6.X | <=5.3 | 15.0(.4)-17.0 | 16(.3)+ | 13.1(.2) | 2012-15 | 3.8-3.9 | 375.26 | 1.8.3-patch2 | nvcc 8 is incompatible with `std::tuple` in gcc 5.4+ |
| 9.0.69 (RC) | 3.0-7.X | (<=5.5) <=6 | 15.0(.4)-17.0 | 17 | 13.1(.2) | 2012-17 | 3.8-3.9 | ???.?? | 1.9.0-patch4 | device-side C++14; sm_70 (Volta) support added, sm_2X (Fermi) dropped; `__CUDACC_VER__` deprecated for `__CUDACC_VER_MAJOR/MINOR/BUILD__` |
| 9.0.103 (RC) | 3.0-7.X | (<=5.5) <=6 | 15.0(.4)-17.0 | 17 | 13.1(.2) | 2012-17 | 3.8-3.9 | 384.59 | 1.9.0-patch4 | same as above, `__CUDACC_VER__` defined as `#error` rendering it fully broken |
| 9.0.176 | 3.0-7.X | (<=5.5) <=6 | (15.0-)17.0 | 17.1 | 13.1(.5) | 2012-17 | (3.8-)3.9 | 384.81 | 1.9.0-patch5 | same as above |
| 9.1.85 | 3.0-7.X | (<=5.5) <=6 | (15.0-)17.0 | 17.X | 13.1(.6) | 2012-17 | (3.8-)4.0 | 387.26 | 1.9.1-patch2 | `math_functions.hpp` moved to `crt/` |
| 9.1.85.1 | | | | | | | | | | cuBLAS 9.1.128: Volta GEMM kernels optimized |
| 9.1.85.2 | | | | | | | | | | ptxas: fix address calculations using large immediate operands |
| 9.1.85.3 | | | | | | | | | | cuBLAS: fixes to GEMM optimizations for convolutional sequence to sequence (seq2seq) models |
| 9.0-9.1 | | | | | | | | | | nvcc 9.0-9.1 is incompatible with `std::tuple` in gcc 6+ |
| 9.2.88 | 3.0-7.X | <=7 | (15.0-)17.0 | 17-18.X | 13.1(.6),16.1 | 2012-17 | (3.8-)5.0 | 396.26 | 1.9.2 | CUTLASS 1.0 added; `std::tuple` fixed (prior GCC 6 issues) |


As of 7.0, clang seems to be the only supported compiler on OSX (but no version check found).
Compilers such as pgc++, icpc, and xlC are only supported on x86 Linux and little-endian platforms.
Dynamic parallelism was added with sm_35 and CUDA 5.0.
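
The upper bounds in the g++ column lend themselves to a small lookup. The sketch below is only an illustration: `GccLimit`, `kGccLimits`, and `gcc_supported` are hypothetical names, the rows are a subset copied from the table, and parenthesised values such as `(<=5.5)` are ignored. It does not reproduce the actual checks in `host_config.h`, and it says nothing about a lower bound because the table lists none.

```cpp
#include <cassert>

// One row per CUDA release: the newest g++ listed as supported.
struct GccLimit {
    int cuda_major, cuda_minor;  // CUDA release
    int gcc_major, gcc_minor;    // newest g++; gcc_minor < 0 means "any minor"
};

// Subset of the g++ column above (assumption: "<=6" covers every 6.x release).
constexpr GccLimit kGccLimits[] = {
    {7, 5, 4, 9},
    {8, 0, 5, 3},
    {9, 0, 6, -1},
    {9, 1, 6, -1},
    {9, 2, 7, -1},
};

// True if the table lists g++ gcc_major.gcc_minor as usable with the given
// CUDA release. CUDA releases missing from kGccLimits yield false.
bool gcc_supported(int cuda_major, int cuda_minor, int gcc_major, int gcc_minor) {
    for (const GccLimit& row : kGccLimits) {
        if (row.cuda_major != cuda_major || row.cuda_minor != cuda_minor) continue;
        if (gcc_major != row.gcc_major) return gcc_major < row.gcc_major;
        return row.gcc_minor < 0 || gcc_minor <= row.gcc_minor;
    }
    return false;
}

// Small usage example.
void example_usage() {
    assert(gcc_supported(8, 0, 5, 3));   // CUDA 8.0 lists g++ <= 5.3
    assert(!gcc_supported(8, 0, 5, 4));  // 5.4 is above the listed bound
    assert(gcc_supported(9, 2, 7, 3));   // CUDA 9.2 lists g++ <= 7
}
```

Note that a "supported" answer here does not cover the `std::tuple` incompatibilities mentioned in the note column; those would need a separate check.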
## clang++ -x cuda

clang++ can compile CUDA C++ to PTX as well.
Give it a whirl!


| clang++ | supported CUDA release | supported SMs |
|---|---|---|
| 3.9-5.0 | 7.0-8.0 | 2.0-(5.0)6.0 |
| 6.0 | 7.0-9.0 | (2.0)3.0-7.0 |
| trunk | 7.0-9.2 | (2.0)3.0-7.2 |


https://llvm.org/docs/CompileCudaWithLLVM.html
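
Since both nvcc and clang++ can compile the same source, code sometimes needs to know which one is in use. The sketch below relies on the predefined macros described on the LLVM page linked above (`__clang__` together with `__CUDA__` for clang in CUDA mode, `__NVCC__` for nvcc) and on the `__CUDACC_VER_MAJOR__`/`__CUDACC_VER_MINOR__` macros from the nvcc table. The function name `cuda_compiler_name` and the returned strings are made up for illustration.

```cpp
#include <cassert>
#include <string>

// Report which compiler is translating this file.
// The checks are ordered so that clang in CUDA mode is recognised first.
std::string cuda_compiler_name() {
#if defined(__clang__) && defined(__CUDA__)
    return "clang++ (CUDA mode)";
#elif defined(__NVCC__)
#  if defined(__CUDACC_VER_MAJOR__) && defined(__CUDACC_VER_MINOR__)
    // Prefer the MAJOR/MINOR macros: __CUDACC_VER__ is an #error from 9.0 on.
    return "nvcc " + std::to_string(__CUDACC_VER_MAJOR__) + "." +
           std::to_string(__CUDACC_VER_MINOR__);
#  else
    return "nvcc (no version macros available)";
#  endif
#else
    return "host-only C++ compiler";
#endif
}

// Small usage example: the result is never empty, whatever the compiler.
void example_usage() {
    assert(!cuda_compiler_name().empty());
}
```

Built with a plain host compiler the function takes the last branch; built with `clang++ -x cuda` (typically together with `--cuda-gpu-arch=sm_XX`) or with nvcc it should take one of the first two.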