@androiddrew
Last active April 13, 2023 15:13
Bits and Bytes on Jetson Orin
mkdir -p build
mkdir -p dependencies
ENVIRONMENT
============================
CUDA_VERSION: 114
============================
NVCC path: /usr/local/cuda-11.4/bin/nvcc
GPP path: /usr/bin/g++ VERSION: g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
CUDA_HOME: /usr/local/cuda-11.4
CONDA_PREFIX:
PATH: /home/toor/.cargo/bin:/usr/local/cuda-11.4/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
LD_LIBRARY_PATH: /usr/local/cuda-11.4/lib64:
============================
/usr/local/cuda-11.4/bin/nvcc -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -Xcompiler '-fPIC' --use_fast_math -Xptxas=-v -dc /home/toor/workspace/bitsandbytes_jetsonX2/csrc/ops.cu /home/toor/workspace/bitsandbytes_jetsonX2/csrc/kernels.cu -I /usr/local/cuda-11.4/include -I /home/toor/workspace/bitsandbytes_jetsonX2/csrc -I /include -I /home/toor/workspace/bitsandbytes_jetsonX2/include -L /usr/local/cuda-11.4/lib64 -lcudart -lcublas -lcublasLt -lcurand -lcusparse -L /lib --output-directory /home/toor/workspace/bitsandbytes_jetsonX2/build
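The `-gencode` flags in the nvcc command above request SASS for sm_75, sm_80 and sm_86 only. Jetson Orin modules report compute capability 8.7 (sm_87), which is not in that list. The sketch below is not part of the original build. It pulls the `code=sm_NN` targets out of a captured nvcc command line so the list can be checked against the device. The helper names `list_gencode_targets` and `has_target` are hypothetical. The sketch handles only the simple `-gencode arch=...,code=sm_NN` form and ignores PTX (`code=compute_NN`) entries.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: read an nvcc command line on stdin.

# Print each SASS target requested via "-gencode arch=...,code=sm_NN", one per line.
# PTX targets (code=compute_NN) and bracketed code=[...] lists are not handled.
list_gencode_targets() {
  grep -o 'code=sm_[0-9]*' | sed 's/^code=//'
}

# Succeed (exit 0) if the given architecture is among the SASS targets.
# Defaults to sm_87, the compute capability reported by Jetson Orin GPUs.
has_target() {
  local arch="${1:-sm_87}"
  list_gencode_targets | grep -qx "$arch"
}
```

For example, with the copied command stored in `NVCC_CMD`, `echo "$NVCC_CMD" | has_target sm_87 || echo "no sm_87 SASS in this build"` reports the missing target. `echo "$NVCC_CMD" | list_gencode_targets` prints `sm_75`, `sm_80` and `sm_86` for the command shown here.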
ptxas info : 11 bytes gmem
ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_75'
ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 4 registers, 352 bytes cmem[0]
ptxas info : 11 bytes gmem
ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_80'
ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 4 registers, 352 bytes cmem[0]
ptxas info : 11 bytes gmem
ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_86'
ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 4 registers, 352 bytes cmem[0]
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas info : 11 bytes gmem
ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_75'
ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 4 registers, 352 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_75'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_75'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_75'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_75'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_75'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_75'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseI6__halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi' for 'sm_75'
ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseI6__halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi
40 bytes stack frame, 56 bytes spill stores, 84 bytes spill loads
ptxas info : Used 80 registers, 456 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi' for 'sm_75'
ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi
48 bytes stack frame, 56 bytes spill stores, 84 bytes spill loads
ptxas info : Used 80 registers, 456 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi64ELi64ELi1EEvPfPhS0_PT_i' for 'sm_75'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi64ELi64ELi1EEvPfPhS0_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi64ELi64ELi1EEvPfPhS1_PT_i' for 'sm_75'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi64ELi64ELi1EEvPfPhS1_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi128ELi64ELi2EEvPfPhS0_PT_i' for 'sm_75'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi128ELi64ELi2EEvPfPhS0_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi128ELi64ELi2EEvPfPhS1_PT_i' for 'sm_75'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi128ELi64ELi2EEvPfPhS1_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi256ELi128ELi2EEvPfPhS0_PT_i' for 'sm_75'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi256ELi128ELi2EEvPfPhS0_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi256ELi128ELi2EEvPfPhS1_PT_i' for 'sm_75'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi256ELi128ELi2EEvPfPhS1_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi256ELi2EEvPfPhS0_PT_i' for 'sm_75'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi256ELi2EEvPfPhS0_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi512ELi256ELi2EEvPfPhS1_PT_i' for 'sm_75'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi512ELi256ELi2EEvPfPhS1_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi1024ELi256ELi4EEvPfPhS0_PT_i' for 'sm_75'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi1024ELi256ELi4EEvPfPhS0_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi1024ELi256ELi4EEvPfPhS1_PT_i' for 'sm_75'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi1024ELi256ELi4EEvPfPhS1_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi2048ELi512ELi4EEvPfPhS0_PT_i' for 'sm_75'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi2048ELi512ELi4EEvPfPhS0_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi2048ELi512ELi4EEvPfPhS1_PT_i' for 'sm_75'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi2048ELi512ELi4EEvPfPhS1_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi4096ELi1024ELi4EEvPfPhS0_PT_i' for 'sm_75'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi4096ELi1024ELi4EEvPfPhS0_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi4096ELi1024ELi4EEvPfPhS1_PT_i' for 'sm_75'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi4096ELi1024ELi4EEvPfPhS1_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi1ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi1ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 30 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi1ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi1ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 29 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 49 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 42 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 47 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 44 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 47 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 44 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 47 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 47 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi1EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 56 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi1EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 58 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0EEvPfPT_S0_PhS0_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 47 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0EEvPfPT_S1_PhS1_ii' for 'sm_75'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z19kPercentileClippingI6__halfLi2048ELi4EEvPT_Pfii' for 'sm_75'
ptxas info : Function properties for _Z19kPercentileClippingI6__halfLi2048ELi4EEvPT_Pfii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 37 registers, 376 bytes cmem[0]
ptxas info : Compiling entry function '_Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii' for 'sm_75'
ptxas info : Function properties for _Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 37 registers, 376 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi' for 'sm_75'
ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 484 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi' for 'sm_75'
ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 484 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi' for 'sm_75'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 116 registers, 464 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi' for 'sm_75'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 115 registers, 464 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKfffffifPfS5_S5_ffi' for 'sm_75'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKfffffifPfS5_S5_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 444 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPKfffffifPfS6_S6_ffi' for 'sm_75'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPKfffffifPfS6_S6_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 444 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKfffffifPfS5_S5_ffi' for 'sm_75'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKfffffifPfS5_S5_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 444 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPKfffffifPfS6_S6_ffi' for 'sm_75'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPKfffffifPfS6_S6_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 444 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPfffiS3_S3_S3_ffi' for 'sm_75'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPfffiS3_S3_S3_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPfffiS4_S4_S4_ffi' for 'sm_75'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPfffiS4_S4_S4_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 69 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPfffiS3_S3_S3_ffi' for 'sm_75'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPfffiS3_S3_S3_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPfffiS4_S4_S4_ffi' for 'sm_75'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPfffiS4_S4_S4_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi' for 'sm_75'
ptxas info : Function properties for _Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateI6__halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi' for 'sm_75'
ptxas info : Function properties for _Z21kOptimizer32bit2StateI6__halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi' for 'sm_75'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 53 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateI6__halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi' for 'sm_75'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateI6__halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 57 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_fffffiffbi' for 'sm_75'
ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_fffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 47 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi4EEvPT_S2_PfS3_fffffiffbi' for 'sm_75'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi4EEvPT_S2_PfS3_fffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 49 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_fffffiffbi' for 'sm_75'
ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_fffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 56 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi2EEvPT_S2_PfS3_fffffiffbi' for 'sm_75'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi2EEvPT_S2_PfS3_fffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 51 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_fffffiffbi' for 'sm_75'
ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_fffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 54 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi1EEvPT_S2_PfS3_fffffiffbi' for 'sm_75'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi1EEvPT_S2_PfS3_fffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 50 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_fffiffi' for 'sm_75'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_fffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 412 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi4ELi4096ELi8EEvPT_S2_PfS3_fffiffi' for 'sm_75'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi4ELi4096ELi8EEvPT_S2_PfS3_fffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 53 registers, 412 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_fffiffi' for 'sm_75'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_fffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 412 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi2ELi4096ELi8EEvPT_S2_PfS3_fffiffi' for 'sm_75'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi2ELi4096ELi8EEvPT_S2_PfS3_fffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 53 registers, 412 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_fffiffi' for 'sm_75'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_fffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 49 registers, 412 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi1ELi4096ELi8EEvPT_S2_PfS3_fffiffi' for 'sm_75'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi1ELi4096ELi8EEvPT_S2_PfS3_fffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 45 registers, 412 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kEstimateQuantilesI6__halfEvPT_PffS1_i' for 'sm_75'
ptxas info : Function properties for _Z18kEstimateQuantilesI6__halfEvPT_PffS1_i
16 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 82 registers, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kEstimateQuantilesIfEvPT_PffS0_i' for 'sm_75'
ptxas info : Function properties for _Z18kEstimateQuantilesIfEvPT_PffS0_i
32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 82 registers, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6__halfPfS2_PaS3_PiS4_S1_S4_fiii' for 'sm_75'
ptxas info : Function properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6__halfPfS2_PaS3_PiS4_S1_S4_fiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 38 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6__halfPfS2_PaS3_PiS4_S1_S4_fiii' for 'sm_75'
ptxas info : Function properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6__halfPfS2_PaS3_PiS4_S1_S4_fiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 34 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6__halfS1_S1_S3_iiii' for 'sm_75'
ptxas info : Function properties for _Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6__halfS1_S1_S3_iiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 42 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii' for 'sm_75'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii' for 'sm_75'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii' for 'sm_75'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 29 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii' for 'sm_75'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 43 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii' for 'sm_75'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 36 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii' for 'sm_75'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_75'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_75'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_75'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_75'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6__halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_75'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6__halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_75'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6__halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 63 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z16kExtractOutliersILi4EEvPcPiS0_iiiii' for 'sm_75'
ptxas info : Function properties for _Z16kExtractOutliersILi4EEvPcPiS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 14 registers, 396 bytes cmem[0]
ptxas info : Compiling entry function '_Z16kExtractOutliersILi3EEvPcPiS0_iiiii' for 'sm_75'
ptxas info : Function properties for _Z16kExtractOutliersILi3EEvPcPiS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 14 registers, 396 bytes cmem[0]
ptxas info : Compiling entry function '_Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii' for 'sm_75'
ptxas info : Function properties for _Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 31 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii' for 'sm_75'
ptxas info : Function properties for _Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 27 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11kDequantizePfPhS_i' for 'sm_75'
ptxas info : Function properties for _Z11kDequantizePfPhS_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 12 registers, 1024 bytes smem, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z9kQuantizePfS_Phi' for 'sm_75'
ptxas info : Function properties for _Z9kQuantizePfS_Phi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 52 registers, 21520 bytes smem, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z22kHistogramScatterAdd2DPfPiS0_S_ii' for 'sm_75'
ptxas info : Function properties for _Z22kHistogramScatterAdd2DPfPiS0_S_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 14 registers, 392 bytes cmem[0]
ptxas info : Function properties for _Z9dQuantizeILi1EEhPfff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z9dQuantizeILi0EEhPfff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z9atomicMinPff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z9atomicMaxPff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas info : 11 bytes gmem
ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_80'
ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 4 registers, 352 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseI6__halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseI6__halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi
72 bytes stack frame, 76 bytes spill stores, 128 bytes spill loads
ptxas info : Used 80 registers, 456 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi' for 'sm_80'
ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi
56 bytes stack frame, 76 bytes spill stores, 112 bytes spill loads
ptxas info : Used 80 registers, 456 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi64ELi64ELi1EEvPfPhS0_PT_i' for 'sm_80'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi64ELi64ELi1EEvPfPhS0_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi64ELi64ELi1EEvPfPhS1_PT_i' for 'sm_80'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi64ELi64ELi1EEvPfPhS1_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi128ELi64ELi2EEvPfPhS0_PT_i' for 'sm_80'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi128ELi64ELi2EEvPfPhS0_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi128ELi64ELi2EEvPfPhS1_PT_i' for 'sm_80'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi128ELi64ELi2EEvPfPhS1_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi256ELi128ELi2EEvPfPhS0_PT_i' for 'sm_80'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi256ELi128ELi2EEvPfPhS0_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi256ELi128ELi2EEvPfPhS1_PT_i' for 'sm_80'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi256ELi128ELi2EEvPfPhS1_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi256ELi2EEvPfPhS0_PT_i' for 'sm_80'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi256ELi2EEvPfPhS0_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi512ELi256ELi2EEvPfPhS1_PT_i' for 'sm_80'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi512ELi256ELi2EEvPfPhS1_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi1024ELi256ELi4EEvPfPhS0_PT_i' for 'sm_80'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi1024ELi256ELi4EEvPfPhS0_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi1024ELi256ELi4EEvPfPhS1_PT_i' for 'sm_80'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi1024ELi256ELi4EEvPfPhS1_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 27 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi2048ELi512ELi4EEvPfPhS0_PT_i' for 'sm_80'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi2048ELi512ELi4EEvPfPhS0_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi2048ELi512ELi4EEvPfPhS1_PT_i' for 'sm_80'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi2048ELi512ELi4EEvPfPhS1_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 27 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi4096ELi1024ELi4EEvPfPhS0_PT_i' for 'sm_80'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi4096ELi1024ELi4EEvPfPhS0_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi4096ELi1024ELi4EEvPfPhS1_PT_i' for 'sm_80'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi4096ELi1024ELi4EEvPfPhS1_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 27 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi1ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi1ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 30 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi1ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi1ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 30 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 31 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 31 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 31 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi1EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi1EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0EEvPfPT_S0_PhS0_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0EEvPfPT_S1_PhS1_ii' for 'sm_80'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z19kPercentileClippingI6__halfLi2048ELi4EEvPT_Pfii' for 'sm_80'
ptxas info : Function properties for _Z19kPercentileClippingI6__halfLi2048ELi4EEvPT_Pfii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 31 registers, 376 bytes cmem[0]
ptxas info : Compiling entry function '_Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii' for 'sm_80'
ptxas info : Function properties for _Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 31 registers, 376 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi' for 'sm_80'
ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 484 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi' for 'sm_80'
ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 484 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi' for 'sm_80'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 116 registers, 464 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi' for 'sm_80'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 116 registers, 464 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKfffffifPfS5_S5_ffi' for 'sm_80'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKfffffifPfS5_S5_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 51 registers, 444 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPKfffffifPfS6_S6_ffi' for 'sm_80'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPKfffffifPfS6_S6_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 56 registers, 444 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKfffffifPfS5_S5_ffi' for 'sm_80'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKfffffifPfS5_S5_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 56 registers, 444 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPKfffffifPfS6_S6_ffi' for 'sm_80'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPKfffffifPfS6_S6_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 54 registers, 444 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPfffiS3_S3_S3_ffi' for 'sm_80'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPfffiS3_S3_S3_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPfffiS4_S4_S4_ffi' for 'sm_80'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPfffiS4_S4_S4_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 66 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPfffiS3_S3_S3_ffi' for 'sm_80'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPfffiS3_S3_S3_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPfffiS4_S4_S4_ffi' for 'sm_80'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPfffiS4_S4_S4_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi' for 'sm_80'
ptxas info : Function properties for _Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateI6__halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi' for 'sm_80'
ptxas info : Function properties for _Z21kOptimizer32bit2StateI6__halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi' for 'sm_80'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 56 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateI6__halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi' for 'sm_80'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateI6__halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 55 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_fffffiffbi' for 'sm_80'
ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_fffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi4EEvPT_S2_PfS3_fffffiffbi' for 'sm_80'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi4EEvPT_S2_PfS3_fffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_fffffiffbi' for 'sm_80'
ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_fffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 51 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi2EEvPT_S2_PfS3_fffffiffbi' for 'sm_80'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi2EEvPT_S2_PfS3_fffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_fffffiffbi' for 'sm_80'
ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_fffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 51 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi1EEvPT_S2_PfS3_fffffiffbi' for 'sm_80'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi1EEvPT_S2_PfS3_fffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 50 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_fffiffi' for 'sm_80'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_fffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 46 registers, 412 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi4ELi4096ELi8EEvPT_S2_PfS3_fffiffi' for 'sm_80'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi4ELi4096ELi8EEvPT_S2_PfS3_fffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 44 registers, 412 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_fffiffi' for 'sm_80'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_fffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 47 registers, 412 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi2ELi4096ELi8EEvPT_S2_PfS3_fffiffi' for 'sm_80'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi2ELi4096ELi8EEvPT_S2_PfS3_fffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 44 registers, 412 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_fffiffi' for 'sm_80'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_fffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 47 registers, 412 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi1ELi4096ELi8EEvPT_S2_PfS3_fffiffi' for 'sm_80'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi1ELi4096ELi8EEvPT_S2_PfS3_fffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 44 registers, 412 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kEstimateQuantilesI6__halfEvPT_PffS1_i' for 'sm_80'
ptxas info : Function properties for _Z18kEstimateQuantilesI6__halfEvPT_PffS1_i
16 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 81 registers, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kEstimateQuantilesIfEvPT_PffS0_i' for 'sm_80'
ptxas info : Function properties for _Z18kEstimateQuantilesIfEvPT_PffS0_i
32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 82 registers, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6__halfPfS2_PaS3_PiS4_S1_S4_fiii' for 'sm_80'
ptxas info : Function properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6__halfPfS2_PaS3_PiS4_S1_S4_fiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6__halfPfS2_PaS3_PiS4_S1_S4_fiii' for 'sm_80'
ptxas info : Function properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6__halfPfS2_PaS3_PiS4_S1_S4_fiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6__halfS1_S1_S3_iiii' for 'sm_80'
ptxas info : Function properties for _Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6__halfS1_S1_S3_iiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 36 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii' for 'sm_80'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii' for 'sm_80'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 36 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii' for 'sm_80'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 29 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii' for 'sm_80'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 38 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii' for 'sm_80'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 37 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii' for 'sm_80'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_80'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_80'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_80'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_80'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6__halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_80'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6__halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_80'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6__halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 39 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z16kExtractOutliersILi4EEvPcPiS0_iiiii' for 'sm_80'
ptxas info : Function properties for _Z16kExtractOutliersILi4EEvPcPiS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 14 registers, 396 bytes cmem[0]
ptxas info : Compiling entry function '_Z16kExtractOutliersILi3EEvPcPiS0_iiiii' for 'sm_80'
ptxas info : Function properties for _Z16kExtractOutliersILi3EEvPcPiS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 13 registers, 396 bytes cmem[0]
ptxas info : Compiling entry function '_Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii' for 'sm_80'
ptxas info : Function properties for _Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 29 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii' for 'sm_80'
ptxas info : Function properties for _Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 27 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11kDequantizePfPhS_i' for 'sm_80'
ptxas info : Function properties for _Z11kDequantizePfPhS_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 12 registers, 1024 bytes smem, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z9kQuantizePfS_Phi' for 'sm_80'
ptxas info : Function properties for _Z9kQuantizePfS_Phi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 51 registers, 21520 bytes smem, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z22kHistogramScatterAdd2DPfPiS0_S_ii' for 'sm_80'
ptxas info : Function properties for _Z22kHistogramScatterAdd2DPfPiS0_S_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 14 registers, 392 bytes cmem[0]
ptxas info : Function properties for _Z9dQuantizeILi1EEhPfff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z9dQuantizeILi0EEhPfff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z9atomicMinPff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z9atomicMaxPff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas warning : Value of threads per SM for entry _Z9kQuantizePfS_Phi is out of range. .minnctapersm will be ignored
ptxas info : 11 bytes gmem
ptxas info : Compiling entry function '_ZN3cub11EmptyKernelIvEEvv' for 'sm_86'
ptxas info : Function properties for _ZN3cub11EmptyKernelIvEEvv
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 4 registers, 352 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_86'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi4ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_86'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi4ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_86'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi2ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_86'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi2ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi' for 'sm_86'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseI6__halfLi1ELi2048ELi8EEvPT_S2_PhfffifPfS4_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi' for 'sm_86'
ptxas info : Function properties for _Z35kOptimizerStatic8bit1StateBlockwiseIfLi1ELi2048ELi8EEvPT_S1_PhfffifPfS3_ffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 80 registers, 432 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseI6__halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi' for 'sm_86'
ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseI6__halfLi0ELi2048ELi8EEvPT_S2_PhS3_fffifPfS4_S4_S4_ffbi
72 bytes stack frame, 76 bytes spill stores, 128 bytes spill loads
ptxas info : Used 80 registers, 456 bytes cmem[0]
ptxas info : Compiling entry function '_Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi' for 'sm_86'
ptxas info : Function properties for _Z35kOptimizerStatic8bit2StateBlockwiseIfLi0ELi2048ELi8EEvPT_S1_PhS2_fffifPfS3_S3_S3_ffbi
56 bytes stack frame, 76 bytes spill stores, 112 bytes spill loads
ptxas info : Used 80 registers, 456 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi64ELi64ELi1EEvPfPhS0_PT_i' for 'sm_86'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi64ELi64ELi1EEvPfPhS0_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi64ELi64ELi1EEvPfPhS1_PT_i' for 'sm_86'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi64ELi64ELi1EEvPfPhS1_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi128ELi64ELi2EEvPfPhS0_PT_i' for 'sm_86'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi128ELi64ELi2EEvPfPhS0_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi128ELi64ELi2EEvPfPhS1_PT_i' for 'sm_86'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi128ELi64ELi2EEvPfPhS1_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi256ELi128ELi2EEvPfPhS0_PT_i' for 'sm_86'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi256ELi128ELi2EEvPfPhS0_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi256ELi128ELi2EEvPfPhS1_PT_i' for 'sm_86'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi256ELi128ELi2EEvPfPhS1_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi512ELi256ELi2EEvPfPhS0_PT_i' for 'sm_86'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi512ELi256ELi2EEvPfPhS0_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi512ELi256ELi2EEvPfPhS1_PT_i' for 'sm_86'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi512ELi256ELi2EEvPfPhS1_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 24 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi1024ELi256ELi4EEvPfPhS0_PT_i' for 'sm_86'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi1024ELi256ELi4EEvPfPhS0_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 27 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi1024ELi256ELi4EEvPfPhS1_PT_i' for 'sm_86'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi1024ELi256ELi4EEvPfPhS1_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 30 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi2048ELi512ELi4EEvPfPhS0_PT_i' for 'sm_86'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi2048ELi512ELi4EEvPfPhS0_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 27 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi2048ELi512ELi4EEvPfPhS1_PT_i' for 'sm_86'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi2048ELi512ELi4EEvPfPhS1_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 30 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseIfLi4096ELi1024ELi4EEvPfPhS0_PT_i' for 'sm_86'
ptxas info : Function properties for _Z20kDequantizeBlockwiseIfLi4096ELi1024ELi4EEvPfPhS0_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 27 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z20kDequantizeBlockwiseI6__halfLi4096ELi1024ELi4EEvPfPhS1_PT_i' for 'sm_86'
ptxas info : Function properties for _Z20kDequantizeBlockwiseI6__halfLi4096ELi1024ELi4EEvPfPhS1_PT_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 30 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi64ELi1ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi64ELi1ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 30 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi64ELi1ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi64ELi1ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 30 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi128ELi2ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi128ELi2ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 39 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi128ELi2ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 39 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi256ELi2ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi256ELi2ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 39 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi256ELi2ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 39 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi512ELi2ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi512ELi2ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 39 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi512ELi2ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 39 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi1024ELi4ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi1024ELi4ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi1024ELi4ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi2048ELi4ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi2048ELi4ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi2048ELi4ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi1EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi1EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi1EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi1EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseIfLi4096ELi4ELi0EEvPfPT_S0_PhS0_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseIfLi4096ELi4ELi0EEvPfPT_S0_PhS0_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0EEvPfPT_S1_PhS1_ii' for 'sm_86'
ptxas info : Function properties for _Z18kQuantizeBlockwiseI6__halfLi4096ELi4ELi0EEvPfPT_S1_PhS1_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 400 bytes cmem[0]
ptxas info : Compiling entry function '_Z19kPercentileClippingI6__halfLi2048ELi4EEvPT_Pfii' for 'sm_86'
ptxas info : Function properties for _Z19kPercentileClippingI6__halfLi2048ELi4EEvPT_Pfii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 37 registers, 376 bytes cmem[0]
ptxas info : Compiling entry function '_Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii' for 'sm_86'
ptxas info : Function properties for _Z19kPercentileClippingIfLi2048ELi4EEvPT_Pfii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 37 registers, 376 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi' for 'sm_86'
ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PKffffffifPfS5_S5_S5_S5_S5_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 484 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi' for 'sm_86'
ptxas info : Function properties for _Z26kOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PKffffffifPfS6_S6_S6_S6_S6_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 484 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi' for 'sm_86'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateIfLi0EEvPT_S1_PhS2_PffffiS3_S3_S3_S3_S3_S3_fi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 116 registers, 464 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi' for 'sm_86'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit2StateI6__halfLi0EEvPT_S2_PhS3_PffffiS4_S4_S4_S4_S4_S4_fi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 116 registers, 464 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKfffffifPfS5_S5_ffi' for 'sm_86'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPKfffffifPfS5_S5_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 444 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPKfffffifPfS6_S6_ffi' for 'sm_86'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPKfffffifPfS6_S6_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 43 registers, 444 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKfffffifPfS5_S5_ffi' for 'sm_86'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPKfffffifPfS5_S5_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 45 registers, 444 bytes cmem[0]
ptxas info : Compiling entry function '_Z26kOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPKfffffifPfS6_S6_ffi' for 'sm_86'
ptxas info : Function properties for _Z26kOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPKfffffifPfS6_S6_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 444 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPfffiS3_S3_S3_ffi' for 'sm_86'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi2EEvPT_S1_PhPfffiS3_S3_S3_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPfffiS4_S4_S4_ffi' for 'sm_86'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi2EEvPT_S2_PhPfffiS4_S4_S4_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 66 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPfffiS3_S3_S3_ffi' for 'sm_86'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateIfLi1EEvPT_S1_PhPfffiS3_S3_S3_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPfffiS4_S4_S4_ffi' for 'sm_86'
ptxas info : Function properties for _Z38kPreconditionOptimizerStatic8bit1StateI6__halfLi1EEvPT_S2_PhPfffiS4_S4_S4_ffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi' for 'sm_86'
ptxas info : Function properties for _Z21kOptimizer32bit2StateIfLi0EEvPT_S1_PfS2_S2_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit2StateI6__halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi' for 'sm_86'
ptxas info : Function properties for _Z21kOptimizer32bit2StateI6__halfLi0EEvPT_S2_PfS3_S3_ffffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 64 registers, 436 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi' for 'sm_86'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateIfLi0ELi4096ELi8EEvPT_S1_PfS2_S2_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 56 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit2StateI6__halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi' for 'sm_86'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit2StateI6__halfLi0ELi4096ELi8EEvPT_S2_PfS3_S3_ffffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 55 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_fffffiffbi' for 'sm_86'
ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi4EEvPT_S1_PfS2_fffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi4EEvPT_S2_PfS3_fffffiffbi' for 'sm_86'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi4EEvPT_S2_PfS3_fffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_fffffiffbi' for 'sm_86'
ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi2EEvPT_S1_PfS2_fffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 51 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi2EEvPT_S2_PfS3_fffffiffbi' for 'sm_86'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi2EEvPT_S2_PfS3_fffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_fffffiffbi' for 'sm_86'
ptxas info : Function properties for _Z21kOptimizer32bit1StateIfLi1EEvPT_S1_PfS2_fffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 51 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kOptimizer32bit1StateI6__halfLi1EEvPT_S2_PfS3_fffffiffbi' for 'sm_86'
ptxas info : Function properties for _Z21kOptimizer32bit1StateI6__halfLi1EEvPT_S2_PfS3_fffffiffbi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 50 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_fffiffi' for 'sm_86'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi4ELi4096ELi8EEvPT_S1_PfS2_fffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 46 registers, 412 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi4ELi4096ELi8EEvPT_S2_PfS3_fffiffi' for 'sm_86'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi4ELi4096ELi8EEvPT_S2_PfS3_fffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 44 registers, 412 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_fffiffi' for 'sm_86'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi2ELi4096ELi8EEvPT_S1_PfS2_fffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 47 registers, 412 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi2ELi4096ELi8EEvPT_S2_PfS3_fffiffi' for 'sm_86'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi2ELi4096ELi8EEvPT_S2_PfS3_fffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 44 registers, 412 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_fffiffi' for 'sm_86'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateIfLi1ELi4096ELi8EEvPT_S1_PfS2_fffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 47 registers, 412 bytes cmem[0]
ptxas info : Compiling entry function '_Z33kPreconditionOptimizer32bit1StateI6__halfLi1ELi4096ELi8EEvPT_S2_PfS3_fffiffi' for 'sm_86'
ptxas info : Function properties for _Z33kPreconditionOptimizer32bit1StateI6__halfLi1ELi4096ELi8EEvPT_S2_PfS3_fffiffi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 44 registers, 412 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kEstimateQuantilesI6__halfEvPT_PffS1_i' for 'sm_86'
ptxas info : Function properties for _Z18kEstimateQuantilesI6__halfEvPT_PffS1_i
16 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 81 registers, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kEstimateQuantilesIfEvPT_PffS0_i' for 'sm_86'
ptxas info : Function properties for _Z18kEstimateQuantilesIfEvPT_PffS0_i
32 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 82 registers, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6__halfPfS2_PaS3_PiS4_S1_S4_fiii' for 'sm_86'
ptxas info : Function properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi1EEvP6__halfPfS2_PaS3_PiS4_S1_S4_fiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 38 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6__halfPfS2_PaS3_PiS4_S1_S4_fiii' for 'sm_86'
ptxas info : Function properties for _Z18kDoubleRowColQuantILi64ELi4ELi16ELi256ELi0EEvP6__halfPfS2_PaS3_PiS4_S1_S4_fiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 36 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6__halfS1_S1_S3_iiii' for 'sm_86'
ptxas info : Function properties for _Z22kdequant_mm_int32_fp16ILi4ELi128ELi512EEvPiPfS1_P6__halfS1_S1_S3_iiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 37 registers, 424 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii' for 'sm_86'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi4EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 32 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii' for 'sm_86'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi4EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii' for 'sm_86'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi3EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 29 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii' for 'sm_86'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi3EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii' for 'sm_86'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi1ELi2EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii' for 'sm_86'
ptxas info : Function properties for _Z21kTransformRowToFormatILi256ELi8ELi32ELi256ELi0ELi2EEvPcS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 48 registers, 388 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_86'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi32ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_86'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi16ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii' for 'sm_86'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveIaLi8ELi8EEvPiS0_S0_S0_S0_P6__halfPT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_86'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6__halfLi32ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_86'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6__halfLi16ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z27kspmm_coo_very_sparse_naiveI6__halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii' for 'sm_86'
ptxas info : Function properties for _Z27kspmm_coo_very_sparse_naiveI6__halfLi8ELi16EEvPiS1_S1_S1_S1_PS0_PT_S2_Pfiiii
192 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 40 registers, 440 bytes cmem[0]
ptxas info : Compiling entry function '_Z16kExtractOutliersILi4EEvPcPiS0_iiiii' for 'sm_86'
ptxas info : Function properties for _Z16kExtractOutliersILi4EEvPcPiS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 14 registers, 396 bytes cmem[0]
ptxas info : Compiling entry function '_Z16kExtractOutliersILi3EEvPcPiS0_iiiii' for 'sm_86'
ptxas info : Function properties for _Z16kExtractOutliersILi3EEvPcPiS0_iiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 13 registers, 396 bytes cmem[0]
ptxas info : Compiling entry function '_Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii' for 'sm_86'
ptxas info : Function properties for _Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi1EEvPT_PfS3_Pifiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii' for 'sm_86'
ptxas info : Function properties for _Z15kgetColRowStatsI6__halfLi64ELi4ELi16ELi256ELi0EEvPT_PfS3_Pifiiii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 28 registers, 404 bytes cmem[0]
ptxas info : Compiling entry function '_Z11kDequantizePfPhS_i' for 'sm_86'
ptxas info : Function properties for _Z11kDequantizePfPhS_i
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 12 registers, 1024 bytes smem, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z9kQuantizePfS_Phi' for 'sm_86'
ptxas info : Function properties for _Z9kQuantizePfS_Phi
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 51 registers, 21520 bytes smem, 380 bytes cmem[0]
ptxas info : Compiling entry function '_Z22kHistogramScatterAdd2DPfPiS0_S_ii' for 'sm_86'
ptxas info : Function properties for _Z22kHistogramScatterAdd2DPfPiS0_S_ii
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 14 registers, 392 bytes cmem[0]
ptxas info : Function properties for _Z9dQuantizeILi1EEhPfff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z9dQuantizeILi0EEhPfff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z9atomicMinPff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Function properties for _Z9atomicMaxPff
0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
/usr/local/cuda-11.4/bin/nvcc -gencode arch=compute_75,code=sm_75 -gencode arch=compute_80,code=sm_80 -gencode arch=compute_86,code=sm_86 -Xcompiler '-fPIC' -dlink /home/toor/workspace/bitsandbytes_jetsonX2/build/ops.o /home/toor/workspace/bitsandbytes_jetsonX2/build/kernels.o -o /home/toor/workspace/bitsandbytes_jetsonX2/build/link.o
/usr/bin/g++ -std=c++14 -DBUILD_CUDA -shared -fPIC -I /usr/local/cuda-11.4/include -I /home/toor/workspace/bitsandbytes_jetsonX2/csrc -I /include -I /home/toor/workspace/bitsandbytes_jetsonX2/include /home/toor/workspace/bitsandbytes_jetsonX2/build/ops.o /home/toor/workspace/bitsandbytes_jetsonX2/build/kernels.o /home/toor/workspace/bitsandbytes_jetsonX2/build/link.o /home/toor/workspace/bitsandbytes_jetsonX2/csrc/common.cpp /home/toor/workspace/bitsandbytes_jetsonX2/csrc/cpu_ops.cpp /home/toor/workspace/bitsandbytes_jetsonX2/csrc/pythonInterface.c -o ./bitsandbytes/libbitsandbytes_cuda114.so -L /usr/local/cuda-11.4/lib64 -lcudart -lcublas -lcublasLt -lcurand -lcusparse -L /lib
In file included from /home/toor/workspace/bitsandbytes_jetsonX2/include/BinSearch.h:5,
from /home/toor/workspace/bitsandbytes_jetsonX2/csrc/common.h:1,
from /home/toor/workspace/bitsandbytes_jetsonX2/csrc/common.cpp:1:
/home/toor/workspace/bitsandbytes_jetsonX2/include/SIMD.h:32:2: warning: #warning "--- THIS IS AARCH64" [-Wcpp]
32 | #warning "--- THIS IS AARCH64"
| ^~~~~~~
In file included from /home/toor/workspace/bitsandbytes_jetsonX2/include/SIMD.h:33,
from /home/toor/workspace/bitsandbytes_jetsonX2/include/BinSearch.h:5,
from /home/toor/workspace/bitsandbytes_jetsonX2/csrc/common.h:1,
from /home/toor/workspace/bitsandbytes_jetsonX2/csrc/common.cpp:1:
/home/toor/workspace/bitsandbytes_jetsonX2/include/sse2neon.h:79: warning: "FORCE_INLINE" redefined
79 | #define FORCE_INLINE static inline __attribute__((always_inline))
|
In file included from /home/toor/workspace/bitsandbytes_jetsonX2/include/AAlloc.h:3,
from /home/toor/workspace/bitsandbytes_jetsonX2/include/BinSearch.h:3,
from /home/toor/workspace/bitsandbytes_jetsonX2/csrc/common.h:1,
from /home/toor/workspace/bitsandbytes_jetsonX2/csrc/common.cpp:1:
/home/toor/workspace/bitsandbytes_jetsonX2/include/Portable.h:80: note: this is the location of the previous definition
80 | # define FORCE_INLINE __attribute__((always_inline)) inline
|
In file included from /home/toor/workspace/bitsandbytes_jetsonX2/include/BinSearch.h:5,
from /home/toor/workspace/bitsandbytes_jetsonX2/csrc/cpu_ops.cpp:1:
/home/toor/workspace/bitsandbytes_jetsonX2/include/SIMD.h:32:2: warning: #warning "--- THIS IS AARCH64" [-Wcpp]
32 | #warning "--- THIS IS AARCH64"
| ^~~~~~~
In file included from /home/toor/workspace/bitsandbytes_jetsonX2/include/SIMD.h:33,
from /home/toor/workspace/bitsandbytes_jetsonX2/include/BinSearch.h:5,
from /home/toor/workspace/bitsandbytes_jetsonX2/csrc/cpu_ops.cpp:1:
/home/toor/workspace/bitsandbytes_jetsonX2/include/sse2neon.h:79: warning: "FORCE_INLINE" redefined
79 | #define FORCE_INLINE static inline __attribute__((always_inline))
|
In file included from /home/toor/workspace/bitsandbytes_jetsonX2/include/AAlloc.h:3,
from /home/toor/workspace/bitsandbytes_jetsonX2/include/BinSearch.h:3,
from /home/toor/workspace/bitsandbytes_jetsonX2/csrc/cpu_ops.cpp:1:
/home/toor/workspace/bitsandbytes_jetsonX2/include/Portable.h:80: note: this is the location of the previous definition
80 | # define FORCE_INLINE __attribute__((always_inline)) inline
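Because nvcc runs with `-Xptxas=-v`, ptxas reports per-kernel resource usage for every target, as in the log above. A small parser makes it easier to see which `sm_XX` targets were actually compiled, and how many registers and bytes of shared memory each kernel uses. This is a sketch. It assumes the log was saved to a text file, and that each `Used N registers` line follows the `Compiling entry function` line it belongs to, as it does here. `summarize_ptxas` and `archs_compiled` are hypothetical helper names.

```python
import re

# Matches e.g. "ptxas info : Compiling entry function '_Z11kDequantizePfPhS_i' for 'sm_86'"
ENTRY_RE = re.compile(r"Compiling entry function '([^']+)' for '(sm_\d+)'")
# Matches e.g. "ptxas info : Used 12 registers, 1024 bytes smem, 380 bytes cmem[0]";
# the smem part is optional because many kernels use no shared memory.
USED_RE = re.compile(r"Used (\d+) registers(?:, (\d+) bytes smem)?")


def summarize_ptxas(lines):
    """Return {(mangled_kernel_name, arch): (registers, smem_bytes)} from ptxas -v output."""
    summary = {}
    current = None  # (kernel, arch) of the most recent entry function seen
    for line in lines:
        entry = ENTRY_RE.search(line)
        if entry:
            current = (entry.group(1), entry.group(2))
            continue
        used = USED_RE.search(line)
        if used and current is not None:
            regs = int(used.group(1))
            smem = int(used.group(2)) if used.group(2) else 0
            summary[current] = (regs, smem)
            current = None  # each entry function has one "Used" line
    return summary


def archs_compiled(summary):
    """Return the sorted list of sm_XX targets that ptxas compiled entry functions for."""
    return sorted({arch for _, arch in summary})
```

For example, calling `archs_compiled(summarize_ptxas(open("build.log")))` on a saved copy of this log (the file name is an assumption) would be expected to list only `sm_75`, `sm_80` and `sm_86`, matching the `-gencode` flags on the nvcc command line.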
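import bitsandbytes as bnb
import torch

# A 10x10 trainable parameter and a fixed input, both on the GPU
p = torch.nn.Parameter(torch.rand(10, 10).cuda())
a = torch.rand(10, 10).cuda()
p1 = p.data.sum().item()  # parameter sum before the optimizer step

# One step of the bitsandbytes Adam optimizer on a trivial loss
adam = bnb.optim.Adam([p])
out = a * p
loss = out.sum()
loss.backward()
adam.step()

# If the CUDA kernels ran, the step must have changed the parameter
p2 = p.data.sum().item()
assert p1 != p2
print("SUCCESS!")
print("Installation was successful!")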

Git clone

git clone https://github.com/g588928812/bitsandbytes_jetsonX.git
cd bitsandbytes_jetsonX/
CUDA_VERSION=114 make cuda11x
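The build log shows nvcc receiving `-gencode` flags for `compute_75`, `compute_80` and `compute_86` only, each with `code=sm_XX` (machine code, no PTX), while the Orin reports compute capability 8.7. The helper below sketches the flag list one might want instead: SASS for `sm_87` as well, plus embedded PTX as a JIT fallback. It assumes the JetPack 5 nvcc (CUDA 11.4) accepts `sm_87`. It does not show how to wire the flags into this repository's Makefile, whose variable names are not given here. `gencode_flags` is a hypothetical name.

```python
def gencode_flags(archs, ptx_arch=None):
    """Build nvcc -gencode arguments.

    archs: compute capabilities as strings such as "87"; machine code (SASS)
           is generated for each one via code=sm_XX.
    ptx_arch: optional capability whose PTX is also embedded (code=compute_XX),
              so the driver can JIT-compile it for GPUs not listed in archs.
    """
    flags = []
    for arch in archs:
        flags += ["-gencode", f"arch=compute_{arch},code=sm_{arch}"]
    if ptx_arch is not None:
        flags += ["-gencode", f"arch=compute_{ptx_arch},code=compute_{ptx_arch}"]
    return flags
```

For example, `" ".join(gencode_flags(["75", "80", "86", "87"], ptx_arch="87"))` gives a flag string that can be compared side by side with the nvcc line in the log.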

Create a Python wheel

python -m build
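The g++ step in the log writes `./bitsandbytes/libbitsandbytes_cuda114.so`, and the wheel is only useful on the Jetson if that library ends up inside it. A wheel is a zip archive, so the standard library can list its contents. This is a minimal sketch; `native_libs_in_wheel` is a hypothetical helper.

```python
import zipfile


def native_libs_in_wheel(wheel_path):
    """List the compiled shared libraries (.so files) packed into a wheel.

    A wheel is a zip archive, so zipfile can read its member names directly.
    """
    with zipfile.ZipFile(wheel_path) as whl:
        return sorted(name for name in whl.namelist() if name.endswith(".so"))
```

Called on `dist/bitsandbytes-0.37.2-py3-none-any.whl`, the result should include `bitsandbytes/libbitsandbytes_cuda114.so` if the packaging step picked up the freshly built library.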

Create a new virtualenv

python3 -m venv env 
source env/bin/activate

Pip install all dependencies for the check script, using the correct torch wheel from the Jetson Zoo

pip install dist/bitsandbytes-0.37.2-py3-none-any.whl 
pip install ~/jetpack_5_0_wheels/torch-1.13.0a0+340c4120.nv22.06-cp38-cp38-linux_aarch64.whl 
pip install numpy
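Before running the check script, it can help to confirm which versions pip actually installed into the virtualenv. In particular, it is worth checking that `torch` is the Jetson build rather than some other wheel. The sketch below uses `importlib.metadata` from the standard library, which is available in the Python 3.8 this setup uses. `installed_versions` is a hypothetical helper.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

Inside the activated `env`, `installed_versions(["bitsandbytes", "torch", "numpy"])` should report a `torch` version that matches the wheel file name (`1.13.0a0+340c4120.nv22.06`).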

Attempt to check the installation

python ~/check_bits_and_bytes.py

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.4/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 8.7
CUDA SETUP: Detected CUDA version 114
CUDA SETUP: Loading binary /home/toor/workspace/bitsandbytes_jetsonX2/env2/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda114.so...
Error invalid device function at line 119 in file /home/toor/workspace/bitsandbytes_jetsonX2/csrc/ops.cu
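`invalid device function` is the CUDA runtime's message for a requested device function that does not exist or is not compiled for the current device's architecture. One hypothesis, not confirmed in this thread, concerns the build flags. The nvcc lines in the build log emit machine code only for `sm_75`, `sm_80` and `sm_86` and embed no PTX, while the Orin reports compute capability 8.7. The CUDA C++ Programming Guide limits minor-revision binary compatibility to desktop GPUs. An `sm_86` cubin may therefore not load on a Tegra part like the Orin, and without PTX there is nothing to JIT-compile. The sketch below encodes that rule under those assumptions. The function name and the `tegra` switch are hypothetical, and the result is an estimate, not what the driver actually checks.

```python
def kernel_image_available(device_cc, sass_ccs, ptx_ccs=(), tegra=False):
    """Estimate whether a fat binary contains code the given GPU can run.

    device_cc: the GPU's compute capability as a (major, minor) tuple.
    sass_ccs: capabilities compiled to machine code (nvcc code=sm_XX).
    ptx_ccs: capabilities embedded as PTX (nvcc code=compute_XX); the driver
             can JIT-compile PTX for any GPU of equal or higher capability.
    tegra: if True, require an exact SASS match (assumed rule for Jetson parts).
    """
    for cc in sass_ccs:
        if cc == device_cc:
            return True
        # Desktop rule: a cubin for X.y runs on X.z when z >= y.
        if not tegra and cc[0] == device_cc[0] and cc[1] <= device_cc[1]:
            return True
    # Tuple comparison orders by major, then minor.
    return any(cc <= device_cc for cc in ptx_ccs)
```

With the targets from this build, `kernel_image_available((8, 7), [(7, 5), (8, 0), (8, 6)], tegra=True)` returns `False`. Adding `(8, 7)` to the SASS list, or `(8, 6)` as PTX, makes it return `True`.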

@g588928812

Apparently something in kOptimizer32bit2State fails (it is called on line 118 in ops.cu). It works on my Jetson Xavier (see output below). It might be related to cuBLASLt (mine is compiled without it because of its low compute capability). It might also be caused by something I've changed: I haven't touched anything in kOptimizer32bit2State, but I cannot exclude that something I did elsewhere causes this. Unfortunately, I only have the Xavier available, so I cannot debug it.

===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please submit your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11.4/lib64/libcudart.so
CUDA SETUP: Highest compute capability among GPUs detected: 7.2
CUDA SETUP: Detected CUDA version 114
/home/g/.local/lib/python3.8/site-packages/bitsandbytes-0.37.2-py3.8.egg/bitsandbytes/cuda_setup/main.py:136: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
  warn(msg)
CUDA SETUP: Loading binary /home/g/.local/lib/python3.8/site-packages/bitsandbytes-0.37.2-py3.8.egg/bitsandbytes/libbitsandbytes_cuda114_nocublaslt.so...
SUCCESS!
Installation was successful!
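The two logs also show the loader choosing different libraries. On the Xavier (compute capability 7.2), it warns that the capability is below 7.5 and loads `libbitsandbytes_cuda114_nocublaslt.so`. On the Orin (8.7), it loads `libbitsandbytes_cuda114.so`. The sketch below only mirrors that observed behavior and is not the actual bitsandbytes `cuda_setup` code; `expected_binary` is a hypothetical name.

```python
def expected_binary(cuda_version, cc):
    """Guess which bitsandbytes shared library gets loaded, based on the two logs.

    cuda_version: string such as "114" (as in "Detected CUDA version 114").
    cc: highest detected compute capability as a (major, minor) tuple.
    """
    # Below 7.5 the logs show the variant built without cuBLASLt being loaded.
    suffix = "_nocublaslt" if cc < (7, 5) else ""
    return f"libbitsandbytes_cuda{cuda_version}{suffix}.so"
```

So the Orin loads the cuBLASLt variant, which the Xavier never loads. Testing on the Xavier alone therefore cannot rule out the cuBLASLt hypothesis.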
