@bhargav
Created September 17, 2023 18:41
2023-09 ctransformers AMD (ROCm/hipBLAS) build failure
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers
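This builds ctransformers 0.2.27 from source with ROCm's clang as the host compiler (CC/CXX) and the hipBLAS backend enabled (CT_HIPBLAS=1); --no-binary forces the source build instead of the prebuilt wheel. Before blaming the package, a minimal HIP program is a quick way to confirm that the ROCm 5.6 toolchain and runtime work on their own. This is an illustrative sanity check, not part of ctransformers; the file name and build line are my own:

    // hip_check.cpp -- stand-alone sanity check for the ROCm/HIP toolchain.
    // Build with ROCm's compiler driver, e.g.:  hipcc hip_check.cpp -o hip_check
    #include <hip/hip_runtime.h>
    #include <cstdio>

    int main() {
        int count = 0;
        hipError_t err = hipGetDeviceCount(&count);   // query visible AMD GPUs
        if (err != hipSuccess) {
            std::printf("hipGetDeviceCount failed: %s\n", hipGetErrorString(err));
            return 1;
        }
        std::printf("HIP devices visible: %d\n", count);
        return 0;
    }

If that compiles and reports a device, the failure below is specific to how the vendored ggml sources handle the HIP code path, not to the ROCm install itself.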
Collecting ctransformers
Using cached ctransformers-0.2.27.tar.gz (376 kB)
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'done'
Collecting py-cpuinfo<10.0.0,>=9.0.0
Using cached py_cpuinfo-9.0.0-py3-none-any.whl (22 kB)
Collecting huggingface-hub
Using cached huggingface_hub-0.17.1-py3-none-any.whl (294 kB)
Collecting fsspec
Using cached fsspec-2023.9.1-py3-none-any.whl (173 kB)
Collecting packaging>=20.9
Using cached packaging-23.1-py3-none-any.whl (48 kB)
Collecting pyyaml>=5.1
Using cached PyYAML-6.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (705 kB)
Collecting typing-extensions>=3.7.4.3
Using cached typing_extensions-4.7.1-py3-none-any.whl (33 kB)
Collecting filelock
Using cached filelock-3.12.4-py3-none-any.whl (11 kB)
Collecting requests
Using cached requests-2.31.0-py3-none-any.whl (62 kB)
Collecting tqdm>=4.42.1
Using cached tqdm-4.66.1-py3-none-any.whl (78 kB)
Collecting charset-normalizer<4,>=2
Using cached charset_normalizer-3.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (201 kB)
Collecting urllib3<3,>=1.21.1
Using cached urllib3-2.0.4-py3-none-any.whl (123 kB)
Collecting certifi>=2017.4.17
Using cached certifi-2023.7.22-py3-none-any.whl (158 kB)
Collecting idna<4,>=2.5
Using cached idna-3.4-py3-none-any.whl (61 kB)
Building wheels for collected packages: ctransformers
Building wheel for ctransformers (pyproject.toml): started
Building wheel for ctransformers (pyproject.toml): finished with status 'error'
error: subprocess-exited-with-error
× Building wheel for ctransformers (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [2479 lines of output]
--------------------------------------------------------------------------------
-- Trying 'Ninja' generator
--------------------------------
---------------------------
----------------------
-----------------
------------
-------
--
CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
Not searching for unused variables given on the command line.
-- The C compiler identification is Clang 16.0.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/rocm/llvm/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- The CXX compiler identification is Clang 16.0.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rocm/llvm/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done (0.6s)
-- Generating done (0.0s)
-- Build files have been written to: /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/_cmake_test_compile/build
--
-------
------------
-----------------
----------------------
---------------------------
--------------------------------
-- Trying 'Ninja' generator - success
--------------------------------------------------------------------------------
Configuring Project
Working directory:
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/_skbuild/linux-x86_64-3.10/cmake-build
Command:
/tmp/pip-build-env-v3f_z5es/overlay/local/lib/python3.10/dist-packages/cmake/data/bin/cmake /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5 -G Ninja -DCMAKE_MAKE_PROGRAM:FILEPATH=/tmp/pip-build-env-v3f_z5es/overlay/local/lib/python3.10/dist-packages/ninja/data/bin/ninja --no-warn-unused-cli -DCMAKE_INSTALL_PREFIX:PATH=/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/_skbuild/linux-x86_64-3.10/cmake-install -DPYTHON_VERSION_STRING:STRING=3.10.6 -DSKBUILD:INTERNAL=TRUE -DCMAKE_MODULE_PATH:PATH=/tmp/pip-build-env-v3f_z5es/overlay/local/lib/python3.10/dist-packages/skbuild/resources/cmake -DPYTHON_EXECUTABLE:PATH=/usr/bin/python3 -DPYTHON_INCLUDE_DIR:PATH=/usr/include/python3.10 -DPython_EXECUTABLE:PATH=/usr/bin/python3 -DPython_ROOT_DIR:PATH=/usr -DPython_FIND_REGISTRY:STRING=NEVER -DPython_INCLUDE_DIR:PATH=/usr/include/python3.10 -DPython3_EXECUTABLE:PATH=/usr/bin/python3 -DPython3_ROOT_DIR:PATH=/usr -DPython3_FIND_REGISTRY:STRING=NEVER -DPython3_INCLUDE_DIR:PATH=/usr/include/python3.10 -DCMAKE_MAKE_PROGRAM:FILEPATH=/tmp/pip-build-env-v3f_z5es/overlay/local/lib/python3.10/dist-packages/ninja/data/bin/ninja -DCT_HIPBLAS=1 -DCMAKE_BUILD_TYPE:STRING=Release
Not searching for unused variables given on the command line.
-- The C compiler identification is Clang 16.0.0
-- The CXX compiler identification is Clang 16.0.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/rocm/llvm/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rocm/llvm/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- CT_INSTRUCTIONS: avx2
-- CT_CUBLAS: OFF
-- CT_HIPBLAS: 1
-- CT_METAL: OFF
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- x86 detected
CMake Deprecation Warning at /opt/rocm/lib/cmake/hip/hip-config.cmake:20 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
Call Stack (most recent call first):
CMakeLists.txt:177 (find_package)
-- hip::amdhip64 is SHARED_LIBRARY
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Success
CMake Deprecation Warning at /opt/rocm/lib/cmake/hip/hip-config.cmake:20 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
Call Stack (most recent call first):
/tmp/pip-build-env-v3f_z5es/overlay/local/lib/python3.10/dist-packages/cmake/data/share/cmake-3.27/Modules/CMakeFindDependencyMacro.cmake:76 (find_package)
/opt/rocm/lib/cmake/hipblas/hipblas-config.cmake:90 (find_dependency)
CMakeLists.txt:178 (find_package)
-- hip::amdhip64 is SHARED_LIBRARY
-- HIP and hipBLAS found
-- Configuring done (0.9s)
-- Generating done (0.0s)
-- Build files have been written to: /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/_skbuild/linux-x86_64-3.10/cmake-build
[1/8] Building C object CMakeFiles/ctransformers.dir/models/ggml/ggml-alloc.c.o
[2/8] Building C object CMakeFiles/ctransformers.dir/models/ggml/ggml.c.o
FAILED: CMakeFiles/ctransformers.dir/models/ggml/ggml.c.o
/opt/rocm/llvm/bin/clang -DCC_TURING=1000000000 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMQ_Y=64 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_USE_HIPBLAS -DGGML_USE_K_QUANTS -DK_QUANTS_PER_ITERATION=2 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -Dctransformers_EXPORTS -I/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models -isystem /opt/rocm/include -isystem /opt/rocm-5.6.0/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -mfma -mavx2 -mf16c -mavx -MD -MT CMakeFiles/ctransformers.dir/models/ggml/ggml.c.o -MF CMakeFiles/ctransformers.dir/models/ggml/ggml.c.o.d -o CMakeFiles/ctransformers.dir/models/ggml/ggml.c.o -c /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml.c
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml.c:252:
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.h:46:
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda-ggllm.h:15:37: error: array has incomplete element type 'struct cudaDeviceProp'
struct cudaDeviceProp device_props[GGML_CUDA_MAX_DEVICES];
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda-ggllm.h:15:10: note: forward declaration of 'struct cudaDeviceProp'
struct cudaDeviceProp device_props[GGML_CUDA_MAX_DEVICES];
^
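This first error is plain C, not HIP: ggml.c is compiled as an ordinary C translation unit, so no CUDA or HIP header ever defines struct cudaDeviceProp, and the forward declaration in ggml-cuda-ggllm.h leaves it incomplete; an array of an incomplete type cannot be laid out. A minimal reproduction, independent of the project's sources:

    // Illustrative only: why clang reports "array has incomplete element type".
    struct cudaDeviceProp;                   // forward declaration, size unknown
    struct cudaDeviceProp device_props[16];  // error: element type is incomplete
    // The array can only be declared once the full definition is in scope; on a
    // ROCm build that would normally be hipDeviceProp_t mapped to this name
    // (see the note after the gfx1030 errors below).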
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml.c:2413:5: warning: implicit conversion increases floating-point precision: 'float' to 'ggml_float' (aka 'double') [-Wdouble-promotion]
GGML_F16_VEC_REDUCE(sumf, sum);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml.c:2045:37: note: expanded from macro 'GGML_F16_VEC_REDUCE'
#define GGML_F16_VEC_REDUCE GGML_F32Cx8_REDUCE
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml.c:2035:33: note: expanded from macro 'GGML_F32Cx8_REDUCE'
#define GGML_F32Cx8_REDUCE GGML_F32x8_REDUCE
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml.c:1981:11: note: expanded from macro 'GGML_F32x8_REDUCE'
res = _mm_cvtss_f32(_mm_hadd_ps(t1, t1)); \
~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml.c:3456:9: warning: implicit conversion increases floating-point precision: 'float' to 'ggml_float' (aka 'double') [-Wdouble-promotion]
GGML_F16_VEC_REDUCE(sumf[k], sum[k]);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml.c:2045:37: note: expanded from macro 'GGML_F16_VEC_REDUCE'
#define GGML_F16_VEC_REDUCE GGML_F32Cx8_REDUCE
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml.c:2035:33: note: expanded from macro 'GGML_F32Cx8_REDUCE'
#define GGML_F32Cx8_REDUCE GGML_F32x8_REDUCE
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml.c:1981:11: note: expanded from macro 'GGML_F32x8_REDUCE'
res = _mm_cvtss_f32(_mm_hadd_ps(t1, t1)); \
~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 warnings and 1 error generated.
[3/8] Building C object CMakeFiles/ctransformers.dir/models/ggml/k_quants.c.o
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/k_quants.c:186:11: warning: variable 'sum_x' set but not used [-Wunused-but-set-variable]
float sum_x = 0;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/k_quants.c:187:11: warning: variable 'sum_x2' set but not used [-Wunused-but-set-variable]
float sum_x2 = 0;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/k_quants.c:182:14: warning: unused function 'make_qkx1_quants' [-Wunused-function]
static float make_qkx1_quants(int n, int nmax, const float * restrict x, uint8_t * restrict L, float * restrict the_min,
^
3 warnings generated.
[4/8] Building CXX object CMakeFiles/ctransformers.dir/models/llm.cc.o
FAILED: CMakeFiles/ctransformers.dir/models/llm.cc.o
/opt/rocm/llvm/bin/clang++ -DCC_TURING=1000000000 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMQ_Y=64 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_USE_HIPBLAS -DGGML_USE_K_QUANTS -DK_QUANTS_PER_ITERATION=2 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -Dctransformers_EXPORTS -I/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models -isystem /opt/rocm/include -isystem /opt/rocm-5.6.0/include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -mfma -mavx2 -mf16c -mavx -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -x hip --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 -MD -MT CMakeFiles/ctransformers.dir/models/llm.cc.o -MF CMakeFiles/ctransformers.dir/models/llm.cc.o.d -o CMakeFiles/ctransformers.dir/models/llm.cc.o -c /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llm.cc
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llm.cc:1:
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llm.h:4:
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/common.h:24:
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.h:46:
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda-ggllm.h:15:25: error: field has incomplete type 'struct cudaDeviceProp'
struct cudaDeviceProp device_props[GGML_CUDA_MAX_DEVICES];
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda-ggllm.h:15:10: note: forward declaration of 'cudaDeviceProp'
struct cudaDeviceProp device_props[GGML_CUDA_MAX_DEVICES];
^
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llm.cc:1:
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llm.h:4:
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/common.h:239:35: warning: braces around scalar initializer [-Wbraced-scalar-init]
return ct_new_tensor(ctx, type, {x}, gpu);
^~~
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llm.cc:1:
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llm.h:28:58: warning: unused parameter 'add_bos_token' [-Wunused-parameter]
const bool add_bos_token) const {
^
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llm.cc:7:
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llms/llama.cc:5:
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/llama.cpp:6:
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/llama.h:31:13: warning: 'DEPRECATED' macro redefined [-Wmacro-redefined]
# define DEPRECATED(func, hint) func __attribute__((deprecated(hint)))
^
/opt/rocm/include/hip/hip_runtime_api.h:494:9: note: previous definition is here
#define DEPRECATED(msg) __attribute__ ((deprecated(msg)))
^
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llm.cc:7:
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llms/llama.cc:7:51: warning: unused parameter 'level' [-Wunused-parameter]
static void ct_llama_log_callback(llama_log_level level, const char *text,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llms/llama.cc:7:70: warning: unused parameter 'text' [-Wunused-parameter]
static void ct_llama_log_callback(llama_log_level level, const char *text,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llms/llama.cc:8:41: warning: unused parameter 'user_data' [-Wunused-parameter]
void *user_data) {}
^
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llm.cc:9:
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llms/replit.cc:631:50: warning: unused parameter 'add_bos_token' [-Wunused-parameter]
const bool add_bos_token) const override {
^
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llm.cc:15:
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llms/falcon.cc:5:
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/libfalcon.cpp:18:10: fatal error: 'cuda_runtime.h' file not found
#include <cuda_runtime.h>
^~~~~~~~~~~~~~~~
7 warnings and 2 errors generated when compiling for gfx1030.
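Both gfx1030 errors have the same shape: the vendored sources reach for CUDA-only names (struct cudaDeviceProp in ggml-cuda-ggllm.h, <cuda_runtime.h> in libfalcon.cpp) without routing them through HIP, and on a ROCm-only machine those headers do not exist. Notably, ggml-cuda.cu itself gets through with warnings only (step 6/8 below); the errors come from ggml.c, llm.cc, and libfalcon.cpp. A hedged sketch of the usual fix, assuming the same mapping approach as upstream llama.cpp's hipBLAS port rather than anything taken from this package:

    // Sketch of a HIP compatibility shim. The HIP names on the right are real
    // ROCm APIs; whether and where ctransformers' vendored sources need this
    // exact block is an assumption.
    #if defined(GGML_USE_HIPBLAS)
    #include <hip/hip_runtime.h>                       // instead of <cuda_runtime.h>
    #define cudaDeviceProp          hipDeviceProp_t
    #define cudaGetDeviceProperties hipGetDeviceProperties
    #else
    #include <cuda_runtime.h>
    #endif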
[5/8] Building CXX object CMakeFiles/ctransformers.dir/models/ggml/cmpnct_unicode.cpp.o
[6/8] Building CXX object CMakeFiles/ctransformers.dir/models/ggml/ggml-cuda.cu.o
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:166:41: warning: cast from 'const signed char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:176:41: warning: cast from 'const unsigned char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:186:22: warning: cast from 'const signed char *' to 'int *' drops const qualifier [-Wcast-qual]
return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:190:22: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2047:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2047:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2057:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2058:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2108:37: warning: cast from 'const __half2 *' to 'float *' drops const qualifier [-Wcast-qual]
const float * x_dmf = (float *) x_dm;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2104:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2104:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2141:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2141:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2151:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2152:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2195:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2195:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2233:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2233:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2243:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2244:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2307:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2307:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2347:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2347:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2357:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2358:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2418:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2418:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2453:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q8_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2453:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q8_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2463:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2464:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2508:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2508:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2542:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q2_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2554:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2611:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2767:41: warning: cast from 'const int *' to 'signed char *' drops const qualifier [-Wcast-qual]
const int8_t * scales = ((int8_t *) (x_sc + i * (WARP_SIZE/4) + i/4 + kbx*4)) + ky/4;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2881:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2893:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2962:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3062:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3074:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3154:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3191:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q6_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3203:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3274:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:5693:72: warning: unused parameter 'i02' [-Wunused-parameter]
float * src0_ddf_i, float * src1_ddf_i, float * dst_ddf_i, int64_t i02, int64_t i01_low, int64_t i01_high, int i1,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3423:9: note: in instantiation of function template specialization 'load_tiles_q4_0<64, 8, false>' requested here
load_tiles_q4_0<mmq_y, nwarps, need_check>, VDR_Q4_0_Q8_1_MMQ, vec_dot_q4_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4520:9: note: in instantiation of function template specialization 'mul_mat_q4_0<false>' requested here
mul_mat_q4_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3423:9: note: in instantiation of function template specialization 'load_tiles_q4_0<64, 8, true>' requested here
load_tiles_q4_0<mmq_y, nwarps, need_check>, VDR_Q4_0_Q8_1_MMQ, vec_dot_q4_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4524:9: note: in instantiation of function template specialization 'mul_mat_q4_0<true>' requested here
mul_mat_q4_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3461:9: note: in instantiation of function template specialization 'load_tiles_q4_1<64, 8, false>' requested here
load_tiles_q4_1<mmq_y, nwarps, need_check>, VDR_Q4_1_Q8_1_MMQ, vec_dot_q4_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4557:9: note: in instantiation of function template specialization 'mul_mat_q4_1<false>' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3461:9: note: in instantiation of function template specialization 'load_tiles_q4_1<64, 8, true>' requested here
load_tiles_q4_1<mmq_y, nwarps, need_check>, VDR_Q4_1_Q8_1_MMQ, vec_dot_q4_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4561:9: note: in instantiation of function template specialization 'mul_mat_q4_1<true>' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3495:9: note: in instantiation of function template specialization 'load_tiles_q5_0<64, 8, false>' requested here
load_tiles_q5_0<mmq_y, nwarps, need_check>, VDR_Q5_0_Q8_1_MMQ, vec_dot_q5_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4594:9: note: in instantiation of function template specialization 'mul_mat_q5_0<false>' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3495:9: note: in instantiation of function template specialization 'load_tiles_q5_0<64, 8, true>' requested here
load_tiles_q5_0<mmq_y, nwarps, need_check>, VDR_Q5_0_Q8_1_MMQ, vec_dot_q5_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4598:9: note: in instantiation of function template specialization 'mul_mat_q5_0<true>' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3529:9: note: in instantiation of function template specialization 'load_tiles_q5_1<64, 8, false>' requested here
load_tiles_q5_1<mmq_y, nwarps, need_check>, VDR_Q5_1_Q8_1_MMQ, vec_dot_q5_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4631:9: note: in instantiation of function template specialization 'mul_mat_q5_1<false>' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3529:9: note: in instantiation of function template specialization 'load_tiles_q5_1<64, 8, true>' requested here
load_tiles_q5_1<mmq_y, nwarps, need_check>, VDR_Q5_1_Q8_1_MMQ, vec_dot_q5_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4635:9: note: in instantiation of function template specialization 'mul_mat_q5_1<true>' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3563:9: note: in instantiation of function template specialization 'load_tiles_q8_0<64, 8, false>' requested here
load_tiles_q8_0<mmq_y, nwarps, need_check>, VDR_Q8_0_Q8_1_MMQ, vec_dot_q8_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4668:9: note: in instantiation of function template specialization 'mul_mat_q8_0<false>' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3563:9: note: in instantiation of function template specialization 'load_tiles_q8_0<64, 8, true>' requested here
load_tiles_q8_0<mmq_y, nwarps, need_check>, VDR_Q8_0_Q8_1_MMQ, vec_dot_q8_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4672:9: note: in instantiation of function template specialization 'mul_mat_q8_0<true>' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3597:9: note: in instantiation of function template specialization 'load_tiles_q2_K<64, 8, false>' requested here
load_tiles_q2_K<mmq_y, nwarps, need_check>, VDR_Q2_K_Q8_1_MMQ, vec_dot_q2_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4705:9: note: in instantiation of function template specialization 'mul_mat_q2_K<false>' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3597:9: note: in instantiation of function template specialization 'load_tiles_q2_K<64, 8, true>' requested here
load_tiles_q2_K<mmq_y, nwarps, need_check>, VDR_Q2_K_Q8_1_MMQ, vec_dot_q2_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4709:9: note: in instantiation of function template specialization 'mul_mat_q2_K<true>' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3635:9: note: in instantiation of function template specialization 'load_tiles_q3_K<64, 8, false>' requested here
load_tiles_q3_K<mmq_y, nwarps, need_check>, VDR_Q3_K_Q8_1_MMQ, vec_dot_q3_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4744:9: note: in instantiation of function template specialization 'mul_mat_q3_K<false>' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3635:9: note: in instantiation of function template specialization 'load_tiles_q3_K<64, 8, true>' requested here
load_tiles_q3_K<mmq_y, nwarps, need_check>, VDR_Q3_K_Q8_1_MMQ, vec_dot_q3_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4748:9: note: in instantiation of function template specialization 'mul_mat_q3_K<true>' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3673:9: note: in instantiation of function template specialization 'load_tiles_q4_K<64, 8, false>' requested here
load_tiles_q4_K<mmq_y, nwarps, need_check>, VDR_Q4_K_Q8_1_MMQ, vec_dot_q4_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4782:9: note: in instantiation of function template specialization 'mul_mat_q4_K<false>' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3673:9: note: in instantiation of function template specialization 'load_tiles_q4_K<64, 8, true>' requested here
load_tiles_q4_K<mmq_y, nwarps, need_check>, VDR_Q4_K_Q8_1_MMQ, vec_dot_q4_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4786:9: note: in instantiation of function template specialization 'mul_mat_q4_K<true>' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3707:9: note: in instantiation of function template specialization 'load_tiles_q5_K<64, 8, false>' requested here
load_tiles_q5_K<mmq_y, nwarps, need_check>, VDR_Q5_K_Q8_1_MMQ, vec_dot_q5_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4819:9: note: in instantiation of function template specialization 'mul_mat_q5_K<false>' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3707:9: note: in instantiation of function template specialization 'load_tiles_q5_K<64, 8, true>' requested here
load_tiles_q5_K<mmq_y, nwarps, need_check>, VDR_Q5_K_Q8_1_MMQ, vec_dot_q5_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4823:9: note: in instantiation of function template specialization 'mul_mat_q5_K<true>' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, false>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4856:9: note: in instantiation of function template specialization 'mul_mat_q6_K<false>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, true>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4860:9: note: in instantiation of function template specialization 'mul_mat_q6_K<true>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
85 warnings generated when compiling for gfx1030.
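Everything from here down is the same set of -Wcast-qual and -Wunused-parameter warnings repeated for the remaining --offload-arch targets; none of it is fatal. For reference, the -Wcast-qual instances all come from C-style casts that discard const, as in this stand-alone illustration (not the project's code):

    // Illustrative only: what -Wcast-qual flags, and a const-preserving alternative.
    struct block { int data[8]; };

    int first_value(const void *vx) {
        // const block *b = (block *) vx;                // warns: cast drops const
        const block *b = static_cast<const block *>(vx); // keeps const, no warning
        return b->data[0];
    }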
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:166:41: warning: cast from 'const signed char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:176:41: warning: cast from 'const unsigned char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:186:22: warning: cast from 'const signed char *' to 'int *' drops const qualifier [-Wcast-qual]
return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:190:22: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2047:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2047:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2057:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2058:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2108:37: warning: cast from 'const __half2 *' to 'float *' drops const qualifier [-Wcast-qual]
const float * x_dmf = (float *) x_dm;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2104:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2104:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2141:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2141:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2151:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2152:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2195:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2195:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2233:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2233:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2243:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2244:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2307:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2307:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2347:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2347:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2357:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2358:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2418:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2418:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2453:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q8_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2453:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q8_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2463:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2464:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2508:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2508:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2542:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q2_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2554:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2611:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2767:41: warning: cast from 'const int *' to 'signed char *' drops const qualifier [-Wcast-qual]
const int8_t * scales = ((int8_t *) (x_sc + i * (WARP_SIZE/4) + i/4 + kbx*4)) + ky/4;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2881:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2893:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2962:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3062:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3074:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3154:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3191:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q6_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3203:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3274:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:5693:72: warning: unused parameter 'i02' [-Wunused-parameter]
float * src0_ddf_i, float * src1_ddf_i, float * dst_ddf_i, int64_t i02, int64_t i01_low, int64_t i01_high, int i1,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3423:9: note: in instantiation of function template specialization 'load_tiles_q4_0<64, 8, false>' requested here
load_tiles_q4_0<mmq_y, nwarps, need_check>, VDR_Q4_0_Q8_1_MMQ, vec_dot_q4_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4520:9: note: in instantiation of function template specialization 'mul_mat_q4_0<false>' requested here
mul_mat_q4_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3423:9: note: in instantiation of function template specialization 'load_tiles_q4_0<64, 8, true>' requested here
load_tiles_q4_0<mmq_y, nwarps, need_check>, VDR_Q4_0_Q8_1_MMQ, vec_dot_q4_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4524:9: note: in instantiation of function template specialization 'mul_mat_q4_0<true>' requested here
mul_mat_q4_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3461:9: note: in instantiation of function template specialization 'load_tiles_q4_1<64, 8, false>' requested here
load_tiles_q4_1<mmq_y, nwarps, need_check>, VDR_Q4_1_Q8_1_MMQ, vec_dot_q4_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4557:9: note: in instantiation of function template specialization 'mul_mat_q4_1<false>' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3461:9: note: in instantiation of function template specialization 'load_tiles_q4_1<64, 8, true>' requested here
load_tiles_q4_1<mmq_y, nwarps, need_check>, VDR_Q4_1_Q8_1_MMQ, vec_dot_q4_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4561:9: note: in instantiation of function template specialization 'mul_mat_q4_1<true>' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3495:9: note: in instantiation of function template specialization 'load_tiles_q5_0<64, 8, false>' requested here
load_tiles_q5_0<mmq_y, nwarps, need_check>, VDR_Q5_0_Q8_1_MMQ, vec_dot_q5_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4594:9: note: in instantiation of function template specialization 'mul_mat_q5_0<false>' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3495:9: note: in instantiation of function template specialization 'load_tiles_q5_0<64, 8, true>' requested here
load_tiles_q5_0<mmq_y, nwarps, need_check>, VDR_Q5_0_Q8_1_MMQ, vec_dot_q5_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4598:9: note: in instantiation of function template specialization 'mul_mat_q5_0<true>' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3529:9: note: in instantiation of function template specialization 'load_tiles_q5_1<64, 8, false>' requested here
load_tiles_q5_1<mmq_y, nwarps, need_check>, VDR_Q5_1_Q8_1_MMQ, vec_dot_q5_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4631:9: note: in instantiation of function template specialization 'mul_mat_q5_1<false>' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3529:9: note: in instantiation of function template specialization 'load_tiles_q5_1<64, 8, true>' requested here
load_tiles_q5_1<mmq_y, nwarps, need_check>, VDR_Q5_1_Q8_1_MMQ, vec_dot_q5_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4635:9: note: in instantiation of function template specialization 'mul_mat_q5_1<true>' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3563:9: note: in instantiation of function template specialization 'load_tiles_q8_0<64, 8, false>' requested here
load_tiles_q8_0<mmq_y, nwarps, need_check>, VDR_Q8_0_Q8_1_MMQ, vec_dot_q8_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4668:9: note: in instantiation of function template specialization 'mul_mat_q8_0<false>' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3563:9: note: in instantiation of function template specialization 'load_tiles_q8_0<64, 8, true>' requested here
load_tiles_q8_0<mmq_y, nwarps, need_check>, VDR_Q8_0_Q8_1_MMQ, vec_dot_q8_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4672:9: note: in instantiation of function template specialization 'mul_mat_q8_0<true>' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3597:9: note: in instantiation of function template specialization 'load_tiles_q2_K<64, 8, false>' requested here
load_tiles_q2_K<mmq_y, nwarps, need_check>, VDR_Q2_K_Q8_1_MMQ, vec_dot_q2_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4705:9: note: in instantiation of function template specialization 'mul_mat_q2_K<false>' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3597:9: note: in instantiation of function template specialization 'load_tiles_q2_K<64, 8, true>' requested here
load_tiles_q2_K<mmq_y, nwarps, need_check>, VDR_Q2_K_Q8_1_MMQ, vec_dot_q2_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4709:9: note: in instantiation of function template specialization 'mul_mat_q2_K<true>' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3635:9: note: in instantiation of function template specialization 'load_tiles_q3_K<64, 8, false>' requested here
load_tiles_q3_K<mmq_y, nwarps, need_check>, VDR_Q3_K_Q8_1_MMQ, vec_dot_q3_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4744:9: note: in instantiation of function template specialization 'mul_mat_q3_K<false>' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3635:9: note: in instantiation of function template specialization 'load_tiles_q3_K<64, 8, true>' requested here
load_tiles_q3_K<mmq_y, nwarps, need_check>, VDR_Q3_K_Q8_1_MMQ, vec_dot_q3_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4748:9: note: in instantiation of function template specialization 'mul_mat_q3_K<true>' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3673:9: note: in instantiation of function template specialization 'load_tiles_q4_K<64, 8, false>' requested here
load_tiles_q4_K<mmq_y, nwarps, need_check>, VDR_Q4_K_Q8_1_MMQ, vec_dot_q4_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4782:9: note: in instantiation of function template specialization 'mul_mat_q4_K<false>' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3673:9: note: in instantiation of function template specialization 'load_tiles_q4_K<64, 8, true>' requested here
load_tiles_q4_K<mmq_y, nwarps, need_check>, VDR_Q4_K_Q8_1_MMQ, vec_dot_q4_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4786:9: note: in instantiation of function template specialization 'mul_mat_q4_K<true>' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3707:9: note: in instantiation of function template specialization 'load_tiles_q5_K<64, 8, false>' requested here
load_tiles_q5_K<mmq_y, nwarps, need_check>, VDR_Q5_K_Q8_1_MMQ, vec_dot_q5_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4819:9: note: in instantiation of function template specialization 'mul_mat_q5_K<false>' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3707:9: note: in instantiation of function template specialization 'load_tiles_q5_K<64, 8, true>' requested here
load_tiles_q5_K<mmq_y, nwarps, need_check>, VDR_Q5_K_Q8_1_MMQ, vec_dot_q5_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4823:9: note: in instantiation of function template specialization 'mul_mat_q5_K<true>' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, false>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4856:9: note: in instantiation of function template specialization 'mul_mat_q6_K<false>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, true>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4860:9: note: in instantiation of function template specialization 'mul_mat_q6_K<true>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
85 warnings generated when compiling for gfx900.
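Everything up to this point is a warning, not an error: clang's HIP device pass for gfx900 flags the C-style casts in ggml-cuda.cu that drop a const qualifier (-Wcast-qual) and the tile parameters such as x_qh and x_sc that go unused in several template specializations (-Wunused-parameter). A minimal stand-alone sketch of the two patterns, using a hypothetical stand-in for the ggml block type (illustration only, not taken from the build):

struct block_q4_0;  // stand-in for the ggml block type, declared only for this sketch

// 'x_qh' mirrors the unused tile parameters flagged by -Wunused-parameter.
void example(const void * vx, int * x_qh) {
    const block_q4_0 * a = (block_q4_0 *) vx;                    // C-style cast drops 'const' -> -Wcast-qual
    const block_q4_0 * b = static_cast<const block_q4_0 *>(vx);  // const-preserving cast, no warning
    (void) a; (void) b;                                          // keep this sketch free of unused-variable noise
}

None of these warnings is fatal on its own; the actual failure is reported further down in the log.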
[... the same warnings are emitted again for the remaining offload target(s); duplicate output trimmed ...]
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, false>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4856:9: note: in instantiation of function template specialization 'mul_mat_q6_K<false>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, true>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4860:9: note: in instantiation of function template specialization 'mul_mat_q6_K<true>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
85 warnings generated when compiling for gfx906.
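
Most of the diagnostics in the block above are the same -Wcast-qual pattern: ggml-cuda.cu uses a C-style cast that turns a const void * into a non-const block pointer, so clang for each gfx target reports that the const qualifier is dropped. Below is a minimal, self-contained sketch of that pattern and the const-preserving form that would avoid the warning; the struct is a simplified stand-in, not ggml's actual block_q4_0 layout.

    // cast_qual_sketch.cpp -- illustrative only; block_stub is a stand-in for
    // ggml's quantization block types, not the real struct definition.
    struct block_stub { unsigned char qs[16]; };

    static const block_stub * load_block(const void * vx) {
        // Pattern from the log, warns under -Wcast-qual because the C-style
        // cast silently discards const:
        //   const block_stub * bx0 = (block_stub *) vx;

        // Const-correct cast: same result, qualifier kept, no warning.
        const block_stub * bx0 = (const block_stub *) vx;
        return bx0;
    }

Either way these are warnings only; they do not stop the per-architecture compile.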
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:166:41: warning: cast from 'const signed char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:176:41: warning: cast from 'const unsigned char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:186:22: warning: cast from 'const signed char *' to 'int *' drops const qualifier [-Wcast-qual]
return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:190:22: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2047:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2047:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2057:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2058:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2108:37: warning: cast from 'const __half2 *' to 'float *' drops const qualifier [-Wcast-qual]
const float * x_dmf = (float *) x_dm;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2104:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2104:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2141:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2141:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2151:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2152:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2195:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2195:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2233:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2233:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2243:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2244:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2307:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2307:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2347:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2347:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2357:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2358:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2418:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2418:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2453:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q8_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2453:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q8_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2463:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2464:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2508:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2508:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2542:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q2_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2554:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2611:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2767:41: warning: cast from 'const int *' to 'signed char *' drops const qualifier [-Wcast-qual]
const int8_t * scales = ((int8_t *) (x_sc + i * (WARP_SIZE/4) + i/4 + kbx*4)) + ky/4;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2881:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2893:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2962:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3062:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3074:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3154:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3191:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q6_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3203:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3274:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:5693:72: warning: unused parameter 'i02' [-Wunused-parameter]
float * src0_ddf_i, float * src1_ddf_i, float * dst_ddf_i, int64_t i02, int64_t i01_low, int64_t i01_high, int i1,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3423:9: note: in instantiation of function template specialization 'load_tiles_q4_0<64, 8, false>' requested here
load_tiles_q4_0<mmq_y, nwarps, need_check>, VDR_Q4_0_Q8_1_MMQ, vec_dot_q4_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4520:9: note: in instantiation of function template specialization 'mul_mat_q4_0<false>' requested here
mul_mat_q4_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3423:9: note: in instantiation of function template specialization 'load_tiles_q4_0<64, 8, true>' requested here
load_tiles_q4_0<mmq_y, nwarps, need_check>, VDR_Q4_0_Q8_1_MMQ, vec_dot_q4_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4524:9: note: in instantiation of function template specialization 'mul_mat_q4_0<true>' requested here
mul_mat_q4_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3461:9: note: in instantiation of function template specialization 'load_tiles_q4_1<64, 8, false>' requested here
load_tiles_q4_1<mmq_y, nwarps, need_check>, VDR_Q4_1_Q8_1_MMQ, vec_dot_q4_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4557:9: note: in instantiation of function template specialization 'mul_mat_q4_1<false>' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3461:9: note: in instantiation of function template specialization 'load_tiles_q4_1<64, 8, true>' requested here
load_tiles_q4_1<mmq_y, nwarps, need_check>, VDR_Q4_1_Q8_1_MMQ, vec_dot_q4_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4561:9: note: in instantiation of function template specialization 'mul_mat_q4_1<true>' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3495:9: note: in instantiation of function template specialization 'load_tiles_q5_0<64, 8, false>' requested here
load_tiles_q5_0<mmq_y, nwarps, need_check>, VDR_Q5_0_Q8_1_MMQ, vec_dot_q5_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4594:9: note: in instantiation of function template specialization 'mul_mat_q5_0<false>' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3495:9: note: in instantiation of function template specialization 'load_tiles_q5_0<64, 8, true>' requested here
load_tiles_q5_0<mmq_y, nwarps, need_check>, VDR_Q5_0_Q8_1_MMQ, vec_dot_q5_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4598:9: note: in instantiation of function template specialization 'mul_mat_q5_0<true>' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3529:9: note: in instantiation of function template specialization 'load_tiles_q5_1<64, 8, false>' requested here
load_tiles_q5_1<mmq_y, nwarps, need_check>, VDR_Q5_1_Q8_1_MMQ, vec_dot_q5_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4631:9: note: in instantiation of function template specialization 'mul_mat_q5_1<false>' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3529:9: note: in instantiation of function template specialization 'load_tiles_q5_1<64, 8, true>' requested here
load_tiles_q5_1<mmq_y, nwarps, need_check>, VDR_Q5_1_Q8_1_MMQ, vec_dot_q5_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4635:9: note: in instantiation of function template specialization 'mul_mat_q5_1<true>' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3563:9: note: in instantiation of function template specialization 'load_tiles_q8_0<64, 8, false>' requested here
load_tiles_q8_0<mmq_y, nwarps, need_check>, VDR_Q8_0_Q8_1_MMQ, vec_dot_q8_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4668:9: note: in instantiation of function template specialization 'mul_mat_q8_0<false>' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3563:9: note: in instantiation of function template specialization 'load_tiles_q8_0<64, 8, true>' requested here
load_tiles_q8_0<mmq_y, nwarps, need_check>, VDR_Q8_0_Q8_1_MMQ, vec_dot_q8_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4672:9: note: in instantiation of function template specialization 'mul_mat_q8_0<true>' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3597:9: note: in instantiation of function template specialization 'load_tiles_q2_K<64, 8, false>' requested here
load_tiles_q2_K<mmq_y, nwarps, need_check>, VDR_Q2_K_Q8_1_MMQ, vec_dot_q2_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4705:9: note: in instantiation of function template specialization 'mul_mat_q2_K<false>' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3597:9: note: in instantiation of function template specialization 'load_tiles_q2_K<64, 8, true>' requested here
load_tiles_q2_K<mmq_y, nwarps, need_check>, VDR_Q2_K_Q8_1_MMQ, vec_dot_q2_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4709:9: note: in instantiation of function template specialization 'mul_mat_q2_K<true>' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3635:9: note: in instantiation of function template specialization 'load_tiles_q3_K<64, 8, false>' requested here
load_tiles_q3_K<mmq_y, nwarps, need_check>, VDR_Q3_K_Q8_1_MMQ, vec_dot_q3_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4744:9: note: in instantiation of function template specialization 'mul_mat_q3_K<false>' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3635:9: note: in instantiation of function template specialization 'load_tiles_q3_K<64, 8, true>' requested here
load_tiles_q3_K<mmq_y, nwarps, need_check>, VDR_Q3_K_Q8_1_MMQ, vec_dot_q3_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4748:9: note: in instantiation of function template specialization 'mul_mat_q3_K<true>' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3673:9: note: in instantiation of function template specialization 'load_tiles_q4_K<64, 8, false>' requested here
load_tiles_q4_K<mmq_y, nwarps, need_check>, VDR_Q4_K_Q8_1_MMQ, vec_dot_q4_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4782:9: note: in instantiation of function template specialization 'mul_mat_q4_K<false>' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3673:9: note: in instantiation of function template specialization 'load_tiles_q4_K<64, 8, true>' requested here
load_tiles_q4_K<mmq_y, nwarps, need_check>, VDR_Q4_K_Q8_1_MMQ, vec_dot_q4_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4786:9: note: in instantiation of function template specialization 'mul_mat_q4_K<true>' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3707:9: note: in instantiation of function template specialization 'load_tiles_q5_K<64, 8, false>' requested here
load_tiles_q5_K<mmq_y, nwarps, need_check>, VDR_Q5_K_Q8_1_MMQ, vec_dot_q5_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4819:9: note: in instantiation of function template specialization 'mul_mat_q5_K<false>' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3707:9: note: in instantiation of function template specialization 'load_tiles_q5_K<64, 8, true>' requested here
load_tiles_q5_K<mmq_y, nwarps, need_check>, VDR_Q5_K_Q8_1_MMQ, vec_dot_q5_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4823:9: note: in instantiation of function template specialization 'mul_mat_q5_K<true>' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, false>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4856:9: note: in instantiation of function template specialization 'mul_mat_q6_K<false>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, true>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4860:9: note: in instantiation of function template specialization 'mul_mat_q6_K<true>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
85 warnings generated when compiling for gfx908.
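
The remaining noise for each architecture is -Wunused-parameter: the tile helpers share one signature across all quantization types, so parameters such as x_qh and x_sc go unused in the variants that do not need them. A minimal sketch of that situation and the usual ways it is silenced follows; the names mirror the log but the signature is simplified, not ggml's actual allocate_tiles_* template.

    // unused_param_sketch.cpp -- illustrative only, simplified from the pattern
    // reported in the log (the real helpers are __device__ CUDA/HIP templates).
    template <int mmq_y>
    static void allocate_tiles_stub(int ** x_ql, int ** x_qh, int ** x_sc) {
        *x_ql = nullptr;   // only x_ql is needed for this (hypothetical) quant type
        (void) x_qh;       // one common fix: explicitly discard the unused arguments
        (void) x_sc;
    }

    // ... or leave the parameter names out so clang has nothing to flag:
    template <int mmq_y>
    static void allocate_tiles_stub_unnamed(int ** x_ql, int ** /*x_qh*/, int ** /*x_sc*/) {
        *x_ql = nullptr;
    }

As with the cast warnings, these are cosmetic; the actual build failure appears later in the log.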
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:166:41: warning: cast from 'const signed char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:176:41: warning: cast from 'const unsigned char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:186:22: warning: cast from 'const signed char *' to 'int *' drops const qualifier [-Wcast-qual]
return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:190:22: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2047:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2047:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2057:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2058:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2108:37: warning: cast from 'const __half2 *' to 'float *' drops const qualifier [-Wcast-qual]
const float * x_dmf = (float *) x_dm;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2104:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2104:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2141:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2141:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2151:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2152:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2195:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2195:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2233:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2233:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2243:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2244:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2307:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2307:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2347:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2347:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2357:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2358:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2418:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2418:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2453:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q8_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2453:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q8_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2463:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2464:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2508:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2508:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2542:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q2_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2554:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2611:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2767:41: warning: cast from 'const int *' to 'signed char *' drops const qualifier [-Wcast-qual]
const int8_t * scales = ((int8_t *) (x_sc + i * (WARP_SIZE/4) + i/4 + kbx*4)) + ky/4;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2881:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2893:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2962:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3062:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3074:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3154:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3191:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q6_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3203:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3274:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:5693:72: warning: unused parameter 'i02' [-Wunused-parameter]
float * src0_ddf_i, float * src1_ddf_i, float * dst_ddf_i, int64_t i02, int64_t i01_low, int64_t i01_high, int i1,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3423:9: note: in instantiation of function template specialization 'load_tiles_q4_0<64, 8, false>' requested here
load_tiles_q4_0<mmq_y, nwarps, need_check>, VDR_Q4_0_Q8_1_MMQ, vec_dot_q4_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4520:9: note: in instantiation of function template specialization 'mul_mat_q4_0<false>' requested here
mul_mat_q4_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3423:9: note: in instantiation of function template specialization 'load_tiles_q4_0<64, 8, true>' requested here
load_tiles_q4_0<mmq_y, nwarps, need_check>, VDR_Q4_0_Q8_1_MMQ, vec_dot_q4_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4524:9: note: in instantiation of function template specialization 'mul_mat_q4_0<true>' requested here
mul_mat_q4_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3461:9: note: in instantiation of function template specialization 'load_tiles_q4_1<64, 8, false>' requested here
load_tiles_q4_1<mmq_y, nwarps, need_check>, VDR_Q4_1_Q8_1_MMQ, vec_dot_q4_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4557:9: note: in instantiation of function template specialization 'mul_mat_q4_1<false>' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3461:9: note: in instantiation of function template specialization 'load_tiles_q4_1<64, 8, true>' requested here
load_tiles_q4_1<mmq_y, nwarps, need_check>, VDR_Q4_1_Q8_1_MMQ, vec_dot_q4_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4561:9: note: in instantiation of function template specialization 'mul_mat_q4_1<true>' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3495:9: note: in instantiation of function template specialization 'load_tiles_q5_0<64, 8, false>' requested here
load_tiles_q5_0<mmq_y, nwarps, need_check>, VDR_Q5_0_Q8_1_MMQ, vec_dot_q5_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4594:9: note: in instantiation of function template specialization 'mul_mat_q5_0<false>' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3495:9: note: in instantiation of function template specialization 'load_tiles_q5_0<64, 8, true>' requested here
load_tiles_q5_0<mmq_y, nwarps, need_check>, VDR_Q5_0_Q8_1_MMQ, vec_dot_q5_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4598:9: note: in instantiation of function template specialization 'mul_mat_q5_0<true>' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3529:9: note: in instantiation of function template specialization 'load_tiles_q5_1<64, 8, false>' requested here
load_tiles_q5_1<mmq_y, nwarps, need_check>, VDR_Q5_1_Q8_1_MMQ, vec_dot_q5_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4631:9: note: in instantiation of function template specialization 'mul_mat_q5_1<false>' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3529:9: note: in instantiation of function template specialization 'load_tiles_q5_1<64, 8, true>' requested here
load_tiles_q5_1<mmq_y, nwarps, need_check>, VDR_Q5_1_Q8_1_MMQ, vec_dot_q5_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4635:9: note: in instantiation of function template specialization 'mul_mat_q5_1<true>' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3563:9: note: in instantiation of function template specialization 'load_tiles_q8_0<64, 8, false>' requested here
load_tiles_q8_0<mmq_y, nwarps, need_check>, VDR_Q8_0_Q8_1_MMQ, vec_dot_q8_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4668:9: note: in instantiation of function template specialization 'mul_mat_q8_0<false>' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3563:9: note: in instantiation of function template specialization 'load_tiles_q8_0<64, 8, true>' requested here
load_tiles_q8_0<mmq_y, nwarps, need_check>, VDR_Q8_0_Q8_1_MMQ, vec_dot_q8_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4672:9: note: in instantiation of function template specialization 'mul_mat_q8_0<true>' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3597:9: note: in instantiation of function template specialization 'load_tiles_q2_K<64, 8, false>' requested here
load_tiles_q2_K<mmq_y, nwarps, need_check>, VDR_Q2_K_Q8_1_MMQ, vec_dot_q2_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4705:9: note: in instantiation of function template specialization 'mul_mat_q2_K<false>' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3597:9: note: in instantiation of function template specialization 'load_tiles_q2_K<64, 8, true>' requested here
load_tiles_q2_K<mmq_y, nwarps, need_check>, VDR_Q2_K_Q8_1_MMQ, vec_dot_q2_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4709:9: note: in instantiation of function template specialization 'mul_mat_q2_K<true>' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3635:9: note: in instantiation of function template specialization 'load_tiles_q3_K<64, 8, false>' requested here
load_tiles_q3_K<mmq_y, nwarps, need_check>, VDR_Q3_K_Q8_1_MMQ, vec_dot_q3_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4744:9: note: in instantiation of function template specialization 'mul_mat_q3_K<false>' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3635:9: note: in instantiation of function template specialization 'load_tiles_q3_K<64, 8, true>' requested here
load_tiles_q3_K<mmq_y, nwarps, need_check>, VDR_Q3_K_Q8_1_MMQ, vec_dot_q3_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4748:9: note: in instantiation of function template specialization 'mul_mat_q3_K<true>' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3673:9: note: in instantiation of function template specialization 'load_tiles_q4_K<64, 8, false>' requested here
load_tiles_q4_K<mmq_y, nwarps, need_check>, VDR_Q4_K_Q8_1_MMQ, vec_dot_q4_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4782:9: note: in instantiation of function template specialization 'mul_mat_q4_K<false>' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3673:9: note: in instantiation of function template specialization 'load_tiles_q4_K<64, 8, true>' requested here
load_tiles_q4_K<mmq_y, nwarps, need_check>, VDR_Q4_K_Q8_1_MMQ, vec_dot_q4_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4786:9: note: in instantiation of function template specialization 'mul_mat_q4_K<true>' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3707:9: note: in instantiation of function template specialization 'load_tiles_q5_K<64, 8, false>' requested here
load_tiles_q5_K<mmq_y, nwarps, need_check>, VDR_Q5_K_Q8_1_MMQ, vec_dot_q5_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4819:9: note: in instantiation of function template specialization 'mul_mat_q5_K<false>' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3707:9: note: in instantiation of function template specialization 'load_tiles_q5_K<64, 8, true>' requested here
load_tiles_q5_K<mmq_y, nwarps, need_check>, VDR_Q5_K_Q8_1_MMQ, vec_dot_q5_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4823:9: note: in instantiation of function template specialization 'mul_mat_q5_K<true>' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, false>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4856:9: note: in instantiation of function template specialization 'mul_mat_q6_K<false>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, true>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4860:9: note: in instantiation of function template specialization 'mul_mat_q6_K<true>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
85 warnings generated when compiling for gfx90a.
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:166:41: warning: cast from 'const signed char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:176:41: warning: cast from 'const unsigned char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:186:22: warning: cast from 'const signed char *' to 'int *' drops const qualifier [-Wcast-qual]
return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:190:22: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2047:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2047:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2057:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2058:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2108:37: warning: cast from 'const __half2 *' to 'float *' drops const qualifier [-Wcast-qual]
const float * x_dmf = (float *) x_dm;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2104:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2104:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2141:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2141:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2151:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2152:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2195:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2195:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2233:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2233:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2243:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2244:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2307:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2307:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2347:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2347:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2357:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2358:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2418:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2418:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2453:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q8_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2453:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q8_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2463:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2464:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2508:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2508:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2542:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q2_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2554:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2611:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2767:41: warning: cast from 'const int *' to 'signed char *' drops const qualifier [-Wcast-qual]
const int8_t * scales = ((int8_t *) (x_sc + i * (WARP_SIZE/4) + i/4 + kbx*4)) + ky/4;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2881:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2893:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2962:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3062:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3074:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3154:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3191:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q6_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3203:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3274:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:5693:72: warning: unused parameter 'i02' [-Wunused-parameter]
float * src0_ddf_i, float * src1_ddf_i, float * dst_ddf_i, int64_t i02, int64_t i01_low, int64_t i01_high, int i1,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3423:9: note: in instantiation of function template specialization 'load_tiles_q4_0<64, 8, false>' requested here
load_tiles_q4_0<mmq_y, nwarps, need_check>, VDR_Q4_0_Q8_1_MMQ, vec_dot_q4_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4520:9: note: in instantiation of function template specialization 'mul_mat_q4_0<false>' requested here
mul_mat_q4_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3423:9: note: in instantiation of function template specialization 'load_tiles_q4_0<64, 8, true>' requested here
load_tiles_q4_0<mmq_y, nwarps, need_check>, VDR_Q4_0_Q8_1_MMQ, vec_dot_q4_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4524:9: note: in instantiation of function template specialization 'mul_mat_q4_0<true>' requested here
mul_mat_q4_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3461:9: note: in instantiation of function template specialization 'load_tiles_q4_1<64, 8, false>' requested here
load_tiles_q4_1<mmq_y, nwarps, need_check>, VDR_Q4_1_Q8_1_MMQ, vec_dot_q4_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4557:9: note: in instantiation of function template specialization 'mul_mat_q4_1<false>' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3461:9: note: in instantiation of function template specialization 'load_tiles_q4_1<64, 8, true>' requested here
load_tiles_q4_1<mmq_y, nwarps, need_check>, VDR_Q4_1_Q8_1_MMQ, vec_dot_q4_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4561:9: note: in instantiation of function template specialization 'mul_mat_q4_1<true>' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3495:9: note: in instantiation of function template specialization 'load_tiles_q5_0<64, 8, false>' requested here
load_tiles_q5_0<mmq_y, nwarps, need_check>, VDR_Q5_0_Q8_1_MMQ, vec_dot_q5_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4594:9: note: in instantiation of function template specialization 'mul_mat_q5_0<false>' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3495:9: note: in instantiation of function template specialization 'load_tiles_q5_0<64, 8, true>' requested here
load_tiles_q5_0<mmq_y, nwarps, need_check>, VDR_Q5_0_Q8_1_MMQ, vec_dot_q5_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4598:9: note: in instantiation of function template specialization 'mul_mat_q5_0<true>' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3529:9: note: in instantiation of function template specialization 'load_tiles_q5_1<64, 8, false>' requested here
load_tiles_q5_1<mmq_y, nwarps, need_check>, VDR_Q5_1_Q8_1_MMQ, vec_dot_q5_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4631:9: note: in instantiation of function template specialization 'mul_mat_q5_1<false>' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3529:9: note: in instantiation of function template specialization 'load_tiles_q5_1<64, 8, true>' requested here
load_tiles_q5_1<mmq_y, nwarps, need_check>, VDR_Q5_1_Q8_1_MMQ, vec_dot_q5_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4635:9: note: in instantiation of function template specialization 'mul_mat_q5_1<true>' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3563:9: note: in instantiation of function template specialization 'load_tiles_q8_0<64, 8, false>' requested here
load_tiles_q8_0<mmq_y, nwarps, need_check>, VDR_Q8_0_Q8_1_MMQ, vec_dot_q8_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4668:9: note: in instantiation of function template specialization 'mul_mat_q8_0<false>' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3563:9: note: in instantiation of function template specialization 'load_tiles_q8_0<64, 8, true>' requested here
load_tiles_q8_0<mmq_y, nwarps, need_check>, VDR_Q8_0_Q8_1_MMQ, vec_dot_q8_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4672:9: note: in instantiation of function template specialization 'mul_mat_q8_0<true>' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3597:9: note: in instantiation of function template specialization 'load_tiles_q2_K<64, 8, false>' requested here
load_tiles_q2_K<mmq_y, nwarps, need_check>, VDR_Q2_K_Q8_1_MMQ, vec_dot_q2_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4705:9: note: in instantiation of function template specialization 'mul_mat_q2_K<false>' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3597:9: note: in instantiation of function template specialization 'load_tiles_q2_K<64, 8, true>' requested here
load_tiles_q2_K<mmq_y, nwarps, need_check>, VDR_Q2_K_Q8_1_MMQ, vec_dot_q2_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4709:9: note: in instantiation of function template specialization 'mul_mat_q2_K<true>' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3635:9: note: in instantiation of function template specialization 'load_tiles_q3_K<64, 8, false>' requested here
load_tiles_q3_K<mmq_y, nwarps, need_check>, VDR_Q3_K_Q8_1_MMQ, vec_dot_q3_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4744:9: note: in instantiation of function template specialization 'mul_mat_q3_K<false>' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3635:9: note: in instantiation of function template specialization 'load_tiles_q3_K<64, 8, true>' requested here
load_tiles_q3_K<mmq_y, nwarps, need_check>, VDR_Q3_K_Q8_1_MMQ, vec_dot_q3_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4748:9: note: in instantiation of function template specialization 'mul_mat_q3_K<true>' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3673:9: note: in instantiation of function template specialization 'load_tiles_q4_K<64, 8, false>' requested here
load_tiles_q4_K<mmq_y, nwarps, need_check>, VDR_Q4_K_Q8_1_MMQ, vec_dot_q4_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4782:9: note: in instantiation of function template specialization 'mul_mat_q4_K<false>' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3673:9: note: in instantiation of function template specialization 'load_tiles_q4_K<64, 8, true>' requested here
load_tiles_q4_K<mmq_y, nwarps, need_check>, VDR_Q4_K_Q8_1_MMQ, vec_dot_q4_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4786:9: note: in instantiation of function template specialization 'mul_mat_q4_K<true>' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3707:9: note: in instantiation of function template specialization 'load_tiles_q5_K<64, 8, false>' requested here
load_tiles_q5_K<mmq_y, nwarps, need_check>, VDR_Q5_K_Q8_1_MMQ, vec_dot_q5_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4819:9: note: in instantiation of function template specialization 'mul_mat_q5_K<false>' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3707:9: note: in instantiation of function template specialization 'load_tiles_q5_K<64, 8, true>' requested here
load_tiles_q5_K<mmq_y, nwarps, need_check>, VDR_Q5_K_Q8_1_MMQ, vec_dot_q5_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4823:9: note: in instantiation of function template specialization 'mul_mat_q5_K<true>' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, false>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4856:9: note: in instantiation of function template specialization 'mul_mat_q6_K<false>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, true>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4860:9: note: in instantiation of function template specialization 'mul_mat_q6_K<true>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
85 warnings generated when compiling for host.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/tmp/pip-build-env-v3f_z5es/overlay/local/lib/python3.10/dist-packages/skbuild/setuptools_wrap.py", line 674, in setup
cmkr.make(make_args, install_target=cmake_install_target, env=env)
File "/tmp/pip-build-env-v3f_z5es/overlay/local/lib/python3.10/dist-packages/skbuild/cmaker.py", line 697, in make
self.make_impl(clargs=clargs, config=config, source_dir=source_dir, install_target=install_target, env=env)
File "/tmp/pip-build-env-v3f_z5es/overlay/local/lib/python3.10/dist-packages/skbuild/cmaker.py", line 742, in make_impl
raise SKBuildError(msg)
An error occurred while building with CMake.
Command:
/tmp/pip-build-env-v3f_z5es/overlay/local/lib/python3.10/dist-packages/cmake/data/bin/cmake --build . --target install --config Release --
Install target:
install
Source directory:
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5
Working directory:
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/_skbuild/linux-x86_64-3.10/cmake-build
Please check the install target is valid and see CMake's output for more information.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for ctransformers
Failed to build ctransformers
ERROR: Could not build wheels for ctransformers, which is required to install pyproject.toml-based projects