@bhargav
Created September 17, 2023 18:41
2023-09 ctransformers AMD (ROCm/hipBLAS) build failure
CC=/opt/rocm/llvm/bin/clang CXX=/opt/rocm/llvm/bin/clang++ CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers
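This builds ctransformers 0.2.27 from source with ROCm's clang as the host compiler (CC/CXX) and the hipBLAS backend enabled (CT_HIPBLAS=1); --no-binary forces the source build instead of the prebuilt wheel. Before blaming the package, a minimal HIP program is a quick way to confirm that the ROCm 5.6 toolchain and runtime work on their own. This is an illustrative sanity check, not part of ctransformers; the file name and build line are my own:

    // hip_check.cpp -- stand-alone sanity check for the ROCm/HIP toolchain.
    // Build with ROCm's compiler driver, e.g.:  hipcc hip_check.cpp -o hip_check
    #include <hip/hip_runtime.h>
    #include <cstdio>

    int main() {
        int count = 0;
        hipError_t err = hipGetDeviceCount(&count);   // query visible AMD GPUs
        if (err != hipSuccess) {
            std::printf("hipGetDeviceCount failed: %s\n", hipGetErrorString(err));
            return 1;
        }
        std::printf("HIP devices visible: %d\n", count);
        return 0;
    }

If that compiles and reports a device, the failure below is specific to how the vendored ggml sources handle the HIP code path, not to the ROCm install itself.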
Collecting ctransformers
Using cached ctransformers-0.2.27.tar.gz (376 kB)
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Getting requirements to build wheel: started
Getting requirements to build wheel: finished with status 'done'
Preparing metadata (pyproject.toml): started
Preparing metadata (pyproject.toml): finished with status 'done'
Collecting py-cpuinfo<10.0.0,>=9.0.0
Using cached py_cpuinfo-9.0.0-py3-none-any.whl (22 kB)
Collecting huggingface-hub
Using cached huggingface_hub-0.17.1-py3-none-any.whl (294 kB)
Collecting fsspec
Using cached fsspec-2023.9.1-py3-none-any.whl (173 kB)
Collecting packaging>=20.9
Using cached packaging-23.1-py3-none-any.whl (48 kB)
Collecting pyyaml>=5.1
Using cached PyYAML-6.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (705 kB)
Collecting typing-extensions>=3.7.4.3
Using cached typing_extensions-4.7.1-py3-none-any.whl (33 kB)
Collecting filelock
Using cached filelock-3.12.4-py3-none-any.whl (11 kB)
Collecting requests
Using cached requests-2.31.0-py3-none-any.whl (62 kB)
Collecting tqdm>=4.42.1
Using cached tqdm-4.66.1-py3-none-any.whl (78 kB)
Collecting charset-normalizer<4,>=2
Using cached charset_normalizer-3.2.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (201 kB)
Collecting urllib3<3,>=1.21.1
Using cached urllib3-2.0.4-py3-none-any.whl (123 kB)
Collecting certifi>=2017.4.17
Using cached certifi-2023.7.22-py3-none-any.whl (158 kB)
Collecting idna<4,>=2.5
Using cached idna-3.4-py3-none-any.whl (61 kB)
Building wheels for collected packages: ctransformers
Building wheel for ctransformers (pyproject.toml): started
Building wheel for ctransformers (pyproject.toml): finished with status 'error'
error: subprocess-exited-with-error
× Building wheel for ctransformers (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [2479 lines of output]
--------------------------------------------------------------------------------
-- Trying 'Ninja' generator
--------------------------------
---------------------------
----------------------
-----------------
------------
-------
--
CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
Not searching for unused variables given on the command line.
-- The C compiler identification is Clang 16.0.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/rocm/llvm/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- The CXX compiler identification is Clang 16.0.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rocm/llvm/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done (0.6s)
-- Generating done (0.0s)
-- Build files have been written to: /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/_cmake_test_compile/build
--
-------
------------
-----------------
----------------------
---------------------------
--------------------------------
-- Trying 'Ninja' generator - success
--------------------------------------------------------------------------------
Configuring Project
Working directory:
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/_skbuild/linux-x86_64-3.10/cmake-build
Command:
/tmp/pip-build-env-v3f_z5es/overlay/local/lib/python3.10/dist-packages/cmake/data/bin/cmake /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5 -G Ninja -DCMAKE_MAKE_PROGRAM:FILEPATH=/tmp/pip-build-env-v3f_z5es/overlay/local/lib/python3.10/dist-packages/ninja/data/bin/ninja --no-warn-unused-cli -DCMAKE_INSTALL_PREFIX:PATH=/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/_skbuild/linux-x86_64-3.10/cmake-install -DPYTHON_VERSION_STRING:STRING=3.10.6 -DSKBUILD:INTERNAL=TRUE -DCMAKE_MODULE_PATH:PATH=/tmp/pip-build-env-v3f_z5es/overlay/local/lib/python3.10/dist-packages/skbuild/resources/cmake -DPYTHON_EXECUTABLE:PATH=/usr/bin/python3 -DPYTHON_INCLUDE_DIR:PATH=/usr/include/python3.10 -DPython_EXECUTABLE:PATH=/usr/bin/python3 -DPython_ROOT_DIR:PATH=/usr -DPython_FIND_REGISTRY:STRING=NEVER -DPython_INCLUDE_DIR:PATH=/usr/include/python3.10 -DPython3_EXECUTABLE:PATH=/usr/bin/python3 -DPython3_ROOT_DIR:PATH=/usr -DPython3_FIND_REGISTRY:STRING=NEVER -DPython3_INCLUDE_DIR:PATH=/usr/include/python3.10 -DCMAKE_MAKE_PROGRAM:FILEPATH=/tmp/pip-build-env-v3f_z5es/overlay/local/lib/python3.10/dist-packages/ninja/data/bin/ninja -DCT_HIPBLAS=1 -DCMAKE_BUILD_TYPE:STRING=Release
Not searching for unused variables given on the command line.
-- The C compiler identification is Clang 16.0.0
-- The CXX compiler identification is Clang 16.0.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /opt/rocm/llvm/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/rocm/llvm/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- CT_INSTRUCTIONS: avx2
-- CT_CUBLAS: OFF
-- CT_HIPBLAS: 1
-- CT_METAL: OFF
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- x86 detected
CMake Deprecation Warning at /opt/rocm/lib/cmake/hip/hip-config.cmake:20 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
Call Stack (most recent call first):
CMakeLists.txt:177 (find_package)
-- hip::amdhip64 is SHARED_LIBRARY
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS
-- Performing Test HIP_CLANG_SUPPORTS_PARALLEL_JOBS - Success
CMake Deprecation Warning at /opt/rocm/lib/cmake/hip/hip-config.cmake:20 (cmake_minimum_required):
Compatibility with CMake < 3.5 will be removed from a future version of
CMake.
Update the VERSION argument <min> value or use a ...<max> suffix to tell
CMake that the project does not need compatibility with older versions.
Call Stack (most recent call first):
/tmp/pip-build-env-v3f_z5es/overlay/local/lib/python3.10/dist-packages/cmake/data/share/cmake-3.27/Modules/CMakeFindDependencyMacro.cmake:76 (find_package)
/opt/rocm/lib/cmake/hipblas/hipblas-config.cmake:90 (find_dependency)
CMakeLists.txt:178 (find_package)
-- hip::amdhip64 is SHARED_LIBRARY
-- HIP and hipBLAS found
-- Configuring done (0.9s)
-- Generating done (0.0s)
-- Build files have been written to: /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/_skbuild/linux-x86_64-3.10/cmake-build
[1/8] Building C object CMakeFiles/ctransformers.dir/models/ggml/ggml-alloc.c.o
[2/8] Building C object CMakeFiles/ctransformers.dir/models/ggml/ggml.c.o
FAILED: CMakeFiles/ctransformers.dir/models/ggml/ggml.c.o
/opt/rocm/llvm/bin/clang -DCC_TURING=1000000000 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMQ_Y=64 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_USE_HIPBLAS -DGGML_USE_K_QUANTS -DK_QUANTS_PER_ITERATION=2 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -Dctransformers_EXPORTS -I/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models -isystem /opt/rocm/include -isystem /opt/rocm-5.6.0/include -O3 -DNDEBUG -std=gnu11 -fPIC -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -mfma -mavx2 -mf16c -mavx -MD -MT CMakeFiles/ctransformers.dir/models/ggml/ggml.c.o -MF CMakeFiles/ctransformers.dir/models/ggml/ggml.c.o.d -o CMakeFiles/ctransformers.dir/models/ggml/ggml.c.o -c /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml.c
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml.c:252:
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.h:46:
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda-ggllm.h:15:37: error: array has incomplete element type 'struct cudaDeviceProp'
struct cudaDeviceProp device_props[GGML_CUDA_MAX_DEVICES];
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda-ggllm.h:15:10: note: forward declaration of 'struct cudaDeviceProp'
struct cudaDeviceProp device_props[GGML_CUDA_MAX_DEVICES];
^
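This first error is plain C, not HIP: ggml.c is compiled as an ordinary C translation unit, so no CUDA or HIP header ever defines struct cudaDeviceProp, and the forward declaration in ggml-cuda-ggllm.h leaves it incomplete; an array of an incomplete type cannot be laid out. A minimal reproduction, independent of the project's sources:

    // Illustrative only: why clang reports "array has incomplete element type".
    struct cudaDeviceProp;                   // forward declaration, size unknown
    struct cudaDeviceProp device_props[16];  // error: element type is incomplete
    // The array can only be declared once the full definition is in scope; on a
    // ROCm build that would normally be hipDeviceProp_t mapped to this name
    // (see the note after the gfx1030 errors below).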
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml.c:2413:5: warning: implicit conversion increases floating-point precision: 'float' to 'ggml_float' (aka 'double') [-Wdouble-promotion]
GGML_F16_VEC_REDUCE(sumf, sum);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml.c:2045:37: note: expanded from macro 'GGML_F16_VEC_REDUCE'
#define GGML_F16_VEC_REDUCE GGML_F32Cx8_REDUCE
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml.c:2035:33: note: expanded from macro 'GGML_F32Cx8_REDUCE'
#define GGML_F32Cx8_REDUCE GGML_F32x8_REDUCE
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml.c:1981:11: note: expanded from macro 'GGML_F32x8_REDUCE'
res = _mm_cvtss_f32(_mm_hadd_ps(t1, t1)); \
~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml.c:3456:9: warning: implicit conversion increases floating-point precision: 'float' to 'ggml_float' (aka 'double') [-Wdouble-promotion]
GGML_F16_VEC_REDUCE(sumf[k], sum[k]);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml.c:2045:37: note: expanded from macro 'GGML_F16_VEC_REDUCE'
#define GGML_F16_VEC_REDUCE GGML_F32Cx8_REDUCE
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml.c:2035:33: note: expanded from macro 'GGML_F32Cx8_REDUCE'
#define GGML_F32Cx8_REDUCE GGML_F32x8_REDUCE
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml.c:1981:11: note: expanded from macro 'GGML_F32x8_REDUCE'
res = _mm_cvtss_f32(_mm_hadd_ps(t1, t1)); \
~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 warnings and 1 error generated.
[3/8] Building C object CMakeFiles/ctransformers.dir/models/ggml/k_quants.c.o
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/k_quants.c:186:11: warning: variable 'sum_x' set but not used [-Wunused-but-set-variable]
float sum_x = 0;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/k_quants.c:187:11: warning: variable 'sum_x2' set but not used [-Wunused-but-set-variable]
float sum_x2 = 0;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/k_quants.c:182:14: warning: unused function 'make_qkx1_quants' [-Wunused-function]
static float make_qkx1_quants(int n, int nmax, const float * restrict x, uint8_t * restrict L, float * restrict the_min,
^
3 warnings generated.
[4/8] Building CXX object CMakeFiles/ctransformers.dir/models/llm.cc.o
FAILED: CMakeFiles/ctransformers.dir/models/llm.cc.o
/opt/rocm/llvm/bin/clang++ -DCC_TURING=1000000000 -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMQ_Y=64 -DGGML_CUDA_MMV_Y=1 -DGGML_USE_CUBLAS -DGGML_USE_HIPBLAS -DGGML_USE_K_QUANTS -DK_QUANTS_PER_ITERATION=2 -D__HIP_PLATFORM_AMD__=1 -D__HIP_PLATFORM_HCC__=1 -Dctransformers_EXPORTS -I/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models -isystem /opt/rocm/include -isystem /opt/rocm-5.6.0/include -O3 -DNDEBUG -std=gnu++11 -fPIC -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -mfma -mavx2 -mf16c -mavx -mllvm -amdgpu-early-inline-all=true -mllvm -amdgpu-function-calls=false -x hip --offload-arch=gfx900 --offload-arch=gfx906 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 -MD -MT CMakeFiles/ctransformers.dir/models/llm.cc.o -MF CMakeFiles/ctransformers.dir/models/llm.cc.o.d -o CMakeFiles/ctransformers.dir/models/llm.cc.o -c /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llm.cc
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llm.cc:1:
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llm.h:4:
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/common.h:24:
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.h:46:
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda-ggllm.h:15:25: error: field has incomplete type 'struct cudaDeviceProp'
struct cudaDeviceProp device_props[GGML_CUDA_MAX_DEVICES];
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda-ggllm.h:15:10: note: forward declaration of 'cudaDeviceProp'
struct cudaDeviceProp device_props[GGML_CUDA_MAX_DEVICES];
^
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llm.cc:1:
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llm.h:4:
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/common.h:239:35: warning: braces around scalar initializer [-Wbraced-scalar-init]
return ct_new_tensor(ctx, type, {x}, gpu);
^~~
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llm.cc:1:
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llm.h:28:58: warning: unused parameter 'add_bos_token' [-Wunused-parameter]
const bool add_bos_token) const {
^
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llm.cc:7:
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llms/llama.cc:5:
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/llama.cpp:6:
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/llama.h:31:13: warning: 'DEPRECATED' macro redefined [-Wmacro-redefined]
# define DEPRECATED(func, hint) func __attribute__((deprecated(hint)))
^
/opt/rocm/include/hip/hip_runtime_api.h:494:9: note: previous definition is here
#define DEPRECATED(msg) __attribute__ ((deprecated(msg)))
^
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llm.cc:7:
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llms/llama.cc:7:51: warning: unused parameter 'level' [-Wunused-parameter]
static void ct_llama_log_callback(llama_log_level level, const char *text,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llms/llama.cc:7:70: warning: unused parameter 'text' [-Wunused-parameter]
static void ct_llama_log_callback(llama_log_level level, const char *text,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llms/llama.cc:8:41: warning: unused parameter 'user_data' [-Wunused-parameter]
void *user_data) {}
^
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llm.cc:9:
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llms/replit.cc:631:50: warning: unused parameter 'add_bos_token' [-Wunused-parameter]
const bool add_bos_token) const override {
^
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llm.cc:15:
In file included from /tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/llms/falcon.cc:5:
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/libfalcon.cpp:18:10: fatal error: 'cuda_runtime.h' file not found
#include <cuda_runtime.h>
^~~~~~~~~~~~~~~~
7 warnings and 2 errors generated when compiling for gfx1030.
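Both gfx1030 errors have the same shape: the vendored sources reach for CUDA-only names (struct cudaDeviceProp in ggml-cuda-ggllm.h, <cuda_runtime.h> in libfalcon.cpp) without routing them through HIP, and on a ROCm-only machine those headers do not exist. Notably, ggml-cuda.cu itself gets through with warnings only (step 6/8 below); the errors come from ggml.c, llm.cc, and libfalcon.cpp. A hedged sketch of the usual fix, assuming the same mapping approach as upstream llama.cpp's hipBLAS port rather than anything taken from this package:

    // Sketch of a HIP compatibility shim. The HIP names on the right are real
    // ROCm APIs; whether and where ctransformers' vendored sources need this
    // exact block is an assumption.
    #if defined(GGML_USE_HIPBLAS)
    #include <hip/hip_runtime.h>                       // instead of <cuda_runtime.h>
    #define cudaDeviceProp          hipDeviceProp_t
    #define cudaGetDeviceProperties hipGetDeviceProperties
    #else
    #include <cuda_runtime.h>
    #endif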
[5/8] Building CXX object CMakeFiles/ctransformers.dir/models/ggml/cmpnct_unicode.cpp.o
[6/8] Building CXX object CMakeFiles/ctransformers.dir/models/ggml/ggml-cuda.cu.o
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:166:41: warning: cast from 'const signed char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:176:41: warning: cast from 'const unsigned char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:186:22: warning: cast from 'const signed char *' to 'int *' drops const qualifier [-Wcast-qual]
return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:190:22: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2047:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2047:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2057:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2058:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2108:37: warning: cast from 'const __half2 *' to 'float *' drops const qualifier [-Wcast-qual]
const float * x_dmf = (float *) x_dm;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2104:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2104:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2141:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2141:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2151:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2152:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2195:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2195:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2233:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2233:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2243:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2244:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2307:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2307:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2347:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2347:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2357:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2358:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2418:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2418:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2453:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q8_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2453:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q8_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2463:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2464:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2508:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2508:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2542:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q2_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2554:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2611:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2767:41: warning: cast from 'const int *' to 'signed char *' drops const qualifier [-Wcast-qual]
const int8_t * scales = ((int8_t *) (x_sc + i * (WARP_SIZE/4) + i/4 + kbx*4)) + ky/4;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2881:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2893:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2962:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3062:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3074:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3154:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3191:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q6_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3203:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3274:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:5693:72: warning: unused parameter 'i02' [-Wunused-parameter]
float * src0_ddf_i, float * src1_ddf_i, float * dst_ddf_i, int64_t i02, int64_t i01_low, int64_t i01_high, int i1,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3423:9: note: in instantiation of function template specialization 'load_tiles_q4_0<64, 8, false>' requested here
load_tiles_q4_0<mmq_y, nwarps, need_check>, VDR_Q4_0_Q8_1_MMQ, vec_dot_q4_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4520:9: note: in instantiation of function template specialization 'mul_mat_q4_0<false>' requested here
mul_mat_q4_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3423:9: note: in instantiation of function template specialization 'load_tiles_q4_0<64, 8, true>' requested here
load_tiles_q4_0<mmq_y, nwarps, need_check>, VDR_Q4_0_Q8_1_MMQ, vec_dot_q4_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4524:9: note: in instantiation of function template specialization 'mul_mat_q4_0<true>' requested here
mul_mat_q4_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3461:9: note: in instantiation of function template specialization 'load_tiles_q4_1<64, 8, false>' requested here
load_tiles_q4_1<mmq_y, nwarps, need_check>, VDR_Q4_1_Q8_1_MMQ, vec_dot_q4_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4557:9: note: in instantiation of function template specialization 'mul_mat_q4_1<false>' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3461:9: note: in instantiation of function template specialization 'load_tiles_q4_1<64, 8, true>' requested here
load_tiles_q4_1<mmq_y, nwarps, need_check>, VDR_Q4_1_Q8_1_MMQ, vec_dot_q4_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4561:9: note: in instantiation of function template specialization 'mul_mat_q4_1<true>' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3495:9: note: in instantiation of function template specialization 'load_tiles_q5_0<64, 8, false>' requested here
load_tiles_q5_0<mmq_y, nwarps, need_check>, VDR_Q5_0_Q8_1_MMQ, vec_dot_q5_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4594:9: note: in instantiation of function template specialization 'mul_mat_q5_0<false>' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3495:9: note: in instantiation of function template specialization 'load_tiles_q5_0<64, 8, true>' requested here
load_tiles_q5_0<mmq_y, nwarps, need_check>, VDR_Q5_0_Q8_1_MMQ, vec_dot_q5_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4598:9: note: in instantiation of function template specialization 'mul_mat_q5_0<true>' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3529:9: note: in instantiation of function template specialization 'load_tiles_q5_1<64, 8, false>' requested here
load_tiles_q5_1<mmq_y, nwarps, need_check>, VDR_Q5_1_Q8_1_MMQ, vec_dot_q5_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4631:9: note: in instantiation of function template specialization 'mul_mat_q5_1<false>' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3529:9: note: in instantiation of function template specialization 'load_tiles_q5_1<64, 8, true>' requested here
load_tiles_q5_1<mmq_y, nwarps, need_check>, VDR_Q5_1_Q8_1_MMQ, vec_dot_q5_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4635:9: note: in instantiation of function template specialization 'mul_mat_q5_1<true>' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3563:9: note: in instantiation of function template specialization 'load_tiles_q8_0<64, 8, false>' requested here
load_tiles_q8_0<mmq_y, nwarps, need_check>, VDR_Q8_0_Q8_1_MMQ, vec_dot_q8_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4668:9: note: in instantiation of function template specialization 'mul_mat_q8_0<false>' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3563:9: note: in instantiation of function template specialization 'load_tiles_q8_0<64, 8, true>' requested here
load_tiles_q8_0<mmq_y, nwarps, need_check>, VDR_Q8_0_Q8_1_MMQ, vec_dot_q8_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4672:9: note: in instantiation of function template specialization 'mul_mat_q8_0<true>' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3597:9: note: in instantiation of function template specialization 'load_tiles_q2_K<64, 8, false>' requested here
load_tiles_q2_K<mmq_y, nwarps, need_check>, VDR_Q2_K_Q8_1_MMQ, vec_dot_q2_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4705:9: note: in instantiation of function template specialization 'mul_mat_q2_K<false>' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3597:9: note: in instantiation of function template specialization 'load_tiles_q2_K<64, 8, true>' requested here
load_tiles_q2_K<mmq_y, nwarps, need_check>, VDR_Q2_K_Q8_1_MMQ, vec_dot_q2_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4709:9: note: in instantiation of function template specialization 'mul_mat_q2_K<true>' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3635:9: note: in instantiation of function template specialization 'load_tiles_q3_K<64, 8, false>' requested here
load_tiles_q3_K<mmq_y, nwarps, need_check>, VDR_Q3_K_Q8_1_MMQ, vec_dot_q3_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4744:9: note: in instantiation of function template specialization 'mul_mat_q3_K<false>' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3635:9: note: in instantiation of function template specialization 'load_tiles_q3_K<64, 8, true>' requested here
load_tiles_q3_K<mmq_y, nwarps, need_check>, VDR_Q3_K_Q8_1_MMQ, vec_dot_q3_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4748:9: note: in instantiation of function template specialization 'mul_mat_q3_K<true>' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3673:9: note: in instantiation of function template specialization 'load_tiles_q4_K<64, 8, false>' requested here
load_tiles_q4_K<mmq_y, nwarps, need_check>, VDR_Q4_K_Q8_1_MMQ, vec_dot_q4_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4782:9: note: in instantiation of function template specialization 'mul_mat_q4_K<false>' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3673:9: note: in instantiation of function template specialization 'load_tiles_q4_K<64, 8, true>' requested here
load_tiles_q4_K<mmq_y, nwarps, need_check>, VDR_Q4_K_Q8_1_MMQ, vec_dot_q4_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4786:9: note: in instantiation of function template specialization 'mul_mat_q4_K<true>' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3707:9: note: in instantiation of function template specialization 'load_tiles_q5_K<64, 8, false>' requested here
load_tiles_q5_K<mmq_y, nwarps, need_check>, VDR_Q5_K_Q8_1_MMQ, vec_dot_q5_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4819:9: note: in instantiation of function template specialization 'mul_mat_q5_K<false>' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3707:9: note: in instantiation of function template specialization 'load_tiles_q5_K<64, 8, true>' requested here
load_tiles_q5_K<mmq_y, nwarps, need_check>, VDR_Q5_K_Q8_1_MMQ, vec_dot_q5_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4823:9: note: in instantiation of function template specialization 'mul_mat_q5_K<true>' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, false>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4856:9: note: in instantiation of function template specialization 'mul_mat_q6_K<false>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, true>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4860:9: note: in instantiation of function template specialization 'mul_mat_q6_K<true>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
85 warnings generated when compiling for gfx1030.
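Everything from here down is the same set of -Wcast-qual and -Wunused-parameter warnings repeated for the remaining --offload-arch targets; none of it is fatal. For reference, the -Wcast-qual instances all come from C-style casts that discard const, as in this stand-alone illustration (not the project's code):

    // Illustrative only: what -Wcast-qual flags, and a const-preserving alternative.
    struct block { int data[8]; };

    int first_value(const void *vx) {
        // const block *b = (block *) vx;                // warns: cast drops const
        const block *b = static_cast<const block *>(vx); // keeps const, no warning
        return b->data[0];
    }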
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:166:41: warning: cast from 'const signed char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:176:41: warning: cast from 'const unsigned char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:186:22: warning: cast from 'const signed char *' to 'int *' drops const qualifier [-Wcast-qual]
return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:190:22: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2047:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2047:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2057:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2058:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2108:37: warning: cast from 'const __half2 *' to 'float *' drops const qualifier [-Wcast-qual]
const float * x_dmf = (float *) x_dm;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2104:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2104:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2141:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2141:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2151:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2152:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2195:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2195:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2233:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2233:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2243:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2244:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2307:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2307:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2347:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2347:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2357:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2358:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2418:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2418:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2453:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q8_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2453:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q8_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2463:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2464:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2508:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2508:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2542:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q2_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2554:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2611:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2767:41: warning: cast from 'const int *' to 'signed char *' drops const qualifier [-Wcast-qual]
const int8_t * scales = ((int8_t *) (x_sc + i * (WARP_SIZE/4) + i/4 + kbx*4)) + ky/4;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2881:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2893:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2962:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3062:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3074:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3154:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3191:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q6_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3203:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3274:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:5693:72: warning: unused parameter 'i02' [-Wunused-parameter]
float * src0_ddf_i, float * src1_ddf_i, float * dst_ddf_i, int64_t i02, int64_t i01_low, int64_t i01_high, int i1,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3423:9: note: in instantiation of function template specialization 'load_tiles_q4_0<64, 8, false>' requested here
load_tiles_q4_0<mmq_y, nwarps, need_check>, VDR_Q4_0_Q8_1_MMQ, vec_dot_q4_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4520:9: note: in instantiation of function template specialization 'mul_mat_q4_0<false>' requested here
mul_mat_q4_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3423:9: note: in instantiation of function template specialization 'load_tiles_q4_0<64, 8, true>' requested here
load_tiles_q4_0<mmq_y, nwarps, need_check>, VDR_Q4_0_Q8_1_MMQ, vec_dot_q4_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4524:9: note: in instantiation of function template specialization 'mul_mat_q4_0<true>' requested here
mul_mat_q4_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3461:9: note: in instantiation of function template specialization 'load_tiles_q4_1<64, 8, false>' requested here
load_tiles_q4_1<mmq_y, nwarps, need_check>, VDR_Q4_1_Q8_1_MMQ, vec_dot_q4_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4557:9: note: in instantiation of function template specialization 'mul_mat_q4_1<false>' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3461:9: note: in instantiation of function template specialization 'load_tiles_q4_1<64, 8, true>' requested here
load_tiles_q4_1<mmq_y, nwarps, need_check>, VDR_Q4_1_Q8_1_MMQ, vec_dot_q4_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4561:9: note: in instantiation of function template specialization 'mul_mat_q4_1<true>' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3495:9: note: in instantiation of function template specialization 'load_tiles_q5_0<64, 8, false>' requested here
load_tiles_q5_0<mmq_y, nwarps, need_check>, VDR_Q5_0_Q8_1_MMQ, vec_dot_q5_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4594:9: note: in instantiation of function template specialization 'mul_mat_q5_0<false>' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3495:9: note: in instantiation of function template specialization 'load_tiles_q5_0<64, 8, true>' requested here
load_tiles_q5_0<mmq_y, nwarps, need_check>, VDR_Q5_0_Q8_1_MMQ, vec_dot_q5_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4598:9: note: in instantiation of function template specialization 'mul_mat_q5_0<true>' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3529:9: note: in instantiation of function template specialization 'load_tiles_q5_1<64, 8, false>' requested here
load_tiles_q5_1<mmq_y, nwarps, need_check>, VDR_Q5_1_Q8_1_MMQ, vec_dot_q5_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4631:9: note: in instantiation of function template specialization 'mul_mat_q5_1<false>' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3529:9: note: in instantiation of function template specialization 'load_tiles_q5_1<64, 8, true>' requested here
load_tiles_q5_1<mmq_y, nwarps, need_check>, VDR_Q5_1_Q8_1_MMQ, vec_dot_q5_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4635:9: note: in instantiation of function template specialization 'mul_mat_q5_1<true>' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3563:9: note: in instantiation of function template specialization 'load_tiles_q8_0<64, 8, false>' requested here
load_tiles_q8_0<mmq_y, nwarps, need_check>, VDR_Q8_0_Q8_1_MMQ, vec_dot_q8_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4668:9: note: in instantiation of function template specialization 'mul_mat_q8_0<false>' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3563:9: note: in instantiation of function template specialization 'load_tiles_q8_0<64, 8, true>' requested here
load_tiles_q8_0<mmq_y, nwarps, need_check>, VDR_Q8_0_Q8_1_MMQ, vec_dot_q8_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4672:9: note: in instantiation of function template specialization 'mul_mat_q8_0<true>' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3597:9: note: in instantiation of function template specialization 'load_tiles_q2_K<64, 8, false>' requested here
load_tiles_q2_K<mmq_y, nwarps, need_check>, VDR_Q2_K_Q8_1_MMQ, vec_dot_q2_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4705:9: note: in instantiation of function template specialization 'mul_mat_q2_K<false>' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3597:9: note: in instantiation of function template specialization 'load_tiles_q2_K<64, 8, true>' requested here
load_tiles_q2_K<mmq_y, nwarps, need_check>, VDR_Q2_K_Q8_1_MMQ, vec_dot_q2_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4709:9: note: in instantiation of function template specialization 'mul_mat_q2_K<true>' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3635:9: note: in instantiation of function template specialization 'load_tiles_q3_K<64, 8, false>' requested here
load_tiles_q3_K<mmq_y, nwarps, need_check>, VDR_Q3_K_Q8_1_MMQ, vec_dot_q3_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4744:9: note: in instantiation of function template specialization 'mul_mat_q3_K<false>' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3635:9: note: in instantiation of function template specialization 'load_tiles_q3_K<64, 8, true>' requested here
load_tiles_q3_K<mmq_y, nwarps, need_check>, VDR_Q3_K_Q8_1_MMQ, vec_dot_q3_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4748:9: note: in instantiation of function template specialization 'mul_mat_q3_K<true>' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3673:9: note: in instantiation of function template specialization 'load_tiles_q4_K<64, 8, false>' requested here
load_tiles_q4_K<mmq_y, nwarps, need_check>, VDR_Q4_K_Q8_1_MMQ, vec_dot_q4_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4782:9: note: in instantiation of function template specialization 'mul_mat_q4_K<false>' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3673:9: note: in instantiation of function template specialization 'load_tiles_q4_K<64, 8, true>' requested here
load_tiles_q4_K<mmq_y, nwarps, need_check>, VDR_Q4_K_Q8_1_MMQ, vec_dot_q4_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4786:9: note: in instantiation of function template specialization 'mul_mat_q4_K<true>' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3707:9: note: in instantiation of function template specialization 'load_tiles_q5_K<64, 8, false>' requested here
load_tiles_q5_K<mmq_y, nwarps, need_check>, VDR_Q5_K_Q8_1_MMQ, vec_dot_q5_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4819:9: note: in instantiation of function template specialization 'mul_mat_q5_K<false>' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3707:9: note: in instantiation of function template specialization 'load_tiles_q5_K<64, 8, true>' requested here
load_tiles_q5_K<mmq_y, nwarps, need_check>, VDR_Q5_K_Q8_1_MMQ, vec_dot_q5_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4823:9: note: in instantiation of function template specialization 'mul_mat_q5_K<true>' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, false>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4856:9: note: in instantiation of function template specialization 'mul_mat_q6_K<false>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, true>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4860:9: note: in instantiation of function template specialization 'mul_mat_q6_K<true>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
85 warnings generated when compiling for gfx900.
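Everything up to this point is a warning, not an error: clang's HIP device pass for gfx900 flags the C-style casts in ggml-cuda.cu that drop a const qualifier (-Wcast-qual) and the tile parameters such as x_qh and x_sc that go unused in several template specializations (-Wunused-parameter). A minimal stand-alone sketch of the two patterns, using a hypothetical stand-in for the ggml block type (illustration only, not taken from the build):

struct block_q4_0;  // stand-in for the ggml block type, declared only for this sketch

// 'x_qh' mirrors the unused tile parameters flagged by -Wunused-parameter.
void example(const void * vx, int * x_qh) {
    const block_q4_0 * a = (block_q4_0 *) vx;                    // C-style cast drops 'const' -> -Wcast-qual
    const block_q4_0 * b = static_cast<const block_q4_0 *>(vx);  // const-preserving cast, no warning
    (void) a; (void) b;                                          // keep this sketch free of unused-variable noise
}

None of these warnings is fatal on its own; the actual failure is reported further down in the log.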
[... the same warnings are emitted again for the remaining offload target(s); duplicate output trimmed ...]
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, false>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4856:9: note: in instantiation of function template specialization 'mul_mat_q6_K<false>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, true>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4860:9: note: in instantiation of function template specialization 'mul_mat_q6_K<true>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
85 warnings generated when compiling for gfx906.
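
Most of the diagnostics in the block above are the same -Wcast-qual pattern: ggml-cuda.cu uses a C-style cast that turns a const void * into a non-const block pointer, so clang for each gfx target reports that the const qualifier is dropped. Below is a minimal, self-contained sketch of that pattern and the const-preserving form that would avoid the warning; the struct is a simplified stand-in, not ggml's actual block_q4_0 layout.

    // cast_qual_sketch.cpp -- illustrative only; block_stub is a stand-in for
    // ggml's quantization block types, not the real struct definition.
    struct block_stub { unsigned char qs[16]; };

    static const block_stub * load_block(const void * vx) {
        // Pattern from the log, warns under -Wcast-qual because the C-style
        // cast silently discards const:
        //   const block_stub * bx0 = (block_stub *) vx;

        // Const-correct cast: same result, qualifier kept, no warning.
        const block_stub * bx0 = (const block_stub *) vx;
        return bx0;
    }

Either way these are warnings only; they do not stop the per-architecture compile.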
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:166:41: warning: cast from 'const signed char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:176:41: warning: cast from 'const unsigned char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:186:22: warning: cast from 'const signed char *' to 'int *' drops const qualifier [-Wcast-qual]
return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:190:22: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2047:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2047:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2057:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2058:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2108:37: warning: cast from 'const __half2 *' to 'float *' drops const qualifier [-Wcast-qual]
const float * x_dmf = (float *) x_dm;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2104:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2104:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2141:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2141:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2151:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2152:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2195:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2195:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2233:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2233:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2243:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2244:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2307:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2307:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2347:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2347:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2357:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2358:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2418:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2418:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2453:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q8_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2453:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q8_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2463:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2464:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2508:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2508:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2542:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q2_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2554:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2611:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2767:41: warning: cast from 'const int *' to 'signed char *' drops const qualifier [-Wcast-qual]
const int8_t * scales = ((int8_t *) (x_sc + i * (WARP_SIZE/4) + i/4 + kbx*4)) + ky/4;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2881:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2893:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2962:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3062:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3074:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3154:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3191:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q6_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3203:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3274:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:5693:72: warning: unused parameter 'i02' [-Wunused-parameter]
float * src0_ddf_i, float * src1_ddf_i, float * dst_ddf_i, int64_t i02, int64_t i01_low, int64_t i01_high, int i1,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3423:9: note: in instantiation of function template specialization 'load_tiles_q4_0<64, 8, false>' requested here
load_tiles_q4_0<mmq_y, nwarps, need_check>, VDR_Q4_0_Q8_1_MMQ, vec_dot_q4_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4520:9: note: in instantiation of function template specialization 'mul_mat_q4_0<false>' requested here
mul_mat_q4_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3423:9: note: in instantiation of function template specialization 'load_tiles_q4_0<64, 8, true>' requested here
load_tiles_q4_0<mmq_y, nwarps, need_check>, VDR_Q4_0_Q8_1_MMQ, vec_dot_q4_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4524:9: note: in instantiation of function template specialization 'mul_mat_q4_0<true>' requested here
mul_mat_q4_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3461:9: note: in instantiation of function template specialization 'load_tiles_q4_1<64, 8, false>' requested here
load_tiles_q4_1<mmq_y, nwarps, need_check>, VDR_Q4_1_Q8_1_MMQ, vec_dot_q4_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4557:9: note: in instantiation of function template specialization 'mul_mat_q4_1<false>' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3461:9: note: in instantiation of function template specialization 'load_tiles_q4_1<64, 8, true>' requested here
load_tiles_q4_1<mmq_y, nwarps, need_check>, VDR_Q4_1_Q8_1_MMQ, vec_dot_q4_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4561:9: note: in instantiation of function template specialization 'mul_mat_q4_1<true>' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3495:9: note: in instantiation of function template specialization 'load_tiles_q5_0<64, 8, false>' requested here
load_tiles_q5_0<mmq_y, nwarps, need_check>, VDR_Q5_0_Q8_1_MMQ, vec_dot_q5_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4594:9: note: in instantiation of function template specialization 'mul_mat_q5_0<false>' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3495:9: note: in instantiation of function template specialization 'load_tiles_q5_0<64, 8, true>' requested here
load_tiles_q5_0<mmq_y, nwarps, need_check>, VDR_Q5_0_Q8_1_MMQ, vec_dot_q5_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4598:9: note: in instantiation of function template specialization 'mul_mat_q5_0<true>' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3529:9: note: in instantiation of function template specialization 'load_tiles_q5_1<64, 8, false>' requested here
load_tiles_q5_1<mmq_y, nwarps, need_check>, VDR_Q5_1_Q8_1_MMQ, vec_dot_q5_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4631:9: note: in instantiation of function template specialization 'mul_mat_q5_1<false>' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3529:9: note: in instantiation of function template specialization 'load_tiles_q5_1<64, 8, true>' requested here
load_tiles_q5_1<mmq_y, nwarps, need_check>, VDR_Q5_1_Q8_1_MMQ, vec_dot_q5_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4635:9: note: in instantiation of function template specialization 'mul_mat_q5_1<true>' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3563:9: note: in instantiation of function template specialization 'load_tiles_q8_0<64, 8, false>' requested here
load_tiles_q8_0<mmq_y, nwarps, need_check>, VDR_Q8_0_Q8_1_MMQ, vec_dot_q8_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4668:9: note: in instantiation of function template specialization 'mul_mat_q8_0<false>' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3563:9: note: in instantiation of function template specialization 'load_tiles_q8_0<64, 8, true>' requested here
load_tiles_q8_0<mmq_y, nwarps, need_check>, VDR_Q8_0_Q8_1_MMQ, vec_dot_q8_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4672:9: note: in instantiation of function template specialization 'mul_mat_q8_0<true>' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3597:9: note: in instantiation of function template specialization 'load_tiles_q2_K<64, 8, false>' requested here
load_tiles_q2_K<mmq_y, nwarps, need_check>, VDR_Q2_K_Q8_1_MMQ, vec_dot_q2_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4705:9: note: in instantiation of function template specialization 'mul_mat_q2_K<false>' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3597:9: note: in instantiation of function template specialization 'load_tiles_q2_K<64, 8, true>' requested here
load_tiles_q2_K<mmq_y, nwarps, need_check>, VDR_Q2_K_Q8_1_MMQ, vec_dot_q2_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4709:9: note: in instantiation of function template specialization 'mul_mat_q2_K<true>' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3635:9: note: in instantiation of function template specialization 'load_tiles_q3_K<64, 8, false>' requested here
load_tiles_q3_K<mmq_y, nwarps, need_check>, VDR_Q3_K_Q8_1_MMQ, vec_dot_q3_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4744:9: note: in instantiation of function template specialization 'mul_mat_q3_K<false>' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3635:9: note: in instantiation of function template specialization 'load_tiles_q3_K<64, 8, true>' requested here
load_tiles_q3_K<mmq_y, nwarps, need_check>, VDR_Q3_K_Q8_1_MMQ, vec_dot_q3_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4748:9: note: in instantiation of function template specialization 'mul_mat_q3_K<true>' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3673:9: note: in instantiation of function template specialization 'load_tiles_q4_K<64, 8, false>' requested here
load_tiles_q4_K<mmq_y, nwarps, need_check>, VDR_Q4_K_Q8_1_MMQ, vec_dot_q4_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4782:9: note: in instantiation of function template specialization 'mul_mat_q4_K<false>' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3673:9: note: in instantiation of function template specialization 'load_tiles_q4_K<64, 8, true>' requested here
load_tiles_q4_K<mmq_y, nwarps, need_check>, VDR_Q4_K_Q8_1_MMQ, vec_dot_q4_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4786:9: note: in instantiation of function template specialization 'mul_mat_q4_K<true>' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3707:9: note: in instantiation of function template specialization 'load_tiles_q5_K<64, 8, false>' requested here
load_tiles_q5_K<mmq_y, nwarps, need_check>, VDR_Q5_K_Q8_1_MMQ, vec_dot_q5_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4819:9: note: in instantiation of function template specialization 'mul_mat_q5_K<false>' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3707:9: note: in instantiation of function template specialization 'load_tiles_q5_K<64, 8, true>' requested here
load_tiles_q5_K<mmq_y, nwarps, need_check>, VDR_Q5_K_Q8_1_MMQ, vec_dot_q5_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4823:9: note: in instantiation of function template specialization 'mul_mat_q5_K<true>' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, false>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4856:9: note: in instantiation of function template specialization 'mul_mat_q6_K<false>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, true>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4860:9: note: in instantiation of function template specialization 'mul_mat_q6_K<true>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
85 warnings generated when compiling for gfx908.
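
The remaining noise for each architecture is -Wunused-parameter: the tile helpers share one signature across all quantization types, so parameters such as x_qh and x_sc go unused in the variants that do not need them. A minimal sketch of that situation and the usual ways it is silenced follows; the names mirror the log but the signature is simplified, not ggml's actual allocate_tiles_* template.

    // unused_param_sketch.cpp -- illustrative only, simplified from the pattern
    // reported in the log (the real helpers are __device__ CUDA/HIP templates).
    template <int mmq_y>
    static void allocate_tiles_stub(int ** x_ql, int ** x_qh, int ** x_sc) {
        *x_ql = nullptr;   // only x_ql is needed for this (hypothetical) quant type
        (void) x_qh;       // one common fix: explicitly discard the unused arguments
        (void) x_sc;
    }

    // ... or leave the parameter names out so clang has nothing to flag:
    template <int mmq_y>
    static void allocate_tiles_stub_unnamed(int ** x_ql, int ** /*x_qh*/, int ** /*x_sc*/) {
        *x_ql = nullptr;
    }

As with the cast warnings, these are cosmetic; the actual build failure appears later in the log.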
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:166:41: warning: cast from 'const signed char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:176:41: warning: cast from 'const unsigned char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:186:22: warning: cast from 'const signed char *' to 'int *' drops const qualifier [-Wcast-qual]
return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:190:22: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2047:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2047:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2057:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2058:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2108:37: warning: cast from 'const __half2 *' to 'float *' drops const qualifier [-Wcast-qual]
const float * x_dmf = (float *) x_dm;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2104:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2104:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2141:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2141:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2151:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2152:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2195:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2195:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2233:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2233:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2243:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2244:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2307:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2307:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2347:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2347:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2357:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2358:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2418:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2418:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2453:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q8_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2453:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q8_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2463:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2464:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2508:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2508:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2542:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q2_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2554:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2611:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2767:41: warning: cast from 'const int *' to 'signed char *' drops const qualifier [-Wcast-qual]
const int8_t * scales = ((int8_t *) (x_sc + i * (WARP_SIZE/4) + i/4 + kbx*4)) + ky/4;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2881:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2893:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2962:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3062:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3074:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3154:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3191:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q6_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3203:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3274:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:5693:72: warning: unused parameter 'i02' [-Wunused-parameter]
float * src0_ddf_i, float * src1_ddf_i, float * dst_ddf_i, int64_t i02, int64_t i01_low, int64_t i01_high, int i1,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3423:9: note: in instantiation of function template specialization 'load_tiles_q4_0<64, 8, false>' requested here
load_tiles_q4_0<mmq_y, nwarps, need_check>, VDR_Q4_0_Q8_1_MMQ, vec_dot_q4_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4520:9: note: in instantiation of function template specialization 'mul_mat_q4_0<false>' requested here
mul_mat_q4_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3423:9: note: in instantiation of function template specialization 'load_tiles_q4_0<64, 8, true>' requested here
load_tiles_q4_0<mmq_y, nwarps, need_check>, VDR_Q4_0_Q8_1_MMQ, vec_dot_q4_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4524:9: note: in instantiation of function template specialization 'mul_mat_q4_0<true>' requested here
mul_mat_q4_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3461:9: note: in instantiation of function template specialization 'load_tiles_q4_1<64, 8, false>' requested here
load_tiles_q4_1<mmq_y, nwarps, need_check>, VDR_Q4_1_Q8_1_MMQ, vec_dot_q4_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4557:9: note: in instantiation of function template specialization 'mul_mat_q4_1<false>' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3461:9: note: in instantiation of function template specialization 'load_tiles_q4_1<64, 8, true>' requested here
load_tiles_q4_1<mmq_y, nwarps, need_check>, VDR_Q4_1_Q8_1_MMQ, vec_dot_q4_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4561:9: note: in instantiation of function template specialization 'mul_mat_q4_1<true>' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3495:9: note: in instantiation of function template specialization 'load_tiles_q5_0<64, 8, false>' requested here
load_tiles_q5_0<mmq_y, nwarps, need_check>, VDR_Q5_0_Q8_1_MMQ, vec_dot_q5_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4594:9: note: in instantiation of function template specialization 'mul_mat_q5_0<false>' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3495:9: note: in instantiation of function template specialization 'load_tiles_q5_0<64, 8, true>' requested here
load_tiles_q5_0<mmq_y, nwarps, need_check>, VDR_Q5_0_Q8_1_MMQ, vec_dot_q5_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4598:9: note: in instantiation of function template specialization 'mul_mat_q5_0<true>' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3529:9: note: in instantiation of function template specialization 'load_tiles_q5_1<64, 8, false>' requested here
load_tiles_q5_1<mmq_y, nwarps, need_check>, VDR_Q5_1_Q8_1_MMQ, vec_dot_q5_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4631:9: note: in instantiation of function template specialization 'mul_mat_q5_1<false>' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3529:9: note: in instantiation of function template specialization 'load_tiles_q5_1<64, 8, true>' requested here
load_tiles_q5_1<mmq_y, nwarps, need_check>, VDR_Q5_1_Q8_1_MMQ, vec_dot_q5_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4635:9: note: in instantiation of function template specialization 'mul_mat_q5_1<true>' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3563:9: note: in instantiation of function template specialization 'load_tiles_q8_0<64, 8, false>' requested here
load_tiles_q8_0<mmq_y, nwarps, need_check>, VDR_Q8_0_Q8_1_MMQ, vec_dot_q8_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4668:9: note: in instantiation of function template specialization 'mul_mat_q8_0<false>' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3563:9: note: in instantiation of function template specialization 'load_tiles_q8_0<64, 8, true>' requested here
load_tiles_q8_0<mmq_y, nwarps, need_check>, VDR_Q8_0_Q8_1_MMQ, vec_dot_q8_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4672:9: note: in instantiation of function template specialization 'mul_mat_q8_0<true>' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3597:9: note: in instantiation of function template specialization 'load_tiles_q2_K<64, 8, false>' requested here
load_tiles_q2_K<mmq_y, nwarps, need_check>, VDR_Q2_K_Q8_1_MMQ, vec_dot_q2_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4705:9: note: in instantiation of function template specialization 'mul_mat_q2_K<false>' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3597:9: note: in instantiation of function template specialization 'load_tiles_q2_K<64, 8, true>' requested here
load_tiles_q2_K<mmq_y, nwarps, need_check>, VDR_Q2_K_Q8_1_MMQ, vec_dot_q2_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4709:9: note: in instantiation of function template specialization 'mul_mat_q2_K<true>' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3635:9: note: in instantiation of function template specialization 'load_tiles_q3_K<64, 8, false>' requested here
load_tiles_q3_K<mmq_y, nwarps, need_check>, VDR_Q3_K_Q8_1_MMQ, vec_dot_q3_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4744:9: note: in instantiation of function template specialization 'mul_mat_q3_K<false>' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3635:9: note: in instantiation of function template specialization 'load_tiles_q3_K<64, 8, true>' requested here
load_tiles_q3_K<mmq_y, nwarps, need_check>, VDR_Q3_K_Q8_1_MMQ, vec_dot_q3_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4748:9: note: in instantiation of function template specialization 'mul_mat_q3_K<true>' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3673:9: note: in instantiation of function template specialization 'load_tiles_q4_K<64, 8, false>' requested here
load_tiles_q4_K<mmq_y, nwarps, need_check>, VDR_Q4_K_Q8_1_MMQ, vec_dot_q4_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4782:9: note: in instantiation of function template specialization 'mul_mat_q4_K<false>' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3673:9: note: in instantiation of function template specialization 'load_tiles_q4_K<64, 8, true>' requested here
load_tiles_q4_K<mmq_y, nwarps, need_check>, VDR_Q4_K_Q8_1_MMQ, vec_dot_q4_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4786:9: note: in instantiation of function template specialization 'mul_mat_q4_K<true>' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3707:9: note: in instantiation of function template specialization 'load_tiles_q5_K<64, 8, false>' requested here
load_tiles_q5_K<mmq_y, nwarps, need_check>, VDR_Q5_K_Q8_1_MMQ, vec_dot_q5_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4819:9: note: in instantiation of function template specialization 'mul_mat_q5_K<false>' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3707:9: note: in instantiation of function template specialization 'load_tiles_q5_K<64, 8, true>' requested here
load_tiles_q5_K<mmq_y, nwarps, need_check>, VDR_Q5_K_Q8_1_MMQ, vec_dot_q5_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4823:9: note: in instantiation of function template specialization 'mul_mat_q5_K<true>' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, false>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4856:9: note: in instantiation of function template specialization 'mul_mat_q6_K<false>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, true>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4860:9: note: in instantiation of function template specialization 'mul_mat_q6_K<true>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
85 warnings generated when compiling for gfx90a.
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:166:41: warning: cast from 'const signed char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:176:41: warning: cast from 'const unsigned char *' to 'unsigned short *' drops const qualifier [-Wcast-qual]
const uint16_t * x16 = (uint16_t *) (x8 + sizeof(int) * i32); // assume at least 2 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:186:22: warning: cast from 'const signed char *' to 'int *' drops const qualifier [-Wcast-qual]
return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:190:22: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
return *((int *) (x8 + sizeof(int) * i32)); // assume at least 4 byte alignment
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2047:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2047:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2057:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2058:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2108:37: warning: cast from 'const __half2 *' to 'float *' drops const qualifier [-Wcast-qual]
const float * x_dmf = (float *) x_dm;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2104:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2104:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2141:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2141:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2151:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2152:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2195:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2195:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2233:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2233:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2243:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2244:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2307:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2307:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2347:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2347:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_1(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2357:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2358:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2418:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2418:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2453:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q8_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2453:129: warning: unused parameter 'x_sc' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q8_0(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2463:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2464:24: warning: unused parameter 'x_sc' [-Wunused-parameter]
int * __restrict__ x_sc, const int & i_offset, const int & i_max, const int & k, const int & blocks_per_row) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2508:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2508:125: warning: unused parameter 'x_sc' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2542:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q2_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2554:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2611:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2767:41: warning: cast from 'const int *' to 'signed char *' drops const qualifier [-Wcast-qual]
const int8_t * scales = ((int8_t *) (x_sc + i * (WARP_SIZE/4) + i/4 + kbx*4)) + ky/4;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2881:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q4_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2893:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2962:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3062:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q5_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3074:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3154:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3191:116: warning: unused parameter 'x_qh' [-Wunused-parameter]
template <int mmq_y> static __device__ __forceinline__ void allocate_tiles_q6_K(int ** x_ql, half2 ** x_dm, int ** x_qh, int ** x_sc) {
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3203:106: warning: unused parameter 'x_qh' [-Wunused-parameter]
const void * __restrict__ vx, int * __restrict__ x_ql, half2 * __restrict__ x_dm, int * __restrict__ x_qh,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3274:94: warning: unused parameter 'x_qh' [-Wunused-parameter]
const int * __restrict__ x_ql, const half2 * __restrict__ x_dm, const int * __restrict__ x_qh, const int * __restrict__ x_sc,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:5693:72: warning: unused parameter 'i02' [-Wunused-parameter]
float * src0_ddf_i, float * src1_ddf_i, float * dst_ddf_i, int64_t i02, int64_t i01_low, int64_t i01_high, int i1,
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3423:9: note: in instantiation of function template specialization 'load_tiles_q4_0<64, 8, false>' requested here
load_tiles_q4_0<mmq_y, nwarps, need_check>, VDR_Q4_0_Q8_1_MMQ, vec_dot_q4_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4520:9: note: in instantiation of function template specialization 'mul_mat_q4_0<false>' requested here
mul_mat_q4_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2068:45: warning: cast from 'const void *' to 'block_q4_0 *' drops const qualifier [-Wcast-qual]
const block_q4_0 * bx0 = (block_q4_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3423:9: note: in instantiation of function template specialization 'load_tiles_q4_0<64, 8, true>' requested here
load_tiles_q4_0<mmq_y, nwarps, need_check>, VDR_Q4_0_Q8_1_MMQ, vec_dot_q4_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4524:9: note: in instantiation of function template specialization 'mul_mat_q4_0<true>' requested here
mul_mat_q4_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3461:9: note: in instantiation of function template specialization 'load_tiles_q4_1<64, 8, false>' requested here
load_tiles_q4_1<mmq_y, nwarps, need_check>, VDR_Q4_1_Q8_1_MMQ, vec_dot_q4_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4557:9: note: in instantiation of function template specialization 'mul_mat_q4_1<false>' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2162:45: warning: cast from 'const void *' to 'block_q4_1 *' drops const qualifier [-Wcast-qual]
const block_q4_1 * bx0 = (block_q4_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3461:9: note: in instantiation of function template specialization 'load_tiles_q4_1<64, 8, true>' requested here
load_tiles_q4_1<mmq_y, nwarps, need_check>, VDR_Q4_1_Q8_1_MMQ, vec_dot_q4_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4561:9: note: in instantiation of function template specialization 'mul_mat_q4_1<true>' requested here
mul_mat_q4_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3495:9: note: in instantiation of function template specialization 'load_tiles_q5_0<64, 8, false>' requested here
load_tiles_q5_0<mmq_y, nwarps, need_check>, VDR_Q5_0_Q8_1_MMQ, vec_dot_q5_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4594:9: note: in instantiation of function template specialization 'mul_mat_q5_0<false>' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2254:45: warning: cast from 'const void *' to 'block_q5_0 *' drops const qualifier [-Wcast-qual]
const block_q5_0 * bx0 = (block_q5_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3495:9: note: in instantiation of function template specialization 'load_tiles_q5_0<64, 8, true>' requested here
load_tiles_q5_0<mmq_y, nwarps, need_check>, VDR_Q5_0_Q8_1_MMQ, vec_dot_q5_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4598:9: note: in instantiation of function template specialization 'mul_mat_q5_0<true>' requested here
mul_mat_q5_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3529:9: note: in instantiation of function template specialization 'load_tiles_q5_1<64, 8, false>' requested here
load_tiles_q5_1<mmq_y, nwarps, need_check>, VDR_Q5_1_Q8_1_MMQ, vec_dot_q5_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4631:9: note: in instantiation of function template specialization 'mul_mat_q5_1<false>' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2368:45: warning: cast from 'const void *' to 'block_q5_1 *' drops const qualifier [-Wcast-qual]
const block_q5_1 * bx0 = (block_q5_1 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3529:9: note: in instantiation of function template specialization 'load_tiles_q5_1<64, 8, true>' requested here
load_tiles_q5_1<mmq_y, nwarps, need_check>, VDR_Q5_1_Q8_1_MMQ, vec_dot_q5_1_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4635:9: note: in instantiation of function template specialization 'mul_mat_q5_1<true>' requested here
mul_mat_q5_1<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3563:9: note: in instantiation of function template specialization 'load_tiles_q8_0<64, 8, false>' requested here
load_tiles_q8_0<mmq_y, nwarps, need_check>, VDR_Q8_0_Q8_1_MMQ, vec_dot_q8_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4668:9: note: in instantiation of function template specialization 'mul_mat_q8_0<false>' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2475:45: warning: cast from 'const void *' to 'block_q8_0 *' drops const qualifier [-Wcast-qual]
const block_q8_0 * bx0 = (block_q8_0 *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3563:9: note: in instantiation of function template specialization 'load_tiles_q8_0<64, 8, true>' requested here
load_tiles_q8_0<mmq_y, nwarps, need_check>, VDR_Q8_0_Q8_1_MMQ, vec_dot_q8_0_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4672:9: note: in instantiation of function template specialization 'mul_mat_q8_0<true>' requested here
mul_mat_q8_0<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3597:9: note: in instantiation of function template specialization 'load_tiles_q2_K<64, 8, false>' requested here
load_tiles_q2_K<mmq_y, nwarps, need_check>, VDR_Q2_K_Q8_1_MMQ, vec_dot_q2_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4705:9: note: in instantiation of function template specialization 'mul_mat_q2_K<false>' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2565:45: warning: cast from 'const void *' to 'block_q2_K *' drops const qualifier [-Wcast-qual]
const block_q2_K * bx0 = (block_q2_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3597:9: note: in instantiation of function template specialization 'load_tiles_q2_K<64, 8, true>' requested here
load_tiles_q2_K<mmq_y, nwarps, need_check>, VDR_Q2_K_Q8_1_MMQ, vec_dot_q2_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4709:9: note: in instantiation of function template specialization 'mul_mat_q2_K<true>' requested here
mul_mat_q2_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3635:9: note: in instantiation of function template specialization 'load_tiles_q3_K<64, 8, false>' requested here
load_tiles_q3_K<mmq_y, nwarps, need_check>, VDR_Q3_K_Q8_1_MMQ, vec_dot_q3_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4744:9: note: in instantiation of function template specialization 'mul_mat_q3_K<false>' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2686:45: warning: cast from 'const void *' to 'block_q3_K *' drops const qualifier [-Wcast-qual]
const block_q3_K * bx0 = (block_q3_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3635:9: note: in instantiation of function template specialization 'load_tiles_q3_K<64, 8, true>' requested here
load_tiles_q3_K<mmq_y, nwarps, need_check>, VDR_Q3_K_Q8_1_MMQ, vec_dot_q3_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4748:9: note: in instantiation of function template specialization 'mul_mat_q3_K<true>' requested here
mul_mat_q3_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3673:9: note: in instantiation of function template specialization 'load_tiles_q4_K<64, 8, false>' requested here
load_tiles_q4_K<mmq_y, nwarps, need_check>, VDR_Q4_K_Q8_1_MMQ, vec_dot_q4_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4782:9: note: in instantiation of function template specialization 'mul_mat_q4_K<false>' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2904:45: warning: cast from 'const void *' to 'block_q4_K *' drops const qualifier [-Wcast-qual]
const block_q4_K * bx0 = (block_q4_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3673:9: note: in instantiation of function template specialization 'load_tiles_q4_K<64, 8, true>' requested here
load_tiles_q4_K<mmq_y, nwarps, need_check>, VDR_Q4_K_Q8_1_MMQ, vec_dot_q4_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4786:9: note: in instantiation of function template specialization 'mul_mat_q4_K<true>' requested here
mul_mat_q4_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:2949:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3707:9: note: in instantiation of function template specialization 'load_tiles_q5_K<64, 8, false>' requested here
load_tiles_q5_K<mmq_y, nwarps, need_check>, VDR_Q5_K_Q8_1_MMQ, vec_dot_q5_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4819:9: note: in instantiation of function template specialization 'mul_mat_q5_K<false>' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3085:45: warning: cast from 'const void *' to 'block_q5_K *' drops const qualifier [-Wcast-qual]
const block_q5_K * bx0 = (block_q5_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3707:9: note: in instantiation of function template specialization 'load_tiles_q5_K<64, 8, true>' requested here
load_tiles_q5_K<mmq_y, nwarps, need_check>, VDR_Q5_K_Q8_1_MMQ, vec_dot_q5_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4823:9: note: in instantiation of function template specialization 'mul_mat_q5_K<true>' requested here
mul_mat_q5_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3141:38: warning: cast from 'const unsigned char *' to 'int *' drops const qualifier [-Wcast-qual]
const int * scales = (int *) bxi->scales;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, false>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4856:9: note: in instantiation of function template specialization 'mul_mat_q6_K<false>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3214:45: warning: cast from 'const void *' to 'block_q6_K *' drops const qualifier [-Wcast-qual]
const block_q6_K * bx0 = (block_q6_K *) vx;
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:3745:9: note: in instantiation of function template specialization 'load_tiles_q6_K<64, 8, true>' requested here
load_tiles_q6_K<mmq_y, nwarps, need_check>, VDR_Q6_K_Q8_1_MMQ, vec_dot_q6_K_q8_1_mul_mat>
^
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/models/ggml/ggml-cuda.cu:4860:9: note: in instantiation of function template specialization 'mul_mat_q6_K<true>' requested here
mul_mat_q6_K<need_check><<<block_nums, block_dims, 0, stream>>>
^
85 warnings generated when compiling for host.
ninja: build stopped: subcommand failed.
Traceback (most recent call last):
File "/tmp/pip-build-env-v3f_z5es/overlay/local/lib/python3.10/dist-packages/skbuild/setuptools_wrap.py", line 674, in setup
cmkr.make(make_args, install_target=cmake_install_target, env=env)
File "/tmp/pip-build-env-v3f_z5es/overlay/local/lib/python3.10/dist-packages/skbuild/cmaker.py", line 697, in make
self.make_impl(clargs=clargs, config=config, source_dir=source_dir, install_target=install_target, env=env)
File "/tmp/pip-build-env-v3f_z5es/overlay/local/lib/python3.10/dist-packages/skbuild/cmaker.py", line 742, in make_impl
raise SKBuildError(msg)
An error occurred while building with CMake.
Command:
/tmp/pip-build-env-v3f_z5es/overlay/local/lib/python3.10/dist-packages/cmake/data/bin/cmake --build . --target install --config Release --
Install target:
install
Source directory:
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5
Working directory:
/tmp/pip-install-7cb4q_dn/ctransformers_4e4f626657934362ba44b5b0332d47c5/_skbuild/linux-x86_64-3.10/cmake-build
Please check the install target is valid and see CMake's output for more information.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for ctransformers
Failed to build ctransformers
ERROR: Could not build wheels for ctransformers, which is required to install pyproject.toml-based projects