hideaki-t/nvblas.md

## nvblas.md

      
Using the GPU (CUDA) with whisper.cpp too

This basically follows what is described in ggerganov/whisper.cpp#220.
Build whisper.cpp to use OpenBLAS, then have NVBLAS take over at run time.
I have only tested this on Arch Linux.
Check out the nvblas branch

https://github.com/ggerganov/whisper.cpp/tree/nvblas
In some cases the master branch works too; see the section on using NVBLAS with the master branch below.
Build with the usual OpenBLAS settings

Probably required:

C/C++ compiler
cmake
OpenBLAS (CBLAS)

$ cmake -DWHISPER_SUPPORT_OPENBLAS=1 .
-- OpenBLAS found
-- CMAKE_SYSTEM_PROCESSOR: x86_64
-- x86 detected
-- Configuring done
-- Generating done
-- Build files have been written to: /home/hideaki/whisper.cpp
$ make
[ 14%] Building C object CMakeFiles/whisper.dir/ggml.c.o
...
[ 28%] Building CXX object CMakeFiles/whisper.dir/whisper.cpp.o
[ 42%] Linking CXX shared library libwhisper.so
[ 42%] Built target whisper
[ 57%] Building CXX object examples/main/CMakeFiles/main.dir/main.cpp.o
[ 71%] Linking CXX executable ../../bin/main
[ 71%] Built target main
[ 85%] Building CXX object examples/bench/CMakeFiles/bench.dir/bench.cpp.o
[100%] Linking CXX executable ../../bin/bench
[100%] Built target bench
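Besides the `-- OpenBLAS found` line, you can check that the option actually reached CMake by reading it back from `CMakeCache.txt`. This is a minimal sketch: `openblas_enabled` is a hypothetical helper name, and it only recognises CMake's usual true spellings (ON, 1, TRUE, YES, Y).

```shell
# Hypothetical helper: succeed if WHISPER_SUPPORT_OPENBLAS is enabled in the
# CMake cache of a build directory (default: the current directory).
openblas_enabled() {
    cache="${1:-.}/CMakeCache.txt"
    # Cache entries have the form NAME:TYPE=VALUE,
    # e.g. WHISPER_SUPPORT_OPENBLAS:BOOL=ON
    value=$(sed -n 's/^WHISPER_SUPPORT_OPENBLAS:[A-Z]*=//p' "$cache" 2>/dev/null |
        tr '[:upper:]' '[:lower:]')
    # CMake's common "true" spellings, compared case-insensitively
    case "$value" in
        on | 1 | true | yes | y) return 0 ;;
        *) return 1 ;;
    esac
}

if openblas_enabled .; then
    echo "OpenBLAS support is enabled"
else
    echo "OpenBLAS support is not enabled; re-run: cmake -DWHISPER_SUPPORT_OPENBLAS=1 ."
fi
```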
Run whisper.cpp with NVBLAS


CUDA is needed only at run time, not for the build.

Create a configuration file for NVBLAS

See https://docs.nvidia.com/cuda/nvblas/index.html#configuration-keywords for the available keywords.
$ cat nvblas.conf
NVBLAS_LOGFILE nvblas.log
NVBLAS_CPU_BLAS_LIB /usr/lib/libopenblas.so
NVBLAS_GPU_LIST ALL
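The same file can also be written from the shell. This is a sketch assuming the Arch Linux OpenBLAS path used above; `write_nvblas_conf` is a hypothetical helper name. It also adds `NVBLAS_TRACE_LOG_ENABLED`, which the NVBLAS documentation describes as a keyword that, when defined, logs every intercepted call to `NVBLAS_LOGFILE`. Remove that line once you no longer need the trace.

```shell
# Hypothetical helper: write an NVBLAS config file (default ./nvblas.conf).
# Assumption: OpenBLAS is at /usr/lib/libopenblas.so (the Arch Linux path).
write_nvblas_conf() {
    conf="${1:-./nvblas.conf}"
    cat > "$conf" <<'EOF'
NVBLAS_LOGFILE nvblas.log
NVBLAS_CPU_BLAS_LIB /usr/lib/libopenblas.so
NVBLAS_GPU_LIST ALL
# Optional: log every intercepted BLAS call to NVBLAS_LOGFILE
NVBLAS_TRACE_LOG_ENABLED
EOF
}

write_nvblas_conf ./nvblas.conf
```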
Launch

$ NVBLAS_CONFIG_FILE=./nvblas.conf LD_PRELOAD=/opt/cuda/targets/x86_64-linux/lib/libnvblas.so ./bin/main -l ja -m models/ggml-medium.bin u.wav
whisper_model_load: loading model from 'models/ggml-medium.bin'
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head  = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1024
whisper_model_load: n_text_head   = 16
whisper_model_load: n_text_layer  = 24
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 4
whisper_model_load: adding 1608 extra tokens
whisper_model_load: mem_required  = 2608.00 MB
whisper_model_load: ggml ctx size = 1462.35 MB
whisper_model_load: memory size   =  182.62 MB
whisper_model_load: model size    = 1462.12 MB

system_info: n_threads = 4 / 20 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | NEON = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 |

main: processing 'u.wav' (44077 samples, 2.8 sec), 4 threads, 1 processors, lang = ja, task = transcribe, timestamps = 1 ...

[NVBLAS] NVBLAS_CONFIG_FILE environment variable is set to './nvblas.conf'

[00:00:00.000 --> 00:00:02.000]  こんにちは


whisper_print_timings:     load time =   919.52 ms
whisper_print_timings:      mel time =    11.68 ms
whisper_print_timings:   sample time =     0.54 ms
whisper_print_timings:   encode time =  6856.67 ms / 285.69 ms per layer
whisper_print_timings:   decode time =   186.09 ms / 7.75 ms per layer
whisper_print_timings:    total time =  7975.28 ms
If the line `[NVBLAS] NVBLAS_CONFIG_FILE environment variable is set to './nvblas.conf'` appears, it has probably worked.
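Both environment variables have to be set on every run, so the launch command above can be wrapped in a small shell function. This is a sketch: `run_with_nvblas` and the `NVBLAS_LIB` override variable are hypothetical names, and the default library path is the Arch Linux CUDA location from the command above.

```shell
# Hypothetical wrapper around the launch command shown above.
# NVBLAS_LIB is an assumed override variable; the default is the Arch Linux path.
run_with_nvblas() {
    nvblas_lib="${NVBLAS_LIB:-/opt/cuda/targets/x86_64-linux/lib/libnvblas.so}"
    if [ ! -f "$nvblas_lib" ]; then
        # The loader would ignore a missing preload, leaving BLAS on the CPU
        echo "warning: $nvblas_lib not found; BLAS calls will stay on the CPU" >&2
    fi
    # Default to ./nvblas.conf unless NVBLAS_CONFIG_FILE is already set
    NVBLAS_CONFIG_FILE="${NVBLAS_CONFIG_FILE:-./nvblas.conf}" \
        LD_PRELOAD="$nvblas_lib" "$@"
}
```

Usage: `run_with_nvblas ./bin/main -l ja -m models/ggml-medium.bin u.wav`. If libnvblas.so cannot be found, the dynamic loader only prints a warning and ignores the preload, so the run quietly falls back to plain OpenBLAS. The explicit check makes that case visible.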
The reliable way to confirm it is to set NVBLAS_TRACE_LOG_ENABLED=1, either in the config file or at launch. The intercepted BLAS calls are then recorded in the log.
$ head -n 5 nvblas.log
[NVBLAS] Using devices :0
[NVBLAS] Config parsed
[NVBLAS] sgemm[gpu]: ta=T, tb=N, m=1024, n=1500, k=1024
[NVBLAS] sgemm[gpu]: ta=T, tb=N, m=1024, n=1500, k=1024
[NVBLAS] sgemm[gpu]: ta=T, tb=N, m=1024, n=1500, k=1024
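For longer runs, reading the log by eye gets tedious. The following sketch tallies which routines ran on which device. It relies only on the `routine[device]:` shape visible in the log above, and it assumes that other routines, and any calls routed to the CPU, are logged in the same shape. `summarize_nvblas_log` is a hypothetical helper name.

```shell
# Hypothetical helper: count routine[device] tags in an NVBLAS trace log.
# Assumes trace lines shaped like the ones above:
#   [NVBLAS] sgemm[gpu]: ta=T, tb=N, m=1024, n=1500, k=1024
summarize_nvblas_log() {
    log="${1:-nvblas.log}"
    # Pull out tokens such as "sgemm[gpu]:" (LC_ALL=C keeps [a-z] ASCII-only),
    # drop the colon, then count how often each routine/device pair appears.
    LC_ALL=C grep -o '[a-z0-9_]*\[[a-z]*\]:' "$log" | tr -d ':' | sort | uniq -c
}
```

Usage: `summarize_nvblas_log nvblas.log` after a transcription. Each output line is a count followed by a tag such as `sgemm[gpu]`.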
Using NVBLAS with the master branch too

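Depending on the environment, the master branch can also run with NVBLAS. This works when OpenBLAS and CBLAS are built separately (i.e. OpenBLAS is built with NO_CBLAS=1) and CBLAS calls into OpenBLAS.
On Arch Linux this is possible with the openblas and cblas packages. If you instead install something like openblas-lapack from the AUR, libopenblas.so includes CBLAS itself, so it apparently does not work.
In this setup whisper.cpp has to link against both libopenblas.so and libcblas.so, so a small patch is needed: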
diff --git a/CMakeLists.txt b/CMakeLists.txt
index b02b854..85db298 100644
--- a/CMakeLists.txt
+++ b/CMakeLists.txt
@@ -100,12 +100,16 @@ if (APPLE AND NOT WHISPER_NO_ACCELERATE)
 endif()

 if (WHISPER_SUPPORT_OPENBLAS)
+    find_library(CBLAS_LIB
+        NAMES cblas libcblas
+        )
     find_library(OPENBLAS_LIB
         NAMES openblas libopenblas
         )
-    if (OPENBLAS_LIB)
+    if (OPENBLAS_LIB AND CBLAS_LIB)
         message(STATUS "OpenBLAS found")

+        set(WHISPER_EXTRA_LIBS  ${WHISPER_EXTRA_LIBS}  ${CBLAS_LIB})
         set(WHISPER_EXTRA_LIBS  ${WHISPER_EXTRA_LIBS}  ${OPENBLAS_LIB})
         set(WHISPER_EXTRA_FLAGS ${WHISPER_EXTRA_FLAGS} -DGGML_USE_OPENBLAS)
     else()
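To tell which of the two cases above applies to your system, look at the dynamic symbols each library defines. This sketch uses binutils `nm`. `exports_cblas_sgemm` is a hypothetical helper, and the library paths are the Arch Linux ones.

```shell
# Hypothetical helper: succeed if a shared library defines (exports) cblas_sgemm.
exports_cblas_sgemm() {
    nm -D --defined-only "$1" 2>/dev/null | grep -qw 'cblas_sgemm'
}

# Arch Linux paths; adjust for other distributions.
for lib in /usr/lib/libopenblas.so /usr/lib/libcblas.so; do
    if exports_cblas_sgemm "$lib"; then
        echo "$lib: defines cblas_sgemm"
    else
        echo "$lib: does not define cblas_sgemm (or could not be read)"
    fi
done
```

The setup described above corresponds to libcblas.so defining `cblas_sgemm` while libopenblas.so does not. If libopenblas.so defines it too, as with the openblas-lapack package mentioned above, the CBLAS entry points come from OpenBLAS itself.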
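As noted in ggerganov/whisper.cpp#220, NVBLAS cannot intercept CBLAS calls.
However, it occurred to me that if CBLAS simply forwards each call to the corresponding BLAS routine, those BLAS calls should be interceptable. When I tried it, it worked.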
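This reasoning can also be checked on the library itself. The check rests on an assumption that is not stated in #220: NVBLAS overrides the Fortran-interface BLAS symbols such as `sgemm_`. A stand-alone libcblas.so that leaves `sgemm_` undefined has that symbol bound by the dynamic linker at load time. A library given in LD_PRELOAD is searched before libopenblas.so, so the call should land in NVBLAS. The helper name below is hypothetical, and `sgemm_` assumes the usual trailing-underscore Fortran naming.

```shell
# Hypothetical helper: succeed if the library references sgemm_ without
# defining it, i.e. it leaves the Fortran BLAS symbol to the dynamic linker.
# Under LD_PRELOAD=libnvblas.so that reference would then bind to NVBLAS.
imports_fortran_sgemm() {
    nm -D --undefined-only "$1" 2>/dev/null | grep -qw 'sgemm_'
}

lib=/usr/lib/libcblas.so   # Arch Linux path; adjust as needed
if imports_fortran_sgemm "$lib"; then
    echo "$lib calls sgemm_ via the dynamic linker: NVBLAS should be able to intercept it"
else
    echo "$lib does not import sgemm_ (or could not be read): NVBLAS probably cannot intercept it"
fi
```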