@rgerganov
Last active December 20, 2023 08:33
Pixel 8 Pro results
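The build below is a cross-compile of llama.cpp for arm64 Android using the Android NDK. The configure step is not part of the log; a minimal sketch of one way to set up such a build directory, following the Android section of the llama.cpp README (the NDK path and API level here are assumptions, not taken from this run):

export NDK=<path to your Android NDK>        # assumption: NDK installed on the Linux host
mkdir build-android2 && cd build-android2
cmake .. -DCMAKE_TOOLCHAIN_FILE=$NDK/build/cmake/android.toolchain.cmake \
         -DANDROID_ABI=arm64-v8a \
         -DANDROID_PLATFORM=android-23 \
         -DCMAKE_C_FLAGS=-march=armv8.4a+dotprod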
➜  build-android2 git:(master) ✗ make -j
[  1%] Generating build details from Git
[  2%] Building C object CMakeFiles/ggml.dir/ggml.c.o
[  3%] Building C object CMakeFiles/ggml.dir/ggml-alloc.c.o
[  4%] Building C object CMakeFiles/ggml.dir/ggml-backend.c.o
[  5%] Building C object CMakeFiles/ggml.dir/ggml-quants.c.o
-- Found Git: /usr/bin/git (found version "2.25.1") 
Scanning dependencies of target build_info
[  6%] Building CXX object common/CMakeFiles/build_info.dir/build-info.cpp.o
[  6%] Built target build_info
/opt/src/llama.cpp/ggml.c:1203:5: warning: implicit conversion increases floating-point precision: 'float32_t' (aka 'float') to 'ggml_float' (aka 'double') [-Wdouble-promotion]
    GGML_F16_VEC_REDUCE(sumf, sum);
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/src/llama.cpp/ggml.c:730:41: note: expanded from macro 'GGML_F16_VEC_REDUCE'
    #define GGML_F16_VEC_REDUCE         GGML_F32Cx4_REDUCE
                                        ^
/opt/src/llama.cpp/ggml.c:720:38: note: expanded from macro 'GGML_F32Cx4_REDUCE'
    #define GGML_F32Cx4_REDUCE       GGML_F32x4_REDUCE
                                     ^
/opt/src/llama.cpp/ggml.c:650:11: note: expanded from macro 'GGML_F32x4_REDUCE'
    res = GGML_F32x4_REDUCE_ONE(x[0]);         \
        ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/src/llama.cpp/ggml.c:635:34: note: expanded from macro 'GGML_F32x4_REDUCE_ONE'
#define GGML_F32x4_REDUCE_ONE(x) vaddvq_f32(x)
                                 ^~~~~~~~~~~~~
/opt/src/llama.cpp/ggml.c:1251:9: warning: implicit conversion increases floating-point precision: 'float32_t' (aka 'float') to 'ggml_float' (aka 'double') [-Wdouble-promotion]
        GGML_F16_VEC_REDUCE(sumf[k], sum[k]);
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/src/llama.cpp/ggml.c:730:41: note: expanded from macro 'GGML_F16_VEC_REDUCE'
    #define GGML_F16_VEC_REDUCE         GGML_F32Cx4_REDUCE
                                        ^
/opt/src/llama.cpp/ggml.c:720:38: note: expanded from macro 'GGML_F32Cx4_REDUCE'
    #define GGML_F32Cx4_REDUCE       GGML_F32x4_REDUCE
                                     ^
/opt/src/llama.cpp/ggml.c:650:11: note: expanded from macro 'GGML_F32x4_REDUCE'
    res = GGML_F32x4_REDUCE_ONE(x[0]);         \
        ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~
/opt/src/llama.cpp/ggml.c:635:34: note: expanded from macro 'GGML_F32x4_REDUCE_ONE'
#define GGML_F32x4_REDUCE_ONE(x) vaddvq_f32(x)
                                 ^~~~~~~~~~~~~
2 warnings generated.
[  6%] Built target ggml
[  7%] Linking C static library libggml_static.a
[  8%] Building CXX object CMakeFiles/llama.dir/llama.cpp.o
[  8%] Built target ggml_static
[  9%] Linking CXX static library libllama.a
[  9%] Built target llama
[ 11%] Building CXX object common/CMakeFiles/common.dir/common.cpp.o
[ 11%] Building CXX object examples/benchmark/CMakeFiles/benchmark.dir/benchmark-matmult.cpp.o
[ 12%] Building CXX object examples/quantize-stats/CMakeFiles/quantize-stats.dir/quantize-stats.cpp.o
[ 13%] Building CXX object common/CMakeFiles/common.dir/grammar-parser.cpp.o
[ 14%] Building CXX object common/CMakeFiles/common.dir/console.cpp.o
[ 15%] Building CXX object examples/llava/CMakeFiles/llava.dir/llava.cpp.o
[ 16%] Building C object tests/CMakeFiles/test-c.dir/test-c.c.o
[ 17%] Building CXX object examples/llava/CMakeFiles/llava.dir/clip.cpp.o
[ 18%] Building CXX object common/CMakeFiles/common.dir/sampling.cpp.o
[ 19%] Building CXX object examples/quantize/CMakeFiles/quantize.dir/quantize.cpp.o
[ 20%] Building CXX object common/CMakeFiles/common.dir/train.cpp.o
[ 21%] Linking CXX executable ../bin/test-c
[ 21%] Built target test-c
[ 22%] Linking CXX executable ../../bin/quantize
[ 22%] Built target quantize
[ 23%] Linking CXX executable ../../bin/benchmark
[ 23%] Built target benchmark
[ 25%] Linking CXX executable ../../bin/quantize-stats
[ 25%] Built target quantize-stats
[ 26%] Linking CXX static library libcommon.a
[ 26%] Built target common
[ 27%] Building CXX object tests/CMakeFiles/test-quantize-fns.dir/test-quantize-fns.cpp.o
[ 28%] Building CXX object tests/CMakeFiles/test-quantize-perf.dir/test-quantize-perf.cpp.o
[ 29%] Building CXX object tests/CMakeFiles/test-backend-ops.dir/test-backend-ops.cpp.o
[ 32%] Building CXX object tests/CMakeFiles/test-tokenizer-1-llama.dir/test-tokenizer-1-llama.cpp.o
[ 32%] Building CXX object tests/CMakeFiles/test-grammar-parser.dir/test-grammar-parser.cpp.o
[ 32%] Building CXX object tests/CMakeFiles/test-sampling.dir/test-sampling.cpp.o
[ 33%] Building CXX object tests/CMakeFiles/test-llama-grammar.dir/test-llama-grammar.cpp.o
[ 34%] Building CXX object tests/CMakeFiles/test-tokenizer-1-bpe.dir/test-tokenizer-1-bpe.cpp.o
[ 35%] Building CXX object tests/CMakeFiles/test-tokenizer-0-falcon.dir/test-tokenizer-0-falcon.cpp.o
[ 36%] Building CXX object tests/CMakeFiles/test-tokenizer-0-llama.dir/test-tokenizer-0-llama.cpp.o
[ 37%] Building CXX object tests/CMakeFiles/test-grad0.dir/test-grad0.cpp.o
[ 38%] Building CXX object tests/CMakeFiles/test-rope.dir/test-rope.cpp.o
[ 39%] Building CXX object examples/batched/CMakeFiles/batched.dir/batched.cpp.o
[ 40%] Building CXX object examples/baby-llama/CMakeFiles/baby-llama.dir/baby-llama.cpp.o
[ 41%] Building CXX object examples/embedding/CMakeFiles/embedding.dir/embedding.cpp.o
[ 42%] Building CXX object examples/beam-search/CMakeFiles/beam-search.dir/beam-search.cpp.o
[ 44%] Building CXX object examples/convert-llama2c-to-ggml/CMakeFiles/convert-llama2c-to-ggml.dir/convert-llama2c-to-ggml.cpp.o
[ 44%] Building CXX object examples/batched-bench/CMakeFiles/batched-bench.dir/batched-bench.cpp.o
[ 45%] Building CXX object examples/finetune/CMakeFiles/finetune.dir/finetune.cpp.o
[ 46%] Building CXX object examples/infill/CMakeFiles/infill.dir/infill.cpp.o
[ 47%] Building CXX object examples/llama-bench/CMakeFiles/llama-bench.dir/llama-bench.cpp.o
[ 48%] Building CXX object examples/main/CMakeFiles/main.dir/main.cpp.o
[ 50%] Building CXX object examples/tokenize/CMakeFiles/tokenize.dir/tokenize.cpp.o
[ 51%] Building CXX object examples/parallel/CMakeFiles/parallel.dir/parallel.cpp.o
[ 52%] Building CXX object examples/perplexity/CMakeFiles/perplexity.dir/perplexity.cpp.o
[ 53%] Building CXX object examples/save-load-state/CMakeFiles/save-load-state.dir/save-load-state.cpp.o
[ 54%] Building CXX object examples/simple/CMakeFiles/simple.dir/simple.cpp.o
[ 55%] Building CXX object examples/speculative/CMakeFiles/speculative.dir/speculative.cpp.o
[ 56%] Building CXX object examples/train-text-from-scratch/CMakeFiles/train-text-from-scratch.dir/train-text-from-scratch.cpp.o
[ 57%] Building CXX object examples/lookahead/CMakeFiles/lookahead.dir/lookahead.cpp.o
[ 58%] Building CXX object examples/export-lora/CMakeFiles/export-lora.dir/export-lora.cpp.o
[ 59%] Building CXX object pocs/vdot/CMakeFiles/vdot.dir/vdot.cpp.o
[ 60%] Building CXX object pocs/vdot/CMakeFiles/q8dot.dir/q8dot.cpp.o
[ 61%] Linking CXX executable ../bin/test-sampling
[ 62%] Linking CXX executable ../bin/test-rope
[ 63%] Linking CXX executable ../bin/test-quantize-fns
[ 63%] Built target test-sampling
[ 63%] Built target test-quantize-fns
[ 63%] Built target test-rope
[ 64%] Linking CXX executable ../bin/test-grammar-parser
[ 65%] Linking CXX executable ../../bin/beam-search
[ 65%] Built target test-grammar-parser
[ 66%] Linking CXX executable ../../bin/tokenize
[ 66%] Built target beam-search
[ 67%] Linking CXX executable ../bin/test-grad0
[ 68%] Linking CXX executable ../../bin/vdot
[ 68%] Built target tokenize
[ 69%] Linking CXX executable ../../bin/embedding
[ 69%] Built target test-grad0
[ 69%] Built target vdot
[ 70%] Linking CXX executable ../../bin/baby-llama
[ 70%] Built target embedding
[ 71%] Linking CXX executable ../../bin/save-load-state
[ 72%] Linking CXX executable ../../bin/batched
[ 72%] Built target baby-llama
[ 72%] Built target save-load-state
[ 73%] Linking CXX executable ../../bin/q8dot
[ 73%] Built target llava
[ 75%] Linking CXX executable ../bin/test-tokenizer-0-falcon
[ 76%] Linking CXX executable ../../bin/batched-bench
[ 77%] Building CXX object examples/llava/CMakeFiles/llava-cli.dir/llava-cli.cpp.o
[ 77%] Built target batched
[ 78%] Linking CXX executable ../../bin/simple
[ 79%] Linking CXX static library libllava_static.a
[ 80%] Building CXX object examples/server/CMakeFiles/server.dir/server.cpp.o
[ 80%] Built target q8dot
[ 81%] Linking CXX executable ../bin/test-tokenizer-0-llama
[ 81%] Built target test-tokenizer-0-falcon
[ 81%] Built target llava_static
[ 81%] Built target batched-bench
[ 82%] Linking CXX executable ../bin/test-quantize-perf
[ 82%] Built target simple
[ 82%] Built target test-tokenizer-0-llama
[ 82%] Built target test-quantize-perf
[ 83%] Linking CXX executable ../../bin/speculative
[ 84%] Linking CXX executable ../../bin/export-lora
[ 85%] Linking CXX executable ../../bin/parallel
[ 86%] Linking CXX executable ../../bin/infill
[ 86%] Built target export-lora
[ 86%] Built target speculative
[ 86%] Built target infill
[ 86%] Built target parallel
[ 87%] Linking CXX executable ../../bin/lookahead
[ 88%] Linking CXX executable ../../bin/convert-llama2c-to-ggml
[ 88%] Built target lookahead
[ 88%] Built target convert-llama2c-to-ggml
[ 89%] Linking CXX executable ../../bin/perplexity
[ 90%] Linking CXX executable ../../bin/llava-cli
[ 91%] Linking CXX executable ../../bin/train-text-from-scratch
[ 91%] Built target perplexity
[ 91%] Built target llava-cli
[ 91%] Built target train-text-from-scratch
[ 92%] Linking CXX executable ../../bin/main
[ 92%] Built target main
[ 93%] Linking CXX executable ../../bin/finetune
[ 93%] Built target finetune
[ 94%] Linking CXX executable ../bin/test-tokenizer-1-llama
[ 95%] Linking CXX executable ../bin/test-backend-ops
[ 95%] Built target test-tokenizer-1-llama
[ 95%] Built target test-backend-ops
[ 96%] Linking CXX executable ../bin/test-tokenizer-1-bpe
[ 96%] Built target test-tokenizer-1-bpe
[ 97%] Linking CXX executable ../../bin/llama-bench
[ 97%] Built target llama-bench
[ 98%] Linking CXX executable ../bin/test-llama-grammar
[ 98%] Built target test-llama-grammar
[100%] Linking CXX executable ../../bin/server
[100%] Built target server
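With the build complete, the benchmark binary and the quantized TinyLlama models are copied to the phone over adb and run from /data/local/tmp. The push commands are not shown in the log; a sketch, with paths assumed to match the ones used in the device session below:

adb shell mkdir -p /data/local/tmp/tinyllama-1b
adb push bin/llama-bench /data/local/tmp/                                  # run from the build-android2 directory
adb push tinyllama-1b/ggml-model-q4_0.gguf /data/local/tmp/tinyllama-1b/  # assumed model location on the host
adb push tinyllama-1b/ggml-model-q8_0.gguf /data/local/tmp/tinyllama-1b/
adb shell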
husky:/data/local/tmp $ uname -a
Linux localhost 5.15.110-android14-11-gcc48824eebe8-ab10865596 #1 SMP PREEMPT Tue Sep 26 19:57:58 UTC 2023 aarch64 Toybox
husky:/data/local/tmp $ getprop ro.product.model
Pixel 8 Pro
husky:/data/local/tmp $ ./llama-bench -m ./tinyllama-1b/ggml-model-q4_0.gguf -m ./tinyllama-1b/ggml-model-q8_0.gguf -p 512 -n 128 -ngl 99
| model                          |       size |     params | backend    |    threads | test       |              t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ---------: | ---------- | ---------------: |
| llama 1B Q4_0                  | 606.53 MiB |     1.10 B | CPU        |          4 | pp 512     |     34.96 ± 0.22 |
| llama 1B Q4_0                  | 606.53 MiB |     1.10 B | CPU        |          4 | tg 128     |     19.78 ± 0.19 |
| llama 1B Q8_0                  |   1.09 GiB |     1.10 B | CPU        |          4 | pp 512     |     34.97 ± 5.30 |
| llama 1B Q8_0                  |   1.09 GiB |     1.10 B | CPU        |          4 | tg 128     |     13.73 ± 0.04 |

build: 2994f0c (1656)
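In the table above, pp 512 is prompt processing of a 512-token prompt and tg 128 is generation of 128 tokens, both reported in tokens per second; the build has no GPU backend, so -ngl 99 leaves everything on the CPU. For interactive generation on the device, the main binary from the same build can be run against the same model; a sketch, with the prompt and thread count chosen here as placeholders rather than taken from the original run:

./main -m ./tinyllama-1b/ggml-model-q4_0.gguf -t 4 -n 128 -p "Hello from the Pixel 8 Pro"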