executorch llama2
@chauhang · Last active April 7, 2024 21:08

Initial failures on the base model download are tracked in pytorch/executorch#2907.
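
A quick sanity check of the downloaded checkpoint before export (a sketch, assuming the standard Meta consolidated state_dict format; the path is illustrative):

import torch

# Load the downloaded checkpoint on CPU; weights_only=True suffices because the
# consolidated file contains only tensors (assumption: standard Meta release format).
ckpt = torch.load("consolidated.00.pth", map_location="cpu", weights_only=True)

print(len(ckpt), "tensors")
# Token embedding table for Llama-2-7b should be vocab_size x dim = 32000 x 4096.
print(ckpt["tok_embeddings.weight"].shape)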

Command

python -m examples.models.llama2.export_llama --checkpoint $MODEL_PATH/consolidated.00.pth --params $MODEL_PATH/params.json -kv --use_sdpa_with_kv_cache -X -qmode 8da4w --group_size 128 -d fp32

Error

Could not import fairseq2 modules.
INFO:root:Loading model with checkpoint=/Users/gchauhan/dev/llama-fast/checkpoints/meta-llama/Llama-2-7b/consolidated.00.pth, params=/Users/gchauhan/dev/llama-fast/checkpoints/meta-llama/Llama-2-7b/params.json, use_kv_cache=True, weight_type=WeightType.LLAMA
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama.py", line 30, in <module>
    main()  # pragma: no cover
    ^^^^^^
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama.py", line 26, in main
    export_llama(modelname, args)
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama_lib.py", line 408, in export_llama
    return _export_llama(modelname, args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama_lib.py", line 529, in _export_llama
    builder_exported_to_edge = _prepare_for_llama_export(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama_lib.py", line 486, in _prepare_for_llama_export
    load_llama_model(
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/builder.py", line 83, in load_llama_model
    model, example_inputs, _ = EagerModelFactory.create_model(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gchauhan/dev/executorch/examples/models/model_factory.py", line 44, in create_model
    model = model_class(**kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/model.py", line 139, in __init__
    self.model_ = Transformer(model_args)
                  ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/executorch/examples/models/llama2/llama_transformer.py", line 418, in __init__
    self.tok_embeddings = nn.Embedding(params.vocab_size, params.dim)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/nn/modules/sparse.py", line 143, in __init__
    self.weight = Parameter(torch.empty((num_embeddings, embedding_dim), **factory_kwargs),
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/utils/_device.py", line 78, in __torch_function__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Trying to create tensor with negative dimension -1: [-1, 4096]
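
The -1 is params.vocab_size as read from params.json; with dim = 4096 the constructor ends up calling nn.Embedding(-1, 4096). A minimal repro of the failure plus a hypothetical workaround (patching vocab_size to the Llama-2 tokenizer size of 32000, which matches the get_vocab_size the runner reports later):

import torch.nn as nn

# Values as loaded from the Llama-2-7b params.json (assumption: vocab_size ships as -1).
params = {"dim": 4096, "vocab_size": -1}

try:
    nn.Embedding(params["vocab_size"], params["dim"])
except RuntimeError as err:
    print(err)  # Trying to create tensor with negative dimension -1: [-1, 4096]

# Hypothetical workaround: fill in the real tokenizer vocab size before building the model.
params["vocab_size"] = 32000
emb = nn.Embedding(params["vocab_size"], params["dim"])
print(tuple(emb.weight.shape))  # (32000, 4096)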

chauhang commented Apr 7, 2024

Rerun after generating tokenizer.bin

Generate tokenizer.bin

(The steps are not clear in the README -- it covers them for the Stories model but not for the Llama2 model.)

python -m examples.models.llama2.tokenizer.tokenizer -t ~/dev/models/checkpoints/meta-llama/Llama-2-7b/tokenizer.model -o tokenizer.bin
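
Before converting, the source tokenizer.model can be sanity-checked directly; a small sketch (assuming the sentencepiece Python package is installed) that prints the vocab size and BOS/EOS ids the runner reports later:

import sentencepiece as spm

# Path is illustrative; point it at the downloaded Llama-2 tokenizer.model.
sp = spm.SentencePieceProcessor(model_file="tokenizer.model")

print(sp.vocab_size())  # expected 32000 (matches get_vocab_size in the runner log)
print(sp.bos_id())      # expected 1
print(sp.eos_id())      # expected 2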

Run llama model

 cmake-out/examples/models/llama2/llama_main --model_path=xnnpack_llama2.pte --tokenizer_path=tokenizer.bin --prompt="Abrahim Lincoln"

Success, with the output below (generation was very slow, about 0.3 tokens/second)

I 00:00:00.000199 executorch:cpuinfo_utils.cpp:61] Reading file /sys/devices/soc0/image_version
I 00:00:00.000212 executorch:cpuinfo_utils.cpp:77] Failed to open midr file /sys/devices/soc0/image_version
I 00:00:00.000215 executorch:cpuinfo_utils.cpp:157] Number of efficient cores 0
I 00:00:00.000217 executorch:main.cpp:65] Resetting threadpool with num threads = 10
I 00:00:00.000415 executorch:runner.cpp:49] Creating LLaMa runner: model_path=xnnpack_llama2.pte, tokenizer_path=tokenizer.bin
I 00:00:10.375479 executorch:runner.cpp:64] Reading metadata from model
I 00:00:10.375503 executorch:runner.cpp:123] get_vocab_size: 32000
I 00:00:10.375506 executorch:runner.cpp:123] get_bos_id: 1
I 00:00:10.375508 executorch:runner.cpp:123] get_eos_id: 2
I 00:00:10.375509 executorch:runner.cpp:123] get_n_bos: 1
I 00:00:10.375511 executorch:runner.cpp:123] get_n_eos: 1
I 00:00:10.375514 executorch:runner.cpp:123] get_max_seq_len: 128
I 00:00:10.375516 executorch:runner.cpp:123] use_kv_cache: 0
I 00:00:10.375518 executorch:runner.cpp:123] use_sdpa_with_kv_cache: 0
I 00:00:10.375519 executorch:runner.cpp:123] append_eos_to_prompt: 0
Abrahim Lincoln’s Trip to New Salem in 1831
ʹThe Life and Times of Abraham Lincolnʹ, by Herndon and Weik, was published in Chicago in 1889, and contained a number of dissertations which were said to have been read at a recent Lincoln Centennial celebration. These dissertations were mostly out of date and full of errors.
In 1894, Herndon and Weik published a second edition of their book, in which they presented an extended account of Lincoln’s life and writings. This second
PyTorchObserver {"prompt_tokens":5,"generated_tokens":122,"model_load_start_ms":1712514333018,"model_load_end_ms":1712514343409,"inference_start_ms":1712514343409,"inference_end_ms":1712514738750,"prompt_eval_end_ms":1712514343803,"first_token_ms":1712514344148,"aggregate_sampling_time_ms":38,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
I 00:06:45.731759 executorch:runner.cpp:411] 	Prompt Tokens: 5    Generated Tokens: 122
I 00:06:45.731762 executorch:runner.cpp:417] 	Model Load Time:		10.391000 (seconds)
I 00:06:45.731767 executorch:runner.cpp:427] 	Total inference time:		395.341000 (seconds)		 Rate: 	0.308594 (tokens/second)
I 00:06:45.731769 executorch:runner.cpp:435] 		Prompt evaluation:	0.394000 (seconds)		 Rate: 	12.690355 (tokens/second)
I 00:06:45.731771 executorch:runner.cpp:446] 		Generated 122 tokens:	394.947000 (seconds)		 Rate: 	0.308902 (tokens/second)
I 00:06:45.731772 executorch:runner.cpp:454] 	Time to first generated token:	0.739000 (seconds)
I 00:06:45.731774 executorch:runner.cpp:461] 	Sampling time over 127 tokens:	0.038000 (seconds)
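
The rates in the summary lines above can be recomputed from the PyTorchObserver JSON (all timestamps are in milliseconds, per SCALING_FACTOR_UNITS_PER_SECOND = 1000); a short sketch:

import json

observer = json.loads(
    '{"prompt_tokens":5,"generated_tokens":122,'
    '"model_load_start_ms":1712514333018,"model_load_end_ms":1712514343409,'
    '"inference_start_ms":1712514343409,"inference_end_ms":1712514738750,'
    '"prompt_eval_end_ms":1712514343803,"first_token_ms":1712514344148,'
    '"aggregate_sampling_time_ms":38,"SCALING_FACTOR_UNITS_PER_SECOND":1000}'
)

model_load_s = (observer["model_load_end_ms"] - observer["model_load_start_ms"]) / 1000
total_inference_s = (observer["inference_end_ms"] - observer["inference_start_ms"]) / 1000
generation_s = (observer["inference_end_ms"] - observer["prompt_eval_end_ms"]) / 1000

print(f"Model load: {model_load_s:.3f} s")                                                     # 10.391
print(f"Total inference rate: {observer['generated_tokens'] / total_inference_s:.6f} tok/s")   # 0.308594
print(f"Generation rate: {observer['generated_tokens'] / generation_s:.6f} tok/s")             # 0.308902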


chauhang commented Apr 7, 2024

For testing on Android: ExecuTorch (ET) Android build, and steps for building the llama2 model ET runner.

Commands for building the Android version

export ANDROID_NDK=/Users/gchauhan/Library/Android/sdk/ndk/26.2.11394342  
cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_PLATFORM=android-23 \
    -DCMAKE_INSTALL_PREFIX=cmake-out-android \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
    -DEXECUTORCH_ENABLE_LOGGING=1 \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DPYTHON_EXECUTABLE=python \
    -DEXECUTORCH_BUILD_OPTIMIZED=ON \
    -Bcmake-out-android .

Output

-- The C compiler identification is Clang 17.0.2
-- The CXX compiler identification is Clang 17.0.2
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Users/gchauhan/Library/Android/sdk/ndk/26.2.11394342/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Users/gchauhan/Library/Android/sdk/ndk/26.2.11394342/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Downloading FXdiv to /Users/gchauhan/dev/executorch/cmake-out-android/FXdiv-source (define FXDIV_SOURCE_DIR to avoid it)
-- Configuring done (0.2s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/gchauhan/dev/executorch/cmake-out-android/FXdiv-download
[ 11%] Creating directories for 'fxdiv'
[ 22%] Performing download step (git clone) for 'fxdiv'
Cloning into 'FXdiv-source'...
Already on 'master'
Your branch is up to date with 'origin/master'.
[ 33%] Performing update step for 'fxdiv'
[ 44%] No patch step for 'fxdiv'
[ 55%] No configure step for 'fxdiv'
[ 66%] No build step for 'fxdiv'
[ 77%] No install step for 'fxdiv'
[ 88%] No test step for 'fxdiv'
[100%] Completed 'fxdiv'
[100%] Built target fxdiv
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Using python executable 'python'
-- Resolved buck2 as /Users/gchauhan/dev/executorch/cmake-out-android/buck2-bin/buck2-99e407b49dc432eda0cbddd67ea78346.
-- executorch: Generating source lists
-- executorch: Generating source file list /Users/gchauhan/dev/executorch/cmake-out-android/executorch_srcs.cmake
-- executorch: Using sources file /Users/gchauhan/dev/executorch/cmake-out-android/executorch_srcs.cmake
-- Proceeding with version: 23.5.26.0
-- Looking for strtof_l
-- Looking for strtof_l - found
-- Looking for strtoull_l
-- Looking for strtoull_l - found
-- Looking for realpath
-- Looking for realpath - found
-- CMAKE_CXX_FLAGS: -g -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security  
Command - python;-m;codegen.tools.gen_oplist;--output_path=/Users/gchauhan/dev/executorch/cmake-out-android/kernels/portable/selected_operators.yaml;--ops_schema_yaml_path="/Users/gchauhan/dev/executorch/kernels/portable/functions.yaml"
-- Generating kernel bindings:
--   FUNCTIONS_YAML: /Users/gchauhan/dev/executorch/kernels/portable/functions.yaml
--   CUSTOM_OPS_YAML: 
Generated files /Users/gchauhan/dev/executorch/cmake-out-android/kernels/portable/RegisterCodegenUnboxedKernelsEverything.cpp;/Users/gchauhan/dev/executorch/cmake-out-android/kernels/portable/Functions.h;/Users/gchauhan/dev/executorch/cmake-out-android/kernels/portable/NativeFunctions.h
-- Generating operator lib:
--   LIB_NAME: portable_ops_lib
--   KERNEL_LIBS: portable_kernels
--   DEPS: executorch
Command - python;-m;codegen.tools.gen_oplist;--output_path=/Users/gchauhan/dev/executorch/cmake-out-android/kernels/optimized/selected_operators.yaml;--ops_schema_yaml_path="/Users/gchauhan/dev/executorch/kernels/optimized/optimized-oss.yaml"
-- Generating kernel bindings:
--   FUNCTIONS_YAML: /Users/gchauhan/dev/executorch/kernels/optimized/optimized-oss.yaml
--   CUSTOM_OPS_YAML: 
Generated files /Users/gchauhan/dev/executorch/cmake-out-android/kernels/optimized/RegisterCodegenUnboxedKernelsEverything.cpp;/Users/gchauhan/dev/executorch/cmake-out-android/kernels/optimized/Functions.h;/Users/gchauhan/dev/executorch/cmake-out-android/kernels/optimized/NativeFunctions.h
-- Generating operator lib:
--   LIB_NAME: optimized_ops_lib
--   KERNEL_LIBS: optimized_kernels
--   DEPS: executorch
Command - python;-m;codegen.tools.gen_oplist;--output_path=/Users/gchauhan/dev/executorch/cmake-out-android/configurations/selected_operators.yaml;--ops_schema_yaml_path="/Users/gchauhan/dev/executorch/cmake-out-android/configurations/merged.yaml"
-- Generating kernel bindings:
--   FUNCTIONS_YAML: /Users/gchauhan/dev/executorch/cmake-out-android/configurations/merged.yaml
--   CUSTOM_OPS_YAML: 
Generated files /Users/gchauhan/dev/executorch/cmake-out-android/configurations/RegisterCodegenUnboxedKernelsEverything.cpp;/Users/gchauhan/dev/executorch/cmake-out-android/configurations/Functions.h;/Users/gchauhan/dev/executorch/cmake-out-android/configurations/NativeFunctions.h
-- Generating operator lib:
--   LIB_NAME: optimized_native_cpu_ops_lib
--   KERNEL_LIBS: portable_kernels;optimized_kernels
--   DEPS: executorch
CMake Deprecation Warning at third-party/gflags/CMakeLists.txt:73 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- Looking for C++ include unistd.h
-- Looking for C++ include unistd.h - found
-- Looking for C++ include stdint.h
-- Looking for C++ include stdint.h - found
-- Looking for C++ include inttypes.h
-- Looking for C++ include inttypes.h - found
-- Looking for C++ include sys/types.h
-- Looking for C++ include sys/types.h - found
-- Looking for C++ include sys/stat.h
-- Looking for C++ include sys/stat.h - found
-- Looking for C++ include fnmatch.h
-- Looking for C++ include fnmatch.h - found
-- Looking for C++ include stddef.h
-- Looking for C++ include stddef.h - found
-- Check size of uint32_t
-- Check size of uint32_t - done
-- Looking for strtoll
-- Looking for strtoll - found
-- The ASM compiler identification is Clang with GNU-like command-line
-- Found assembler: /Users/gchauhan/Library/Android/sdk/ndk/26.2.11394342/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang
-- Downloading FP16 to /Users/gchauhan/dev/executorch/cmake-out-android/FP16-source (define FP16_SOURCE_DIR to avoid it)
-- Configuring done (0.1s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/gchauhan/dev/executorch/cmake-out-android/FP16-download
[ 11%] Creating directories for 'fp16'
[ 22%] Performing download step (download, verify and extract) for 'fp16'
-- Downloading...
   dst='/Users/gchauhan/dev/executorch/cmake-out-android/FP16-download/fp16-prefix/src/0a92994d729ff76a58f692d3028ca1b64b145d91.zip'
   timeout='none'
   inactivity timeout='none'
-- Using src='https://github.com/Maratyszcza/FP16/archive/0a92994d729ff76a58f692d3028ca1b64b145d91.zip'
-- verifying file...
       file='/Users/gchauhan/dev/executorch/cmake-out-android/FP16-download/fp16-prefix/src/0a92994d729ff76a58f692d3028ca1b64b145d91.zip'
-- Downloading... done
-- extracting...
     src='/Users/gchauhan/dev/executorch/cmake-out-android/FP16-download/fp16-prefix/src/0a92994d729ff76a58f692d3028ca1b64b145d91.zip'
     dst='/Users/gchauhan/dev/executorch/cmake-out-android/FP16-source'
-- extracting... [tar xfz]
-- extracting... [analysis]
-- extracting... [rename]
-- extracting... [clean up]
-- extracting... done
[ 33%] No update step for 'fp16'
[ 44%] No patch step for 'fp16'
[ 55%] No configure step for 'fp16'
[ 66%] No build step for 'fp16'
[ 77%] No install step for 'fp16'
[ 88%] No test step for 'fp16'
[100%] Completed 'fp16'
[100%] Built target fp16
CMake Deprecation Warning at cmake-out-android/FP16-source/CMakeLists.txt:1 (CMAKE_MINIMUM_REQUIRED):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- Downloading PSimd to /Users/gchauhan/dev/executorch/cmake-out-android/psimd-source (define PSIMD_SOURCE_DIR to avoid it)
CMake Deprecation Warning at CMakeLists.txt:1 (CMAKE_MINIMUM_REQUIRED):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- Configuring done (0.1s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/gchauhan/dev/executorch/cmake-out-android/psimd-download
[ 11%] Creating directories for 'psimd'
[ 22%] Performing download step (git clone) for 'psimd'
Cloning into 'psimd-source'...
Already on 'master'
Your branch is up to date with 'origin/master'.
[ 33%] Performing update step for 'psimd'
[ 44%] No patch step for 'psimd'
[ 55%] No configure step for 'psimd'
[ 66%] No build step for 'psimd'
[ 77%] No install step for 'psimd'
[ 88%] No test step for 'psimd'
[100%] Completed 'psimd'
[100%] Built target psimd
CMake Deprecation Warning at cmake-out-android/psimd-source/CMakeLists.txt:1 (CMAKE_MINIMUM_REQUIRED):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- 
-- ******** Summary ********
--   CMAKE_BUILD_TYPE              : Release
--   CMAKE_CXX_STANDARD            : 17
--   CMAKE_CXX_COMPILER_ID         : Clang
--   CMAKE_TOOLCHAIN_FILE          : /Users/gchauhan/Library/Android/sdk/ndk/26.2.11394342/build/cmake/android.toolchain.cmake
--   BUCK2                         : /Users/gchauhan/dev/executorch/cmake-out-android/buck2-bin/buck2-99e407b49dc432eda0cbddd67ea78346
--   PYTHON_EXECUTABLE             : python
--   FLATC_EXECUTABLE              : flatc
--   EXECUTORCH_ENABLE_LOGGING              : 1
--   EXECUTORCH_ENABLE_PROGRAM_VERIFICATION : OFF
--   EXECUTORCH_LOG_LEVEL                   : Info
--   EXECUTORCH_BUILD_ANDROID_JNI           : OFF
--   EXECUTORCH_BUILD_ARM_BAREMETAL         : OFF
--   EXECUTORCH_BUILD_COREML                : OFF
--   EXECUTORCH_BUILD_CUSTOM                : OFF
--   EXECUTORCH_BUILD_EXECUTOR_RUNNER       : ON
--   EXECUTORCH_BUILD_EXTENSION_DATA_LOADER : ON
--   EXECUTORCH_BUILD_EXTENSION_MODULE      : ON
--   EXECUTORCH_BUILD_EXTENSION_RUNNER_UTIL : OFF
--   EXECUTORCH_BUILD_FLATC                 : ON
--   EXECUTORCH_BUILD_GFLAGS                : ON
--   EXECUTORCH_BUILD_GTESTS                : OFF
--   EXECUTORCH_BUILD_HOST_TARGETS          : ON
--   EXECUTORCH_BUILD_MPS                   : OFF
--   EXECUTORCH_BUILD_PYBIND                : OFF
--   EXECUTORCH_BUILD_QNN                   : OFF
--   EXECUTORCH_BUILD_OPTIMIZED             : ON
--   EXECUTORCH_BUILD_QUANTIZED             : OFF
--   EXECUTORCH_BUILD_SDK                   : OFF
--   EXECUTORCH_BUILD_SIZE_TEST             : OFF
--   EXECUTORCH_BUILD_XNNPACK               : ON
--   EXECUTORCH_BUILD_VULKAN                : OFF
--   EXECUTORCH_BUILD_PTHREADPOOL           : ON
--   EXECUTORCH_BUILD_CPUINFO               : ON
-- Configuring done (16.5s)
-- Generating done (1.0s)
-- Build files have been written to: /Users/gchauhan/dev/executorch/cmake-out-android
cmake --build cmake-out-android -j16 --target install --config Release

Truncated output

[  0%] Building CXX object third-party/gflags/CMakeFiles/gflags_nothreads_static.dir/src/gflags.cc.o
[  0%] Building CXX object backends/xnnpack/third-party/XNNPACK/CMakeFiles/convolution-test-helpers.dir/test/convolution-test-helpers.cc.o
[  0%] Building C object backends/xnnpack/third-party/cpuinfo/CMakeFiles/cpuinfo.dir/src/api.c.o
[  0%] Building CXX object third-party/gflags/CMakeFiles/gflags_nothreads_static.dir/src/gflags_reporting.cc.o
[  0%] Building C object backends/xnnpack/third-party/XNNPACK/CMakeFiles/microkernel-utils.dir/src/microkernel-utils.c.o
[  0%] Building CXX object third-party/gflags/CMakeFiles/gflags_nothreads_static.dir/src/gflags_completions.cc.o
[  0%] Building CXX object third-party/flatbuffers/CMakeFiles/flatc.dir/src/reflection.cpp.o
[  0%] Building CXX object third-party/flatbuffers/CMakeFiles/flatc.dir/src/idl_parser.cpp.o
[  0%] Building CXX object third-party/flatbuffers/CMakeFiles/flatc.dir/src/idl_gen_text.cpp.o
[  0%] Building C object backends/xnnpack/third-party/cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/api.c.o
[  0%] Building C object backends/xnnpack/third-party/cpuinfo/CMakeFiles/cpuinfo.dir/src/init.c.o
[  0%] Building C object backends/xnnpack/third-party/cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/cache.c.o
[  0%] Building C object backends/xnnpack/third-party/cpuinfo/CMakeFiles/cpuinfo.dir/src/cache.c.o
[  0%] Building C object backends/xnnpack/third-party/cpuinfo/CMakeFiles/cpuinfo.dir/src/log.c.o
[  0%] Building CXX object kernels/optimized/CMakeFiles/eigen_blas.dir/third-party/eigen/blas/single.cpp.o
clang++: warning: argument unused during compilation: '-s' [-Wunused-command-line-argument]
(warning emitted repeatedly by several parallel clang++ invocations, with their output interleaved)
[100%] Building C object backends/xnnpack/third-party/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-16.c.o
[100%] Building C object backends/xnnpack/third-party/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-64.c.o
[100%] Building C object backends/xnnpack/third-party/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-2048.c.o
[100%] Building C object backends/xnnpack/third-party/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/vlog.c.o
[100%] Built target microkernels-all
Install the project...
-- Install configuration: "Release"
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/share/cpuinfo/cpuinfo-config.cmake
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libcpuinfo.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/cpuinfo.h
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/share/cpuinfo/cpuinfo-targets.cmake
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/share/cpuinfo/cpuinfo-targets-release.cmake
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/pkgconfig/libcpuinfo.pc
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/pthreadpool.h
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libpthreadpool.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/fxdiv.h
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libportable_kernels.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libportable_ops_lib.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libeigen_blas.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libcpublas.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/liboptimized_kernels.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/liboptimized_ops_lib.a
-- Up-to-date: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libcpublas.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/liboptimized_native_cpu_ops_lib.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libexecutorch.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/cmake/ExecuTorch/executorch-config.cmake
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libextension_data_loader.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libextension_module.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/fp16.h
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/fp16/bitcasts.h
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/fp16/fp16.h
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/fp16/psimd.h
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/fp16/__init__.py
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/fp16/avx.py
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/fp16/avx2.py
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/psimd.h
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libXNNPACK.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/xnnpack.h
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/experiments-config.h
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libxnnpack_backend.a

Build the llama2 runner for Android

cmake  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_PLATFORM=android-23 \
    -DCMAKE_INSTALL_PREFIX=cmake-out-android \
    -DCMAKE_BUILD_TYPE=Release \
    -DPYTHON_EXECUTABLE=python \
    -DEXECUTORCH_BUILD_OPTIMIZED=ON \
    -Bcmake-out-android/examples/models/llama2 \
    examples/models/llama2

etdump library is not found.
            If needed rebuild with the proper options in CMakeLists.txt
bundled_program library is not found.
            If needed rebuild with the proper options in CMakeLists.txt
flatccrt library is not found.
            If needed rebuild with the proper options in CMakeLists.txt
mpsdelegate library is not found.
            If needed rebuild with the proper options in CMakeLists.txt
qnn_executorch_backend library is not found.
            If needed rebuild with the proper options in CMakeLists.txt
vulkan_backend library is not found.
            If needed rebuild with the proper options in CMakeLists.txt
-- executorch: Using source file list /Users/gchauhan/dev/executorch/cmake-out-android/examples/models/llama2/runner/../../../../executorch_srcs.cmake
-- 
-- ******** Summary ********
--   CMAKE_BUILD_TYPE              : Release
--   CMAKE_CXX_STANDARD            : 17
--   CMAKE_CXX_COMPILER_ID         : Clang
--   CMAKE_TOOLCHAIN_FILE          : /Users/gchauhan/Library/Android/sdk/ndk/26.2.11394342/build/cmake/android.toolchain.cmake
--   BUCK2                         : 
--   PYTHON_EXECUTABLE             : python
--   FLATC_EXECUTABLE              : 
--   EXECUTORCH_ENABLE_LOGGING              : 
--   EXECUTORCH_ENABLE_PROGRAM_VERIFICATION : 
--   EXECUTORCH_LOG_LEVEL                   : 
--   EXECUTORCH_BUILD_ANDROID_JNI           : 
--   EXECUTORCH_BUILD_ARM_BAREMETAL         : 
--   EXECUTORCH_BUILD_COREML                : 
--   EXECUTORCH_BUILD_CUSTOM                : 
--   EXECUTORCH_BUILD_EXECUTOR_RUNNER       : 
--   EXECUTORCH_BUILD_EXTENSION_DATA_LOADER : 
--   EXECUTORCH_BUILD_EXTENSION_MODULE      : 
--   EXECUTORCH_BUILD_EXTENSION_RUNNER_UTIL : 
--   EXECUTORCH_BUILD_FLATC                 : 
--   EXECUTORCH_BUILD_GFLAGS                : 
--   EXECUTORCH_BUILD_GTESTS                : 
--   EXECUTORCH_BUILD_HOST_TARGETS          : 
--   EXECUTORCH_BUILD_MPS                   : 
--   EXECUTORCH_BUILD_PYBIND                : 
--   EXECUTORCH_BUILD_QNN                   : 
--   EXECUTORCH_BUILD_OPTIMIZED             : ON
--   EXECUTORCH_BUILD_QUANTIZED             : 
--   EXECUTORCH_BUILD_SDK                   : 
--   EXECUTORCH_BUILD_SIZE_TEST             : 
--   EXECUTORCH_BUILD_XNNPACK               : 
--   EXECUTORCH_BUILD_VULKAN                : 
--   EXECUTORCH_BUILD_PTHREADPOOL           : 
--   EXECUTORCH_BUILD_CPUINFO               : 
-- Configuring done (0.2s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/gchauhan/dev/executorch/cmake-out-android/examples/models/llama2
cmake --build cmake-out-android/examples/models/llama2 -j16 --config Release
[ 55%] Building CXX object runner/CMakeFiles/llama_runner.dir/runner.cpp.o
[ 55%] Building CXX object runner/CMakeFiles/llama_runner.dir/Users/gchauhan/dev/executorch/kernels/optimized/blas/CPUBlas.cpp.o
[ 55%] Building CXX object runner/CMakeFiles/llama_runner.dir/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp.o
[ 55%] Building CXX object runner/CMakeFiles/llama_runner.dir/__/tokenizer/tokenizer.cpp.o
[ 55%] Building CXX object runner/CMakeFiles/llama_runner.dir/__/sampler/sampler.cpp.o
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:29: note: 'data_ptr' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
                            ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr<unsigned char>' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:3: note: 'data_ptr<unsigned char>' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
  ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/platform/compiler.h:42:27: note: expanded from macro '__ET_DEPRECATED'
#define __ET_DEPRECATED [[deprecated]]
                          ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:29: note: 'data_ptr' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
                            ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr<signed char>' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:3: note: 'data_ptr<signed char>' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
  ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/platform/compiler.h:42:27: note: expanded from macro '__ET_DEPRECATED'
#define __ET_DEPRECATED [[deprecated]]
                          ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:29: note: 'data_ptr' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
                            ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr<short>' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:3: note: 'data_ptr<short>' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
  ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/platform/compiler.h:42:27: note: expanded from macro '__ET_DEPRECATED'
#define __ET_DEPRECATED [[deprecated]]
                          ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:29: note: 'data_ptr' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
                            ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr<int>' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:3: note: 'data_ptr<int>' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
  ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/platform/compiler.h:42:27: note: expanded from macro '__ET_DEPRECATED'
#define __ET_DEPRECATED [[deprecated]]
                          ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:29: note: 'data_ptr' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
                            ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr<long>' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:3: note: 'data_ptr<long>' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
  ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/platform/compiler.h:42:27: note: expanded from macro '__ET_DEPRECATED'
#define __ET_DEPRECATED [[deprecated]]
                          ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:29: note: 'data_ptr' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
                            ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr<float>' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:3: note: 'data_ptr<float>' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
  ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/platform/compiler.h:42:27: note: expanded from macro '__ET_DEPRECATED'
#define __ET_DEPRECATED [[deprecated]]
                          ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:29: note: 'data_ptr' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
                            ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr<double>' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:3: note: 'data_ptr<double>' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
  ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/platform/compiler.h:42:27: note: expanded from macro '__ET_DEPRECATED'
#define __ET_DEPRECATED [[deprecated]]
                          ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:29: note: 'data_ptr' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
                            ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr<bool>' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:3: note: 'data_ptr<bool>' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
  ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/platform/compiler.h:42:27: note: expanded from macro '__ET_DEPRECATED'
#define __ET_DEPRECATED [[deprecated]]
                          ^
16 warnings generated.
[ 66%] Linking CXX static library libllama_runner.a
[ 66%] Built target llama_runner
[ 88%] Building CXX object CMakeFiles/llama_main.dir/main.cpp.o
[ 88%] Building CXX object CMakeFiles/llama_main.dir/Users/gchauhan/dev/executorch/backends/xnnpack/threadpool/cpuinfo_utils.cpp.o
[100%] Linking CXX executable llama_main
[100%] Built target llama_main


chauhang commented Apr 7, 2024

Unable to run on the Android emulator

adb push of the ~4GB .pte file hangs or crashes the emulator.

Run the model on Android (an actual device worked)

Copy files

adb push xnnpack_llama2.pte /data/local/tmp/
adb push tokenizer.bin /data/local/tmp/
adb push cmake-out-android/examples/models/llama2/llama_main /data/local/tmp/

Run model on device

adb shell "cd /data/local/tmp && ./llama_main --model_path ./xnnpack_llama2.pte --tokenizer_path ./tokenizer.bin --prompt "Once upon a time" --seq_len 120"
I 00:00:00.003152 executorch:cpuinfo_utils.cpp:61] Reading file /sys/devices/soc0/image_version
I 00:00:00.003479 executorch:cpuinfo_utils.cpp:77] Failed to open midr file /sys/devices/soc0/image_version
I 00:00:00.003550 executorch:cpuinfo_utils.cpp:157] Number of efficient cores 4
I 00:00:00.003586 executorch:main.cpp:65] Resetting threadpool with num threads = 4
I 00:00:00.008603 executorch:runner.cpp:49] Creating LLaMa runner: model_path=./xnnpack_llama2.pte, tokenizer_path=./tokenizer.bin
I 00:00:12.040637 executorch:runner.cpp:64] Reading metadata from model
I 00:00:12.041047 executorch:runner.cpp:123] get_vocab_size: 32000
I 00:00:12.041061 executorch:runner.cpp:123] get_bos_id: 1
I 00:00:12.041077 executorch:runner.cpp:123] get_eos_id: 2
I 00:00:12.041089 executorch:runner.cpp:123] get_n_bos: 1
I 00:00:12.041095 executorch:runner.cpp:123] get_n_eos: 1
I 00:00:12.041100 executorch:runner.cpp:123] get_max_seq_len: 128
I 00:00:12.041105 executorch:runner.cpp:123] use_kv_cache: 0
I 00:00:12.041110 executorch:runner.cpp:123] use_sdpa_with_kv_cache: 0
I 00:00:12.041114 executorch:runner.cpp:123] append_eos_to_prompt: 0
Once upon a time, there was a beautiful city called Baghdad.istration, the Iraqi government is working to remove the names of all Americans and allies in Iraq from the terrorist list. The Iraqi government wants the world to believe that the new Iraq will be a peaceful, democratic place where all the world's people can feel safe. But, if the world's people believe in that new Iraq, they have to believe that the Iraqi government will put the names of all Iraqis who are terrorists on their terrorist listI 00:25:22.676836 executorch:runner.cpp:411] 	Prompt Tokens: 2    Generated Tokens: 117
I 00:25:22.677070 executorch:runner.cpp:417] 	Model Load Time:		12.051000 (seconds)
I 00:25:22.677151 executorch:runner.cpp:427] 	Total inference time:		1510.609000 (seconds)		 Rate: 	0.077452 (tokens/second)
I 00:25:22.677205 executorch:runner.cpp:435] 		Prompt evaluation:	4.939000 (seconds)		 Rate: 	0.404940 (tokens/second)
I 00:25:22.677380 executorch:runner.cpp:446] 		Generated 117 tokens:	1505.670000 (seconds)		 Rate: 	0.077706 (tokens/second)
I 00:25:22.677457 executorch:runner.cpp:454] 	Time to first generated token:	8.448000 (seconds)
I 00:25:22.677507 executorch:runner.cpp:461] 	Sampling time over 119 tokens:	0.136000 (seconds)

PyTorchObserver {"prompt_tokens":2,"generated_tokens":117,"model_load_start_ms":1712522260853,"model_load_end_ms":1712522272904,"inference_start_ms":1712522272904,"inference_end_ms":1712523783513,"prompt_eval_end_ms":1712522277843,"first_token_ms":1712522281352,"aggregate_sampling_time_ms":136,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
