@chauhang
Last active April 7, 2024 21:08
executorch llama2

Initial failures with the downloaded base model: pytorch/executorch#2907

Command

python -m examples.models.llama2.export_llama --checkpoint $MODEL_PATH/consolidated.00.pth --params $MODEL_PATH/params.json -kv --use_sdpa_with_kv_cache -X -qmode 8da4w --group_size 128 -d fp32

Error

Could not import fairseq2 modules.
INFO:root:Loading model with checkpoint=/Users/gchauhan/dev/llama-fast/checkpoints/meta-llama/Llama-2-7b/consolidated.00.pth, params=/Users/gchauhan/dev/llama-fast/checkpoints/meta-llama/Llama-2-7b/params.json, use_kv_cache=True, weight_type=WeightType.LLAMA
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama.py", line 30, in <module>
    main()  # pragma: no cover
    ^^^^^^
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama.py", line 26, in main
    export_llama(modelname, args)
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama_lib.py", line 408, in export_llama
    return _export_llama(modelname, args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama_lib.py", line 529, in _export_llama
    builder_exported_to_edge = _prepare_for_llama_export(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama_lib.py", line 486, in _prepare_for_llama_export
    load_llama_model(
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/builder.py", line 83, in load_llama_model
    model, example_inputs, _ = EagerModelFactory.create_model(
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gchauhan/dev/executorch/examples/models/model_factory.py", line 44, in create_model
    model = model_class(**kwargs)
            ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/model.py", line 139, in __init__
    self.model_ = Transformer(model_args)
                  ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/executorch/examples/models/llama2/llama_transformer.py", line 418, in __init__
    self.tok_embeddings = nn.Embedding(params.vocab_size, params.dim)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/nn/modules/sparse.py", line 143, in __init__
    self.weight = Parameter(torch.empty((num_embeddings, embedding_dim), **factory_kwargs),
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/utils/_device.py", line 78, in __torch_function__
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Trying to create tensor with negative dimension -1: [-1, 4096]
@chauhang commented Apr 7, 2024

After updating params.json with vocab_size = 32000
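
A minimal sketch of that patch (the stock Llama 2 7B params.json ships with "vocab_size": -1, which is what produced the [-1, 4096] embedding shape in the traceback above; the path is the same $MODEL_PATH used in the commands):

# Sketch only: set vocab_size in params.json to the Llama 2 tokenizer size.
import json, os

path = os.path.expandvars("$MODEL_PATH/params.json")
with open(path) as f:
    params = json.load(f)

params["vocab_size"] = 32000  # Llama 2 SentencePiece vocabulary size
with open(path, "w") as f:
    json.dump(params, f, indent=2)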

python -m examples.models.llama2.export_llama --checkpoint $MODEL_PATH/consolidated.00.pth --params $MODEL_PATH/params.json -kv --use_sdpa_with_kv_cache -X -qmode 8da4w --group_size 128 -d fp32

Error

Could not import fairseq2 modules.
INFO:root:Loading model with checkpoint=/Users/gchauhan/dev/models/checkpoints/meta-llama/Llama-2-7b/consolidated.00.pth, params=/Users/gchauhan/dev/models/checkpoints/meta-llama/Llama-2-7b/params.json, use_kv_cache=True, weight_type=WeightType.LLAMA
INFO:root:Loaded model with dtype=torch.bfloat16
INFO:datasets:PyTorch version 2.4.0.dev20240324 available.
Downloading builder script: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5.67k/5.67k [00:00<00:00, 9.04MB/s]
linear: layers.0.attention.wq, in=4096, out=4096
linear: layers.0.attention.wk, in=4096, out=4096
linear: layers.0.attention.wv, in=4096, out=4096
linear: layers.0.attention.wo, in=4096, out=4096
linear: layers.0.feed_forward.w1, in=4096, out=11008
linear: layers.0.feed_forward.w2, in=11008, out=4096
linear: layers.0.feed_forward.w3, in=4096, out=11008
linear: layers.1.attention.wq, in=4096, out=4096
linear: layers.1.attention.wk, in=4096, out=4096
linear: layers.1.attention.wv, in=4096, out=4096
linear: layers.1.attention.wo, in=4096, out=4096
linear: layers.1.feed_forward.w1, in=4096, out=11008
linear: layers.1.feed_forward.w2, in=11008, out=4096
linear: layers.1.feed_forward.w3, in=4096, out=11008
linear: layers.2.attention.wq, in=4096, out=4096
linear: layers.2.attention.wk, in=4096, out=4096
linear: layers.2.attention.wv, in=4096, out=4096
linear: layers.2.attention.wo, in=4096, out=4096
linear: layers.2.feed_forward.w1, in=4096, out=11008
linear: layers.2.feed_forward.w2, in=11008, out=4096
linear: layers.2.feed_forward.w3, in=4096, out=11008
linear: layers.3.attention.wq, in=4096, out=4096
linear: layers.3.attention.wk, in=4096, out=4096
linear: layers.3.attention.wv, in=4096, out=4096
linear: layers.3.attention.wo, in=4096, out=4096
linear: layers.3.feed_forward.w1, in=4096, out=11008
linear: layers.3.feed_forward.w2, in=11008, out=4096
linear: layers.3.feed_forward.w3, in=4096, out=11008
linear: layers.4.attention.wq, in=4096, out=4096
linear: layers.4.attention.wk, in=4096, out=4096
linear: layers.4.attention.wv, in=4096, out=4096
linear: layers.4.attention.wo, in=4096, out=4096
linear: layers.4.feed_forward.w1, in=4096, out=11008
linear: layers.4.feed_forward.w2, in=11008, out=4096
linear: layers.4.feed_forward.w3, in=4096, out=11008
linear: layers.5.attention.wq, in=4096, out=4096
linear: layers.5.attention.wk, in=4096, out=4096
linear: layers.5.attention.wv, in=4096, out=4096
linear: layers.5.attention.wo, in=4096, out=4096
linear: layers.5.feed_forward.w1, in=4096, out=11008
linear: layers.5.feed_forward.w2, in=11008, out=4096
linear: layers.5.feed_forward.w3, in=4096, out=11008
linear: layers.6.attention.wq, in=4096, out=4096
linear: layers.6.attention.wk, in=4096, out=4096
linear: layers.6.attention.wv, in=4096, out=4096
linear: layers.6.attention.wo, in=4096, out=4096
linear: layers.6.feed_forward.w1, in=4096, out=11008
linear: layers.6.feed_forward.w2, in=11008, out=4096
linear: layers.6.feed_forward.w3, in=4096, out=11008
linear: layers.7.attention.wq, in=4096, out=4096
linear: layers.7.attention.wk, in=4096, out=4096
linear: layers.7.attention.wv, in=4096, out=4096
linear: layers.7.attention.wo, in=4096, out=4096
linear: layers.7.feed_forward.w1, in=4096, out=11008
linear: layers.7.feed_forward.w2, in=11008, out=4096
linear: layers.7.feed_forward.w3, in=4096, out=11008
linear: layers.8.attention.wq, in=4096, out=4096
linear: layers.8.attention.wk, in=4096, out=4096
linear: layers.8.attention.wv, in=4096, out=4096
linear: layers.8.attention.wo, in=4096, out=4096
linear: layers.8.feed_forward.w1, in=4096, out=11008
linear: layers.8.feed_forward.w2, in=11008, out=4096
linear: layers.8.feed_forward.w3, in=4096, out=11008
linear: layers.9.attention.wq, in=4096, out=4096
linear: layers.9.attention.wk, in=4096, out=4096
linear: layers.9.attention.wv, in=4096, out=4096
linear: layers.9.attention.wo, in=4096, out=4096
linear: layers.9.feed_forward.w1, in=4096, out=11008
linear: layers.9.feed_forward.w2, in=11008, out=4096
linear: layers.9.feed_forward.w3, in=4096, out=11008
linear: layers.10.attention.wq, in=4096, out=4096
linear: layers.10.attention.wk, in=4096, out=4096
linear: layers.10.attention.wv, in=4096, out=4096
linear: layers.10.attention.wo, in=4096, out=4096
linear: layers.10.feed_forward.w1, in=4096, out=11008
linear: layers.10.feed_forward.w2, in=11008, out=4096
linear: layers.10.feed_forward.w3, in=4096, out=11008
linear: layers.11.attention.wq, in=4096, out=4096
linear: layers.11.attention.wk, in=4096, out=4096
linear: layers.11.attention.wv, in=4096, out=4096
linear: layers.11.attention.wo, in=4096, out=4096
linear: layers.11.feed_forward.w1, in=4096, out=11008
linear: layers.11.feed_forward.w2, in=11008, out=4096
linear: layers.11.feed_forward.w3, in=4096, out=11008
linear: layers.12.attention.wq, in=4096, out=4096
linear: layers.12.attention.wk, in=4096, out=4096
linear: layers.12.attention.wv, in=4096, out=4096
linear: layers.12.attention.wo, in=4096, out=4096
linear: layers.12.feed_forward.w1, in=4096, out=11008
linear: layers.12.feed_forward.w2, in=11008, out=4096
linear: layers.12.feed_forward.w3, in=4096, out=11008
linear: layers.13.attention.wq, in=4096, out=4096
linear: layers.13.attention.wk, in=4096, out=4096
linear: layers.13.attention.wv, in=4096, out=4096
linear: layers.13.attention.wo, in=4096, out=4096
linear: layers.13.feed_forward.w1, in=4096, out=11008
linear: layers.13.feed_forward.w2, in=11008, out=4096
linear: layers.13.feed_forward.w3, in=4096, out=11008
linear: layers.14.attention.wq, in=4096, out=4096
linear: layers.14.attention.wk, in=4096, out=4096
linear: layers.14.attention.wv, in=4096, out=4096
linear: layers.14.attention.wo, in=4096, out=4096
linear: layers.14.feed_forward.w1, in=4096, out=11008
linear: layers.14.feed_forward.w2, in=11008, out=4096
linear: layers.14.feed_forward.w3, in=4096, out=11008
linear: layers.15.attention.wq, in=4096, out=4096
linear: layers.15.attention.wk, in=4096, out=4096
linear: layers.15.attention.wv, in=4096, out=4096
linear: layers.15.attention.wo, in=4096, out=4096
linear: layers.15.feed_forward.w1, in=4096, out=11008
linear: layers.15.feed_forward.w2, in=11008, out=4096
linear: layers.15.feed_forward.w3, in=4096, out=11008
linear: layers.16.attention.wq, in=4096, out=4096
linear: layers.16.attention.wk, in=4096, out=4096
linear: layers.16.attention.wv, in=4096, out=4096
linear: layers.16.attention.wo, in=4096, out=4096
linear: layers.16.feed_forward.w1, in=4096, out=11008
linear: layers.16.feed_forward.w2, in=11008, out=4096
linear: layers.16.feed_forward.w3, in=4096, out=11008
linear: layers.17.attention.wq, in=4096, out=4096
linear: layers.17.attention.wk, in=4096, out=4096
linear: layers.17.attention.wv, in=4096, out=4096
linear: layers.17.attention.wo, in=4096, out=4096
linear: layers.17.feed_forward.w1, in=4096, out=11008
linear: layers.17.feed_forward.w2, in=11008, out=4096
linear: layers.17.feed_forward.w3, in=4096, out=11008
linear: layers.18.attention.wq, in=4096, out=4096
linear: layers.18.attention.wk, in=4096, out=4096
linear: layers.18.attention.wv, in=4096, out=4096
linear: layers.18.attention.wo, in=4096, out=4096
linear: layers.18.feed_forward.w1, in=4096, out=11008
linear: layers.18.feed_forward.w2, in=11008, out=4096
linear: layers.18.feed_forward.w3, in=4096, out=11008
linear: layers.19.attention.wq, in=4096, out=4096
linear: layers.19.attention.wk, in=4096, out=4096
linear: layers.19.attention.wv, in=4096, out=4096
linear: layers.19.attention.wo, in=4096, out=4096
linear: layers.19.feed_forward.w1, in=4096, out=11008
linear: layers.19.feed_forward.w2, in=11008, out=4096
linear: layers.19.feed_forward.w3, in=4096, out=11008
linear: layers.20.attention.wq, in=4096, out=4096
linear: layers.20.attention.wk, in=4096, out=4096
linear: layers.20.attention.wv, in=4096, out=4096
linear: layers.20.attention.wo, in=4096, out=4096
linear: layers.20.feed_forward.w1, in=4096, out=11008
linear: layers.20.feed_forward.w2, in=11008, out=4096
linear: layers.20.feed_forward.w3, in=4096, out=11008
linear: layers.21.attention.wq, in=4096, out=4096
linear: layers.21.attention.wk, in=4096, out=4096
linear: layers.21.attention.wv, in=4096, out=4096
linear: layers.21.attention.wo, in=4096, out=4096
linear: layers.21.feed_forward.w1, in=4096, out=11008
linear: layers.21.feed_forward.w2, in=11008, out=4096
linear: layers.21.feed_forward.w3, in=4096, out=11008
linear: layers.22.attention.wq, in=4096, out=4096
linear: layers.22.attention.wk, in=4096, out=4096
linear: layers.22.attention.wv, in=4096, out=4096
linear: layers.22.attention.wo, in=4096, out=4096
linear: layers.22.feed_forward.w1, in=4096, out=11008
linear: layers.22.feed_forward.w2, in=11008, out=4096
linear: layers.22.feed_forward.w3, in=4096, out=11008
linear: layers.23.attention.wq, in=4096, out=4096
linear: layers.23.attention.wk, in=4096, out=4096
linear: layers.23.attention.wv, in=4096, out=4096
linear: layers.23.attention.wo, in=4096, out=4096
linear: layers.23.feed_forward.w1, in=4096, out=11008
linear: layers.23.feed_forward.w2, in=11008, out=4096
linear: layers.23.feed_forward.w3, in=4096, out=11008
linear: layers.24.attention.wq, in=4096, out=4096
linear: layers.24.attention.wk, in=4096, out=4096
linear: layers.24.attention.wv, in=4096, out=4096
linear: layers.24.attention.wo, in=4096, out=4096
linear: layers.24.feed_forward.w1, in=4096, out=11008
linear: layers.24.feed_forward.w2, in=11008, out=4096
linear: layers.24.feed_forward.w3, in=4096, out=11008
linear: layers.25.attention.wq, in=4096, out=4096
linear: layers.25.attention.wk, in=4096, out=4096
linear: layers.25.attention.wv, in=4096, out=4096
linear: layers.25.attention.wo, in=4096, out=4096
linear: layers.25.feed_forward.w1, in=4096, out=11008
linear: layers.25.feed_forward.w2, in=11008, out=4096
linear: layers.25.feed_forward.w3, in=4096, out=11008
linear: layers.26.attention.wq, in=4096, out=4096
linear: layers.26.attention.wk, in=4096, out=4096
linear: layers.26.attention.wv, in=4096, out=4096
linear: layers.26.attention.wo, in=4096, out=4096
linear: layers.26.feed_forward.w1, in=4096, out=11008
linear: layers.26.feed_forward.w2, in=11008, out=4096
linear: layers.26.feed_forward.w3, in=4096, out=11008
linear: layers.27.attention.wq, in=4096, out=4096
linear: layers.27.attention.wk, in=4096, out=4096
linear: layers.27.attention.wv, in=4096, out=4096
linear: layers.27.attention.wo, in=4096, out=4096
linear: layers.27.feed_forward.w1, in=4096, out=11008
linear: layers.27.feed_forward.w2, in=11008, out=4096
linear: layers.27.feed_forward.w3, in=4096, out=11008
linear: layers.28.attention.wq, in=4096, out=4096
linear: layers.28.attention.wk, in=4096, out=4096
linear: layers.28.attention.wv, in=4096, out=4096
linear: layers.28.attention.wo, in=4096, out=4096
linear: layers.28.feed_forward.w1, in=4096, out=11008
linear: layers.28.feed_forward.w2, in=11008, out=4096
linear: layers.28.feed_forward.w3, in=4096, out=11008
linear: layers.29.attention.wq, in=4096, out=4096
linear: layers.29.attention.wk, in=4096, out=4096
linear: layers.29.attention.wv, in=4096, out=4096
linear: layers.29.attention.wo, in=4096, out=4096
linear: layers.29.feed_forward.w1, in=4096, out=11008
linear: layers.29.feed_forward.w2, in=11008, out=4096
linear: layers.29.feed_forward.w3, in=4096, out=11008
linear: layers.30.attention.wq, in=4096, out=4096
linear: layers.30.attention.wk, in=4096, out=4096
linear: layers.30.attention.wv, in=4096, out=4096
linear: layers.30.attention.wo, in=4096, out=4096
linear: layers.30.feed_forward.w1, in=4096, out=11008
linear: layers.30.feed_forward.w2, in=11008, out=4096
linear: layers.30.feed_forward.w3, in=4096, out=11008
linear: layers.31.attention.wq, in=4096, out=4096
linear: layers.31.attention.wk, in=4096, out=4096
linear: layers.31.attention.wv, in=4096, out=4096
linear: layers.31.attention.wo, in=4096, out=4096
linear: layers.31.feed_forward.w1, in=4096, out=11008
linear: layers.31.feed_forward.w2, in=11008, out=4096
linear: layers.31.feed_forward.w3, in=4096, out=11008
linear: output, in=4096, out=32000
INFO:root:model.to torch.float32
Traceback (most recent call last):
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/executorch/examples/models/llama2/custom_ops/sdpa_with_kv_cache.py", line 19, in <module>
    op = torch.ops.llama.sdpa_with_kv_cache.default
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_ops.py", line 927, in __getattr__
    raise AttributeError(
AttributeError: '_OpNamespace' 'llama' object has no attribute 'sdpa_with_kv_cache'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama.py", line 30, in <module>
    main()  # pragma: no cover
    ^^^^^^
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama.py", line 26, in main
    export_llama(modelname, args)
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama_lib.py", line 408, in export_llama
    return _export_llama(modelname, args)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/export_llama_lib.py", line 531, in _export_llama
    ).export_to_edge(quantizers)
      ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/gchauhan/dev/executorch/examples/models/llama2/builder.py", line 264, in export_to_edge
    m = capture_pre_autograd_graph(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_export/__init__.py", line 151, in capture_pre_autograd_graph
    m = torch._dynamo.export(
        ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 1232, in inner
    result_traced = opt_f(*args, **kwargs)
                    ^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1536, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/eval_frame.py", line 390, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1527, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1536, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 939, in catch_errors
    return callback(frame, cache_entry, hooks, frame_state, skip=1)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 400, in _convert_frame_assert
    return _compile(
           ^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/contextlib.py", line 81, in inner
    return func(*args, **kwds)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 686, in _compile
    guarded_code = compile_inner(code, one_graph, hooks, transform)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/utils.py", line 265, in time_wrapper
    r = func(*args, **kwargs)
        ^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 541, in compile_inner
    out_code = transform_code_object(code, transform)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/bytecode_transformation.py", line 1036, in transform_code_object
    transformations(instructions, code_options)
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 165, in _fn
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/convert_frame.py", line 503, in transform
    tracer.run()
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2106, in run
    super().run()
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 829, in run
    while self.step():
          ^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 743, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 471, in wrapper
    return inner_fn(self, inst)
           ^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1771, in CALL
    self.call_function(fn, args, kwargs)
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 683, in call_function
    self.push(fn.call_function(self, args, kwargs))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/variables/nn_module.py", line 341, in call_function
    return tx.inline_user_function_return(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 689, in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2253, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2367, in inline_call_
    tracer.run()
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 829, in run
    while self.step():
          ^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 743, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 471, in wrapper
    return inner_fn(self, inst)
           ^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1233, in CALL_FUNCTION_EX
    self.call_function(fn, argsvars.items, kwargsvars)
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 683, in call_function
    self.push(fn.call_function(self, args, kwargs))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/variables/functions.py", line 342, in call_function
    return super().call_function(tx, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/variables/functions.py", line 296, in call_function
    return super().call_function(tx, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/variables/functions.py", line 91, in call_function
    return tx.inline_user_function_return(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 689, in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2253, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2367, in inline_call_
    tracer.run()
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 829, in run
    while self.step():
          ^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 743, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 471, in wrapper
    return inner_fn(self, inst)
           ^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1771, in CALL
    self.call_function(fn, args, kwargs)
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 683, in call_function
    self.push(fn.call_function(self, args, kwargs))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/variables/functions.py", line 342, in call_function
    return super().call_function(tx, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/variables/functions.py", line 296, in call_function
    return super().call_function(tx, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/variables/functions.py", line 91, in call_function
    return tx.inline_user_function_return(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 689, in inline_user_function_return
    return InliningInstructionTranslator.inline_call(self, fn, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2253, in inline_call
    return cls.inline_call_(parent, func, args, kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 2367, in inline_call_
    tracer.run()
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 829, in run
    while self.step():
          ^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 743, in step
    self.dispatch_table[inst.opcode](self, inst)
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/_dynamo/symbolic_convert.py", line 1054, in IMPORT_NAME
    value = __import__(
            ^^^^^^^^^^^
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/executorch/examples/models/llama2/custom_ops/sdpa_with_kv_cache.py", line 25, in <module>
    assert (
AssertionError: custom_ops_aot_lib does not exist, please set LD_LIBRARY_PATH: None correctly

from user code:
   File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/executorch/examples/models/llama2/llama_transformer.py", line 458, in forward
    h = layer(
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1536, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/executorch/examples/models/llama2/llama_transformer.py", line 399, in forward
    h = self.attention.forward(
  File "/opt/miniconda3/envs/et/lib/python3.11/site-packages/executorch/examples/models/llama2/llama_transformer.py", line 280, in forward
    from .custom_ops import sdpa_with_kv_cache  # noqa

Set TORCH_LOGS="+dynamo" and TORCHDYNAMO_VERBOSE=1 for more information
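
A hypothetical workaround sketch, not taken from the log: if the custom SDPA op shared library has been built, pointing LD_LIBRARY_PATH at its directory before rerunning the same export command should let sdpa_with_kv_cache.py find it. The library directory below is an assumption; the export flags are the ones from the command above.

# Hypothetical: rerun the export with LD_LIBRARY_PATH set (directory is an assumption).
import os
import subprocess

lib_dir = os.path.expanduser("~/dev/executorch/cmake-out/examples/models/llama2/custom_ops")  # assumption
env = {**os.environ, "LD_LIBRARY_PATH": lib_dir}

subprocess.run(
    [
        "python", "-m", "examples.models.llama2.export_llama",
        "--checkpoint", os.path.expandvars("$MODEL_PATH/consolidated.00.pth"),
        "--params", os.path.expandvars("$MODEL_PATH/params.json"),
        "-kv", "--use_sdpa_with_kv_cache",
        "-X", "-qmode", "8da4w", "--group_size", "128", "-d", "fp32",
    ],
    env=env,
    check=True,
    cwd=os.path.expanduser("~/dev/executorch"),  # repo root, per the paths in the log
)

The next comment instead drops the -kv and --use_sdpa_with_kv_cache flags entirely, which avoids the custom op altogether.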

@chauhang commented Apr 7, 2024

Removed the -kv --use_sdpa_with_kv_cache options; the export then succeeded and generated the .pte file with 225 subgraphs.

python -m examples.models.llama2.export_llama --checkpoint $MODEL_PATH/consolidated.00.pth --params $MODEL_PATH/params.json  -X -qmode 8da4w --group_size 128 -d fp32    

Output (truncated)

Could not import fairseq2 modules.
INFO:root:Loading model with checkpoint=/Users/gchauhan/dev/models/checkpoints/meta-llama/Llama-2-7b/consolidated.00.pth, params=/Users/gchauhan/dev/models/checkpoints/meta-llama/Llama-2-7b/params.json, use_kv_cache=False, weight_type=WeightType.LLAMA
INFO:root:Loaded model with dtype=torch.bfloat16
INFO:datasets:PyTorch version 2.4.0.dev20240324 available.
linear: layers.0.attention.wq, in=4096, out=4096
linear: layers.0.attention.wk, in=4096, out=4096
linear: layers.0.attention.wv, in=4096, out=4096
linear: layers.0.attention.wo, in=4096, out=4096
....

   %aten_view_copy_default_1021 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.view_copy.default](args = (%aten_mm_default_222, [1, %sym_size, 11008]), kwargs = {})
    %aten_mul_tensor_511 : [num_users=2] = call_function[target=executorch.exir.dialects.edge._ops.aten.mul.Tensor](args = (%aten_mul_tensor_510, %aten_view_copy_default_1021), kwargs = {})
    %quantized_decomposed_choose_qparams_per_token_asymmetric_default_223 : [num_users=2] = call_function[target=executorch.exir.dialects.edge._ops.quantized_decomposed.choose_qparams_per_token_asymmetric.default](args = (%aten_mul_tensor_511, torch.int8), kwargs = {})
    %getitem_446 : [num_users=2] = call_function[target=operator.getitem](args = (%quantized_decomposed_choose_qparams_per_token_asymmetric_default_223, 0), kwargs = {})
    %getitem_447 : [num_users=2] = call_function[target=operator.getitem](args = (%quantized_decomposed_choose_qparams_per_token_asymmetric_default_223, 1), kwargs = {})
    %quantized_decomposed_quantize_per_token_default_223 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.quantized_decomposed.quantize_per_token.default](args = (%aten_mul_tensor_511, %getitem_446, %getitem_447, -128, 127, torch.int8), kwargs = {})
    %quantized_decomposed_dequantize_per_token_default_223 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.quantized_decomposed.dequantize_per_token.default](args = (%quantized_decomposed_quantize_per_token_default_223, %getitem_446, %getitem_447, -128, 127, torch.int8, torch.float32), kwargs = {})
    %quantized_decomposed_dequantize_per_channel_group_default_223 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.quantized_decomposed.dequantize_per_channel_group.default](args = (%arg769_1, %arg770_1, %arg771_1, -8, 7, torch.int8, 128, torch.float32), kwargs = {})
    %aten_permute_copy_default_383 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.permute_copy.default](args = (%quantized_decomposed_dequantize_per_channel_group_default_223, [1, 0]), kwargs = {})
    %sym_size_128 : [num_users=2] = call_function[target=torch.ops.aten.sym_size.int](args = (%aten_view_copy_default_1019, 1), kwargs = {})
    %aten_view_copy_default_1022 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.view_copy.default](args = (%quantized_decomposed_dequantize_per_token_default_223, [%sym_size_128, 11008]), kwargs = {})
    %aten_mm_default_223 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.mm.default](args = (%aten_view_copy_default_1022, %aten_permute_copy_default_383), kwargs = {})
    %aten_view_copy_default_1023 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.view_copy.default](args = (%aten_mm_default_223, [1, %sym_size_128, 4096]), kwargs = {})
    %aten_add_tensor_223 : [num_users=2] = call_function[target=executorch.exir.dialects.edge._ops.aten.add.Tensor](args = (%aten_add_tensor_221, %aten_view_copy_default_1023), kwargs = {})
    %aten_mul_tensor_512 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.mul.Tensor](args = (%aten_add_tensor_223, %aten_add_tensor_223), kwargs = {})
    %aten_mean_dim_64 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.mean.dim](args = (%aten_mul_tensor_512, [-1], True), kwargs = {})
    %aten_add_tensor_224 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.add.Tensor](args = (%aten_mean_dim_64, %_lifted_tensor_constant773), kwargs = {})
    %aten_rsqrt_default_64 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.rsqrt.default](args = (%aten_add_tensor_224,), kwargs = {})
    %aten_mul_tensor_513 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.mul.Tensor](args = (%aten_add_tensor_223, %aten_rsqrt_default_64), kwargs = {})
    %aten_mul_tensor_514 : [num_users=2] = call_function[target=executorch.exir.dialects.edge._ops.aten.mul.Tensor](args = (%aten_mul_tensor_513, %arg65_1), kwargs = {})
    %quantized_decomposed_choose_qparams_per_token_asymmetric_default_224 : [num_users=2] = call_function[target=executorch.exir.dialects.edge._ops.quantized_decomposed.choose_qparams_per_token_asymmetric.default](args = (%aten_mul_tensor_514, torch.int8), kwargs = {})
    %getitem_448 : [num_users=2] = call_function[target=operator.getitem](args = (%quantized_decomposed_choose_qparams_per_token_asymmetric_default_224, 0), kwargs = {})
    %getitem_449 : [num_users=2] = call_function[target=operator.getitem](args = (%quantized_decomposed_choose_qparams_per_token_asymmetric_default_224, 1), kwargs = {})
    %quantized_decomposed_quantize_per_token_default_224 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.quantized_decomposed.quantize_per_token.default](args = (%aten_mul_tensor_514, %getitem_448, %getitem_449, -128, 127, torch.int8), kwargs = {})
    %quantized_decomposed_dequantize_per_token_default_224 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.quantized_decomposed.dequantize_per_token.default](args = (%quantized_decomposed_quantize_per_token_default_224, %getitem_448, %getitem_449, -128, 127, torch.int8, torch.float32), kwargs = {})
    %quantized_decomposed_dequantize_per_channel_group_default_224 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.quantized_decomposed.dequantize_per_channel_group.default](args = (%arg772_1, %arg773_1, %arg774_1, -8, 7, torch.int8, 128, torch.float32), kwargs = {})
    %aten_permute_copy_default_384 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.permute_copy.default](args = (%quantized_decomposed_dequantize_per_channel_group_default_224, [1, 0]), kwargs = {})
    %aten_view_copy_default_1024 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.view_copy.default](args = (%quantized_decomposed_dequantize_per_token_default_224, [%sym_size, 4096]), kwargs = {})
    %aten_mm_default_224 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.mm.default](args = (%aten_view_copy_default_1024, %aten_permute_copy_default_384), kwargs = {})
    %aten_view_copy_default_1025 : [num_users=1] = call_function[target=executorch.exir.dialects.edge._ops.aten.view_copy.default](args = (%aten_mm_default_224, [1, %sym_size, 32000]), kwargs = {})
    return (aten_view_copy_default_1025,)
INFO:executorch.backends.xnnpack.partition.xnnpack_partitioner:Found 225 subgraphs to be partitioned.
INFO:root:Required memory for activation in bytes: [0, 118053936]
INFO:root:Saved exported program to ./xnnpack_llama2.pte

@chauhang commented Apr 7, 2024

Ran Step 4 from the README to validate on the computer

Command

cmake -DPYTHON_EXECUTABLE=python \
    -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DEXECUTORCH_ENABLE_LOGGING=1 \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_OPTIMIZED=ON \
    -Bcmake-out .
Output
-- The C compiler identification is AppleClang 15.0.0.15000309
-- The CXX compiler identification is AppleClang 15.0.0.15000309
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Library/Developer/CommandLineTools/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Library/Developer/CommandLineTools/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Downloading FXdiv to /Users/gchauhan/dev/executorch/cmake-out/FXdiv-source (define FXDIV_SOURCE_DIR to avoid it)
-- Configuring done (0.1s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/gchauhan/dev/executorch/cmake-out/FXdiv-download
[ 11%] Creating directories for 'fxdiv'
[ 22%] Performing download step (git clone) for 'fxdiv'
Cloning into 'FXdiv-source'...
Already on 'master'
Your branch is up to date with 'origin/master'.
[ 33%] Performing update step for 'fxdiv'
[ 44%] No patch step for 'fxdiv'
[ 55%] No configure step for 'fxdiv'
[ 66%] No build step for 'fxdiv'
[ 77%] No install step for 'fxdiv'
[ 88%] No test step for 'fxdiv'
[100%] Completed 'fxdiv'
[100%] Built target fxdiv
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Using python executable 'python'
-- Resolved buck2 as /Users/gchauhan/dev/executorch/cmake-out/buck2-bin/buck2-99e407b49dc432eda0cbddd67ea78346.
-- executorch: Generating source lists
-- executorch: Generating source file list /Users/gchauhan/dev/executorch/cmake-out/executorch_srcs.cmake
-- executorch: Using sources file /Users/gchauhan/dev/executorch/cmake-out/executorch_srcs.cmake
-- Proceeding with version: 23.5.26.0
-- Looking for strtof_l
-- Looking for strtof_l - not found
-- Looking for strtoull_l
-- Looking for strtoull_l - not found
-- Looking for realpath
-- Looking for realpath - found
-- CMAKE_CXX_FLAGS: 
Command - python;-m;codegen.tools.gen_oplist;--output_path=/Users/gchauhan/dev/executorch/cmake-out/kernels/portable/selected_operators.yaml;--ops_schema_yaml_path="/Users/gchauhan/dev/executorch/kernels/portable/functions.yaml"
-- Generating kernel bindings:
--   FUNCTIONS_YAML: /Users/gchauhan/dev/executorch/kernels/portable/functions.yaml
--   CUSTOM_OPS_YAML: 
Generated files /Users/gchauhan/dev/executorch/cmake-out/kernels/portable/RegisterCodegenUnboxedKernelsEverything.cpp;/Users/gchauhan/dev/executorch/cmake-out/kernels/portable/Functions.h;/Users/gchauhan/dev/executorch/cmake-out/kernels/portable/NativeFunctions.h
-- Generating operator lib:
--   LIB_NAME: portable_ops_lib
--   KERNEL_LIBS: portable_kernels
--   DEPS: executorch
Command - python;-m;codegen.tools.gen_oplist;--output_path=/Users/gchauhan/dev/executorch/cmake-out/kernels/optimized/selected_operators.yaml;--ops_schema_yaml_path="/Users/gchauhan/dev/executorch/kernels/optimized/optimized-oss.yaml"
-- Generating kernel bindings:
--   FUNCTIONS_YAML: /Users/gchauhan/dev/executorch/kernels/optimized/optimized-oss.yaml
--   CUSTOM_OPS_YAML: 
Generated files /Users/gchauhan/dev/executorch/cmake-out/kernels/optimized/RegisterCodegenUnboxedKernelsEverything.cpp;/Users/gchauhan/dev/executorch/cmake-out/kernels/optimized/Functions.h;/Users/gchauhan/dev/executorch/cmake-out/kernels/optimized/NativeFunctions.h
-- Generating operator lib:
--   LIB_NAME: optimized_ops_lib
--   KERNEL_LIBS: optimized_kernels
--   DEPS: executorch
Command - python;-m;codegen.tools.gen_oplist;--output_path=/Users/gchauhan/dev/executorch/cmake-out/configurations/selected_operators.yaml;--ops_schema_yaml_path="/Users/gchauhan/dev/executorch/cmake-out/configurations/merged.yaml"
-- Generating kernel bindings:
--   FUNCTIONS_YAML: /Users/gchauhan/dev/executorch/cmake-out/configurations/merged.yaml
--   CUSTOM_OPS_YAML: 
Generated files /Users/gchauhan/dev/executorch/cmake-out/configurations/RegisterCodegenUnboxedKernelsEverything.cpp;/Users/gchauhan/dev/executorch/cmake-out/configurations/Functions.h;/Users/gchauhan/dev/executorch/cmake-out/configurations/NativeFunctions.h
-- Generating operator lib:
--   LIB_NAME: optimized_native_cpu_ops_lib
--   KERNEL_LIBS: portable_kernels;optimized_kernels
--   DEPS: executorch
CMake Deprecation Warning at third-party/gflags/CMakeLists.txt:73 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- Looking for C++ include unistd.h
-- Looking for C++ include unistd.h - found
-- Looking for C++ include stdint.h
-- Looking for C++ include stdint.h - found
-- Looking for C++ include inttypes.h
-- Looking for C++ include inttypes.h - found
-- Looking for C++ include sys/types.h
-- Looking for C++ include sys/types.h - found
-- Looking for C++ include sys/stat.h
-- Looking for C++ include sys/stat.h - found
-- Looking for C++ include fnmatch.h
-- Looking for C++ include fnmatch.h - found
-- Looking for C++ include stddef.h
-- Looking for C++ include stddef.h - found
-- Check size of uint32_t
-- Check size of uint32_t - done
-- Looking for strtoll
-- Looking for strtoll - found
-- The ASM compiler identification is AppleClang
-- Found assembler: /Library/Developer/CommandLineTools/usr/bin/cc
-- Downloading FP16 to /Users/gchauhan/dev/executorch/cmake-out/FP16-source (define FP16_SOURCE_DIR to avoid it)
-- Configuring done (0.1s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/gchauhan/dev/executorch/cmake-out/FP16-download
[ 11%] Creating directories for 'fp16'
[ 22%] Performing download step (download, verify and extract) for 'fp16'
-- Downloading...
   dst='/Users/gchauhan/dev/executorch/cmake-out/FP16-download/fp16-prefix/src/0a92994d729ff76a58f692d3028ca1b64b145d91.zip'
   timeout='none'
   inactivity timeout='none'
-- Using src='https://github.com/Maratyszcza/FP16/archive/0a92994d729ff76a58f692d3028ca1b64b145d91.zip'
-- [download 11% complete]
-- [download 14% complete]
-- [download 19% complete]
-- [download 22% complete]
-- [download 38% complete]
-- [download 39% complete]
-- [download 42% complete]
-- [download 44% complete]
-- [download 45% complete]
-- [download 72% complete]
-- [download 91% complete]
-- [download 92% complete]
-- [download 100% complete]
-- verifying file...
       file='/Users/gchauhan/dev/executorch/cmake-out/FP16-download/fp16-prefix/src/0a92994d729ff76a58f692d3028ca1b64b145d91.zip'
-- Downloading... done
-- extracting...
     src='/Users/gchauhan/dev/executorch/cmake-out/FP16-download/fp16-prefix/src/0a92994d729ff76a58f692d3028ca1b64b145d91.zip'
     dst='/Users/gchauhan/dev/executorch/cmake-out/FP16-source'
-- extracting... [tar xfz]
-- extracting... [analysis]
-- extracting... [rename]
-- extracting... [clean up]
-- extracting... done
[ 33%] No update step for 'fp16'
[ 44%] No patch step for 'fp16'
[ 55%] No configure step for 'fp16'
[ 66%] No build step for 'fp16'
[ 77%] No install step for 'fp16'
[ 88%] No test step for 'fp16'
[100%] Completed 'fp16'
[100%] Built target fp16
CMake Deprecation Warning at cmake-out/FP16-source/CMakeLists.txt:1 (CMAKE_MINIMUM_REQUIRED):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- Downloading PSimd to /Users/gchauhan/dev/executorch/cmake-out/psimd-source (define PSIMD_SOURCE_DIR to avoid it)
CMake Deprecation Warning at CMakeLists.txt:1 (CMAKE_MINIMUM_REQUIRED):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- Configuring done (0.1s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/gchauhan/dev/executorch/cmake-out/psimd-download
[ 11%] Creating directories for 'psimd'
[ 22%] Performing download step (git clone) for 'psimd'
Cloning into 'psimd-source'...
Already on 'master'
Your branch is up to date with 'origin/master'.
[ 33%] Performing update step for 'psimd'
[ 44%] No patch step for 'psimd'
[ 55%] No configure step for 'psimd'
[ 66%] No build step for 'psimd'
[ 77%] No install step for 'psimd'
[ 88%] No test step for 'psimd'
[100%] Completed 'psimd'
[100%] Built target psimd
CMake Deprecation Warning at cmake-out/psimd-source/CMakeLists.txt:1 (CMAKE_MINIMUM_REQUIRED):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- 
-- ******** Summary ********
--   CMAKE_BUILD_TYPE              : Release
--   CMAKE_CXX_STANDARD            : 17
--   CMAKE_CXX_COMPILER_ID         : AppleClang
--   CMAKE_TOOLCHAIN_FILE          : 
--   BUCK2                         : /Users/gchauhan/dev/executorch/cmake-out/buck2-bin/buck2-99e407b49dc432eda0cbddd67ea78346
--   PYTHON_EXECUTABLE             : python
--   FLATC_EXECUTABLE              : flatc
--   EXECUTORCH_ENABLE_LOGGING              : 1
--   EXECUTORCH_ENABLE_PROGRAM_VERIFICATION : OFF
--   EXECUTORCH_LOG_LEVEL                   : Info
--   EXECUTORCH_BUILD_ANDROID_JNI           : OFF
--   EXECUTORCH_BUILD_ARM_BAREMETAL         : OFF
--   EXECUTORCH_BUILD_COREML                : OFF
--   EXECUTORCH_BUILD_CUSTOM                : OFF
--   EXECUTORCH_BUILD_EXECUTOR_RUNNER       : ON
--   EXECUTORCH_BUILD_EXTENSION_DATA_LOADER : ON
--   EXECUTORCH_BUILD_EXTENSION_MODULE      : ON
--   EXECUTORCH_BUILD_EXTENSION_RUNNER_UTIL : OFF
--   EXECUTORCH_BUILD_FLATC                 : ON
--   EXECUTORCH_BUILD_GFLAGS                : ON
--   EXECUTORCH_BUILD_GTESTS                : OFF
--   EXECUTORCH_BUILD_HOST_TARGETS          : ON
--   EXECUTORCH_BUILD_MPS                   : OFF
--   EXECUTORCH_BUILD_PYBIND                : OFF
--   EXECUTORCH_BUILD_QNN                   : OFF
--   EXECUTORCH_BUILD_OPTIMIZED             : ON
--   EXECUTORCH_BUILD_QUANTIZED             : OFF
--   EXECUTORCH_BUILD_SDK                   : OFF
--   EXECUTORCH_BUILD_SIZE_TEST             : OFF
--   EXECUTORCH_BUILD_XNNPACK               : ON
--   EXECUTORCH_BUILD_VULKAN                : OFF
--   EXECUTORCH_BUILD_PTHREADPOOL           : ON
--   EXECUTORCH_BUILD_CPUINFO               : ON
-- Configuring done (15.9s)
-- Generating done (1.1s)
-- Build files have been written to: /Users/gchauhan/dev/executorch/cmake-out

@chauhang commented Apr 7, 2024

Ran Step 4 from the README to validate on the computer

It is not clear why we need to rebuild ExecuTorch again for CPU (it was already built earlier, during the getting-started install steps, via ./install_requirements.sh --pybind)

Completed the build using

cmake --build cmake-out -j16 --target install --config Release

@chauhang commented Apr 7, 2024

Ran Step 4 from the README to validate on the computer

Build the llama runner

cmake -DPYTHON_EXECUTABLE=python \
    -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_OPTIMIZED=ON \
    -Bcmake-out/examples/models/llama2 \
    examples/models/llama2

Output

etdump library is not found.
            If needed rebuild with the proper options in CMakeLists.txt
bundled_program library is not found.
            If needed rebuild with the proper options in CMakeLists.txt
flatccrt library is not found.
            If needed rebuild with the proper options in CMakeLists.txt
mpsdelegate library is not found.
            If needed rebuild with the proper options in CMakeLists.txt
qnn_executorch_backend library is not found.
            If needed rebuild with the proper options in CMakeLists.txt
vulkan_backend library is not found.
            If needed rebuild with the proper options in CMakeLists.txt
-- executorch: Using source file list /Users/gchauhan/dev/executorch/cmake-out/examples/models/llama2/runner/../../../../executorch_srcs.cmake
-- 
-- ******** Summary ********
--   CMAKE_BUILD_TYPE              : Release
--   CMAKE_CXX_STANDARD            : 17
--   CMAKE_CXX_COMPILER_ID         : AppleClang
--   CMAKE_TOOLCHAIN_FILE          : 
--   BUCK2                         : 
--   PYTHON_EXECUTABLE             : python
--   FLATC_EXECUTABLE              : 
--   EXECUTORCH_ENABLE_LOGGING              : 
--   EXECUTORCH_ENABLE_PROGRAM_VERIFICATION : 
--   EXECUTORCH_LOG_LEVEL                   : 
--   EXECUTORCH_BUILD_ANDROID_JNI           : 
--   EXECUTORCH_BUILD_ARM_BAREMETAL         : 
--   EXECUTORCH_BUILD_COREML                : 
--   EXECUTORCH_BUILD_CUSTOM                : 
--   EXECUTORCH_BUILD_EXECUTOR_RUNNER       : 
--   EXECUTORCH_BUILD_EXTENSION_DATA_LOADER : 
--   EXECUTORCH_BUILD_EXTENSION_MODULE      : 
--   EXECUTORCH_BUILD_EXTENSION_RUNNER_UTIL : 
--   EXECUTORCH_BUILD_FLATC                 : 
--   EXECUTORCH_BUILD_GFLAGS                : 
--   EXECUTORCH_BUILD_GTESTS                : 
--   EXECUTORCH_BUILD_HOST_TARGETS          : 
--   EXECUTORCH_BUILD_MPS                   : 
--   EXECUTORCH_BUILD_PYBIND                : 
--   EXECUTORCH_BUILD_QNN                   : 
--   EXECUTORCH_BUILD_OPTIMIZED             : ON
--   EXECUTORCH_BUILD_QUANTIZED             : 
--   EXECUTORCH_BUILD_SDK                   : 
--   EXECUTORCH_BUILD_SIZE_TEST             : 
--   EXECUTORCH_BUILD_XNNPACK               : 
--   EXECUTORCH_BUILD_VULKAN                : 
--   EXECUTORCH_BUILD_PTHREADPOOL           : 
--   EXECUTORCH_BUILD_CPUINFO               : 
-- Configuring done (0.1s)
-- Generating done (0.1s)
-- Build files have been written to: /Users/gchauhan/dev/executorch/cmake-out/examples/models/llama2

Complete the llama build

cmake --build cmake-out/examples/models/llama2 -j16 --config Release

Output (truncated)

[ 33%] Building CXX object runner/CMakeFiles/llama_runner.dir/Users/gchauhan/dev/executorch/kernels/optimized/blas/CPUBlas.cpp.o
[ 33%] Building CXX object runner/CMakeFiles/llama_runner.dir/runner.cpp.o
[ 44%] Building CXX object runner/CMakeFiles/llama_runner.dir/__/sampler/sampler.cpp.o
[ 55%] Building CXX object runner/CMakeFiles/llama_runner.dir/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp.o
[ 55%] Building CXX object runner/CMakeFiles/llama_runner.dir/__/tokenizer/tokenizer.cpp.o
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:29: note: 'data_ptr' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
                            ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr<unsigned char>' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
...

/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:29: note: 'data_ptr' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
                            ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr<bool>' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:3: note: 'data_ptr<bool>' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
  ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/platform/compiler.h:42:27: note: expanded from macro '__ET_DEPRECATED'
#define __ET_DEPRECATED [[deprecated]]
                          ^
16 warnings generated.
[ 66%] Linking CXX static library libllama_runner.a
[ 66%] Built target llama_runner
[ 88%] Building CXX object CMakeFiles/llama_main.dir/Users/gchauhan/dev/executorch/backends/xnnpack/threadpool/cpuinfo_utils.cpp.o
[ 88%] Building CXX object CMakeFiles/llama_main.dir/main.cpp.o
[100%] Linking CXX executable llama_main
[100%] Built target llama_main

@chauhang commented Apr 7, 2024

Running the model on the computer fails with the error below; passing the raw tokenizer.model appears to be the problem, since converting it to tokenizer.bin (next comment) works

Command

cmake-out/examples/models/llama2/llama_main --model_path=xnnpack_llama2.pte --tokenizer_path=/Users/gchauhan/dev/models/checkpoints/meta-llama/Llama-2-7b/tokenizer.model --prompt="This is a new age"

Output with error

I 00:00:00.000675 executorch:cpuinfo_utils.cpp:61] Reading file /sys/devices/soc0/image_version
I 00:00:00.000706 executorch:cpuinfo_utils.cpp:77] Failed to open midr file /sys/devices/soc0/image_version
I 00:00:00.000712 executorch:cpuinfo_utils.cpp:157] Number of efficient cores 0
I 00:00:00.000719 executorch:main.cpp:65] Resetting threadpool with num threads = 10
I 00:00:00.004521 executorch:runner.cpp:49] Creating LLaMa runner: model_path=xnnpack_llama2.pte, tokenizer_path=/Users/gchauhan/dev/models/checkpoints/meta-llama/Llama-2-7b/tokenizer.model
I 00:00:10.134341 executorch:runner.cpp:64] Reading metadata from model
I 00:00:10.134383 executorch:runner.cpp:123] get_vocab_size: 32000
I 00:00:10.134386 executorch:runner.cpp:123] get_bos_id: 1
I 00:00:10.134388 executorch:runner.cpp:123] get_eos_id: 2
I 00:00:10.134390 executorch:runner.cpp:123] get_n_bos: 1
I 00:00:10.134391 executorch:runner.cpp:123] get_n_eos: 1
I 00:00:10.134393 executorch:runner.cpp:123] get_max_seq_len: 128
I 00:00:10.134395 executorch:runner.cpp:123] use_kv_cache: 0
I 00:00:10.134397 executorch:runner.cpp:123] use_sdpa_with_kv_cache: 0
I 00:00:10.134398 executorch:runner.cpp:123] append_eos_to_prompt: 0
I 00:00:10.144900 executorch:tokenizer.cpp:86] The tokenizer vocab size 84545034 is larger than the model vocab size 32000.
E 00:00:10.145355 executorch:tokenizer.cpp:117] Failed to read the word, total length 35127296, index 0

E 00:00:10.145361 executorch:tokenizer.cpp:195] Tokenizer not initialized
F 00:00:10.145377 executorch:runner.cpp:245] In function generate(), assert failed (num_prompt_tokens >= 1): Expected at least 1 prompt token

@chauhang commented Apr 7, 2024

Reran after generating tokenizer.bin

Generate tokenizer.bin

(The steps are not clear in the README: it describes them for the Stories model but not for the Llama 2 model.)

python -m examples.models.llama2.tokenizer.tokenizer -t ~/dev/models/checkpoints/meta-llama/Llama-2-7b/tokenizer.model -o tokenizer.bin

Run llama model

 cmake-out/examples/models/llama2/llama_main --model_path=xnnpack_llama2.pte --tokenizer_path=tokenizer.bin --prompt="Abrahim Lincoln"

Success, with the output below (generation speed was very slow)

I 00:00:00.000199 executorch:cpuinfo_utils.cpp:61] Reading file /sys/devices/soc0/image_version
I 00:00:00.000212 executorch:cpuinfo_utils.cpp:77] Failed to open midr file /sys/devices/soc0/image_version
I 00:00:00.000215 executorch:cpuinfo_utils.cpp:157] Number of efficient cores 0
I 00:00:00.000217 executorch:main.cpp:65] Resetting threadpool with num threads = 10
I 00:00:00.000415 executorch:runner.cpp:49] Creating LLaMa runner: model_path=xnnpack_llama2.pte, tokenizer_path=tokenizer.bin
I 00:00:10.375479 executorch:runner.cpp:64] Reading metadata from model
I 00:00:10.375503 executorch:runner.cpp:123] get_vocab_size: 32000
I 00:00:10.375506 executorch:runner.cpp:123] get_bos_id: 1
I 00:00:10.375508 executorch:runner.cpp:123] get_eos_id: 2
I 00:00:10.375509 executorch:runner.cpp:123] get_n_bos: 1
I 00:00:10.375511 executorch:runner.cpp:123] get_n_eos: 1
I 00:00:10.375514 executorch:runner.cpp:123] get_max_seq_len: 128
I 00:00:10.375516 executorch:runner.cpp:123] use_kv_cache: 0
I 00:00:10.375518 executorch:runner.cpp:123] use_sdpa_with_kv_cache: 0
I 00:00:10.375519 executorch:runner.cpp:123] append_eos_to_prompt: 0
Abrahim Lincoln’s Trip to New Salem in 1831
ʹThe Life and Times of Abraham Lincolnʹ, by Herndon and Weik, was published in Chicago in 1889, and contained a number of dissertations which were said to have been read at a recent Lincoln Centennial celebration. These dissertations were mostly out of date and full of errors.
In 1894, Herndon and Weik published a second edition of their book, in which they presented an extended account of Lincoln’s life and writings. This second
PyTorchObserver {"prompt_tokens":5,"generated_tokens":122,"model_load_start_ms":1712514333018,"model_load_end_ms":1712514343409,"inference_start_ms":1712514343409,"inference_end_ms":1712514738750,"prompt_eval_end_ms":1712514343803,"first_token_ms":1712514344148,"aggregate_sampling_time_ms":38,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
I 00:06:45.731759 executorch:runner.cpp:411] 	Prompt Tokens: 5    Generated Tokens: 122
I 00:06:45.731762 executorch:runner.cpp:417] 	Model Load Time:		10.391000 (seconds)
I 00:06:45.731767 executorch:runner.cpp:427] 	Total inference time:		395.341000 (seconds)		 Rate: 	0.308594 (tokens/second)
I 00:06:45.731769 executorch:runner.cpp:435] 		Prompt evaluation:	0.394000 (seconds)		 Rate: 	12.690355 (tokens/second)
I 00:06:45.731771 executorch:runner.cpp:446] 		Generated 122 tokens:	394.947000 (seconds)		 Rate: 	0.308902 (tokens/second)
I 00:06:45.731772 executorch:runner.cpp:454] 	Time to first generated token:	0.739000 (seconds)
I 00:06:45.731774 executorch:runner.cpp:461] 	Sampling time over 127 tokens:	0.038000 (seconds)
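
As a quick sanity check, the generation rate above can be recomputed from the PyTorchObserver JSON; a small sketch with the timing fields copied verbatim from that line:

# Timing fields copied from the PyTorchObserver output above
generated_tokens = 122
prompt_eval_end_ms = 1712514343803
inference_end_ms = 1712514738750

gen_seconds = (inference_end_ms - prompt_eval_end_ms) / 1000.0  # 394.947 s
print(generated_tokens / gen_seconds)  # ~0.3089 tokens/second, matching "Rate: 0.308902"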

@chauhang
Author

chauhang commented Apr 7, 2024

For testing on Android: ExecuTorch Android build, plus steps for building the llama2 model ET runner

Commands for building Android version

export ANDROID_NDK=/Users/gchauhan/Library/Android/sdk/ndk/26.2.11394342  
cmake -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_PLATFORM=android-23 \
    -DCMAKE_INSTALL_PREFIX=cmake-out-android \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
    -DEXECUTORCH_ENABLE_LOGGING=1 \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DPYTHON_EXECUTABLE=python \
    -DEXECUTORCH_BUILD_OPTIMIZED=ON \
    -Bcmake-out-android .

Output

-- The C compiler identification is Clang 17.0.2
-- The CXX compiler identification is Clang 17.0.2
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Users/gchauhan/Library/Android/sdk/ndk/26.2.11394342/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Users/gchauhan/Library/Android/sdk/ndk/26.2.11394342/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Downloading FXdiv to /Users/gchauhan/dev/executorch/cmake-out-android/FXdiv-source (define FXDIV_SOURCE_DIR to avoid it)
-- Configuring done (0.2s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/gchauhan/dev/executorch/cmake-out-android/FXdiv-download
[ 11%] Creating directories for 'fxdiv'
[ 22%] Performing download step (git clone) for 'fxdiv'
Cloning into 'FXdiv-source'...
Already on 'master'
Your branch is up to date with 'origin/master'.
[ 33%] Performing update step for 'fxdiv'
[ 44%] No patch step for 'fxdiv'
[ 55%] No configure step for 'fxdiv'
[ 66%] No build step for 'fxdiv'
[ 77%] No install step for 'fxdiv'
[ 88%] No test step for 'fxdiv'
[100%] Completed 'fxdiv'
[100%] Built target fxdiv
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - not found
-- Using python executable 'python'
-- Resolved buck2 as /Users/gchauhan/dev/executorch/cmake-out-android/buck2-bin/buck2-99e407b49dc432eda0cbddd67ea78346.
-- executorch: Generating source lists
-- executorch: Generating source file list /Users/gchauhan/dev/executorch/cmake-out-android/executorch_srcs.cmake
-- executorch: Using sources file /Users/gchauhan/dev/executorch/cmake-out-android/executorch_srcs.cmake
-- Proceeding with version: 23.5.26.0
-- Looking for strtof_l
-- Looking for strtof_l - found
-- Looking for strtoull_l
-- Looking for strtoull_l - found
-- Looking for realpath
-- Looking for realpath - found
-- CMAKE_CXX_FLAGS: -g -DANDROID -fdata-sections -ffunction-sections -funwind-tables -fstack-protector-strong -no-canonical-prefixes -D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security  
Command - python;-m;codegen.tools.gen_oplist;--output_path=/Users/gchauhan/dev/executorch/cmake-out-android/kernels/portable/selected_operators.yaml;--ops_schema_yaml_path="/Users/gchauhan/dev/executorch/kernels/portable/functions.yaml"
-- Generating kernel bindings:
--   FUNCTIONS_YAML: /Users/gchauhan/dev/executorch/kernels/portable/functions.yaml
--   CUSTOM_OPS_YAML: 
Generated files /Users/gchauhan/dev/executorch/cmake-out-android/kernels/portable/RegisterCodegenUnboxedKernelsEverything.cpp;/Users/gchauhan/dev/executorch/cmake-out-android/kernels/portable/Functions.h;/Users/gchauhan/dev/executorch/cmake-out-android/kernels/portable/NativeFunctions.h
-- Generating operator lib:
--   LIB_NAME: portable_ops_lib
--   KERNEL_LIBS: portable_kernels
--   DEPS: executorch
Command - python;-m;codegen.tools.gen_oplist;--output_path=/Users/gchauhan/dev/executorch/cmake-out-android/kernels/optimized/selected_operators.yaml;--ops_schema_yaml_path="/Users/gchauhan/dev/executorch/kernels/optimized/optimized-oss.yaml"
-- Generating kernel bindings:
--   FUNCTIONS_YAML: /Users/gchauhan/dev/executorch/kernels/optimized/optimized-oss.yaml
--   CUSTOM_OPS_YAML: 
Generated files /Users/gchauhan/dev/executorch/cmake-out-android/kernels/optimized/RegisterCodegenUnboxedKernelsEverything.cpp;/Users/gchauhan/dev/executorch/cmake-out-android/kernels/optimized/Functions.h;/Users/gchauhan/dev/executorch/cmake-out-android/kernels/optimized/NativeFunctions.h
-- Generating operator lib:
--   LIB_NAME: optimized_ops_lib
--   KERNEL_LIBS: optimized_kernels
--   DEPS: executorch
Command - python;-m;codegen.tools.gen_oplist;--output_path=/Users/gchauhan/dev/executorch/cmake-out-android/configurations/selected_operators.yaml;--ops_schema_yaml_path="/Users/gchauhan/dev/executorch/cmake-out-android/configurations/merged.yaml"
-- Generating kernel bindings:
--   FUNCTIONS_YAML: /Users/gchauhan/dev/executorch/cmake-out-android/configurations/merged.yaml
--   CUSTOM_OPS_YAML: 
Generated files /Users/gchauhan/dev/executorch/cmake-out-android/configurations/RegisterCodegenUnboxedKernelsEverything.cpp;/Users/gchauhan/dev/executorch/cmake-out-android/configurations/Functions.h;/Users/gchauhan/dev/executorch/cmake-out-android/configurations/NativeFunctions.h
-- Generating operator lib:
--   LIB_NAME: optimized_native_cpu_ops_lib
--   KERNEL_LIBS: portable_kernels;optimized_kernels
--   DEPS: executorch
CMake Deprecation Warning at third-party/gflags/CMakeLists.txt:73 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- Looking for C++ include unistd.h
-- Looking for C++ include unistd.h - found
-- Looking for C++ include stdint.h
-- Looking for C++ include stdint.h - found
-- Looking for C++ include inttypes.h
-- Looking for C++ include inttypes.h - found
-- Looking for C++ include sys/types.h
-- Looking for C++ include sys/types.h - found
-- Looking for C++ include sys/stat.h
-- Looking for C++ include sys/stat.h - found
-- Looking for C++ include fnmatch.h
-- Looking for C++ include fnmatch.h - found
-- Looking for C++ include stddef.h
-- Looking for C++ include stddef.h - found
-- Check size of uint32_t
-- Check size of uint32_t - done
-- Looking for strtoll
-- Looking for strtoll - found
-- The ASM compiler identification is Clang with GNU-like command-line
-- Found assembler: /Users/gchauhan/Library/Android/sdk/ndk/26.2.11394342/toolchains/llvm/prebuilt/darwin-x86_64/bin/clang
-- Downloading FP16 to /Users/gchauhan/dev/executorch/cmake-out-android/FP16-source (define FP16_SOURCE_DIR to avoid it)
-- Configuring done (0.1s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/gchauhan/dev/executorch/cmake-out-android/FP16-download
[ 11%] Creating directories for 'fp16'
[ 22%] Performing download step (download, verify and extract) for 'fp16'
-- Downloading...
   dst='/Users/gchauhan/dev/executorch/cmake-out-android/FP16-download/fp16-prefix/src/0a92994d729ff76a58f692d3028ca1b64b145d91.zip'
   timeout='none'
   inactivity timeout='none'
-- Using src='https://github.com/Maratyszcza/FP16/archive/0a92994d729ff76a58f692d3028ca1b64b145d91.zip'
-- verifying file...
       file='/Users/gchauhan/dev/executorch/cmake-out-android/FP16-download/fp16-prefix/src/0a92994d729ff76a58f692d3028ca1b64b145d91.zip'
-- Downloading... done
-- extracting...
     src='/Users/gchauhan/dev/executorch/cmake-out-android/FP16-download/fp16-prefix/src/0a92994d729ff76a58f692d3028ca1b64b145d91.zip'
     dst='/Users/gchauhan/dev/executorch/cmake-out-android/FP16-source'
-- extracting... [tar xfz]
-- extracting... [analysis]
-- extracting... [rename]
-- extracting... [clean up]
-- extracting... done
[ 33%] No update step for 'fp16'
[ 44%] No patch step for 'fp16'
[ 55%] No configure step for 'fp16'
[ 66%] No build step for 'fp16'
[ 77%] No install step for 'fp16'
[ 88%] No test step for 'fp16'
[100%] Completed 'fp16'
[100%] Built target fp16
CMake Deprecation Warning at cmake-out-android/FP16-source/CMakeLists.txt:1 (CMAKE_MINIMUM_REQUIRED):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- Downloading PSimd to /Users/gchauhan/dev/executorch/cmake-out-android/psimd-source (define PSIMD_SOURCE_DIR to avoid it)
CMake Deprecation Warning at CMakeLists.txt:1 (CMAKE_MINIMUM_REQUIRED):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- Configuring done (0.1s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/gchauhan/dev/executorch/cmake-out-android/psimd-download
[ 11%] Creating directories for 'psimd'
[ 22%] Performing download step (git clone) for 'psimd'
Cloning into 'psimd-source'...
Already on 'master'
Your branch is up to date with 'origin/master'.
[ 33%] Performing update step for 'psimd'
[ 44%] No patch step for 'psimd'
[ 55%] No configure step for 'psimd'
[ 66%] No build step for 'psimd'
[ 77%] No install step for 'psimd'
[ 88%] No test step for 'psimd'
[100%] Completed 'psimd'
[100%] Built target psimd
CMake Deprecation Warning at cmake-out-android/psimd-source/CMakeLists.txt:1 (CMAKE_MINIMUM_REQUIRED):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- 
-- ******** Summary ********
--   CMAKE_BUILD_TYPE              : Release
--   CMAKE_CXX_STANDARD            : 17
--   CMAKE_CXX_COMPILER_ID         : Clang
--   CMAKE_TOOLCHAIN_FILE          : /Users/gchauhan/Library/Android/sdk/ndk/26.2.11394342/build/cmake/android.toolchain.cmake
--   BUCK2                         : /Users/gchauhan/dev/executorch/cmake-out-android/buck2-bin/buck2-99e407b49dc432eda0cbddd67ea78346
--   PYTHON_EXECUTABLE             : python
--   FLATC_EXECUTABLE              : flatc
--   EXECUTORCH_ENABLE_LOGGING              : 1
--   EXECUTORCH_ENABLE_PROGRAM_VERIFICATION : OFF
--   EXECUTORCH_LOG_LEVEL                   : Info
--   EXECUTORCH_BUILD_ANDROID_JNI           : OFF
--   EXECUTORCH_BUILD_ARM_BAREMETAL         : OFF
--   EXECUTORCH_BUILD_COREML                : OFF
--   EXECUTORCH_BUILD_CUSTOM                : OFF
--   EXECUTORCH_BUILD_EXECUTOR_RUNNER       : ON
--   EXECUTORCH_BUILD_EXTENSION_DATA_LOADER : ON
--   EXECUTORCH_BUILD_EXTENSION_MODULE      : ON
--   EXECUTORCH_BUILD_EXTENSION_RUNNER_UTIL : OFF
--   EXECUTORCH_BUILD_FLATC                 : ON
--   EXECUTORCH_BUILD_GFLAGS                : ON
--   EXECUTORCH_BUILD_GTESTS                : OFF
--   EXECUTORCH_BUILD_HOST_TARGETS          : ON
--   EXECUTORCH_BUILD_MPS                   : OFF
--   EXECUTORCH_BUILD_PYBIND                : OFF
--   EXECUTORCH_BUILD_QNN                   : OFF
--   EXECUTORCH_BUILD_OPTIMIZED             : ON
--   EXECUTORCH_BUILD_QUANTIZED             : OFF
--   EXECUTORCH_BUILD_SDK                   : OFF
--   EXECUTORCH_BUILD_SIZE_TEST             : OFF
--   EXECUTORCH_BUILD_XNNPACK               : ON
--   EXECUTORCH_BUILD_VULKAN                : OFF
--   EXECUTORCH_BUILD_PTHREADPOOL           : ON
--   EXECUTORCH_BUILD_CPUINFO               : ON
-- Configuring done (16.5s)
-- Generating done (1.0s)
-- Build files have been written to: /Users/gchauhan/dev/executorch/cmake-out-android
cmake --build cmake-out-android -j16 --target install --config Release

Truncated output

[  0%] Building CXX object third-party/gflags/CMakeFiles/gflags_nothreads_static.dir/src/gflags.cc.o
[  0%] Building CXX object backends/xnnpack/third-party/XNNPACK/CMakeFiles/convolution-test-helpers.dir/test/convolution-test-helpers.cc.o
[  0%] Building C object backends/xnnpack/third-party/cpuinfo/CMakeFiles/cpuinfo.dir/src/api.c.o
[  0%] Building CXX object third-party/gflags/CMakeFiles/gflags_nothreads_static.dir/src/gflags_reporting.cc.o
[  0%] Building C object backends/xnnpack/third-party/XNNPACK/CMakeFiles/microkernel-utils.dir/src/microkernel-utils.c.o
[  0%] Building CXX object third-party/gflags/CMakeFiles/gflags_nothreads_static.dir/src/gflags_completions.cc.o
[  0%] Building CXX object third-party/flatbuffers/CMakeFiles/flatc.dir/src/reflection.cpp.o
[  0%] Building CXX object third-party/flatbuffers/CMakeFiles/flatc.dir/src/idl_parser.cpp.o
[  0%] Building CXX object third-party/flatbuffers/CMakeFiles/flatc.dir/src/idl_gen_text.cpp.o
[  0%] Building C object backends/xnnpack/third-party/cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/api.c.o
[  0%] Building C object backends/xnnpack/third-party/cpuinfo/CMakeFiles/cpuinfo.dir/src/init.c.o
[  0%] Building C object backends/xnnpack/third-party/cpuinfo/CMakeFiles/cpuinfo_internals.dir/src/cache.c.o
[  0%] Building C object backends/xnnpack/third-party/cpuinfo/CMakeFiles/cpuinfo.dir/src/cache.c.o
[  0%] Building C object backends/xnnpack/third-party/cpuinfo/CMakeFiles/cpuinfo.dir/src/log.c.o
[  0%] Building CXX object kernels/optimized/CMakeFiles/eigen_blas.dir/third-party/eigen/blas/single.cpp.o
clang++: warning: argument unused during compilation: '-s' [-Wunused-command-line-argument]
(warning repeated and interleaved across parallel compiler invocations)
[100%] Building C object backends/xnnpack/third-party/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-16.c.o
[100%] Building C object backends/xnnpack/third-party/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-64.c.o
[100%] Building C object backends/xnnpack/third-party/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/exp2minus-k-over-2048.c.o
[100%] Building C object backends/xnnpack/third-party/XNNPACK/CMakeFiles/microkernels-all.dir/src/tables/vlog.c.o
[100%] Built target microkernels-all
Install the project...
-- Install configuration: "Release"
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/share/cpuinfo/cpuinfo-config.cmake
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libcpuinfo.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/cpuinfo.h
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/share/cpuinfo/cpuinfo-targets.cmake
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/share/cpuinfo/cpuinfo-targets-release.cmake
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/pkgconfig/libcpuinfo.pc
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/pthreadpool.h
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libpthreadpool.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/fxdiv.h
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libportable_kernels.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libportable_ops_lib.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libeigen_blas.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libcpublas.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/liboptimized_kernels.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/liboptimized_ops_lib.a
-- Up-to-date: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libcpublas.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/liboptimized_native_cpu_ops_lib.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libexecutorch.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/cmake/ExecuTorch/executorch-config.cmake
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libextension_data_loader.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libextension_module.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/fp16.h
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/fp16/bitcasts.h
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/fp16/fp16.h
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/fp16/psimd.h
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/fp16/__init__.py
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/fp16/avx.py
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/fp16/avx2.py
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/psimd.h
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libXNNPACK.a
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/xnnpack.h
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/include/experiments-config.h
-- Installing: /Users/gchauhan/dev/executorch/cmake-out-android/lib/libxnnpack_backend.a

Build llama2 runner for Android

cmake  -DCMAKE_TOOLCHAIN_FILE=$ANDROID_NDK/build/cmake/android.toolchain.cmake \
    -DANDROID_ABI=arm64-v8a \
    -DANDROID_PLATFORM=android-23 \
    -DCMAKE_INSTALL_PREFIX=cmake-out-android \
    -DCMAKE_BUILD_TYPE=Release \
    -DPYTHON_EXECUTABLE=python \
    -DEXECUTORCH_BUILD_OPTIMIZED=ON \
    -Bcmake-out-android/examples/models/llama2 \
    examples/models/llama2

etdump library is not found.
            If needed rebuild with the proper options in CMakeLists.txt
bundled_program library is not found.
            If needed rebuild with the proper options in CMakeLists.txt
flatccrt library is not found.
            If needed rebuild with the proper options in CMakeLists.txt
mpsdelegate library is not found.
            If needed rebuild with the proper options in CMakeLists.txt
qnn_executorch_backend library is not found.
            If needed rebuild with the proper options in CMakeLists.txt
vulkan_backend library is not found.
            If needed rebuild with the proper options in CMakeLists.txt
-- executorch: Using source file list /Users/gchauhan/dev/executorch/cmake-out-android/examples/models/llama2/runner/../../../../executorch_srcs.cmake
-- 
-- ******** Summary ********
--   CMAKE_BUILD_TYPE              : Release
--   CMAKE_CXX_STANDARD            : 17
--   CMAKE_CXX_COMPILER_ID         : Clang
--   CMAKE_TOOLCHAIN_FILE          : /Users/gchauhan/Library/Android/sdk/ndk/26.2.11394342/build/cmake/android.toolchain.cmake
--   BUCK2                         : 
--   PYTHON_EXECUTABLE             : python
--   FLATC_EXECUTABLE              : 
--   EXECUTORCH_ENABLE_LOGGING              : 
--   EXECUTORCH_ENABLE_PROGRAM_VERIFICATION : 
--   EXECUTORCH_LOG_LEVEL                   : 
--   EXECUTORCH_BUILD_ANDROID_JNI           : 
--   EXECUTORCH_BUILD_ARM_BAREMETAL         : 
--   EXECUTORCH_BUILD_COREML                : 
--   EXECUTORCH_BUILD_CUSTOM                : 
--   EXECUTORCH_BUILD_EXECUTOR_RUNNER       : 
--   EXECUTORCH_BUILD_EXTENSION_DATA_LOADER : 
--   EXECUTORCH_BUILD_EXTENSION_MODULE      : 
--   EXECUTORCH_BUILD_EXTENSION_RUNNER_UTIL : 
--   EXECUTORCH_BUILD_FLATC                 : 
--   EXECUTORCH_BUILD_GFLAGS                : 
--   EXECUTORCH_BUILD_GTESTS                : 
--   EXECUTORCH_BUILD_HOST_TARGETS          : 
--   EXECUTORCH_BUILD_MPS                   : 
--   EXECUTORCH_BUILD_PYBIND                : 
--   EXECUTORCH_BUILD_QNN                   : 
--   EXECUTORCH_BUILD_OPTIMIZED             : ON
--   EXECUTORCH_BUILD_QUANTIZED             : 
--   EXECUTORCH_BUILD_SDK                   : 
--   EXECUTORCH_BUILD_SIZE_TEST             : 
--   EXECUTORCH_BUILD_XNNPACK               : 
--   EXECUTORCH_BUILD_VULKAN                : 
--   EXECUTORCH_BUILD_PTHREADPOOL           : 
--   EXECUTORCH_BUILD_CPUINFO               : 
-- Configuring done (0.2s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/gchauhan/dev/executorch/cmake-out-android/examples/models/llama2
cmake --build cmake-out-android/examples/models/llama2 -j16 --config Release
[ 55%] Building CXX object runner/CMakeFiles/llama_runner.dir/runner.cpp.o
[ 55%] Building CXX object runner/CMakeFiles/llama_runner.dir/Users/gchauhan/dev/executorch/kernels/optimized/blas/CPUBlas.cpp.o
[ 55%] Building CXX object runner/CMakeFiles/llama_runner.dir/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp.o
[ 55%] Building CXX object runner/CMakeFiles/llama_runner.dir/__/tokenizer/tokenizer.cpp.o
[ 55%] Building CXX object runner/CMakeFiles/llama_runner.dir/__/sampler/sampler.cpp.o
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:29: note: 'data_ptr' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
                            ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr<unsigned char>' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:3: note: 'data_ptr<unsigned char>' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
  ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/platform/compiler.h:42:27: note: expanded from macro '__ET_DEPRECATED'
#define __ET_DEPRECATED [[deprecated]]
                          ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:29: note: 'data_ptr' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
                            ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr<signed char>' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:3: note: 'data_ptr<signed char>' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
  ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/platform/compiler.h:42:27: note: expanded from macro '__ET_DEPRECATED'
#define __ET_DEPRECATED [[deprecated]]
                          ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:29: note: 'data_ptr' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
                            ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr<short>' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:3: note: 'data_ptr<short>' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
  ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/platform/compiler.h:42:27: note: expanded from macro '__ET_DEPRECATED'
#define __ET_DEPRECATED [[deprecated]]
                          ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:29: note: 'data_ptr' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
                            ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr<int>' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:3: note: 'data_ptr<int>' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
  ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/platform/compiler.h:42:27: note: expanded from macro '__ET_DEPRECATED'
#define __ET_DEPRECATED [[deprecated]]
                          ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:29: note: 'data_ptr' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
                            ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr<long>' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:3: note: 'data_ptr<long>' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
  ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/platform/compiler.h:42:27: note: expanded from macro '__ET_DEPRECATED'
#define __ET_DEPRECATED [[deprecated]]
                          ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:29: note: 'data_ptr' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
                            ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr<float>' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:3: note: 'data_ptr<float>' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
  ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/platform/compiler.h:42:27: note: expanded from macro '__ET_DEPRECATED'
#define __ET_DEPRECATED [[deprecated]]
                          ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:29: note: 'data_ptr' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
                            ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr<double>' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:3: note: 'data_ptr<double>' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
  ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/platform/compiler.h:42:27: note: expanded from macro '__ET_DEPRECATED'
#define __ET_DEPRECATED [[deprecated]]
                          ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:29: note: 'data_ptr' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
                            ^
/Users/gchauhan/dev/executorch/extension/evalue_util/print_evalue.cpp:161:36: warning: 'data_ptr<bool>' is deprecated [-Wdeprecated-declarations]
    ET_FORALL_REAL_TYPES_AND(Bool, PRINT_TENSOR_DATA)
                                   ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/core/portable_type/tensor.h:133:3: note: 'data_ptr<bool>' has been explicitly marked deprecated here
  __ET_DEPRECATED inline T* data_ptr() const {
  ^
/Users/gchauhan/dev/executorch/examples/models/llama2/../../../../executorch/runtime/platform/compiler.h:42:27: note: expanded from macro '__ET_DEPRECATED'
#define __ET_DEPRECATED [[deprecated]]
                          ^
16 warnings generated.
[ 66%] Linking CXX static library libllama_runner.a
[ 66%] Built target llama_runner
[ 88%] Building CXX object CMakeFiles/llama_main.dir/main.cpp.o
[ 88%] Building CXX object CMakeFiles/llama_main.dir/Users/gchauhan/dev/executorch/backends/xnnpack/threadpool/cpuinfo_utils.cpp.o
[100%] Linking CXX executable llama_main
[100%] Built target llama_main

@chauhang
Author

chauhang commented Apr 7, 2024

Unable to run on Android Emulator

adb push of the ~4 GB .pte file hangs or crashes the emulator
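
For reference, the exported file size can be checked before pushing; a trivial sketch (the ~4 GB figure above is taken at face value, and the exact size depends on the export flags):

import os  # assumes xnnpack_llama2.pte is in the current directory

print(f"{os.path.getsize('xnnpack_llama2.pte') / 2**30:.2f} GiB")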

Run model on Android (worked on a physical device)

Copy files

adb push xnnpack_llama2.pte /data/local/tmp/
adb push tokenizer.bin /data/local/tmp/
adb push cmake-out-android/examples/models/llama2/llama_main /data/local/tmp/

Run model on device

adb shell "cd /data/local/tmp && ./llama_main --model_path ./xnnpack_llama2.pte --tokenizer_path ./tokenizer.bin --prompt "Once upon a time" --seq_len 120"
I 00:00:00.003152 executorch:cpuinfo_utils.cpp:61] Reading file /sys/devices/soc0/image_version
I 00:00:00.003479 executorch:cpuinfo_utils.cpp:77] Failed to open midr file /sys/devices/soc0/image_version
I 00:00:00.003550 executorch:cpuinfo_utils.cpp:157] Number of efficient cores 4
I 00:00:00.003586 executorch:main.cpp:65] Resetting threadpool with num threads = 4
I 00:00:00.008603 executorch:runner.cpp:49] Creating LLaMa runner: model_path=./xnnpack_llama2.pte, tokenizer_path=./tokenizer.bin
I 00:00:12.040637 executorch:runner.cpp:64] Reading metadata from model
I 00:00:12.041047 executorch:runner.cpp:123] get_vocab_size: 32000
I 00:00:12.041061 executorch:runner.cpp:123] get_bos_id: 1
I 00:00:12.041077 executorch:runner.cpp:123] get_eos_id: 2
I 00:00:12.041089 executorch:runner.cpp:123] get_n_bos: 1
I 00:00:12.041095 executorch:runner.cpp:123] get_n_eos: 1
I 00:00:12.041100 executorch:runner.cpp:123] get_max_seq_len: 128
I 00:00:12.041105 executorch:runner.cpp:123] use_kv_cache: 0
I 00:00:12.041110 executorch:runner.cpp:123] use_sdpa_with_kv_cache: 0
I 00:00:12.041114 executorch:runner.cpp:123] append_eos_to_prompt: 0
Once upon a time, there was a beautiful city called Baghdad.istration, the Iraqi government is working to remove the names of all Americans and allies in Iraq from the terrorist list. The Iraqi government wants the world to believe that the new Iraq will be a peaceful, democratic place where all the world's people can feel safe. But, if the world's people believe in that new Iraq, they have to believe that the Iraqi government will put the names of all Iraqis who are terrorists on their terrorist list
I 00:25:22.676836 executorch:runner.cpp:411] 	Prompt Tokens: 2    Generated Tokens: 117
I 00:25:22.677070 executorch:runner.cpp:417] 	Model Load Time:		12.051000 (seconds)
I 00:25:22.677151 executorch:runner.cpp:427] 	Total inference time:		1510.609000 (seconds)		 Rate: 	0.077452 (tokens/second)
I 00:25:22.677205 executorch:runner.cpp:435] 		Prompt evaluation:	4.939000 (seconds)		 Rate: 	0.404940 (tokens/second)
I 00:25:22.677380 executorch:runner.cpp:446] 		Generated 117 tokens:	1505.670000 (seconds)		 Rate: 	0.077706 (tokens/second)
I 00:25:22.677457 executorch:runner.cpp:454] 	Time to first generated token:	8.448000 (seconds)
I 00:25:22.677507 executorch:runner.cpp:461] 	Sampling time over 119 tokens:	0.136000 (seconds)

PyTorchObserver {"prompt_tokens":2,"generated_tokens":117,"model_load_start_ms":1712522260853,"model_load_end_ms":1712522272904,"inference_start_ms":1712522272904,"inference_end_ms":1712523783513,"prompt_eval_end_ms":1712522277843,"first_token_ms":1712522281352,"aggregate_sampling_time_ms":136,"SCALING_FACTOR_UNITS_PER_SECOND":1000}
