@willy-liu
Last active June 28, 2025 06:44
(bitnet-cpp) willy@linux2025:~/Desktop/linux2025/term-project/BitNet$ uftrace record ./build/bin/llama-cli -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p "why sky blue" -n 50 --temp 0
(bitnet-cpp) willy@linux2025:~/Desktop/linux2025/term-project/BitNet$ uftrace report
  Total time   Self time       Calls  Function
  ==========  ==========  ==========  ====================
     1.057 m  118.955 us          10  ggml_graph_compute_secondary_thread
     1.056 m   85.570 ms         306  ggml_graph_compute_thread
     1.050 m  134.310 ms      341496  ggml_compute_forward
     1.049 m  453.917 ms       82926  ggml_compute_forward_mul_mat
     1.047 m    1.953 ms         730  std::condition_variable::wait
     1.006 m     6.459 s      715668  ggml_compute_forward_mul_mat_one_chunk
    59.627 s    8.976 us           2  llama_load_model_from_file
    53.581 s    9.249 ms       53676  std::__invoke_impl
    53.532 s   53.532 s          475  linux:schedule
    53.522 s  488.271 us           1  main
    53.511 s    0.394 us           1  std::thread::_State_impl::_M_run
    53.511 s    0.243 us           1  std::thread::_Invoker::operator()
    53.511 s    0.233 us           1  std::thread::_Invoker::_M_invoke
    53.511 s    0.222 us           1  std::__invoke
    53.511 s  356.595 us           1  common_log::resume::$_0::operator()
    41.933 s   51.806 us         102  llama_decode
    40.352 s   37.444 s     37528064  ggml_vec_dot_i2_i8_s
    38.564 s   20.057 s       424062  ggml_barrier
    30.567 s   10.666 us           1  common_init_from_params
    29.813 s    3.619 us           1  llama_model_load
    25.760 s  655.968 ms           1  llm_load_vocab
    23.430 s   23.362 s    219431233  ggml_thread_cpu_relax
    20.966 s  565.998 us          51  llama_decode_internal
    20.692 s  188.142 us         265  ggml_graph_compute_check_for_work
    20.691 s   12.213 s          265  ggml_graph_compute_poll_for_work
    19.392 s  134.185 us          51  llama_graph_compute
    19.392 s   14.396 us          51  ggml_backend_sched_graph_compute_async
    19.392 s   43.438 us          51  ggml_backend_sched_compute_splits
    19.392 s   17.850 us          51  ggml_backend_graph_compute_async
    19.392 s   53.413 us          51  ggml_backend_cpu_graph_compute
    19.343 s   36.156 us          51  ggml_graph_compute
    18.876 s   18.866 s     11763456  ggml_vec_dot_f16
    10.305 s  242.974 ms      280147  std::map::emplace
     7.129 s   27.913 ms      283667  std::map::lower_bound
     7.101 s   70.411 ms      283667  std::_Rb_tree::lower_bound
     7.087 s    4.670 s        11538  quantize_row_i8_s
     6.977 s    1.007 s       287758  std::_Rb_tree::_M_lower_bound
     6.603 s   20.279 ms      128256  llama_token_to_piece::cxx11
     6.583 s   13.725 ms      130209  llama_token_to_piece
     6.569 s   68.857 ms      130155  llama_token_to_piece_impl
     6.351 s  338.351 ms      129591  llama_decode_text
     6.254 s    2.722 s     17604382  std::operator<
     4.376 s   76.622 ms      871921  unicode_utf8_to_byte
     4.298 s   77.186 ms      872268  std::unordered_map::at
     4.221 s  132.559 ms      872268  std::__detail::_Map_base::at
     4.174 s  565.350 ms     5865830  std::less::operator()
     4.174 s  508.574 ms      409750  unicode_cpts_from_utf8
     3.874 s  380.694 ms      873171  std::_Hashtable::find
     3.770 s    1.000 ms           1  llama_model_loader::llama_model_loader
     3.540 s  431.597 ms          44  gguf_kv_to_str::cxx11
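The report is sorted by total time, but the self-time column is what pinpoints the hot loops: `ggml_vec_dot_i2_i8_s` alone accounts for 37.444 s of self time, with `ggml_thread_cpu_relax` (spin-waiting) and `ggml_barrier` close behind. A minimal Python sketch — illustrative only, not part of the original session — that parses report rows like the ones above and re-ranks functions by self time:

```python
import re

# Unit multipliers for uftrace time values ("m" is minutes).
UNITS = {"m": 60.0, "s": 1.0, "ms": 1e-3, "us": 1e-6}

# One report row: total time, self time, call count, function name.
# Longer unit alternatives come first so "ms"/"us" win over "m"/"s".
LINE = re.compile(
    r"\s*([\d.]+)\s+(ms|us|m|s)"   # total time value + unit
    r"\s+([\d.]+)\s+(ms|us|m|s)"   # self time value + unit
    r"\s+(\d+)\s+(\S+)"            # call count, function name
)

def to_seconds(value, unit):
    return float(value) * UNITS[unit]

def top_self(report, n=3):
    """Rank functions by self time (in seconds), largest first."""
    rows = []
    for line in report.splitlines():
        m = LINE.match(line)
        if m:  # header and separator lines simply don't match
            _tot, _tu, self_v, self_u, calls, func = m.groups()
            rows.append((func, to_seconds(self_v, self_u), int(calls)))
    rows.sort(key=lambda r: r[1], reverse=True)
    return rows[:n]
```

Fed the report above, this surfaces the i2_s dot-product kernel and the thread spin/barrier overhead as the dominant self-time consumers, which is where quantized-matmul optimization effort would pay off.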