Last active: June 28, 2025 06:44
(bitnet-cpp) willy@linux2025:~/Desktop/linux2025/term-project/BitNet$ uftrace record ./build/bin/llama-cli -m models/BitNet-b1.58-2B-4T/ggml-model-i2_s.gguf -p "why sky blue" -n 50 --temp 0
(bitnet-cpp) willy@linux2025:~/Desktop/linux2025/term-project/BitNet$ uftrace report
Total time   Self time       Calls  Function
==========  ==========  ==========  ====================
   1.057 m  118.955 us          10  ggml_graph_compute_secondary_thread
   1.056 m   85.570 ms         306  ggml_graph_compute_thread
   1.050 m  134.310 ms      341496  ggml_compute_forward
   1.049 m  453.917 ms       82926  ggml_compute_forward_mul_mat
   1.047 m    1.953 ms         730  std::condition_variable::wait
   1.006 m     6.459 s      715668  ggml_compute_forward_mul_mat_one_chunk
  59.627 s    8.976 us           2  llama_load_model_from_file
  53.581 s    9.249 ms       53676  std::__invoke_impl
  53.532 s    53.532 s         475  linux:schedule
  53.522 s  488.271 us           1  main
  53.511 s    0.394 us           1  std::thread::_State_impl::_M_run
  53.511 s    0.243 us           1  std::thread::_Invoker::operator()
  53.511 s    0.233 us           1  std::thread::_Invoker::_M_invoke
  53.511 s    0.222 us           1  std::__invoke
  53.511 s  356.595 us           1  common_log::resume::$_0::operator()
  41.933 s   51.806 us         102  llama_decode
  40.352 s    37.444 s    37528064  ggml_vec_dot_i2_i8_s
  38.564 s    20.057 s      424062  ggml_barrier
  30.567 s   10.666 us           1  common_init_from_params
  29.813 s    3.619 us           1  llama_model_load
  25.760 s  655.968 ms           1  llm_load_vocab
  23.430 s    23.362 s   219431233  ggml_thread_cpu_relax
  20.966 s  565.998 us          51  llama_decode_internal
  20.692 s  188.142 us         265  ggml_graph_compute_check_for_work
  20.691 s    12.213 s         265  ggml_graph_compute_poll_for_work
  19.392 s  134.185 us          51  llama_graph_compute
  19.392 s   14.396 us          51  ggml_backend_sched_graph_compute_async
  19.392 s   43.438 us          51  ggml_backend_sched_compute_splits
  19.392 s   17.850 us          51  ggml_backend_graph_compute_async
  19.392 s   53.413 us          51  ggml_backend_cpu_graph_compute
  19.343 s   36.156 us          51  ggml_graph_compute
  18.876 s    18.866 s    11763456  ggml_vec_dot_f16
  10.305 s  242.974 ms      280147  std::map::emplace
   7.129 s   27.913 ms      283667  std::map::lower_bound
   7.101 s   70.411 ms      283667  std::_Rb_tree::lower_bound
   7.087 s     4.670 s       11538  quantize_row_i8_s
   6.977 s     1.007 s      287758  std::_Rb_tree::_M_lower_bound
   6.603 s   20.279 ms      128256  llama_token_to_piece::cxx11
   6.583 s   13.725 ms      130209  llama_token_to_piece
   6.569 s   68.857 ms      130155  llama_token_to_piece_impl
   6.351 s  338.351 ms      129591  llama_decode_text
   6.254 s     2.722 s    17604382  std::operator<
   4.376 s   76.622 ms      871921  unicode_utf8_to_byte
   4.298 s   77.186 ms      872268  std::unordered_map::at
   4.221 s  132.559 ms      872268  std::__detail::_Map_base::at
   4.174 s  565.350 ms     5865830  std::less::operator()
   4.174 s  508.574 ms      409750  unicode_cpts_from_utf8
   3.874 s  380.694 ms      873171  std::_Hashtable::find
   3.770 s    1.000 ms           1  llama_model_loader::llama_model_loader
   3.540 s  431.597 ms          44  gguf_kv_to_str::cxx11