@alex4o
Created February 5, 2024 09:18
> ./main -m ../phi-2-orange.Q4_K_M.gguf -ngl 33 -i -cml
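The invocation above loads the Q4_K_M quant of Phi-2 Orange, offloads all 33 layers to the GPU (-ngl 33), and starts an interactive ChatML session (-i -cml). For reference, a rough Python equivalent via the llama-cpp-python bindings might look like the sketch below; this assumes the bindings were compiled with the same Vulkan backend, and the model path is illustrative.
```
# Rough Python equivalent of the CLI run above, via the llama-cpp-python bindings;
# assumes the package was built with the Vulkan backend. Model path is illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="../phi-2-orange.Q4_K_M.gguf",  # same GGUF as -m
    n_gpu_layers=33,                           # same as -ngl 33
    n_ctx=512,                                 # matches the n_ctx used below
    chat_format="chatml",                      # same template as -cml
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "How tall is the Eiffel Tower?"}]
)
print(out["choices"][0]["message"]["content"])
```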
Log start
main: build = 22 (277fad3)
main: built with clang version 17.0.6 for aarch64-unknown-linux-android24
main: seed = 1707123971
ggml_vulkan: Using Mali-G78 | uma: 1 | fp16: 1 | warp size: 16
llama_model_loader: loaded meta data with 21 key-value pairs and 325 tensors from ../phi-2-orange.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = phi2
llama_model_loader: - kv 1: general.name str = Phi2
llama_model_loader: - kv 2: phi2.context_length u32 = 2048
llama_model_loader: - kv 3: phi2.embedding_length u32 = 2560
llama_model_loader: - kv 4: phi2.feed_forward_length u32 = 10240
llama_model_loader: - kv 5: phi2.block_count u32 = 32
llama_model_loader: - kv 6: phi2.attention.head_count u32 = 32
llama_model_loader: - kv 7: phi2.attention.head_count_kv u32 = 32
llama_model_loader: - kv 8: phi2.attention.layer_norm_epsilon f32 = 0.000010
llama_model_loader: - kv 9: phi2.rope.dimension_count u32 = 32
llama_model_loader: - kv 10: general.file_type u32 = 15
llama_model_loader: - kv 11: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 12: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,51200] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,51200] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 15: tokenizer.ggml.merges arr[str,50000] = ["Ġ t", "Ġ a", "h e", "i n", "r e",...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 50256
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 50295
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 50256
llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 50256
llama_model_loader: - kv 20: general.quantization_version u32 = 2
llama_model_loader: - type f32: 195 tensors
llama_model_loader: - type q4_K: 81 tensors
llama_model_loader: - type q5_K: 32 tensors
llama_model_loader: - type q6_K: 17 tensors
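Everything above is metadata read straight from the GGUF file: the phi2.* architecture and dimension keys, the GPT-2-style BPE tokenizer, and a per-quantization-type tensor count (f32 for the small tensors, a q4_K/q5_K/q6_K mix for the weights). The same dump can be reproduced offline with the gguf Python package that ships in llama.cpp's gguf-py; a minimal sketch, assuming the GGUFReader interface as of early 2024:
```
# Minimal sketch: dump GGUF metadata keys and the tensor count with the gguf package
# (pip install gguf). Assumes the gguf-py GGUFReader API as of early 2024.
from gguf import GGUFReader

reader = GGUFReader("../phi-2-orange.Q4_K_M.gguf")
for key in reader.fields:                      # e.g. "phi2.block_count", "tokenizer.ggml.model"
    print(key)
print("tensor count:", len(reader.tensors))    # 325 for this file
```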
llm_load_vocab: mismatch in special tokens definition ( 910/51200 vs 944/51200 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = phi2
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 51200
llm_load_print_meta: n_merges = 50000
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 2560
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 32
llm_load_print_meta: n_embd_head_k = 80
llm_load_print_meta: n_embd_head_v = 80
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 2560
llm_load_print_meta: n_embd_v_gqa = 2560
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 0.0e+00
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 10240
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 3B
llm_load_print_meta: model ftype = Q4_K - Medium
llm_load_print_meta: model params = 2.78 B
llm_load_print_meta: model size = 1.66 GiB (5.14 BPW)
llm_load_print_meta: general.name = Phi2
llm_load_print_meta: BOS token = 50256 '<|endoftext|>'
llm_load_print_meta: EOS token = 50295 '<|im_end|>'
llm_load_print_meta: UNK token = 50256 '<|endoftext|>'
llm_load_print_meta: PAD token = 50256 '<|endoftext|>'
llm_load_print_meta: LF token = 128 'Ä'
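The size and bits-per-weight lines are consistent with each other: 1.66 GiB spread over 2.78 B parameters is about 5.1 bits per weight, which is where a Q4_K - Medium mix usually lands. A quick check using the rounded figures from the log:
```
# Quick sanity check of model size vs. bits per weight, using the rounded log values
params = 2.78e9                 # "model params = 2.78 B"
size_bytes = 1.66 * 2**30       # "model size = 1.66 GiB"
print(size_bytes * 8 / params)  # ~5.13, in line with the reported 5.14 BPW
```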
llm_load_tensors: ggml ctx size = 0.25 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: Vulkan buffer size = 1634.32 MiB
llm_load_tensors: CPU buffer size = 70.31 MiB
..........................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: Vulkan KV buffer size = 160.00 MiB
llama_new_context_with_model: KV self size = 160.00 MiB, K (f16): 80.00 MiB, V (f16): 80.00 MiB
llama_new_context_with_model: Vulkan_Host input buffer size = 6.01 MiB
llama_new_context_with_model: Vulkan compute buffer size = 121.00 MiB
llama_new_context_with_model: Vulkan_Host compute buffer size = 5.50 MiB
llama_new_context_with_model: graph splits (measure): 3
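The 160 MiB KV cache follows directly from the numbers above: n_head_kv equals n_head (no GQA), so each of the 32 layers stores n_ctx x 2560 f16 values for K and the same again for V.
```
# KV cache size from the logged parameters (f16 = 2 bytes per value)
n_ctx, n_layer, n_embd_kv = 512, 32, 2560
per_side = n_ctx * n_embd_kv * 2 * n_layer   # 83,886,080 bytes
print(per_side / 2**20)                      # 80.0 MiB for K, same for V
print(2 * per_side / 2**20)                  # 160.0 MiB total, matching the log
```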
system_info: n_threads = 4 / 8 | AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '<|im_start|>user
'
sampling:
repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temp
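With these defaults the stages that actually do anything are the repetition penalty, top_k = 40, top_p = 0.95, min_p = 0.05 and temp = 0.8 (tfs_z and typical_p at 1.0 are pass-throughs, and mirostat is off). A stripped-down illustration of the top_k -> top_p -> temp part of that chain, not llama.cpp's actual code:
```
# Illustrative top_k -> top_p -> temperature sampling over a logits vector;
# a simplified sketch, not llama.cpp's implementation.
import numpy as np

def sample(logits, top_k=40, top_p=0.95, temp=0.8, rng=np.random.default_rng()):
    order = np.argsort(logits)[::-1][:top_k]               # top_k: keep the 40 best candidates
    cand = logits[order]
    p = np.exp(cand - cand.max()); p /= p.sum()
    keep = int(np.searchsorted(np.cumsum(p), top_p)) + 1   # top_p: smallest 0.95-mass prefix
    order, cand = order[:keep], cand[:keep]
    z = cand / temp                                        # temp: rescale surviving logits
    p = np.exp(z - z.max()); p /= p.sum()
    return int(order[rng.choice(keep, p=p)])
```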
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 4
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to LLaMa.
- To return control without starting a new line, end your input with '/'.
- If you want to submit another line, end your input with '\'.
<|im_start|>system
<|im_start|>user
> The first President of the United States is George Washington. He took office on April 30, 1789, after being unanimously elected by the Electoral College. George Washington served as the President for two terms and remained in office until his death on February 22, 1799. During his time in office, he laid the foundation for the nation's political system, military leadership, and diplomatic relations.
```
To make this information more accessible, you can use a chart or a visual aid like this one:
[Insert visual aid of George Washington as the first President with a brief description]
https://www.history.com/this-day-in-history/george-washington-is-named-first-president-us-elections
```
> The height of the Eiffel Tower, an iconic landmark in Paris, France, is approximately 330 meters (1,083 feet). However, please note that this information is approximate and can vary slightly depending on factors like the weather conditions or measurement methods. The true height of the tower, including its antennas, is typically stated to be 300.4 meters (984.6 ft).
The Eiffel Tower stands at a height of 330 meters or 1,083 feet from its base to the top of the antenna, which adds about an additional 36 meters (118 ft) above the structure's main tower. The tower is made up of three levels open to the public and offers stunning panoramic views of Paris below.
```
>
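The -cml flag is why the reverse prompt is '<|im_start|>user' and why EOS is <|im_end|> (id 50295) rather than <|endoftext|>: every turn gets wrapped in the ChatML template before being fed to the model. A sketch of that formatting; format_chatml is an illustrative helper, not part of llama.cpp:
```
# Illustrative ChatML turn formatting as implied by -cml; format_chatml is a
# hypothetical helper, not a llama.cpp function.
def format_chatml(system: str, user: str) -> str:
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{user}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

print(format_chatml("You are a helpful assistant.", "Who was the first US president?"))
```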