> ./main -m ../phi-2-orange.Q4_K_M.gguf -ngl 33 -i -cml
Log start
main: build = 22 (277fad3)
main: built with clang version 17.0.6 for aarch64-unknown-linux-android24
main: seed = 1707123971
ggml_vulkan: Using Mali-G78 | uma: 1 | fp16: 1 | warp size: 16
llama_model_loader: loaded meta data with 21 key-value pairs and 325 tensors from ../phi-2-orange.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = phi2
llama_model_loader: - kv 1: general.name str = Phi2
llama_model_loader: - kv 2: phi2.context_length u32 = 2048
llama_model_loader: - kv 3: phi2.embedding_length u32 = 2560
llama_model_loader: - kv 4: phi2.feed_forward_length u32 = 10240
llama_model_loader: - kv 5: phi2.block_count u32 = 32
llama_model_loader: - kv 6: phi2.attention.head_count u32 = 32
llama_model_loader: - kv 7: phi2.attention.head_count_kv u32 = 32
llama_model_loader: - kv 8: phi2.attention.layer_norm_epsilon f32 = 0.000010
llama_model_loader: - kv 9: phi2.rope.dimension_count u32 = 32
llama_model_loader: - kv 10: general.file_type u32 = 15
llama_model_loader: - kv 11: tokenizer.ggml.add_bos_token bool = false
llama_model_loader: - kv 12: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,51200] = ["!", "\"", "#", "$", "%", "&", "'", ...
llama_model_loader: - kv 14: tokenizer.ggml.token_type arr[i32,51200] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 15: tokenizer.ggml.merges arr[str,50000] = ["Ġ t", "Ġ a", "h e", "i n", "r e",...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 50256
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 50295
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 50256
llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 50256
llama_model_loader: - kv 20: general.quantization_version u32 = 2
llama_model_loader: - type f32: 195 tensors
llama_model_loader: - type q4_K: 81 tensors
llama_model_loader: - type q5_K: 32 tensors
llama_model_loader: - type q6_K: 17 tensors
llm_load_vocab: mismatch in special tokens definition ( 910/51200 vs 944/51200 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = phi2
llm_load_print_meta: vocab type = BPE
llm_load_print_meta: n_vocab = 51200
llm_load_print_meta: n_merges = 50000
llm_load_print_meta: n_ctx_train = 2048
llm_load_print_meta: n_embd = 2560
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 32
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 32
llm_load_print_meta: n_embd_head_k = 80
llm_load_print_meta: n_embd_head_v = 80
llm_load_print_meta: n_gqa = 1
llm_load_print_meta: n_embd_k_gqa = 2560
llm_load_print_meta: n_embd_v_gqa = 2560
llm_load_print_meta: f_norm_eps = 1.0e-05
llm_load_print_meta: f_norm_rms_eps = 0.0e+00
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff = 10240
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 2048
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: model type = 3B
llm_load_print_meta: model ftype = Q4_K - Medium
llm_load_print_meta: model params = 2.78 B
llm_load_print_meta: model size = 1.66 GiB (5.14 BPW)
llm_load_print_meta: general.name = Phi2
llm_load_print_meta: BOS token = 50256 '<|endoftext|>'
llm_load_print_meta: EOS token = 50295 '<|im_end|>'
llm_load_print_meta: UNK token = 50256 '<|endoftext|>'
llm_load_print_meta: PAD token = 50256 '<|endoftext|>'
llm_load_print_meta: LF token = 128 'Ä'
llm_load_tensors: ggml ctx size = 0.25 MiB
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
llm_load_tensors: Vulkan buffer size = 1634.32 MiB
llm_load_tensors: CPU buffer size = 70.31 MiB
..........................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: Vulkan KV buffer size = 160.00 MiB
llama_new_context_with_model: KV self size = 160.00 MiB, K (f16): 80.00 MiB, V (f16): 80.00 MiB
llama_new_context_with_model: Vulkan_Host input buffer size = 6.01 MiB
llama_new_context_with_model: Vulkan compute buffer size = 121.00 MiB
llama_new_context_with_model: Vulkan_Host compute buffer size = 5.50 MiB
llama_new_context_with_model: graph splits (measure): 3
system_info: n_threads = 4 / 8 | AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '<|im_start|>user
'
sampling:
repeat_last_n = 64, repeat_penalty = 1.100, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temp
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 4
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to LLaMa.
- To return control without starting a new line, end your input with '/'.
- If you want to submit another line, end your input with '\'.
<|im_start|>system
<|im_start|>user
> The first President of the United States is George Washington. He took office on April 30, 1789, after being unanimously elected by the Electoral College. George Washington served as the President for two terms and remained in office until his death on February 22, 1799. During his time in office, he laid the foundation for the nation's political system, military leadership, and diplomatic relations.
```
To make this information more accessible, you can use a chart or a visual aid like this one:
[Insert visual aid of George Washington as the first President with a brief description]
https://www.history.com/this-day-in-history/george-washington-is-named-first-president-us-elections
```
> The height of the Eiffel Tower, an iconic landmark in Paris, France, is approximately 330 meters (1,083 feet). However, please note that this information is approximate and can vary slightly depending on factors like the weather conditions or measurement methods. The true height of the tower, including its antennas, is typically stated to be 300.4 meters (984.6 ft).
The Eiffel Tower stands at a height of 330 meters or 1,083 feet from its base to the top of the antenna, which adds about an additional 36 meters (118 ft) above the structure's main tower. The tower is made up of three levels open to the public and offers stunning panoramic views of Paris below.
```
>
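For context on the invocation at the top of the log: -ngl 33 offloads all 33 layers (the 32 transformer blocks plus the non-repeating output layer) to the Mali GPU, -i enables interactive mode, and -cml applies the ChatML prompt template, which is why the reverse prompt is '<|im_start|>user' and the EOS token resolves to 50295 '<|im_end|>' in the metadata above. A minimal sketch of how one might reproduce the run is below; the LLAMA_VULKAN=1 make flag is an assumption based on the llama.cpp build system of early 2024, so check the README of the checkout you are using.

```
# Sketch only: the LLAMA_VULKAN=1 build flag is assumed from the llama.cpp
# Makefile of early 2024; the runtime flags are taken verbatim from the log above.
make LLAMA_VULKAN=1 main

# -ngl 33: offload all 33 layers to the GPU
# -i:      interactive mode
# -cml:    wrap turns in the ChatML template
./main -m ../phi-2-orange.Q4_K_M.gguf -ngl 33 -i -cml
```

Note that the run uses the default n_ctx = 512 rather than the model's 2048-token training context; adding -c 2048 would use the full window, growing the Vulkan KV buffer roughly fourfold (from 160 MiB to about 640 MiB).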