$ ./build/bin/main -m grok.bin -p "I believe the meaning of life is" -s 2 -n 10 -ngl 0
Log start
main: build = 2788 (8a72b3d4)
main: built with Apple clang version 15.0.0 (clang-1500.3.9.4) for arm64-apple-darwin23.4.0
main: seed = 2
llama_model_loader: loaded meta data with 18 key-value pairs and 770 tensors from grok.bin (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = grok
llama_model_loader: - kv 1: general.name str = grok
llama_model_loader: - kv 2: grok.vocab_size u32 = 131072
llama_model_loader: - kv 3: grok.context_length u32 = 8192
llama_model_loader: - kv 4: grok.embedding_length u32 = 6144
llama_model_loader: - kv 5: grok.block_count u32 = 64
llama_model_loader: - kv 6: grok.feed_forward_length u32 = 32768
llama_model_loader: - kv 7: grok.rope.dimension_count u32 = 128
llama_model_loader: - kv 8: grok.attention.head_count u32 = 48
llama_model_loader: - kv 9: grok.attention.head_count_kv u32 = 8
llama_model_loader: - kv 10: grok.expert_count u32 = 8
llama_model_loader: - kv 11: grok.expert_used_count u32 = 2
llama_model_loader: - kv 12: grok.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 13: grok.rope.freq_base f32 = 10000.000000
llama_model_loader: - kv 14: tokenizer.ggml.model str = llama
llama_model_loader: - kv 15: tokenizer.ggml.tokens arr[str,131072] = ["[PAD]", "[BOS]", "[EOS]", "[UNK]", ...
llama_model_loader: - kv 16: tokenizer.ggml.scores arr[f32,131072] = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv 17: tokenizer.ggml.token_type arr[i32,131072] = [3, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - type f32: 257 tensors
llama_model_loader: - type q4_0: 513 tensors
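
As a sanity check, the parameter count can be roughly reconstructed from the metadata above alone. The sketch below assumes a tied input/output embedding and a gated MoE feed-forward block with three projections (gate/up/down) per expert; those layout details are assumptions, not something the log states, but the estimate lands on the 315.68 B reported further down.

# Back-of-the-envelope parameter count from the GGUF metadata above.
# Assumptions (not in the log): tied embeddings, three projections per expert,
# tiny norm tensors ignored.
n_vocab, n_embd, n_layer = 131072, 6144, 64
n_head, n_head_kv, head_dim = 48, 8, 128
n_ff, n_expert = 32768, 8

embed = n_vocab * n_embd                                  # token embedding (assumed tied with output)
attn  = n_embd * (n_head * head_dim) * 2 \
      + n_embd * (n_head_kv * head_dim) * 2               # Q and O (full width) + K and V (grouped)
moe   = n_expert * 3 * n_embd * n_ff + n_embd * n_expert  # 8 experts + router
total = embed + n_layer * (attn + moe)

print(f"{total / 1e9:.2f} B")                             # -> 315.68 B, matching "model params" below
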
llm_load_vocab: mismatch in special tokens definition ( 284/131072 vs 260/131072 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = grok
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 131072
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 8192
llm_load_print_meta: n_embd = 6144
llm_load_print_meta: n_head = 48
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 64
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 6
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0.0e+00
llm_load_print_meta: f_norm_rms_eps = 1.0e-05
llm_load_print_meta: f_clamp_kqv = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: f_logit_scale = 0.0e+00
llm_load_print_meta: n_ff = 32768
llm_load_print_meta: n_expert = 8
llm_load_print_meta: n_expert_used = 2
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 2
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 8192
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 314B
llm_load_print_meta: model ftype = Q4_0 (guessed)
llm_load_print_meta: model params = 315.68 B
llm_load_print_meta: model size = 165.38 GiB (4.50 BPW)
llm_load_print_meta: general.name = grok
llm_load_print_meta: BOS token = 1 '[BOS]'
llm_load_print_meta: EOS token = 2 '[EOS]'
llm_load_print_meta: UNK token = 0 '[PAD]'
llm_load_print_meta: LF token = 79 '<0x0A>'
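
The derived figures in that dump follow directly from the hyperparameters; a minimal check (Q4_0 stores 18 bytes per block of 32 weights, i.e. 4.5 bits per weight, and GiB = 2**30 bytes is assumed here):

n_head, n_head_kv, head_dim = 48, 8, 128
n_params, bpw = 315.68e9, 4.50        # "model params" / BPW as printed above

print(n_head // n_head_kv)            # 6    -> n_gqa
print(n_head_kv * head_dim)           # 1024 -> n_embd_k_gqa / n_embd_v_gqa
print(n_params * bpw / 8 / 2**30)     # ~165.4 GiB -> "model size"

The same figure shows up again below as the 169351.71 MiB CPU buffer (169351.71 / 1024 ≈ 165.4 GiB), since -ngl 0 keeps every layer in host RAM.
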
llm_load_tensors: ggml ctx size = 0.37 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/65 layers to GPU
llm_load_tensors: CPU buffer size = 169351.71 MiB
....................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: flash_attn = 0
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CPU KV buffer size = 128.00 MiB
llama_new_context_with_model: KV self size = 128.00 MiB, K (f16): 64.00 MiB, V (f16): 64.00 MiB
llama_new_context_with_model: CPU output buffer size = 0.50 MiB
llama_new_context_with_model: CPU compute buffer size = 293.01 MiB
llama_new_context_with_model: graph nodes = 3464
llama_new_context_with_model: graph splits = 1
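
The 128 MiB KV cache is likewise reproducible from the context and GQA dimensions: both K and V are cached in f16 (2 bytes per element) for all 64 layers.

n_layer, n_ctx = 64, 512
n_embd_kv = 1024                      # n_embd_k_gqa = n_embd_v_gqa above
bytes_f16 = 2

kv_bytes = n_layer * n_ctx * n_embd_kv * 2 * bytes_f16   # K and V
print(kv_bytes / 2**20)               # 128.0 MiB, matching "KV self size"
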
system_info: n_threads = 6 / 12 | AVX = 0 | AVX_VNNI = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | SSSE3 = 0 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 |
sampling:
repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000
top_k = 40, tfs_z = 1.000, top_p = 0.950, min_p = 0.050, typical_p = 1.000, temp = 0.800
mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
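
With the settings above, the active part of that chain reduces to top_k -> top_p -> min_p -> temperature (repeat_penalty, tfs_z and typical_p are at their neutral values of 1.0, and mirostat is off). The following is a simplified numpy sketch of that reduced chain, not llama.cpp's actual implementation; the function name and the stand-in logits are mine.

import numpy as np

rng = np.random.default_rng(2)        # echoing seed = 2 from the run above

def sample(logits, top_k=40, top_p=0.95, min_p=0.05, temp=0.8):
    """Simplified sketch of the reduced sampling chain."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()

    # top_k: keep only the 40 most likely tokens
    order = np.argsort(probs)[::-1][:top_k]
    probs_k = probs[order]

    # top_p: keep the smallest prefix whose cumulative probability reaches 0.95
    keep = np.searchsorted(np.cumsum(probs_k), top_p) + 1
    order, probs_k = order[:keep], probs_k[:keep]

    # min_p: drop tokens below 5% of the best candidate's probability
    mask = probs_k >= min_p * probs_k[0]
    order, probs_k = order[mask], probs_k[mask]

    # temperature over the remaining candidates, renormalise, sample
    p = probs_k ** (1.0 / temp)
    p /= p.sum()
    return int(rng.choice(order, p=p))

logits = rng.normal(size=131072)      # stand-in logits over the 131072-token vocab
print(sample(logits))
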
generate: n_ctx = 512, n_batch = 2048, n_predict = 10, n_keep = 1
[BOS] I believe the meaning of life is important it could possibly the general of the general I
llama_print_timings: load time = 77678.27 ms
llama_print_timings: sample time = 0.85 ms / 10 runs ( 0.08 ms per token, 11806.38 tokens per second)
llama_print_timings: prompt eval time = 193039.14 ms / 8 tokens (24129.89 ms per token, 0.04 tokens per second)
llama_print_timings: eval time = 553967.90 ms / 9 runs (61551.99 ms per token, 0.02 tokens per second)
llama_print_timings: total time = 747018.61 ms / 17 tokens
Log end
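
The per-token figures in the timing summary are simply the totals divided by the token counts; on this CPU-only run they work out to well under one token per second:

prompt_ms, prompt_tokens = 193039.14, 8
eval_ms, eval_runs = 553967.90, 9

print(prompt_ms / prompt_tokens)      # ~24130 ms per prompt token  (~0.04 tok/s)
print(eval_ms / eval_runs)            # ~61552 ms per generated token (~0.02 tok/s)
print(prompt_ms + eval_ms)            # ~747007 ms, roughly the reported 747018.61 ms total
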