log.sh
lucas@desktop:~/oss/llama.cpp$ ./main -m /home/lucas/.cache/huggingface/hub/models--TheBloke--Mistral-7B-Instruct-v0.2-GGUF/snapshots/3a6fbf4a41a1d52e415a4958cde6856d34b2db93/mistral-7b-instruct-v0.2.Q4_K_M.gguf -p "Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello" -n 1 -e | |
Log start
main: build = 2679 (7593639c)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: seed = 1714261503
llama_model_loader: loaded meta data with 24 key-value pairs and 291 tensors from /home/lucas/.cache/huggingface/hub/models--TheBloke--Mistral-7B-Instruct-v0.2-GGUF/snapshots/3a6fbf4a41a1d52e415a4958cde6856d34b2db93/mistral-7b-instruct-v0.2.Q4_K_M.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.name str = mistralai_mistral-7b-instruct-v0.2
llama_model_loader: - kv 2: llama.context_length u32 = 32768
llama_model_loader: - kv 3: llama.embedding_length u32 = 4096
llama_model_loader: - kv 4: llama.block_count u32 = 32
llama_model_loader: - kv 5: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 6: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 7: llama.attention.head_count u32 = 32
llama_model_loader: - kv 8: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 9: llama.attention.layer_norm_rms_epsilon f32 = 0,000010
llama_model_loader: - kv 10: llama.rope.freq_base f32 = 1000000,000000
llama_model_loader: - kv 11: general.file_type u32 = 15
llama_model_loader: - kv 12: tokenizer.ggml.model str = llama
llama_model_loader: - kv 13: tokenizer.ggml.tokens arr[str,32000] = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv 14: tokenizer.ggml.scores arr[f32,32000] = [0,000000, 0,000000, 0,000000, 0,0000...
llama_model_loader: - kv 15: tokenizer.ggml.token_type arr[i32,32000] = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 1
llama_model_loader: - kv 17: tokenizer.ggml.eos_token_id u32 = 2
llama_model_loader: - kv 18: tokenizer.ggml.unknown_token_id u32 = 0
llama_model_loader: - kv 19: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 20: tokenizer.ggml.add_bos_token bool = true
llama_model_loader: - kv 21: tokenizer.ggml.add_eos_token bool = false
llama_model_loader: - kv 22: tokenizer.chat_template str = {{ bos_token }}{% for message in mess...
llama_model_loader: - kv 23: general.quantization_version u32 = 2
llama_model_loader: - type f32: 65 tensors
llama_model_loader: - type q4_K: 193 tensors
llama_model_loader: - type q6_K: 33 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format = GGUF V3 (latest)
llm_load_print_meta: arch = llama
llm_load_print_meta: vocab type = SPM
llm_load_print_meta: n_vocab = 32000
llm_load_print_meta: n_merges = 0
llm_load_print_meta: n_ctx_train = 32768
llm_load_print_meta: n_embd = 4096
llm_load_print_meta: n_head = 32
llm_load_print_meta: n_head_kv = 8
llm_load_print_meta: n_layer = 32
llm_load_print_meta: n_rot = 128
llm_load_print_meta: n_embd_head_k = 128
llm_load_print_meta: n_embd_head_v = 128
llm_load_print_meta: n_gqa = 4
llm_load_print_meta: n_embd_k_gqa = 1024
llm_load_print_meta: n_embd_v_gqa = 1024
llm_load_print_meta: f_norm_eps = 0,0e+00
llm_load_print_meta: f_norm_rms_eps = 1,0e-05
llm_load_print_meta: f_clamp_kqv = 0,0e+00
llm_load_print_meta: f_max_alibi_bias = 0,0e+00
llm_load_print_meta: f_logit_scale = 0,0e+00
llm_load_print_meta: n_ff = 14336
llm_load_print_meta: n_expert = 0
llm_load_print_meta: n_expert_used = 0
llm_load_print_meta: causal attn = 1
llm_load_print_meta: pooling type = 0
llm_load_print_meta: rope type = 0
llm_load_print_meta: rope scaling = linear
llm_load_print_meta: freq_base_train = 1000000,0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx = 32768
llm_load_print_meta: rope_finetuned = unknown
llm_load_print_meta: ssm_d_conv = 0
llm_load_print_meta: ssm_d_inner = 0
llm_load_print_meta: ssm_d_state = 0
llm_load_print_meta: ssm_dt_rank = 0
llm_load_print_meta: model type = 7B
llm_load_print_meta: model ftype = Q4_K - Medium
llm_load_print_meta: model params = 7,24 B
llm_load_print_meta: model size = 4,07 GiB (4,83 BPW)
llm_load_print_meta: general.name = mistralai_mistral-7b-instruct-v0.2
llm_load_print_meta: BOS token = 1 '<s>'
llm_load_print_meta: EOS token = 2 '</s>'
llm_load_print_meta: UNK token = 0 '<unk>'
llm_load_print_meta: PAD token = 0 '<unk>'
llm_load_print_meta: LF token = 13 '<0x0A>'
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: CUDA_USE_TENSOR_CORES: yes
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 2070, compute capability 7.5, VMM: yes
llm_load_tensors: ggml ctx size = 0,11 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/33 layers to GPU
llm_load_tensors: CPU buffer size = 4165,37 MiB
.................................................................................................
llama_new_context_with_model: n_ctx = 512
llama_new_context_with_model: n_batch = 512
llama_new_context_with_model: n_ubatch = 512
llama_new_context_with_model: freq_base = 1000000,0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init: CUDA_Host KV buffer size = 64,00 MiB
llama_new_context_with_model: KV self size = 64,00 MiB, K (f16): 32,00 MiB, V (f16): 32,00 MiB
llama_new_context_with_model: CUDA_Host output buffer size = 0,12 MiB
llama_new_context_with_model: CUDA0 compute buffer size = 181,04 MiB
llama_new_context_with_model: CUDA_Host compute buffer size = 9,01 MiB
llama_new_context_with_model: graph nodes = 1030
llama_new_context_with_model: graph splits = 356
system_info: n_threads = 8 / 16 | AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 |
sampling:
repeat_last_n = 64, repeat_penalty = 1,000, frequency_penalty = 0,000, presence_penalty = 0,000
top_k = 40, tfs_z = 1,000, top_p = 0,950, min_p = 0,050, typical_p = 1,000, temp = 0,800
mirostat = 0, mirostat_lr = 0,100, mirostat_ent = 5,000
sampling order:
CFG -> Penalties -> top_k -> tfs_z -> typical_p -> top_p -> min_p -> temperature
generate: n_ctx = 512, n_batch = 2048, n_predict = 1, n_keep = 1
Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello | |
llama_print_timings: load time = 710,59 ms
llama_print_timings: sample time = 0,03 ms / 1 runs ( 0,03 ms per token, 34482,76 tokens per second)
llama_print_timings: prompt eval time = 706,45 ms / 436 tokens ( 1,62 ms per token, 617,17 tokens per second)
llama_print_timings: eval time = 0,00 ms / 1 runs ( 0,00 ms per token, inf tokens per second)
llama_print_timings: total time = 707,92 ms / 437 tokens
Log end
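
The ./main invocation at the top of this log pastes the repeated "Hello" prompt inline. A roughly equivalent way to build that kind of prompt without pasting it by hand (a bash sketch; the repetition count and the shortened model path are illustrative, not taken from the run above):

lucas@desktop:~/oss/llama.cpp$ # sketch: generate a long repeated prompt instead of pasting it
lucas@desktop:~/oss/llama.cpp$ PROMPT="$(printf 'Hello %.0s' {1..400})"   # count is illustrative
lucas@desktop:~/oss/llama.cpp$ ./main -m <path-to>/mistral-7b-instruct-v0.2.Q4_K_M.gguf -p "$PROMPT" -n 1 -e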
lucas@desktop:~/oss/mistral.rs$ RUST_LOG=debug ./target/profiling/mistralrs-server -i gguf -t mistralai/Mistral-7B-Instruct-v0.1 -m TheBloke/Mistral-7B-Instruct-v0.1-GGUF -f mistral-7b-instruct-v0.1.Q4_K_M.gguf
2024-04-27T23:46:32.844504Z INFO mistralrs_server: avx: true, neon: false, simd128: false, f16c: true
2024-04-27T23:46:32.844528Z INFO mistralrs_server: Sampling method: penalties -> temperature -> topk -> topp -> multinomial
2024-04-27T23:46:32.844532Z INFO mistralrs_server: Loading model `mistralai/Mistral-7B-Instruct-v0.1` on Cuda(CudaDevice(DeviceId(1)))...
2024-04-27T23:46:32.844551Z INFO mistralrs_server: Model kind is: quantized from gguf (no adapters)
2024-04-27T23:46:34.555249Z INFO mistralrs_core::pipeline::chat_template: bos_tok = <s>, eos_tok = ["</s>"], unk_tok = <unk>
2024-04-27T23:46:34.584997Z INFO mistralrs_server: Model loaded.
2024-04-27T23:46:34.585159Z INFO mistralrs_server::interactive_mode: Starting interactive loop with sampling params: SamplingParams { temperature: Some(0.1), top_k: Some(32), top_p: Some(0.1), top_n_logprobs: 0, frequency_penalty: Some(0.1), presence_penalty: Some(0.1), stop_toks: None, max_len: Some(4096), logits_bias: None, n_choices: 1 }
> Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello Hello | |
2024-04-27T23:46:36.882094Z INFO mistralrs_core::engine: Prompt[445] Completion[] - 765ms
2024-04-27T23:46:36.938553Z INFO mistralrs_core::engine: Prompt[] Completion[446] - 56ms
Hi there!2024-04-27T23:46:36.964491Z INFO mistralrs_core::engine: Prompt[] Completion[447] - 25ms
2024-04-27T23:46:36.988795Z INFO mistralrs_core::engine: Prompt[] Completion[448] - 24ms
2024-04-27T23:46:37.015216Z INFO mistralrs_core::engine: Prompt[] Completion[449] - 26ms
How can I2024-04-27T23:46:37.042765Z INFO mistralrs_core::engine: Prompt[] Completion[450] - 27ms
2024-04-27T23:46:37.067985Z INFO mistralrs_core::engine: Prompt[] Completion[451] - 25ms
2024-04-27T23:46:37.095415Z INFO mistralrs_core::engine: Prompt[] Completion[452] - 27ms
assist you today2024-04-27T23:46:37.121790Z INFO mistralrs_core::engine: Prompt[] Completion[453] - 26ms
2024-04-27T23:46:37.151508Z INFO mistralrs_core::engine: Prompt[] Completion[454] - 29ms
?
> 2024-04-27T23:46:37.234602Z INFO mistralrs_core::engine: Prompt[] Completion[455] - 83ms
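
A rough reading of the numbers reported in the two logs above, not output from either tool (note the llama.cpp run uses Mistral-7B-Instruct-v0.2 and the mistral.rs run uses Instruct-v0.1, both Q4_K_M, so the prompts tokenize to slightly different lengths):

llama.cpp   prompt eval: 436 tokens / 706,45 ms  ≈ 617 tokens/s (as printed by llama_print_timings)
mistral.rs  prompt eval: 445 tokens / 765 ms     ≈ 582 tokens/s (445 / 0.765)
mistral.rs  decode:      ~24-29 ms per completion token ≈ 34-42 tokens/s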