AIWintermuteAI/benchmark_results.md

## benchmark_results.md

      
./main -m models/tinyllama-1.1b-chat-v1.0.Q5_K_M.gguf -n 500 --ignore-eos -f prompts/chat-dishes.txt
./main -m models/tinyllama-1.1b-chat-v1.0.Q5_K_M.gguf -t 3 -n 500 --ignore-eos -f prompts/chat-dishes.txt


no BLAS 4 threads
llama_print_timings:        load time =     459.67 ms
llama_print_timings:      sample time =     251.73 ms /   500 runs   (    0.50 ms per token,  1986.24 tokens per second)
llama_print_timings: prompt eval time =   10175.15 ms /    68 tokens (  149.63 ms per token,     6.68 tokens per second)
llama_print_timings:        eval time =  133404.92 ms /   499 runs   (  267.34 ms per token,     3.74 tokens per second)
llama_print_timings:       total time =  144601.38 ms

no BLAS 3 threads
llama_print_timings:        load time =     523.47 ms
llama_print_timings:      sample time =     246.28 ms /   500 runs   (    0.49 ms per token,  2030.22 tokens per second)
llama_print_timings: prompt eval time =   12365.74 ms /    68 tokens (  181.85 ms per token,     5.50 tokens per second)
llama_print_timings:        eval time =  117291.59 ms /   499 runs   (  235.05 ms per token,     4.25 tokens per second)
llama_print_timings:       total time =  130545.96 ms

BLAS 4 threads
llama_print_timings:        load time =     541.36 ms
llama_print_timings:      sample time =     257.49 ms /   500 runs   (    0.51 ms per token,  1941.80 tokens per second)
llama_print_timings: prompt eval time =   16855.31 ms /    68 tokens (  247.87 ms per token,     4.03 tokens per second)
llama_print_timings:        eval time =  132333.06 ms /   499 runs   (  265.20 ms per token,     3.77 tokens per second)
llama_print_timings:       total time =  150086.40 ms

BLAS 3 threads
llama_print_timings:        load time =     508.27 ms
llama_print_timings:      sample time =     247.93 ms /   500 runs   (    0.50 ms per token,  2016.73 tokens per second)
llama_print_timings: prompt eval time =   16314.19 ms /    68 tokens (  239.91 ms per token,     4.17 tokens per second)
llama_print_timings:        eval time =  117396.94 ms /   499 runs   (  235.26 ms per token,     4.25 tokens per second)
llama_print_timings:       total time =  134640.20 ms
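
The per-token and tokens-per-second figures in each block follow directly from the elapsed time and the run count: `tokens_per_second = count / (ms / 1000)`. The sketch below re-derives them from a pasted log block. It assumes the `llama_print_timings` line layout shown above; `parse_timings` and `tokens_per_second` are hypothetical helper names for illustration, not part of llama.cpp.

```python
import re

# Matches lines such as:
#   llama_print_timings:        eval time =  133404.92 ms /   499 runs   (...)
# The "/ N runs" or "/ N tokens" part is optional (load and total lines lack it).
TIMING_RE = re.compile(
    r"llama_print_timings:\s+(?P<name>[a-z ]+?) time =\s+(?P<ms>[\d.]+) ms"
    r"(?: /\s+(?P<count>\d+) (?:runs|tokens))?"
)


def parse_timings(log: str) -> dict:
    """Return {stage: (milliseconds, count or None)} for one run's log block."""
    out = {}
    for m in TIMING_RE.finditer(log):
        count = int(m.group("count")) if m.group("count") else None
        out[m.group("name")] = (float(m.group("ms")), count)
    return out


def tokens_per_second(ms: float, count: int) -> float:
    """Throughput implied by `count` tokens processed in `ms` milliseconds."""
    return count / (ms / 1000.0)


# The "no BLAS 4 threads" block, copied from the results above.
sample = """\
llama_print_timings:        load time =     459.67 ms
llama_print_timings:      sample time =     251.73 ms /   500 runs   (    0.50 ms per token,  1986.24 tokens per second)
llama_print_timings: prompt eval time =   10175.15 ms /    68 tokens (  149.63 ms per token,     6.68 tokens per second)
llama_print_timings:        eval time =  133404.92 ms /   499 runs   (  267.34 ms per token,     3.74 tokens per second)
llama_print_timings:       total time =  144601.38 ms
"""

timings = parse_timings(sample)
eval_ms, eval_n = timings["eval"]
print(f"eval: {tokens_per_second(eval_ms, eval_n):.2f} tokens per second")
```

The `eval` line is the one to compare across runs, since it measures generation speed; `prompt eval` measures how fast the 68-token prompt is ingested.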

./lookup -m models/tinyllama-1.1b-chat-v1.0.Q5_K_M.gguf -t 3 -n 128 --ignore-eos -f prompts/chat-summary.txt --draft 2 --color
decoded  129 tokens in   33.575 seconds, speed:    3.842 t/s
./lookup -m models/tinyllama-1.1b-chat-v1.0.Q5_K_M.gguf -t 3 -n 128 --ignore-eos -f prompts/chat-summary.txt --draft 3 --color
decoded  129 tokens in   36.597 seconds, speed:    3.525 t/s


Q4_K_M
llama_print_timings:        load time =     432.79 ms
llama_print_timings:      sample time =     242.93 ms /   500 runs   (    0.49 ms per token,  2058.18 tokens per second)
llama_print_timings: prompt eval time =   15477.78 ms /    68 tokens (  227.61 ms per token,     4.39 tokens per second)
llama_print_timings:        eval time =  104818.23 ms /   499 runs   (  210.06 ms per token,     4.76 tokens per second)
llama_print_timings:       total time =  120907.47 ms

Q4_K_S
llama_print_timings:        load time =     417.70 ms
llama_print_timings:      sample time =     240.47 ms /   500 runs   (    0.48 ms per token,  2079.24 tokens per second)
llama_print_timings: prompt eval time =   14911.31 ms /    68 tokens (  219.28 ms per token,     4.56 tokens per second)
llama_print_timings:        eval time =  101519.09 ms /   499 runs   (  203.45 ms per token,     4.92 tokens per second)
llama_print_timings:       total time =  117041.52 ms
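
To compare the runs at a glance, the eval times above can be turned into relative speedups. This is a rough sketch: the numbers are copied from the logs, but the choice of "no BLAS 3 threads" as the baseline is an assumption, and the logs do not state which thread count or BLAS setting the Q4_K_M and Q4_K_S runs used, so the quantization comparison is indicative only.

```python
# Eval time in milliseconds for 499 generated tokens, copied from the logs above.
EVAL_MS = {
    "Q5_K_M, no BLAS, 4 threads": 133404.92,
    "Q5_K_M, no BLAS, 3 threads": 117291.59,
    "Q5_K_M, BLAS, 4 threads": 132333.06,
    "Q5_K_M, BLAS, 3 threads": 117396.94,
    "Q4_K_M": 104818.23,  # thread/BLAS settings not stated in the logs
    "Q4_K_S": 101519.09,  # thread/BLAS settings not stated in the logs
}
N_TOKENS = 499
BASELINE = "Q5_K_M, no BLAS, 3 threads"  # assumed reference run

# Every run generates the same number of tokens, so the speedup is just
# the ratio of eval times (baseline / run).
speedup = {name: EVAL_MS[BASELINE] / ms for name, ms in EVAL_MS.items()}

for name, ms in EVAL_MS.items():
    tps = N_TOKENS / (ms / 1000.0)
    print(f"{name:28s} {tps:5.2f} t/s  x{speedup[name]:.2f}")
```

A ratio above 1 means the run generated tokens faster than the baseline.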

Launch inference server with:
./server -m models/tinyllama-1.1b-chat-v1.0.Q5_K_M.gguf -t 3
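
Once the server is running, it can be queried over HTTP. A minimal sketch using only the Python standard library, assuming the server listens on its default address `127.0.0.1:8080` and exposes the `/completion` endpoint with `prompt` and `n_predict` fields; `build_completion_request` and `complete` are hypothetical helper names. Adjust `base_url` if the server was started with a different host or port.

```python
import json
import urllib.request


def build_completion_request(prompt, n_predict=128, base_url="http://127.0.0.1:8080"):
    """Build a POST request for the server's /completion endpoint."""
    payload = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/completion",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, **kwargs):
    """Send the prompt and return the generated text (assumed 'content' field)."""
    with urllib.request.urlopen(build_completion_request(prompt, **kwargs)) as resp:
        return json.loads(resp.read())["content"]
```

With the server up, `complete("User: Name one dish.\nLlama:", n_predict=32)` should return the generated continuation as a string.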


## chat-dishes.txt
This is a conversation between User and Llama, a friendly chatbot. Llama is helpful, kind, honest, good at writing, and never fails to answer any requests immediately and with precision.
User: Write me an extremely detailed description of the 10 best ethnic dishes.
Llama:

## chat-summary.txt
This is a conversation between User and Llama, a friendly chatbot. Llama is helpful, kind, honest, good at writing, and never fails to answer any requests immediately and with precision.
Indian cuisine consists of a variety of regional and traditional cuisines native to the Indian subcontinent. Given the diversity in soil, climate, culture, ethnic groups, and occupations, these cuisines vary substantially and use locally available spices, herbs, vegetables, and fruits.
Indian food is also heavily influenced by religion, in particular Hinduism and Islam, cultural choices and traditions.[1][2]
Historical events such as invasions, trade relations, and colonialism have played a role in introducing certain foods to this country. The Columbian discovery of the New World brought a number of new vegetables and fruit to India. A number of these such as potatoes, tomatoes, chillies, peanuts, and guava have become staples in many regions of India.
User: Write me a summary of the above text. Use the words from the text.
Llama: