@AIWintermuteAI
Created January 6, 2024 14:11
TinyLLaMA Extras
./main -m models/tinyllama-1.1b-chat-v1.0.Q5_K_M.gguf -n 500 --ignore-eos -f prompts/chat-dishes.txt
./main -m models/tinyllama-1.1b-chat-v1.0.Q5_K_M.gguf -t 3 -n 500 --ignore-eos -f prompts/chat-dishes.txt

no BLAS 4 threads
llama_print_timings:        load time =     459.67 ms
llama_print_timings:      sample time =     251.73 ms /   500 runs   (    0.50 ms per token,  1986.24 tokens per second)
llama_print_timings: prompt eval time =   10175.15 ms /    68 tokens (  149.63 ms per token,     6.68 tokens per second)
llama_print_timings:        eval time =  133404.92 ms /   499 runs   (  267.34 ms per token,     3.74 tokens per second)
llama_print_timings:       total time =  144601.38 ms

no BLAS 3 threads
llama_print_timings:        load time =     523.47 ms
llama_print_timings:      sample time =     246.28 ms /   500 runs   (    0.49 ms per token,  2030.22 tokens per second)
llama_print_timings: prompt eval time =   12365.74 ms /    68 tokens (  181.85 ms per token,     5.50 tokens per second)
llama_print_timings:        eval time =  117291.59 ms /   499 runs   (  235.05 ms per token,     4.25 tokens per second)
llama_print_timings:       total time =  130545.96 ms

BLAS 4 threads
llama_print_timings:        load time =     541.36 ms
llama_print_timings:      sample time =     257.49 ms /   500 runs   (    0.51 ms per token,  1941.80 tokens per second)
llama_print_timings: prompt eval time =   16855.31 ms /    68 tokens (  247.87 ms per token,     4.03 tokens per second)
llama_print_timings:        eval time =  132333.06 ms /   499 runs   (  265.20 ms per token,     3.77 tokens per second)
llama_print_timings:       total time =  150086.40 ms

BLAS 3 threads
llama_print_timings:        load time =     508.27 ms
llama_print_timings:      sample time =     247.93 ms /   500 runs   (    0.50 ms per token,  2016.73 tokens per second)
llama_print_timings: prompt eval time =   16314.19 ms /    68 tokens (  239.91 ms per token,     4.17 tokens per second)
llama_print_timings:        eval time =  117396.94 ms /   499 runs   (  235.26 ms per token,     4.25 tokens per second)
llama_print_timings:       total time =  134640.20 ms
./lookup -m models/tinyllama-1.1b-chat-v1.0.Q5_K_M.gguf -t 3 -n 128 --ignore-eos -f prompts/chat-summary.txt --draft 2 --color
decoded  129 tokens in   33.575 seconds, speed:    3.842 t/s
./lookup -m models/tinyllama-1.1b-chat-v1.0.Q5_K_M.gguf -t 3 -n 128 --ignore-eos -f prompts/chat-summary.txt --draft 3 --color
decoded  129 tokens in   36.597 seconds, speed:    3.525 t/s
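
The two lookup-decoding results above can be checked and compared with a few lines of Python. This is a minimal sketch, not part of the original gist: the helper name `tokens_per_second` and the `LOOKUP_RUNS` table are illustrative, and the token counts and durations are copied from the two `decoded` lines.

```python
def tokens_per_second(tokens: int, seconds: float) -> float:
    """Decoding speed as reported by ./lookup: tokens divided by wall time."""
    return tokens / seconds

# --draft value -> (decoded tokens, seconds), copied from the output above
LOOKUP_RUNS = {2: (129, 33.575), 3: (129, 36.597)}

speeds = {draft: tokens_per_second(*run) for draft, run in LOOKUP_RUNS.items()}
for draft, speed in speeds.items():
    print(f"--draft {draft}: {speed:.3f} t/s")

# Fraction by which --draft 3 is slower than --draft 2 on this prompt
slowdown = 1 - speeds[3] / speeds[2]
print(f"--draft 3 is {slowdown:.1%} slower than --draft 2")
```

On this single prompt, `--draft 3` is roughly 8% slower than `--draft 2`; one run per setting is too little to generalize from.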

Q4_K_M
llama_print_timings:        load time =     432.79 ms
llama_print_timings:      sample time =     242.93 ms /   500 runs   (    0.49 ms per token,  2058.18 tokens per second)
llama_print_timings: prompt eval time =   15477.78 ms /    68 tokens (  227.61 ms per token,     4.39 tokens per second)
llama_print_timings:        eval time =  104818.23 ms /   499 runs   (  210.06 ms per token,     4.76 tokens per second)
llama_print_timings:       total time =  120907.47 ms

Q4_K_S
llama_print_timings:        load time =     417.70 ms
llama_print_timings:      sample time =     240.47 ms /   500 runs   (    0.48 ms per token,  2079.24 tokens per second)
llama_print_timings: prompt eval time =   14911.31 ms /    68 tokens (  219.28 ms per token,     4.56 tokens per second)
llama_print_timings:        eval time =  101519.09 ms /   499 runs   (  203.45 ms per token,     4.92 tokens per second)
llama_print_timings:       total time =  117041.52 ms
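
The generation speeds above can be compared side by side with a small script. The following is a minimal Python sketch, not part of the original gist: the helper name `eval_tokens_per_second` and the `RUNS` table are illustrative, and the log lines are copied verbatim from the timings above. It recomputes tokens per second as runs divided by eval time in seconds and deliberately skips the `prompt eval time` line.

```python
import re

# Matches the "eval time" line only; "prompt eval time" has a different prefix.
EVAL_RE = re.compile(
    r"^llama_print_timings:\s+eval time =\s*([\d.]+) ms /\s*(\d+) runs"
)

def eval_tokens_per_second(log: str) -> float:
    """Return generation speed (tokens/s) from llama_print_timings output."""
    for line in log.splitlines():
        match = EVAL_RE.match(line.strip())
        if match:
            total_ms, runs = float(match.group(1)), int(match.group(2))
            return runs / (total_ms / 1000.0)
    raise ValueError("no 'eval time' line found")

# Labels and eval lines copied from the benchmark blocks above.
RUNS = {
    "no BLAS 4 threads": "llama_print_timings:        eval time =  133404.92 ms /   499 runs   (  267.34 ms per token,     3.74 tokens per second)",
    "no BLAS 3 threads": "llama_print_timings:        eval time =  117291.59 ms /   499 runs   (  235.05 ms per token,     4.25 tokens per second)",
    "BLAS 4 threads": "llama_print_timings:        eval time =  132333.06 ms /   499 runs   (  265.20 ms per token,     3.77 tokens per second)",
    "BLAS 3 threads": "llama_print_timings:        eval time =  117396.94 ms /   499 runs   (  235.26 ms per token,     4.25 tokens per second)",
    "Q4_K_M": "llama_print_timings:        eval time =  104818.23 ms /   499 runs   (  210.06 ms per token,     4.76 tokens per second)",
    "Q4_K_S": "llama_print_timings:        eval time =  101519.09 ms /   499 runs   (  203.45 ms per token,     4.92 tokens per second)",
}

# Print the runs from fastest to slowest.
speeds = {label: eval_tokens_per_second(text) for label, text in RUNS.items()}
for label, speed in sorted(speeds.items(), key=lambda item: item[1], reverse=True):
    print(f"{label:<18} {speed:.2f} tokens/s")
```

On these numbers the Q4_K_S run is the fastest, and for Q5_K_M three threads beat four both with and without BLAS.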

Launch the inference server with:

./server -m models/tinyllama-1.1b-chat-v1.0.Q5_K_M.gguf -t 3
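
Once the server is running, a completion can be requested over HTTP. The following is a minimal Python sketch, not part of the original gist. It assumes the server listens on its default address `127.0.0.1:8080` and exposes the `/completion` endpoint, which takes a JSON body with `prompt` and `n_predict` and returns the generated text in a `content` field; check these against the server README of the llama.cpp build in use. The helper names `build_request` and `complete` are illustrative.

```python
import json
import urllib.request

# Assumption: default host and port of the llama.cpp example server.
SERVER_URL = "http://127.0.0.1:8080/completion"

def build_request(prompt: str, n_predict: int = 128) -> urllib.request.Request:
    """Build a POST request asking the server for up to n_predict tokens."""
    body = json.dumps({"prompt": prompt, "n_predict": n_predict}).encode("utf-8")
    return urllib.request.Request(
        SERVER_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def complete(prompt: str, n_predict: int = 128, timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text.

    Assumes the response JSON carries the text under the "content" key.
    """
    request = build_request(prompt, n_predict)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))["content"]

# Usage (requires the server started with the command above):
#   with open("prompts/chat-dishes.txt", encoding="utf-8") as f:
#       print(complete(f.read(), n_predict=500))
```

The generous timeout reflects the speeds measured above: at about 4 tokens per second, 500 tokens take roughly two minutes.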

Prompt file prompts/chat-dishes.txt:
This is a conversation between User and Llama, a friendly chatbot. Llama is helpful, kind, honest, good at writing, and never fails to answer any requests immediately and with precision.
User: Write me an extremely detailed description of the 10 best ethnic dishes.
Llama:

Prompt file prompts/chat-summary.txt:
This is a conversation between User and Llama, a friendly chatbot. Llama is helpful, kind, honest, good at writing, and never fails to answer any requests immediately and with precision.
Indian cuisine consists of a variety of regional and traditional cuisines native to the Indian subcontinent. Given the diversity in soil, climate, culture, ethnic groups, and occupations, these cuisines vary substantially and use locally available spices, herbs, vegetables, and fruits.
Indian food is also heavily influenced by religion, in particular Hinduism and Islam, cultural choices and traditions.[1][2]
Historical events such as invasions, trade relations, and colonialism have played a role in introducing certain foods to this country. The Columbian discovery of the New World brought a number of new vegetables and fruit to India. A number of these such as potatoes, tomatoes, chillies, peanuts, and guava have become staples in many regions of India.
User: Write me a summary of the above text. Use the words from the text.
Llama: