@arafatkatze
Created October 29, 2023 10:43
A simple test to show how long it takes for streaming to start based on the prompt length sent to the Claude API
from anthropic import Anthropic
import time

anthropic = Anthropic(api_key="claude-api-key")

prompt_lengths = [100, 1000, 10000, 50000]

for prompt_length in prompt_lengths:
    # Generate a prompt of (roughly) the desired length
    prompt = "Human: " + "cars are great " * (prompt_length // 5) + "write an essay on cars"
    prompt += "\n\nAssistant:"

    start_time = time.time()  # Record the time before the request is sent
    stream = anthropic.completions.create(
        model="claude-2",
        max_tokens_to_sample=100,
        prompt=prompt,
        stream=True,
    )
    stream_start_time = time.time()  # Record the time after the stream starts

    text = ""
    for data in stream:
        diff = data.completion  # incremental text
        text += diff
        # print(diff, end="")

    end_time = time.time()  # Record the time after the stream ends

    print(f"\nFor prompt length {prompt_length}:")
    print(f"Time taken to start the stream: {stream_start_time - start_time} seconds")
    print(f"Time taken for the stream to finish: {end_time - stream_start_time} seconds")
    print("\n")
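One caveat with this measurement: stream_start_time is captured as soon as completions.create() returns, which can be slightly before the first chunk of text actually arrives. A minimal variant (a sketch, using the same legacy completions API and placeholder key as above) that instead times to the first streamed chunk:

from anthropic import Anthropic
import time

anthropic = Anthropic(api_key="claude-api-key")
prompt = "Human: write an essay on cars\n\nAssistant:"

start_time = time.time()
stream = anthropic.completions.create(
    model="claude-2",
    max_tokens_to_sample=100,
    prompt=prompt,
    stream=True,
)

first_chunk_time = None
text = ""
for data in stream:
    if first_chunk_time is None:
        first_chunk_time = time.time()  # first incremental chunk received
    text += data.completion

print(f"Time to first chunk: {first_chunk_time - start_time} seconds")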
@arafatkatze (Author)

Example response from my local testing

$ python3 stre.py

For prompt length 100:
Time taken to start the stream: 0.6974811553955078 seconds
Time taken for the stream to finish: 3.8557159900665283 seconds

For prompt length 1000:
Time taken to start the stream: 1.1619987487792969 seconds
Time taken for the stream to finish: 3.1468451023101807 seconds

For prompt length 10000:
Time taken to start the stream: 3.893428087234497 seconds
Time taken for the stream to finish: 3.798938751220703 seconds

For prompt length 50000:
Time taken to start the stream: 19.353917360305786 seconds
Time taken for the stream to finish: 4.143714666366577 seconds
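For a rough sense of the scaling in these numbers, a quick least-squares fit over the four time-to-start values above (rounded; not part of the original run) suggests the startup time grows close to linearly with prompt length:

import numpy as np

lengths = [100, 1000, 10000, 50000]
time_to_start = [0.70, 1.16, 3.89, 19.35]  # seconds, transcribed from the run above

# Fit time_to_start ≈ slope * length + intercept
slope, intercept = np.polyfit(lengths, time_to_start, 1)
print(f"~{slope * 1000:.2f} s per 1000 prompt-length units, ~{intercept:.2f} s baseline")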

@deepak2431

This is an interesting fact @arafatkatze. I didn't know that the prompt length also affects the response time for the resulting output.

@arafatkatze (Author) commented Oct 31, 2023

@deepak2431 That's an important one. It's harder to notice with the GPT-4 API because its token limit is much lower (only 8K) and it's faster overall, but the correlation between prompt length in tokens and time to respond is something I've seen in most LLMs I've tried.

This is also true for Mistral/Llama 2 when I run them locally.
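To check how many tokens those repeated-word prompts actually contain, the pre-1.0 anthropic Python SDK used in the script above shipped a count_tokens helper (a rough sketch; this helper has since been deprecated and may be missing in newer SDK versions):

from anthropic import Anthropic

anthropic = Anthropic(api_key="claude-api-key")

for prompt_length in [100, 1000, 10000, 50000]:
    prompt = "Human: " + "cars are great " * (prompt_length // 5) + "write an essay on cars\n\nAssistant:"
    # In the legacy SDK, count_tokens tokenizes locally; no API call is made
    print(prompt_length, anthropic.count_tokens(prompt))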
