@fullstackwebdev
Created April 24, 2024 04:46
Initialized litellm callbacks, Async Success Callbacks: [<litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x7f1a09f63760>, <litellm.proxy.hooks.tpm_rpm_limiter._PROXY_MaxTPMRPMLimiter object at 0x7f1a09f63790>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x7f1a09f63850>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x7f1a09f63880>, <litellm._service_logger.ServiceLogging object at 0x7f1a05f21780>, <bound method ProxyLogging.response_taking_too_long_callback of <litellm.proxy.utils.ProxyLogging object at 0x7f1a09f63280>>]
self.optional_params: {}
litellm.cache: None
kwargs[caching]: False; litellm.cache: None
Final returned optional params: {'temperature': 0.0, 'top_p': 1, 'n': 1, 'max_tokens': 12000, 'presence_penalty': 0, 'frequency_penalty': 0, 'max_retries': 0, 'extra_body': {}}
self.optional_params: {'temperature': 0.0, 'top_p': 1, 'n': 1, 'max_tokens': 12000, 'presence_penalty': 0, 'frequency_penalty': 0, 'max_retries': 0, 'extra_body': {}}
RAW RESPONSE:
<coroutine object OpenAIChatCompletion.acompletion at 0x7f1a05c68890>
04:41:59 - LiteLLM:INFO: utils.py:1133 -
POST Request Sent from LiteLLM:
curl -X POST \
https://openrouter.ai/api/v1/ \
-H 'Authorization: Bearer sk-or-v1-cb5d631007ed554a0e649850d1fe81191da263f4c681********************' \
-d '{'model': 'meta-llama/llama-3-70b-instruct:nitro', 'messages': [{'role': 'user', 'content': "Baleen is a scalable multi-hop reasoning system that improves accuracy and robustness in open-domain questions. It uses a condensed retrieval architecture, where facts from each hop are summarized into a short context for subsequent hops, and a focused late interaction passage retriever (FLIPR) to tackle query complexity. This approach allows Baleen to achieve 96.3% answer recall in the top-20 retrieved passages on the HotPotQA benchmark and outperform the official baseline on the HoVer task by over 30 points in retrieval accuracy.\n\n\n\nProcess and generate Markdown\nConvert the provided text into a markdown format using <details> and <summary> tags to create a hierarchical structure. Organize the text by paragraphs or thematic sections, each as a root-level collapsible block. Within each block, sentence by sentence, each sentence or related groups of sentences nested in collapsible sections. Limit nesting to a maximum of two or three levels to avoid recursive runaway, but not just one. Use paragraphs as clues to start and stop a nesting. Each <summary> tag should contain a keyword or short two word phrase in the spirit of the content of the <details>. <details> contains the sentence exactly word for word. Do each sentence exactly. This method creates a manageable hierarchical document, with each root details having at least one or more children, with the depth of nesting clearly controlled to prevent infinite recursion.\n\n\nStart with a ```markdown and then wrap the whole thing in <detail><summary>{subject of paper}</summary>...... text and child <detail's. Remember every sentence is wrapped with a <d><s>keyword</s>content</d> pattern, and inside any sentence highlight keyword\n\nPlease note process the entire text in full details do not skip any sentence or paragraph."}], 'temperature': 0.0, 'top_p': 1, 'n': 1, 'max_tokens': 12000, 'presence_penalty': 0, 'frequency_penalty': 0, 'extra_body': {'transforms': []}, 'extra_headers': {'HTTP-Referer': 'https://litellm.ai', 'X-Title': 'liteLLM'}}'
RAW RESPONSE:
{"id": null, "choices": null, "created": null, "model": null, "object": null, "system_fingerprint": null, "usage": null, "error": {"message": "This endpoint's maximum context length is 8192 tokens. However, you requested about 12490 tokens (490 in the input, 12000. Please reduce the length of either one, or use the \"middle-out\" transform to compress your prompt automatically.", "code": 400}}
Logging Details: logger_fn - None | callable(logger_fn) - False
Logging Details LiteLLM-Failure Call
get cache: cache key: 04-41:cooldown_models; local_only: False
get cache: cache result: None
set cache: key: 04-41:cooldown_models; value: ['1873cd978ca45b20666fb36820f14d2de4ea9895370eb897a28d864f835c118f']
InMemoryCache: set_cache
Inside Max Parallel Request Failure Hook
user_api_key: asdf
get cache: cache key: asdf::2024-04-24-04-41::request_count; local_only: False
get cache: cache result: {'current_requests': 0, 'current_tpm': 0, 'current_rpm': 0}
updated_value in failure call: {'current_requests': 0, 'current_tpm': 0, 'current_rpm': 0}
set cache: key: asdf::2024-04-24-04-41::request_count; value: {'current_requests': 0, 'current_tpm': 0, 'current_rpm': 0}
InMemoryCache: set_cache
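
For readers tracing the limiter lines above: the parallel-request failure hook keys its counters per API key and per minute, visible in the logged key asdf::2024-04-24-04-41::request_count, and here it writes back an all-zero usage dict so the failed request is not counted. A hypothetical reconstruction of that key scheme (the real _PROXY_MaxParallelRequestsHandler internals may differ):

# Hypothetical sketch of the counter key format seen in the log;
# litellm's actual implementation may differ.
from datetime import datetime

def request_count_key(user_api_key: str) -> str:
    minute_bucket = datetime.utcnow().strftime("%Y-%m-%d-%H-%M")
    return f"{user_api_key}::{minute_bucket}::request_count"

# -> "asdf::2024-04-24-04-41::request_count", matching the cache key above
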
04:41:59 - LiteLLM Router:INFO: router.py:537 - litellm.acompletion(model=openrouter/meta-llama/llama-3-70b-instruct:nitro) Exception Invalid response object Traceback (most recent call last):
  File "/home/fullstack/.local/lib/python3.10/site-packages/litellm/utils.py", line 6957, in convert_to_model_response_object
    for idx, choice in enumerate(response_object["choices"]):
TypeError: 'NoneType' object is not iterable

Traceback (most recent call last):
  File "/home/fullstack/.local/lib/python3.10/site-packages/litellm/utils.py", line 6957, in convert_to_model_response_object
    for idx, choice in enumerate(response_object["choices"]):
TypeError: 'NoneType' object is not iterable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/fullstack/.local/lib/python3.10/site-packages/litellm/main.py", line 318, in acompletion
    response = await init_response
  File "/home/fullstack/.local/lib/python3.10/site-packages/litellm/llms/openai.py", line 477, in acompletion
    raise e
  File "/home/fullstack/.local/lib/python3.10/site-packages/litellm/llms/openai.py", line 472, in acompletion
    return convert_to_model_response_object(
  File "/home/fullstack/.local/lib/python3.10/site-packages/litellm/utils.py", line 7077, in convert_to_model_response_object
    raise Exception(f"Invalid response object {traceback.format_exc()}")
Exception: Invalid response object Traceback (most recent call last):
  File "/home/fullstack/.local/lib/python3.10/site-packages/litellm/utils.py", line 6957, in convert_to_model_response_object
    for idx, choice in enumerate(response_object["choices"]):
TypeError: 'NoneType' object is not iterable
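
Root cause of the TypeError: the OpenRouter 400 body above carries "choices": null, and convert_to_model_response_object iterates response_object["choices"] before checking for an "error" key. A minimal guard sketch (hypothetical, not litellm's actual code beyond the one quoted loop line) that would surface the upstream error instead of the opaque "Invalid response object":

# Hypothetical defensive guard; only the for-loop line is quoted from
# litellm/utils.py:6957, the rest is an illustrative sketch.
def convert_to_model_response_object(response_object: dict):
    # Error payloads look like: {"id": null, "choices": null, ..., "error": {...}}
    if response_object.get("error") is not None:
        err = response_object["error"]
        raise Exception(f"Provider returned {err.get('code')}: {err.get('message')}")
    if response_object.get("choices") is None:
        raise Exception("Invalid response object: missing 'choices'")
    for idx, choice in enumerate(response_object["choices"]):
        ...  # normal conversion path (elided)
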