@fullstackwebdev
Created April 24, 2024 04:46
Initialized litellm callbacks, Async Success Callbacks: [<litellm.proxy.hooks.parallel_request_limiter._PROXY_MaxParallelRequestsHandler object at 0x7f1a09f63760>, <litellm.proxy.hooks.tpm_rpm_limiter._PROXY_MaxTPMRPMLimiter object at 0x7f1a09f63790>, <litellm.proxy.hooks.max_budget_limiter._PROXY_MaxBudgetLimiter object at 0x7f1a09f63850>, <litellm.proxy.hooks.cache_control_check._PROXY_CacheControlCheck object at 0x7f1a09f63880>, <litellm._service_logger.ServiceLogging object at 0x7f1a05f21780>, <bound method ProxyLogging.response_taking_too_long_callback of <litellm.proxy.utils.ProxyLogging object at 0x7f1a09f63280>>]
self.optional_params: {}
litellm.cache: None
kwargs[caching]: False; litellm.cache: None
Final returned optional params: {'temperature': 0.0, 'top_p': 1, 'n': 1, 'max_tokens': 12000, 'presence_penalty': 0, 'frequency_penalty': 0, 'max_retries': 0, 'extra_body': {}}
self.optional_params: {'temperature': 0.0, 'top_p': 1, 'n': 1, 'max_tokens': 12000, 'presence_penalty': 0, 'frequency_penalty': 0, 'max_retries': 0, 'extra_body': {}}
RAW RESPONSE:
<coroutine object OpenAIChatCompletion.acompletion at 0x7f1a05c68890>
04:41:59 - LiteLLM:INFO: utils.py:1133 -
POST Request Sent from LiteLLM:
curl -X POST \
https://openrouter.ai/api/v1/ \
-H 'Authorization: Bearer sk-or-v1-cb5d631007ed554a0e649850d1fe81191da263f4c681********************' \
-d '{'model': 'meta-llama/llama-3-70b-instruct:nitro', 'messages': [{'role': 'user', 'content': "Baleen is a scalable multi-hop reasoning system that improves accuracy and robustness in open-domain questions. It uses a condensed retrieval architecture, where facts from each hop are summarized into a short context for subsequent hops, and a focused late interaction passage retriever (FLIPR) to tackle query complexity. This approach allows Baleen to achieve 96.3% answer recall in the top-20 retrieved passages on the HotPotQA benchmark and outperform the official baseline on the HoVer task by over 30 points in retrieval accuracy.\n\n\n\nProcess and generate Markdown\nConvert the provided text into a markdown format using <details> and <summary> tags to create a hierarchical structure. Organize the text by paragraphs or thematic sections, each as a root-level collapsible block. Within each block, sentence by sentence, each sentence or related groups of sentences nested in collapsible sections. Limit nesting to a maximum of two or three levels to avoid recursive runaway, but not just one. Use paragraphs as clues to start and stop a nesting. Each <summary> tag should contain a keyword or short two word phrase in the spirit of the content of the <details>. <details> contains the sentence exactly word for word. Do each sentence exactly. This method creates a manageable hierarchical document, with each root details having at least one or more children, with the depth of nesting clearly controlled to prevent infinite recursion.\n\n\nStart with a ```markdown and then wrap the whole thing in <detail><summary>{subject of paper}</summary>...... text and child <detail's. Remember every sentence is wrapped with a <d><s>keyword</s>content</d> pattern, and inside any sentence highlight keyword\n\nPlease note process the entire text in full details do not skip any sentence or paragraph."}], 'temperature': 0.0, 'top_p': 1, 'n': 1, 'max_tokens': 12000, 'presence_penalty': 0, 'frequency_penalty': 0, 'extra_body': {'transforms': []}, 'extra_headers': {'HTTP-Referer': 'https://litellm.ai', 'X-Title': 'liteLLM'}}'
RAW RESPONSE:
{"id": null, "choices": null, "created": null, "model": null, "object": null, "system_fingerprint": null, "usage": null, "error": {"message": "This endpoint's maximum context length is 8192 tokens. However, you requested about 12490 tokens (490 in the input, 12000. Please reduce the length of either one, or use the \"middle-out\" transform to compress your prompt automatically.", "code": 400}}
Logging Details: logger_fn - None | callable(logger_fn) - False
Logging Details LiteLLM-Failure Call
get cache: cache key: 04-41:cooldown_models; local_only: False
get cache: cache result: None
set cache: key: 04-41:cooldown_models; value: ['1873cd978ca45b20666fb36820f14d2de4ea9895370eb897a28d864f835c118f']
InMemoryCache: set_cache
Inside Max Parallel Request Failure Hook
user_api_key: asdf
get cache: cache key: asdf::2024-04-24-04-41::request_count; local_only: False
get cache: cache result: {'current_requests': 0, 'current_tpm': 0, 'current_rpm': 0}
updated_value in failure call: {'current_requests': 0, 'current_tpm': 0, 'current_rpm': 0}
set cache: key: asdf::2024-04-24-04-41::request_count; value: {'current_requests': 0, 'current_tpm': 0, 'current_rpm': 0}
InMemoryCache: set_cache
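
For readers tracing the limiter lines above: the parallel-request failure hook keys its counters per API key and per minute, visible in the logged key asdf::2024-04-24-04-41::request_count, and here it writes back an all-zero usage dict so the failed request is not counted. A hypothetical reconstruction of that key scheme (the real _PROXY_MaxParallelRequestsHandler internals may differ):

# Hypothetical sketch of the counter key format seen in the log;
# litellm's actual implementation may differ.
from datetime import datetime

def request_count_key(user_api_key: str) -> str:
    minute_bucket = datetime.utcnow().strftime("%Y-%m-%d-%H-%M")
    return f"{user_api_key}::{minute_bucket}::request_count"

# -> "asdf::2024-04-24-04-41::request_count", matching the cache key above
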
04:41:59 - LiteLLM Router:INFO: router.py:537 - litellm.acompletion(model=openrouter/meta-llama/llama-3-70b-instruct:nitro) Exception Invalid response object Traceback (most recent call last):
  File "/home/fullstack/.local/lib/python3.10/site-packages/litellm/utils.py", line 6957, in convert_to_model_response_object
    for idx, choice in enumerate(response_object["choices"]):
TypeError: 'NoneType' object is not iterable

Traceback (most recent call last):
  File "/home/fullstack/.local/lib/python3.10/site-packages/litellm/utils.py", line 6957, in convert_to_model_response_object
    for idx, choice in enumerate(response_object["choices"]):
TypeError: 'NoneType' object is not iterable

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/fullstack/.local/lib/python3.10/site-packages/litellm/main.py", line 318, in acompletion
    response = await init_response
  File "/home/fullstack/.local/lib/python3.10/site-packages/litellm/llms/openai.py", line 477, in acompletion
    raise e
  File "/home/fullstack/.local/lib/python3.10/site-packages/litellm/llms/openai.py", line 472, in acompletion
    return convert_to_model_response_object(
  File "/home/fullstack/.local/lib/python3.10/site-packages/litellm/utils.py", line 7077, in convert_to_model_response_object
    raise Exception(f"Invalid response object {traceback.format_exc()}")
Exception: Invalid response object Traceback (most recent call last):
  File "/home/fullstack/.local/lib/python3.10/site-packages/litellm/utils.py", line 6957, in convert_to_model_response_object
    for idx, choice in enumerate(response_object["choices"]):
TypeError: 'NoneType' object is not iterable
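
Root cause of the TypeError: the OpenRouter 400 body above carries "choices": null, and convert_to_model_response_object iterates response_object["choices"] before checking for an "error" key. A minimal guard sketch (hypothetical, not litellm's actual code beyond the one quoted loop line) that would surface the upstream error instead of the opaque "Invalid response object":

# Hypothetical defensive guard; only the for-loop line is quoted from
# litellm/utils.py:6957, the rest is an illustrative sketch.
def convert_to_model_response_object(response_object: dict):
    # Error payloads look like: {"id": null, "choices": null, ..., "error": {...}}
    if response_object.get("error") is not None:
        err = response_object["error"]
        raise Exception(f"Provider returned {err.get('code')}: {err.get('message')}")
    if response_object.get("choices") is None:
        raise Exception("Invalid response object: missing 'choices'")
    for idx, choice in enumerate(response_object["choices"]):
        ...  # normal conversion path (elided)
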