Hi all,
Sharing our public roadmap on LiteLLM performance overhead:
As of v1.78.5, the LiteLLM AI Gateway adds an 8 ms median overhead and a 45 ms P99 overhead at 1K concurrent requests with 4 LiteLLM instances.
This is an ~80% improvement over v1.76.0. The roadmap has 3 key components we plan to achieve by the end of 2025:
- Achieve 8 ms median across all major LLM endpoints.
- Resolve open memory leak issues.
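For context on how numbers like these can be reproduced, below is a minimal sketch (not the LiteLLM team's actual benchmark harness) of measuring median and P99 latency under concurrent load. It assumes a LiteLLM proxy running at localhost:4000 pointed at a mock/fake backend so that the measured latency approximates gateway overhead; the URL, key, and model name are placeholder assumptions.

```python
# Hypothetical latency probe: fire N concurrent requests at a LiteLLM proxy
# and report median / P99 wall-clock latency. Assumes the proxy routes to a
# mock backend, so latency ~ gateway overhead.
import asyncio
import statistics
import time

import httpx

PROXY_URL = "http://localhost:4000/v1/chat/completions"  # assumed local proxy
API_KEY = "sk-1234"          # placeholder key
MODEL = "fake-openai-model"  # assumed mock model to isolate overhead
CONCURRENCY = 1000           # matches the 1K concurrent requests above

async def one_request(client: httpx.AsyncClient) -> float:
    """Send one chat completion and return its latency in milliseconds."""
    payload = {"model": MODEL, "messages": [{"role": "user", "content": "hi"}]}
    start = time.perf_counter()
    resp = await client.post(
        PROXY_URL,
        json=payload,
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    resp.raise_for_status()
    return (time.perf_counter() - start) * 1000

async def main() -> None:
    async with httpx.AsyncClient(timeout=30.0) as client:
        latencies = await asyncio.gather(
            *(one_request(client) for _ in range(CONCURRENCY))
        )
    p99 = statistics.quantiles(sorted(latencies), n=100)[98]  # 99th percentile
    print(f"median: {statistics.median(latencies):.1f} ms  p99: {p99:.1f} ms")

if __name__ == "__main__":
    asyncio.run(main())
```

To measure overhead rather than raw latency, you would subtract the latency of hitting the backend directly from the numbers this prints.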