📈 New Prometheus Metrics
doc: https://docs.litellm.ai/docs/proxy/prometheus#llm-api--provider-metrics
Release: https://github.com/BerriAI/litellm/releases/tag/v1.43.7-stable
- llm_deployment_latency_per_output_token -> Track latency per output token
- llm_deployment_failure_responses -> Calculate error rate per deployment (divide this by llm_deployment_total_requests; see the sketch after this list)
- llm_deployment_successful_fallbacks -> Number of successful fallback requests from primary model -> fallback model
- llm_deployment_failed_fallbacks -> Number of failed fallback requests from primary model -> fallback model
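A minimal sketch of how to inspect these metrics, assuming the proxy exposes its Prometheus endpoint at /metrics on the same port used in the docker run command below, and that llm_deployment_total_requests is emitted alongside the metrics above; the PromQL grouping label in the comment is illustrative, not necessarily the exact label name your deployment exposes:

```shell
# Scrape the proxy's Prometheus endpoint and filter for the new deployment metrics.
# The /metrics path and port 4000 are assumptions based on the docker run command below.
curl -s http://localhost:4000/metrics | grep -E \
  'llm_deployment_(latency_per_output_token|failure_responses|successful_fallbacks|failed_fallbacks|total_requests)'

# Illustrative PromQL for the per-deployment error rate described above
# (failure responses divided by total requests over a 5 minute window;
# "model_id" stands in for whatever deployment label your setup exposes):
#   sum by (model_id) (rate(llm_deployment_failure_responses[5m]))
#     / sum by (model_id) (rate(llm_deployment_total_requests[5m]))
```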
What's Changed
- [Refactor+Testing] Refactor Prometheus metrics to use CustomLogger class + add testing for prometheus by @ishaan-jaff in #5149
- fix(main.py): safely fail stream_chunk_builder calls by @krrishdholakia in #5151
- Feat - track response latency on prometheus by @ishaan-jaff in #5152
- Feat - Proxy track fallback metrics on prometheus by @ishaan-jaff in #5153
Full Changelog: v1.43.6...v1.43.7-stable
Docker Run LiteLLM Proxy
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.43.7-stable
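Once the container is up, a quick smoke test against the proxy; the Authorization key and model name below are placeholders, so substitute whatever is set in your own proxy config:

```shell
# Send a test chat completion through the proxy.
# sk-1234 and gpt-3.5-turbo are placeholders, not defaults shipped with this release.
curl -s http://localhost:4000/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-1234' \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "ping"}]
  }'
```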
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
| Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
|---|---|---|---|---|---|---|---|---|---|
| /chat/completions | Passed ✅ | 130.0 | 158.2456064989428 | 6.32111960609156 | 0.0 | 1892 | 0 | 111.09798900000101 | 2661.257857999999 |
| Aggregated | Passed ✅ | 130.0 | 158.2456064989428 | 6.32111960609156 | 0.0 | 1892 | 0 | 111.09798900000101 | 2661.257857999999 |