📈 New Prometheus Metrics
doc: https://docs.litellm.ai/docs/proxy/prometheus#llm-api--provider-metrics
Release: https://github.com/BerriAI/litellm/releases/tag/v1.43.7-stable
- llm_deployment_latency_per_output_token -> Track latency per output token
- llm_deployment_failure_responses -> Calculate error rate per deployment (divide this by llm_deployment_total_requests; see the sketch after this list)
- llm_deployment_successful_fallbacks -> Number of successful fallback requests from primary model -> fallback model
- llm_deployment_failed_fallbacks -> Number of failed fallback requests from primary model -> fallback model
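A minimal sketch of how to inspect these metrics, assuming the proxy exposes its Prometheus endpoint at /metrics on the same port used in the docker run command below, and that llm_deployment_total_requests is emitted alongside the metrics above; the PromQL grouping label in the comment is illustrative, not necessarily the exact label name your deployment exposes:

```shell
# Scrape the proxy's Prometheus endpoint and filter for the new deployment metrics.
# The /metrics path and port 4000 are assumptions based on the docker run command below.
curl -s http://localhost:4000/metrics | grep -E \
  'llm_deployment_(latency_per_output_token|failure_responses|successful_fallbacks|failed_fallbacks|total_requests)'

# Illustrative PromQL for the per-deployment error rate described above
# (failure responses divided by total requests over a 5 minute window;
# "model_id" stands in for whatever deployment label your setup exposes):
#   sum by (model_id) (rate(llm_deployment_failure_responses[5m]))
#     / sum by (model_id) (rate(llm_deployment_total_requests[5m]))
```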
What's Changed
- [Refactor+Testing] Refactor Prometheus metrics to use CustomLogger class + add testing for prometheus by @ishaan-jaff in #5149
- fix(main.py): safely fail stream_chunk_builder calls by @krrishdholakia in #5151
- Feat - track response latency on prometheus by @ishaan-jaff in #5152
- Feat - Proxy track fallback metrics on prometheus by @ishaan-jaff in #5153
Full Changelog: v1.43.6...v1.43.7-stable
Docker Run LiteLLM Proxy
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.43.7-stable
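Once the container is up, a quick smoke test against the proxy; the Authorization key and model name below are placeholders, so substitute whatever is set in your own proxy config:

```shell
# Send a test chat completion through the proxy.
# sk-1234 and gpt-3.5-turbo are placeholders, not defaults shipped with this release.
curl -s http://localhost:4000/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer sk-1234' \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "ping"}]
  }'
```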
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
| Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
|---|---|---|---|---|---|---|---|---|---|
| /chat/completions | Passed ✅ | 130.0 | 158.2456064989428 | 6.32111960609156 | 0.0 | 1892 | 0 | 111.09798900000101 | 2661.257857999999 |
| Aggregated | Passed ✅ | 130.0 | 158.2456064989428 | 6.32111960609156 | 0.0 | 1892 | 0 | 111.09798900000101 | 2661.257857999999 |