## What's Changed
- [Feat] Memory debugging utils - return size of in-memory cache by @ishaan-jaff in #4705
- [Fix Memory Usage] Only use per-request tracking if Slack alerting is being used by @ishaan-jaff in #4703
- [Debug-Utils] Add some useful memory usage debugging utils by @ishaan-jaff in #4704
- Return `retry-after` header for rate-limited requests by @krrishdholakia in #4706 (see the client sketch after this list)
- Add Azure AI pricing + token info (mistral / jamba instruct / llama3) by @krrishdholakia in #4702
- Allow setting `logging_only` in guardrails config by @krrishdholakia in #4696 (see the config sketch after this list)
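
A minimal client-side sketch of honoring the new `retry-after` header on a rate-limited (429) response. The endpoint, payload, and model name are assumptions taken from the Docker and load-test sections below, not part of the PR itself:

```shell
# Capture response headers only; body is discarded.
headers=$(curl -s -D - -o /dev/null \
  http://localhost:4000/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt-3.5-turbo", "messages": [{"role": "user", "content": "hi"}]}')

# If the proxy rate-limited us, sleep for the advertised retry-after seconds.
if printf '%s' "$headers" | head -n1 | grep -q ' 429'; then
  wait_s=$(printf '%s' "$headers" | tr -d '\r' | awk -F': ' 'tolower($1)=="retry-after" {print $2}')
  sleep "${wait_s:-1}"  # fall back to 1s if the header is missing
  # ...retry the request here
fi
```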
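And a hypothetical guardrails config showing where `logging_only` would sit. The guardrail name (`pii_masking`) and callback (`presidio`) are illustrative assumptions; only the `logging_only` flag comes from #4696:

```yaml
litellm_settings:
  guardrails:
    - pii_masking:              # illustrative guardrail name, not from the PR
        callbacks: [presidio]   # illustrative callback, not from the PR
        default_on: true
        logging_only: true      # new flag from #4696: run the guardrail for logging only
```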
**Full Changelog**: v1.41.21...v1.41.22
## Docker Run LiteLLM Proxy

```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.41.22
```
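
Once the container is up, the proxy listens on port 4000 (as mapped above). A minimal smoke test against the `/chat/completions` endpoint from the load test below; the model name is an assumption and depends on what your proxy is configured to serve:

```shell
curl http://localhost:4000/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello from LiteLLM"}]
  }'
```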
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
## Load Test LiteLLM Proxy Results

| Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
|---|---|---|---|---|---|---|---|---|---|
| /chat/completions | Passed ✅ | 150.0 | 170.62 | 6.34 | 0.0 | 1895 | 0 | 122.10 | 1263.20 |
| Aggregated | Passed ✅ | 150.0 | 170.62 | 6.34 | 0.0 | 1895 | 0 | 122.10 | 1263.20 |