What's Changed
- [Fix] Azure Post-API Call occurs before Pre-API Call in CustomLogger by @ishaan-jaff in #4451
- [Fix-Proxy] Fix in memory caching memory leak by @ishaan-jaff in #4366
- fix: do not resolve vertex project id from creds by @ushuz in #4445
- fix(utils.py): return 'response_cost' in completion call by @krrishdholakia in #4436
- feat(azure.py): azure tts support by @krrishdholakia in #4449
- fix(token_counter.py): New `get_modified_max_tokens` helper func by @krrishdholakia in #4446
- docs: minor link repairs by @nibalizer in #4460
- Fix typo by @lnguyen in #4457
- Docs: create pass-through routes on LiteLLM proxy (tutorial: set up Cohere Re-Rank endpoint) by @ishaan-jaff in #4463
- [Feat] Allow users to set pass through endpoint + add Cohere Re-Rank by @ishaan-jaff in #4462
- [Enterprise] Return Raw response from Lakera in failed responses by @ishaan-jaff in #4464
- [Feat] - Proxy support Passing through Langfuse requests by @ishaan-jaff in #4465
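The pass-through endpoint feature (#4462) lets the proxy forward requests for routes it doesn't natively serve. As a rough sketch only — the key names below are assumptions for illustration; check the LiteLLM proxy docs linked from #4463 for the authoritative schema — a config forwarding a Re-Rank route to Cohere might look like:

```yaml
# Hypothetical sketch: expose /v1/rerank on the proxy and forward it to Cohere.
# Field names are assumptions; consult the pass-through endpoint tutorial.
general_settings:
  pass_through_endpoints:
    - path: "/v1/rerank"                         # route exposed on the proxy
      target: "https://api.cohere.com/v1/rerank" # upstream endpoint
      headers:
        Authorization: "bearer os.environ/COHERE_API_KEY"
```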
Full Changelog: v1.40.29...v1.40.31
Docker Run LiteLLM Proxy
```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-v1.40.31
```
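Once the container is up, the proxy serves an OpenAI-compatible `/chat/completions` route on port 4000. A minimal stdlib-only sketch of building such a request — `sk-1234` is a placeholder key and the model name is only an example; use whatever key and models your deployment is configured with:

```python
import json
import urllib.request

def build_chat_request(base_url: str, api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for the proxy's OpenAI-compatible /chat/completions route."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )

req = build_chat_request("http://localhost:4000", "sk-1234", "gpt-3.5-turbo", "Hello!")
# Actually sending it requires the container above to be running:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```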
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
| Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /chat/completions | Passed ✅ | 77 | 91.1696113255562 | 6.455451046411027 | 0.0 | 1929 | 0 | 66.79889399998729 | 1628.1963670000437 |
| Aggregated | Passed ✅ | 77 | 91.1696113255562 | 6.455451046411027 | 0.0 | 1929 | 0 | 66.79889399998729 | 1628.1963670000437 |