## What's Changed
- Update Fireworks AI pricing by @krrishdholakia in #10425
- Schedule budget resets at predictable times (#10331) by @krrishdholakia in #10333
- Embedding caching fixes - handle str -> list cache, set usage tokens for cache hits, combine usage tokens on partial cache hits by @krrishdholakia in #10424
- Contributor PR - Support OPENAI_BASE_URL in addition to OPENAI_API_BASE (#9995) by @ishaan-jaff in #10423 (usage sketch after this list)
- New feature: Add Python client library for LiteLLM Proxy by @msabramo in #10445
- Add key-level multi-instance tpm/rpm/max parallel request limiting by @krrishdholakia in #10458 (curl sketch after this list)
- [UI] Allow adding triton models on LiteLLM UI by @ishaan-jaff in #10456
- [Feat] Vector Stores/KnowledgeBases - Allow defining Vector Store Configs by @ishaan-jaff in #10448
- Add low-level interface to client library for doing HTTP requests by @msabramo in #10452
- Correctly re-raise 504 errors and add `gpt-4o-mini-tts` support by @krrishdholakia in #10462
- UI - Fix filtering on key alias + support global sorting on keys by @krrishdholakia in #10455
- [Bug Fix] Ensure Non-Admin virtual keys can access /mcp routes by @ishaan-jaff in #10473
- [Fixes] Azure OpenAI OIDC - allow using litellm-defined params for OIDC Auth by @ishaan-jaff in #10394
- Add supports_pdf_input: true to Claude 3.7 bedrock models by @RupertoM in #9917
- Add `llamafile` as a provider (#10203) by @ishaan-jaff in #10482
- Fix mcp.md in documentation by @1995parham in #10493
- docs(realtime): yaml config example for realtime model by @kmontocam in #10489
- Fix: return `finish_reason = "tool_calls"` for Gemini tool calling by @krrishdholakia in #10485
- Add user + team based multi-instance rate limiting by @krrishdholakia in #10497
- mypy tweaks by @msabramo in #10490
- Add Vertex AI Meta Llama 4 support + handle tool call result in content for Vertex AI by @krrishdholakia in #10492
- Fix and rewrite of `token_counter` by @happyherp in #10409
- [Fix + Refactor] Trigger Soft Budget Webhooks When Key Crosses Threshold by @ishaan-jaff in #10491
- [Bug Fix] Ensure Web Search / File Search costs are only added when the response includes the tool call by @ishaan-jaff in #10476
- Fixes for `test_team_budget_metrics` and `test_generate_and_update_key` by @S1LV3RJ1NX in #10500
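For the OPENAI_BASE_URL change (#10423), a minimal sketch of the two environment variables that should now behave equivalently; the gateway URL is a placeholder, not something from this release:

```shell
# Either variable should now point LiteLLM's OpenAI calls at a custom base URL.
# (Placeholder URL; substitute your own endpoint.)
export OPENAI_API_BASE="https://my-gateway.example.com/v1"   # previously supported
export OPENAI_BASE_URL="https://my-gateway.example.com/v1"   # newly supported alias
```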
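For key-level multi-instance limiting (#10458), a hedged sketch of creating a virtual key with per-key caps via the proxy's `/key/generate` endpoint; the localhost address, master key, and limit values are assumptions for illustration:

```shell
# Generate a key capped at 1000 tokens/min, 10 requests/min, and
# 2 concurrent requests (assumed local proxy and master key).
curl -X POST 'http://localhost:4000/key/generate' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{"tpm_limit": 1000, "rpm_limit": 10, "max_parallel_requests": 2}'
```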
## New Contributors
- @RupertoM made their first contribution in #9917
- @1995parham made their first contribution in #10493
- @kmontocam made their first contribution in #10489
- @happyherp made their first contribution in #10409
**Full Changelog**: v1.67.5-nightly...v1.67.6-nightly
## Docker Run LiteLLM Proxy

```shell
docker run \
    -e STORE_MODEL_IN_DB=True \
    -p 4000:4000 \
    ghcr.io/berriai/litellm:main-v1.67.6-nightly
```
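Once the container is up, a quick liveliness check against the proxy's health route (assumes the default port mapping above):

```shell
# Expect an HTTP 200 once the proxy has finished starting.
curl http://localhost:4000/health/liveliness
```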
Don't want to maintain your internal proxy? Get in touch 🎉

Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
## Load Test LiteLLM Proxy Results

| Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
|---|---|---|---|---|---|---|---|---|---|
| /chat/completions | Passed ✅ | 180.0 | 222.14 | 6.2 | 0.0 | 1855 | 0 | 165.42 | 4686.56 |
| Aggregated | Passed ✅ | 180.0 | 222.14 | 6.2 | 0.0 | 1855 | 0 | 165.42 | 4686.56 |