## What's Changed
- Update Fireworks AI pricing by @krrishdholakia in #10425
- Schedule budget resets at predictable times (#10331) by @krrishdholakia in #10333
- Embedding caching fixes - handle str -> list cache, set usage tokens for cache hits, combine usage tokens on partial cache hits by @krrishdholakia in #10424
- Contributor PR - Support OPENAI_BASE_URL in addition to OPENAI_API_BASE (#9995) by @ishaan-jaff in #10423 (usage sketch after this list)
- New feature: Add Python client library for LiteLLM Proxy by @msabramo in #10445
- Add key-level multi-instance tpm/rpm/max parallel request limiting by @krrishdholakia in #10458 (curl sketch after this list)
- [UI] Allow adding triton models on LiteLLM UI by @ishaan-jaff in #10456
- [Feat] Vector Stores/KnowledgeBases - Allow defining Vector Store Configs by @ishaan-jaff in #10448
- Add low-level interface to client library for doing HTTP requests by @msabramo in #10452
- Correctly re-raise 504 errors and add `gpt-4o-mini-tts` support by @krrishdholakia in #10462
- UI - Fix filtering on key alias + support global sorting on keys by @krrishdholakia in #10455
- [Bug Fix] Ensure Non-Admin virtual keys can access /mcp routes by @ishaan-jaff in #10473
- [Fixes] Azure OpenAI OIDC - allow using litellm-defined params for OIDC Auth by @ishaan-jaff in #10394
- Add supports_pdf_input: true to Claude 3.7 bedrock models by @RupertoM in #9917
- Add `llamafile` as a provider (#10203) by @ishaan-jaff in #10482
- Fix mcp.md in documentation by @1995parham in #10493
- docs(realtime): yaml config example for realtime model by @kmontocam in #10489
- Fix: return `finish_reason = "tool_calls"` for Gemini tool calling by @krrishdholakia in #10485
- Add user + team based multi-instance rate limiting by @krrishdholakia in #10497
- mypy tweaks by @msabramo in #10490
- Add Vertex AI Meta Llama 4 support + handle tool call result in content for Vertex AI by @krrishdholakia in #10492
- Fix and rewrite of `token_counter` by @happyherp in #10409
- [Fix + Refactor] Trigger Soft Budget Webhooks When Key Crosses Threshold by @ishaan-jaff in #10491
- [Bug Fix] Ensure Web Search / File Search costs are only added when the response includes the tool call by @ishaan-jaff in #10476
- Fixes for `test_team_budget_metrics` and `test_generate_and_update_key` by @S1LV3RJ1NX in #10500
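For the OPENAI_BASE_URL change (#10423), a minimal sketch of the two environment variables that should now behave equivalently; the gateway URL is a placeholder, not something from this release:

```shell
# Either variable should now point LiteLLM's OpenAI calls at a custom base URL.
# (Placeholder URL; substitute your own endpoint.)
export OPENAI_API_BASE="https://my-gateway.example.com/v1"   # previously supported
export OPENAI_BASE_URL="https://my-gateway.example.com/v1"   # newly supported alias
```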
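For key-level multi-instance limiting (#10458), a hedged sketch of creating a virtual key with per-key caps via the proxy's `/key/generate` endpoint; the localhost address, master key, and limit values are assumptions for illustration:

```shell
# Generate a key capped at 1000 tokens/min, 10 requests/min, and
# 2 concurrent requests (assumed local proxy and master key).
curl -X POST 'http://localhost:4000/key/generate' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{"tpm_limit": 1000, "rpm_limit": 10, "max_parallel_requests": 2}'
```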
## New Contributors
- @RupertoM made their first contribution in #9917
- @1995parham made their first contribution in #10493
- @kmontocam made their first contribution in #10489
- @happyherp made their first contribution in #10409
**Full Changelog**: v1.67.5-nightly...v1.67.6-nightly
## Docker Run LiteLLM Proxy

```shell
docker run \
    -e STORE_MODEL_IN_DB=True \
    -p 4000:4000 \
    ghcr.io/berriai/litellm:main-v1.67.6-nightly
```
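Once the container is up, a quick liveliness check against the proxy's health route (assumes the default port mapping above):

```shell
# Expect an HTTP 200 once the proxy has finished starting.
curl http://localhost:4000/health/liveliness
```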
Don't want to maintain your internal proxy? Get in touch 🎉

Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
## Load Test LiteLLM Proxy Results

| Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
|---|---|---|---|---|---|---|---|---|---|
| /chat/completions | Passed ✅ | 180.0 | 222.14 | 6.2 | 0.0 | 1855 | 0 | 165.42 | 4686.56 |
| Aggregated | Passed ✅ | 180.0 | 222.14 | 6.2 | 0.0 | 1855 | 0 | 165.42 | 4686.56 |