What's Changed
- fix: DD tool calls passed in metadata by @mubashir1osmani in #14531
- Added user_email labels to the prometheus monitoring. by @boopesh07 in #14520
- feat: add tool-permission guardrail by @uc4w6c in #14519
- Add sambanova deepseek v3.1 and gpt-oss-120b models by @luisfucros in #14500
- fix: completion chat id by @hanakannzashi in #14548
- feat: Add OVHCloud AI Endpoints as a provider by @eliasto in #14494
- Resolve cache key collision issue where all soft budget alerts use identical cache keys by @Rasmusafj in #14491
- Fix unsupported stop param for grok-code models by @Sameerlite in #14565
- [Feat]Add cancel endpoint support for openai and azure by @Sameerlite in #14561
- Fix: Bedrock cross-region inference profile cost calculation by @timelfrink in #14566
- [Bug Fix] SCIM v2 - ensure group PUSH and PUT ops allow creating non-existent members by @ishaan-jaff in #14581
- s3_endpoint_url returned 404 by @mubashir1osmani in #14559
- Fix: Vertex AI Gemini labels field provider-aware filtering by @timelfrink in #14563
- Add AWS external ID parameter support for Bedrock authentication by @timelfrink in #14582
- [Fix] /responses API - add cancel endpoint + allow non-admins to use this as an llm api endpoint by @ishaan-jaff in #14594
- Fix: handle empty arguments in Bedrock tool call invocation by @pazevedo-hyland in #14583
- fix(proxy): Correctly parse multi-part MCP server aliases from URL paths by @iabhi4 in #14558
- fix: recompute filters after deleting an MCP Server by @uc4w6c in #14542
- Add CompactifAI provider support by @timelfrink in #14532
- feat(proxy): Assign default budget to auto-generated JWT teams by @iabhi4 in #14514
- fix volcengine thinking parameters missing when it set disable by @LingXuanYin in #14569
- [Performance] RPS Improvement +500 RPS when sending the
user
field by @AlexsanderHamir in #14616 - Add Support for Bedrock Guardrails to supportive selective Guarding by @Sameerlite in #14575
- [Feat] Batches - Add bedrock retrieve endpoint support by @ishaan-jaff in #14618
- [Fix]
AttributeError: NoneType object has no attribute 'get'
inparallel_request_limiter_v3.py
by @teremterem in #14609 - added langfuse logging for responses api by @mubashir1osmani in #14597
- Fix: 14473: Guardrail view, edit, and delete behavior by @ARajan1084 in #14622
- (feat) Anthropic - document pricing for cache creation tokens above 1hr by @krrishdholakia in #14620
- Fix error message for missing OCI parameters by @ronaldpereira in #14613
- docs: helicone integration and mcp by @mubashir1osmani in #14600
- Litellm gemini api base update by @Sameerlite in #14604
- DataDog shows spend metrics by @mubashir1osmani in #14555
- fix: improve response api handling and cold storage configuration by @hula-la in #14534
- Describing the
labels
field use in the Vertex AI by @vvidovic in #14448 - fix: avoid deepcopy crash with non-pickleables in Gemini/Vertex by @iabhi4 in #14418
- Fix: MDX compilation error in CompactifAI documentation by @timelfrink in #14625
- Feat/add posthog observability by @carlos-marchal-ph in #14610
- [Security] bump aiohttp==3.12.14, fix CVE-2025-53643. by @ishaan-jaff in #14638
- Remove not needed names by @Sameerlite in #14641
- UI - allow team member to view service account keys they create + Anthropic - include cache creation tokens in prompt token total (separate out during cost tracking) by @krrishdholakia in #14619
- Bedrock Guardrails - support setting bedrock runtime endpoint + Protect
/health/test_connect
to prevent users without model creation permissions from calling it by @krrishdholakia in #14650 - fix: ci/cd tests + lint errors by @mubashir1osmani in #14646
- fix: In Memory Guardrail fails to update by @ARajan1084 in #14653
- fix: iscoroutine removed from hot path +50 RPS by @AlexsanderHamir in #14649
- [Feat]Add last message as default in gaurdrail by @Sameerlite in #14640
- feat: implement middle-truncation for spend log payloads by @akraines in #14637
- Implement AWS Bedrock CountTokens API support by @timelfrink in #14557
- [Fix] Handle Cohere Generate API Deprecation - default to cohere chat endpoints by default by @ishaan-jaff in #14676
- correct the gaurdcontent name by @Sameerlite in #14684
- (Feat) Add TwelveLabs Marengo Embed 2.7 Support to LiteLLM by @Sameerlite in #14674
- [Fix]: BadRequestError cohere chat api - exception mapping by @ishaan-jaff in #14691
- fix: flaky passthrough tests by @mubashir1osmani in #14692
- [Feat] Support for is_streamed_request with datadog by @eycjur in #14673
- [Feat] Add Bedrock Twelve Labs embedding provider support by @ishaan-jaff in #14697
- fix: reduced inits overhead in 7% by @AlexsanderHamir in #14689
- NEW Amazon Bedrock Guardrail Info View in Logs by @ARajan1084 in #14696
- NEW Amazon Bedrock Guardrail Info View in Logs by @ARajan1084 in #14699
- Fix: Bedrock Titan V2 encoding_format parameter support by @timelfrink in #14687
- NEW Amazon Bedrock Guardrail Info View in Logs by @ARajan1084 in #14701
- feature: generic object pool by @AlexsanderHamir in #14702
- NEW Amazon Bedrock Guardrail Info View in Logs by @krrishdholakia in #14698
- fix: Amazon Bedrock incorrect guardrailResponse bug by @ARajan1084 in #14690
- fix: check for AWS exceptions despite a 200 response by @ARajan1084 in #14658
- Added missing dependencies by @AlexsanderHamir in #14706
- Anthropic - account for 1h vs. 5m cache creation token cost difference + UI - add langsmith_sampling_rate as a dynamic param by @krrishdholakia in #14652
- fix: Bedrock guardrail silent failure correction by @ARajan1084 in #14707
- fix contributor PR linting failing by @ishaan-jaff in #14710
- fix: timezone issue of opik by @mrFranklin in #14708
- Update Bedrock documentation for Titan V2 encoding_format support + Anthropic - account for 1h vs. 5m cache creation token cost difference + UI - add langsmith_sampling_rate as a dynamic param by @krrishdholakia in #14700
- Fix/mcp gateway tools list by @uc4w6c in #14695
- fix: cost calculation for responses by @tcx4c70 in #14675
- Added Indochina Time timezone support for budget resets by @michaeltansg in #14666
- Fix: gemini-2.5-flash-image-preview model routing for image generation by @timelfrink in #14715
- Fix document of quick start in proxy deploy by @tosi29 in #14725
- fix: Prevent AttributeError for _get_tags_from_request_kwargs by @gmdfalk in #14735
- build(deps): bump esbuild and vite in /ui/litellm-dashboard by @dependabot[bot] in #14703
- Litellm gemini batch by @FelipeRodriguesGare in #14733
- doc: jump to the correct location for LiteLLM Proxy section by @mrFranklin in #14722
- [Feat] Dynamic Rate Limiter v3 - fixes to ensure priority routing works as expected by @ishaan-jaff in #14734
New Contributors
- @luisfucros made their first contribution in #14500
- @hanakannzashi made their first contribution in #14548
- @eliasto made their first contribution in #14494
- @Rasmusafj made their first contribution in #14491
- @LingXuanYin made their first contribution in #14569
- @ronaldpereira made their first contribution in #14613
- @hula-la made their first contribution in #14534
- @carlos-marchal-ph made their first contribution in #14610
- @akraines made their first contribution in #14637
- @mrFranklin made their first contribution in #14708
- @tcx4c70 made their first contribution in #14675
- @michaeltansg made their first contribution in #14666
- @tosi29 made their first contribution in #14725
- @gmdfalk made their first contribution in #14735
- @FelipeRodriguesGare made their first contribution in #14733
Full Changelog: v1.77.2.rc.1...v1.77.3.dynamic_rates
Docker Run LiteLLM Proxy
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.77.3.dynamic_rates
Don't want to maintain your internal proxy? get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
---|---|---|---|---|---|---|---|---|---|
/chat/completions | Failed ❌ | 68 | 117.10864588269038 | 6.407950876456694 | 6.407950876456694 | 1918 | 1918 | 54.43497900000693 | 4616.300067999987 |
Aggregated | Failed ❌ | 68 | 117.10864588269038 | 6.407950876456694 | 6.407950876456694 | 1918 | 1918 | 54.43497900000693 | 4616.300067999987 |