BerriAI/litellm v1.76.1-nightly


What's Changed

  • Litellm dev 08 16 2025 p3 by @krrishdholakia in #13694
  • GPT-5-chat does not support function by @superpoussin22 in #13612
  • fix(vertexai-batch): fix vertexai batch file format by @thiagosalvatore in #13576
  • [Feat] Datadog LLM Observability - Add support for Failure Logging by @ishaan-jaff in #13726
  • [Feat] DD LLM Observability - Add time to first token, litellm overhead, guardrail overhead latency metrics by @ishaan-jaff in #13734
  • [Bug Fix] litellm incompatible with newest release of openAI v1.100.0 by @ishaan-jaff in #13728
  • [Bug Fix] image_edit() function returns APIConnectionError with litellm_proxy - Support for both image edits and image generations by @ishaan-jaff in #13735
  • [Fix] Cooldowns - don't return raw Azure Exceptions to client by @krrishdholakia in #13529
  • Responses API - add default api version for openai responses api calls + Openrouter - fix claude-sonnet-4 on openrouter + Azure - Handle openai/v1/responses by @krrishdholakia in #13526
  • Use namespace as prefix for s3 cache by @michal-otmianowski in #13704
  • Add Search Functionality for Public Model Names in Model Dashboard by @NANDINI-star in #13687
  • Add Azure Deployment Name Support in UI by @NANDINI-star in #13685
  • Fix - gemini prompt caching cost calculation by @krrishdholakia in #13742
  • Refactor - forward model group headers - reuse same logic as global header forwarding by @krrishdholakia in #13741
  • Fix Groq streaming ASCII encoding issue by @colesmcintosh in #13675
  • Add possibility to configure resources for migrations-job in Helm chart by @moandersson in #13617
  • [Feat] Datadog LLM Observability - Add support for tracing guardrail input/output by @ishaan-jaff in #13767
  • Models page row UI restructure by @NANDINI-star in #13771
  • [Bug Fix] Bedrock KB - Using LiteLLM Managed Credentials for Query by @ishaan-jaff in #13787
  • [Bug Fix] Fixes for using Auto Router with LiteLLM Docker Image by @ishaan-jaff in #13788
  • [Feat] - UI Allow using Key/Team Based Logging for Langfuse OTEL by @ishaan-jaff in #13791
  • Add long context support for claude-4-sonnet by @kankute-sameer in #13759
  • Migrate to aim new firewall api by @hxdror in #13748
  • [LLM Translation] Adjust max_input_tokens for azure/gpt-5-chat models in JSON configuration by @jugaldb in #13660
  • Added Qwen3, Deepseek R1 0528 Throughput, GLM 4.5 and GPT-OSS models for Together AI by @Tasmay-Tibrewal in #13637
  • Fix query passthrough deletion by @NANDINI-star in #13622
  • [Feat] add fireworks_ai/accounts/fireworks/models/deepseek-v3-0324 by @ishaan-jaff in #13821
  • New notifications toast UI everywhere by @NANDINI-star in #13813
  • Fix key edit settings after regenerating key by @NANDINI-star in #13815
  • [Feat] Add VertexAI qwen API Service by @ishaan-jaff in #13828
  • Add OTEL tracing for actual LLM API call by @krrishdholakia in #13836
  • [Performance] Improve LiteLLM Python SDK RPS by +200 RPS by @ishaan-jaff in #13839
  • Fix(bedrock): fix the api key support for bedrock guardrail in proxy by @0x-fang in #13835
  • Add rerank endpoint support for deepinfra by @kankute-sameer in #13820
  • fix : Synchronize cache behavior between acompletion and completion by @UlookEE in #13803
  • Include predicted output in MLflow tracing by @TomeHirata in #13795
  • Fix - Ensure Helm chart auto generated master keys follow sk-xxxx format by @ishaan-jaff in #13871
  • [Fix] Ensure Service Account Keys require team_id field on API Endpoints by @ishaan-jaff in #13873
  • Fix e2e_ui_test by @NANDINI-star in #13861
  • Fix Filter Dropdown UX Issue - Load Initial Options by @NANDINI-star in #13858
  • [Helm charts] Enhance database configuration: add support for optional endpointKey by @jugaldb in #13763
  • [Feat] Add new VertexAI image models vertex_ai/imagen-4.0-generate-001, vertex_ai/imagen-4.0-ultra-generate-001, vertex_ai/imagen-4.0-fast-generate-001 by @ishaan-jaff in #13874
  • [Feat] Add new Google AI Studio image models gemini/imagen-4.0-generate-001, gemini/imagen-4.0-ultra-generate-001, gemini/imagen-4.0-fast-generate-001 by @ishaan-jaff in #13876
  • Update Baseten LiteLLM integration by @philipkiely-baseten in #13783
  • Fix(Bedrock): fix application inference profile for pass-through endpoints for bedrock by @0x-fang in #13796
  • Fix e2e_ui_test by @NANDINI-star in #13881
  • [Performance] Use O(1) Set lookups for model routing by @ishaan-jaff in #13879
  • Update model metadata for Deepinfra provider by @Toy-97 in #13883
  • fix: fixing descriptor/response size mismatch on parallel_request_limiter_v3 by @luizrennocosta in #13863
  • [Feat] Add support for voyage-context-3 embedding model by @kankute-sameer in #13868
  • 🐛 Bug Fix: Updated URL handling for DataRobot provider URL by @carsongee in #13880
  • Async s3 implementation by @michal-otmianowski in #13852
  • fix: role chaining and session name with webauthentication for aws bedrock by @RichardoC in #13753
  • [Bug Fix] JS exception in User Agent Activity: Cannot read properties of undefined by @ishaan-jaff in #13892
  • [ui/dashboard] add support for host_vllm by @NiuBlibing in #13885
  • [Documentation] Litellm rerank deepinfra endpoint by @kankute-sameer in #13845
  • [MCP Gateway] fix StreamableHTTPSessionManager .run() error by @jugaldb in #13666
  • [Performance] Reduce Significant CPU overhead from litellm_logging.py by @ishaan-jaff in #13895
  • Fix Ollama transformations crash when tools are used with non-tool trained models by @bcdonadio in #13902
  • Add openrouter deepseek/deepseek-chat-v3.1 support by @kankute-sameer in #13897
  • docs: clarify prerequisites and env var for team rate limits by @TeddyAmkie in #13899
  • [Enhancement] Add support for Mistral model file handling and update documentation by @jinskjoy in #13866
  • fix permission access on prisma migrate in non-root image by @Ithanil in #13848
  • feat(utils.py): accept 'api_version' as param for validate_environment by @mainred in #13808 (see the sketch after this list)
  • Responses API - support allowed_openai_params + Mistral - handle empty assistant content + support new mistral 'thinking' response block by @krrishdholakia in #13671
  • fix(openai/image_edits): Support 'mask' parameter for openai image edits by @krrishdholakia in #13673
  • SSO - Free SSO usage for up to 5 users + remove deprecated dbrx models (dbrx-instruct, llama 3.1) by @krrishdholakia in #13843
  • Fix calling key with access to model alias by @krrishdholakia in #13830
  • [Feat] New LLM API - AI/ML API for Image Gen by @ishaan-jaff in #13893
  • [Perf] Improvements for Async Success Handler (Logging Callbacks) - Approx +130 RPS by @ishaan-jaff in #13905
  • Added FAQ under deployment docs by @mubashir1osmani in #13912
  • updated claude-code docs by @mubashir1osmani in #13784
  • [Feat] UI QA Fixes by @ishaan-jaff in #13915
  • [BUG] Add back supervisor to non-root image by @ArthurRenault in #13922
  • Add support for AWS assume_role with a session token by @stevenmanton in #13919
  • Fix missing and unused imports in custom_guardrail docs example by @uc4w6c in #13914
  • [UI QA] - Allow setting Team Member RPM/TPM limits when creating a team by @ishaan-jaff in #13943
  • [Bug fix] - Fix /messages fallback from Anthropic API -> Bedrock API by @ishaan-jaff in #13946
  • [Bug Fix] Azure Passthrough request with streaming by @ishaan-jaff in #13831
  • [Bug] Fix: Vertex Mistral not working for streaming by @ishaan-jaff in #13952
  • Add DeepSeek-v3.1 pricing for Fireworks AI provider by @TeddyAmkie in #13958
  • feat: add image headers for Copilot by @ckoehler in #13955
  • Verify if cache entry has expired prior to serving it to client by @michal-otmianowski in #13933
  • feat: multiple images in openai images/edits endpoint by @mubashir1osmani in #13916
  • Feature/braintrust span name metadata by @nielsbosma in #13573
  • fix: remove incorrect web search support for azure/gpt-4.1 family by @kankute-sameer in #13566
  • Update model prices and context window by @Yuki-Imajuku in #13567
  • [Feat] New model gemini-2.5-flash-image-preview by @ishaan-jaff in #13979
  • ⚡️ Speed up function _is_debugging_on by 45% by @codeflash-ai[bot] in #13988
  • [Bug]: Fix tests to reference moved attributes in braintrust_logging module by @ColeFrench in #13978
  • [Perf] 6.5x faster LiteLLM Python SDK Completion by @ishaan-jaff in #13990
  • [Perf] Use fastuuid for fast UUID generations - 2.1x Faster by @ishaan-jaff in #13992
  • bump orjson version to "3.11.2" by @dttran-glo in #13969
  • feat(constants): expand Nebius provider models and normalize model IDs by @manascb1344 in #13965
  • Deepinfra Metadata Update 24082025 by @Toy-97 in #13917
  • Add Noma Security guardrail support by @DorZion in #13572
  • Add openrouter gpt-5 family models pricing by @edwardsamuel in #13536
  • docs: Add CometAPI documentation with authentication, usage examples, and error handling by @TensorNull in #13534
  • Fix token_counter with special token input by @blahgeek in #13374
  • Enhance logging for containers to log on files both with usual format and json format by @Deviad in #13394
  • [Bug Fix] LLM Translation - Allow using dynamic api_key for image generation requests by @ishaan-jaff in #14007
  • [Feature]: Support Gemini requests with only system prompt by @ishaan-jaff in #14010
  • [Bug]: /responses endpoint proxy ignores extra_headers in GitHub Copilot by @XSAM in #13775
  • [Feat] langfuse_otel logger - allow using LANGFUSE_OTEL_HOST for configuring host by @ishaan-jaff in #14013
  • Fix issue #13995: Handle None metadata in batch requests by @xingyaoww in #13996
  • [Feat] Add support for returning images with gemini/gemini-2.5-flash-image-preview with /chat/completions by @ishaan-jaff in #13983
  • Update release notes with correct docker tag by @ishaan-jaff in #14014
  • ⚡️ Speed up InMemoryCache.evict_cache by 21% by @KRRT7 in #14012
  • [Bug Fix] Resolve invalid model name error for Gemini Imagen models by @ikaadil in #13991
  • feat: Add thinking and reasoning_effort parameter support for GitHub Copilot provider by @timelfrink in #13691
  • refactor(router): choose weights by 'weight', 'rpm', 'tpm' in one loop for simple_shuffle by @qidu in #13562
  • Update Pangea Guardrail to support new AIDR endpoint by @ryanmeans in #13160
  • Ensure that function_call_prompt extends system messages following its current schema by @nagyv in #13243
  • Remove vector store methods from global scope by @xywei in #12885
  • fix: make gemini and openai responses return reasoning by default by @aholmberg in #12865
  • Fix additional anyOf corner cases for Vertex AI Gemini tool calls - issue #11164 by @ericgtkb in #12797
  • feat: Add support for custom Anthropic-compatible API endpoints by @NoWall57 in #13945
  • [Bug Fix] Virtual keys with llm_api type cause Internal Server Error when using /anthropic/* and other llm passthrough routes by @ishaan-jaff in #14046
  • [MCP] Bug fix - adding SSE MCP tools - fix connection test when adding MCPs by @ishaan-jaff in #14048
  • [Perf] Use fastuuid for +80 RPS when using /chat/completions and other LLM endpoints by @ishaan-jaff in #14016
  • [Feat] Add xai/grok-code-fast model family by @ishaan-jaff in #14054
  • Fix LoggingWorker graceful shutdown to prevent CancelledError warnings by @lmwang9527 in #14050
  • Allow configuration to set threshold before request entry in spend log gets truncated by @WilsonSunBritten in #14042
  • [Helm charts] Enhance proxy_config configuration: add support for existing configmap by @Const-antine in #14041
  • 📖 Add DataRobot to the provider documentation. by @carsongee in #14038
  • Fix error saving latency as timedelta on Redis by @dmvieira in #14040
  • docs: add documentation for LITELLM_ANTHROPIC_DISABLE_URL_SUFFIX envi… by @NoWall57 in #14037
  • OCI Provider: add oci_key_file as an optional_parameter by @gotsysdba in #14036
  • Update docs for stream_timeout and timeout by @TeddyAmkie in #14073
  • [Perf] Proxy /chat/completions - don't print the request params by default (+50 RPS) by @ishaan-jaff in #14015
  • 📖 Added DataRobot provider to sidebar by @carsongee in #14074
  • [Bug]: Fix Can't set reasoning_effort for DeepSeek-V3.1 on DeepInfra by default by @ishaan-jaff in #14053
  • docs: usage-based routing perf warnings by @mubashir1osmani in #14080
  • [Bug]: grok-4 does not support frequency_penalty, litellm should drop this param for grok-4 by @ishaan-jaff in #14078
  • docs: Add documentation for MAX_STRING_LENGTH_PROMPT_IN_DB environmen… by @WilsonSunBritten in #14079
  • feat: add gpt-realtime models - gpt-realtime by @ishaan-jaff in #14082
  • Fix Next.js Security Vulnerabilities in UI Dashboard by @ishaan-jaff in #14084
  • [Fix] LiteLLM does not support new web_search tool (Responses API) by @ishaan-jaff in #14083
  • Add supported text field to anthropic citation response by @TomeHirata in #14026
  • Bedrock fix structure output by @moshemorad in #14005
  • Fix collapsible navbar design by @NANDINI-star in #14075
  • [Bug]: Set user from token user_id for OpenMeter integration by @betterthanbreakfast in #13152
  • Add Vercel AI Gateway provider by @joshualipman123 in #13144
  • Fix indentation in get_llm_provider_logic.py by @superpoussin22 in #14088
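
As a quick illustration of the validate_environment change above (#13808): a minimal sketch of checking provider credentials while passing an API version directly. The model name, API version value, and the return shape shown are assumptions for illustration; the parameter name api_version is taken from the PR title, so check your installed version for the exact signature.

import litellm

# Ask LiteLLM which environment variables are present/missing for a provider.
# "azure/gpt-4o" and "2024-02-01" are placeholder values for illustration only.
result = litellm.validate_environment(
    model="azure/gpt-4o",
    api_version="2024-02-01",  # parameter added in #13808
)
print(result)  # typically a dict like {"keys_in_environment": ..., "missing_keys": [...]}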

New Contributors

Full Changelog: v1.75.8-stable...v1.76.1-nightly

Docker Run LiteLLM Proxy

docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-v1.76.1-nightly
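
Once the container is running, the proxy serves an OpenAI-compatible API on port 4000. A minimal sketch of calling it from Python with the openai client, assuming a virtual key "sk-1234" and a model named "gpt-4o" are configured on the proxy (both are placeholders):

from openai import OpenAI

# Point the standard OpenAI client at the local LiteLLM proxy.
client = OpenAI(
    api_key="sk-1234",                 # placeholder virtual key
    base_url="http://localhost:4000",  # port published by the docker run above
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use a model configured on your proxy
    messages=[{"role": "user", "content": "Hello from the LiteLLM proxy"}],
)
print(response.choices[0].message.content)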

Don't want to maintain your internal proxy? Get in touch 🎉

Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat

Load Test LiteLLM Proxy Results

Name              | Status    | Median (ms) | Average (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min (ms) | Max (ms)
/chat/completions | Passed ✅ | 160.0       | 192.40       | 6.28       | 0.0        | 1879          | 0             | 117.63   | 1498.51
Aggregated        | Passed ✅ | 160.0       | 192.40       | 6.28       | 0.0        | 1879          | 0             | 117.63   | 1498.51
