BerriAI/litellm v1.76.1-nightly


What's Changed

  • Litellm dev 08 16 2025 p3 by @krrishdholakia in #13694
  • GPT-5-chat does not support function by @superpoussin22 in #13612
  • fix(vertexai-batch): fix vertexai batch file format by @thiagosalvatore in #13576
  • [Feat] Datadog LLM Observability - Add support for Failure Logging by @ishaan-jaff in #13726
  • [Feat] DD LLM Observability - Add time to first token, litellm overhead, guardrail overhead latency metrics by @ishaan-jaff in #13734
  • [Bug Fix] litellm incompatible with newest release of openAI v1.100.0 by @ishaan-jaff in #13728
  • [Bug Fix] image_edit() function returns APIConnectionError with litellm_proxy - Support for both image edits and image generations by @ishaan-jaff in #13735
  • [Fix] Cooldowns - don't return raw Azure Exceptions to client by @krrishdholakia in #13529
  • Responses API - add default api version for openai responses api calls + Openrouter - fix claude-sonnet-4 on openrouter + Azure - Handle openai/v1/responses by @krrishdholakia in #13526
  • Use namespace as prefix for s3 cache by @michal-otmianowski in #13704
  • Add Search Functionality for Public Model Names in Model Dashboard by @NANDINI-star in #13687
  • Add Azure Deployment Name Support in UI by @NANDINI-star in #13685
  • Fix - gemini prompt caching cost calculation by @krrishdholakia in #13742
  • Refactor - forward model group headers - reuse same logic as global header forwarding by @krrishdholakia in #13741
  • Fix Groq streaming ASCII encoding issue by @colesmcintosh in #13675
  • Add possibility to configure resources for migrations-job in Helm chart by @moandersson in #13617
  • [Feat] Datadog LLM Observability - Add support for tracing guardrail input/output by @ishaan-jaff in #13767
  • Models page row UI restructure by @NANDINI-star in #13771
  • [Bug Fix] Bedrock KB - Using LiteLLM Managed Credentials for Query by @ishaan-jaff in #13787
  • [Bug Fix] Fixes for using Auto Router with LiteLLM Docker Image by @ishaan-jaff in #13788
  • [Feat] - UI Allow using Key/Team Based Logging for Langfuse OTEL by @ishaan-jaff in #13791
  • Add long context support for claude-4-sonnet by @kankute-sameer in #13759
  • Migrate to aim new firewall api by @hxdror in #13748
  • [LLM Translation] Adjust max_input_tokens for azure/gpt-5-chat models in JSON configuration by @jugaldb in #13660
  • Added Qwen3, Deepseek R1 0528 Throughput, GLM 4.5 and GPT-OSS models for Together AI by @Tasmay-Tibrewal in #13637
  • Fix query passthrough deletion by @NANDINI-star in #13622
  • [Feat] add fireworks_ai/accounts/fireworks/models/deepseek-v3-0324 by @ishaan-jaff in #13821
  • New notifications toast UI everywhere by @NANDINI-star in #13813
  • Fix key edit settings after regenerating key by @NANDINI-star in #13815
  • [Feat] Add VertexAI qwen API Service by @ishaan-jaff in #13828
  • Add OTEL tracing for actual LLM API call by @krrishdholakia in #13836
  • [Performance] Improve LiteLLM Python SDK RPS by +200 RPS by @ishaan-jaff in #13839
  • Fix(bedrock): fix the api key support for bedrock guardrail in proxy by @0x-fang in #13835
  • Add rerank endpoint support for deepinfra by @kankute-sameer in #13820
  • fix : Synchronize cache behavior between acompletion and completion by @UlookEE in #13803
  • Include predicted output in MLflow tracing by @TomeHirata in #13795
  • Fix - Ensure Helm chart auto generated master keys follow sk-xxxx format by @ishaan-jaff in #13871
  • [Fix] Ensure Service Account Keys require team_id field on API Endpoints by @ishaan-jaff in #13873
  • Fix e2e_ui_test by @NANDINI-star in #13861
  • Fix Filter Dropdown UX Issue - Load Initial Options by @NANDINI-star in #13858
  • [Helm charts] Enhance database configuration: add support for optional endpointKey by @jugaldb in #13763
  • [Feat] Add new VertexAI image models vertex_ai/imagen-4.0-generate-001, vertex_ai/imagen-4.0-ultra-generate-001, vertex_ai/imagen-4.0-fast-generate-001 by @ishaan-jaff in #13874
  • [Feat] Add new Google AI Studio image models gemini/imagen-4.0-generate-001, gemini/imagen-4.0-ultra-generate-001, gemini/imagen-4.0-fast-generate-001 by @ishaan-jaff in #13876
  • Update Baseten LiteLLM integration by @philipkiely-baseten in #13783
  • Fix(Bedrock): fix application inference profile for pass-through endpoints for bedrock by @0x-fang in #13796
  • Fix e2e_ui_test by @NANDINI-star in #13881
  • [Performance] Use O(1) Set lookups for model routing by @ishaan-jaff in #13879
  • Update model metadata for Deepinfra provider by @Toy-97 in #13883
  • fix: fixing descriptor/response size mismatch on parallel_request_limiter_v3 by @luizrennocosta in #13863
  • [Feat] Add support for voyage-context-3 embedding model by @kankute-sameer in #13868
  • 🐛 Bug Fix: Updated URL handling for DataRobot provider URL by @carsongee in #13880
  • Async s3 implementation by @michal-otmianowski in #13852
  • fix: role chaining and session name with webauthentication for aws bedrock by @RichardoC in #13753
  • [Bug Fix] JS exception in User Agent Activity: Cannot read properties of undefined by @ishaan-jaff in #13892
  • [ui/dashboard] add support for host_vllm by @NiuBlibing in #13885
  • [Documentation] Litellm rerank deepinfra endpoint by @kankute-sameer in #13845
  • [MCP Gateway] fix StreamableHTTPSessionManager .run() error by @jugaldb in #13666
  • [Performance] Reduce Significant CPU overhead from litellm_logging.py by @ishaan-jaff in #13895
  • Fix Ollama transformations crash when tools are used with non-tool trained models by @bcdonadio in #13902
  • Add openrouter deepseek/deepseek-chat-v3.1 support by @kankute-sameer in #13897
  • docs: clarify prerequisites and env var for team rate limits by @TeddyAmkie in #13899
  • [Enhancement] Add support for Mistral model file handling and update documentation by @jinskjoy in #13866
  • fix permission access on prisma migrate in non-root image by @Ithanil in #13848
  • feat(utils.py): accept 'api_version' as param for validate_environment by @mainred in #13808 (see the sketch after this list)
  • Responses API - support allowed_openai_params + Mistral - handle empty assistant content + support new mistral 'thinking' response block by @krrishdholakia in #13671
  • fix(openai/image_edits): Support 'mask' parameter for openai image edits by @krrishdholakia in #13673
  • SSO - Free SSO usage for up to 5 users + remove deprecated dbrx models (dbrx-instruct, llama 3.1) by @krrishdholakia in #13843
  • Fix calling key with access to model alias by @krrishdholakia in #13830
  • [Feat] New LLM API - AI/ML API for Image Gen by @ishaan-jaff in #13893
  • [Perf] Improvements for Async Success Handler (Logging Callbacks) - Approx +130 RPS by @ishaan-jaff in #13905
  • Added FAQ under deployment docs by @mubashir1osmani in #13912
  • updated claude-code docs by @mubashir1osmani in #13784
  • [Feat] UI QA Fixes by @ishaan-jaff in #13915
  • [BUG] Add back supervisor to non-root image by @ArthurRenault in #13922
  • Add support for AWS assume_role with a session token by @stevenmanton in #13919
  • Fix missing and unused imports in custom_guardrail docs example by @uc4w6c in #13914
  • [UI QA] - Allow setting Team Member RPM/TPM limits when creating a team by @ishaan-jaff in #13943
  • [Bug fix] - Fix /messages fallback from Anthropic API -> Bedrock API by @ishaan-jaff in #13946
  • [Bug Fix] Azure Passthrough request with streaming by @ishaan-jaff in #13831
  • [Bug] Fix: Vertex Mistral not working for streaming by @ishaan-jaff in #13952
  • Add DeepSeek-v3.1 pricing for Fireworks AI provider by @TeddyAmkie in #13958
  • feat: add image headers for Copilot by @ckoehler in #13955
  • Verify if cache entry has expired prior to serving it to client by @michal-otmianowski in #13933
  • feat: multiple images in openai images/edits endpoint by @mubashir1osmani in #13916
  • Feature/braintrust span name metadata by @nielsbosma in #13573
  • fix: remove incorrect web search support for azure/gpt-4.1 family by @kankute-sameer in #13566
  • Update model prices and context window by @Yuki-Imajuku in #13567
  • [Feat] New model gemini-2.5-flash-image-preview by @ishaan-jaff in #13979
  • ⚡️ Speed up function _is_debugging_on by 45% by @codeflash-ai[bot] in #13988
  • [Bug]: Fix tests to reference moved attributes in braintrust_logging module by @ColeFrench in #13978
  • [Perf] 6.5x faster LiteLLM Python SDK Completion by @ishaan-jaff in #13990
  • [Perf] Use fastuuid for fast UUID generations - 2.1x Faster by @ishaan-jaff in #13992
  • bump orjson version to "3.11.2" by @dttran-glo in #13969
  • feat(constants): expand Nebius provider models and normalize model IDs by @manascb1344 in #13965
  • Deepinfra Metadata Update 24082025 by @Toy-97 in #13917
  • Add Noma Security guardrail support by @DorZion in #13572
  • Add openrouter gpt-5 family models pricing by @edwardsamuel in #13536
  • docs: Add CometAPI documentation with authentication, usage examples, and error handling by @TensorNull in #13534
  • Fix token_counter with special token input by @blahgeek in #13374
  • Enhance logging for containers to log on files both with usual format and json format by @Deviad in #13394
  • [Bug Fix] LLM Translation - Allow using dynamic api_key for image generation requests by @ishaan-jaff in #14007
  • [Feature]: Support Gemini requests with only system prompt by @ishaan-jaff in #14010
  • [Bug]: /responses endpoint proxy ignores extra_headers in GitHub Copilot by @XSAM in #13775
  • [Feat] langfuse_otel logger - allow using LANGFUSE_OTEL_HOST for configuring host by @ishaan-jaff in #14013
  • Fix issue #13995: Handle None metadata in batch requests by @xingyaoww in #13996
  • [Feat] Add support for returning images with gemini/gemini-2.5-flash-image-preview with /chat/completions by @ishaan-jaff in #13983
  • Update release notes with correct docker tag by @ishaan-jaff in #14014
  • ⚡️ Speed up InMemoryCache.evict_cache by 21% by @KRRT7 in #14012
  • [Bug Fix] Resolve invalid model name error for Gemini Imagen models by @ikaadil in #13991
  • feat: Add thinking and reasoning_effort parameter support for GitHub Copilot provider by @timelfrink in #13691
  • refactor(router): choose weights by 'weight', 'rpm', 'tpm' in one loop for simple_shuffle by @qidu in #13562
  • Update Pangea Guardrail to support new AIDR endpoint by @ryanmeans in #13160
  • Ensure that function_call_prompt extends system messages following its current schema by @nagyv in #13243
  • Remove vector store methods from global scope by @xywei in #12885
  • fix: make gemini and openai responses return reasoning by default by @aholmberg in #12865
  • Fix additional anyOf corner cases for Vertex AI Gemini tool calls - issue #11164 by @ericgtkb in #12797
  • feat: Add support for custom Anthropic-compatible API endpoints by @NoWall57 in #13945
  • [Bug Fix] Virtual keys with llm_api type cause Internal Server Error when using /anthropic/* and other llm passthrough routes by @ishaan-jaff in #14046
  • [MCP] Bug fix - adding SSE MCP tools - fix connection test when adding MCPs by @ishaan-jaff in #14048
  • [Perf] Use fastuuid for +80 RPS when using /chat/completions and other LLM endpoints by @ishaan-jaff in #14016
  • [Feat] Add xai/grok-code-fast model family by @ishaan-jaff in #14054
  • Fix LoggingWorker graceful shutdown to prevent CancelledError warnings by @lmwang9527 in #14050
  • Allow configuration to set threshold before request entry in spend log gets truncated by @WilsonSunBritten in #14042
  • [Helm charts] Enhance proxy_config configuration: add support for existing configmap by @Const-antine in #14041
  • 📖 Add DataRobot to the provider documentation. by @carsongee in #14038
  • Fix error saving latency as timedelta on Redis by @dmvieira in #14040
  • docs: add documentation for LITELLM_ANTHROPIC_DISABLE_URL_SUFFIX envi… by @NoWall57 in #14037
  • OCI Provider: add oci_key_file as an optional_parameter by @gotsysdba in #14036
  • Update docs for stream_timeout and timeout by @TeddyAmkie in #14073
  • [Perf] Proxy /chat/completions - don't print the request params by default (+50 RPS) by @ishaan-jaff in #14015
  • 📖 Added DataRobot provider to sidebar by @carsongee in #14074
  • [Bug]: Fix Can't set reasoning_effort for DeepSeek-V3.1 on DeepInfra by default by @ishaan-jaff in #14053
  • docs: usage-based routing perf warnings by @mubashir1osmani in #14080
  • [Bug]: grok-4 does not support frequency_penalty, litellm should drop this param for grok-4 by @ishaan-jaff in #14078
  • docs: Add documentation for MAX_STRING_LENGTH_PROMPT_IN_DB environmen… by @WilsonSunBritten in #14079
  • feat: add gpt-realtime models - gpt-realtime by @ishaan-jaff in #14082
  • Fix Next.js Security Vulnerabilities in UI Dashboard by @ishaan-jaff in #14084
  • [Fix] LiteLLM does not support new web_search tool (Responses API) by @ishaan-jaff in #14083
  • Add supported text field to anthropic citation response by @TomeHirata in #14026
  • Bedrock fix structure output by @moshemorad in #14005
  • Fix collapsible navbar design by @NANDINI-star in #14075
  • [Bug]: Set user from token user_id for OpenMeter integration by @betterthanbreakfast in #13152
  • Add Vercel AI Gateway provider by @joshualipman123 in #13144
  • Fix indentation in get_llm_provider_logic.py by @superpoussin22 in #14088
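
As a quick illustration of the validate_environment change above (#13808): a minimal sketch of checking provider credentials while passing an API version directly. The model name, API version value, and the return shape shown are assumptions for illustration; the parameter name api_version is taken from the PR title, so check your installed version for the exact signature.

import litellm

# Ask LiteLLM which environment variables are present/missing for a provider.
# "azure/gpt-4o" and "2024-02-01" are placeholder values for illustration only.
result = litellm.validate_environment(
    model="azure/gpt-4o",
    api_version="2024-02-01",  # parameter added in #13808
)
print(result)  # typically a dict like {"keys_in_environment": ..., "missing_keys": [...]}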

New Contributors

Full Changelog: v1.75.8-stable...v1.76.1-nightly

Docker Run LiteLLM Proxy

docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-v1.76.1-nightly
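
Once the container is running, the proxy serves an OpenAI-compatible API on port 4000. A minimal sketch of calling it from Python with the openai client, assuming a virtual key "sk-1234" and a model named "gpt-4o" are configured on the proxy (both are placeholders):

from openai import OpenAI

# Point the standard OpenAI client at the local LiteLLM proxy.
client = OpenAI(
    api_key="sk-1234",                 # placeholder virtual key
    base_url="http://localhost:4000",  # port published by the docker run above
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; use a model configured on your proxy
    messages=[{"role": "user", "content": "Hello from the LiteLLM proxy"}],
)
print(response.choices[0].message.content)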

Don't want to maintain your internal proxy? Get in touch 🎉

Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat

Load Test LiteLLM Proxy Results

Name              | Status    | Median (ms) | Average (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min (ms) | Max (ms)
/chat/completions | Passed ✅ | 160.0       | 192.40       | 6.28       | 0.0        | 1879          | 0             | 117.63   | 1498.51
Aggregated        | Passed ✅ | 160.0       | 192.40       | 6.28       | 0.0        | 1879          | 0             | 117.63   | 1498.51
