## What's Changed
- feat(internal_user_endpoints.py): new `/user/bulk_update` endpoint by @krrishdholakia in #12720 (sketch after this list)
- fix(team_endpoints.py): ensure user id correctly added when new team … by @krrishdholakia in #12719
- Teams - allow setting custom key duration + show how many user + service account keys have been created by @krrishdholakia in #12722
- Regenerate Key State Management and Authentication Issues by @NANDINI-star in #12729
- Fix AsyncMock error in team endpoints test by @colesmcintosh in #12730
- [Feat] Add `azure_ai/grok-3` model family + Cost tracking by @ishaan-jaff in #12732
- fixed comment in docs for anthropic provider by @jvanmelckebeke in #12725
- [Bug Fix] QA - Use PG Vector Vector Store with LiteLLM by @ishaan-jaff in #12716
- [Bug fix] s3 v2 log uploader crashes when using with guardrails by @ishaan-jaff in #12733
- chore(proxy): loosen rich version from ==13.7.1 to >=13.7.1 by @jlaurendi in #12704
- Add Hosted VLLM rerank provider integration by @jugaldb in #12738
- /streamGenerateContent - non-gemini model support by @krrishdholakia in #12647
- Anthropic - add tool cache control support by @krrishdholakia in #12668 (sketch after this list)
- Health check app on separate port by @jugaldb in #12718
- Guardrails AI - support `llmOutput` based guardrails as pre-call hooks by @krrishdholakia in #12674
- [Prometheus] Move Prometheus to enterprise folder by @jugaldb in #12659
- [jais-30b-chat] added model to prices and context window by @jugaldb in #12739
- feat: integrate Google Cloud Model Armor guardrails by @colesmcintosh in #12492
- Add project_id to cached credentials for VertexAI by @doublerr in #12661
- [Feat] UI - Allow clicking into Vector Stores by @ishaan-jaff in #12741
- fix(lowest_latency.py): Handle ZeroDivisionError with zero completion tokens by @colesmcintosh in #12734
- build(deps): bump on-headers and compression in /docs/my-website by @dependabot[bot] in #12721
- [LLM Translation] Change System prompts to assistant prompts as a workaround for GH Copilot by @jugaldb in #12742
- [LLM Translation - Redis] fix: redis caching for embedding response models by @jugaldb in #12750
- [LLM Translation] Added model name formats by @jugaldb in #12745
- [Feat] LLM API Endpoint - Expose OpenAI Compatible /vector_stores/{vector_store_id}/search endpoint by @ishaan-jaff in #12749 (sketch after this list)
- [Feat] UI Vector Stores - Allow adding Vertex RAG Engine, OpenAI, Azure by @ishaan-jaff in #12752
- feat: add v0 provider support by @colesmcintosh in #12751
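
A minimal sketch of the new `/user/bulk_update` endpoint from #12720, assuming a proxy on `localhost:4000` and a master key of `sk-1234`; the payload fields (`users`, `user_id`, `max_budget`) are illustrative guesses modeled on the existing `/user/update` route, not the confirmed schema — check the PR for the actual request shape.

```shell
# Hypothetical payload -- field names are assumptions, not the confirmed schema (see #12720).
curl -X POST 'http://localhost:4000/user/bulk_update' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "users": [
      {"user_id": "user-1", "max_budget": 100.0},
      {"user_id": "user-2", "max_budget": 50.0}
    ]
  }'
```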
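A sketch of Anthropic tool cache control (#12668) through the proxy's `/chat/completions` route. The `cache_control: {"type": "ephemeral"}` placement on the tool follows Anthropic's prompt-caching convention; the model name, key, and tool definition are placeholders, so verify the exact field placement against the PR.

```shell
# Mark a tool definition as cacheable -- cache_control placement follows
# Anthropic's ephemeral prompt-caching convention (verify against #12668).
curl -X POST 'http://localhost:4000/v1/chat/completions' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "claude-3-5-sonnet-20240620",
    "messages": [{"role": "user", "content": "What is the weather in SF?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "parameters": {"type": "object", "properties": {"location": {"type": "string"}}}
      },
      "cache_control": {"type": "ephemeral"}
    }]
  }'
```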
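The OpenAI-compatible vector store search route from #12749 can be exercised with a plain POST; the vector store ID, key, and query below are placeholders.

```shell
# Query an existing vector store through the OpenAI-compatible route from #12749.
curl -X POST 'http://localhost:4000/v1/vector_stores/vs_123/search' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{"query": "what is litellm?"}'
```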
## New Contributors
- @jvanmelckebeke made their first contribution in #12725
- @jlaurendi made their first contribution in #12704
- @doublerr made their first contribution in #12661
**Full Changelog**: v1.74.5.dev1...v1.74.6-nightly
## Docker Run LiteLLM Proxy
```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.74.6-nightly
```
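
Once the container is up, a quick smoke test against the proxy; the key and model name are placeholders for whatever your deployment configures.

```shell
# Verify the proxy answers on port 4000 -- swap in a model configured in your DB/config.
curl -X POST 'http://localhost:4000/chat/completions' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "hello"}]}'
```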
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
## Load Test LiteLLM Proxy Results
| Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /chat/completions | Passed ✅ | 220.0 | 237.64 | 6.24 | 0.0 | 1868 | 0 | 193.61 | 1787.48 |
| Aggregated | Passed ✅ | 220.0 | 237.64 | 6.24 | 0.0 | 1868 | 0 | 193.61 | 1787.48 |