What's Changed
- Update ollama.py for image handling by @rick-github in #2888
- fix(anthropic.py): fix parallel streaming on anthropic.py by @krrishdholakia in #3883
- feat(proxy_server.py): Time to first token Request-level breakdown by @krrishdholakia in #3886
- [BETA-Feature] Add OpenAI `v1/batches` Support on LiteLLM SDK by @ishaan-jaff in #3882 (usage sketch after this list)
- feat - router add abatch_completion - N Models, M Messages by @ishaan-jaff in #3889 (usage sketch after this list)
- [Feat] LiteLLM Proxy Add `POST /v1/files` and `GET /v1/files` by @ishaan-jaff in #3888
- [Feat] LiteLLM Proxy - Add support for `POST /v1/batches`, `GET /v1/batches` by @ishaan-jaff in #3885
- feat(router.py): support fastest response batch completion call by @krrishdholakia in #3887
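The two batch-related additions are easiest to see in code. First, a minimal sketch of driving the proxy's new files/batches endpoints with the OpenAI SDK, assuming the proxy mirrors OpenAI's files/batches surface; the base URL, API key, and file name below are placeholders:

```python
from openai import OpenAI

# Point the OpenAI SDK at the LiteLLM proxy (placeholder URL and key).
client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

# POST /v1/files - upload a .jsonl file of requests to run as a batch.
batch_file = client.files.create(
    file=open("batch_requests.jsonl", "rb"),
    purpose="batch",
)

# POST /v1/batches - start a batch job over the uploaded file.
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# GET /v1/batches - poll the job's status.
print(client.batches.retrieve(batch.id).status)
```

Second, a sketch of the router's new abatch_completion, assuming the call shape described in the PR title (N models, one message list); the model aliases and deployments here are made up for illustration:

```python
import asyncio
from litellm import Router

# Two placeholder deployments the router can fan out to.
router = Router(
    model_list=[
        {"model_name": "gpt-3.5-turbo", "litellm_params": {"model": "gpt-3.5-turbo"}},
        {"model_name": "claude-3-haiku", "litellm_params": {"model": "claude-3-haiku-20240307"}},
    ]
)

async def main():
    # Send the same messages to N models and collect every response.
    responses = await router.abatch_completion(
        models=["gpt-3.5-turbo", "claude-3-haiku"],
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(responses)

asyncio.run(main())
```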
Full Changelog: v1.38.12...v1.39.2
Docker Run LiteLLM Proxy
```shell
docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.39.2
```
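Once the container is up, the proxy speaks the OpenAI-compatible chat API on port 4000. A quick smoke test; the model name and key below are placeholders for whatever you have configured on the proxy:

```python
from openai import OpenAI

# Talk to the proxy started by the docker run command above.
client = OpenAI(base_url="http://localhost:4000", api_key="sk-1234")

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # placeholder: any model configured on the proxy
    messages=[{"role": "user", "content": "Hello from LiteLLM!"}],
)
print(response.choices[0].message.content)
```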
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
Load Test LiteLLM Proxy Results
| Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
|---|---|---|---|---|---|---|---|---|---|
| /chat/completions | Passed ✅ | 72 | 83.47 | 6.53 | 0.0 | 1954 | 0 | 61.38 | 678.45 |
| Aggregated | Passed ✅ | 72 | 83.47 | 6.53 | 0.0 | 1954 | 0 | 61.38 | 678.45 |