## What's Changed
- [Feat] Return `x-litellm-key-remaining-requests-{model}: 1` and `x-litellm-key-remaining-tokens-{model}: None` in response headers by @ishaan-jaff in #5259 (see the header-inspection sketch after this list)
- [Feat] Set tpm/rpm limits per Virtual Key + Model by @ishaan-jaff in #5256
- [Feat] Add Prometheus metric for remaining rpm/tpm limit per (model, api_key) by @ishaan-jaff in #5257
- [Feat] read model + API key tpm/rpm limits from db by @ishaan-jaff in #5258
- Pass-through endpoints for Gemini - Google AI Studio by @krrishdholakia in #5260
- Fix incorrect message length check in cost calculator by @dhlidongming in #5219
- [PRICING] Use specific llama2 and llama3 model names in Ollama by @kiriloman in #5221
- [Feat-Proxy] Set rpm/tpm limits per API key per model by @ishaan-jaff in #5261 (see the key-generation sketch after the Docker command below)
- Fixes the `tool_use` indexes not being correctly mapped by @Penagwin in #5232
- [Feat-Proxy] Use model access groups for teams by @ishaan-jaff in #5263
## New Contributors
- @dhlidongming made their first contribution in #5219
- @Penagwin made their first contribution in #5232
Full Changelog: v1.43.17...v1.43.18
## Docker Run LiteLLM Proxy
```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-v1.43.18
```
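Once the proxy is up, per-model tpm/rpm limits (#5256, #5261) can be attached to a virtual key at creation time. A minimal key-generation sketch; the field names `model_rpm_limit` / `model_tpm_limit`, the master key `sk-1234`, and the model name `gpt-3.5-turbo` are assumptions here, so verify the exact parameters against the LiteLLM virtual-key docs:

```shell
# Generate a virtual key with per-model rpm/tpm limits
# (field names are assumed from the PR titles; check the docs for your version)
curl http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
        "model_rpm_limit": {"gpt-3.5-turbo": 100},
        "model_tpm_limit": {"gpt-3.5-turbo": 100000}
      }'
```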
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
## Load Test LiteLLM Proxy Results
| Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /chat/completions | Passed ✅ | 84 | 98.74 | 6.53 | 0.0 | 1952 | 0 | 67.37 | 1687.18 |
| Aggregated | Passed ✅ | 84 | 98.74 | 6.53 | 0.0 | 1952 | 0 | 67.37 | 1687.18 |