## What's Changed
- [Feat] Return `x-litellm-key-remaining-requests-{model}: 1` and `x-litellm-key-remaining-tokens-{model}: None` in response headers by @ishaan-jaff in #5259 (see the header-inspection sketch after this list)
- [Feat] Set tpm/rpm limits per Virtual Key + Model by @ishaan-jaff in #5256
- [Feat] Add Prometheus metric for remaining rpm/tpm limit per (model, api_key) by @ishaan-jaff in #5257
- [Feat] read model + API key tpm/rpm limits from db by @ishaan-jaff in #5258
- Pass-through endpoints for Gemini - Google AI Studio by @krrishdholakia in #5260
- Fix incorrect message length check in cost calculator by @dhlidongming in #5219
- [PRICING] Use specific llama2 and llama3 model names in Ollama by @kiriloman in #5221
- [Feat-Proxy] Set rpm/tpm limits per API key per model by @ishaan-jaff in #5261 (see the key-generation sketch after the Docker command below)
- Fixes the `tool_use` indexes not being correctly mapped by @Penagwin in #5232
- [Feat-Proxy] Use model access groups for teams by @ishaan-jaff in #5263
## New Contributors
- @dhlidongming made their first contribution in #5219
- @Penagwin made their first contribution in #5232
Full Changelog: v1.43.17...v1.43.18
## Docker Run LiteLLM Proxy
```shell
docker run \
  -e STORE_MODEL_IN_DB=True \
  -p 4000:4000 \
  ghcr.io/berriai/litellm:main-v1.43.18
```
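Once the proxy is up, per-model tpm/rpm limits (#5256, #5261) can be attached to a virtual key at creation time. A minimal key-generation sketch; the field names `model_rpm_limit` / `model_tpm_limit`, the master key `sk-1234`, and the model name `gpt-3.5-turbo` are assumptions here, so verify the exact parameters against the LiteLLM virtual-key docs:

```shell
# Generate a virtual key with per-model rpm/tpm limits
# (field names are assumed from the PR titles; check the docs for your version)
curl http://localhost:4000/key/generate \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
        "model_rpm_limit": {"gpt-3.5-turbo": 100},
        "model_tpm_limit": {"gpt-3.5-turbo": 100000}
      }'
```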
Don't want to maintain your internal proxy? Get in touch 🎉
Hosted Proxy Alpha: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
## Load Test LiteLLM Proxy Results
| Name | Status | Median Response Time (ms) | Average Response Time (ms) | Requests/s | Failures/s | Request Count | Failure Count | Min Response Time (ms) | Max Response Time (ms) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| /chat/completions | Passed ✅ | 84 | 98.74 | 6.53 | 0.0 | 1952 | 0 | 67.37 | 1687.18 |
| Aggregated | Passed ✅ | 84 | 98.74 | 6.53 | 0.0 | 1952 | 0 | 67.37 | 1687.18 |