BerriAI/litellm v1.41.4 on GitHub

What's Changed

fix(router.py): disable cooldowns by @krrishdholakia in #4497
fix(slack_alerting.py): use in-memory cache for checking request status by @krrishdholakia in #4520
feat(vertex_httpx.py): Support cachedContent. by @Manouchehri in #4492
[Fix+Test] /audio/transcriptions - use initialized OpenAI / Azure OpenAI clients by @ishaan-jaff in #4519
[Fix-Proxy] Background health checks use deep copy of model list for _run_background_health_check by @ishaan-jaff in #4518
refactor(azure.py): move azure dall-e calls to httpx client by @krrishdholakia in #4523
feat(dynamic_rate_limiter.py): support dynamic rate limiting on rpm by @krrishdholakia in #4502
[Enterprise] Check if Key should run secret_detection callback by @ishaan-jaff in #4524
[Feat] Control Lakera AI per Request by @ishaan-jaff in #4525

Full Changelog: v1.41.3...v1.41.4

docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.41.4

docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.41.4

Name	Status	Median Response Time (ms)	Average Response Time (ms)	Requests/s	Failures/s	Request Count	Failure Count	Min Response Time (ms)	Max Response Time (ms)
/chat/completions	Passed ✅	150.0	174.7746187513257	6.304554345115949	0.0	1886	0	120.43884399997751	1842.0690810000337
Aggregated	Passed ✅	150.0	174.7746187513257	6.304554345115949	0.0	1886	0	120.43884399997751	1842.0690810000337