BerriAI/litellm v1.44.8 on GitHub

What's Changed

fix(sagemaker.py): support streaming for messages api by @krrishdholakia in #5376
feat(vertex_httpx.py): support 'functions' param for gemini google ai studio + vertex ai by @krrishdholakia in #5368
gemini context caching (openai format) support by @krrishdholakia in #5381
fix(factory.py): handle missing 'content' in cohere assistant messages by @miraclebakelaser in #5384
fix retry after - cooldown individual models based on their specific 'retry-after' header by @krrishdholakia in #5358
[Feat] Add Vertex AI21 support by @ishaan-jaff in #5391
[Feat] Add cohere rerank and together ai rerank by @ishaan-jaff in #5392
feat - add rerank on litellm proxy / gateway by @ishaan-jaff in #5394
build(deps): bump webpack from 5.93.0 to 5.94.0 in /docs/my-website by @dependabot in #5395
docs: add time.sleep() between streaming calls by @ajeetdsouza in #5402

Full Changelog: v1.44.7...v1.44.8

docker run \
-e STORE_MODEL_IN_DB=True \
-p 4000:4000 \
ghcr.io/berriai/litellm:main-v1.44.8

Name	Status	Median Response Time (ms)	Average Response Time (ms)	Requests/s	Failures/s	Request Count	Failure Count	Min Response Time (ms)	Max Response Time (ms)
/chat/completions	Passed ✅	85	103.94574065444324	6.4691141286237	0.0	1936	0	71.05850200002806	2103.6041039999986
Aggregated	Passed ✅	85	103.94574065444324	6.4691141286237	0.0	1936	0	71.05850200002806	2103.6041039999986