BerriAI/litellm v1.23.7 on GitHub

[FEAT] ui - view total proxy spend / budget by @ishaan-jaff in #1915
[FEAT] Bedrock set timeouts on litellm.completion by @ishaan-jaff in #1919
[FEAT] Use LlamaIndex with Proxy - Support azure deployments for /embeddings - by @ishaan-jaff in #1921
[FIX] Verbose Logger - don't double print CURL command by @ishaan-jaff in #1924
[FEAT] Set timeout for bedrock on proxy by @ishaan-jaff in #1922
feat(proxy_server.py): show admin global spend as time series data by @krrishdholakia in #1920

1. Bedrock Set Timeouts

Usage - litellm.completion

import litellm

response = litellm.completion(
    model="bedrock/anthropic.claude-instant-v1",
    timeout=0.01,  # seconds; deliberately tiny so the call times out
    messages=[{"role": "user", "content": "hello, write a 20 pg essay"}],
)

Usage on Proxy config.yaml

model_list:
  - model_name: BEDROCK_GROUP
    litellm_params:
      model: bedrock/cohere.command-text-v14
      timeout: 0.0001
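
As a rough sketch of how a client might target the BEDROCK_GROUP model group above, the snippet below builds (but does not send) an OpenAI-style chat request using only the standard library. The base URL and the sk-1234 key are borrowed from the LlamaIndex example in these notes; the /chat/completions path and the helper name build_chat_request are assumptions for illustration, not something this release documents. The 0.0001 second timeout in the config is presumably chosen so that the call times out rather than completes.

```python
import json
import urllib.request

# Assumed values: proxy address and placeholder key as used elsewhere in these notes.
PROXY_BASE_URL = "http://0.0.0.0:4000"
PROXY_API_KEY = "sk-1234"


def build_chat_request(model_group: str, prompt: str) -> urllib.request.Request:
    """Hypothetical helper: build a POST request for the proxy without sending it."""
    payload = {
        "model": model_group,  # the model_name from config.yaml
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{PROXY_BASE_URL}/chat/completions",  # assumed OpenAI-compatible route
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {PROXY_API_KEY}",
        },
        method="POST",
    )


request = build_chat_request("BEDROCK_GROUP", "hello, write a 20 pg essay")
print(request.full_url)
```

Passing the request to urllib.request.urlopen would send it to a running proxy; that step is left out here so the sketch runs without one.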

2. View total proxy spend / budget

3. Use LlamaIndex with Proxy - Support azure deployments for /embeddings

Send embedding requests to a URL like this:

http://0.0.0.0:4000/openai/deployments/azure-embedding-model/embeddings?api-version=2023-07-01-preview

This allows users to use the LlamaIndex AzureOpenAI classes with LiteLLM.
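
To make the shape of that URL explicit, here is a minimal sketch that assembles it from its parts. The function name azure_embeddings_url is hypothetical; the base URL, deployment name, and API version are the ones shown above.

```python
from urllib.parse import quote, urlencode


def azure_embeddings_url(base_url: str, deployment: str, api_version: str) -> str:
    """Hypothetical helper: build an Azure-style embeddings URL for the proxy."""
    # The deployment name is a path segment; the API version is a query parameter.
    path = f"/openai/deployments/{quote(deployment, safe='')}/embeddings"
    query = urlencode({"api-version": api_version})
    return f"{base_url.rstrip('/')}{path}?{query}"


url = azure_embeddings_url(
    "http://0.0.0.0:4000",
    "azure-embedding-model",
    "2023-07-01-preview",
)
print(url)
```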

Use LlamaIndex with LiteLLM Proxy

from dotenv import load_dotenv

# Load any environment variables defined in a local .env file
load_dotenv()

from llama_index.llms import AzureOpenAI
from llama_index.embeddings import AzureOpenAIEmbedding
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext

llm = AzureOpenAI(
    engine="azure-gpt-3.5",
    temperature=0.0,
    azure_endpoint="http://0.0.0.0:4000",
    api_key="sk-1234",
    api_version="2023-07-01-preview",
)

embed_model = AzureOpenAIEmbedding(
    deployment_name="azure-embedding-model",
    azure_endpoint="http://0.0.0.0:4000",
    api_key="sk-1234",
    api_version="2023-07-01-preview",
)


# response = llm.complete("The sky is a beautiful blue and")
# print(response)

documents = SimpleDirectoryReader("llama_index_data").load_data()
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)

query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)

Full Changelog: v1.23.5...v1.23.7