- [FEAT] ui - view total proxy spend / budget by @ishaan-jaff in #1915
- [FEAT] Bedrock set timeouts on litellm.completion by @ishaan-jaff in #1919
- [FEAT] Use LlamaIndex with Proxy - Support azure deployments for /embeddings - by @ishaan-jaff in #1921
- [FIX] Verbose Logger - don't double print CURL command by @ishaan-jaff in #1924
- [FEAT] Set timeout for bedrock on proxy by @ishaan-jaff in #1922
- feat(proxy_server.py): show admin global spend as time series data by @krrishdholakia in #1920
1. Bedrock Set Timeouts
Usage - litellm.completion
import litellm

response = litellm.completion(
    model="bedrock/anthropic.claude-instant-v1",
    timeout=0.01,
    messages=[{"role": "user", "content": "hello, write a 20 pg essay"}],
)
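A timeout this small is presumably chosen so that the call fails rather than returns, so the caller needs to handle that failure. The sketch below shows one way to do it. `call_with_fallback` and `fake_slow_call` are hypothetical names used only for illustration, not part of LiteLLM, and the exception classes to catch are passed in rather than assumed.

```python
from typing import Any, Callable, Optional, Tuple, Type


def call_with_fallback(
    call: Callable[[], Any],
    timeout_errors: Tuple[Type[BaseException], ...],
    fallback: Optional[Any] = None,
) -> Any:
    """Run `call`; return `fallback` if it raises one of `timeout_errors`."""
    try:
        return call()
    except timeout_errors as exc:
        # Report the timeout and hand back the fallback value instead of crashing.
        print(f"request timed out: {exc!r}")
        return fallback


# Stand-in for a slow model call, so the sketch runs without network access.
def fake_slow_call() -> str:
    raise TimeoutError("simulated timeout")


result = call_with_fallback(fake_slow_call, (TimeoutError,), fallback="(no response)")
print(result)
```

With LiteLLM, `call` would wrap the `litellm.completion(...)` request above in a `lambda`, and `timeout_errors` would hold whichever timeout exception class your installed version raises (likely `litellm.exceptions.Timeout`, but verify this against your version).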
Usage on Proxy config.yaml
model_list:
  - model_name: BEDROCK_GROUP
    litellm_params:
      model: bedrock/cohere.command-text-v14
      timeout: 0.0001
2. View total proxy spend / budget
3. Use LlamaIndex with Proxy - Support azure deployments for /embeddings
Send embedding requests to an Azure-style deployment URL like this:
http://0.0.0.0:4000/openai/deployments/azure-embedding-model/embeddings?api-version=2023-07-01-preview
This allows users to use the LlamaIndex AzureOpenAI classes with the LiteLLM proxy.
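The URL above follows the Azure OpenAI path layout: proxy base URL, deployment name, then the `api-version` query parameter. As a minimal sketch of how those pieces combine, the helper below rebuilds it using only the standard library. `azure_embeddings_url` is a hypothetical name for illustration, not a LiteLLM or LlamaIndex function.

```python
from urllib.parse import quote, urlencode


def azure_embeddings_url(base_url: str, deployment: str, api_version: str) -> str:
    """Build an Azure-style embeddings URL for a given deployment."""
    # Escape the deployment name so it is safe to use as a single path segment.
    path = f"/openai/deployments/{quote(deployment, safe='')}/embeddings"
    query = urlencode({"api-version": api_version})
    return f"{base_url.rstrip('/')}{path}?{query}"


url = azure_embeddings_url(
    "http://0.0.0.0:4000",      # proxy address from the example above
    "azure-embedding-model",    # deployment name from the example above
    "2023-07-01-preview",
)
print(url)
```

An Azure OpenAI client pointed at the proxy would normally build this URL itself from the endpoint, deployment name, and API version it is given, so the helper is mainly useful for checking a request by hand.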
Use LlamaIndex with LiteLLM Proxy
from dotenv import load_dotenv

load_dotenv()  # load environment variables from a local .env file, if present

from llama_index.llms import AzureOpenAI
from llama_index.embeddings import AzureOpenAIEmbedding
from llama_index import VectorStoreIndex, SimpleDirectoryReader, ServiceContext
llm = AzureOpenAI(
    engine="azure-gpt-3.5",
    temperature=0.0,
    azure_endpoint="http://0.0.0.0:4000",
    api_key="sk-1234",
    api_version="2023-07-01-preview",
)
embed_model = AzureOpenAIEmbedding(
    deployment_name="azure-embedding-model",
    azure_endpoint="http://0.0.0.0:4000",
    api_key="sk-1234",
    api_version="2023-07-01-preview",
)
# response = llm.complete("The sky is a beautiful blue and")
# print(response)
documents = SimpleDirectoryReader("llama_index_data").load_data()
service_context = ServiceContext.from_defaults(llm=llm, embed_model=embed_model)
index = VectorStoreIndex.from_documents(documents, service_context=service_context)
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)
Full Changelog: v1.23.5...v1.23.7