BerriAI/litellm v1.18.12


What's Changed

1. UI Improvements by @ishaan-jaff in #1575

[Screenshot: updated proxy UI]

New Endpoints - [Spend Tracking]

  • /spend/keys - View all keys created, ordered by spend
  • /spend/logs - View all spend logs; if request_id is provided, only logs for that request_id are returned (a sample call follows the example response below)

Example response for /spend/logs

[{"request_id":"chatcmpl-6dcb2540-d3d7-4e49-bb27-291f863f112e","call_type":"acompletion","api_key":"51**","spend":7.25e-05,"startTime":"2024-01-18T18:07:54.071000Z","endTime":"2024-01-18T18:07:55.510000Z","model":"cohere.command-text-v14","user":"litellm-is-awesome-user","modelParameters":{},"messages":[{"role":"user","content":"what llm are you-444"}],"response":"{\"id\":\"chatcmpl-6dcb2540-d3d7-4e49-bb27-291f863f112e\",\"choices\":[{\"finish_reason\":\"stop\",\"index\":0,\"message\":{\"content\":\" I am an AI chatbot, trained as an LLM by Cohere. Is there anything else you would like to know about LLMs? \",\"role\":\"assistant\"}}],\"created\":1705630075,\"model\":\"cohere.command-text-v14\",\"object\":\"chat.completion\",\"system_fingerprint\":null,\"usage\":{\"prompt_tokens\":7,\"completion_tokens\":31,\"total_tokens\":38}}","usage":{"total_tokens":38,"prompt_tokens":7,"completion_tokens":31},"metadata":{"headers":{"host":"0.0.0.0:8000","accept":"*/*","user-agent":"curl/7.88.1","content-type":"application/json","authorization":"Bearer sk-zNhplEZxJ-yOuZkqRvwO_g","content-length":"224"},"endpoint":"http://0.0.0.0:8000/chat/completions","deployment":"bedrock/cohere.command-text-v14","model_group":"BEDROCK_GROUP","user_api_key":"2b6e4ff4c14f67b6ccd7318948f0d66fe8724099224baeaf4733792a2048aeb3","caching_groups":null,"user_api_key_user_id":"ishaan2@berri.ai","user_api_key_metadata":{}},"cache_hit":""}]

2. [Feat] Make Proxy Auth Exceptions OpenAI compatible by @ishaan-jaff in #1576

🚨 If you use Custom Auth on the Proxy, it will now raise OpenAI-compatible auth errors: https://docs.litellm.ai/docs/proxy/virtual_keys#custom-auth
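
Per the linked docs, custom auth is a function the proxy config points at; a minimal sketch of one that rejects unknown keys (the key check and error text here are illustrative):

# Minimal custom auth sketch; wire it up via the custom_auth setting in your proxy config
from fastapi import Request
from litellm.proxy._types import UserAPIKeyAuth

async def user_api_key_auth(request: Request, api_key: str) -> UserAPIKeyAuth:
    if api_key == "sk-expected-key":  # replace with your own validation
        return UserAPIKeyAuth(api_key=api_key)
    # with this PR, this error reaches OpenAI clients in the OpenAI error format
    raise Exception("invalid user key - token does not exist")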

Auth Errors Before PR - using OpenAI JS/Python

AuthenticationError: 401 status code (no body)
    at APIError.generate (/Users/ishaanjaffer/Github/litellm/litellm/proxy/tests/node_modules/openai/error.js:47:20)
    at OpenAI.makeStatusError (/Users/ishaanjaffer/Github/litellm/litellm/proxy/tests/node_modules/openai/core.js:255:33)
    at OpenAI.makeRequest (/Users/ishaanjaffer/Github/litellm/litellm/proxy/tests/node_modules/openai/core.js:294:30)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async runOpenAI (/Users/ishaanjaffer/Github/litellm/litellm/proxy/tests/test_openai_js.js:12:22) {
  status: 401,
  headers: {
    'content-length': '117',
    'content-type': 'application/json',
    date: 'Mon, 22 Jan 2024 21:21:14 GMT',
    server: 'uvicorn'
  },
  error: undefined,
  code: undefined,
  param: undefined,
  type: undefined
}

Auth Errors After PR - using OpenAI JS/Python

AuthenticationError: 401 Authentication Error: invalid user key - token does not exist
    at APIError.generate (/Users/ishaanjaffer/Github/litellm/litellm/proxy/tests/node_modules/openai/error.js:47:20)
    at OpenAI.makeStatusError (/Users/ishaanjaffer/Github/litellm/litellm/proxy/tests/node_modules/openai/core.js:255:33)
    at OpenAI.makeRequest (/Users/ishaanjaffer/Github/litellm/litellm/proxy/tests/node_modules/openai/core.js:294:30)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async runOpenAI (/Users/ishaanjaffer/Github/litellm/litellm/proxy/tests/test_openai_js.js:12:22) {
  status: 401,
  headers: {
    'content-length': '131',
    'content-type': 'application/json',
    date: 'Tue, 23 Jan 2024 00:26:52 GMT',
    server: 'uvicorn'
  },
  error: {
    message: 'Authentication Error: invalid user key - token does not exist',
    type: 'auth_error',
    param: 'None',
    code: 401
  },
  code: 401,
  param: 'None',
  type: 'auth_error'
}
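
The same structured error can be caught from the OpenAI Python SDK; a minimal sketch (assumes openai v1+ and a proxy at http://0.0.0.0:8000, with a deliberately invalid key):

# Minimal sketch: catching the OpenAI-compatible auth error (openai>=1.0)
import openai

client = openai.OpenAI(base_url="http://0.0.0.0:8000", api_key="sk-bad-key")

try:
    client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": "hi"}],
    )
except openai.AuthenticationError as e:
    print(e.status_code)  # 401
    print(e.body)         # structured body: message, type, param, code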

3. [Fix] Streaming - Use same response_id across chunks by @ishaan-jaff in #1572

Streaming Responses (for non-OpenAI/Azure providers) After this PR - the id is the same across all chunks (a sketch verifying this follows the log below)

ModelResponse(id='chatcmpl-8e5a8a5f-46ba-47d9-8858-c4d81aa4efc2', choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(content='The', role='assistant'))], created=1706043294, model='berri-benchmarking-Llama-2-70b-chat-hf-4', object='chat.completion.chunk', system_fingerprint=None, usage=Usage())
chatcmpl-8e5a8a5f-46ba-47d9-8858-c4d81aa4efc2
ModelResponse(id='chatcmpl-8e5a8a5f-46ba-47d9-8858-c4d81aa4efc2', choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(content=' sky', role=None))], created=1706043294, model='berri-benchmarking-Llama-2-70b-chat-hf-4', object='chat.completion.chunk', system_fingerprint=None, usage=Usage())
chatcmpl-8e5a8a5f-46ba-47d9-8858-c4d81aa4efc2
ModelResponse(id='chatcmpl-8e5a8a5f-46ba-47d9-8858-c4d81aa4efc2', choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(content=',', role=None))], created=1706043294, model='berri-benchmarking-Llama-2-70b-chat-hf-4', object='chat.completion.chunk', system_fingerprint=None, usage=Usage())
chatcmpl-8e5a8a5f-46ba-47d9-8858-c4d81aa4efc2
ModelResponse(id='chatcmpl-8e5a8a5f-46ba-47d9-8858-c4d81aa4efc2', choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(content=' a', role=None))], created=1706043294, model='berri-benchmarking-Llama-2-70b-chat-hf-4', object='chat.completion.chunk', system_fingerprint=None, usage=Usage())
chatcmpl-8e5a8a5f-46ba-47d9-8858-c4d81aa4efc2
ModelResponse(id='chatcmpl-8e5a8a5f-46ba-47d9-8858-c4d81aa4efc2', choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(content=' canvas', role=None))], created=1706043294, model='berri-benchmarking-Llama-2-70b-chat-hf-4', object='chat.completion.chunk', system_fingerprint=None, usage=Usage())
chatcmpl-8e5a8a5f-46ba-47d9-8858-c4d81aa4efc2
ModelResponse(id='chatcmpl-8e5a8a5f-46ba-47d9-8858-c4d81aa4efc2', choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(content=' of', role=None))], created=1706043294, model='berri-benchmarking-Llama-2-70b-chat-hf-4', object='chat.completion.chunk', system_fingerprint=None, usage=Usage())
chatcmpl-8e5a8a5f-46ba-47d9-8858-c4d81aa4efc2
ModelResponse(id='chatcmpl-8e5a8a5f-46ba-47d9-8858-c4d81aa4efc2', choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(content=' blue', role=None))], created=1706043294, model='berri-benchmarking-Llama-2-70b-chat-hf-4', object='chat.completion.chunk', system_fingerprint=None, usage=Usage())
chatcmpl-8e5a8a5f-46ba-47d9-8858-c4d81aa4efc2
ModelResponse(id='chatcmpl-8e5a8a5f-46ba-47d9-8858-c4d81aa4efc2', choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(content=' and', role=None))], created=1706043294, model='berri-benchmarking-Llama-2-70b-chat-hf-4', object='chat.completion.chunk', system_fingerprint=None, usage=Usage())
chatcmpl-8e5a8a5f-46ba-47d9-8858-c4d81aa4efc2
ModelResponse(id='chatcmpl-8e5a8a5f-46ba-47d9-8858-c4d81aa4efc2', choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(content=' gold', role=None))], created=1706043294, model='berri-benchmarking-Llama-2-70b-chat-hf-4', object='chat.completion.chunk', system_fingerprint=None, usage=Usage())
chatcmpl-8e5a8a5f-46ba-47d9-8858-c4d81aa4efc2
ModelResponse(id='chatcmpl-8e5a8a5f-46ba-47d9-8858-c4d81aa4efc2', choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(content=',', role=None))], created=1706043294, model='berri-benchmarking-Llama-2-70b-chat-hf-4', object='chat.completion.chunk', system_fingerprint=None, usage=Usage())
chatcmpl-8e5a8a5f-46ba-47d9-8858-c4d81aa4efc2
ModelResponse(id='chatcmpl-8e5a8a5f-46ba-47d9-8858-c4d81aa4efc2', choices=[StreamingChoices(finish_reason=None, index=0, delta=Delta(content='\n', role=None))], created=1706043294, model='berri-benchmarking-Llama-2-70b-chat-hf-4', object='chat.completion.chunk', system_fingerprint=None, usage=Usage())
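
A minimal sketch that checks this behavior (the model name is illustrative; any non-OpenAI/Azure deployment configured in litellm works):

# Minimal sketch: every streamed chunk should now share one response_id
import litellm

response = litellm.completion(
    model="bedrock/cohere.command-text-v14",  # illustrative model
    messages=[{"role": "user", "content": "why is the sky blue?"}],
    stream=True,
)

ids = {chunk.id for chunk in response}
assert len(ids) == 1  # after this PR: a single id shared by all chunks
print(ids)
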
  • [WIP] fix(utils.py): fix proxy streaming spend tracking by @krrishdholakia in #1574
  • feat(proxy/utils.py): Support keys with budget duration (resets spend on duration end) by @krrishdholakia in #1570; a sketch of generating such a key follows
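
A minimal sketch of generating such a key (the /key/generate endpoint is the proxy's key-management API; the budget_duration value "30d" is an assumption meaning the key's spend resets every 30 days):

# Minimal sketch: a key whose spend resets on a 30-day cycle
import requests

resp = requests.post(
    "http://0.0.0.0:8000/key/generate",
    headers={"Authorization": "Bearer sk-1234"},  # master key (assumption)
    json={"max_budget": 10.0, "budget_duration": "30d"},
)
print(resp.json())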

Full Changelog: v1.18.11...v1.18.12
