Model Capabilities
The new `capabilities` configuration option makes it easy to define which modalities a model supports for input and output, along with metadata such as tool calling, reranking and context length. Here is what the configuration looks like:
```yaml
models:
  example_model:
    # capabilities: defines what the model accepts for input, output and other metadata
    # - optional; omitted or all-zero means no capabilities
    # - used in v1/models to inform clients what the model can do
    capabilities:
      # in: list of modalities understood by the model
      # - default: []
      # - valid: text, audio, image
      in:
        - text
        - audio
        - image
      # out: list of modalities generated by the model
      # - default: []
      # - valid: text, audio, image
      out:
        - text
        - audio
        - image
      # tools: the model supports function calling
      # - default: false
      tools: true
      # reranker: the model supports the /v1/rerank endpoint
      # - default: false
      reranker: false
      # context: the maximum token context length supported
      # - default: 0
      # - when set, must be an integer > 0
      context: 32000

  # capabilities can be written in a very condensed form
  image_gen:
    capabilities:
      in: [text]
      out: [image]
  speech_to_text:
    # text in, audio out (text-to-speech)
    capabilities:
      in: [text]
      out: [audio]
  transcription:
    capabilities:
      in: [audio]
      out: [text]
  reranker:
    capabilities:
      reranker: true
```

When a client calls `/v1/models`, llama-swap returns metadata compatible with the Mistral, OpenRouter and Hugging Face chat-ui formats. For the condensed models above, the response looks like this:
```json
{
  "data": [
    {
      "id": "image_gen",
      "object": "model",
      "created": 1781420051,
      "owned_by": "llama-swap",
      "architecture": {
        "input_modalities": [
          "text"
        ],
        "modality": "text->image",
        "output_modalities": [
          "image"
        ]
      },
      "capabilities": {
        "image_generation": true
      }
    },
    {
      "id": "reranker",
      "object": "model",
      "created": 1781420051,
      "owned_by": "llama-swap",
      "capabilities": {
        "reranker": true
      }
    },
    {
      "id": "speech_to_text",
      "object": "model",
      "created": 1781420051,
      "owned_by": "llama-swap",
      "architecture": {
        "input_modalities": [
          "text"
        ],
        "modality": "text->audio",
        "output_modalities": [
          "audio"
        ]
      },
      "capabilities": {
        "audio_speech": true
      }
    },
    {
      "id": "transcription",
      "object": "model",
      "created": 1781420051,
      "owned_by": "llama-swap",
      "architecture": {
        "input_modalities": [
          "audio"
        ],
        "modality": "audio->text",
        "output_modalities": [
          "text"
        ]
      },
      "capabilities": {
        "audio_transcriptions": true
      }
    }
  ],
  "object": "list"
}
```
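Because this metadata is plain JSON, a client can choose a model for a task by reading the `architecture` and `capabilities` fields. The Python sketch below is a hypothetical client-side helper, not part of llama-swap: `fetch_models`, `models_with_output` and `models_with_capability` are illustrative names, and the base URL passed to `fetch_models` is an assumption about where llama-swap is listening. It relies only on the fields shown in the response above.

```python
import json
import urllib.request
from typing import Any


def fetch_models(base_url: str) -> dict[str, Any]:
    """GET {base_url}/v1/models and parse the JSON body (hypothetical helper)."""
    with urllib.request.urlopen(f"{base_url}/v1/models") as resp:
        return json.load(resp)


def models_with_output(models: dict[str, Any], modality: str) -> list[str]:
    """Ids of models whose architecture lists `modality` as an output.

    Entries without an "architecture" object, like the reranker, are skipped.
    """
    return [
        m["id"]
        for m in models.get("data", [])
        if modality in m.get("architecture", {}).get("output_modalities", [])
    ]


def models_with_capability(models: dict[str, Any], flag: str) -> list[str]:
    """Ids of models whose "capabilities" object sets `flag` to true."""
    return [
        m["id"]
        for m in models.get("data", [])
        if m.get("capabilities", {}).get(flag) is True
    ]


# A trimmed copy of the response shown above, so the example runs offline.
sample = json.loads("""
{
  "object": "list",
  "data": [
    {"id": "image_gen",
     "architecture": {"input_modalities": ["text"], "modality": "text->image",
                      "output_modalities": ["image"]},
     "capabilities": {"image_generation": true}},
    {"id": "reranker", "capabilities": {"reranker": true}},
    {"id": "transcription",
     "architecture": {"input_modalities": ["audio"], "modality": "audio->text",
                      "output_modalities": ["text"]},
     "capabilities": {"audio_transcriptions": true}}
  ]
}
""")

print(models_with_output(sample, "image"))         # ['image_gen']
print(models_with_capability(sample, "reranker"))  # ['reranker']
```

Against a running server you would pass the result of `fetch_models("http://localhost:8080")` (address assumed) to the same filters. Filtering on `output_modalities` rather than on model names keeps the client working even when ids are renamed in the config.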
Other changes
- A new scheduler backend (#823). There are no functional changes for users, but it makes implementing different scheduling and swapping strategies easier. This is a first step; the goal is to let anyone customize llama-swap's behaviour by implementing the new interfaces.
- #839 is a follow-up that improves the abstractions and implementation boundaries for new schedulers and swappers.
- It also resolves the long-standing #717! If API keys are set in the configuration, the UI now prompts for a password :)
- The `/metrics` endpoint now requires an API key. HTTP Basic Auth is supported, so Prometheus integration takes a single step.
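Since `/metrics` now needs credentials, any scraper has to send an `Authorization` header. The Python sketch below shows what that request looks like; `basic_auth_header` and `fetch_metrics` are hypothetical helpers, and putting the API key in the Basic Auth password field with an arbitrary username is an assumption, not something these notes confirm.

```python
import base64
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Return an HTTP Basic Authorization header value (RFC 7617)."""
    creds = f"{username}:{password}".encode("utf-8")
    return "Basic " + base64.b64encode(creds).decode("ascii")


def fetch_metrics(base_url: str, api_key: str, username: str = "prometheus") -> str:
    """Fetch {base_url}/metrics using HTTP Basic Auth.

    ASSUMPTION: the API key goes in the password field and the username is
    arbitrary. Check the llama-swap docs for the exact convention.
    """
    req = urllib.request.Request(
        f"{base_url}/metrics",
        headers={"Authorization": basic_auth_header(username, api_key)},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


# RFC 7617's own example credentials, to check the encoding offline.
print(basic_auth_header("Aladdin", "open sesame"))
# Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
```

In Prometheus itself, the same credentials go in the `basic_auth` section (`username`, `password`) of the scrape config for the llama-swap target, which is what makes the integration a single step.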