mostlygeek/llama-swap v225 on GitHub

Model Capabilities

The new capabilities configuration option makes it easy to declare which input and output modalities a model supports, along with other metadata such as tool calling, reranking and context length. Here is what the configuration looks like:

models:
    example_model:    
      # capabilities: defines what the model accepts for input, output and other metadata
      # - optional; omitted or all-zero means no capabilities
      # - used in /v1/models to inform clients what the model can do
      capabilities:
        # in: list of modalities understood by the model
        # - default: []
        # - valid: text, audio, image
        in:
          - text
          - audio
          - image
        # out: list of modalities generated by the model
        # - default: []
        # - valid: text, audio, image
        out:
          - text
          - audio
          - image
        # tools: the model supports function calling
        # - default: false
        tools: true
  
        # reranker: the model supports the /v1/rerank endpoint
        # - default: false
        reranker: false
  
        # context: the maximum token context length supported
        # - default: 0
        # - must be an integer > 0
        context: 32000

    # capabilities can be written in a very condensed form
    image_gen:
        capabilities:
            in: [text]
            out: [image]
    speech_to_text: # text in, audio out (i.e. text-to-speech)
        capabilities:
            in: [text]
            out: [audio]
    transcription:
        capabilities:
            in: [audio]
            out: [text]
    reranker:
        capabilities:
          reranker: true
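As a rough illustration of the rules spelled out in the config comments, here is a Go sketch that validates a capabilities block and derives an "in->out" modality string like the one in the /v1/models output. The Capabilities type and its methods are hypothetical, not llama-swap's internal code, and joining several modalities with "+" is an assumption borrowed from OpenRouter's convention.

```go
package main

import (
	"fmt"
	"strings"
)

// Capabilities is a hypothetical Go mirror of the YAML keys above; it is
// not llama-swap's internal type.
type Capabilities struct {
	In       []string `yaml:"in"`
	Out      []string `yaml:"out"`
	Tools    bool     `yaml:"tools"`
	Reranker bool     `yaml:"reranker"`
	Context  int      `yaml:"context"`
}

// validModalities holds the values the config comments allow in in/out.
var validModalities = map[string]bool{"text": true, "audio": true, "image": true}

// Validate enforces the documented constraints: only text, audio or image
// as modalities, and a context that is either unset (0) or positive.
func (c Capabilities) Validate() error {
	for _, list := range [][]string{c.In, c.Out} {
		for _, m := range list {
			if !validModalities[m] {
				return fmt.Errorf("invalid modality %q (valid: text, audio, image)", m)
			}
		}
	}
	if c.Context < 0 {
		return fmt.Errorf("context must be an integer > 0, got %d", c.Context)
	}
	return nil
}

// Modality renders an "in->out" string such as "text->image". Joining
// several modalities with "+" is an ASSUMPTION based on OpenRouter's
// convention (e.g. "text+image->text"), not confirmed llama-swap output.
func (c Capabilities) Modality() string {
	if len(c.In) == 0 || len(c.Out) == 0 {
		return "" // e.g. a reranker-only entry has no architecture block
	}
	return strings.Join(c.In, "+") + "->" + strings.Join(c.Out, "+")
}

func main() {
	imageGen := Capabilities{In: []string{"text"}, Out: []string{"image"}}
	fmt.Println(imageGen.Validate() == nil, imageGen.Modality()) // true text->image

	bad := Capabilities{In: []string{"video"}}
	fmt.Println(bad.Validate()) // invalid modality "video" (valid: text, audio, image)
}
```

Returning an empty modality for entries without in/out lists matches the reranker entry in the sample response, which has no architecture block at all.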

When a client calls /v1/models, llama-swap returns metadata compatible with the Mistral, OpenRouter and Hugging Face chat-ui formats. For the condensed examples above, the response looks like this:

{
  "data": [
    {
      "id": "image_gen",
      "object": "model",
      "created": 1781420051,
      "owned_by": "llama-swap",
      "architecture": {
        "input_modalities": [
          "text"
        ],
        "modality": "text->image",
        "output_modalities": [
          "image"
        ]
      },
      "capabilities": {
        "image_generation": true
      }
    },
    {
      "id": "reranker",
      "object": "model",
      "created": 1781420051,
      "owned_by": "llama-swap",
      "capabilities": {
        "reranker": true
      }
    },
    {
      "id": "speech_to_text",
      "object": "model",
      "created": 1781420051,
      "owned_by": "llama-swap",
      "architecture": {
        "input_modalities": [
          "text"
        ],
        "modality": "text->audio",
        "output_modalities": [
          "audio"
        ]
      },
      "capabilities": {
        "audio_speech": true
      }
    },
    {
      "id": "transcription",
      "object": "model",
      "created": 1781420051,
      "owned_by": "llama-swap",
      "architecture": {
        "input_modalities": [
          "audio"
        ],
        "modality": "audio->text",
        "output_modalities": [
          "text"
        ]
      },
      "capabilities": {
        "audio_transcriptions": true
      }
    }
  ],
  "object": "list"
}

Other changes

Implementation of a new scheduler backend (#823). There are no functional changes for users, but it makes implementing different scheduling and swapping strategies easier. This is only the first step; the goal is to let anyone customize llama-swap's behaviour by implementing the new interfaces.
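To illustrate the idea of a pluggable swap policy, here is a small Go sketch of a strategy interface. SwapStrategy, singleModel and keepN are hypothetical names invented for this example; they are not the interfaces introduced in #823 or #839.

```go
package main

import "fmt"

// SwapStrategy is a HYPOTHETICAL interface used only to illustrate a
// pluggable swap policy; it is not the interface added in #823/#839.
// Evict returns the models to unload before requested can be started.
type SwapStrategy interface {
	Evict(loaded []string, requested string) []string
}

// singleModel keeps at most one model resident: everything except the
// requested model is unloaded.
type singleModel struct{}

func (singleModel) Evict(loaded []string, requested string) []string {
	var out []string
	for _, m := range loaded {
		if m != requested {
			out = append(out, m)
		}
	}
	return out
}

// keepN lets up to N models stay resident and evicts the oldest first;
// loaded is assumed to be ordered from oldest to newest.
type keepN struct{ N int }

func (k keepN) Evict(loaded []string, requested string) []string {
	for _, m := range loaded {
		if m == requested {
			return nil // already resident, nothing to unload
		}
	}
	excess := len(loaded) + 1 - k.N
	if excess <= 0 {
		return nil
	}
	if excess > len(loaded) {
		excess = len(loaded) // N < 1: unload everything
	}
	return loaded[:excess]
}

func main() {
	loaded := []string{"a", "b"}
	for _, s := range []SwapStrategy{singleModel{}, keepN{N: 2}, keepN{N: 3}} {
		fmt.Println(s.Evict(loaded, "c")) // prints [a b], then [a], then []
	}
}
```

Keeping the eviction rule behind a small interface means the code that starts and stops processes does not change when the policy does, which is the kind of separation the new scheduler work appears to aim for.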
#839 is a follow-up that improves abstractions and implementation boundaries for new schedulers/swappers:
- it also resolves the long-standing #717! If API keys are set in the configuration, the UI now prompts for a password :)
- the /metrics endpoint now requires an API key. HTTP Basic Auth is supported, so Prometheus integration is a single step.
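A minimal Go sketch of scraping the now-protected /metrics endpoint with HTTP Basic Auth. Which credential field llama-swap checks is not stated here, so sending the API key as the password under a placeholder username is an assumption, and the httptest stub only imitates the behaviour described above.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// fetchMetrics GETs baseURL+"/metrics" with HTTP Basic Auth. Sending the API
// key as the password under a placeholder username is an ASSUMPTION.
func fetchMetrics(baseURL, apiKey string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, baseURL+"/metrics", nil)
	if err != nil {
		return "", err
	}
	req.SetBasicAuth("prometheus", apiKey) // username is a placeholder
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("unexpected status %s", resp.Status)
	}
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

// newStubServer imitates a metrics endpoint that rejects requests whose
// Basic Auth password does not match; it is a test double, not llama-swap.
func newStubServer(apiKey string) *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if _, pass, ok := r.BasicAuth(); !ok || pass != apiKey {
			http.Error(w, "unauthorized", http.StatusUnauthorized)
			return
		}
		fmt.Fprintln(w, "# HELP example_metric placeholder")
	}))
}

func main() {
	srv := newStubServer("secret")
	defer srv.Close()

	if _, err := fetchMetrics(srv.URL, "wrong"); err != nil {
		fmt.Println(err) // unexpected status 401 Unauthorized
	}
	if body, err := fetchMetrics(srv.URL, "secret"); err == nil {
		fmt.Print(body)
	}
}
```

In Prometheus itself, the same credentials would go into the scrape config's basic_auth block (username plus password or password_file).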

Changelog

92b9044 Model capabilities 734 (#842)
62aea0e internal/router,server,shared: refactor auth, libs (#839)
8c660dc main: gofmt
f6877b8 main: show message when listening on network (#836)
9b3a33d Implement new scheduler (#823)