github mostlygeek/llama-swap v225

7 hours ago

Model Capabilities

The new capabilities configuration option declares which input and output modalities a model supports, along with tool calling, reranking and maximum context length. Here is what the configuration looks like:

models:
    example_model:    
      # capabilities: defines what the model accepts for input, output and other metadata
      # - optional; omitted or all-zero means no capabilities
      # - used in v1/models to inform clients what the model can do
      capabilities:
        # in: list of modalities understood by the model
        # - default: []
        # - valid: text, audio, image
        in:
          - text
          - audio
          - image
        # out: list of modalities generated by the model
        # - default: []
        # - valid: text, audio, image
        out:
          - text
          - audio
          - image
        # tools: the model supports function calling
        # - default: false
        tools: true
  
        # reranker: the model supports the /v1/rerank endpoint
        # - default: false
        reranker: false
  
        # context: the maximum token context length supported
        # - default: 0
        # - must be an integer > 0
        context: 32000

    # capabilities can be written in a very condensed form
    image_gen:
        capabilities:
            in: [text]
            out: [image]
    speech_to_text:
        capabilities:
            in: [text]
            out: [audio]
    transcription:
        capabilities:
            in: [audio]
            out: [text]
    reranker:
        capabilities:
          reranker: true
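
The rules in the configuration comments above can be checked before a config is deployed. The following is a minimal sketch, not llama-swap's own validation: validate_capabilities is a hypothetical helper that takes one model's capabilities block as an already-parsed dict, and it assumes a context of 0 means "unset" because the comments list 0 as the default.

```python
# Modalities listed as valid in the configuration comments.
VALID_MODALITIES = {"text", "audio", "image"}


def validate_capabilities(caps):
    """Return a list of problems found in one model's capabilities mapping.

    Hypothetical helper for illustration; llama-swap's real checks may differ.
    """
    problems = []

    # "in" and "out" are lists of modality names; both default to [].
    for key in ("in", "out"):
        for modality in caps.get(key, []):
            if modality not in VALID_MODALITIES:
                problems.append(f"{key}: unknown modality {modality!r}")

    # "tools" and "reranker" are boolean flags; both default to false.
    for key in ("tools", "reranker"):
        if not isinstance(caps.get(key, False), bool):
            problems.append(f"{key}: expected true or false")

    # Assumption: 0 (the documented default) is treated as "not set".
    context = caps.get("context", 0)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(context, bool) or not isinstance(context, int) or context < 0:
        problems.append("context: expected a non-negative integer")

    return problems
```

An empty list means the block passed every check; each string otherwise names the offending key.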

When a client calls v1/models, llama-swap generates metadata that is compatible with the Mistral, OpenRouter and Hugging Face chat-ui formats.

{
  "data": [
    {
      "id": "image_gen",
      "object": "model",
      "created": 1781420051,
      "owned_by": "llama-swap",
      "architecture": {
        "input_modalities": [
          "text"
        ],
        "modality": "text->image",
        "output_modalities": [
          "image"
        ]
      },
      "capabilities": {
        "image_generation": true
      }
    },
    {
      "id": "reranker",
      "object": "model",
      "created": 1781420051,
      "owned_by": "llama-swap",
      "capabilities": {
        "reranker": true
      }
    },
    {
      "id": "speech_to_text",
      "object": "model",
      "created": 1781420051,
      "owned_by": "llama-swap",
      "architecture": {
        "input_modalities": [
          "text"
        ],
        "modality": "text->audio",
        "output_modalities": [
          "audio"
        ]
      },
      "capabilities": {
        "audio_speech": true
      }
    },
    {
      "id": "transcription",
      "object": "model",
      "created": 1781420051,
      "owned_by": "llama-swap",
      "architecture": {
        "input_modalities": [
          "audio"
        ],
        "modality": "audio->text",
        "output_modalities": [
          "text"
        ]
      },
      "capabilities": {
        "audio_transcriptions": true
      }
    }
  ],
  "object": "list"
}
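
A client can use this metadata to pick a model for a task. The sketch below is one possible approach, using only the field names visible in the response above. The function names are hypothetical, and the base URL passed to fetch_models is whatever address your llama-swap instance listens on.

```python
import json
import urllib.request


def fetch_models(base_url):
    """GET <base_url>/v1/models and return the parsed JSON body."""
    with urllib.request.urlopen(f"{base_url}/v1/models") as response:
        return json.load(response)


def models_with(listing, capability):
    """Return ids of models whose "capabilities" object sets the flag to true."""
    return [
        model["id"]
        for model in listing.get("data", [])
        if model.get("capabilities", {}).get(capability) is True
    ]


def models_accepting(listing, modality):
    """Return ids of models that list the modality in architecture.input_modalities."""
    return [
        model["id"]
        for model in listing.get("data", [])
        if modality in model.get("architecture", {}).get("input_modalities", [])
    ]
```

For example, models_with(fetch_models("http://localhost:8080"), "reranker") would list the models usable for reranking, assuming an instance is reachable at that (placeholder) address. Models that omit "architecture" or "capabilities", like the reranker entry above, are simply skipped by the corresponding filter.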

Other changes

  • Implementation of a new scheduler backend (#823). There are no functional changes for users, but it will make implementing different scheduling and swapping strategies a bit easier. This is just the first step; the goal is for anyone to be able to customize llama-swap's behaviour by implementing the new interfaces.
  • #839 is a follow-up that improves abstractions and implementation boundaries for new schedulers / swappers.
    • It also resolves the long-standing #717! If you have API keys set in the configuration, the UI will now prompt for a password :)
    • The /metrics endpoint now requires an API key. HTTP Basic Auth is supported, so Prometheus integration is a single step.
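
To illustrate the Basic Auth option for /metrics, here is a minimal sketch of building such a request by hand. The header encoding is standard HTTP Basic Auth. How llama-swap maps credentials to API keys is not stated in these notes, so sending the API key as the password with a placeholder username is an assumption, and both function names are hypothetical.

```python
import base64
import urllib.request


def basic_auth_header(username, password):
    """Build a standard HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def metrics_request(base_url, api_key, username="prometheus"):
    """Prepare (but do not send) a GET request for <base_url>/metrics.

    Assumption: the API key goes in the password field and the username
    is a placeholder. Check the llama-swap documentation for the real mapping.
    """
    request = urllib.request.Request(f"{base_url}/metrics")
    request.add_header("Authorization", basic_auth_header(username, api_key))
    return request
```

Passing the returned object to urllib.request.urlopen would send the request. A Prometheus scrape job would express the same credentials through its own basic_auth settings rather than through code like this.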
