github madroidmaq/mlx-omni-server v0.3.1

Released 8 months ago

What's New

  • Support more MLX inference parameters, such as adapter_path, top_k, min_tokens_to_keep, min_p, presence_penalty, etc.
  • Closes #12

Usage Examples

OpenAI SDK:

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:10240/v1",  # MLX Omni Server endpoint
    api_key="not-needed"
)

# Pass a fine-tuned adapter via extra_body
response = client.chat.completions.create(
    model="mlx-community/Llama-3.2-1B-Instruct-4bit",
    messages=[
        {"role": "user", "content": "What's the weather like today?"}
    ],
    extra_body={
        "adapter_path": "path/to/your/adapter",  # Path to fine-tuned adapter
    }
)

curl:

curl http://localhost:10240/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mlx-community/Llama-3.2-1B-Instruct-4bit",
    "messages": [
      {
        "role": "user",
        "content": "What'\''s the weather like today?"
      }
    ],
    "adapter_path": "path/to/your/adapter"
  }'
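The curl example above passes adapter_path at the top level of the request body; the newly supported sampling parameters can presumably be sent the same way. A minimal sketch of such a request body (the parameter values here are illustrative choices, not server defaults):

```python
import json

# Sketch of a /v1/chat/completions request body that, in addition to
# adapter_path, sets some of the new sampling parameters at the top level.
# Values are illustrative only.
payload = {
    "model": "mlx-community/Llama-3.2-1B-Instruct-4bit",
    "messages": [
        {"role": "user", "content": "What's the weather like today?"}
    ],
    "top_k": 40,              # sample from the 40 most likely tokens
    "min_p": 0.05,            # drop tokens below 5% of the top probability
    "presence_penalty": 0.5,  # discourage repeating tokens already present
}
print(json.dumps(payload, indent=2))
```

With the OpenAI SDK, the same fields would go into `extra_body`, as shown in the adapter_path example.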

Full Changelog: v0.3.0...v0.3.1
