Trans-N-ai/swama v1.1.0 on GitHub

🆕 What's New

Text Embeddings Support

New /v1/embeddings API endpoint - Full OpenAI compatibility
Built-in embedding generation for semantic search and RAG applications
Batch processing with automatic padding and optimization
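
For semantic search, embedding vectors are usually compared by cosine similarity. The following is a minimal sketch of that ranking step, not part of Swama itself: the function names are hypothetical, and the 3-dimensional vectors are toy stand-ins for real model output.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define similarity with a zero vector as 0
    return dot / (norm_a * norm_b)

def rank_documents(query_vec, doc_vecs):
    """Return document indices ordered from most to least similar to the query."""
    scores = [cosine_similarity(query_vec, v) for v in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)

# Toy vectors for illustration only; real embeddings have many more dimensions.
docs = [[1.0, 0.0, 0.0], [0.7, 0.7, 0.0], [0.0, 0.0, 1.0]]
query = [0.9, 0.1, 0.0]
ranking = rank_documents(query, docs)
```

In a RAG pipeline, the top-ranked indices would select the passages passed to the chat model as context.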

Enhanced Chat Capabilities

System prompts support - Define AI assistant behavior with system messages
Multi-turn conversations - Maintain conversation history across requests
Full OpenAI Chat Completions API compatibility - Drop-in replacement for chat applications

Intelligent Memory Management

Automatic model eviction - Prevents GPU memory exhaustion with smart cleanup
Usage-based prioritization - Keeps frequently used models in memory
Concurrent inference control - Safe parallel processing with per-model locking
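
To make the three ideas above concrete, here is a simplified sketch of usage-based eviction combined with per-model locks. It does not reflect Swama's actual implementation: the `ModelCache` class, its methods, and the byte-based capacity are all assumptions made for illustration.

```python
import threading

class ModelCache:
    """Hypothetical cache that evicts the least-used model when memory is short."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self._models = {}  # name -> (size_bytes, use_count)
        self._locks = {}   # name -> lock serializing inference on that model
        self._guard = threading.Lock()  # protects the two dicts above

    def acquire(self, name, size_bytes):
        """Register a use of the model, loading it if needed, and return its lock."""
        with self._guard:
            if name in self._models:
                size, uses = self._models[name]
                self._models[name] = (size, uses + 1)
            else:
                self._evict_until_fits(size_bytes)
                self._models[name] = (size_bytes, 1)
                self._locks[name] = threading.Lock()
            return self._locks[name]

    def _evict_until_fits(self, needed):
        # Drop the least-used models until the new one fits in the budget.
        used = sum(size for size, _ in self._models.values())
        while self._models and used + needed > self.capacity_bytes:
            victim = min(self._models, key=lambda n: self._models[n][1])
            used -= self._models[victim][0]
            del self._models[victim]
            del self._locks[victim]

    def loaded(self):
        with self._guard:
            return sorted(self._models)

cache = ModelCache(capacity_bytes=10)
cache.acquire("a", 4)
cache.acquire("a", 4)      # "a" now has two uses
cache.acquire("b", 4)
with cache.acquire("c", 4):  # does not fit, so least-used "b" is evicted
    pass                     # inference on "c" would run while holding its lock
```

A production version would also need to wait for in-flight inference before evicting a model, which this sketch omits.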

🚀 Usage

Generate Embeddings

curl -X POST http://localhost:28100/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{
    "input": ["Hello world", "Text embeddings"],
    "model": "mlx-community/Qwen3-Embedding-0.6B-4bit-DWQ"
  }'
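
The same request can be issued from Python with only the standard library. This sketch assumes the response follows the OpenAI embeddings shape (a `data` list whose items carry `index` and `embedding`), which the compatibility claim above suggests but which is not shown in these notes. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:28100"  # same host and port as the curl example

def build_embeddings_request(texts, model, base_url=BASE_URL):
    """Build a POST request for /v1/embeddings without sending it."""
    body = json.dumps({"input": texts, "model": model}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/v1/embeddings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def extract_embeddings(response_body):
    """Return vectors in input order, assuming an OpenAI-style response body."""
    items = json.loads(response_body)["data"]
    return [item["embedding"] for item in sorted(items, key=lambda d: d["index"])]

def fetch_embeddings(texts, model):
    """Send the request to a running server and return the parsed vectors."""
    req = build_embeddings_request(texts, model)
    with urllib.request.urlopen(req) as resp:
        return extract_embeddings(resp.read())
```

Calling `fetch_embeddings(["Hello world", "Text embeddings"], "mlx-community/Qwen3-Embedding-0.6B-4bit-DWQ")` requires a Swama server listening locally.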

Chat with System Prompts

curl -X POST http://localhost:28100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "system", "content": "You are a helpful math tutor."},
      {"role": "user", "content": "Explain quadratic equations."}
    ],
    "model": "qwen3"
  }'

Multi-turn Conversations

curl -X POST http://localhost:28100/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "My name is Alice."},
      {"role": "assistant", "content": "Nice to meet you, Alice!"},
      {"role": "user", "content": "What is my name?"}
    ],
    "model": "qwen3"
  }'
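
As the example shows, the client resends the full message history with each request. A small helper can keep that history in order. This is a client-side sketch only; the `Conversation` class is hypothetical and not part of Swama.

```python
import json

class Conversation:
    """Accumulates chat messages and builds the JSON body for each request."""

    def __init__(self, model, system_prompt=None):
        self.model = model
        self.messages = []
        if system_prompt is not None:
            # The system message goes first so it frames every later turn.
            self.messages.append({"role": "system", "content": system_prompt})

    def add_user(self, text):
        self.messages.append({"role": "user", "content": text})

    def add_assistant(self, text):
        # Record the model's reply so the next request includes it.
        self.messages.append({"role": "assistant", "content": text})

    def payload(self):
        """JSON string suitable as the body of POST /v1/chat/completions."""
        return json.dumps({"messages": self.messages, "model": self.model})

chat = Conversation("qwen3")
chat.add_user("My name is Alice.")
chat.add_assistant("Nice to meet you, Alice!")
chat.add_user("What is my name?")
body = chat.payload()
```

The resulting `body` matches the payload in the curl command above and can be sent with any HTTP client.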

📦 Download

Download Swama v1.1.0

🔄 Upgrade Notes

If you are upgrading from a previous version, install the new version, then open Swama from the menu bar and click "Install Command Line Tool…" to update the CLI tools.

🔧 Requirements

macOS 14.0+
Apple Silicon (M1/M2/M3/M4)

What's Changed

Fix: Correct command examples in README files by @djx-trans-n in #2
Fix Chinese ReadMe for downloading by @mmRose-MIAO in #4
FEAT:Support model aliases for api model handler by @mmRose-MIAO in #6
feat: Add comprehensive AI model performance benchmark script (Ollama vs Swama) by @syh-trans-n in #7
feat: add comprehensive AI model benchmark script with expanded model support by @syh-trans-n in #9
Fix support info readme by @Li-Haojie-1106 in #11
add embedding models support by @sxy-trans-n in #13
Add Chat Message Support and Memory Management by @sxy-trans-n in #14

New Contributors

@djx-trans-n made their first contribution in #2
@mmRose-MIAO made their first contribution in #4
@syh-trans-n made their first contribution in #7
@Li-Haojie-1106 made their first contribution in #11

Full Changelog: v1.0.0...v1.1.0