
🎉 LocalAI 3.11.0 Release! 🚀




LocalAI 3.11.0 is a major update to our Audio and Multimodal capabilities.

We are introducing Realtime Audio Conversations, a dedicated Music Generation UI, and a massive expansion of ASR (Speech-to-Text) and TTS backends. Whether you want to talk to your AI, clone voices, transcribe with speaker identification, or generate songs, this release has you covered.

Check out the highlights below!


📌 TL;DR

| Feature | Summary |
|---------|---------|
| Realtime Audio | Native support for audio conversations, enabling fluid voice interactions similar to OpenAI's Realtime API. See the documentation. |
| Music Generation UI | New UI for MusicGen (Ace-Step), allowing you to generate music from text prompts directly in the browser. |
| New ASR Backends | Added WhisperX (with Speaker Diarization), VibeVoice, Qwen-ASR, and Nvidia NeMo. |
| TTS Streaming | Text-to-Speech now supports streaming mode for lower-latency responses (VoxCPM only for now). |
| vLLM Omni | Added support for vLLM Omni, expanding our high-performance inference capabilities. |
| Speaker Diarization | Native support for identifying different speakers in transcriptions via WhisperX. |
| Hardware Expansion | Expanded build support for CUDA 12/13, L4T (Jetson), and SBSA, plus better Metal (Apple Silicon) integration with MLX backends. |
| Breaking Changes | The ExLlama (deprecated) and Bark (unmaintained) backends have been removed. |

🚀 New Features & Major Enhancements

🎙️ Realtime Audio Conversations

LocalAI 3.11.0 introduces native support for Realtime Audio Conversations.

  • Enables fluid, low-latency voice interaction with agents.
  • Logic handled directly within the LocalAI pipeline for seamless audio-in/audio-out workflows.
  • Support for STT/TTS and voice-to-voice models (experimental).
  • Support for tool calls.

🗣️ Talk to your LocalAI: This brings us one step closer to a fully local, voice-native assistant experience compatible with standard client implementations.

Check the documentation for details.


🎵 Music Generation UI & Ace-Step

We have added a dedicated interface for music generation!

  • New Backend: Support for Ace-Step (MusicGen) via the ace-step backend.
  • Web UI Integration: Generate musical clips directly from the LocalAI Web UI.
  • Simple text-to-music workflow (e.g., "Lo-fi hip hop beat for studying").
(Screenshot: the LocalAI Web UI generating sound with ace-step-turbo.)

🎧 Massive ASR (Speech-to-Text) Expansion

This release significantly broadens our transcription capabilities with four new backends (a minimal request sketch follows the list):

  1. WhisperX: Provides fast transcription with Speaker Diarization (identifying who is speaking).
  2. VibeVoice: Now supports ASR alongside TTS.
  3. Qwen-ASR: Support for Qwen's powerful speech recognition models.
  4. Nvidia NeMo: Initial support for NeMo ASR.

🗣️ TTS Streaming & New Voices

Text-to-Speech gets a speed boost and new options (see the streaming sketch after this list):

  • Streaming Support: TTS endpoints now support streaming, reducing the "time-to-first-audio" significantly.
  • VoxCPM: Added support for the VoxCPM backend.
  • Qwen-TTS: Added support for Qwen-TTS models.
  • Piper Voices: Added most remaining Piper voices from Hugging Face to the gallery.

🛠️ Hardware & Backend Updates

  • vLLM Omni: A new backend integration for vLLM Omni models.
  • Extended Platform Support: Major work on MLX to improve compatibility across CUDA 12, CUDA 13, L4T (Nvidia Jetson), SBSA, and macOS Metal.
  • GGUF Cleanup: Dropped redundant VRAM estimation logic for GGUF loading, relying on more accurate internal measurements.

⚠️ Breaking Changes

To keep the project lean and maintainable, we have removed some older backends:

  • ExLlama: Removed (deprecated in favor of newer loaders like ExLlamaV2 or llama.cpp).
  • Bark: Removed (the upstream project is unmaintained; we recommend using the new TTS alternatives).

🚀 The Complete Local Stack for Privacy-First AI


LocalAI

The free, Open Source OpenAI alternative. Drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required.

Link: https://github.com/mudler/LocalAI


LocalAGI

Local AI agent management platform. Drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI.

Link: https://github.com/mudler/LocalAGI


LocalRecall

RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Works alongside LocalAI and LocalAGI.

Link: https://github.com/mudler/LocalRecall


❤️ Thank You

LocalAI is a true FOSS movement — built by contributors, powered by community.

If you believe in privacy-first AI:

  • ⭐ Star the repo
  • 💬 Contribute code, docs, or feedback
  • 📣 Share with others

Your support keeps this stack alive.


✅ Full Changelog


What's Changed

Breaking Changes 🛠

  • chore(exllama): drop backend now almost deprecated by @mudler in #8186

Bug fixes 🐛

  • fix(ui): correctly display selected image model by @dedyf5 in #8208
  • fix(ui): take account of reasoning in token count calculation by @mudler in #8324
  • fix: drop gguf VRAM estimation (now redundant) by @mudler in #8325
  • fix(api): Add missing field in initial OpenAI streaming response by @acon96 in #8341
  • fix(realtime): Include noAction function in prompt template and handle tool_choice by @richiejp in #8372
  • fix: filter GGUF and GGML files from model list by @Yaroslav98214 in #8397
  • fix(qwen-asr): Remove contagious slop (DEFAULT_GOAL) from Makefile by @richiejp in #8431

Exciting New Features 🎉

  • feat(vllm-omni): add new backend by @mudler in #8188
  • feat(vibevoice): add ASR support by @mudler in #8222
  • feat: add VoxCPM tts backend by @mudler in #8109
  • feat(realtime): Add audio conversations by @richiejp in #6245
  • feat(qwen-asr): add support to qwen-asr by @mudler in #8281
  • feat(tts): add support for streaming mode by @mudler in #8291
  • feat(api): Add transcribe response format request parameter & adjust STT backends by @nanoandrew4 in #8318
  • feat(whisperx): add whisperx backend for transcription with speaker diarization by @eureka928 in #8299
  • feat(mlx): Add support for CUDA12, CUDA13, L4T, SBSA and CPU by @mudler in #8380
  • feat(musicgen): add ace-step and UI interface by @mudler in #8396
  • fix(api)!: Stop model prior to deletion by @nanoandrew4 in #8422
  • feat(nemo): add Nemo (only asr for now) backend by @mudler in #8436

🧠 Models

  • chore(model gallery): add qwen3-tts to model gallery by @mudler in #8187
  • chore(model gallery): Add most of not yet present Piper voices from Hugging Face by @rampa3 in #8202
  • chore: drop bark which is unmaintained by @mudler in #8207
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8220
  • chore(model gallery): Add entry for Mistral Small 3.1 with mmproj by @rampa3 in #8247
  • chore(model gallery): Add entry for Magistral Small 1.2 with mmproj by @rampa3 in #8248
  • chore(model gallery): Add mistral-community/pixtral-12b with mmproj by @rampa3 in #8245
  • chore(model gallery): add z-image and z-image-turbo for diffusers by @mudler in #8260
  • fix(qwen3): Be explicit with function calling format by @richiejp in #8265
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #8285
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #8307
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8321
  • chore(model gallery): Rename downloaded filename for Magistral Small mmproj by @rampa3 in #8327
  • chore(model gallery): Add Qwen 3 VL 8B thinking & instruct by @rampa3 in #8329
  • feat(metal): try to extend support to remaining backends by @mudler in #8374
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8381
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #8420
  • chore(models): Add Qwen TTS 0.6b by @richiejp in #8428

👒 Dependencies

  • chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/bark in the pip group across 1 directory by @dependabot[bot] in #8175
  • chore: re-enable e2e tests, fixups anthropic API tools support by @mudler in #8296
  • chore(cuda): target 12.8 for 12 to increase compatibility by @mudler in #8297
  • chore(deps): bump appleboy/ssh-action from 1.2.4 to 1.2.5 by @dependabot[bot] in #8352
  • chore(deps): bump torch from 2.7.0 to 2.7.1+xpu in /backend/python/vllm in the pip group across 1 directory by @dependabot[bot] in #8360
  • chore(deps): bump go.opentelemetry.io/otel/metric from 1.39.0 to 1.40.0 by @dependabot[bot] in #8353
  • chore(deps): bump github.com/anthropics/anthropic-sdk-go from 1.19.0 to 1.20.0 by @dependabot[bot] in #8355
  • chore(deps): bump protobuf from 6.33.4 to 6.33.5 in /backend/python/transformers by @dependabot[bot] in #8356
  • chore(deps): bump go.opentelemetry.io/otel/sdk/metric from 1.39.0 to 1.40.0 by @dependabot[bot] in #8354
  • chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus from 0.61.0 to 0.62.0 by @dependabot[bot] in #8359
  • chore(deps): bump sentence-transformers from 5.2.0 to 5.2.2 in /backend/python/transformers by @dependabot[bot] in #8358
  • chore(deps): bump github.com/onsi/gomega from 1.39.0 to 1.39.1 by @dependabot[bot] in #8357
  • chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/rerankers in the pip group across 1 directory by @dependabot[bot] in #8407
  • feat(audio): set audio content type by @mudler in #8416

Other Changes

  • Revert "chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/bark in the pip group across 1 directory" by @mudler in #8180
  • docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #8182
  • chore: ⬆️ Update ggml-org/llama.cpp to 557515be1e93ed8939dd8a7c7d08765fdbe8be31 by @localai-bot in #8183
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to fa61ea744d1a87fa26a63f8a86e45587bc9534d6 by @localai-bot in #8184
  • chore: ⬆️ Update ggml-org/llama.cpp to bb02f74c612064947e51d23269a1cf810b67c9a7 by @localai-bot in #8196
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 43e829f21966abb96b08c712bccee872dc820914 by @localai-bot in #8215
  • chore: ⬆️ Update ggml-org/llama.cpp to 0440bfd1605333726ea0fb7a836942660bf2f9a6 by @localai-bot in #8216
  • chore: ⬆️ Update ggml-org/llama.cpp to 8f80d1b254aef70a0959e314be368d05debe7294 by @localai-bot in #8229
  • chore: ⬆️ Update ggml-org/llama.cpp to 2b4cbd2834e427024bc7f935a1f232aecac6679b by @localai-bot in #8258
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to e411520407663e1ddf8ff2e5ed4ff3a116fbbc97 by @localai-bot in #8274
  • chore(llama.cpp): bump to 'f6b533d898ce84bae8d9fa8dfc6697ac087800bf' by @mudler in #8275
  • chore: ⬆️ Update ggml-org/llama.cpp to 4fdbc1e4dba428ce0cf9d2ac22232dc170bbca82 by @localai-bot in #8283
  • feat(swagger): update swagger by @localai-bot in #8304
  • chore: ⬆️ Update ggml-org/whisper.cpp to aa1bc0d1a6dfd70dbb9f60c11df12441e03a9075 by @localai-bot in #8305
  • chore: ⬆️ Update ggml-org/llama.cpp to 1488339138d609139c4400d1b80f8a5b1a9a203c by @localai-bot in #8306
  • chore: ⬆️ Update ggml-org/llama.cpp to 41ea26144e55d23f37bb765f88c07588d786567f by @localai-bot in #8317
  • chore: ⬆️ Update ggml-org/llama.cpp to 2634ed207a17db1a54bd8df0555bd8499a6ab691 by @localai-bot in #8336
  • Revert "chore(deps): bump torch from 2.7.0 to 2.7.1+xpu in /backend/python/vllm in the pip group across 1 directory" by @mudler in #8367
  • fix(docs): Promote DEBUG=false in production docker compose by @JonasBernard in #8390
  • chore: ⬆️ Update ggml-org/whisper.cpp to 941bdabbe4561bc6de68981aea01bc5ab05781c5 by @localai-bot in #8398
  • chore: ⬆️ Update ggml-org/llama.cpp to b536eb023368701fe3564210440e2df6151c3e65 by @localai-bot in #8399
  • Revert "chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/rerankers in the pip group across 1 directory" by @mudler in #8412
  • feat(swagger): update swagger by @localai-bot in #8418
  • chore: ⬆️ Update ggml-org/llama.cpp to 22cae832188a1f08d18bd0a707a4ba5cd03c7349 by @localai-bot in #8419
  • chore(docs): Document using a local model gallery by @richiejp in #8426
  • chore: ⬆️ Update ggml-org/llama.cpp to b83111815e9a79949257e9d4b087206b320a3063 by @localai-bot in #8434
  • chore: ⬆️ Update ggml-org/llama.cpp to 8872ad2125336d209a9911a82101f80095a9831d by @localai-bot in #8448

New Contributors

Full Changelog: v3.10.1...v3.11.0
