🎉 LocalAI 3.11.0 Release! 🚀
LocalAI 3.11.0 is a major update for Audio and Multimodal capabilities.
We are introducing Realtime Audio Conversations, a dedicated Music Generation UI, and a massive expansion of ASR (Speech-to-Text) and TTS backends. Whether you want to talk to your AI, clone voices, transcribe with speaker identification, or generate songs, this release has you covered.
Check out the highlights below!
📌 TL;DR
| Feature | Summary |
|---|---|
| Realtime Audio | Native support for audio conversations, enabling fluid voice interactions similar to OpenAI's Realtime API. See the documentation for details. |
| Music Generation UI | New Web UI for MusicGen (Ace-Step), allowing you to generate music from text prompts directly in the browser. |
| New ASR Backends | Added WhisperX (with Speaker Diarization), VibeVoice, Qwen-ASR, and Nvidia NeMo. |
| TTS Streaming | Text-to-Speech now supports streaming mode for lower latency responses. (VoxCPM only for now) |
| vLLM Omni | Added support for vLLM Omni, expanding our high-performance inference capabilities. |
| Speaker Diarization | Native support for identifying different speakers in transcriptions via WhisperX. |
| Hardware Expansion | Expanded build support for CUDA 12/13, L4T (Jetson), SBSA, and better Metal (Apple Silicon) integration with MLX backends. |
| Breaking Changes | ExLlama (deprecated) and Bark (unmaintained) backends have been removed. |
🚀 New Features & Major Enhancements
🎙️ Realtime Audio Conversations
LocalAI 3.11.0 introduces native support for Realtime Audio Conversations.
- Enables fluid, low-latency voice interaction with agents.
- Logic handled directly within the LocalAI pipeline for seamless audio-in/audio-out workflows.
- Support for STT/TTS pipelines and voice-to-voice models (experimental).
- Support for tool calls.
🗣️ Talk to your LocalAI: This brings us one step closer to a fully local, voice-native assistant experience compatible with standard client implementations.
Check the documentation for detailed setup and usage.
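To get a feel for the wire format, here is a minimal sketch of a realtime session over WebSockets. It assumes LocalAI mirrors the OpenAI Realtime API's endpoint path and event names (as the compatibility note above suggests), and that a realtime-capable model is installed under the placeholder name `my-realtime-model`; check the documentation for the exact values.

```python
# Minimal realtime session sketch. The /v1/realtime path, the event
# names, and the model name are assumptions modeled on the OpenAI
# Realtime API; consult the LocalAI realtime docs for specifics.
import asyncio
import json

import websockets  # pip install websockets


async def main():
    uri = "ws://localhost:8080/v1/realtime?model=my-realtime-model"
    async with websockets.connect(uri) as ws:
        # Configure the session for audio in/out.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"modalities": ["audio", "text"]},
        }))
        # Ask the model to start a response.
        await ws.send(json.dumps({"type": "response.create"}))
        # Read server events until the response completes.
        async for message in ws:
            event = json.loads(message)
            print(event.get("type"))
            if event.get("type") == "response.done":
                break


asyncio.run(main())
```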
🎵 Music Generation UI & Ace-Step
We have added a dedicated interface for music generation!
- New Backend: Support for Ace-Step (MusicGen) via the `ace-step` backend.
- Web UI Integration: Generate musical clips directly from the LocalAI Web UI.
- Simple text-to-music workflow (e.g., "Lo-fi hip hop beat for studying").
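If you prefer the API over the Web UI, a minimal text-to-music sketch might look like the following. The model name `ace-step` and the exact request fields are assumptions based on LocalAI's existing ElevenLabs-style sound-generation endpoint; the Web UI drives the same workflow.

```python
# Hypothetical text-to-music request; model name and field names are
# assumptions, so check the LocalAI docs for the exact schema.
import requests

resp = requests.post(
    "http://localhost:8080/v1/sound-generation",
    json={
        "model_id": "ace-step",
        "text": "Lo-fi hip hop beat for studying",
    },
    timeout=600,  # music generation can take a while
)
resp.raise_for_status()
with open("clip.wav", "wb") as f:
    f.write(resp.content)  # save the generated audio
```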
🎧 Massive ASR (Speech-to-Text) Expansion
This release significantly broadens our transcription capabilities with four new backends:
- WhisperX: Provides fast transcription with Speaker Diarization (identifying who is speaking); see the sketch after this list.
- VibeVoice: Now supports ASR alongside TTS.
- Qwen-ASR: Support for Qwen's powerful speech recognition models.
- Nvidia NeMo: Initial support for NeMo ASR.
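All of these are reachable through the familiar OpenAI-compatible transcription endpoint. A minimal sketch, assuming the WhisperX backend is installed under the model name `whisperx` (the name is an assumption; use whatever you installed it as):

```python
# Transcription via the OpenAI-compatible endpoint. With WhisperX,
# diarized output is expected to carry speaker labels on the segments,
# so we request verbose JSON (response_format is the new request
# parameter added in this release).
import requests

with open("meeting.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8080/v1/audio/transcriptions",
        files={"file": ("meeting.wav", f, "audio/wav")},
        data={"model": "whisperx", "response_format": "verbose_json"},
    )
resp.raise_for_status()
result = resp.json()
print(result["text"])
for segment in result.get("segments", []):
    # Per-segment speaker labels are WhisperX-specific output.
    print(segment.get("speaker"), segment.get("text"))
```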
🗣️ TTS Streaming & New Voices
Text-to-Speech gets a speed boost and new options:
- Streaming Support: TTS endpoints now support streaming, significantly reducing time-to-first-audio (see the sketch after this list).
- VoxCPM: Added support for the VoxCPM backend.
- Qwen-TTS: Added support for Qwen-TTS models.
- Piper Voices: Added most remaining Piper voices from Hugging Face to the gallery.
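To get a feel for streaming, here is a minimal sketch that writes audio chunks to disk as they arrive. The model name `voxcpm` and the `stream` request flag are assumptions (streaming is VoxCPM-only for now); consult the TTS documentation for the exact switch.

```python
# Streaming TTS sketch against the OpenAI-compatible speech endpoint.
# The "stream" flag is a hypothetical request field; check the docs.
import requests

with requests.post(
    "http://localhost:8080/v1/audio/speech",
    json={
        "model": "voxcpm",
        "input": "Hello from LocalAI 3.11.0!",
        "stream": True,
    },
    stream=True,
) as resp:
    resp.raise_for_status()
    with open("speech.wav", "wb") as out:
        # Writing chunks as they arrive, rather than waiting for the
        # full clip, is what lowers time-to-first-audio.
        for chunk in resp.iter_content(chunk_size=4096):
            out.write(chunk)
```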
🛠️ Hardware & Backend Updates
- vLLM Omni: A new backend integration for vLLM Omni models.
- Extended Platform Support: Major work on MLX to improve compatibility across CUDA 12, CUDA 13, L4T (Nvidia Jetson), SBSA, and macOS Metal.
- GGUF Cleanup: Dropped redundant VRAM estimation logic for GGUF loading, relying on more accurate internal measurements.
⚠️ Breaking Changes
To keep the project lean and maintainable, we have removed some older backends:
- ExLlama: Removed (deprecated in favor of newer loaders like ExLlamaV2 or llama.cpp).
- Bark: Removed (the upstream project is unmaintained; we recommend using the new TTS alternatives).
🚀 The Complete Local Stack for Privacy-First AI
| Project | Description |
|---|---|
| LocalAI | The free, Open Source OpenAI alternative. Drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required. |
| LocalAGI | Local AI agent management platform. Drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI. |
| LocalRecall | RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Works alongside LocalAI and LocalAGI. |
❤️ Thank You
LocalAI is a true FOSS movement — built by contributors, powered by community.
If you believe in privacy-first AI:
- ✅ Star the repo
- 💬 Contribute code, docs, or feedback
- 📣 Share with others
Your support keeps this stack alive.
✅ Full Changelog
What's Changed
Breaking Changes 🛠
Bug fixes 🐛
- fix(ui): correctly display selected image model by @dedyf5 in #8208
- fix(ui): take account of reasoning in token count calculation by @mudler in #8324
- fix: drop gguf VRAM estimation (now redundant) by @mudler in #8325
- fix(api): Add missing field in initial OpenAI streaming response by @acon96 in #8341
- fix(realtime): Include noAction function in prompt template and handle tool_choice by @richiejp in #8372
- fix: filter GGUF and GGML files from model list by @Yaroslav98214 in #8397
- fix(qwen-asr): Remove contagious slop (DEFAULT_GOAL) from Makefile by @richiejp in #8431
Exciting New Features 🎉
- feat(vllm-omni): add new backend by @mudler in #8188
- feat(vibevoice): add ASR support by @mudler in #8222
- feat: add VoxCPM tts backend by @mudler in #8109
- feat(realtime): Add audio conversations by @richiejp in #6245
- feat(qwen-asr): add support to qwen-asr by @mudler in #8281
- feat(tts): add support for streaming mode by @mudler in #8291
- feat(api): Add transcribe response format request parameter & adjust STT backends by @nanoandrew4 in #8318
- feat(whisperx): add whisperx backend for transcription with speaker diarization by @eureka928 in #8299
- feat(mlx): Add support for CUDA12, CUDA13, L4T, SBSA and CPU by @mudler in #8380
- feat(musicgen): add ace-step and UI interface by @mudler in #8396
- fix(api)!: Stop model prior to deletion by @nanoandrew4 in #8422
- feat(nemo): add Nemo (only asr for now) backend by @mudler in #8436
🧠 Models
- chore(model gallery): add qwen3-tts to model gallery by @mudler in #8187
- chore(model gallery): Add most of not yet present Piper voices from Hugging Face by @rampa3 in #8202
- chore: drop bark which is unmaintained by @mudler in #8207
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8220
- chore(model gallery): Add entry for Mistral Small 3.1 with mmproj by @rampa3 in #8247
- chore(model gallery): Add entry for Magistral Small 1.2 with mmproj by @rampa3 in #8248
- chore(model gallery): Add mistral-community/pixtral-12b with mmproj by @rampa3 in #8245
- chore(model gallery): add z-image and z-image-turbo for diffusers by @mudler in #8260
- fix(qwen3): Be explicit with function calling format by @richiejp in #8265
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #8285
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #8307
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8321
- chore(model gallery): Rename downloaded filename for Magistral Small mmproj by @rampa3 in #8327
- chore(model gallery): Add Qwen 3 VL 8B thinking & instruct by @rampa3 in #8329
- feat(metal): try to extend support to remaining backends by @mudler in #8374
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8381
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #8420
- chore(models): Add Qwen TTS 0.6b by @richiejp in #8428
👒 Dependencies
- chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/bark in the pip group across 1 directory by @dependabot[bot] in #8175
- chore: re-enable e2e tests, fixups anthropic API tools support by @mudler in #8296
- chore(cuda): target 12.8 for 12 to increase compatibility by @mudler in #8297
- chore(deps): bump appleboy/ssh-action from 1.2.4 to 1.2.5 by @dependabot[bot] in #8352
- chore(deps): bump torch from 2.7.0 to 2.7.1+xpu in /backend/python/vllm in the pip group across 1 directory by @dependabot[bot] in #8360
- chore(deps): bump go.opentelemetry.io/otel/metric from 1.39.0 to 1.40.0 by @dependabot[bot] in #8353
- chore(deps): bump github.com/anthropics/anthropic-sdk-go from 1.19.0 to 1.20.0 by @dependabot[bot] in #8355
- chore(deps): bump protobuf from 6.33.4 to 6.33.5 in /backend/python/transformers by @dependabot[bot] in #8356
- chore(deps): bump go.opentelemetry.io/otel/sdk/metric from 1.39.0 to 1.40.0 by @dependabot[bot] in #8354
- chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus from 0.61.0 to 0.62.0 by @dependabot[bot] in #8359
- chore(deps): bump sentence-transformers from 5.2.0 to 5.2.2 in /backend/python/transformers by @dependabot[bot] in #8358
- chore(deps): bump github.com/onsi/gomega from 1.39.0 to 1.39.1 by @dependabot[bot] in #8357
- chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/rerankers in the pip group across 1 directory by @dependabot[bot] in #8407
- feat(audio): set audio content type by @mudler in #8416
Other Changes
- Revert "chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/bark in the pip group across 1 directory" by @mudler in #8180
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #8182
- chore: ⬆️ Update ggml-org/llama.cpp to `557515be1e93ed8939dd8a7c7d08765fdbe8be31` by @localai-bot in #8183
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `fa61ea744d1a87fa26a63f8a86e45587bc9534d6` by @localai-bot in #8184
- chore: ⬆️ Update ggml-org/llama.cpp to `bb02f74c612064947e51d23269a1cf810b67c9a7` by @localai-bot in #8196
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `43e829f21966abb96b08c712bccee872dc820914` by @localai-bot in #8215
- chore: ⬆️ Update ggml-org/llama.cpp to `0440bfd1605333726ea0fb7a836942660bf2f9a6` by @localai-bot in #8216
- chore: ⬆️ Update ggml-org/llama.cpp to `8f80d1b254aef70a0959e314be368d05debe7294` by @localai-bot in #8229
- chore: ⬆️ Update ggml-org/llama.cpp to `2b4cbd2834e427024bc7f935a1f232aecac6679b` by @localai-bot in #8258
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `e411520407663e1ddf8ff2e5ed4ff3a116fbbc97` by @localai-bot in #8274
- chore(llama.cpp): bump to 'f6b533d898ce84bae8d9fa8dfc6697ac087800bf' by @mudler in #8275
- chore: ⬆️ Update ggml-org/llama.cpp to `4fdbc1e4dba428ce0cf9d2ac22232dc170bbca82` by @localai-bot in #8283
- feat(swagger): update swagger by @localai-bot in #8304
- chore: ⬆️ Update ggml-org/whisper.cpp to `aa1bc0d1a6dfd70dbb9f60c11df12441e03a9075` by @localai-bot in #8305
- chore: ⬆️ Update ggml-org/llama.cpp to `1488339138d609139c4400d1b80f8a5b1a9a203c` by @localai-bot in #8306
- chore: ⬆️ Update ggml-org/llama.cpp to `41ea26144e55d23f37bb765f88c07588d786567f` by @localai-bot in #8317
- chore: ⬆️ Update ggml-org/llama.cpp to `2634ed207a17db1a54bd8df0555bd8499a6ab691` by @localai-bot in #8336
- Revert "chore(deps): bump torch from 2.7.0 to 2.7.1+xpu in /backend/python/vllm in the pip group across 1 directory" by @mudler in #8367
- fix(docs): Promote DEBUG=false in production docker compose by @JonasBernard in #8390
- chore: ⬆️ Update ggml-org/whisper.cpp to `941bdabbe4561bc6de68981aea01bc5ab05781c5` by @localai-bot in #8398
- chore: ⬆️ Update ggml-org/llama.cpp to `b536eb023368701fe3564210440e2df6151c3e65` by @localai-bot in #8399
- Revert "chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/rerankers in the pip group across 1 directory" by @mudler in #8412
- feat(swagger): update swagger by @localai-bot in #8418
- chore: ⬆️ Update ggml-org/llama.cpp to `22cae832188a1f08d18bd0a707a4ba5cd03c7349` by @localai-bot in #8419
- chore(docs): Document using a local model gallery by @richiejp in #8426
- chore: ⬆️ Update ggml-org/llama.cpp to `b83111815e9a79949257e9d4b087206b320a3063` by @localai-bot in #8434
- chore: ⬆️ Update ggml-org/llama.cpp to `8872ad2125336d209a9911a82101f80095a9831d` by @localai-bot in #8448
New Contributors
- @nanoandrew4 made their first contribution in #8318
- @acon96 made their first contribution in #8341
- @eureka928 made their first contribution in #8299
- @JonasBernard made their first contribution in #8390
- @Yaroslav98214 made their first contribution in #8397
Full Changelog: v3.10.1...v3.11.0
