🎉 LocalAI 3.11.0 Release! 🚀
LocalAI 3.11.0 is a major update for Audio and Multimodal capabilities.
We are introducing Realtime Audio Conversations, a dedicated Music Generation UI, and a massive expansion of ASR (Speech-to-Text) and TTS backends. Whether you want to talk to your AI, clone voices, transcribe with speaker identification, or generate songs, this release has you covered.
Check out the highlights below!
📌 TL;DR
| Feature | Summary |
|---|---|
| Realtime Audio | Native support for audio conversations, enabling fluid voice interactions similar to OpenAI's Realtime API. See the documentation for details. |
| Music Generation UI | New Web UI for MusicGen (Ace-Step), allowing you to generate music from text prompts directly in the browser. |
| New ASR Backends | Added WhisperX (with Speaker Diarization), VibeVoice, Qwen-ASR, and Nvidia NeMo. |
| TTS Streaming | Text-to-Speech now supports streaming mode for lower latency responses. (VoxCPM only for now) |
| vLLM Omni | Added support for vLLM Omni, expanding our high-performance inference capabilities. |
| Speaker Diarization | Native support for identifying different speakers in transcriptions via WhisperX. |
| Hardware Expansion | Expanded build support for CUDA 12/13, L4T (Jetson), SBSA, and better Metal (Apple Silicon) integration with MLX backends. |
| Breaking Changes | ExLlama (deprecated) and Bark (unmaintained) backends have been removed. |
🚀 New Features & Major Enhancements
🎙️ Realtime Audio Conversations
LocalAI 3.11.0 introduces native support for Realtime Audio Conversations.
- Enables fluid, low-latency voice interaction with agents.
- Logic handled directly within the LocalAI pipeline for seamless audio-in/audio-out workflows.
- Support for STT/TTS pipelines and voice-to-voice models (experimental).
- Support for tool calls.
🗣️ Talk to your LocalAI: This brings us one step closer to a fully local, voice-native assistant experience compatible with standard client implementations.
Check the documentation for detailed setup and usage.
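To get a feel for the wire format, here is a minimal sketch of a realtime session over WebSockets. It assumes LocalAI mirrors the OpenAI Realtime API's endpoint path and event names (as the compatibility note above suggests), and that a realtime-capable model is installed under the placeholder name `my-realtime-model`; check the documentation for the exact values.

```python
# Minimal realtime session sketch. The /v1/realtime path, the event
# names, and the model name are assumptions modeled on the OpenAI
# Realtime API; consult the LocalAI realtime docs for specifics.
import asyncio
import json

import websockets  # pip install websockets


async def main():
    uri = "ws://localhost:8080/v1/realtime?model=my-realtime-model"
    async with websockets.connect(uri) as ws:
        # Configure the session for audio in/out.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"modalities": ["audio", "text"]},
        }))
        # Ask the model to start a response.
        await ws.send(json.dumps({"type": "response.create"}))
        # Read server events until the response completes.
        async for message in ws:
            event = json.loads(message)
            print(event.get("type"))
            if event.get("type") == "response.done":
                break


asyncio.run(main())
```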
🎵 Music Generation UI & Ace-Step
We have added a dedicated interface for music generation!
- New Backend: Support for Ace-Step (MusicGen) via the `ace-step` backend.
- Web UI Integration: Generate musical clips directly from the LocalAI Web UI.
- Simple text-to-music workflow (e.g., "Lo-fi hip hop beat for studying").
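If you prefer the API over the Web UI, a minimal text-to-music sketch might look like the following. The model name `ace-step` and the exact request fields are assumptions based on LocalAI's existing ElevenLabs-style sound-generation endpoint; the Web UI drives the same workflow.

```python
# Hypothetical text-to-music request; model name and field names are
# assumptions, so check the LocalAI docs for the exact schema.
import requests

resp = requests.post(
    "http://localhost:8080/v1/sound-generation",
    json={
        "model_id": "ace-step",
        "text": "Lo-fi hip hop beat for studying",
    },
    timeout=600,  # music generation can take a while
)
resp.raise_for_status()
with open("clip.wav", "wb") as f:
    f.write(resp.content)  # save the generated audio
```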
🎧 Massive ASR (Speech-to-Text) Expansion
This release significantly broadens our transcription capabilities with four new backends:
- WhisperX: Provides fast transcription with Speaker Diarization (identifying who is speaking); see the sketch after this list.
- VibeVoice: Now supports ASR alongside TTS.
- Qwen-ASR: Support for Qwen's powerful speech recognition models.
- Nvidia NeMo: Initial support for NeMo ASR.
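All of these are reachable through the familiar OpenAI-compatible transcription endpoint. A minimal sketch, assuming the WhisperX backend is installed under the model name `whisperx` (the name is an assumption; use whatever you installed it as):

```python
# Transcription via the OpenAI-compatible endpoint. With WhisperX,
# diarized output is expected to carry speaker labels on the segments,
# so we request verbose JSON (response_format is the new request
# parameter added in this release).
import requests

with open("meeting.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8080/v1/audio/transcriptions",
        files={"file": ("meeting.wav", f, "audio/wav")},
        data={"model": "whisperx", "response_format": "verbose_json"},
    )
resp.raise_for_status()
result = resp.json()
print(result["text"])
for segment in result.get("segments", []):
    # Per-segment speaker labels are WhisperX-specific output.
    print(segment.get("speaker"), segment.get("text"))
```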
🗣️ TTS Streaming & New Voices
Text-to-Speech gets a speed boost and new options:
- Streaming Support: TTS endpoints now support streaming, significantly reducing time-to-first-audio (see the sketch after this list).
- VoxCPM: Added support for the VoxCPM backend.
- Qwen-TTS: Added support for Qwen-TTS models.
- Piper Voices: Added most remaining Piper voices from Hugging Face to the gallery.
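To get a feel for streaming, here is a minimal sketch that writes audio chunks to disk as they arrive. The model name `voxcpm` and the `stream` request flag are assumptions (streaming is VoxCPM-only for now); consult the TTS documentation for the exact switch.

```python
# Streaming TTS sketch against the OpenAI-compatible speech endpoint.
# The "stream" flag is a hypothetical request field; check the docs.
import requests

with requests.post(
    "http://localhost:8080/v1/audio/speech",
    json={
        "model": "voxcpm",
        "input": "Hello from LocalAI 3.11.0!",
        "stream": True,
    },
    stream=True,
) as resp:
    resp.raise_for_status()
    with open("speech.wav", "wb") as out:
        # Writing chunks as they arrive, rather than waiting for the
        # full clip, is what lowers time-to-first-audio.
        for chunk in resp.iter_content(chunk_size=4096):
            out.write(chunk)
```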
🛠️ Hardware & Backend Updates
- vLLM Omni: A new backend integration for vLLM Omni models.
- Extended Platform Support: Major work on MLX to improve compatibility across CUDA 12, CUDA 13, L4T (Nvidia Jetson), SBSA, and macOS Metal.
- GGUF Cleanup: Dropped redundant VRAM estimation logic for GGUF loading, relying on more accurate internal measurements.
⚠️ Breaking Changes
To keep the project lean and maintainable, we have removed some older backends:
- ExLlama: Removed (deprecated in favor of newer loaders like ExLlamaV2 or llama.cpp).
- Bark: Removed (the upstream project is unmaintained; we recommend using the new TTS alternatives).
🚀 The Complete Local Stack for Privacy-First AI
| Project | Description |
|---|---|
| LocalAI | The free, Open Source OpenAI alternative. Drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required. |
| LocalAGI | Local AI agent management platform. Drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI. |
| LocalRecall | RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Works alongside LocalAI and LocalAGI. |
❤️ Thank You
LocalAI is a true FOSS movement — built by contributors, powered by community.
If you believe in privacy-first AI:
- ✅ Star the repo
- 💬 Contribute code, docs, or feedback
- 📣 Share with others
Your support keeps this stack alive.
✅ Full Changelog
What's Changed
Breaking Changes 🛠
Bug fixes 🐛
- fix(ui): correctly display selected image model by @dedyf5 in #8208
- fix(ui): take account of reasoning in token count calculation by @mudler in #8324
- fix: drop gguf VRAM estimation (now redundant) by @mudler in #8325
- fix(api): Add missing field in initial OpenAI streaming response by @acon96 in #8341
- fix(realtime): Include noAction function in prompt template and handle tool_choice by @richiejp in #8372
- fix: filter GGUF and GGML files from model list by @Yaroslav98214 in #8397
- fix(qwen-asr): Remove contagious slop (DEFAULT_GOAL) from Makefile by @richiejp in #8431
Exciting New Features 🎉
- feat(vllm-omni): add new backend by @mudler in #8188
- feat(vibevoice): add ASR support by @mudler in #8222
- feat: add VoxCPM tts backend by @mudler in #8109
- feat(realtime): Add audio conversations by @richiejp in #6245
- feat(qwen-asr): add support to qwen-asr by @mudler in #8281
- feat(tts): add support for streaming mode by @mudler in #8291
- feat(api): Add transcribe response format request parameter & adjust STT backends by @nanoandrew4 in #8318
- feat(whisperx): add whisperx backend for transcription with speaker diarization by @eureka928 in #8299
- feat(mlx): Add support for CUDA12, CUDA13, L4T, SBSA and CPU by @mudler in #8380
- feat(musicgen): add ace-step and UI interface by @mudler in #8396
- fix(api)!: Stop model prior to deletion by @nanoandrew4 in #8422
- feat(nemo): add Nemo (only asr for now) backend by @mudler in #8436
🧠 Models
- chore(model gallery): add qwen3-tts to model gallery by @mudler in #8187
- chore(model gallery): Add most of not yet present Piper voices from Hugging Face by @rampa3 in #8202
- chore: drop bark which is unmaintained by @mudler in #8207
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8220
- chore(model gallery): Add entry for Mistral Small 3.1 with mmproj by @rampa3 in #8247
- chore(model gallery): Add entry for Magistral Small 1.2 with mmproj by @rampa3 in #8248
- chore(model gallery): Add mistral-community/pixtral-12b with mmproj by @rampa3 in #8245
- chore(model gallery): add z-image and z-image-turbo for diffusers by @mudler in #8260
- fix(qwen3): Be explicit with function calling format by @richiejp in #8265
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #8285
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #8307
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8321
- chore(model gallery): Rename downloaded filename for Magistral Small mmproj by @rampa3 in #8327
- chore(model gallery): Add Qwen 3 VL 8B thinking & instruct by @rampa3 in #8329
- feat(metal): try to extend support to remaining backends by @mudler in #8374
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8381
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #8420
- chore(models): Add Qwen TTS 0.6b by @richiejp in #8428
👒 Dependencies
- chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/bark in the pip group across 1 directory by @dependabot[bot] in #8175
- chore: re-enable e2e tests, fixups anthropic API tools support by @mudler in #8296
- chore(cuda): target 12.8 for 12 to increase compatibility by @mudler in #8297
- chore(deps): bump appleboy/ssh-action from 1.2.4 to 1.2.5 by @dependabot[bot] in #8352
- chore(deps): bump torch from 2.7.0 to 2.7.1+xpu in /backend/python/vllm in the pip group across 1 directory by @dependabot[bot] in #8360
- chore(deps): bump go.opentelemetry.io/otel/metric from 1.39.0 to 1.40.0 by @dependabot[bot] in #8353
- chore(deps): bump github.com/anthropics/anthropic-sdk-go from 1.19.0 to 1.20.0 by @dependabot[bot] in #8355
- chore(deps): bump protobuf from 6.33.4 to 6.33.5 in /backend/python/transformers by @dependabot[bot] in #8356
- chore(deps): bump go.opentelemetry.io/otel/sdk/metric from 1.39.0 to 1.40.0 by @dependabot[bot] in #8354
- chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus from 0.61.0 to 0.62.0 by @dependabot[bot] in #8359
- chore(deps): bump sentence-transformers from 5.2.0 to 5.2.2 in /backend/python/transformers by @dependabot[bot] in #8358
- chore(deps): bump github.com/onsi/gomega from 1.39.0 to 1.39.1 by @dependabot[bot] in #8357
- chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/rerankers in the pip group across 1 directory by @dependabot[bot] in #8407
- feat(audio): set audio content type by @mudler in #8416
Other Changes
- Revert "chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/bark in the pip group across 1 directory" by @mudler in #8180
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #8182
- chore: ⬆️ Update ggml-org/llama.cpp to `557515be1e93ed8939dd8a7c7d08765fdbe8be31` by @localai-bot in #8183
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `fa61ea744d1a87fa26a63f8a86e45587bc9534d6` by @localai-bot in #8184
- chore: ⬆️ Update ggml-org/llama.cpp to `bb02f74c612064947e51d23269a1cf810b67c9a7` by @localai-bot in #8196
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `43e829f21966abb96b08c712bccee872dc820914` by @localai-bot in #8215
- chore: ⬆️ Update ggml-org/llama.cpp to `0440bfd1605333726ea0fb7a836942660bf2f9a6` by @localai-bot in #8216
- chore: ⬆️ Update ggml-org/llama.cpp to `8f80d1b254aef70a0959e314be368d05debe7294` by @localai-bot in #8229
- chore: ⬆️ Update ggml-org/llama.cpp to `2b4cbd2834e427024bc7f935a1f232aecac6679b` by @localai-bot in #8258
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `e411520407663e1ddf8ff2e5ed4ff3a116fbbc97` by @localai-bot in #8274
- chore(llama.cpp): bump to 'f6b533d898ce84bae8d9fa8dfc6697ac087800bf' by @mudler in #8275
- chore: ⬆️ Update ggml-org/llama.cpp to `4fdbc1e4dba428ce0cf9d2ac22232dc170bbca82` by @localai-bot in #8283
- feat(swagger): update swagger by @localai-bot in #8304
- chore: ⬆️ Update ggml-org/whisper.cpp to `aa1bc0d1a6dfd70dbb9f60c11df12441e03a9075` by @localai-bot in #8305
- chore: ⬆️ Update ggml-org/llama.cpp to `1488339138d609139c4400d1b80f8a5b1a9a203c` by @localai-bot in #8306
- chore: ⬆️ Update ggml-org/llama.cpp to `41ea26144e55d23f37bb765f88c07588d786567f` by @localai-bot in #8317
- chore: ⬆️ Update ggml-org/llama.cpp to `2634ed207a17db1a54bd8df0555bd8499a6ab691` by @localai-bot in #8336
- Revert "chore(deps): bump torch from 2.7.0 to 2.7.1+xpu in /backend/python/vllm in the pip group across 1 directory" by @mudler in #8367
- fix(docs): Promote DEBUG=false in production docker compose by @JonasBernard in #8390
- chore: ⬆️ Update ggml-org/whisper.cpp to `941bdabbe4561bc6de68981aea01bc5ab05781c5` by @localai-bot in #8398
- chore: ⬆️ Update ggml-org/llama.cpp to `b536eb023368701fe3564210440e2df6151c3e65` by @localai-bot in #8399
- Revert "chore(deps): bump torch from 2.4.1 to 2.7.1+xpu in /backend/python/rerankers in the pip group across 1 directory" by @mudler in #8412
- feat(swagger): update swagger by @localai-bot in #8418
- chore: ⬆️ Update ggml-org/llama.cpp to `22cae832188a1f08d18bd0a707a4ba5cd03c7349` by @localai-bot in #8419
- chore(docs): Document using a local model gallery by @richiejp in #8426
- chore: ⬆️ Update ggml-org/llama.cpp to `b83111815e9a79949257e9d4b087206b320a3063` by @localai-bot in #8434
- chore: ⬆️ Update ggml-org/llama.cpp to `8872ad2125336d209a9911a82101f80095a9831d` by @localai-bot in #8448
New Contributors
- @nanoandrew4 made their first contribution in #8318
- @acon96 made their first contribution in #8341
- @eureka928 made their first contribution in #8299
- @JonasBernard made their first contribution in #8390
- @Yaroslav98214 made their first contribution in #8397
Full Changelog: v3.10.1...v3.11.0
