🎉 LocalAI 4.0.0 Release! 🚀
LocalAI 4.0.0 is out!
This major release transforms LocalAI into a complete AI orchestration platform. We've embedded agentic and hybrid-search capabilities directly into the core, completely overhauled the user interface with React for a modern experience, and are thrilled to introduce Agenthub ( link ), a brand-new community hub for easily sharing and importing agents. Alongside these massive updates, we've added powerful new features such as Canvas mode for code artifacts, MCP Apps, and full MCP client-side support.
| Feature | Summary |
|---|---|
| Agentic Orchestration & Agenthub | Native agent management with memory, skills, and the new Agenthub for community sharing. |
| Revamped React UI | Complete frontend rewrite for lightning-fast performance and modern UX. |
| Canvas Mode | Preview code blocks and artifacts side-by-side in the chat interface. |
| MCP Client-Side | Full Model Context Protocol support, MCP Apps, and tool streaming in chat. |
| WebRTC Realtime | WebRTC support for low-latency realtime audio conversations. |
| New Backends | Added experimental MLX Distributed, fish-speech, ace-step.cpp, and faster-qwen3-tts. |
| Infrastructure | Podman documentation, shell completion, and persistent data path separation. |
🚀 Key Features
🤖 Native Agentic Orchestration & Agenthub
LocalAI now includes agentic capabilities embedded directly in the core. You can manage, import, start, and stop agents via the new UI.
- 🌐 Agenthub: We are launching Agenthub! This is a centralized community space to share common agents and import them effortlessly into your LocalAI instance.
- Agent Management: Full lifecycle management via the React UI. Create Agents, connect them to Slack, configure MCP servers and skills.
- Skills Management: Centralized skill database for AI agents.
- Memory: Agents can utilize memory with Hybrid search (PostgreSQL) or embedded in-memory storage (Chromem).
- Observability: New "Events" column in the Agents list to track observables and status.
- 📚 Documentation: Dive into the new capabilities in our official Agents documentation.
agents.mp4
🎨 Revamped UI & Canvas Mode
The Web interface has been completely migrated to React, bringing a smoother experience and powerful new capabilities:
- Canvas Mode: Enable "canvas mode" in the chat to see code blocks and artifacts generated by the LLM in a dedicated preview bar on the right.
- System View: Tabbed navigation separating Models and Backends for better organization.
- Model Size Warnings: Visual warnings when model storage exceeds system RAM to prevent lockups.
- Traces: Improved trace display using accordions for better readability.
model-fit-canvas-mode.mp4
🔌 MCP Apps & Client-Side Support
We’ve expanded support for the Model Context Protocol (MCP):
- MCP Apps: Select which servers to enable for the chat directly from the UI.
- Tool Streaming: Tools from MCP servers are automatically injected into the standard chat interface.
- Client-Side Support: Full client-side integration for MCP tools and streaming.
- Disable Option: Set the `LOCALAI_DISABLE_MCP` environment variable to completely disable MCP support for security.
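As a quick sketch, opting out of MCP entirely might look like this. The `LOCALAI_DISABLE_MCP` variable name comes from the release notes; the `"true"` value convention and the `local-ai` binary name are assumptions for illustration:

```shell
# Disable all MCP support (MCP Apps, tool streaming, client-side
# integration) for this LocalAI instance before starting it.
# LOCALAI_DISABLE_MCP is documented in this release; the "true" value
# and the `local-ai` binary name are assumptions.
export LOCALAI_DISABLE_MCP=true
# local-ai run   # start the server with MCP disabled
```

Setting the variable in the service environment (systemd unit, Docker `-e` flag, or compose file) achieves the same effect for managed deployments.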
🎵 New Backends, Audio & Video Enhancements
- MLX Distributed (Experimental): We've added an experimental backend for running distributed workloads using Apple's MLX framework! Check out the docs here.
- New Audio Backends: Introduced fish-speech, ace-step.cpp, and faster-qwen3-tts (CUDA-only).
- WebRTC Realtime: WebRTC support added to the Realtime API and Talk page for better low-latency audio handling.
- TTS Improvements: Added `sample_rate` support via post-processing and multi-voice support for Qwen TTS.
- Video Generation: Fixed model selection dropdown sync and added `vllm-omni` backend detection.
🛠️ Infrastructure & Developer Experience
- Data Separation: New `--data-path` CLI flag and `LOCALAI_DATA_PATH` env var to separate persistent data (agents, skills) from configuration.
- Shell Completion: Dynamic completion scripts for bash, zsh, and fish.
- Podman Support: Dedicated documentation for Podman installation and rootless configuration.
- Gallery & Models: Model storage size display with RAM warnings, and fallback URI resolution for backend installation failures.
- Deprecations: HuggingFace backend support removed, and AIO images dropped to focus on main images.
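A minimal sketch of the new data separation. The `--data-path` flag and `LOCALAI_DATA_PATH` variable are documented in this release; the example directory and the `local-ai` binary name are assumptions:

```shell
# Keep persistent data (agents, skills) separate from configuration.
# The chosen directory is arbitrary; --data-path / LOCALAI_DATA_PATH
# are from the release notes, the binary name is an assumption.
mkdir -p /tmp/localai-data
export LOCALAI_DATA_PATH=/tmp/localai-data
# Equivalent CLI form:
# local-ai run --data-path /tmp/localai-data
```

Pointing the data path at a dedicated volume makes it straightforward to back up or migrate agents and skills independently of model configuration.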
🐞 Fixes & Improvements
- Logging: Fixed watchdog spamming logs when no interval was configured; downgraded health check logs to debug.
- CUDA Detection: Improved GPU vendor checks to prevent false CUDA detection on CPU-only hosts with runtime libs.
- Compatibility: Renamed `json_verbose` to `verbose_json` for OpenAI spec compliance (fixes Nextcloud integration).
- Embedding: Fixed embedding dimension truncation to return full native dimensions.
- Permissions: Changed model install file permissions to 0644 to ensure server readability.
- Windows Docker: Added named volumes to Docker Compose files for Windows compatibility.
- Model Reload: Models now reload automatically after editing YAML config (e.g., `context_size`).
- Chat: Fixed issue where thinking/reasoning blocks were sent to the LLM.
- Audio: Fixed img2img pipeline in diffusers backend and Qwen TTS duplicate argument error.
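The model-reload fix means a config edit like the sketch below takes effect without restarting the server. The field names follow LocalAI's model config conventions; the model name and file are made up for illustration:

```yaml
# gpt-example.yaml — hypothetical model definition; editing and saving
# this file now triggers an automatic reload, no server restart needed.
name: gpt-example
context_size: 8192   # bump this and save; the change takes effect on reload
parameters:
  model: example-model.gguf
```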
Known issues
- The `diffusers` backend currently fails to build (due to CI limit exhaustion) and is not part of this release (the previous version is still available). We are looking into it, but if you want to help and know someone at GitHub who could support us with better ARM runners, please reach out!
❤️ Thank You
LocalAI is a true FOSS movement — built by contributors, powered by community.
If you believe in privacy-first AI:
- ✅ Star the repo
- 💬 Contribute code, docs, or feedback
- 📣 Share with others
Your support keeps this stack alive.
✅ Full Changelog
📋 Click to expand full changelog
What's Changed
Breaking Changes 🛠
- Remove HuggingFace backend support by @localai-bot in #8971
- chore: drop AIO images by @mudler in #9004
Bug fixes 🐛
- fix(cli): Fix watchdog running constantly and spamming logs by @nanoandrew4 in #8624
- fix(api): Downgrade health/readiness check to debug by @nanoandrew4 in #8625
- fix: rename json_verbose to verbose_json by @lukasdotcom in #8627
- fix(chatterbox): add support for cuda13/aarch64 by @mudler in #8653
- fix: reload model after editing YAML config (issue #8647) by @localai-bot in #8652
- fix(chat): do not send thinking/reasoning messages to the LLM by @mudler in #8656
- fix: change file permissions from 0600 to 0644 in InstallModel by @localai-bot in #8657
- fix: Add named volumes for Windows Docker compatibility by @localai-bot in #8661
- fix(gallery): add fallback URI resolution for backend installation by @localai-bot in #8663
- fix: whisper breaking on cuda-13 (use absolute path for CUDA directory detection) by @localai-bot in #8678
- fix(gallery): clean up partially downloaded backend on installation failure by @localai-bot in #8679
- fix: properly sync model selection dropdown in video generation UI by @localai-bot in #8680
- fix: allow reranking models configured with known_usecases by @localai-bot in #8681
- fix: return full embedding dimensions instead of truncating trailing zeros (#8721) by @localai-bot in #8755
- fix: Add vllm-omni backend to video generation model detection (#8659) by @localai-bot in #8781
- fix(qwen-tts): duplicate instruct argument in voice design mode by @Weathercold in #8842
- Fix image upload processing and img2img pipeline in diffusers backend by @attilagyorffy in #8879
- fix: gate CUDA directory checks on GPU vendor to prevent false CUDA detection by @sozercan in #8942
- fix(llama-cpp): Set enable_thinking in the correct place by @richiejp in #8973
Exciting New Features 🎉
- feat(traces): Use accordian instead of pop-ups by @richiejp in #8626
- chore: remove install.sh script and documentation references by @localai-bot in #8643
- docs: add Podman installation documentation by @localai-bot in #8646
- Add `sample_rate` support to TTS API via post-processing resampling by @Copilot in #8650
- feat(backends): add faster-qwen3-tts by @localai-bot in #8664
- feat(models): add model storage size display and RAM warning by @localai-bot in #8675
- feat(ui): add model size estimation by @mudler in #8684
- feat: Add Free RPC to backend.proto for VRAM cleanup by @localai-bot in #8751
- feat(qwen-tts): Support using multiple voices by @nanoandrew4 in #8757
- feat(ui): move to React for frontend by @mudler in #8772
- feat: add WebSocket mode support for the response api by @bittoby in #8676
- feat: Add LOCALAI_DISABLE_MCP environment variable to disable MCP support by @localai-bot in #8816
- feat: add agentic management by @mudler in #8820
- feat: Add shell completion support for bash, zsh, and fish by @localai-bot in #8851
- feat(downloader): add HF_MIRROR environment variable support by @localai-bot in #8847
- feat: add Events column to Agents list page by @localai-bot in #8870
- feat: Add tabs to System view for Models and Backends by @localai-bot in #8885
- feat: Add --data-path CLI flag for persistent data separation by @localai-bot in #8888
- feat(mlx-distributed): add new (experimental) MLX-distributed backend by @mudler in #8801
- chore(size): display size of HF models and allow to specify it from the gallery by @mudler in #8907
- feat(ui): add canvas mode, support history in agent chat by @mudler in #8927
- feat(ui): MCP Apps, mcp streaming and client-side support by @mudler in #8947
- feat: add fish-speech backend by @mudler in #8962
- feat(backends): add ace-step.cpp by @mudler in #8965
- feat(realtime): WebRTC support by @richiejp in #8790
🧠 Models
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8693
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8694
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8695
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8696
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8698
📖 Documentation and examples
- fix(realtime): Add functions to conversation history by @richiejp in #8616
- docs: update diffusers multi-GPU documentation to mention tensor_parallel_size configuration by @localai-bot in #8621
- docs: Update Home Assistant links in README.md by @loryanstrant in #8688
👒 Dependencies
- chore(deps): bump fyne.io/fyne/v2 from 2.7.2 to 2.7.3 by @dependabot[bot] in #8629
- chore(deps): bump github.com/anthropics/anthropic-sdk-go from 1.22.0 to 1.26.0 by @dependabot[bot] in #8630
- chore(deps): bump github.com/gpustack/gguf-parser-go from 0.23.1 to 0.24.0 by @dependabot[bot] in #8631
- chore(deps): bump actions/stale from 10.1.1 to 10.2.0 by @dependabot[bot] in #8633
- chore(deps): bump goreleaser/goreleaser-action from 6 to 7 by @dependabot[bot] in #8634
- chore(deps): bump github.com/mudler/cogito from 0.9.1-0.20260217143801-bb7f986ed2c7 to 0.9.1 by @dependabot[bot] in #8632
- chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/coqui by @dependabot[bot] in #8642
- chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/common/template by @dependabot[bot] in #8641
- chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/transformers by @dependabot[bot] in #8640
- chore(deps): bump sentence-transformers from 5.2.2 to 5.2.3 in /backend/python/transformers by @dependabot[bot] in #8638
- chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/rerankers by @dependabot[bot] in #8636
- chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/vllm by @dependabot[bot] in #8635
- chore(deps): bump actions/download-artifact from 7 to 8 by @dependabot[bot] in #8729
- chore(deps): bump actions/upload-artifact from 6 to 7 by @dependabot[bot] in #8730
- chore(deps): bump github.com/openai/openai-go/v3 from 3.19.0 to 3.24.0 by @dependabot[bot] in #8732
- chore(deps): bump github.com/modelcontextprotocol/go-sdk from 1.3.0 to 1.4.0 by @dependabot[bot] in #8733
- chore(deps): bump go.opentelemetry.io/otel from 1.40.0 to 1.41.0 by @dependabot[bot] in #8734
- chore(deps): bump go.opentelemetry.io/otel/metric from 1.40.0 to 1.41.0 by @dependabot[bot] in #8735
- chore(deps): bump github.com/google/go-containerregistry from 0.20.7 to 0.21.1 by @dependabot[bot] in #8736
- chore(deps): bump go.opentelemetry.io/otel/sdk/metric from 1.40.0 to 1.42.0 by @dependabot[bot] in #8915
- chore(deps): bump github.com/openai/openai-go/v3 from 3.24.0 to 3.26.0 by @dependabot[bot] in #8916
- chore(deps): bump go.yaml.in/yaml/v2 from 2.4.3 to 2.4.4 by @dependabot[bot] in #8913
- chore(deps): bump github.com/labstack/echo/v4 from 4.15.0 to 4.15.1 by @dependabot[bot] in #8914
- chore(deps): bump docker/metadata-action from 5 to 6 by @dependabot[bot] in #8917
- chore(deps): bump actions/setup-node from 4 to 6 by @dependabot[bot] in #8920
- chore(deps): bump docker/build-push-action from 6 to 7 by @dependabot[bot] in #8919
- chore(deps): bump docker/login-action from 3 to 4 by @dependabot[bot] in #8918
- chore(deps): bump node from 22-slim to 25-slim by @dependabot[bot] in #8922
Other Changes
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #8618
- chore: ⬆️ Update ggml-org/llama.cpp to `f75c4e8bf52ea480ece07fd3d9a292f1d7f04bc5` by @localai-bot in #8619
- chore: ⬆️ Update ggml-org/llama.cpp to `2b6dfe824de8600c061ef91ce5cc5c307f97112c` by @localai-bot in #8622
- chore: ⬆️ Update ggml-org/llama.cpp to `b68a83e641b3ebe6465970b34e99f3f0e0a0b21a` by @localai-bot in #8628
- fix(webui): use different icon for System nav item by @localai-bot in #8648
- chore: ⬆️ Update ggml-org/llama.cpp to `418dea39cea85d3496c8b04a118c3b17f3940ad8` by @localai-bot in #8649
- feat(swagger): update swagger by @localai-bot in #8654
- chore: ⬆️ Update ggml-org/llama.cpp to `3769fe6eb70b0a0fbb30b80917f1caae68c902f7` by @localai-bot in #8655
- chore: ⬆️ Update ggml-org/llama.cpp to `723c71064da0908c19683f8c344715fbf6d986fd` by @localai-bot in #8660
- fix(qwen3.5): add qwen3.5 preset and mimick llama.cpp's PEG by @mudler in #8668
- chore: ⬆️ Update ggml-org/whisper.cpp to `9453b4b9be9b73adfc35051083f37cefa039acee` by @localai-bot in #8671
- chore(deps): bump llama.cpp to 'ecbcb7ea9d3303097519723b264a8b5f1e977028' by @mudler in #8672
- docs: add CDI driver config for NVIDIA GPU in containers (fix #8108) by @localai-bot in #8677
- docs: add TLS reverse proxy configuration guide by @localai-bot in #8673
- chore: ⬆️ Update ggml-org/llama.cpp to `05728db18eea59de81ee3a7699739daaf015206b` by @localai-bot in #8683
- fix: simplify CI steps, fix gallery agent by @mudler in #8685
- fix: retry when LLM returns empty messages by @mudler in #8704
- feat(swagger): update swagger by @localai-bot in #8706
- feat: Add debug logging for pocket-tts voice issue #8244 by @localai-bot in #8715
- fix(ci): correct transformer backend path typo by @localai-bot in #8712
- chore: ⬆️ Update ggml-org/llama.cpp to `319146247e643695f94a558e8ae686277dd4f8da` by @localai-bot in #8707
- fix: Implement responsive line wrapping for model names (#8209) by @localai-bot in #8720
- fix(qwen-tts): ensure all requirements files end with newline by @localai-bot in #8724
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #8728
- chore: ⬆️ Update ggml-org/llama.cpp to `4d828bd1ab52773ba9570cc008cf209eb4a8b2f5` by @localai-bot in #8727
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8743
- chore: ⬆️ Update ggml-org/llama.cpp to `ecd99d6a9acbc436bad085783bcd5d0b9ae9e9e9` by @localai-bot in #8762
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8770
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #8778
- chore: ⬆️ Update ggml-org/llama.cpp to `24d2ee052795063afffc9732465ca1b1c65f4a28` by @localai-bot in #8777
- docs: add autonomous development team section to README by @localai-bot in #8780
- feat: Rename 'Whisper' model type to 'STT' in UI by @localai-bot in #8785
- chore: ⬆️ Update ggml-org/whisper.cpp to `30c5194c9691e4e9a98b3dea9f19727397d3f46e` by @localai-bot in #8796
- chore: ⬆️ Update ggml-org/llama.cpp to `a0ed91a442ea6b013bd42ebc3887a81792eaefa1` by @localai-bot in #8797
- feat: pass-by metadata to predict options by @mudler in #8795
- fix: Add timeout-based wait for model deletion completion by @localai-bot in #8756
- chore: Add LTX-2.3 model to gallery by @localai-bot in #8805
- chore: ⬆️ Update ggml-org/llama.cpp to `566059a26b0ce8faec4ea053605719d399c64cc5` by @localai-bot in #8822
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8828
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8830
- feat: update descriptions for first 9 models in gallery/index.yaml by @localai-bot in #8831
- chore: ⬆️ Update ggml-org/llama.cpp to `c5a778891ba0ddbd4cbb507c823f970595b1adc2` by @localai-bot in #8837
- chore(docs): Populate coding guidelines in CONTRIBUTING.md by @localai-bot in #8840
- fix: Remove debug print statement from soundgeneration.go (C2) by @localai-bot in #8843
- docs: Add comprehensive API error reference documentation by @localai-bot in #8848
- docs: add Table of Contents to README.md by @localai-bot in #8846
- chore: ⬆️ Update leejet/stable-diffusion.cpp to c8fb3d245858d495be1f140efdcfaa0d49de41e5 by @localai-bot in #8841
- docs: clarify SECURITY.md version support table with specific ranges and EOL dates by @localai-bot in #8861
- fix: Correct Talk Interface screenshot reference in README.md (H6) by @localai-bot in #8857
- feat: Add documentation for undocumented API endpoints by @localai-bot in #8852
- docs: add comprehensive development setup instructions to CONTRIBUTING.md (H7) by @localai-bot in #8860
- feat(cli): add configurable backend image fallback tags via CLI options by @localai-bot in #8817
- feat: add MIT license badge to README.md by @localai-bot in #8871
- feat: Create comprehensive troubleshooting guide (M1 task) by @localai-bot in #8856
- docs: expand GPU acceleration guide with L4T, multi-GPU, monitoring, and troubleshooting by @localai-bot in #8858
- feat: Add documentation URLs to CLI help text by @localai-bot in #8874
- feat(functions): add peg-based parsing and allow backends to return tool calls directly by @mudler in #8838
- chore: Update README.md screenshot references and alt text by @localai-bot in #8862
- chore: ⬆️ Update ggml-org/llama.cpp to `35bee031e17ed2b2e8e7278b284a6c8cd120d9f8` by @localai-bot in #8872
- docs: Update model compatibility documentation with missing backends by @localai-bot in #8889
- docs: make examples repository link more prominent by @localai-bot in #8895
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8901
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8902
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8904
- feat: Redesign explorer and models pages with react-ui theme by @localai-bot in #8903
- fix(ui): minor visual enhancements by @mudler in #8909
- feat(swagger): update swagger by @localai-bot in #8923
- feat: Standardize CLI flag naming to kebab-case (M12) by @localai-bot in #8912
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `d6dd6d7b555c233bb9bc9f20b4751eb8c9269743` by @localai-bot in #8925
- chore: ⬆️ Update ggml-org/llama.cpp to `23fbfcb1ad6c6f76b230e8895254de785000be46` by @localai-bot in #8921
- fix: correct grammar in CONTRIBUTING.md documentation section by @localai-bot in #8932
- feat: Expand section index pages with comprehensive navigation (M7) by @localai-bot in #8929
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8939
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #8945
- chore: ⬆️ Update ggml-org/llama.cpp to `10e5b148b061569aaee8ae0cf72a703129df0eab` by @localai-bot in #8946
- fix: include model name in mmproj file path to prevent model isolation (#8937) by @localai-bot in #8940
- feat(swagger): update swagger by @localai-bot in #8961
- docs: Document GPU auto-fit mode limitations and trade-offs (closes #8562) by @localai-bot in #8954
- chore(ui): improve errors and reporting during model installation by @mudler in #8979
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8980
- fix(collections): start agent pool after http server by @mudler in #8981
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #8985
- chore: ⬆️ Update ggml-org/llama.cpp to `57819b8d4b39d893408e51520dff3d47d1ebb757` by @localai-bot in #8983
- chore: ⬆️ Update ace-step/acestep.cpp to `5aa065445541094cba934299cd498bbb9fa5c434` by @localai-bot in #8984
- fix(ui): Move routes to /app to avoid conflict with API endpoints by @richiejp in #8978
- fix(conf): Don't overwrite env provided galleries with runtime conf by @richiejp in #8994
- fix(flux.2-klein-9b): Use Qwen3-8b to avoid GGML assertion failure on tensor mismatch by @richiejp in #8995
- fix(acestep-cpp): resolve relative model paths in options by @localai-bot in #8993
- chore: ⬆️ Update ggml-org/llama.cpp to `e30f1fdf74ea9238ff562901aa974c75aab6619b` by @localai-bot in #8997
New Contributors
- @loryanstrant made their first contribution in #8688
- @bittoby made their first contribution in #8676
- @Weathercold made their first contribution in #8842
- @attilagyorffy made their first contribution in #8879
Full Changelog: v3.12.1...v4.0.0
