🎉 LocalAI 4.0.0 Release! 🚀
LocalAI 4.0.0 is out!
This major release transforms LocalAI into a complete AI orchestration platform. We've embedded agentic and hybrid-search capabilities directly into the core, completely overhauled the user interface with React for a modern experience, and are thrilled to introduce Agenthub ( link ), a brand-new community hub for easily sharing and importing agents. Alongside these massive updates, we've added powerful new features such as Canvas mode for code artifacts, MCP Apps, and full MCP client-side support.
| Feature | Summary |
|---|---|
| Agentic Orchestration & Agenthub | Native agent management with memory, skills, and the new Agenthub for community sharing. |
| Revamped React UI | Complete frontend rewrite for lightning-fast performance and modern UX. |
| Canvas Mode | Preview code blocks and artifacts side-by-side in the chat interface. |
| MCP Client-Side | Full Model Context Protocol support, MCP Apps, and tool streaming in chat. |
| WebRTC Realtime | WebRTC support for low-latency realtime audio conversations. |
| New Backends | Added experimental MLX Distributed, fish-speech, ace-step.cpp, and faster-qwen3-tts. |
| Infrastructure | Podman documentation, shell completion, and persistent data path separation. |
🚀 Key Features
🤖 Native Agentic Orchestration & Agenthub
LocalAI now includes agentic capabilities embedded directly in the core. You can manage, import, start, and stop agents via the new UI.
- 🌐 Agenthub: We are launching Agenthub! This is a centralized community space to share common agents and import them effortlessly into your LocalAI instance.
- Agent Management: Full lifecycle management via the React UI. Create Agents, connect them to Slack, configure MCP servers and skills.
- Skills Management: Centralized skill database for AI agents.
- Memory: Agents can utilize memory with Hybrid search (PostgreSQL) or embedded in-memory storage (Chromem).
- Observability: New "Events" column in the Agents list to track observables and status.
- 📚 Documentation: Dive into the new capabilities in our official Agents documentation.
agents.mp4
🎨 Revamped UI & Canvas Mode
The Web interface has been completely migrated to React, bringing a smoother experience and powerful new capabilities:
- Canvas Mode: Enable "canvas mode" in the chat to see code blocks and artifacts generated by the LLM in a dedicated preview bar on the right.
- System View: Tabbed navigation separating Models and Backends for better organization.
- Model Size Warnings: Visual warnings when model storage exceeds system RAM to prevent lockups.
- Traces: Improved trace display using accordions for better readability.
model-fit-canvas-mode.mp4
🔌 MCP Apps & Client-Side Support
We’ve expanded support for the Model Context Protocol (MCP):
- MCP Apps: Select which servers to enable for the chat directly from the UI.
- Tool Streaming: Tools from MCP servers are automatically injected into the standard chat interface.
- Client-Side Support: Full client-side integration for MCP tools and streaming.
- Disable Option: Set the `LOCALAI_DISABLE_MCP` environment variable to completely disable MCP support for security.
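As a quick sketch, opting out of MCP entirely might look like this. The `LOCALAI_DISABLE_MCP` variable name comes from the release notes; the `"true"` value convention and the `local-ai` binary name are assumptions for illustration:

```shell
# Disable all MCP support (MCP Apps, tool streaming, client-side
# integration) for this LocalAI instance before starting it.
# LOCALAI_DISABLE_MCP is documented in this release; the "true" value
# and the `local-ai` binary name are assumptions.
export LOCALAI_DISABLE_MCP=true
# local-ai run   # start the server with MCP disabled
```

Setting the variable in the service environment (systemd unit, Docker `-e` flag, or compose file) achieves the same effect for managed deployments.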
🎵 New Backends, Audio & Video Enhancements
- MLX Distributed (Experimental): We've added an experimental backend for running distributed workloads using Apple's MLX framework! Check out the docs here.
- New Audio Backends: Introduced fish-speech, ace-step.cpp, and faster-qwen3-tts (CUDA-only).
- WebRTC Realtime: WebRTC support added to the Realtime API and Talk page for better low-latency audio handling.
- TTS Improvements: Added `sample_rate` support via post-processing and multi-voice support for Qwen TTS.
- Video Generation: Fixed model selection dropdown sync and added `vllm-omni` backend detection.
🛠️ Infrastructure & Developer Experience
- Data Separation: New `--data-path` CLI flag and `LOCALAI_DATA_PATH` env var to separate persistent data (agents, skills) from configuration.
- Shell Completion: Dynamic completion scripts for bash, zsh, and fish.
- Podman Support: Dedicated documentation for Podman installation and rootless configuration.
- Gallery & Models: Model storage size display with RAM warnings, and fallback URI resolution for backend installation failures.
- Deprecations: HuggingFace backend support removed, and AIO images dropped to focus on main images.
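A minimal sketch of the new data separation. The `--data-path` flag and `LOCALAI_DATA_PATH` variable are documented in this release; the example directory and the `local-ai` binary name are assumptions:

```shell
# Keep persistent data (agents, skills) separate from configuration.
# The chosen directory is arbitrary; --data-path / LOCALAI_DATA_PATH
# are from the release notes, the binary name is an assumption.
mkdir -p /tmp/localai-data
export LOCALAI_DATA_PATH=/tmp/localai-data
# Equivalent CLI form:
# local-ai run --data-path /tmp/localai-data
```

Pointing the data path at a dedicated volume makes it straightforward to back up or migrate agents and skills independently of model configuration.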
🐞 Fixes & Improvements
- Logging: Fixed watchdog spamming logs when no interval was configured; downgraded health check logs to debug.
- CUDA Detection: Improved GPU vendor checks to prevent false CUDA detection on CPU-only hosts with runtime libs.
- Compatibility: Renamed `json_verbose` to `verbose_json` for OpenAI spec compliance (fixes Nextcloud integration).
- Embedding: Fixed embedding dimension truncation to return full native dimensions.
- Permissions: Changed model install file permissions to 0644 to ensure server readability.
- Windows Docker: Added named volumes to Docker Compose files for Windows compatibility.
- Model Reload: Models now reload automatically after editing YAML config (e.g., `context_size`).
- Chat: Fixed issue where thinking/reasoning blocks were sent to the LLM.
- Audio: Fixed img2img pipeline in diffusers backend and Qwen TTS duplicate argument error.
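The model-reload fix means a config edit like the sketch below takes effect without restarting the server. The field names follow LocalAI's model config conventions; the model name and file are made up for illustration:

```yaml
# gpt-example.yaml — hypothetical model definition; editing and saving
# this file now triggers an automatic reload, no server restart needed.
name: gpt-example
context_size: 8192   # bump this and save; the change takes effect on reload
parameters:
  model: example-model.gguf
```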
Known issues
- The `diffusers` backend currently fails to build (due to CI limit exhaustion) and is not part of this release (the previous version is still available). We are looking into it, but if you want to help and know someone at GitHub who could support us with better ARM runners, please reach out!
❤️ Thank You
LocalAI is a true FOSS movement — built by contributors, powered by community.
If you believe in privacy-first AI:
- ✅ Star the repo
- 💬 Contribute code, docs, or feedback
- 📣 Share with others
Your support keeps this stack alive.
✅ Full Changelog
📋 Click to expand full changelog
What's Changed
Breaking Changes 🛠
- Remove HuggingFace backend support by @localai-bot in #8971
- chore: drop AIO images by @mudler in #9004
Bug fixes 🐛
- fix(cli): Fix watchdog running constantly and spamming logs by @nanoandrew4 in #8624
- fix(api): Downgrade health/readiness check to debug by @nanoandrew4 in #8625
- fix: rename json_verbose to verbose_json by @lukasdotcom in #8627
- fix(chatterbox): add support for cuda13/aarch64 by @mudler in #8653
- fix: reload model after editing YAML config (issue #8647) by @localai-bot in #8652
- fix(chat): do not send thinking/reasoning messages to the LLM by @mudler in #8656
- fix: change file permissions from 0600 to 0644 in InstallModel by @localai-bot in #8657
- fix: Add named volumes for Windows Docker compatibility by @localai-bot in #8661
- fix(gallery): add fallback URI resolution for backend installation by @localai-bot in #8663
- fix: whisper breaking on cuda-13 (use absolute path for CUDA directory detection) by @localai-bot in #8678
- fix(gallery): clean up partially downloaded backend on installation failure by @localai-bot in #8679
- fix: properly sync model selection dropdown in video generation UI by @localai-bot in #8680
- fix: allow reranking models configured with known_usecases by @localai-bot in #8681
- fix: return full embedding dimensions instead of truncating trailing zeros (#8721) by @localai-bot in #8755
- fix: Add vllm-omni backend to video generation model detection (#8659) by @localai-bot in #8781
- fix(qwen-tts): duplicate instruct argument in voice design mode by @Weathercold in #8842
- Fix image upload processing and img2img pipeline in diffusers backend by @attilagyorffy in #8879
- fix: gate CUDA directory checks on GPU vendor to prevent false CUDA detection by @sozercan in #8942
- fix(llama-cpp): Set enable_thinking in the correct place by @richiejp in #8973
Exciting New Features 🎉
- feat(traces): Use accordian instead of pop-ups by @richiejp in #8626
- chore: remove install.sh script and documentation references by @localai-bot in #8643
- docs: add Podman installation documentation by @localai-bot in #8646
- Add `sample_rate` support to TTS API via post-processing resampling by @Copilot in #8650
- feat(backends): add faster-qwen3-tts by @localai-bot in #8664
- feat(models): add model storage size display and RAM warning by @localai-bot in #8675
- feat(ui): add model size estimation by @mudler in #8684
- feat: Add Free RPC to backend.proto for VRAM cleanup by @localai-bot in #8751
- feat(qwen-tts): Support using multiple voices by @nanoandrew4 in #8757
- feat(ui): move to React for frontend by @mudler in #8772
- feat: add WebSocket mode support for the response api by @bittoby in #8676
- feat: Add LOCALAI_DISABLE_MCP environment variable to disable MCP support by @localai-bot in #8816
- feat: add agentic management by @mudler in #8820
- feat: Add shell completion support for bash, zsh, and fish by @localai-bot in #8851
- feat(downloader): add HF_MIRROR environment variable support by @localai-bot in #8847
- feat: add Events column to Agents list page by @localai-bot in #8870
- feat: Add tabs to System view for Models and Backends by @localai-bot in #8885
- feat: Add --data-path CLI flag for persistent data separation by @localai-bot in #8888
- feat(mlx-distributed): add new (experimental) MLX-distributed backend by @mudler in #8801
- chore(size): display size of HF models and allow to specify it from the gallery by @mudler in #8907
- feat(ui): add canvas mode, support history in agent chat by @mudler in #8927
- feat(ui): MCP Apps, mcp streaming and client-side support by @mudler in #8947
- feat: add fish-speech backend by @mudler in #8962
- feat(backends): add ace-step.cpp by @mudler in #8965
- feat(realtime): WebRTC support by @richiejp in #8790
🧠 Models
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8693
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8694
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8695
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8696
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8698
📖 Documentation and examples
- fix(realtime): Add functions to conversation history by @richiejp in #8616
- docs: update diffusers multi-GPU documentation to mention tensor_parallel_size configuration by @localai-bot in #8621
- docs: Update Home Assistant links in README.md by @loryanstrant in #8688
👒 Dependencies
- chore(deps): bump fyne.io/fyne/v2 from 2.7.2 to 2.7.3 by @dependabot[bot] in #8629
- chore(deps): bump github.com/anthropics/anthropic-sdk-go from 1.22.0 to 1.26.0 by @dependabot[bot] in #8630
- chore(deps): bump github.com/gpustack/gguf-parser-go from 0.23.1 to 0.24.0 by @dependabot[bot] in #8631
- chore(deps): bump actions/stale from 10.1.1 to 10.2.0 by @dependabot[bot] in #8633
- chore(deps): bump goreleaser/goreleaser-action from 6 to 7 by @dependabot[bot] in #8634
- chore(deps): bump github.com/mudler/cogito from 0.9.1-0.20260217143801-bb7f986ed2c7 to 0.9.1 by @dependabot[bot] in #8632
- chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/coqui by @dependabot[bot] in #8642
- chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/common/template by @dependabot[bot] in #8641
- chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/transformers by @dependabot[bot] in #8640
- chore(deps): bump sentence-transformers from 5.2.2 to 5.2.3 in /backend/python/transformers by @dependabot[bot] in #8638
- chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/rerankers by @dependabot[bot] in #8636
- chore(deps): bump grpcio from 1.76.0 to 1.78.1 in /backend/python/vllm by @dependabot[bot] in #8635
- chore(deps): bump actions/download-artifact from 7 to 8 by @dependabot[bot] in #8729
- chore(deps): bump actions/upload-artifact from 6 to 7 by @dependabot[bot] in #8730
- chore(deps): bump github.com/openai/openai-go/v3 from 3.19.0 to 3.24.0 by @dependabot[bot] in #8732
- chore(deps): bump github.com/modelcontextprotocol/go-sdk from 1.3.0 to 1.4.0 by @dependabot[bot] in #8733
- chore(deps): bump go.opentelemetry.io/otel from 1.40.0 to 1.41.0 by @dependabot[bot] in #8734
- chore(deps): bump go.opentelemetry.io/otel/metric from 1.40.0 to 1.41.0 by @dependabot[bot] in #8735
- chore(deps): bump github.com/google/go-containerregistry from 0.20.7 to 0.21.1 by @dependabot[bot] in #8736
- chore(deps): bump go.opentelemetry.io/otel/sdk/metric from 1.40.0 to 1.42.0 by @dependabot[bot] in #8915
- chore(deps): bump github.com/openai/openai-go/v3 from 3.24.0 to 3.26.0 by @dependabot[bot] in #8916
- chore(deps): bump go.yaml.in/yaml/v2 from 2.4.3 to 2.4.4 by @dependabot[bot] in #8913
- chore(deps): bump github.com/labstack/echo/v4 from 4.15.0 to 4.15.1 by @dependabot[bot] in #8914
- chore(deps): bump docker/metadata-action from 5 to 6 by @dependabot[bot] in #8917
- chore(deps): bump actions/setup-node from 4 to 6 by @dependabot[bot] in #8920
- chore(deps): bump docker/build-push-action from 6 to 7 by @dependabot[bot] in #8919
- chore(deps): bump docker/login-action from 3 to 4 by @dependabot[bot] in #8918
- chore(deps): bump node from 22-slim to 25-slim by @dependabot[bot] in #8922
Other Changes
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #8618
- chore: ⬆️ Update ggml-org/llama.cpp to `f75c4e8bf52ea480ece07fd3d9a292f1d7f04bc5` by @localai-bot in #8619
- chore: ⬆️ Update ggml-org/llama.cpp to `2b6dfe824de8600c061ef91ce5cc5c307f97112c` by @localai-bot in #8622
- chore: ⬆️ Update ggml-org/llama.cpp to `b68a83e641b3ebe6465970b34e99f3f0e0a0b21a` by @localai-bot in #8628
- fix(webui): use different icon for System nav item by @localai-bot in #8648
- chore: ⬆️ Update ggml-org/llama.cpp to `418dea39cea85d3496c8b04a118c3b17f3940ad8` by @localai-bot in #8649
- feat(swagger): update swagger by @localai-bot in #8654
- chore: ⬆️ Update ggml-org/llama.cpp to `3769fe6eb70b0a0fbb30b80917f1caae68c902f7` by @localai-bot in #8655
- chore: ⬆️ Update ggml-org/llama.cpp to `723c71064da0908c19683f8c344715fbf6d986fd` by @localai-bot in #8660
- fix(qwen3.5): add qwen3.5 preset and mimick llama.cpp's PEG by @mudler in #8668
- chore: ⬆️ Update ggml-org/whisper.cpp to `9453b4b9be9b73adfc35051083f37cefa039acee` by @localai-bot in #8671
- chore(deps): bump llama.cpp to 'ecbcb7ea9d3303097519723b264a8b5f1e977028' by @mudler in #8672
- docs: add CDI driver config for NVIDIA GPU in containers (fix #8108) by @localai-bot in #8677
- docs: add TLS reverse proxy configuration guide by @localai-bot in #8673
- chore: ⬆️ Update ggml-org/llama.cpp to `05728db18eea59de81ee3a7699739daaf015206b` by @localai-bot in #8683
- fix: simplify CI steps, fix gallery agent by @mudler in #8685
- fix: retry when LLM returns empty messages by @mudler in #8704
- feat(swagger): update swagger by @localai-bot in #8706
- feat: Add debug logging for pocket-tts voice issue #8244 by @localai-bot in #8715
- fix(ci): correct transformer backend path typo by @localai-bot in #8712
- chore: ⬆️ Update ggml-org/llama.cpp to `319146247e643695f94a558e8ae686277dd4f8da` by @localai-bot in #8707
- fix: Implement responsive line wrapping for model names (#8209) by @localai-bot in #8720
- fix(qwen-tts): ensure all requirements files end with newline by @localai-bot in #8724
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #8728
- chore: ⬆️ Update ggml-org/llama.cpp to `4d828bd1ab52773ba9570cc008cf209eb4a8b2f5` by @localai-bot in #8727
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8743
- chore: ⬆️ Update ggml-org/llama.cpp to `ecd99d6a9acbc436bad085783bcd5d0b9ae9e9e9` by @localai-bot in #8762
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8770
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #8778
- chore: ⬆️ Update ggml-org/llama.cpp to `24d2ee052795063afffc9732465ca1b1c65f4a28` by @localai-bot in #8777
- docs: add autonomous development team section to README by @localai-bot in #8780
- feat: Rename 'Whisper' model type to 'STT' in UI by @localai-bot in #8785
- chore: ⬆️ Update ggml-org/whisper.cpp to `30c5194c9691e4e9a98b3dea9f19727397d3f46e` by @localai-bot in #8796
- chore: ⬆️ Update ggml-org/llama.cpp to `a0ed91a442ea6b013bd42ebc3887a81792eaefa1` by @localai-bot in #8797
- feat: pass-by metadata to predict options by @mudler in #8795
- fix: Add timeout-based wait for model deletion completion by @localai-bot in #8756
- chore: Add LTX-2.3 model to gallery by @localai-bot in #8805
- chore: ⬆️ Update ggml-org/llama.cpp to `566059a26b0ce8faec4ea053605719d399c64cc5` by @localai-bot in #8822
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8828
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8830
- feat: update descriptions for first 9 models in gallery/index.yaml by @localai-bot in #8831
- chore: ⬆️ Update ggml-org/llama.cpp to `c5a778891ba0ddbd4cbb507c823f970595b1adc2` by @localai-bot in #8837
- chore(docs): Populate coding guidelines in CONTRIBUTING.md by @localai-bot in #8840
- fix: Remove debug print statement from soundgeneration.go (C2) by @localai-bot in #8843
- docs: Add comprehensive API error reference documentation by @localai-bot in #8848
- docs: add Table of Contents to README.md by @localai-bot in #8846
- chore: ⬆️ Update leejet/stable-diffusion.cpp to c8fb3d245858d495be1f140efdcfaa0d49de41e5 by @localai-bot in #8841
- docs: clarify SECURITY.md version support table with specific ranges and EOL dates by @localai-bot in #8861
- fix: Correct Talk Interface screenshot reference in README.md (H6) by @localai-bot in #8857
- feat: Add documentation for undocumented API endpoints by @localai-bot in #8852
- docs: add comprehensive development setup instructions to CONTRIBUTING.md (H7) by @localai-bot in #8860
- feat(cli): add configurable backend image fallback tags via CLI options by @localai-bot in #8817
- feat: add MIT license badge to README.md by @localai-bot in #8871
- feat: Create comprehensive troubleshooting guide (M1 task) by @localai-bot in #8856
- docs: expand GPU acceleration guide with L4T, multi-GPU, monitoring, and troubleshooting by @localai-bot in #8858
- feat: Add documentation URLs to CLI help text by @localai-bot in #8874
- feat(functions): add peg-based parsing and allow backends to return tool calls directly by @mudler in #8838
- chore: Update README.md screenshot references and alt text by @localai-bot in #8862
- chore: ⬆️ Update ggml-org/llama.cpp to `35bee031e17ed2b2e8e7278b284a6c8cd120d9f8` by @localai-bot in #8872
- docs: Update model compatibility documentation with missing backends by @localai-bot in #8889
- docs: make examples repository link more prominent by @localai-bot in #8895
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8901
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8902
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8904
- feat: Redesign explorer and models pages with react-ui theme by @localai-bot in #8903
- fix(ui): minor visual enhancements by @mudler in #8909
- feat(swagger): update swagger by @localai-bot in #8923
- feat: Standardize CLI flag naming to kebab-case (M12) by @localai-bot in #8912
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `d6dd6d7b555c233bb9bc9f20b4751eb8c9269743` by @localai-bot in #8925
- chore: ⬆️ Update ggml-org/llama.cpp to `23fbfcb1ad6c6f76b230e8895254de785000be46` by @localai-bot in #8921
- fix: correct grammar in CONTRIBUTING.md documentation section by @localai-bot in #8932
- feat: Expand section index pages with comprehensive navigation (M7) by @localai-bot in #8929
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8939
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #8945
- chore: ⬆️ Update ggml-org/llama.cpp to `10e5b148b061569aaee8ae0cf72a703129df0eab` by @localai-bot in #8946
- fix: include model name in mmproj file path to prevent model isolation (#8937) by @localai-bot in #8940
- feat(swagger): update swagger by @localai-bot in #8961
- docs: Document GPU auto-fit mode limitations and trade-offs (closes #8562) by @localai-bot in #8954
- chore(ui): improve errors and reporting during model installation by @mudler in #8979
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #8980
- fix(collections): start agent pool after http server by @mudler in #8981
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #8985
- chore: ⬆️ Update ggml-org/llama.cpp to `57819b8d4b39d893408e51520dff3d47d1ebb757` by @localai-bot in #8983
- chore: ⬆️ Update ace-step/acestep.cpp to `5aa065445541094cba934299cd498bbb9fa5c434` by @localai-bot in #8984
- fix(ui): Move routes to /app to avoid conflict with API endpoints by @richiejp in #8978
- fix(conf): Don't overwrite env provided galleries with runtime conf by @richiejp in #8994
- fix(flux.2-klein-9b): Use Qwen3-8b to avoid GGML assertion failure on tensor mismatch by @richiejp in #8995
- fix(acestep-cpp): resolve relative model paths in options by @localai-bot in #8993
- chore: ⬆️ Update ggml-org/llama.cpp to `e30f1fdf74ea9238ff562901aa974c75aab6619b` by @localai-bot in #8997
New Contributors
- @loryanstrant made their first contribution in #8688
- @bittoby made their first contribution in #8676
- @Weathercold made their first contribution in #8842
- @attilagyorffy made their first contribution in #8879
Full Changelog: v3.12.1...v4.0.0
