🎉 LocalAI 4.1.0 Release! 🚀
LocalAI 4.1.0 is out! 🔥
Just weeks after the landmark 4.0, we're back with another massive drop. This release turns LocalAI into a production-grade AI platform: spin up a distributed cluster with smart routing and autoscaling, lock it down with built-in auth and per-user quotas, fine-tune models without leaving the UI, and much more. If 4.0 was the foundation, 4.1 is the control tower.
| Feature | Summary |
|---|---|
| 🌐 Distributed Mode | Run LocalAI as a cluster — smart routing, node groups, drain/resume, min/max autoscaling. |
| 🔐 Users & Auth | Built-in user management with OIDC, invite mode, API keys, and admin impersonation. |
| 📊 Quota System | Per-user usage quotas with predictive analytics and breakdown dashboards. |
| 🧪 Fine-Tuning | (experimental) Fine-tune models with TRL, auto-export to GGUF, and import back — all from the UI. |
| ⚗️ Quantization | (experimental) New backend for on-the-fly model quantization. |
| 🔧 Pipeline Editor | Visual model pipeline editor in the React UI. |
| 🤖 Standalone Agents | Run agents from the CLI with `local-ai agent run`. |
| 🧠 Smart Inferencing | Auto inference defaults from Unsloth, tool parsing fallback, and min_p support. |
| 🎬 Media History | Browse past generated images and media in Studio pages. |
New: full setup walkthrough (long version): https://www.youtube.com/watch?v=cMVNnlqwfw4
🚀 Key Features
🌐 Distributed Mode: scaling LocalAI horizontally
Run LocalAI as a distributed cluster and let it figure out where to send your requests. No more single-node bottlenecks.
- Smart Routing: Requests are routed to nodes ordered by available VRAM — the beefiest, free GPU gets the job.
- Node Groups: Pin models to specific node groups for workload isolation (e.g., "gpu-heavy" vs "cpu-light").
- Autoscaling: Built-in min/max autoscaler with a node reconciler that manages the lifecycle automatically.
- Drain & Resume: Gracefully drain nodes for maintenance and bring them back with a single API call.
- Cluster Dashboard: See your entire cluster status at a glance from the home page.
- Smart Model Transfer: Move models between nodes via S3 or peer-to-peer.
distributed-mode.mp4
🔐 Users, Authentication & Quotas
LocalAI now ships with a complete multi-user platform — perfect for teams, classrooms, or any shared deployment.
- User Management: Create, edit, and manage users from the React UI.
- OIDC/OAuth: Plug in your identity provider for SSO — Google, Keycloak, Authentik, you name it.
- Invite Mode: Restrict registration to invite-only with admin approval.
- API Keys: Per-user API key management.
- Admin Powers: Admins can impersonate users for debugging.
- Quota System: Set per-user usage quotas and enforce limits.
- Usage Analytics: Predictive usage dashboard with per-user breakdown statistics.
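The quota enforcement described above boils down to a check-and-consume step per request. The sketch below is illustrative; the `Quota` class and its token-based fields are assumptions for the example, not LocalAI's actual accounting model.

```python
# Illustrative per-user quota enforcement: reject a request if it would
# push usage past the limit. Field names are assumptions, not LocalAI's.
from dataclasses import dataclass

@dataclass
class Quota:
    limit_tokens: int       # allowed tokens per billing window
    used_tokens: int = 0

    def try_consume(self, tokens: int) -> bool:
        """Atomically check and record usage; False means 'over quota'."""
        if self.used_tokens + tokens > self.limit_tokens:
            return False
        self.used_tokens += tokens
        return True

q = Quota(limit_tokens=1000)
first = q.try_consume(600)    # within quota
second = q.try_consume(600)   # would exceed 1000, so rejected
```

Per-user breakdown dashboards then only need to aggregate the recorded `used_tokens` over time.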
Users and quotas:
usersquota-1775167475876.mp4
Usage metrics per user:
usage.mp4
🧪 Fine-Tuning & Quantization
No more juggling external tools. Fine-tune and quantize directly inside LocalAI.
- Fine-Tuning with TRL (Experimental): Train LoRA adapters with Hugging Face TRL, auto-export to GGUF, and import the result straight back into LocalAI. Includes a built-in evals framework to validate your work.
- Quantization Backend: Spin up the new quantization backend to create optimized model variants on-the-fly.
quantize-fine-tune.mp4
🎨 UI
The React UI keeps getting better. This release adds serious power-user features:
- Model Pipeline Editor: Visually wire up model pipelines — no YAML editing required.
- Per-Model Backend Logs: Drill into logs scoped to individual models for laser-focused debugging.
- Media History: Studio pages now remember your past generations — images, audio, and more.
- Searchable Model/Backend Selector: Quickly find models and backends with inline search and filtering.
- Structured Error Toasts: Errors now link directly to traces — one click from "something broke" to "here's why."
- Tracing Settings: Inline tracing config restored with a cleaner UI.
talk.mp4
🤖 Agents & Inference
- Standalone Agent Mode: Run agents straight from the terminal with `local-ai agent run`. Supports single-turn `--prompt` mode and pool-based configurations from `pool.json`.
- Streaming Tool Calls: Agent mode tool calls now stream in real time, with interleaved thinking fixed.
- Inferencing Defaults: Automatic inference parameters sourced from Unsloth and applied to all endpoints and gallery models, so your models just work better out of the box.
- Tool Parsing Fallback: When native tool call parsing fails, an iterative fallback parser kicks in automatically.
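For readers unfamiliar with min_p: it keeps only tokens whose probability is at least `min_p` times the top token's probability, then renormalizes. The sketch below shows the sampling idea in miniature; it is not llama.cpp's implementation.

```python
# Sketch of min_p filtering: drop tokens far less likely than the best
# candidate, then renormalize what remains. Illustrative only.
def min_p_filter(probs: dict[str, float], min_p: float) -> dict[str, float]:
    threshold = min_p * max(probs.values())
    kept = {tok: p for tok, p in probs.items() if p >= threshold}
    total = sum(kept.values())
    return {tok: p / total for tok, p in kept.items()}

probs = {"the": 0.5, "a": 0.3, "zz": 0.01}
filtered = min_p_filter(probs, min_p=0.1)   # "zz" falls below 0.1 * 0.5
```

Unlike a fixed top_p cutoff, the threshold scales with model confidence: a flat distribution keeps many candidates, a peaked one keeps few.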
🛠️ Under the Hood
- Repeated Log Merging: Noisy terminals? Repeated log lines are now collapsed automatically.
- Jetson/Tegra GPU Detection: First-class NVIDIA Jetson/Tegra platform detection.
- Intel SYCL Fix: Auto-disables `mmap` for SYCL backends to prevent crashes.
- llama.cpp Portability: Bundled `libdl`, `librt`, and `libpthread` for improved cross-platform support.
- HF_ENDPOINT Mirror: Downloader now rewrites HuggingFace URIs with `HF_ENDPOINT` for corporate/mirror setups.
- Transformers >5.0: Bumped to HuggingFace Transformers >5.0 with generic model loading.
- API Improvements: Proper 404s for missing models, unescaped model names, unified inferencing paths with automatic retry on transient errors.
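Collapsing repeated log lines, as mentioned above, can be sketched as grouping consecutive duplicates into a single annotated line. This is an illustrative sketch of the idea, not LocalAI's actual logger.

```python
# Sketch of merging consecutive duplicate log lines into "message (xN)".
# Illustrative only; LocalAI's real logger may annotate differently.
from itertools import groupby

def merge_repeated(lines: list[str]) -> list[str]:
    out = []
    for line, group in groupby(lines):      # groups *consecutive* equal lines
        n = len(list(group))
        out.append(line if n == 1 else f"{line} (x{n})")
    return out

logs = ["loading model", "retrying", "retrying", "retrying", "done"]
merged = merge_repeated(logs)
```

Because only consecutive duplicates are merged, interleaved messages from different subsystems stay in order.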
🐞 Fixes & Improvements
- Embeddings: Implemented `encoding_format=base64` for the embeddings endpoint.
- Kokoro TTS: Fixed phonemization model not downloading during installation.
- Realtime API: Fixed Opus codec backend selection alias in development mode.
- Gallery Filtering: Fixed exact tag matching for model gallery filters.
- Open Responses: Fixed required `ORItemParam.Arguments` field being omitted; `ORItemParam.Summary` now always populated.
- Tracing: Fixed settings not loading from `runtime_settings.json`.
- UI: Fixed watchdog field mapping, model list refresh on deletion, backend display in model config, MCP button ordering.
- Downloads: Fixed directory removal during fallback attempts; improved retry logic.
- Model Paths: Fixed `baseDir` assignment to use `ModelPath` correctly.
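On the `encoding_format=base64` fix: in OpenAI-compatible APIs this format is commonly implemented by packing the embedding as little-endian float32 and base64-encoding the bytes. The sketch below shows that convention for illustration; check LocalAI's endpoint for the authoritative behavior.

```python
# Sketch of the common base64 embedding convention: little-endian float32
# bytes, base64-encoded. Illustrative; verify against LocalAI's endpoint.
import base64
import struct

def encode_embedding(vec: list[float]) -> str:
    raw = struct.pack(f"<{len(vec)}f", *vec)
    return base64.b64encode(raw).decode("ascii")

def decode_embedding(s: str) -> list[float]:
    raw = base64.b64decode(s)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

vec = [0.25, -1.5, 3.0]                     # exactly representable in float32
roundtrip = decode_embedding(encode_embedding(vec))
```

Clients decode the string back to floats on their side; the payload is roughly a quarter the size of the equivalent JSON float array.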
❤️ Thank You
LocalAI is a community-powered FOSS movement. Every star, every PR, every bug report matters.
If you believe in privacy-first, self-hosted AI:
- ⭐ Star the repo — it helps more than you think
- 🛠️ Contribute code, docs, or feedback
- 📣 Share with your team, your community, your world
Let's keep building the future of open AI — together. 💪
✅ Full Changelog
📋 Click to expand full changelog
What's Changed
Bug fixes 🐛
- fix: Change baseDir assignment to use ModelPath by @mudler in #9010
- fix(ui): correctly map watchdog fields by @mudler in #9022
- fix(api): unescape model names by @mudler in #9024
- fix(ui): Add tracing inline settings back and create UI tests by @richiejp in #9027
- Always populate ORItemParam.Summary by @tv42 in #9049
- fix(ui): correctly display backend if specified in the model config, re-order MCP buttons by @mudler in #9053
- fix(ui): Refresh model list on deletion by @richiejp in #9059
- fix(openresponses): do not omit required field ORItemParam.Arguments by @tv42 in #9074
- fix: Add tracing settings loading from runtime_settings.json by @localai-bot in #9081
- fix: use exact tag matching for model gallery tag filtering by @majiayu000 in #9041
- fix(realtime): Set the alias for opus so the development backend can be selected by @richiejp in #9083
- fix(llama.cpp): bundle libdl, librt, libpthread in llama-cpp backend by @mudler in #9099
- fix(download): do not remove dst dir until we try all fallbacks by @mudler in #9100
- fix(auth): do not allow to register in invite mode by @mudler in #9101
- fix(downloader): Rewrite full https HF URI with HF_ENDPOINT by @richiejp in #9107
- fix: implement encoding_format=base64 for embeddings endpoint by @walcz-de in #9135
- fix(coqui,nemo,voxcpm): Add dependencies to allow CI to progress by @richiejp in #9142
- fix(voxcpm): Force using a recent voxcpm version to kick the dependency solver by @richiejp in #9150
- fix: huggingface repo change the file name so Update index.yaml is needed by @ER-EPR in #9163
- fix(kokoro): Download phonemization model during installation by @richiejp in #9165
- fix(oauth/invite): do not register user (prending approval) without correct invite by @mudler in #9189
- fix(inflight): count inflight from load model, but release afterwards by @mudler in #9194
Exciting New Features 🎉
- feat: support streaming mode for tool calls in agent mode, fix interleaved thinking stream by @mudler in #9023
- feat(ui): Per model backend logs and various fixes by @richiejp in #9028
- feat(ui, gallery): Show model backends and add searchable model/backend selector by @richiejp in #9060
- feat: add users and authentication support by @mudler in #9061
- feat(ui, openai): Structured errors and link to traces in error toast by @richiejp in #9068
- feat(ui): Add model pipeline editor by @richiejp in #9070
- feat: add (experimental) fine-tuning support with TRL by @mudler in #9088
- feat(ui): add predictor for usage, user-breakdown statistics by @mudler in #9091
- feat: add quota system by @mudler in #9090
- feat(quantization): add quantization backend by @mudler in #9096
- feat: inferencing default, automatic tool parsing fallback and wire min_p by @mudler in #9092
- feat: Merge repeated log lines in the terminal by @richiejp in #9141
- feat: add distributed mode by @mudler in #9124
- feat(ui): Add media history to studio pages (e.g. past images) by @richiejp in #9151
- feat: add node reconciler, allow to schedule to group of nodes, min/max autoscaler by @mudler in #9186
- feat(api): Return 404 when model is not found except for model names in HF format by @richiejp in #9133
- feat(distributed): Avoid resending models to backend nodes by @richiejp in #9193
- feat: add resume endpoint to undrain nodes by @mudler in #9197
👒 Dependencies
- chore(deps): bump github.com/anthropics/anthropic-sdk-go from 1.26.0 to 1.27.0 by @dependabot[bot] in #9035
- chore(deps): bump github.com/ebitengine/purego from 0.9.1 to 0.10.0 by @dependabot[bot] in #9034
- chore(deps): bump actions/upload-artifact from 4 to 7 by @dependabot[bot] in #9030
- chore(deps): bump github.com/google/go-containerregistry from 0.21.1 to 0.21.2 by @dependabot[bot] in #9033
- chore(deps): bump playwright from 1.52.0 to 1.58.2 in /core/http/react-ui in the npm_and_yarn group across 1 directory by @dependabot[bot] in #9055
- chore(deps): bump github.com/google/go-containerregistry from 0.21.2 to 0.21.3 by @dependabot[bot] in #9121
- chore(deps): bump github.com/modelcontextprotocol/go-sdk from 1.4.0 to 1.4.1 by @dependabot[bot] in #9118
- chore(deps): bump actions/checkout from 4 to 6 by @dependabot[bot] in #9110
- chore(deps): bump peter-evans/create-pull-request from 7 to 8 by @dependabot[bot] in #9114
- chore(deps): bump github.com/mudler/skillserver from 0.0.5 to 0.0.6 by @dependabot[bot] in #9116
- chore(deps): bump actions/checkout from 4 to 6 by @dependabot[bot] in #9173
- chore(deps): bump actions/deploy-pages from 4 to 5 by @dependabot[bot] in #9172
- chore(deps): bump actions/configure-pages from 5 to 6 by @dependabot[bot] in #9174
- chore(deps): bump google.golang.org/grpc from 1.79.1 to 1.79.3 by @dependabot[bot] in #9175
- chore(deps): bump github.com/nats-io/nats.go from 1.49.0 to 1.50.0 by @dependabot[bot] in #9183
- chore(deps): bump github.com/pion/webrtc/v4 from 4.2.9 to 4.2.11 by @dependabot[bot] in #9185
- chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus from 0.62.0 to 0.64.0 by @dependabot[bot] in #9178
- chore(deps): bump actions/upload-pages-artifact from 3 to 4 by @dependabot[bot] in #9179
- chore(deps): bump grpcio from 1.78.1 to 1.80.0 in /backend/python/transformers by @dependabot[bot] in #9180
- chore(deps): bump grpcio from 1.78.1 to 1.80.0 in /backend/python/vllm by @dependabot[bot] in #9177
- chore(deps): bump grpcio from 1.78.1 to 1.80.0 in /backend/python/coqui by @dependabot[bot] in #9182
- chore(deps): bump grpcio from 1.78.1 to 1.80.0 in /backend/python/rerankers by @dependabot[bot] in #9181
- chore(deps): bump grpcio from 1.78.1 to 1.80.0 in /backend/python/common/template by @dependabot[bot] in #9176
Other Changes
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #9008
- chore: ⬆️ Update ggml-org/llama.cpp to `3a6f059909ed5dab8587df5df4120315053d57a4` by @localai-bot in #9009
- fix: Automatically disable mmap for Intel SYCL backends (#9012) by @localai-bot in #9015
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `862a6586cb6fcec037c14f9ed902329ecec7d990` by @localai-bot in #9019
- chore: ⬆️ Update ggml-org/llama.cpp to `88915cb55c14769738fcab7f1c6eaa6dcc9c2b0c` by @localai-bot in #9020
- chore: refactor endpoints to use same inferencing path, add automatic retrial mechanism in case of errors by @mudler in #9029
- chore: ⬆️ Update ggml-org/whisper.cpp to `79218f51d02ffe70575ef7fba3496dfc7adda027` by @localai-bot in #9037
- chore: ⬆️ Update ggml-org/llama.cpp to `9b342d0a9f2f4892daec065491583ec2be129685` by @localai-bot in #9039
- chore: ⬆️ Update ace-step/acestep.cpp to `15740f4301b3ec3020875f1fb975a6cfdb2f6767` by @localai-bot in #9038
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `545fac4f3fb0117a4e962b1a04cf933a7e635933` by @localai-bot in #9036
- chore: ⬆️ Update ggml-org/llama.cpp to `ee4801e5a6ee7ee4063144ab44ab4e127f76fba8` by @localai-bot in #9044
- chore: ⬆️ Update ggml-org/whisper.cpp to `dc9611662265870df22a7230b7586176a99c1955` by @localai-bot in #9045
- chore: ⬆️ Update ace-step/acestep.cpp to `ab020a9aefcd364423e0665da12babc6b0c7b507` by @localai-bot in #9046
- feat: Add standalone agent run mode inspired by LocalAGI by @localai-bot in #9056
- chore: ⬆️ Update ggml-org/whisper.cpp to `ef3463bb29ef90d25dfabfd1e75993111c52412d` by @localai-bot in #9062
- chore: ⬆️ Update ggml-org/llama.cpp to `5744d7ec430e2f875a393770195fda530560773f` by @localai-bot in #9063
- docs: Add troubleshooting guide for embedding models (fixes #9064) by @localai-bot in #9065
- feat(swagger): update swagger by @localai-bot in #9075
- chore: ⬆️ Update ggml-org/whisper.cpp to `9386f239401074690479731c1e41683fbbeac557` by @localai-bot in #9077
- chore(deps): bump llama-cpp to 'a0bbcdd9b6b83eeeda6f1216088f42c33d464e38' by @mudler in #9079
- feat(swagger): update swagger by @localai-bot in #9085
- chore: ⬆️ Update ggml-org/llama.cpp to `4cb7e0bd61e7e1101e8ab10db5dee70c5717a386` by @localai-bot in #9087
- chore: ⬆️ Update ace-step/acestep.cpp to `7326a7bea0c2037982ec924f7364e998df70450c` by @localai-bot in #9086
- chore: ⬆️ Update ggml-org/whisper.cpp to `76684141a5d059be71cbe23dc2f0ed552213ba2d` by @localai-bot in #9094
- chore: ⬆️ Update ggml-org/llama.cpp to `990e4d96980d0b016a2b07049cc9031642fb9903` by @localai-bot in #9095
- chore(transformers): bump to >5.0 and generically load models by @mudler in #9097
- feat(swagger): update swagger by @localai-bot in #9103
- chore: ⬆️ Update ggml-org/llama.cpp to `49bfddeca18e62fa3d39114a23e9fcbdf8a22388` by @localai-bot in #9102
- chore: ⬆️ Update ggml-org/llama.cpp to `1772701f99dd3fc13f5783b282c2361eda8ca47c` by @localai-bot in #9123
- chore: ⬆️ Update ggml-org/llama.cpp to `9f102a1407ed5d73b8c954f32edab50f8dfa3f58` by @localai-bot in #9127
- chore: ⬆️ Update ace-step/acestep.cpp to `6f35c874ee11e86d511b860019b84976f5b52d3a` by @localai-bot in #9128
- fix(docs): Use notice instead of alert by @richiejp in #9134
- chore: ⬆️ Update ggml-org/llama.cpp to `a970515bdb0b1d09519106847660b0d0c84d2472` by @localai-bot in #9137
- feat(swagger): update swagger by @localai-bot in #9136
- chore: ⬆️ Update ggml-org/llama.cpp to `59d840209a5195c2f6e2e81b5f8339a0637b59d9` by @localai-bot in #9144
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `f16a110f8776398ef23a2a6b7b57522c2471637a` by @localai-bot in #9167
- chore: ⬆️ Update ggml-org/whisper.cpp to `95ea8f9bfb03a15db08a8989966fd1ae3361e20d` by @localai-bot in #9168
- chore: ⬆️ Update ggml-org/llama.cpp to `7c203670f8d746382247ed369fea7fbf10df8ae0` by @localai-bot in #9160
- chore(workers): improve logging, set header timeouts by @mudler in #9171
- chore(ci): Scope tests extras backend tests by @richiejp in #9170
- feat(swagger): update swagger by @localai-bot in #9187
- chore: ⬆️ Update ggml-org/llama.cpp to `08f21453aec846867b39878500d725a05bd32683` by @localai-bot in #9190
- stablediffusion-ggml: replace hand-maintained enum string arrays with upstream API calls by @Copilot in #9192
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `09b12d5f6d51d862749e8e0ee8baac8f012089e2` by @localai-bot in #9195
- chore: ⬆️ Update ggml-org/llama.cpp to `0fcb3760b2b9a3a496ef14621a7e4dad7a8df90f` by @localai-bot in #9196
New Contributors
- @tv42 made their first contribution in #9049
- @walcz-de made their first contribution in #9135
- @ER-EPR made their first contribution in #9163
Full Changelog: v4.0.0...v4.1.0
