github mudler/LocalAI v4.1.0


🎉 LocalAI 4.1.0 Release! 🚀

LocalAI 4.1.0 is out! 🔥

Just weeks after the landmark 4.0, we're back with another massive drop. This release turns LocalAI into a production-grade AI platform: spin up a distributed cluster with smart routing and autoscaling, lock it down with built-in auth and per-user quotas, fine-tune models without leaving the UI, and much more. If 4.0 was the foundation, 4.1 is the control tower.

Feature Summary
  • 🌐 Distributed Mode: Run LocalAI as a cluster — smart routing, node groups, drain/resume, min/max autoscaling.
  • 🔐 Users & Auth: Built-in user management with OIDC, invite mode, API keys, and admin impersonation.
  • 📊 Quota System: Per-user usage quotas with predictive analytics and breakdown dashboards.
  • 🧪 Fine-Tuning (experimental): Fine-tune models with TRL, auto-export to GGUF, and import back — all from the UI.
  • ⚗️ Quantization (experimental): New backend for on-the-fly model quantization.
  • 🔧 Pipeline Editor: Visual model pipeline editor in the React UI.
  • 🤖 Standalone Agents: Run agents from the CLI with local-ai agent run.
  • 🧠 Smart Inferencing: Auto inference defaults from Unsloth, tool parsing fallback, and min_p support.
  • 🎬 Media History: Browse past generated images and media in Studio pages.

Full setup walkthrough (video): https://www.youtube.com/watch?v=cMVNnlqwfw4

🚀 Key Features

🌐 Distributed Mode: scaling LocalAI horizontally

Run LocalAI as a distributed cluster and let it figure out where to send your requests. No more single-node bottlenecks.

  • Smart Routing: Requests are routed to nodes ordered by available VRAM — the beefiest free GPU gets the job.
  • Node Groups: Pin models to specific node groups for workload isolation (e.g., "gpu-heavy" vs "cpu-light").
  • Autoscaling: Built-in min/max autoscaler with a node reconciler that manages the lifecycle automatically.
  • Drain & Resume: Gracefully drain nodes for maintenance and bring them back with a single API call.
  • Cluster Dashboard: See your entire cluster status at a glance from the home page.
  • Smart Model Transfer: Distribute models to nodes via S3 or peer-to-peer transfer.
distributed-mode.mp4
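Drain and resume are plain HTTP calls against the cluster API. The sketch below shows the shape of the flow; the `/v1/cluster/nodes/...` paths are illustrative assumptions, not the documented routes — check the LocalAI docs for the actual endpoints.

```python
import urllib.request

def node_action(base_url: str, node_id: str, action: str) -> urllib.request.Request:
    """Build a POST request for a node lifecycle action (drain or resume).

    NOTE: the /v1/cluster/nodes/... path is an assumption for illustration;
    consult the LocalAI distributed-mode docs for the real routes.
    """
    if action not in ("drain", "resume"):
        raise ValueError(f"unknown action: {action}")
    url = f"{base_url}/v1/cluster/nodes/{node_id}/{action}"
    return urllib.request.Request(url, method="POST")

# Drain a node for maintenance, then bring it back:
drain = node_action("http://localhost:8080", "gpu-node-1", "drain")
resume = node_action("http://localhost:8080", "gpu-node-1", "resume")
# urllib.request.urlopen(drain)  # execute against a running cluster
```

While a node is drained, the smart router simply stops considering it, so in-flight work elsewhere is unaffected.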

🔐 Users, Authentication & Quotas

LocalAI now ships with a complete multi-user platform — perfect for teams, classrooms, or any shared deployment.

  • User Management: Create, edit, and manage users from the React UI.
  • OIDC/OAuth: Plug in your identity provider for SSO — Google, Keycloak, Authentik, you name it.
  • Invite Mode: Restrict registration to invite-only with admin approval.
  • API Keys: Per-user API key management.
  • Admin Powers: Admins can impersonate users for debugging.
  • Quota System: Set per-user usage quotas and enforce limits.
  • Usage Analytics: Predictive usage dashboard with per-user breakdown statistics.

Users and quotas:

usersquota-1775167475876.mp4

Usage metrics per user:

usage.mp4
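Per-user API keys plug into the OpenAI-compatible endpoints with the standard Bearer scheme. A minimal sketch, assuming a key created in the React UI (the key value and model name below are placeholders):

```python
import json
import urllib.request

API_KEY = "sk-local-example"  # placeholder: a per-user key issued via the React UI

def chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-compatible chat completion request authenticated
    with a per-user LocalAI API key (standard Bearer scheme)."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=payload,
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = chat_request("http://localhost:8080", "llama-3.2-1b", "Hello!")
# urllib.request.urlopen(req)  # run against a live instance;
# usage is attributed to the key's owner and counted against their quota
```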

🧪 Fine-Tuning & Quantization

No more juggling external tools. Fine-tune and quantize directly inside LocalAI.

  • Fine-Tuning with TRL (Experimental): Train LoRA adapters with Hugging Face TRL, auto-export to GGUF, and import the result straight back into LocalAI. Includes a built-in evals framework to validate your work.
  • Quantization Backend: Spin up the new quantization backend to create optimized model variants on-the-fly.
quantize-fine-tune.mp4

🎨 UI

The React UI keeps getting better. This release adds serious power-user features:

  • Model Pipeline Editor: Visually wire up model pipelines — no YAML editing required.
  • Per-Model Backend Logs: Drill into logs scoped to individual models for laser-focused debugging.
  • Media History: Studio pages now remember your past generations — images, audio, and more.
  • Searchable Model/Backend Selector: Quickly find models and backends with inline search and filtering.
  • Structured Error Toasts: Errors now link directly to traces — one click from "something broke" to "here's why."
  • Tracing Settings: Inline tracing config restored with a cleaner UI.
talk.mp4

🤖 Agents & Inference

  • Standalone Agent Mode: Run agents straight from the terminal with local-ai agent run. Supports single-turn --prompt mode and pool-based configurations from pool.json.
  • Streaming Tool Calls: Agent mode tool calls now stream in real-time, with interleaved thinking fixed.
  • Inferencing Defaults: Automatic inference parameters sourced from Unsloth and applied to all endpoints and gallery models, so your models just work better out of the box.
  • Tool Parsing Fallback: When native tool call parsing fails, an iterative fallback parser kicks in automatically.
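The newly wired min_p sampler can be set per request. A sketch of an OpenAI-style payload carrying it; top-level placement alongside the other sampling options is an assumption modeled on parameters like temperature:

```python
import json

def completion_payload(model: str, prompt: str, min_p: float = 0.05) -> bytes:
    """Chat payload carrying the min_p sampling parameter.

    min_p keeps only tokens whose probability is at least min_p times that
    of the most likely token. Top-level placement here is an assumption
    modeled on the other sampling options.
    """
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "min_p": min_p,
    }).encode()

body = completion_payload("qwen-2.5-7b", "Summarize this release.", min_p=0.1)
```

Compared with top_p, min_p adapts the cutoff to the model's confidence: when the top token dominates, fewer alternatives survive; when the distribution is flat, more do.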

🛠️ Under the Hood

  • Repeated Log Merging: Noisy terminals? Repeated log lines are now collapsed automatically.
  • Jetson/Tegra GPU Detection: First-class NVIDIA Jetson/Tegra platform detection.
  • Intel SYCL Fix: Auto-disables mmap for SYCL backends to prevent crashes.
  • llama.cpp Portability: Bundled libdl, librt, libpthread for improved cross-platform support.
  • HF_ENDPOINT Mirror: Downloader now rewrites HuggingFace URIs with HF_ENDPOINT for corporate/mirror setups.
  • Transformers >5.0: Bumped to HuggingFace Transformers >5.0 with generic model loading.
  • API Improvements: Proper 404s for missing models, unescaped model names, unified inferencing paths with automatic retry on transient errors.

🐞 Fixes & Improvements

  • Embeddings: Implemented encoding_format=base64 for the embeddings endpoint.
  • Kokoro TTS: Fixed phonemization model not downloading during installation.
  • Realtime API: Fixed Opus codec backend selection alias in development mode.
  • Gallery Filtering: Fixed exact tag matching for model gallery filters.
  • Open Responses: Fixed required ORItemParam.Arguments field being omitted; ORItemParam.Summary now always populated.
  • Tracing: Fixed settings not loading from runtime_settings.json.
  • UI: Fixed watchdog field mapping, model list refresh on deletion, backend display in model config, MCP button ordering.
  • Downloads: Fixed directory removal during fallback attempts; improved retry logic.
  • Model Paths: Fixed baseDir assignment to use ModelPath correctly.
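With encoding_format=base64, each embedding arrives as a base64 string rather than a JSON float array. Assuming the same wire format as OpenAI's API (a packed array of little-endian float32 values), decoding needs only the standard library:

```python
import base64
import struct

def decode_embedding(b64: str) -> list[float]:
    """Decode a base64-encoded embedding into a list of floats.

    Assumes the OpenAI wire format: the base64 payload is a packed
    array of little-endian float32 values.
    """
    raw = base64.b64decode(b64)
    return list(struct.unpack(f"<{len(raw) // 4}f", raw))

# Round-trip check with values exactly representable in float32:
vec = [0.5, -1.0, 2.25]
encoded = base64.b64encode(struct.pack(f"<{len(vec)}f", *vec)).decode()
assert decode_embedding(encoded) == [0.5, -1.0, 2.25]
```

The base64 form is noticeably smaller and faster to parse than a JSON array of floats, which is why clients such as the official OpenAI SDKs request it by default.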

❤️ Thank You

LocalAI is a community-powered FOSS movement. Every star, every PR, every bug report matters.

If you believe in privacy-first, self-hosted AI:

  • ⭐ Star the repo — it helps more than you think
  • 🛠️ Contribute code, docs, or feedback
  • 📣 Share with your team, your community, your world

Let's keep building the future of open AI — together. 💪


✅ Full Changelog


What's Changed

Bug fixes 🐛

  • fix: Change baseDir assignment to use ModelPath by @mudler in #9010
  • fix(ui): correctly map watchdog fields by @mudler in #9022
  • fix(api): unescape model names by @mudler in #9024
  • fix(ui): Add tracing inline settings back and create UI tests by @richiejp in #9027
  • Always populate ORItemParam.Summary by @tv42 in #9049
  • fix(ui): correctly display backend if specified in the model config, re-order MCP buttons by @mudler in #9053
  • fix(ui): Refresh model list on deletion by @richiejp in #9059
  • fix(openresponses): do not omit required field ORItemParam.Arguments by @tv42 in #9074
  • fix: Add tracing settings loading from runtime_settings.json by @localai-bot in #9081
  • fix: use exact tag matching for model gallery tag filtering by @majiayu000 in #9041
  • fix(realtime): Set the alias for opus so the development backend can be selected by @richiejp in #9083
  • fix(llama.cpp): bundle libdl, librt, libpthread in llama-cpp backend by @mudler in #9099
  • fix(download): do not remove dst dir until we try all fallbacks by @mudler in #9100
  • fix(auth): do not allow to register in invite mode by @mudler in #9101
  • fix(downloader): Rewrite full https HF URI with HF_ENDPOINT by @richiejp in #9107
  • fix: implement encoding_format=base64 for embeddings endpoint by @walcz-de in #9135
  • fix(coqui,nemo,voxcpm): Add dependencies to allow CI to progress by @richiejp in #9142
  • fix(voxcpm): Force using a recent voxcpm version to kick the dependency solver by @richiejp in #9150
  • fix: huggingface repo change the file name so Update index.yaml is needed by @ER-EPR in #9163
  • fix(kokoro): Download phonemization model during installation by @richiejp in #9165
  • fix(oauth/invite): do not register user (prending approval) without correct invite by @mudler in #9189
  • fix(inflight): count inflight from load model, but release afterwards by @mudler in #9194

Exciting New Features 🎉

  • feat: support streaming mode for tool calls in agent mode, fix interleaved thinking stream by @mudler in #9023
  • feat(ui): Per model backend logs and various fixes by @richiejp in #9028
  • feat(ui, gallery): Show model backends and add searchable model/backend selector by @richiejp in #9060
  • feat: add users and authentication support by @mudler in #9061
  • feat(ui, openai): Structured errors and link to traces in error toast by @richiejp in #9068
  • feat(ui): Add model pipeline editor by @richiejp in #9070
  • feat: add (experimental) fine-tuning support with TRL by @mudler in #9088
  • feat(ui): add predictor for usage, user-breakdown statistics by @mudler in #9091
  • feat: add quota system by @mudler in #9090
  • feat(quantization): add quantization backend by @mudler in #9096
  • feat: inferencing default, automatic tool parsing fallback and wire min_p by @mudler in #9092
  • feat: Merge repeated log lines in the terminal by @richiejp in #9141
  • feat: add distributed mode by @mudler in #9124
  • feat(ui): Add media history to studio pages (e.g. past images) by @richiejp in #9151
  • feat: add node reconciler, allow to schedule to group of nodes, min/max autoscaler by @mudler in #9186
  • feat(api): Return 404 when model is not found except for model names in HF format by @richiejp in #9133
  • feat(distributed): Avoid resending models to backend nodes by @richiejp in #9193
  • feat: add resume endpoint to undrain nodes by @mudler in #9197

👒 Dependencies

  • chore(deps): bump github.com/anthropics/anthropic-sdk-go from 1.26.0 to 1.27.0 by @dependabot[bot] in #9035
  • chore(deps): bump github.com/ebitengine/purego from 0.9.1 to 0.10.0 by @dependabot[bot] in #9034
  • chore(deps): bump actions/upload-artifact from 4 to 7 by @dependabot[bot] in #9030
  • chore(deps): bump github.com/google/go-containerregistry from 0.21.1 to 0.21.2 by @dependabot[bot] in #9033
  • chore(deps): bump playwright from 1.52.0 to 1.58.2 in /core/http/react-ui in the npm_and_yarn group across 1 directory by @dependabot[bot] in #9055
  • chore(deps): bump github.com/google/go-containerregistry from 0.21.2 to 0.21.3 by @dependabot[bot] in #9121
  • chore(deps): bump github.com/modelcontextprotocol/go-sdk from 1.4.0 to 1.4.1 by @dependabot[bot] in #9118
  • chore(deps): bump actions/checkout from 4 to 6 by @dependabot[bot] in #9110
  • chore(deps): bump peter-evans/create-pull-request from 7 to 8 by @dependabot[bot] in #9114
  • chore(deps): bump github.com/mudler/skillserver from 0.0.5 to 0.0.6 by @dependabot[bot] in #9116
  • chore(deps): bump actions/checkout from 4 to 6 by @dependabot[bot] in #9173
  • chore(deps): bump actions/deploy-pages from 4 to 5 by @dependabot[bot] in #9172
  • chore(deps): bump actions/configure-pages from 5 to 6 by @dependabot[bot] in #9174
  • chore(deps): bump google.golang.org/grpc from 1.79.1 to 1.79.3 by @dependabot[bot] in #9175
  • chore(deps): bump github.com/nats-io/nats.go from 1.49.0 to 1.50.0 by @dependabot[bot] in #9183
  • chore(deps): bump github.com/pion/webrtc/v4 from 4.2.9 to 4.2.11 by @dependabot[bot] in #9185
  • chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus from 0.62.0 to 0.64.0 by @dependabot[bot] in #9178
  • chore(deps): bump actions/upload-pages-artifact from 3 to 4 by @dependabot[bot] in #9179
  • chore(deps): bump grpcio from 1.78.1 to 1.80.0 in /backend/python/transformers by @dependabot[bot] in #9180
  • chore(deps): bump grpcio from 1.78.1 to 1.80.0 in /backend/python/vllm by @dependabot[bot] in #9177
  • chore(deps): bump grpcio from 1.78.1 to 1.80.0 in /backend/python/coqui by @dependabot[bot] in #9182
  • chore(deps): bump grpcio from 1.78.1 to 1.80.0 in /backend/python/rerankers by @dependabot[bot] in #9181
  • chore(deps): bump grpcio from 1.78.1 to 1.80.0 in /backend/python/common/template by @dependabot[bot] in #9176

Other Changes

  • docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #9008
  • chore: ⬆️ Update ggml-org/llama.cpp to 3a6f059909ed5dab8587df5df4120315053d57a4 by @localai-bot in #9009
  • fix: Automatically disable mmap for Intel SYCL backends (#9012) by @localai-bot in #9015
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 862a6586cb6fcec037c14f9ed902329ecec7d990 by @localai-bot in #9019
  • chore: ⬆️ Update ggml-org/llama.cpp to 88915cb55c14769738fcab7f1c6eaa6dcc9c2b0c by @localai-bot in #9020
  • chore: refactor endpoints to use same inferencing path, add automatic retrial mechanism in case of errors by @mudler in #9029
  • chore: ⬆️ Update ggml-org/whisper.cpp to 79218f51d02ffe70575ef7fba3496dfc7adda027 by @localai-bot in #9037
  • chore: ⬆️ Update ggml-org/llama.cpp to 9b342d0a9f2f4892daec065491583ec2be129685 by @localai-bot in #9039
  • chore: ⬆️ Update ace-step/acestep.cpp to 15740f4301b3ec3020875f1fb975a6cfdb2f6767 by @localai-bot in #9038
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 545fac4f3fb0117a4e962b1a04cf933a7e635933 by @localai-bot in #9036
  • chore: ⬆️ Update ggml-org/llama.cpp to ee4801e5a6ee7ee4063144ab44ab4e127f76fba8 by @localai-bot in #9044
  • chore: ⬆️ Update ggml-org/whisper.cpp to dc9611662265870df22a7230b7586176a99c1955 by @localai-bot in #9045
  • chore: ⬆️ Update ace-step/acestep.cpp to ab020a9aefcd364423e0665da12babc6b0c7b507 by @localai-bot in #9046
  • feat: Add standalone agent run mode inspired by LocalAGI by @localai-bot in #9056
  • chore: ⬆️ Update ggml-org/whisper.cpp to ef3463bb29ef90d25dfabfd1e75993111c52412d by @localai-bot in #9062
  • chore: ⬆️ Update ggml-org/llama.cpp to 5744d7ec430e2f875a393770195fda530560773f by @localai-bot in #9063
  • docs: Add troubleshooting guide for embedding models (fixes #9064) by @localai-bot in #9065
  • feat(swagger): update swagger by @localai-bot in #9075
  • chore: ⬆️ Update ggml-org/whisper.cpp to 9386f239401074690479731c1e41683fbbeac557 by @localai-bot in #9077
  • chore(deps): bump llama-cpp to 'a0bbcdd9b6b83eeeda6f1216088f42c33d464e38' by @mudler in #9079
  • feat(swagger): update swagger by @localai-bot in #9085
  • chore: ⬆️ Update ggml-org/llama.cpp to 4cb7e0bd61e7e1101e8ab10db5dee70c5717a386 by @localai-bot in #9087
  • chore: ⬆️ Update ace-step/acestep.cpp to 7326a7bea0c2037982ec924f7364e998df70450c by @localai-bot in #9086
  • chore: ⬆️ Update ggml-org/whisper.cpp to 76684141a5d059be71cbe23dc2f0ed552213ba2d by @localai-bot in #9094
  • chore: ⬆️ Update ggml-org/llama.cpp to 990e4d96980d0b016a2b07049cc9031642fb9903 by @localai-bot in #9095
  • chore(transformers): bump to >5.0 and generically load models by @mudler in #9097
  • feat(swagger): update swagger by @localai-bot in #9103
  • chore: ⬆️ Update ggml-org/llama.cpp to 49bfddeca18e62fa3d39114a23e9fcbdf8a22388 by @localai-bot in #9102
  • chore: ⬆️ Update ggml-org/llama.cpp to 1772701f99dd3fc13f5783b282c2361eda8ca47c by @localai-bot in #9123
  • chore: ⬆️ Update ggml-org/llama.cpp to 9f102a1407ed5d73b8c954f32edab50f8dfa3f58 by @localai-bot in #9127
  • chore: ⬆️ Update ace-step/acestep.cpp to 6f35c874ee11e86d511b860019b84976f5b52d3a by @localai-bot in #9128
  • fix(docs): Use notice instead of alert by @richiejp in #9134
  • chore: ⬆️ Update ggml-org/llama.cpp to a970515bdb0b1d09519106847660b0d0c84d2472 by @localai-bot in #9137
  • feat(swagger): update swagger by @localai-bot in #9136
  • chore: ⬆️ Update ggml-org/llama.cpp to 59d840209a5195c2f6e2e81b5f8339a0637b59d9 by @localai-bot in #9144
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to f16a110f8776398ef23a2a6b7b57522c2471637a by @localai-bot in #9167
  • chore: ⬆️ Update ggml-org/whisper.cpp to 95ea8f9bfb03a15db08a8989966fd1ae3361e20d by @localai-bot in #9168
  • chore: ⬆️ Update ggml-org/llama.cpp to 7c203670f8d746382247ed369fea7fbf10df8ae0 by @localai-bot in #9160
  • chore(workers): improve logging, set header timeouts by @mudler in #9171
  • chore(ci): Scope tests extras backend tests by @richiejp in #9170
  • feat(swagger): update swagger by @localai-bot in #9187
  • chore: ⬆️ Update ggml-org/llama.cpp to 08f21453aec846867b39878500d725a05bd32683 by @localai-bot in #9190
  • stablediffusion-ggml: replace hand-maintained enum string arrays with upstream API calls by @Copilot in #9192
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 09b12d5f6d51d862749e8e0ee8baac8f012089e2 by @localai-bot in #9195
  • chore: ⬆️ Update ggml-org/llama.cpp to 0fcb3760b2b9a3a496ef14621a7e4dad7a8df90f by @localai-bot in #9196

New Contributors

Full Changelog: v4.0.0...v4.1.0
