Xmas release 🎅 LocalAI 3.9.0! 🚀
LocalAI 3.9.0 is focused on stability, resource efficiency, and smarter agent workflows. We've addressed critical issues with model loading, improved system resource management, and introduced a new Agent Jobs panel for scheduling and managing background agentic tasks. Whether you're running models locally or orchestrating complex agent workflows, this release makes LocalAI faster, more reliable, and easier to manage.
📌 TL;DR
| Feature | Summary |
|---|---|
| Agent Jobs Panel | Schedule and run background tasks with cron or via API — perfect for automated workflows. |
| Smart Memory Reclaimer | Automatically frees up GPU/VRAM by evicting least recently used models when memory is low. |
| LRU Model Eviction | Models are automatically unloaded from memory based on usage patterns to prevent crashes. |
| MLX & CUDA 13 Support | New model backends and enhanced GPU compatibility for modern hardware. |
| UI Polish & Fixes | Cleaned-up navigation, fixed layout overflow, and various improvements. |
| VibeVoice | Added support for the VibeVoice TTS backend! |
🚀 New Features
🤖 Agent Jobs Panel: Schedule & Automate Tasks
LocalAI 3.9.0 introduces a new Agent Jobs panel, letting you create, run, and schedule agentic tasks in the background, either from the web UI or programmatically via the API.
- Run agent prompts on a schedule using cron syntax, or via API.
- Agents are defined via the model settings, with support for MCP (Model Context Protocol).
- Trigger jobs via API for integration into CI/CD or external tools.
- Optionally send results to a webhook for post-processing.
- Templates and prompts can be dynamically populated with variables.
✅ Use cases: Daily reports, CI integration, automated data processing, scheduled model evaluations.
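To make this concrete, here is a minimal sketch of creating a scheduled job over HTTP. The endpoint path (`/agent/jobs`) and payload field names are illustrative assumptions, not the documented schema; check the LocalAI API docs for the exact shape.

```python
import requests

BASE_URL = "http://localhost:8080"  # default LocalAI address

# Hypothetical payload: the field names and /agent/jobs path are assumptions.
job = {
    "name": "daily-report",
    "model": "my-agent",                      # an agent defined in model settings
    "prompt": "Summarize yesterday's {{topic}} activity",  # templated prompt
    "variables": {"topic": "CI"},             # values injected into the template
    "schedule": "0 7 * * *",                  # cron syntax: every day at 07:00
    "webhook": "https://example.com/hooks/report",  # optional post-processing
}

resp = requests.post(f"{BASE_URL}/agent/jobs", json=job, timeout=30)
resp.raise_for_status()
print(resp.json())  # job descriptor returned by the server
```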
🧠 Smart Memory Reclaimer: Auto-Optimize GPU Resources
We’ve introduced a new Memory Reclaimer that monitors system memory usage and automatically frees up GPU/VRAM when needed.
- Tracks memory consumption across all backends.
- When usage exceeds a configured threshold, it evicts the least recently used (LRU) models.
- Prevents out-of-memory crashes and keeps your system stable during high load.
This is a step toward adaptive resource management; future versions will expand it with more advanced policies and finer-grained control.
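Conceptually, the reclaimer behaves like an LRU cache driven by a memory threshold. The Python sketch below is an illustration of that policy only; the actual implementation is the Go watchdog added in #7583, and the threshold is configurable there.

```python
from collections import OrderedDict

class MemoryReclaimer:
    """Toy model of threshold-driven LRU eviction (illustration only)."""

    def __init__(self, threshold_bytes: int):
        self.threshold = threshold_bytes
        self.loaded = OrderedDict()  # model name -> estimated bytes, LRU first

    def touch(self, name: str, size: int):
        """Record that a model was just used (moves it to the MRU end)."""
        self.loaded[name] = size
        self.loaded.move_to_end(name)

    def reclaim(self):
        """Evict least recently used models until usage is under the threshold."""
        while self.loaded and sum(self.loaded.values()) > self.threshold:
            name, size = self.loaded.popitem(last=False)  # pop the LRU entry
            print(f"evicting {name} (~{size / 1e9:.1f} GB)")
```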
🔁 LRU Model Eviction: Intelligent Model Management
Building on the new reclaimer, LocalAI now supports LRU (Least Recently Used) eviction for loaded models.
- Set a maximum number of models to keep in memory (e.g., limit to 3).
- When a new model is loaded and the limit is reached, the oldest unused model is automatically unloaded.
- Fully compatible with `single_active_backend` mode (which now defaults to LRU=1 for backward compatibility).
💡 Ideal for servers with limited VRAM or when running multiple models in parallel.
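The eviction behavior can be pictured as a fixed-capacity LRU map over loaded backends; setting the capacity to 1 reproduces `single_active_backend`. A minimal sketch, not the actual loader code:

```python
from collections import OrderedDict

class ModelLoader:
    """Toy fixed-capacity loader with LRU eviction (illustration only)."""

    def __init__(self, max_models: int = 3):
        self.max_models = max_models
        self.models = OrderedDict()  # model name -> backend handle

    def load(self, name: str) -> str:
        if name in self.models:
            self.models.move_to_end(name)                 # refresh recency
            return self.models[name]
        if len(self.models) >= self.max_models:
            evicted, _ = self.models.popitem(last=False)  # unload oldest unused
            print(f"unloading {evicted}")
        self.models[name] = f"<backend:{name}>"           # placeholder handle
        return self.models[name]

loader = ModelLoader(max_models=1)  # behaves like single_active_backend
loader.load("llama-3")
loader.load("qwen")  # "llama-3" is unloaded first
```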
🖥️ UI & UX Polish
- Fixed navbar ordering and login icon — clearer navigation and better visual flow.
- Prevented tool call overflow in chat view — no more clipped or misaligned content.
- Unified link paths (e.g., `/browse/` instead of `browse`) for consistency.
- Fixed model selection toggle — header updates correctly when switching models.
- Consistent button styling — uniform colors, hover effects, and accessibility.
📦 Backward Compatibility & Architecture
- Dropped x86_64 Mac support: no longer maintained in GitHub Actions; ARM64 (M1/M2/M3/M4) is now the recommended architecture.
- Updated data storage path from `/usr/share` to `/var/lib`: follows Linux conventions for mutable data.
- Added CUDA 13 support: now available in Docker images and L4T builds.
- New VibeVoice TTS backend: real-time text-to-speech with voice cloning support. You can install it from the model gallery! (See the usage sketch after this list.)
- StableDiffusion-GGML now supports LoRA: expand your image-generation capabilities.
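For the VibeVoice item above, here is a minimal sketch of generating speech once the backend is installed. It assumes the model is named `vibevoice` (use whatever name you installed from the gallery) and goes through LocalAI's OpenAI-compatible speech endpoint.

```python
import requests

resp = requests.post(
    "http://localhost:8080/v1/audio/speech",  # OpenAI-compatible TTS route
    json={
        "model": "vibevoice",                 # assumed gallery model name
        "input": "Hello from LocalAI 3.9!",
    },
    timeout=120,
)
resp.raise_for_status()

with open("hello.wav", "wb") as f:
    f.write(resp.content)  # raw audio bytes returned by the backend
```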
🛠️ Fixes & Improvements
- Fixed: after v3.8.0, the `/readyz` and `/healthz` endpoints required authentication, breaking Docker health checks and monitoring tools; these endpoints no longer require auth.
- Fixed: crashes when importing models from Hugging Face URLs with subfolders (e.g., `huggingface://user/model/GGUF/model.gguf`).
🚀 The Complete Local Stack for Privacy-First AI
| Project | Description |
|---|---|
| **LocalAI** | The free, Open Source OpenAI alternative. Drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required. |
| **LocalAGI** | Local AI agent management platform. Drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI. |
| **LocalRecall** | RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Works alongside LocalAI and LocalAGI. |
❤️ Thank You
LocalAI is a true FOSS movement — built by contributors, powered by community.
If you believe in privacy-first AI:
- ✅ Star the repo
- 💬 Contribute code, docs, or feedback
- 📣 Share with others
Your support keeps this stack alive.
✅ Full Changelog
What's Changed
Breaking Changes 🛠
- chore: switch from /usr/share to /var/lib for data storage by @poretsky in #7361
- chore: drop drawin-x86_64 support by @mudler in #7616
Bug fixes 🐛
- fix: do not require auth for readyz/healthz endpoints by @mudler in #7403
- fix(ui): navbar ordering and login icon by @mudler in #7407
- fix: configure sbsa packages for arm64 by @mudler in #7413
- fix(ui): prevent box overflow in chat view by @mudler in #7430
- fix(ui): Update few links in web UI from 'browse' to '/browse/' by @rampa3 in #7445
- fix(paths): remove trailing slash from requests by @mudler in #7451
- fix(downloader): do not download model files if not necessary by @mudler in #7492
- fix(config): make syncKnownUsecasesFromString idempotent by @mudler in #7493
- fix: make sure to close on errors by @mudler in #7521
- fix(llama.cpp): handle corner cases with tool array content by @mudler in #7528
- fix(7355): Update llama-cpp grpc for v3 interface by @sredman in #7566
- fix(chat-ui): model selection toggle and new chat by @mudler in #7574
- fix: improve ram estimation by @mudler in #7603
- fix(ram): do not read from cgroup by @mudler in #7606
- fix: correctly propagate error during model load by @mudler in #7610
- fix(ci): remove specific version for grpcio packages by @mudler in #7627
- fix(uri): consider subfolders when expanding huggingface URLs by @mintyleaf in #7634
Exciting New Features 🎉
- feat: agent jobs panel by @mudler in #7390
- chore: refactor css, restyle to be slightly minimalistic by @mudler in #7397
- feat(hf-api): return files in nested directories by @mudler in #7396
- feat(agent-jobs): add multimedia support by @mudler in #7398
- feat: add cuda13 images by @mudler in #7404
- fix: use ubuntu 24.04 for cuda13 l4t images by @mudler in #7418
- feat(diffusers): implement dynamic pipeline loader to remove per-pipeline conditionals by @Copilot in #7365
- chore(importers/llama.cpp): add models to 'llama-cpp' subfolder by @mudler in #7450
- feat(vibevoice): add new backend by @mudler in #7494
- feat(ui): allow to order search results by @mudler in #7507
- feat(loader): enhance single active backend to support LRU eviction by @mudler in #7535
- feat(stablediffusion-ggml): add lora support by @mudler in #7542
- feat(ui): add mask to install custom backends by @mudler in #7559
- feat(watchdog): add Memory resource reclaimer by @mudler in #7583
- feat(mlx): add thread-safe LRU prompt cache and min_p/top_k sampling by @blightbow in #7556
- feat(whisper): Add prompt to condition transcription output by @richiejp in #7624
🧠 Models
- feat(stablediffusion): Passthrough more parameters to support z-image and flux2 by @richiejp in #7414
- Revert "feat(stablediffusion): Passthrough more parameters to support z-image and flux2" by @mudler in #7417
- feat(stablediffusion): Passthrough more parameters to support z-image and flux2 by @richiejp in #7419
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #7433
- fix(stablediffusion-ggml): Correct Z-Image model name by @richiejp in #7436
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #7437
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #7530
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7700
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7707
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7712
📖 Documentation and examples
- chore: Add AGENTS.md by @richiejp in #7688
- chore(deps): Bump llama.cpp to '5b6c9bc0f3c8f55598b9999b65aff7ce4119bc15' and refactor usage of base params by @mudler in #7706
👒 Dependencies
- chore(deps): bump github.com/google/go-containerregistry from 0.19.2 to 0.20.7 by @dependabot[bot] in #7409
- chore(deps): bump appleboy/ssh-action from 1.2.3 to 1.2.4 by @dependabot[bot] in #7410
- chore(deps): bump go.opentelemetry.io/otel/sdk/metric from 1.38.0 to 1.39.0 by @dependabot[bot] in #7476
- chore(deps): bump actions/stale from 10.1.0 to 10.1.1 by @dependabot[bot] in #7473
- chore(deps): bump protobuf from 6.33.1 to 6.33.2 in /backend/python/transformers by @dependabot[bot] in #7481
- chore(deps): bump github.com/mudler/cogito from 0.5.1 to 0.6.0 by @dependabot[bot] in #7474
- chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus from 0.60.0 to 0.61.0 by @dependabot[bot] in #7477
- chore(deps): bump github.com/onsi/gomega from 1.38.2 to 1.38.3 by @dependabot[bot] in #7475
- chore(deps): bump stable-diffusion.cpp to '8823dc48bcc1598eb9671da7b69e45338d0cc5a5' by @mudler in #7524
- chore(makefile): Add buildargs for sd and cuda when building backend by @richiejp in #7525
- chore(deps): bump torch from 2.5.1+cxx11.abi to 2.7.1+cpu in /backend/python/diffusers in the pip group across 1 directory by @dependabot[bot] in #7549
- chore(deps): bump actions/upload-artifact from 5 to 6 by @dependabot[bot] in #7585
- chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.2 to 2.27.3 by @dependabot[bot] in #7590
- chore(deps): bump peter-evans/create-pull-request from 7 to 8 by @dependabot[bot] in #7586
- chore(deps): bump actions/download-artifact from 6 to 7 by @dependabot[bot] in #7587
- chore(deps): bump securego/gosec from 2.22.9 to 2.22.11 by @dependabot[bot] in #7588
- chore(deps): bump github.com/labstack/echo/v4 from 4.13.4 to 4.14.0 by @dependabot[bot] in #7589
- chore(deps): bump github.com/jaypipes/ghw from 0.20.0 to 0.21.1 by @dependabot[bot] in #7591
- chore(deps): bump sentence-transformers from 5.1.0 to 5.2.0 in /backend/python/transformers by @dependabot[bot] in #7594
- chore(memory detection): do not use go-sigar as requires CGO on darwin by @mudler in #7618
- chore(deps): bump cogito to latest and adapt API changes by @mudler in #7655
- chore(refactor): move logging to common package based on slog by @mudler in #7668
- chore(deps): bump xlog to v0.0.3 by @mudler in #7675
- chore(deps): bump securego/gosec from 2.22.9 to 2.22.11 by @dependabot[bot] in #7690
- chore(deps): bump github.com/mudler/xlog from 0.0.3 to 0.0.4 by @dependabot[bot] in #7695
- chore(deps): bump github.com/mudler/cogito from 0.7.1 to 0.7.2 by @dependabot[bot] in #7691
- chore(deps): bump github.com/jaypipes/ghw from 0.21.1 to 0.21.2 by @dependabot[bot] in #7694
- chore(deps): bump github.com/containerd/containerd from 1.7.29 to 1.7.30 by @dependabot[bot] in #7692
Other Changes
- chore: ⬆️ Update ggml-org/llama.cpp to `eec1e33a9ed71b79422e39cc489719cf4f8e0777` by @localai-bot in #7363
- Initialize sudo reference before its first actual use by @poretsky in #7367
- chore(deps): update diffusers dependency to use GitHub repo for l4t by @mudler in #7369
- chore(l4t/diffusers): bump nvidia l4t index for pytorch 2.9 by @mudler in #7379
- Conventional way of adding extra apt repository by @poretsky in #7362
- Correct user deletion with all its data by @poretsky in #7368
- Clean data directory by @poretsky in #7378
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #7381
- chore(l4t): Update extra index URL for requirements-l4t.txt by @mudler in #7383
- chore: Add Python 3.12 support for l4t build profile by @mudler in #7384
- chore: ⬆️ Update ggml-org/llama.cpp to `4abef75f2cf2eee75eb5083b30a94cf981587394` by @localai-bot in #7382
- chore(diffusers): Add PY_STANDALONE_TAG for l4t Python version by @mudler in #7387
- Revert "chore(l4t): Update extra index URL for requirements-l4t.txt" by @mudler in #7388
- chore: drop pinning of python 3.12 by @mudler in #7389
- chore(deps): bump llama.cpp to 'd82b7a7c1d73c0674698d9601b1bbb0200933f29' by @mudler in #7392
- feat(swagger): update swagger by @localai-bot in #7394
- chore: ⬆️ Update ggml-org/llama.cpp to `8c32d9d96d9ae345a0150cae8572859e9aafea0b` by @localai-bot in #7395
- feat(swagger): update swagger by @localai-bot in #7400
- chore: ⬆️ Update ggml-org/llama.cpp to `7f8ef50cce40e3e7e4526a3696cb45658190e69a` by @mudler in #7402
- chore: ⬆️ Update ggml-org/llama.cpp to `ec18edfcba94dacb166e6523612fc0129cead67a` by @localai-bot in #7406
- chore(deps/stable-diffusion-ggml): update stablediffusion-ggml by @richiejp in #7411
- Add Dockerfile for arm64 with nvpl installation by @mudler in #7416
- chore: ⬆️ Update ggml-org/llama.cpp to `61bde8e21f4a1f9a98c9205831ca3e55457b4c78` by @localai-bot in #7415
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `5865b5e7034801af1a288a9584631730b25272c6` by @localai-bot in #7422
- Messages output fix by @poretsky in #7424
- chore: ⬆️ Update ggml-org/llama.cpp to `e9f9483464e6f01d843d7f0293bd9c7bc6b2221c` by @localai-bot in #7421
- chore(ui): uniform buttons by @mudler in #7429
- chore(deps): bump llama.cpp to 'bde188d60f58012ada0725c6dd5ba7c69fe4dd87' by @mudler in #7434
- chore: ⬆️ Update ggml-org/llama.cpp to `8160b38a5fa8a25490ca33ffdd200cda51405688` by @localai-bot in #7438
- chore: ⬆️ Update ggml-org/whisper.cpp to `a88b93f85f08fc6045e5d8a8c3f94b7be0ac8bce` by @localai-bot in #7448
- chore: ⬆️ Update ggml-org/llama.cpp to `db97837385edfbc772230debbd49e5efae843a71` by @localai-bot in #7447
- chore(gallery agent): summary now is at root of the git repository by @mudler in #7463
- chore(gallery agent): strip thinking tags by @mudler in #7464
- chore: ⬆️ Update ggml-org/whisper.cpp to `a8f45ab11d6731e591ae3d0230be3fec6c2efc91` by @localai-bot in #7483
- chore(deps/llama-cpp): bump to '2fa51c19b028180b35d316e9ed06f5f0f7ada2c1' by @mudler in #7484
- chore: ⬆️ Update ggml-org/llama.cpp to `086a63e3a5d2dbbb7183a74db453459e544eb55a` by @localai-bot in #7496
- chore: ⬆️ Update ggml-org/whisper.cpp to `9f5ed26e43c680bece09df7bdc8c1b7835f0e537` by @localai-bot in #7509
- chore: ⬆️ Update ggml-org/llama.cpp to `4dff236a522bd0ed949331d6cb1ee2a1b3615c35` by @localai-bot in #7508
- chore: ⬆️ Update ggml-org/llama.cpp to `a81a569577cc38b32558958b048228150be63eae` by @localai-bot in #7529
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `11ab095230b2b67210f5da4d901588d56c71fe3a` by @localai-bot in #7539
- chore(l4t13): use pytorch index by @mudler in #7546
- Revert "chore(deps): bump torch from 2.5.1+cxx11.abi to 2.7.1+cpu in /backend/python/diffusers in the pip group across 1 directory" by @mudler in #7558
- chore: ⬆️ Update ggml-org/whisper.cpp to `2551e4ce98db69027d08bd99bcc3f1a4e2ad2cef` by @localai-bot in #7561
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `43a70e819b9254dee0d017305d6992f6bb27f850` by @localai-bot in #7562
- chore: ⬆️ Update ggml-org/llama.cpp to `5266379bcae74214af397f36aa81b2a08b15d545` by @localai-bot in #7563
- chore: ⬆️ Update ggml-org/llama.cpp to `5c8a717128cc98aa9e5b1c44652f5cf458fd426e` by @localai-bot in #7573
- chore(llama.cpp): Add Missing llama.cpp Options to gRPC Server by @mudler in #7584
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `200cb6f2ca07e40fa83b610a4e595f4da06ec709` by @localai-bot in #7597
- Revert "chore(deps): bump securego/gosec from 2.22.9 to 2.22.11" by @mudler in #7602
- chore: ⬆️ Update ggml-org/llama.cpp to `ef83fb8601229ff650d952985be47e82d644bfaa` by @localai-bot in #7611
- chore: ⬆️ Update ggml-org/whisper.cpp to `3e79e73eee32e924fbd34587f2f2ac5a45a26b61` by @localai-bot in #7630
- chore: ⬆️ Update ggml-org/llama.cpp to `d37fc935059211454e9ad2e2a44e8ed78fd6d1ce` by @localai-bot in #7629
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `bda7fab9f208dff4b67179a68f694b6ddec13326` by @richiejp in #7639
- chore: ⬆️ Update ggml-org/llama.cpp to `f9ec8858edea4a0ecfea149d6815ebfb5ecc3bcd` by @localai-bot in #7642
- chore: ⬆️ Update ggml-org/whisper.cpp to `6c22e792cb0ee155b6587ce71a8410c3aeb06949` by @localai-bot in #7644
- chore: ⬆️ Update ggml-org/llama.cpp to `ce734a8a2f9fb6eb4f0383ab1370a1b0014ab787` by @localai-bot in #7654
- chore(cogito): respect application-level logging and propagate by @mudler in #7656
- chore: ⬆️ Update ggml-org/llama.cpp to `52ab19df633f3de5d4db171a16f2d9edd2342fec` by @localai-bot in #7665
- docs: Add `langchain-localai` integration package to documentation by @mkhludnev in #7677
- chore: allow to set local-ai log format, default to custom one by @mudler in #7679
- chore(deps): bump llama.cpp to '0e1ccf15c7b6d05c720551b537857ecf6194d420' by @mudler in #7684
- chore(gallery agent): various fixups by @mudler in #7697
- Revert "chore(deps): bump securego/gosec from 2.22.9 to 2.22.11" by @mudler in #7698
- chore(logging): be consistent and do not emit logs from echo by @mudler in #7710
New Contributors
- @rampa3 made their first contribution in #7445
- @blightbow made their first contribution in #7556
Full Changelog: v3.8.0...v3.9.0
