Xmas release 🎅 LocalAI 3.9.0! 🚀
LocalAI 3.9.0 is focused on stability, resource efficiency, and smarter agent workflows. We've addressed critical issues with model loading, improved system resource management, and introduced a new Agent Jobs panel for scheduling and managing background agentic tasks. Whether you're running models locally or orchestrating complex agent workflows, this release makes LocalAI faster, more reliable, and easier to manage.
📌 TL;DR
| Feature | Summary |
|---|---|
| Agent Jobs Panel | Schedule and run background tasks with cron or via API — perfect for automated workflows. |
| Smart Memory Reclaimer | Automatically frees up GPU/VRAM by evicting least recently used models when memory is low. |
| LRU Model Eviction | Models are automatically unloaded from memory based on usage patterns to prevent crashes. |
| MLX & CUDA 13 Support | New model backends and enhanced GPU compatibility for modern hardware. |
| UI Polish & Fixes | Cleaned-up navigation, fixed layout overflow, and various improvements. |
| VibeVoice | Added support for the VibeVoice TTS backend! |
🚀 New Features
🤖 Agent Jobs Panel: Schedule & Automate Tasks
LocalAI 3.9.0 introduces a new Agent Jobs panel, letting you create, run, and schedule agentic tasks in the background, either from the web UI or programmatically via the API.
- Run agent prompts on a schedule using cron syntax, or via API.
- Agents are defined via the model settings, with support for MCP (Model Context Protocol).
- Trigger jobs via API for integration into CI/CD or external tools.
- Optionally send results to a webhook for post-processing.
- Templates and prompts can be dynamically populated with variables.
✅ Use cases: Daily reports, CI integration, automated data processing, scheduled model evaluations.
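To make this concrete, here is a minimal sketch of creating a scheduled job over HTTP. The endpoint path (`/agent/jobs`) and payload field names are illustrative assumptions, not the documented schema; check the LocalAI API docs for the exact shape.

```python
import requests

BASE_URL = "http://localhost:8080"  # default LocalAI address

# Hypothetical payload: the field names and /agent/jobs path are assumptions.
job = {
    "name": "daily-report",
    "model": "my-agent",                      # an agent defined in model settings
    "prompt": "Summarize yesterday's {{topic}} activity",  # templated prompt
    "variables": {"topic": "CI"},             # values injected into the template
    "schedule": "0 7 * * *",                  # cron syntax: every day at 07:00
    "webhook": "https://example.com/hooks/report",  # optional post-processing
}

resp = requests.post(f"{BASE_URL}/agent/jobs", json=job, timeout=30)
resp.raise_for_status()
print(resp.json())  # job descriptor returned by the server
```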
🧠 Smart Memory Reclaimer: Auto-Optimize GPU Resources
We’ve introduced a new Memory Reclaimer that monitors system memory usage and automatically frees up GPU/VRAM when needed.
- Tracks memory consumption across all backends.
- When usage exceeds a configured threshold, it evicts the least recently used (LRU) models.
- Prevents out-of-memory crashes and keeps your system stable during high load.
This is a step toward adaptive resource management; future versions will expand it with more advanced policies and finer-grained control.
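Conceptually, the reclaimer behaves like an LRU cache driven by a memory threshold. The Python sketch below is an illustration of that policy only; the actual implementation is the Go watchdog added in #7583, and the threshold is configurable there.

```python
from collections import OrderedDict

class MemoryReclaimer:
    """Toy model of threshold-driven LRU eviction (illustration only)."""

    def __init__(self, threshold_bytes: int):
        self.threshold = threshold_bytes
        self.loaded = OrderedDict()  # model name -> estimated bytes, LRU first

    def touch(self, name: str, size: int):
        """Record that a model was just used (moves it to the MRU end)."""
        self.loaded[name] = size
        self.loaded.move_to_end(name)

    def reclaim(self):
        """Evict least recently used models until usage is under the threshold."""
        while self.loaded and sum(self.loaded.values()) > self.threshold:
            name, size = self.loaded.popitem(last=False)  # pop the LRU entry
            print(f"evicting {name} (~{size / 1e9:.1f} GB)")
```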
🔁 LRU Model Eviction: Intelligent Model Management
Building on the new reclaimer, LocalAI now supports LRU (Least Recently Used) eviction for loaded models.
- Set a maximum number of models to keep in memory (e.g., limit to 3).
- When a new model is loaded and the limit is reached, the oldest unused model is automatically unloaded.
- Fully compatible with `single_active_backend` mode (which now defaults to LRU=1 for backward compatibility).
💡 Ideal for servers with limited VRAM or when running multiple models in parallel.
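The eviction behavior can be pictured as a fixed-capacity LRU map over loaded backends; setting the capacity to 1 reproduces `single_active_backend`. A minimal sketch, not the actual loader code:

```python
from collections import OrderedDict

class ModelLoader:
    """Toy fixed-capacity loader with LRU eviction (illustration only)."""

    def __init__(self, max_models: int = 3):
        self.max_models = max_models
        self.models = OrderedDict()  # model name -> backend handle

    def load(self, name: str) -> str:
        if name in self.models:
            self.models.move_to_end(name)                 # refresh recency
            return self.models[name]
        if len(self.models) >= self.max_models:
            evicted, _ = self.models.popitem(last=False)  # unload oldest unused
            print(f"unloading {evicted}")
        self.models[name] = f"<backend:{name}>"           # placeholder handle
        return self.models[name]

loader = ModelLoader(max_models=1)  # behaves like single_active_backend
loader.load("llama-3")
loader.load("qwen")  # "llama-3" is unloaded first
```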
🖥️ UI & UX Polish
- Fixed navbar ordering and login icon — clearer navigation and better visual flow.
- Prevented tool call overflow in chat view — no more clipped or misaligned content.
- Unified link paths (e.g., `/browse/` instead of `browse`) for consistency.
- Fixed model selection toggle — header updates correctly when switching models.
- Consistent button styling — uniform colors, hover effects, and accessibility.
📦 Backward Compatibility & Architecture
- Dropped x86_64 Mac support: no longer maintained in GitHub Actions; ARM64 (M1/M2/M3/M4) is now the recommended architecture.
- Updated data storage path from `/usr/share` to `/var/lib`: follows Linux conventions for mutable data.
- Added CUDA 13 support: now available in Docker images and L4T builds.
- New VibeVoice TTS backend: real-time text-to-speech with voice cloning support. You can install it from the model gallery! (See the usage sketch after this list.)
- StableDiffusion-GGML now supports LoRA: expand your image-generation capabilities.
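For the VibeVoice item above, here is a minimal sketch of generating speech once the backend is installed. It assumes the model is named `vibevoice` (use whatever name you installed from the gallery) and goes through LocalAI's OpenAI-compatible speech endpoint.

```python
import requests

resp = requests.post(
    "http://localhost:8080/v1/audio/speech",  # OpenAI-compatible TTS route
    json={
        "model": "vibevoice",                 # assumed gallery model name
        "input": "Hello from LocalAI 3.9!",
    },
    timeout=120,
)
resp.raise_for_status()

with open("hello.wav", "wb") as f:
    f.write(resp.content)  # raw audio bytes returned by the backend
```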
🛠️ Fixes & Improvements
- Fixed: after v3.8.0, the `/readyz` and `/healthz` endpoints required authentication, breaking Docker health checks and monitoring tools; these endpoints no longer require auth.
- Fixed: crashes when importing models from Hugging Face URLs with subfolders (e.g., `huggingface://user/model/GGUF/model.gguf`).
🚀 The Complete Local Stack for Privacy-First AI
| Project | Description |
|---|---|
| **LocalAI** | The free, Open Source OpenAI alternative. Drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required. |
| **LocalAGI** | Local AI agent management platform. Drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI. |
| **LocalRecall** | RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Works alongside LocalAI and LocalAGI. |
❤️ Thank You
LocalAI is a true FOSS movement — built by contributors, powered by community.
If you believe in privacy-first AI:
- ✅ Star the repo
- 💬 Contribute code, docs, or feedback
- 📣 Share with others
Your support keeps this stack alive.
✅ Full Changelog
What's Changed
Breaking Changes 🛠
- chore: switch from /usr/share to /var/lib for data storage by @poretsky in #7361
- chore: drop drawin-x86_64 support by @mudler in #7616
Bug fixes 🐛
- fix: do not require auth for readyz/healthz endpoints by @mudler in #7403
- fix(ui): navbar ordering and login icon by @mudler in #7407
- fix: configure sbsa packages for arm64 by @mudler in #7413
- fix(ui): prevent box overflow in chat view by @mudler in #7430
- fix(ui): Update few links in web UI from 'browse' to '/browse/' by @rampa3 in #7445
- fix(paths): remove trailing slash from requests by @mudler in #7451
- fix(downloader): do not download model files if not necessary by @mudler in #7492
- fix(config): make syncKnownUsecasesFromString idempotent by @mudler in #7493
- fix: make sure to close on errors by @mudler in #7521
- fix(llama.cpp): handle corner cases with tool array content by @mudler in #7528
- fix(7355): Update llama-cpp grpc for v3 interface by @sredman in #7566
- fix(chat-ui): model selection toggle and new chat by @mudler in #7574
- fix: improve ram estimation by @mudler in #7603
- fix(ram): do not read from cgroup by @mudler in #7606
- fix: correctly propagate error during model load by @mudler in #7610
- fix(ci): remove specific version for grpcio packages by @mudler in #7627
- fix(uri): consider subfolders when expanding huggingface URLs by @mintyleaf in #7634
Exciting New Features 🎉
- feat: agent jobs panel by @mudler in #7390
- chore: refactor css, restyle to be slightly minimalistic by @mudler in #7397
- feat(hf-api): return files in nested directories by @mudler in #7396
- feat(agent-jobs): add multimedia support by @mudler in #7398
- feat: add cuda13 images by @mudler in #7404
- fix: use ubuntu 24.04 for cuda13 l4t images by @mudler in #7418
- feat(diffusers): implement dynamic pipeline loader to remove per-pipeline conditionals by @Copilot in #7365
- chore(importers/llama.cpp): add models to 'llama-cpp' subfolder by @mudler in #7450
- feat(vibevoice): add new backend by @mudler in #7494
- feat(ui): allow to order search results by @mudler in #7507
- feat(loader): enhance single active backend to support LRU eviction by @mudler in #7535
- feat(stablediffusion-ggml): add lora support by @mudler in #7542
- feat(ui): add mask to install custom backends by @mudler in #7559
- feat(watchdog): add Memory resource reclaimer by @mudler in #7583
- feat(mlx): add thread-safe LRU prompt cache and min_p/top_k sampling by @blightbow in #7556
- feat(whisper): Add prompt to condition transcription output by @richiejp in #7624
🧠 Models
- feat(stablediffusion): Passthrough more parameters to support z-image and flux2 by @richiejp in #7414
- Revert "feat(stablediffusion): Passthrough more parameters to support z-image and flux2" by @mudler in #7417
- feat(stablediffusion): Passthrough more parameters to support z-image and flux2 by @richiejp in #7419
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #7433
- fix(stablediffusion-ggml): Correct Z-Image model name by @richiejp in #7436
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #7437
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #7530
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7700
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7707
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7712
📖 Documentation and examples
- chore: Add AGENTS.md by @richiejp in #7688
- chore(deps): Bump llama.cpp to '5b6c9bc0f3c8f55598b9999b65aff7ce4119bc15' and refactor usage of base params by @mudler in #7706
👒 Dependencies
- chore(deps): bump github.com/google/go-containerregistry from 0.19.2 to 0.20.7 by @dependabot[bot] in #7409
- chore(deps): bump appleboy/ssh-action from 1.2.3 to 1.2.4 by @dependabot[bot] in #7410
- chore(deps): bump go.opentelemetry.io/otel/sdk/metric from 1.38.0 to 1.39.0 by @dependabot[bot] in #7476
- chore(deps): bump actions/stale from 10.1.0 to 10.1.1 by @dependabot[bot] in #7473
- chore(deps): bump protobuf from 6.33.1 to 6.33.2 in /backend/python/transformers by @dependabot[bot] in #7481
- chore(deps): bump github.com/mudler/cogito from 0.5.1 to 0.6.0 by @dependabot[bot] in #7474
- chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus from 0.60.0 to 0.61.0 by @dependabot[bot] in #7477
- chore(deps): bump github.com/onsi/gomega from 1.38.2 to 1.38.3 by @dependabot[bot] in #7475
- chore(deps): bump stable-diffusion.cpp to '8823dc48bcc1598eb9671da7b69e45338d0cc5a5' by @mudler in #7524
- chore(makefile): Add buildargs for sd and cuda when building backend by @richiejp in #7525
- chore(deps): bump torch from 2.5.1+cxx11.abi to 2.7.1+cpu in /backend/python/diffusers in the pip group across 1 directory by @dependabot[bot] in #7549
- chore(deps): bump actions/upload-artifact from 5 to 6 by @dependabot[bot] in #7585
- chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.2 to 2.27.3 by @dependabot[bot] in #7590
- chore(deps): bump peter-evans/create-pull-request from 7 to 8 by @dependabot[bot] in #7586
- chore(deps): bump actions/download-artifact from 6 to 7 by @dependabot[bot] in #7587
- chore(deps): bump securego/gosec from 2.22.9 to 2.22.11 by @dependabot[bot] in #7588
- chore(deps): bump github.com/labstack/echo/v4 from 4.13.4 to 4.14.0 by @dependabot[bot] in #7589
- chore(deps): bump github.com/jaypipes/ghw from 0.20.0 to 0.21.1 by @dependabot[bot] in #7591
- chore(deps): bump sentence-transformers from 5.1.0 to 5.2.0 in /backend/python/transformers by @dependabot[bot] in #7594
- chore(memory detection): do not use go-sigar as requires CGO on darwin by @mudler in #7618
- chore(deps): bump cogito to latest and adapt API changes by @mudler in #7655
- chore(refactor): move logging to common package based on slog by @mudler in #7668
- chore(deps): bump xlog to v0.0.3 by @mudler in #7675
- chore(deps): bump securego/gosec from 2.22.9 to 2.22.11 by @dependabot[bot] in #7690
- chore(deps): bump github.com/mudler/xlog from 0.0.3 to 0.0.4 by @dependabot[bot] in #7695
- chore(deps): bump github.com/mudler/cogito from 0.7.1 to 0.7.2 by @dependabot[bot] in #7691
- chore(deps): bump github.com/jaypipes/ghw from 0.21.1 to 0.21.2 by @dependabot[bot] in #7694
- chore(deps): bump github.com/containerd/containerd from 1.7.29 to 1.7.30 by @dependabot[bot] in #7692
Other Changes
- chore: ⬆️ Update ggml-org/llama.cpp to `eec1e33a9ed71b79422e39cc489719cf4f8e0777` by @localai-bot in #7363
- Initialize sudo reference before its first actual use by @poretsky in #7367
- chore(deps): update diffusers dependency to use GitHub repo for l4t by @mudler in #7369
- chore(l4t/diffusers): bump nvidia l4t index for pytorch 2.9 by @mudler in #7379
- Conventional way of adding extra apt repository by @poretsky in #7362
- Correct user deletion with all its data by @poretsky in #7368
- Clean data directory by @poretsky in #7378
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #7381
- chore(l4t): Update extra index URL for requirements-l4t.txt by @mudler in #7383
- chore: Add Python 3.12 support for l4t build profile by @mudler in #7384
- chore: ⬆️ Update ggml-org/llama.cpp to `4abef75f2cf2eee75eb5083b30a94cf981587394` by @localai-bot in #7382
- chore(diffusers): Add PY_STANDALONE_TAG for l4t Python version by @mudler in #7387
- Revert "chore(l4t): Update extra index URL for requirements-l4t.txt" by @mudler in #7388
- chore: drop pinning of python 3.12 by @mudler in #7389
- chore(deps): bump llama.cpp to 'd82b7a7c1d73c0674698d9601b1bbb0200933f29' by @mudler in #7392
- feat(swagger): update swagger by @localai-bot in #7394
- chore: ⬆️ Update ggml-org/llama.cpp to `8c32d9d96d9ae345a0150cae8572859e9aafea0b` by @localai-bot in #7395
- feat(swagger): update swagger by @localai-bot in #7400
- chore: ⬆️ Update ggml-org/llama.cpp to `7f8ef50cce40e3e7e4526a3696cb45658190e69a` by @mudler in #7402
- chore: ⬆️ Update ggml-org/llama.cpp to `ec18edfcba94dacb166e6523612fc0129cead67a` by @localai-bot in #7406
- chore(deps/stable-diffusion-ggml): update stablediffusion-ggml by @richiejp in #7411
- Add Dockerfile for arm64 with nvpl installation by @mudler in #7416
- chore: ⬆️ Update ggml-org/llama.cpp to `61bde8e21f4a1f9a98c9205831ca3e55457b4c78` by @localai-bot in #7415
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `5865b5e7034801af1a288a9584631730b25272c6` by @localai-bot in #7422
- Messages output fix by @poretsky in #7424
- chore: ⬆️ Update ggml-org/llama.cpp to `e9f9483464e6f01d843d7f0293bd9c7bc6b2221c` by @localai-bot in #7421
- chore(ui): uniform buttons by @mudler in #7429
- chore(deps): bump llama.cpp to 'bde188d60f58012ada0725c6dd5ba7c69fe4dd87' by @mudler in #7434
- chore: ⬆️ Update ggml-org/llama.cpp to `8160b38a5fa8a25490ca33ffdd200cda51405688` by @localai-bot in #7438
- chore: ⬆️ Update ggml-org/whisper.cpp to `a88b93f85f08fc6045e5d8a8c3f94b7be0ac8bce` by @localai-bot in #7448
- chore: ⬆️ Update ggml-org/llama.cpp to `db97837385edfbc772230debbd49e5efae843a71` by @localai-bot in #7447
- chore(gallery agent): summary now is at root of the git repository by @mudler in #7463
- chore(gallery agent): strip thinking tags by @mudler in #7464
- chore: ⬆️ Update ggml-org/whisper.cpp to `a8f45ab11d6731e591ae3d0230be3fec6c2efc91` by @localai-bot in #7483
- chore(deps/llama-cpp): bump to '2fa51c19b028180b35d316e9ed06f5f0f7ada2c1' by @mudler in #7484
- chore: ⬆️ Update ggml-org/llama.cpp to `086a63e3a5d2dbbb7183a74db453459e544eb55a` by @localai-bot in #7496
- chore: ⬆️ Update ggml-org/whisper.cpp to `9f5ed26e43c680bece09df7bdc8c1b7835f0e537` by @localai-bot in #7509
- chore: ⬆️ Update ggml-org/llama.cpp to `4dff236a522bd0ed949331d6cb1ee2a1b3615c35` by @localai-bot in #7508
- chore: ⬆️ Update ggml-org/llama.cpp to `a81a569577cc38b32558958b048228150be63eae` by @localai-bot in #7529
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `11ab095230b2b67210f5da4d901588d56c71fe3a` by @localai-bot in #7539
- chore(l4t13): use pytorch index by @mudler in #7546
- Revert "chore(deps): bump torch from 2.5.1+cxx11.abi to 2.7.1+cpu in /backend/python/diffusers in the pip group across 1 directory" by @mudler in #7558
- chore: ⬆️ Update ggml-org/whisper.cpp to `2551e4ce98db69027d08bd99bcc3f1a4e2ad2cef` by @localai-bot in #7561
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `43a70e819b9254dee0d017305d6992f6bb27f850` by @localai-bot in #7562
- chore: ⬆️ Update ggml-org/llama.cpp to `5266379bcae74214af397f36aa81b2a08b15d545` by @localai-bot in #7563
- chore: ⬆️ Update ggml-org/llama.cpp to `5c8a717128cc98aa9e5b1c44652f5cf458fd426e` by @localai-bot in #7573
- chore(llama.cpp): Add Missing llama.cpp Options to gRPC Server by @mudler in #7584
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `200cb6f2ca07e40fa83b610a4e595f4da06ec709` by @localai-bot in #7597
- Revert "chore(deps): bump securego/gosec from 2.22.9 to 2.22.11" by @mudler in #7602
- chore: ⬆️ Update ggml-org/llama.cpp to `ef83fb8601229ff650d952985be47e82d644bfaa` by @localai-bot in #7611
- chore: ⬆️ Update ggml-org/whisper.cpp to `3e79e73eee32e924fbd34587f2f2ac5a45a26b61` by @localai-bot in #7630
- chore: ⬆️ Update ggml-org/llama.cpp to `d37fc935059211454e9ad2e2a44e8ed78fd6d1ce` by @localai-bot in #7629
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `bda7fab9f208dff4b67179a68f694b6ddec13326` by @richiejp in #7639
- chore: ⬆️ Update ggml-org/llama.cpp to `f9ec8858edea4a0ecfea149d6815ebfb5ecc3bcd` by @localai-bot in #7642
- chore: ⬆️ Update ggml-org/whisper.cpp to `6c22e792cb0ee155b6587ce71a8410c3aeb06949` by @localai-bot in #7644
- chore: ⬆️ Update ggml-org/llama.cpp to `ce734a8a2f9fb6eb4f0383ab1370a1b0014ab787` by @localai-bot in #7654
- chore(cogito): respect application-level logging and propagate by @mudler in #7656
- chore: ⬆️ Update ggml-org/llama.cpp to `52ab19df633f3de5d4db171a16f2d9edd2342fec` by @localai-bot in #7665
- docs: Add `langchain-localai` integration package to documentation by @mkhludnev in #7677
- chore: allow to set local-ai log format, default to custom one by @mudler in #7679
- chore(deps): bump llama.cpp to '0e1ccf15c7b6d05c720551b537857ecf6194d420' by @mudler in #7684
- chore(gallery agent): various fixups by @mudler in #7697
- Revert "chore(deps): bump securego/gosec from 2.22.9 to 2.22.11" by @mudler in #7698
- chore(logging): be consistent and do not emit logs from echo by @mudler in #7710
New Contributors
- @rampa3 made their first contribution in #7445
- @blightbow made their first contribution in #7556
Full Changelog: v3.8.0...v3.9.0
