
Xmas-release 🎅 LocalAI 3.9.0! 🚀




LocalAI 3.9.0 is focused on stability, resource efficiency, and smarter agent workflows. We've addressed critical issues with model loading, improved system resource management, and introduced a new Agent Jobs panel for scheduling and managing background agentic tasks. Whether you're running models locally or orchestrating complex agent workflows, this release makes it faster, more reliable, and easier to manage.

📌 TL;DR

  • Agent Jobs Panel: schedule and run background tasks with cron or via the API, perfect for automated workflows.
  • Smart Memory Reclaimer: automatically frees up GPU/VRAM by evicting least recently used models when memory is low.
  • LRU Model Eviction: models are automatically unloaded from memory based on usage patterns to prevent crashes.
  • MLX & CUDA 13 Support: new model backends and enhanced GPU compatibility for modern hardware.
  • UI Polish & Fixes: cleaned-up navigation, fixed layout overflow, and various improvements.
  • VibeVoice: added support for the new VibeVoice TTS backend!

🚀 New Features

🤖 Agent Jobs Panel: Schedule & Automate Tasks

LocalAI 3.9.0 introduces a new Agent Jobs panel in the web UI and API, letting you create, schedule, and run agentic tasks in the background, triggered either programmatically via the API or from the web interface.

  • Run agent prompts on a schedule using cron syntax, or via API.
  • Agents are defined via the model settings, supporting MCP.
  • Trigger jobs via API for integration into CI/CD or external tools.
  • Optionally send results to a webhook for post-processing.
  • Templates and prompts can be dynamically populated with variables.

✅ Use cases: Daily reports, CI integration, automated data processing, scheduled model evaluations.

[Screenshot: the Agent Jobs panel in the LocalAI web UI]
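
To give a feel for the API-triggered flow, here is a minimal Python sketch. The endpoint paths and payload fields below are illustrative assumptions only, not the documented Agent Jobs API; check the web UI or the swagger docs for the actual schema.

```python
# Hypothetical sketch of creating and triggering an agent job via the API.
# NOTE: endpoint paths and payload fields are assumptions for illustration;
# consult the LocalAI swagger docs for the real schema.
import requests

BASE_URL = "http://localhost:8080"  # default LocalAI address

job = {
    "name": "daily-report",
    "model": "my-agent-model",  # agent defined via model settings (MCP-capable)
    "prompt": "Summarize yesterday's {{.Topic}} activity.",  # templated prompt
    "schedule": "0 8 * * *",  # cron syntax: every day at 08:00
    "webhook": "https://example.com/hooks/report",  # optional post-processing hook
}

# Create the scheduled job (hypothetical endpoint).
resp = requests.post(f"{BASE_URL}/agent/jobs", json=job, timeout=30)
resp.raise_for_status()
job_id = resp.json()["id"]

# Trigger the same job on demand, e.g. from a CI pipeline (hypothetical endpoint).
requests.post(f"{BASE_URL}/agent/jobs/{job_id}/run", timeout=30).raise_for_status()
```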

🧠 Smart Memory Reclaimer: Auto-Optimize GPU Resources

We’ve introduced a new Memory Reclaimer that monitors system memory usage and automatically frees up GPU/VRAM when needed.

  • Tracks memory consumption across all backends.
  • When usage exceeds a configured threshold, it evicts the least recently used (LRU) models.
  • Prevents out-of-memory crashes and keeps your system stable during high load.

This is a first step toward adaptive resource management; future versions will expand it with more advanced policies and finer-grained control.
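
Conceptually, the reclaimer behaves like the sketch below. All names, thresholds, and the psutil-based memory probe are illustrative assumptions, not LocalAI's actual internals.

```python
# Conceptual sketch of the memory-reclaimer policy. Names, numbers, and the
# psutil probe are illustrative assumptions, not LocalAI's implementation.
from collections import OrderedDict

import psutil

THRESHOLD = 0.90        # assumed eviction threshold: 90% of memory in use
loaded = OrderedDict()  # model name -> backend handle, least recently used first

def touch(name):
    """Mark a model as recently used (move it to the MRU end)."""
    loaded.move_to_end(name)

def memory_fraction():
    # Stand-in for the real probe, which tracks usage across all backends.
    return psutil.virtual_memory().percent / 100.0

def reclaim():
    """Evict least recently used models while usage exceeds the threshold."""
    while loaded and memory_fraction() > THRESHOLD:
        name, handle = loaded.popitem(last=False)  # LRU end of the dict
        handle.unload()  # assumed backend call that frees GPU/VRAM
        print(f"reclaimer: evicted {name}")
```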


🔁 LRU Model Eviction: Intelligent Model Management

Building on the new reclaimer, LocalAI now supports LRU (Least Recently Used) eviction for loaded models.

[Screenshot: the Settings page in the LocalAI web UI]
  • Set a maximum number of models to keep in memory (e.g., limit to 3).
  • When a new model is loaded and the limit is reached, the oldest unused model is automatically unloaded.
  • Fully compatible with single_active_backend mode, which is now implemented as an LRU limit of 1 for backward compatibility.

💡 Ideal for servers with limited VRAM or when running multiple models in parallel.
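
The eviction policy itself is the classic LRU cache. Below is a minimal, self-contained Python illustration of the behavior described above (a limit of 3, with the oldest unused model evicted on overflow); it mirrors the policy, not LocalAI's loader code.

```python
# Minimal illustration of LRU eviction with a limit of 3 loaded models.
# It mirrors the policy described above, not LocalAI's actual loader.
from collections import OrderedDict

MAX_LOADED = 3          # e.g. "limit to 3" from the settings page
models = OrderedDict()  # order == recency: least recently used first

def load(name):
    if name in models:
        models.move_to_end(name)  # already loaded: mark as recently used
        return models[name]
    if len(models) >= MAX_LOADED:
        evicted, _ = models.popitem(last=False)  # drop the LRU model
        print(f"evicted: {evicted}")
    models[name] = f"<backend for {name}>"
    return models[name]

for m in ["llama", "qwen", "phi", "mistral"]:  # loading a 4th evicts "llama"
    load(m)
print(list(models))  # ['qwen', 'phi', 'mistral']
```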


🖥️ UI & UX Polish

  • Fixed navbar ordering and login icon — clearer navigation and better visual flow.
  • Prevented tool call overflow in chat view — no more clipped or misaligned content.
  • Unified link paths (e.g., /browse/ instead of browse) for consistency.
  • Fixed model selection toggle — header updates correctly when switching models.
  • Consistent button styling — uniform colors, hover effects, and accessibility.

📦 Backward Compatibility & Architecture

  • Dropped x86_64 Mac support: builds are no longer maintained in GitHub Actions; ARM64 Apple Silicon (M1/M2/M3/M4) is now the recommended architecture.
  • Updated data storage path from /usr/share to /var/lib: follows Linux conventions for mutable data.
  • Added CUDA 13 support: now available in Docker images and L4T builds.
  • New VibeVoice TTS backend: real-time text-to-speech with voice cloning support. You can install it from the model gallery; see the usage sketch after this list.
  • StableDiffusion-GGML now supports LoRA: expand your image-generation capabilities.
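
Since TTS backends are exposed through LocalAI's OpenAI-compatible audio endpoint, here is a hedged sketch of driving the new backend from Python; the model and voice names are placeholders for whatever you install from the gallery.

```python
# Hedged sketch: text-to-speech through LocalAI's OpenAI-compatible audio
# endpoint. The model and voice names are placeholders, not fixed values.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

speech = client.audio.speech.create(
    model="vibevoice",  # placeholder: the TTS model installed from the gallery
    voice="default",    # placeholder voice id
    input="Merry Christmas from LocalAI 3.9.0!",
)
with open("greeting.wav", "wb") as f:
    f.write(speech.read())  # save the returned audio bytes
```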

🛠️ Fixes & Improvements

  • Fixed: after v3.8.0, the /readyz and /healthz endpoints required authentication, breaking Docker health checks and monitoring tools; they no longer require auth.
  • Fixed: crashes when importing models from Hugging Face URLs with subfolders (e.g., huggingface://user/model/GGUF/model.gguf).

🚀 The Complete Local Stack for Privacy-First AI


LocalAI

The free, Open Source OpenAI alternative. Drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required.

Link: https://github.com/mudler/LocalAI
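
Because the API is a drop-in replacement, the official OpenAI Python client works by pointing base_url at your LocalAI instance; the model name below is a placeholder for any model you have installed.

```python
# Using the official OpenAI Python client against a local LocalAI instance.
# The model name is a placeholder; use any model installed on your server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="my-local-model",
    messages=[{"role": "user", "content": "Say hello from LocalAI!"}],
)
print(resp.choices[0].message.content)
```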


LocalAGI

Local AI agent management platform. Drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI.

Link: https://github.com/mudler/LocalAGI


LocalRecall

RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Works alongside LocalAI and LocalAGI.

Link: https://github.com/mudler/LocalRecall


❤️ Thank You

LocalAI is a true FOSS movement — built by contributors, powered by community.

If you believe in privacy-first AI:

  • ⭐ Star the repo
  • 💬 Contribute code, docs, or feedback
  • 📣 Share with others

Your support keeps this stack alive.


✅ Full Changelog


What's Changed

Breaking Changes 🛠

  • chore: switch from /usr/share to /var/lib for data storage by @poretsky in #7361
  • chore: drop darwin-x86_64 support by @mudler in #7616

Bug fixes 🐛

  • fix: do not require auth for readyz/healthz endpoints by @mudler in #7403
  • fix(ui): navbar ordering and login icon by @mudler in #7407
  • fix: configure sbsa packages for arm64 by @mudler in #7413
  • fix(ui): prevent box overflow in chat view by @mudler in #7430
  • fix(ui): Update few links in web UI from 'browse' to '/browse/' by @rampa3 in #7445
  • fix(paths): remove trailing slash from requests by @mudler in #7451
  • fix(downloader): do not download model files if not necessary by @mudler in #7492
  • fix(config): make syncKnownUsecasesFromString idempotent by @mudler in #7493
  • fix: make sure to close on errors by @mudler in #7521
  • fix(llama.cpp): handle corner cases with tool array content by @mudler in #7528
  • fix(7355): Update llama-cpp grpc for v3 interface by @sredman in #7566
  • fix(chat-ui): model selection toggle and new chat by @mudler in #7574
  • fix: improve ram estimation by @mudler in #7603
  • fix(ram): do not read from cgroup by @mudler in #7606
  • fix: correctly propagate error during model load by @mudler in #7610
  • fix(ci): remove specific version for grpcio packages by @mudler in #7627
  • fix(uri): consider subfolders when expanding huggingface URLs by @mintyleaf in #7634

Exciting New Features 🎉

  • feat: agent jobs panel by @mudler in #7390
  • chore: refactor css, restyle to be slightly minimalistic by @mudler in #7397
  • feat(hf-api): return files in nested directories by @mudler in #7396
  • feat(agent-jobs): add multimedia support by @mudler in #7398
  • feat: add cuda13 images by @mudler in #7404
  • fix: use ubuntu 24.04 for cuda13 l4t images by @mudler in #7418
  • feat(diffusers): implement dynamic pipeline loader to remove per-pipeline conditionals by @Copilot in #7365
  • chore(importers/llama.cpp): add models to 'llama-cpp' subfolder by @mudler in #7450
  • feat(vibevoice): add new backend by @mudler in #7494
  • feat(ui): allow to order search results by @mudler in #7507
  • feat(loader): enhance single active backend to support LRU eviction by @mudler in #7535
  • feat(stablediffusion-ggml): add lora support by @mudler in #7542
  • feat(ui): add mask to install custom backends by @mudler in #7559
  • feat(watchdog): add Memory resource reclaimer by @mudler in #7583
  • feat(mlx): add thread-safe LRU prompt cache and min_p/top_k sampling by @blightbow in #7556
  • feat(whisper): Add prompt to condition transcription output by @richiejp in #7624

🧠 Models

  • feat(stablediffusion): Passthrough more parameters to support z-image and flux2 by @richiejp in #7414
  • Revert "feat(stablediffusion): Passthrough more parameters to support z-image and flux2" by @mudler in #7417
  • feat(stablediffusion): Passthrough more parameters to support z-image and flux2 by @richiejp in #7419
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #7433
  • fix(stablediffusion-ggml): Correct Z-Image model name by @richiejp in #7436
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #7437
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #7530
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7700
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7707
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7712

📖 Documentation and examples

  • chore: Add AGENTS.md by @richiejp in #7688
  • chore(deps): Bump llama.cpp to '5b6c9bc0f3c8f55598b9999b65aff7ce4119bc15' and refactor usage of base params by @mudler in #7706

👒 Dependencies

  • chore(deps): bump github.com/google/go-containerregistry from 0.19.2 to 0.20.7 by @dependabot[bot] in #7409
  • chore(deps): bump appleboy/ssh-action from 1.2.3 to 1.2.4 by @dependabot[bot] in #7410
  • chore(deps): bump go.opentelemetry.io/otel/sdk/metric from 1.38.0 to 1.39.0 by @dependabot[bot] in #7476
  • chore(deps): bump actions/stale from 10.1.0 to 10.1.1 by @dependabot[bot] in #7473
  • chore(deps): bump protobuf from 6.33.1 to 6.33.2 in /backend/python/transformers by @dependabot[bot] in #7481
  • chore(deps): bump github.com/mudler/cogito from 0.5.1 to 0.6.0 by @dependabot[bot] in #7474
  • chore(deps): bump go.opentelemetry.io/otel/exporters/prometheus from 0.60.0 to 0.61.0 by @dependabot[bot] in #7477
  • chore(deps): bump github.com/onsi/gomega from 1.38.2 to 1.38.3 by @dependabot[bot] in #7475
  • chore(deps): bump stable-diffusion.cpp to '8823dc48bcc1598eb9671da7b69e45338d0cc5a5' by @mudler in #7524
  • chore(makefile): Add buildargs for sd and cuda when building backend by @richiejp in #7525
  • chore(deps): bump torch from 2.5.1+cxx11.abi to 2.7.1+cpu in /backend/python/diffusers in the pip group across 1 directory by @dependabot[bot] in #7549
  • chore(deps): bump actions/upload-artifact from 5 to 6 by @dependabot[bot] in #7585
  • chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.2 to 2.27.3 by @dependabot[bot] in #7590
  • chore(deps): bump peter-evans/create-pull-request from 7 to 8 by @dependabot[bot] in #7586
  • chore(deps): bump actions/download-artifact from 6 to 7 by @dependabot[bot] in #7587
  • chore(deps): bump securego/gosec from 2.22.9 to 2.22.11 by @dependabot[bot] in #7588
  • chore(deps): bump github.com/labstack/echo/v4 from 4.13.4 to 4.14.0 by @dependabot[bot] in #7589
  • chore(deps): bump github.com/jaypipes/ghw from 0.20.0 to 0.21.1 by @dependabot[bot] in #7591
  • chore(deps): bump sentence-transformers from 5.1.0 to 5.2.0 in /backend/python/transformers by @dependabot[bot] in #7594
  • chore(memory detection): do not use go-sigar as requires CGO on darwin by @mudler in #7618
  • chore(deps): bump cogito to latest and adapt API changes by @mudler in #7655
  • chore(refactor): move logging to common package based on slog by @mudler in #7668
  • chore(deps): bump xlog to v0.0.3 by @mudler in #7675
  • chore(deps): bump securego/gosec from 2.22.9 to 2.22.11 by @dependabot[bot] in #7690
  • chore(deps): bump github.com/mudler/xlog from 0.0.3 to 0.0.4 by @dependabot[bot] in #7695
  • chore(deps): bump github.com/mudler/cogito from 0.7.1 to 0.7.2 by @dependabot[bot] in #7691
  • chore(deps): bump github.com/jaypipes/ghw from 0.21.1 to 0.21.2 by @dependabot[bot] in #7694
  • chore(deps): bump github.com/containerd/containerd from 1.7.29 to 1.7.30 by @dependabot[bot] in #7692

Other Changes

  • chore: ⬆️ Update ggml-org/llama.cpp to eec1e33a9ed71b79422e39cc489719cf4f8e0777 by @localai-bot in #7363
  • Initialize sudo reference before its first actual use by @poretsky in #7367
  • chore(deps): update diffusers dependency to use GitHub repo for l4t by @mudler in #7369
  • chore(l4t/diffusers): bump nvidia l4t index for pytorch 2.9 by @mudler in #7379
  • Conventional way of adding extra apt repository by @poretsky in #7362
  • Correct user deletion with all its data by @poretsky in #7368
  • Clean data directory by @poretsky in #7378
  • docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #7381
  • chore(l4t): Update extra index URL for requirements-l4t.txt by @mudler in #7383
  • chore: Add Python 3.12 support for l4t build profile by @mudler in #7384
  • chore: ⬆️ Update ggml-org/llama.cpp to 4abef75f2cf2eee75eb5083b30a94cf981587394 by @localai-bot in #7382
  • chore(diffusers): Add PY_STANDALONE_TAG for l4t Python version by @mudler in #7387
  • Revert "chore(l4t): Update extra index URL for requirements-l4t.txt" by @mudler in #7388
  • chore: drop pinning of python 3.12 by @mudler in #7389
  • chore(deps): bump llama.cpp to 'd82b7a7c1d73c0674698d9601b1bbb0200933f29' by @mudler in #7392
  • feat(swagger): update swagger by @localai-bot in #7394
  • chore: ⬆️ Update ggml-org/llama.cpp to 8c32d9d96d9ae345a0150cae8572859e9aafea0b by @localai-bot in #7395
  • feat(swagger): update swagger by @localai-bot in #7400
  • chore: ⬆️ Update ggml-org/llama.cpp to 7f8ef50cce40e3e7e4526a3696cb45658190e69a by @mudler in #7402
  • chore: ⬆️ Update ggml-org/llama.cpp to ec18edfcba94dacb166e6523612fc0129cead67a by @localai-bot in #7406
  • chore(deps/stable-diffusion-ggml): update stablediffusion-ggml by @richiejp in #7411
  • Add Dockerfile for arm64 with nvpl installation by @mudler in #7416
  • chore: ⬆️ Update ggml-org/llama.cpp to 61bde8e21f4a1f9a98c9205831ca3e55457b4c78 by @localai-bot in #7415
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 5865b5e7034801af1a288a9584631730b25272c6 by @localai-bot in #7422
  • Messages output fix by @poretsky in #7424
  • chore: ⬆️ Update ggml-org/llama.cpp to e9f9483464e6f01d843d7f0293bd9c7bc6b2221c by @localai-bot in #7421
  • chore(ui): uniform buttons by @mudler in #7429
  • chore(deps): bump llama.cpp to 'bde188d60f58012ada0725c6dd5ba7c69fe4dd87' by @mudler in #7434
  • chore: ⬆️ Update ggml-org/llama.cpp to 8160b38a5fa8a25490ca33ffdd200cda51405688 by @localai-bot in #7438
  • chore: ⬆️ Update ggml-org/whisper.cpp to a88b93f85f08fc6045e5d8a8c3f94b7be0ac8bce by @localai-bot in #7448
  • chore: ⬆️ Update ggml-org/llama.cpp to db97837385edfbc772230debbd49e5efae843a71 by @localai-bot in #7447
  • chore(gallery agent): summary now is at root of the git repository by @mudler in #7463
  • chore(gallery agent): strip thinking tags by @mudler in #7464
  • chore: ⬆️ Update ggml-org/whisper.cpp to a8f45ab11d6731e591ae3d0230be3fec6c2efc91 by @localai-bot in #7483
  • chore(deps/llama-cpp): bump to '2fa51c19b028180b35d316e9ed06f5f0f7ada2c1' by @mudler in #7484
  • chore: ⬆️ Update ggml-org/llama.cpp to 086a63e3a5d2dbbb7183a74db453459e544eb55a by @localai-bot in #7496
  • chore: ⬆️ Update ggml-org/whisper.cpp to 9f5ed26e43c680bece09df7bdc8c1b7835f0e537 by @localai-bot in #7509
  • chore: ⬆️ Update ggml-org/llama.cpp to 4dff236a522bd0ed949331d6cb1ee2a1b3615c35 by @localai-bot in #7508
  • chore: ⬆️ Update ggml-org/llama.cpp to a81a569577cc38b32558958b048228150be63eae by @localai-bot in #7529
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 11ab095230b2b67210f5da4d901588d56c71fe3a by @localai-bot in #7539
  • chore(l4t13): use pytorch index by @mudler in #7546
  • Revert "chore(deps): bump torch from 2.5.1+cxx11.abi to 2.7.1+cpu in /backend/python/diffusers in the pip group across 1 directory" by @mudler in #7558
  • chore: ⬆️ Update ggml-org/whisper.cpp to 2551e4ce98db69027d08bd99bcc3f1a4e2ad2cef by @localai-bot in #7561
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 43a70e819b9254dee0d017305d6992f6bb27f850 by @localai-bot in #7562
  • chore: ⬆️ Update ggml-org/llama.cpp to 5266379bcae74214af397f36aa81b2a08b15d545 by @localai-bot in #7563
  • chore: ⬆️ Update ggml-org/llama.cpp to 5c8a717128cc98aa9e5b1c44652f5cf458fd426e by @localai-bot in #7573
  • chore(llama.cpp): Add Missing llama.cpp Options to gRPC Server by @mudler in #7584
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 200cb6f2ca07e40fa83b610a4e595f4da06ec709 by @localai-bot in #7597
  • Revert "chore(deps): bump securego/gosec from 2.22.9 to 2.22.11" by @mudler in #7602
  • chore: ⬆️ Update ggml-org/llama.cpp to ef83fb8601229ff650d952985be47e82d644bfaa by @localai-bot in #7611
  • chore: ⬆️ Update ggml-org/whisper.cpp to 3e79e73eee32e924fbd34587f2f2ac5a45a26b61 by @localai-bot in #7630
  • chore: ⬆️ Update ggml-org/llama.cpp to d37fc935059211454e9ad2e2a44e8ed78fd6d1ce by @localai-bot in #7629
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to bda7fab9f208dff4b67179a68f694b6ddec13326 by @richiejp in #7639
  • chore: ⬆️ Update ggml-org/llama.cpp to f9ec8858edea4a0ecfea149d6815ebfb5ecc3bcd by @localai-bot in #7642
  • chore: ⬆️ Update ggml-org/whisper.cpp to 6c22e792cb0ee155b6587ce71a8410c3aeb06949 by @localai-bot in #7644
  • chore: ⬆️ Update ggml-org/llama.cpp to ce734a8a2f9fb6eb4f0383ab1370a1b0014ab787 by @localai-bot in #7654
  • chore(cogito): respect application-level logging and propagate by @mudler in #7656
  • chore: ⬆️ Update ggml-org/llama.cpp to 52ab19df633f3de5d4db171a16f2d9edd2342fec by @localai-bot in #7665
  • docs: Add langchain-localai integration package to documentation by @mkhludnev in #7677
  • chore: allow to set local-ai log format, default to custom one by @mudler in #7679
  • chore(deps): bump llama.cpp to '0e1ccf15c7b6d05c720551b537857ecf6194d420' by @mudler in #7684
  • chore(gallery agent): various fixups by @mudler in #7697
  • Revert "chore(deps): bump securego/gosec from 2.22.9 to 2.22.11" by @mudler in #7698
  • chore(logging): be consistent and do not emit logs from echo by @mudler in #7710

Full Changelog: v3.8.0...v3.9.0
