🚀 LocalAI 3.2.0

Welcome to LocalAI 3.2.0! This is a release that refactors our architecture to be more flexible and lightweight.

The core is now separated from all the backends, making LocalAI faster to download, easier to manage, more portable, and much smaller.

TL;DR – What’s New in LocalAI 3.2.0 🎉

  • 🧩 Modular Backends: All backends now live outside the main binary in our new Backend Gallery. This means you can update, add, or manage backends independently of LocalAI releases.
  • 📉 Leaner Than Ever: The LocalAI binary and container images are drastically smaller, making for faster downloads and a reduced footprint.
  • 🤖 Smart Backend Installation: It just works! When you install a model, LocalAI automatically detects your hardware (CPU, NVIDIA, AMD, Intel) and downloads the necessary backend. No more manual configuration!
  • 🛠️ Simplified Build Process: The new modular architecture significantly simplifies the build process for contributors and power users.
  • ⚡️ Intel GPU Support for Whisper: Transcription with Whisper can now be accelerated on Intel GPUs using SYCL, bringing more hardware options to our users.
  • 🗣️ Enhanced Realtime Audio: We've added speech started and stopped events for more interactive applications and OpenAI-compatible support for the input_audio field in the chat API.
  • 🧠 Massive Model Expansion: The gallery has been updated with over 50 new models, including the latest from Qwen3, Gemma, Mistral, Nemotron, and more!

Note: CI is still building all the backends for this release; they will be available shortly. If you hit any issues, please try again in a little while, thanks for understanding!
Note: Some parts of the documentation and the installation scripts (which download the release binaries) have yet to be adapted to the latest changes and may not reflect the current state.

A New Modular Architecture 🧩

The biggest change in v3.2.0 is the complete separation of inference backends from the core LocalAI binary. Backends like llama.cpp, whisper.cpp, piper, and stablediffusion-ggml are no longer bundled in.

This fundamental shift makes LocalAI:

  • Lighter: Significantly smaller binary and container image sizes.
  • More Flexible: Update backends anytime from the gallery without waiting for a new LocalAI release.
  • Easier to Maintain: A cleaner, more streamlined codebase for faster development.
  • Easier to Customize: Build your own backends and install them in your LocalAI instances.

Smart, Automatic Backend Installation 🤖

To make the new modular system seamless, LocalAI now features automatic backend installation.

When you install a model from the gallery (or a YAML file), LocalAI intelligently detects the required backend and your system's capabilities, then downloads the correct version for you. Whether you're running on a standard CPU, an NVIDIA GPU, an AMD GPU, or an Intel GPU, LocalAI handles it automatically.
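For example, installing a model from the gallery via the CLI is now a single step, with the backend download happening behind the scenes (the model name below is just one entry from this release's gallery; pick any model you like):

```shell
# Install a gallery model; LocalAI detects your hardware and pulls the
# matching backend build (CPU, CUDA, ROCm, or SYCL) automatically.
local-ai models install qwen3-8b-shiningvaliant3

# Then start serving it:
local-ai run qwen3-8b-shiningvaliant3
```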

For advanced use cases or to override auto-detection, you can use the LOCALAI_FORCE_META_BACKEND_CAPABILITY environment variable. Here are the available options:

  • default: Forces CPU-only backend. This is the fallback if no specific hardware is detected.
  • nvidia: Forces backends compiled with CUDA support for NVIDIA GPUs.
  • amd: Forces backends compiled with ROCm support for AMD GPUs.
  • intel: Forces backends compiled with SYCL/oneAPI support for Intel GPUs.
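If auto-detection picks a variant you don't want (for example, on a machine with a GPU you'd rather not use), the override is a plain environment variable. A minimal sketch:

```shell
# Force CPU-only backends even if a GPU is detected:
LOCALAI_FORCE_META_BACKEND_CAPABILITY=default local-ai run

# Or force CUDA-enabled backends for an NVIDIA GPU:
LOCALAI_FORCE_META_BACKEND_CAPABILITY=nvidia local-ai run
```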

The Backend Gallery & CLI Control 🖼️

You are in full control. You can browse, install, and manage all available backends directly from the WebUI or using the new CLI commands:

# List all available backends in the gallery
local-ai backends list

# Install a specific backend (e.g., llama-cpp)
local-ai backends install llama-cpp

# Uninstall a backend
local-ai backends uninstall llama-cpp

For development, offline, or air-gapped environments, you can now also install backends directly from a local OCI tar file:

local-ai backends install "ocifile://<PATH_TO_TAR_FILE>"
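One way to produce such a tar file is to export the backend's OCI image from a Docker daemon on a connected machine and copy it over; this is a sketch under assumptions, and the image reference below is a placeholder, not a confirmed name (check the backend gallery for the actual image of the backend you need):

```shell
# On a machine with internet access (image name is a placeholder):
docker pull <BACKEND_IMAGE_REFERENCE>
docker save -o my-backend.tar <BACKEND_IMAGE_REFERENCE>

# Copy my-backend.tar to the air-gapped host, then install it:
local-ai backends install "ocifile://$PWD/my-backend.tar"
```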

Other Key Improvements

  • 🗣️ Enhanced Realtime and Audio APIs: Building voice-activated applications is now easier.
    • The new speech started and stopped events give you precise control over realtime audio streams.
    • We now support the input_audio field in the /v1/chat/completions endpoint for multimodal audio inputs, improving OpenAI compatibility.
  • ⚡️ Intel GPU Acceleration for Whisper: Our Whisper backend now supports SYCL, enabling hardware-accelerated transcriptions on Intel GPUs.
  • ✅ UI and Bug Fixes: We've squashed several bugs for a smoother experience, including a fix that correctly shows the download status for backend images in the gallery, so you always know what's happening.
  • 🧠 Massive Model Gallery Expansion: Our model gallery has never been bigger! We've added over 50 new and updated models, with a focus on powerful new releases like qwen3, devstral-small, and nemotron.
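The new input_audio support follows the OpenAI chat format, where a message's content is an array of typed parts. A hedged sketch of such a request payload (the model name and default port 8080 are assumptions; substitute a model installed on your instance and real base64-encoded audio):

```shell
# Build an OpenAI-style chat payload with an audio content part.
# "some-installed-model" and <BASE64_WAV_DATA> are placeholders.
cat > payload.json <<'EOF'
{
  "model": "some-installed-model",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "What is said in this recording?"},
      {"type": "input_audio",
       "input_audio": {"data": "<BASE64_WAV_DATA>", "format": "wav"}}
    ]
  }]
}
EOF

# Validate the JSON locally; to actually send it (assuming port 8080):
#   curl http://localhost:8080/v1/chat/completions \
#     -H "Content-Type: application/json" -d @payload.json
python3 -m json.tool payload.json > /dev/null && echo "payload OK"
```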

🚨 Important Note for Upgrading

Due to the new modular architecture, if you have existing models installed with a version prior to 3.2.0, they might not have a specific backend assigned.

After upgrading, you may need to install the required backend manually for these models to work. You can do this easily from the WebUI or via the CLI: local-ai backends install <backend_name>.

The Complete Local Stack for Privacy-First AI


LocalAI

The free, Open Source alternative to OpenAI: a drop-in replacement REST API compatible with the OpenAI API specifications for local AI inferencing. No GPU required.

Link: https://github.com/mudler/LocalAI


LocalAGI

A powerful Local AI agent management platform. Serves as a drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI.

Link: https://github.com/mudler/LocalAGI


LocalRecall

A RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Designed to work alongside LocalAI and LocalAGI.

Link: https://github.com/mudler/LocalRecall

Thank you! ❤️

A massive THANK YOU to our incredible community and our sponsors! LocalAI has over 34,100 stars, and LocalAGI has already rocketed past 900+ stars!

As a reminder, LocalAI is real FOSS (Free and Open Source Software) and its sibling projects are community-driven and not backed by VCs or a company. We rely on contributors donating their spare time and our sponsors to provide us the hardware! If you love open-source, privacy-first AI, please consider starring the repos, contributing code, reporting bugs, or spreading the word!

👉 Check out the reborn LocalAGI v2 today: https://github.com/mudler/LocalAGI

Full changelog 👇


What's Changed

Breaking Changes 🛠

  • feat: do not bundle llama-cpp anymore by @mudler in #5790
  • feat: refactor build process, drop embedded backends by @mudler in #5875

Bug fixes 🐛

  • fix(gallery): automatically install model from name by @mudler in #5757
  • fix: Diffusers and XPU fixes by @richiejp in #5737
  • fix(gallery): correctly show status for downloading OCI images by @mudler in #5774
  • fix: explorer page should not have login by @mudler in #5855
  • fix: dockerfile typo by @LeonSijiaLu in #5823
  • fix(docs): Resolve logo overlap on tablet view by @dedyf5 in #5853
  • fix: do not pass by environ to ffmpeg by @mudler in #5871
  • fix(p2p): adapt to backend changes, general improvements by @mudler in #5889

Exciting New Features 🎉

  • feat(llama.cpp): allow to set kv-overrides by @mudler in #5745
  • feat(backends): add metas in the gallery by @mudler in #5784
  • feat(system): detect and allow to override capabilities by @mudler in #5785
  • chore(cli): add backends CLI to manipulate and install backends by @mudler in #5787
  • feat(whisper): Enable SYCL by @richiejp in #5802
  • feat(cli): allow to install backends from OCI tar files by @mudler in #5816
  • feat(cli): add command to create custom OCI images from directories by @mudler in #5844
  • feat(realtime): Add speech started and stopped events by @richiejp in #5856
  • fix: autoload backends when installing models from YAML files by @mudler in #5859
  • feat: split piper from main binary by @mudler in #5858
  • feat: remove stablediffusion-ggml from main binary by @mudler in #5861
  • feat: split whisper from main binary by @mudler in #5863
  • feat(openai): support input_audio chat api field by @mgoltzsche in #5870
  • fix(realtime): Reset speech started flag on commit by @richiejp in #5879
  • fix(build): Add and update ONEAPI_VERSION by @richiejp in #5874

🧠 Models

  • chore(model gallery): add qwen3-55b-a3b-total-recall-v1.3-i1 by @mudler in #5746
  • chore(model gallery): add qwen3-55b-a3b-total-recall-deep-40x by @mudler in #5747
  • chore(model gallery): add qwen3-42b-a3b-stranger-thoughts-deep20x-abliterated-uncensored-i1 by @mudler in #5748
  • chore(model gallery): add mistral-small-3.2-46b-the-brilliant-raconteur-ii-instruct-2506 by @mudler in #5749
  • chore(model gallery): add qwen3-22b-a3b-the-harley-quinn by @mudler in #5750
  • chore(model gallery): add gemma-3-4b-it-max-horror-uncensored-dbl-x-imatrix by @mudler in #5751
  • chore(model gallery): add qwen3-33b-a3b-stranger-thoughts-abliterated-uncensored by @mudler in #5755
  • chore(model gallery): add thedrummer_anubis-70b-v1.1 by @mudler in #5771
  • chore(model gallery): add steelskull_l3.3-shakudo-70b by @mudler in #5772
  • chore(model gallery): add pinkpixel_crystal-think-v2 by @mudler in #5773
  • chore(model gallery): add helpingai_dhanishtha-2.0-preview by @mudler in #5791
  • chore(model gallery): add agentica-org_deepswe-preview by @mudler in #5792
  • chore(model gallery): add zerofata_ms3.2-paintedfantasy-visage-33b by @mudler in #5793
  • chore(model gallery): add ockerman0_anubislemonade-70b-v1 by @mudler in #5794
  • chore(model gallery): add sicariussicariistuff_impish_llama_4b by @mudler in #5799
  • chore(model gallery): add nano_imp_1b-q8_0 by @mudler in #5800
  • chore(model gallery): add compumacy-experimental-32b by @mudler in #5803
  • chore(model gallery): add mini-hydra by @mudler in #5804
  • chore(model gallery): add zonui-3b-i1 by @mudler in #5805
  • chore(model gallery): add huihui-jan-nano-abliterated by @mudler in #5806
  • chore(model gallery): add cognitivecomputations_dolphin-mistral-24b-venice-edition by @mudler in #5813
  • chore(model gallery): add ockerman0_anubislemonade-70b-v1.1 by @mudler in #5814
  • chore(model gallery): add qwen3-8b-shiningvaliant3 by @mudler in #5815
  • chore(model gallery): add lyranovaheart_starfallen-snow-fantasy-24b-ms3.2-v0.0 by @mudler in #5818
  • chore(model gallery): add zerofata_l3.3-geneticlemonade-opus-70b by @mudler in #5819
  • chore(model gallery): add huggingfacetb_smollm3-3b by @mudler in #5820
  • chore(model gallery): add delta-vector_plesio-70b by @mudler in #5825
  • chore(model gallery): add thedrummer_big-tiger-gemma-27b-v3 by @mudler in #5826
  • chore(model gallery): add thedrummer_tiger-gemma-12b-v3 by @mudler in #5827
  • chore(model gallery): add microsoft_nextcoder-32b by @mudler in #5832
  • chore(model gallery): add huihui-ai_huihui-gemma-3n-e4b-it-abliterated by @mudler in #5833
  • chore(model gallery): add mistralai_devstral-small-2507 by @mudler in #5834
  • chore(model gallery): add nvidia_llama-3_3-nemotron-super-49b-genrm-multilingual by @mudler in #5837
  • chore(model gallery): add mistral-2x24b-moe-power-coder-magistral-devstral-reasoning-ultimate-neo-max-44b by @mudler in #5838
  • chore(model gallery): add impish_magic_24b-i1 by @mudler in #5839
  • chore(model gallery): add google_medgemma-4b-it by @mudler in #5842
  • chore(model gallery): add google_medgemma-27b-it by @mudler in #5843
  • chore(model gallery): add zhi-create-qwen3-32b-i1 by @mudler in #5847
  • chore(model gallery): add sophosympatheia_strawberrylemonade-70b-v1.1 by @mudler in #5848
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #5865
  • chore(model gallery): add omega-qwen3-atom-8b by @mudler in #5883
  • chore(model gallery): add dream-org_dream-v0-instruct-7b by @mudler in #5884
  • chore(model gallery): add entfane_math-genius-7b by @mudler in #5885
  • chore(model gallery): add menlo_lucy by @mudler in #5886
  • chore(model gallery): add qwen3-235b-a22b-instruct-2507 by @mudler in #5887
  • chore(model gallery): add qwen3-coder-480b-a35b-instruct by @mudler in #5888

📖 Documentation and examples

  • fix(docs): Improve Header Responsiveness - Hide "Star us on GitHub!" on Mobile by @dedyf5 in #5770

👒 Dependencies

  • chore: ⬆️ Update ggml-org/llama.cpp to 27208bf657cfe7262791df473927225e48efe482 by @localai-bot in #5753
  • chore: ⬆️ Update ggml-org/llama.cpp to caf5681fcb47dfe9bafee94ef9aa8f669ac986c7 by @localai-bot in #5758
  • chore: ⬆️ Update ggml-org/llama.cpp to 0a5a3b5cdfd887cf0f8e09d9ff89dee130cfcdde by @localai-bot in #5759
  • chore: ⬆️ Update ggml-org/whisper.cpp to bca021c9740b267c2973fba56555be052006023a by @localai-bot in #5776
  • chore: ⬆️ Update ggml-org/llama.cpp to de569441470332ff922c23fb0413cc957be75b25 by @localai-bot in #5777
  • chore: ⬆️ Update ggml-org/whisper.cpp to d9999d54c868b8bfcd376aa26067e787d53e679e by @localai-bot in #5782
  • chore: ⬆️ Update ggml-org/llama.cpp to e75ba4c0434eb759eb7ff74e034ebe729053e575 by @localai-bot in #5783
  • chore(bark-cpp): generalize and move to bark-cpp by @mudler in #5786
  • chore: ⬆️ Update PABannier/bark.cpp to 5d5be84f089ab9ea53b7a793f088d3fbf7247495 by @localai-bot in #4786
  • chore: ⬆️ Update ggml-org/llama.cpp to bee28421be25fd447f61cb6db64d556cbfce32ec by @localai-bot in #5788
  • chore: ⬆️ Update ggml-org/llama.cpp to ef797db357e44ecb7437fa9d22f4e1614104b342 by @localai-bot in #5795
  • chore: ⬆️ Update ggml-org/llama.cpp to a0374a67e2924f2e845cdc59dd67d9a44065a89c by @localai-bot in #5798
  • chore: ⬆️ Update ggml-org/llama.cpp to 6491d6e4f1caf0ad2221865b4249ae6938a6308c by @localai-bot in #5801
  • chore: ⬆️ Update ggml-org/llama.cpp to 12f55c302b35cfe900b84c5fe67c262026af9c44 by @localai-bot in #5808
  • chore: ⬆️ Update ggml-org/whisper.cpp to 869335f2d58d04010535be9ae23a69a9da12a169 by @localai-bot in #5809
  • chore: ⬆️ Update ggml-org/llama.cpp to 6efcd65945a98cf6883cdd9de4c8ccd8c79d219a by @localai-bot in #5817
  • chore: ⬆️ Update ggml-org/llama.cpp to 0b8855775c6b873931d40b77a5e42558aacbde52 by @localai-bot in #5830
  • chore: ⬆️ Update ggml-org/llama.cpp to f5e96b368f1acc7f53c390001b936517c4d18999 by @localai-bot in #5835
  • chore: ⬆️ Update ggml-org/llama.cpp to c31e60647def83d671bac5ab5b35579bf25d9aa1 by @localai-bot in #5840
  • chore: ⬆️ Update ggml-org/whisper.cpp to 3775c503d5133d3d8b99d7d062e87a54064b0eb8 by @localai-bot in #5841
  • chore: ⬆️ Update ggml-org/whisper.cpp to a16da91365700f396da916d16a7f5a2ec99364b9 by @localai-bot in #5846
  • chore: ⬆️ Update ggml-org/llama.cpp to 982e347255723fe6d02e60ee30cfdd0559c884c5 by @localai-bot in #5845
  • chore: ⬆️ Update ggml-org/whisper.cpp to 032697b9a850dc2615555e2a93a683cc3dd58559 by @localai-bot in #5849
  • chore: ⬆️ Update ggml-org/llama.cpp to bdca38376f7e8dd928defe01ce6a16218a64b040 by @localai-bot in #5850
  • chore: ⬆️ Update ggml-org/llama.cpp to 4a4f426944e79b79e389f9ed7b34831cb9b637ad by @localai-bot in #5852
  • chore: ⬆️ Update ggml-org/llama.cpp to 496957e1cbcb522abc63aa18521036e40efce985 by @localai-bot in #5854
  • chore: ⬆️ Update ggml-org/llama.cpp to d6fb3f6b49b27ef1c0f4cf5128e041f7e7dc03af by @localai-bot in #5857
  • chore(deps): bump securego/gosec from 2.22.5 to 2.22.7 by @dependabot[bot] in #5878
  • chore: ⬆️ Update richiejp/stable-diffusion.cpp to 10c6501bd05a697e014f1bee3a84e5664290c489 by @localai-bot in #5732
  • fix(stablediffusion-cpp): Switch back to upstream and update by @richiejp in #5880

Other Changes

  • docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #5752
  • docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #5775
  • docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #5781
  • chore: ⬆️ Update ggml-org/llama.cpp to bf9087f59aab940cf312b85a67067ce33d9e365a by @localai-bot in #5860
  • chore: ⬆️ Update ggml-org/llama.cpp to a979ca22db0d737af1e548a73291193655c6be99 by @localai-bot in #5862
  • chore: ⬆️ Update ggml-org/llama.cpp to 2be60cbc2707359241c2784f9d2e30d8fc7cdabb by @localai-bot in #5867
  • chore: ⬆️ Update ggml-org/whisper.cpp to 1f5cf0b2888402d57bb17b2029b2caa97e5f3baf by @localai-bot in #5876
  • chore: ⬆️ Update ggml-org/llama.cpp to 6c9ee3b17e19dcc82ab93d52ae46fdd0226d4777 by @localai-bot in #5877
  • chore: drop vllm for cuda 11 by @mudler in #5881
  • chore: ⬆️ Update ggml-org/llama.cpp to acd6cb1c41676f6bbb25c2a76fa5abeb1719301e by @localai-bot in #5882
  • fix: rename Dockerfile.go --> Dockerfile.golang to avoid IDE errors by @dave-gray101 in #5892
  • chore(Makefile): drop unused targets by @mudler in #5893
  • chore: ⬆️ Update ggml-org/llama.cpp to a86f52b2859dae4db5a7a0bbc0f1ad9de6b43ec6 by @localai-bot in #5894
  • fix: untangle pkg and core by @dave-gray101 in #5896
  • Update quickstart.md by @Shinrai in #5898

New Contributors

Full Changelog: v3.1.1...v3.2.0
