🚀 LocalAI 3.2.0

Welcome to LocalAI 3.2.0! This is a release that refactors our architecture to be more flexible and lightweight.

The core is now separated from all the backends, making LocalAI faster to download, easier to manage, more portable, and much smaller.

TL;DR – What’s New in LocalAI 3.2.0 🎉

  • 🧩 Modular Backends: All backends now live outside the main binary in our new Backend Gallery. This means you can update, add, or manage backends independently of LocalAI releases.
  • 📉 Leaner Than Ever: The LocalAI binary and container images are drastically smaller, making for faster downloads and a reduced footprint.
  • 🤖 Smart Backend Installation: It just works! When you install a model, LocalAI automatically detects your hardware (CPU, NVIDIA, AMD, Intel) and downloads the necessary backend. No more manual configuration!
  • 🛠️ Simplified Build Process: The new modular architecture significantly simplifies the build process for contributors and power users.
  • ⚡️ Intel GPU Support for Whisper: Transcription with Whisper can now be accelerated on Intel GPUs using SYCL, bringing more hardware options to our users.
  • 🗣️ Enhanced Realtime Audio: We've added speech started and stopped events for more interactive applications and OpenAI-compatible support for the input_audio field in the chat API.
  • 🧠 Massive Model Expansion: The gallery has been updated with over 50 new models, including the latest from Qwen3, Gemma, Mistral, Nemotron, and more!

Note: CI is still building all the backends for this release; they will be available shortly. If you hit any issues, please try again in a little while, thanks for understanding!
Note: Some parts of the documentation and the installation scripts (which download the release binaries) have yet to be adapted to the latest changes and may not reflect the current state.

A New Modular Architecture 🧩

The biggest change in v3.2.0 is the complete separation of inference backends from the core LocalAI binary. Backends like llama.cpp, whisper.cpp, piper, and stablediffusion-ggml are no longer bundled in.

This fundamental shift makes LocalAI:

  • Lighter: Significantly smaller binary and container image sizes.
  • More Flexible: Update backends anytime from the gallery without waiting for a new LocalAI release.
  • Easier to Maintain: A cleaner, more streamlined codebase for faster development.
  • Easier to Customize: Build your own backends and install them in your LocalAI instances.

Smart, Automatic Backend Installation 🤖

To make the new modular system seamless, LocalAI now features automatic backend installation.

When you install a model from the gallery (or a YAML file), LocalAI intelligently detects the required backend and your system's capabilities, then downloads the correct version for you. Whether you're running on a standard CPU, an NVIDIA GPU, an AMD GPU, or an Intel GPU, LocalAI handles it automatically.
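For example, installing a model from the gallery via the CLI is now a single step, with the backend download happening behind the scenes (the model name below is just one entry from this release's gallery; pick any model you like):

```shell
# Install a gallery model; LocalAI detects your hardware and pulls the
# matching backend build (CPU, CUDA, ROCm, or SYCL) automatically.
local-ai models install qwen3-8b-shiningvaliant3

# Then start serving it:
local-ai run qwen3-8b-shiningvaliant3
```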

For advanced use cases or to override auto-detection, you can use the LOCALAI_FORCE_META_BACKEND_CAPABILITY environment variable. Here are the available options:

  • default: Forces CPU-only backend. This is the fallback if no specific hardware is detected.
  • nvidia: Forces backends compiled with CUDA support for NVIDIA GPUs.
  • amd: Forces backends compiled with ROCm support for AMD GPUs.
  • intel: Forces backends compiled with SYCL/oneAPI support for Intel GPUs.
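If auto-detection picks a variant you don't want (for example, on a machine with a GPU you'd rather not use), the override is a plain environment variable. A minimal sketch:

```shell
# Force CPU-only backends even if a GPU is detected:
LOCALAI_FORCE_META_BACKEND_CAPABILITY=default local-ai run

# Or force CUDA-enabled backends for an NVIDIA GPU:
LOCALAI_FORCE_META_BACKEND_CAPABILITY=nvidia local-ai run
```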

The Backend Gallery & CLI Control 🖼️

You are in full control. You can browse, install, and manage all available backends directly from the WebUI or using the new CLI commands:

# List all available backends in the gallery
local-ai backends list

# Install a specific backend (e.g., llama-cpp)
local-ai backends install llama-cpp

# Uninstall a backend
local-ai backends uninstall llama-cpp

For development, offline, or air-gapped environments, you can now also install backends directly from a local OCI tar file:

local-ai backends install "ocifile://<PATH_TO_TAR_FILE>"
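One way to produce such a tar file is to export the backend's OCI image from a Docker daemon on a connected machine and copy it over; this is a sketch under assumptions, and the image reference below is a placeholder, not a confirmed name (check the backend gallery for the actual image of the backend you need):

```shell
# On a machine with internet access (image name is a placeholder):
docker pull <BACKEND_IMAGE_REFERENCE>
docker save -o my-backend.tar <BACKEND_IMAGE_REFERENCE>

# Copy my-backend.tar to the air-gapped host, then install it:
local-ai backends install "ocifile://$PWD/my-backend.tar"
```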

Other Key Improvements

  • 🗣️ Enhanced Realtime and Audio APIs: Building voice-activated applications is now easier.
    • The new speech started and stopped events give you precise control over realtime audio streams.
    • We now support the input_audio field in the /v1/chat/completions endpoint for multimodal audio inputs, improving OpenAI compatibility.
  • ⚡️ Intel GPU Acceleration for Whisper: Our Whisper backend now supports SYCL, enabling hardware-accelerated transcriptions on Intel GPUs.
  • ✅ UI and Bug Fixes: We've squashed several bugs for a smoother experience, including a fix that correctly shows the download status for backend images in the gallery, so you always know what's happening.
  • 🧠 Massive Model Gallery Expansion: Our model gallery has never been bigger! We've added over 50 new and updated models, with a focus on powerful new releases like qwen3, devstral-small, and nemotron.
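The new input_audio support follows the OpenAI chat format, where a message's content is an array of typed parts. A hedged sketch of such a request payload (the model name and default port 8080 are assumptions; substitute a model installed on your instance and real base64-encoded audio):

```shell
# Build an OpenAI-style chat payload with an audio content part.
# "some-installed-model" and <BASE64_WAV_DATA> are placeholders.
cat > payload.json <<'EOF'
{
  "model": "some-installed-model",
  "messages": [{
    "role": "user",
    "content": [
      {"type": "text", "text": "What is said in this recording?"},
      {"type": "input_audio",
       "input_audio": {"data": "<BASE64_WAV_DATA>", "format": "wav"}}
    ]
  }]
}
EOF

# Validate the JSON locally; to actually send it (assuming port 8080):
#   curl http://localhost:8080/v1/chat/completions \
#     -H "Content-Type: application/json" -d @payload.json
python3 -m json.tool payload.json > /dev/null && echo "payload OK"
```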

🚨 Important Note for Upgrading

Due to the new modular architecture, if you have existing models installed with a version prior to 3.2.0, they might not have a specific backend assigned.

After upgrading, you may need to install the required backend manually for these models to work. You can do this easily from the WebUI or via the CLI: local-ai backends install <backend_name>.

The Complete Local Stack for Privacy-First AI


LocalAI

The free, Open Source alternative to OpenAI: a drop-in replacement REST API compatible with the OpenAI API specifications for local AI inferencing. No GPU required.

Link: https://github.com/mudler/LocalAI


LocalAGI

A powerful Local AI agent management platform. Serves as a drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI.

Link: https://github.com/mudler/LocalAGI


LocalRecall

A RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Designed to work alongside LocalAI and LocalAGI.

Link: https://github.com/mudler/LocalRecall

Thank you! ❤️

A massive THANK YOU to our incredible community and our sponsors! LocalAI has over 34,100 stars, and LocalAGI has already rocketed past 900+ stars!

As a reminder, LocalAI is real FOSS (Free and Open Source Software) and its sibling projects are community-driven and not backed by VCs or a company. We rely on contributors donating their spare time and our sponsors to provide us the hardware! If you love open-source, privacy-first AI, please consider starring the repos, contributing code, reporting bugs, or spreading the word!

👉 Check out the reborn LocalAGI v2 today: https://github.com/mudler/LocalAGI

Full changelog 👇


What's Changed

Breaking Changes 🛠

  • feat: do not bundle llama-cpp anymore by @mudler in #5790
  • feat: refactor build process, drop embedded backends by @mudler in #5875

Bug fixes 🐛

  • fix(gallery): automatically install model from name by @mudler in #5757
  • fix: Diffusers and XPU fixes by @richiejp in #5737
  • fix(gallery): correctly show status for downloading OCI images by @mudler in #5774
  • fix: explorer page should not have login by @mudler in #5855
  • fix: dockerfile typo by @LeonSijiaLu in #5823
  • fix(docs): Resolve logo overlap on tablet view by @dedyf5 in #5853
  • fix: do not pass by environ to ffmpeg by @mudler in #5871
  • fix(p2p): adapt to backend changes, general improvements by @mudler in #5889

Exciting New Features 🎉

  • feat(llama.cpp): allow to set kv-overrides by @mudler in #5745
  • feat(backends): add metas in the gallery by @mudler in #5784
  • feat(system): detect and allow to override capabilities by @mudler in #5785
  • chore(cli): add backends CLI to manipulate and install backends by @mudler in #5787
  • feat(whisper): Enable SYCL by @richiejp in #5802
  • feat(cli): allow to install backends from OCI tar files by @mudler in #5816
  • feat(cli): add command to create custom OCI images from directories by @mudler in #5844
  • feat(realtime): Add speech started and stopped events by @richiejp in #5856
  • fix: autoload backends when installing models from YAML files by @mudler in #5859
  • feat: split piper from main binary by @mudler in #5858
  • feat: remove stablediffusion-ggml from main binary by @mudler in #5861
  • feat: split whisper from main binary by @mudler in #5863
  • feat(openai): support input_audio chat api field by @mgoltzsche in #5870
  • fix(realtime): Reset speech started flag on commit by @richiejp in #5879
  • fix(build): Add and update ONEAPI_VERSION by @richiejp in #5874

🧠 Models

  • chore(model gallery): add qwen3-55b-a3b-total-recall-v1.3-i1 by @mudler in #5746
  • chore(model gallery): add qwen3-55b-a3b-total-recall-deep-40x by @mudler in #5747
  • chore(model gallery): add qwen3-42b-a3b-stranger-thoughts-deep20x-abliterated-uncensored-i1 by @mudler in #5748
  • chore(model gallery): add mistral-small-3.2-46b-the-brilliant-raconteur-ii-instruct-2506 by @mudler in #5749
  • chore(model gallery): add qwen3-22b-a3b-the-harley-quinn by @mudler in #5750
  • chore(model gallery): add gemma-3-4b-it-max-horror-uncensored-dbl-x-imatrix by @mudler in #5751
  • chore(model gallery): add qwen3-33b-a3b-stranger-thoughts-abliterated-uncensored by @mudler in #5755
  • chore(model gallery): add thedrummer_anubis-70b-v1.1 by @mudler in #5771
  • chore(model gallery): add steelskull_l3.3-shakudo-70b by @mudler in #5772
  • chore(model gallery): add pinkpixel_crystal-think-v2 by @mudler in #5773
  • chore(model gallery): add helpingai_dhanishtha-2.0-preview by @mudler in #5791
  • chore(model gallery): add agentica-org_deepswe-preview by @mudler in #5792
  • chore(model gallery): add zerofata_ms3.2-paintedfantasy-visage-33b by @mudler in #5793
  • chore(model gallery): add ockerman0_anubislemonade-70b-v1 by @mudler in #5794
  • chore(model gallery): add sicariussicariistuff_impish_llama_4b by @mudler in #5799
  • chore(model gallery): add nano_imp_1b-q8_0 by @mudler in #5800
  • chore(model gallery): add compumacy-experimental-32b by @mudler in #5803
  • chore(model gallery): add mini-hydra by @mudler in #5804
  • chore(model gallery): add zonui-3b-i1 by @mudler in #5805
  • chore(model gallery): add huihui-jan-nano-abliterated by @mudler in #5806
  • chore(model gallery): add cognitivecomputations_dolphin-mistral-24b-venice-edition by @mudler in #5813
  • chore(model gallery): add ockerman0_anubislemonade-70b-v1.1 by @mudler in #5814
  • chore(model gallery): add qwen3-8b-shiningvaliant3 by @mudler in #5815
  • chore(model gallery): add lyranovaheart_starfallen-snow-fantasy-24b-ms3.2-v0.0 by @mudler in #5818
  • chore(model gallery): add zerofata_l3.3-geneticlemonade-opus-70b by @mudler in #5819
  • chore(model gallery): add huggingfacetb_smollm3-3b by @mudler in #5820
  • chore(model gallery): add delta-vector_plesio-70b by @mudler in #5825
  • chore(model gallery): add thedrummer_big-tiger-gemma-27b-v3 by @mudler in #5826
  • chore(model gallery): add thedrummer_tiger-gemma-12b-v3 by @mudler in #5827
  • chore(model gallery): add microsoft_nextcoder-32b by @mudler in #5832
  • chore(model gallery): add huihui-ai_huihui-gemma-3n-e4b-it-abliterated by @mudler in #5833
  • chore(model gallery): add mistralai_devstral-small-2507 by @mudler in #5834
  • chore(model gallery): add nvidia_llama-3_3-nemotron-super-49b-genrm-multilingual by @mudler in #5837
  • chore(model gallery): add mistral-2x24b-moe-power-coder-magistral-devstral-reasoning-ultimate-neo-max-44b by @mudler in #5838
  • chore(model gallery): add impish_magic_24b-i1 by @mudler in #5839
  • chore(model gallery): add google_medgemma-4b-it by @mudler in #5842
  • chore(model gallery): add google_medgemma-27b-it by @mudler in #5843
  • chore(model gallery): add zhi-create-qwen3-32b-i1 by @mudler in #5847
  • chore(model gallery): add sophosympatheia_strawberrylemonade-70b-v1.1 by @mudler in #5848
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #5865
  • chore(model gallery): add omega-qwen3-atom-8b by @mudler in #5883
  • chore(model gallery): add dream-org_dream-v0-instruct-7b by @mudler in #5884
  • chore(model gallery): add entfane_math-genius-7b by @mudler in #5885
  • chore(model gallery): add menlo_lucy by @mudler in #5886
  • chore(model gallery): add qwen3-235b-a22b-instruct-2507 by @mudler in #5887
  • chore(model gallery): add qwen3-coder-480b-a35b-instruct by @mudler in #5888

📖 Documentation and examples

  • fix(docs): Improve Header Responsiveness - Hide "Star us on GitHub!" on Mobile by @dedyf5 in #5770

👒 Dependencies

  • chore: ⬆️ Update ggml-org/llama.cpp to 27208bf657cfe7262791df473927225e48efe482 by @localai-bot in #5753
  • chore: ⬆️ Update ggml-org/llama.cpp to caf5681fcb47dfe9bafee94ef9aa8f669ac986c7 by @localai-bot in #5758
  • chore: ⬆️ Update ggml-org/llama.cpp to 0a5a3b5cdfd887cf0f8e09d9ff89dee130cfcdde by @localai-bot in #5759
  • chore: ⬆️ Update ggml-org/whisper.cpp to bca021c9740b267c2973fba56555be052006023a by @localai-bot in #5776
  • chore: ⬆️ Update ggml-org/llama.cpp to de569441470332ff922c23fb0413cc957be75b25 by @localai-bot in #5777
  • chore: ⬆️ Update ggml-org/whisper.cpp to d9999d54c868b8bfcd376aa26067e787d53e679e by @localai-bot in #5782
  • chore: ⬆️ Update ggml-org/llama.cpp to e75ba4c0434eb759eb7ff74e034ebe729053e575 by @localai-bot in #5783
  • chore(bark-cpp): generalize and move to bark-cpp by @mudler in #5786
  • chore: ⬆️ Update PABannier/bark.cpp to 5d5be84f089ab9ea53b7a793f088d3fbf7247495 by @localai-bot in #4786
  • chore: ⬆️ Update ggml-org/llama.cpp to bee28421be25fd447f61cb6db64d556cbfce32ec by @localai-bot in #5788
  • chore: ⬆️ Update ggml-org/llama.cpp to ef797db357e44ecb7437fa9d22f4e1614104b342 by @localai-bot in #5795
  • chore: ⬆️ Update ggml-org/llama.cpp to a0374a67e2924f2e845cdc59dd67d9a44065a89c by @localai-bot in #5798
  • chore: ⬆️ Update ggml-org/llama.cpp to 6491d6e4f1caf0ad2221865b4249ae6938a6308c by @localai-bot in #5801
  • chore: ⬆️ Update ggml-org/llama.cpp to 12f55c302b35cfe900b84c5fe67c262026af9c44 by @localai-bot in #5808
  • chore: ⬆️ Update ggml-org/whisper.cpp to 869335f2d58d04010535be9ae23a69a9da12a169 by @localai-bot in #5809
  • chore: ⬆️ Update ggml-org/llama.cpp to 6efcd65945a98cf6883cdd9de4c8ccd8c79d219a by @localai-bot in #5817
  • chore: ⬆️ Update ggml-org/llama.cpp to 0b8855775c6b873931d40b77a5e42558aacbde52 by @localai-bot in #5830
  • chore: ⬆️ Update ggml-org/llama.cpp to f5e96b368f1acc7f53c390001b936517c4d18999 by @localai-bot in #5835
  • chore: ⬆️ Update ggml-org/llama.cpp to c31e60647def83d671bac5ab5b35579bf25d9aa1 by @localai-bot in #5840
  • chore: ⬆️ Update ggml-org/whisper.cpp to 3775c503d5133d3d8b99d7d062e87a54064b0eb8 by @localai-bot in #5841
  • chore: ⬆️ Update ggml-org/whisper.cpp to a16da91365700f396da916d16a7f5a2ec99364b9 by @localai-bot in #5846
  • chore: ⬆️ Update ggml-org/llama.cpp to 982e347255723fe6d02e60ee30cfdd0559c884c5 by @localai-bot in #5845
  • chore: ⬆️ Update ggml-org/whisper.cpp to 032697b9a850dc2615555e2a93a683cc3dd58559 by @localai-bot in #5849
  • chore: ⬆️ Update ggml-org/llama.cpp to bdca38376f7e8dd928defe01ce6a16218a64b040 by @localai-bot in #5850
  • chore: ⬆️ Update ggml-org/llama.cpp to 4a4f426944e79b79e389f9ed7b34831cb9b637ad by @localai-bot in #5852
  • chore: ⬆️ Update ggml-org/llama.cpp to 496957e1cbcb522abc63aa18521036e40efce985 by @localai-bot in #5854
  • chore: ⬆️ Update ggml-org/llama.cpp to d6fb3f6b49b27ef1c0f4cf5128e041f7e7dc03af by @localai-bot in #5857
  • chore(deps): bump securego/gosec from 2.22.5 to 2.22.7 by @dependabot[bot] in #5878
  • chore: ⬆️ Update richiejp/stable-diffusion.cpp to 10c6501bd05a697e014f1bee3a84e5664290c489 by @localai-bot in #5732
  • fix(stablediffusion-cpp): Switch back to upstream and update by @richiejp in #5880

Other Changes

  • docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #5752
  • docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #5775
  • docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #5781
  • chore: ⬆️ Update ggml-org/llama.cpp to bf9087f59aab940cf312b85a67067ce33d9e365a by @localai-bot in #5860
  • chore: ⬆️ Update ggml-org/llama.cpp to a979ca22db0d737af1e548a73291193655c6be99 by @localai-bot in #5862
  • chore: ⬆️ Update ggml-org/llama.cpp to 2be60cbc2707359241c2784f9d2e30d8fc7cdabb by @localai-bot in #5867
  • chore: ⬆️ Update ggml-org/whisper.cpp to 1f5cf0b2888402d57bb17b2029b2caa97e5f3baf by @localai-bot in #5876
  • chore: ⬆️ Update ggml-org/llama.cpp to 6c9ee3b17e19dcc82ab93d52ae46fdd0226d4777 by @localai-bot in #5877
  • chore: drop vllm for cuda 11 by @mudler in #5881
  • chore: ⬆️ Update ggml-org/llama.cpp to acd6cb1c41676f6bbb25c2a76fa5abeb1719301e by @localai-bot in #5882
  • fix: rename Dockerfile.go --> Dockerfile.golang to avoid IDE errors by @dave-gray101 in #5892
  • chore(Makefile): drop unused targets by @mudler in #5893
  • chore: ⬆️ Update ggml-org/llama.cpp to a86f52b2859dae4db5a7a0bbc0f1ad9de6b43ec6 by @localai-bot in #5894
  • fix: untangle pkg and core by @dave-gray101 in #5896
  • Update quickstart.md by @Shinrai in #5898

New Contributors

Full Changelog: v3.1.1...v3.2.0
