🎉 LocalAI 4.2.0 Release! 🚀
LocalAI 4.2.0 is out!
This release teaches LocalAI to see and hear. New /v1/voice/* and /v1/audio/diarization endpoints, a full face-recognition pipeline with antispoofing, word-level timestamps for faster-whisper, and a client-cancellable Whisper. There is also a drop-in Ollama API, video generation in stable-diffusion.ggml, a redesigned chat with i18n and admin-configurable branding, eleven new backends, an interactive model config editor with autocomplete, and a hardened distributed mode v2. vLLM finally hits feature parity with llama.cpp and gets tensor-parallel distributed workers.
📌 TL;DR
| Feature | Summary |
|---|---|
| 🎙️ Voice Recognition | New `/v1/voice/*`. Verify, identify, embed and analyze speakers. |
| 👤 Face Recognition + Liveness | 1:1 verify, 1:N identify, detect, analyze, embed, and reject spoofed photos. |
| 🎬 Diarization | New `/v1/audio/diarization` endpoint, "who spoke when?" via sherpa-onnx + vibevoice.cpp. |
| 🗣️ Better Transcriptions | Word-level timestamps, client-cancellable Whisper, segments + duration + language on the stream-done event. |
| 🦙 Ollama API | Drop-in compatibility. Point your ollama client straight at LocalAI. |
| 🎬 Video Generation | stable-diffusion.ggml now generates video (i2v, first-last-frame). |
| 💬 Redesigned UI | Chat redesign, Nord palette, i18n (5 languages), admin-configurable branding. |
| ✏️ Interactive Model Editor | Autocomplete-driven config editor in the UI. |
| 📦 Universal Importer | Imports across most backends, not just llama.cpp. |
| 🚦 Concurrency Groups | Per-model exclusive groups for safe backend loading. |
| 🧪 11 New Backends | sglang, ik-llama-cpp, TurboQuant, sam.cpp, Kokoros, qwen3tts.cpp, tinygrad-multimodal, LocalVQE, vibevoice-cpp, insightface (liveness), voice-rec. |
| ⚡ vLLM @ parity | Feature parity with llama.cpp + tensor-parallel distributed workers + full `engine_args`. |
| 🛰️ Distributed v2 | Hardened orchestrator, round-robin replicas, scoped Upgrade All, NATS install/upgrade split. |
🚀 New Features & Major Enhancements
🎙️ Voice Recognition
LocalAI is now ears-on. New /v1/voice/* endpoints let you verify, identify, analyze and embed speakers, powered by a SpeechBrain + ONNX Python backend.
- 1:1 Verify, "is this the same speaker?"
- 1:N Identify, "who is talking, out of my enrolled users?"
- Embeddings, voice fingerprints for your own pipelines
- Analyze, age, gender, emotion attributes per segment
🔥 Pairs naturally with the new diarization endpoint for full speaker pipelines.
voice.mp4
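As a sketch of how the new endpoints might be driven from your own code, the helper below base64-encodes two audio samples into a JSON body for a 1:1 verify call. The field names (`audio_1`, `audio_2`) are illustrative assumptions, not the documented `/v1/voice/*` schema; check the API reference for the real request shape.

```python
import base64
import json


def voice_verify_payload(sample_a: bytes, sample_b: bytes) -> str:
    """Build a JSON body for a hypothetical 1:1 voice-verify request.

    NOTE: the field names here are assumptions for illustration only;
    consult the LocalAI docs for the actual /v1/voice/verify schema.
    """
    return json.dumps({
        "audio_1": base64.b64encode(sample_a).decode("ascii"),
        "audio_2": base64.b64encode(sample_b).decode("ascii"),
    })
```

POST the resulting body to `http://localhost:8080/v1/voice/verify` with any HTTP client; the same encode-and-post pattern would apply to identify, embed and analyze.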
👤 Face Recognition & Antispoofing
A complete face-biometrics pipeline, built on InsightFace + ONNX.
- 1:1 Verify, match two faces
- 1:N Identify, resolve a face against an enrolled set
- Detection & Analysis, find faces, extract attributes (age, gender, emotion, race)
- Embeddings, facial fingerprints for your own stack
- 🆕 Antispoofing (liveness), reject spoofed photos and videos
✅ Samples never leave your machine. They go only to the running backend.
face.mp4
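To make "1:N identify" concrete, here is a minimal local sketch of the matching step: cosine similarity between a probe embedding and an enrolled set, with a rejection threshold. In a real pipeline the embeddings would come from the face-embedding endpoint; this matching code is purely illustrative and is not LocalAI's internal implementation.

```python
import math


def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)


def identify(probe, enrolled, threshold=0.5):
    """Resolve a probe embedding against enrolled users (name -> embedding).

    Returns the best-matching name if its similarity clears the threshold,
    otherwise None (unknown face).
    """
    name, emb = max(enrolled.items(), key=lambda kv: cosine(probe, kv[1]))
    return name if cosine(probe, emb) >= threshold else None
```

The threshold is what separates 1:N identify from a nearest-neighbor lookup: a probe that matches nobody well enough is rejected rather than mapped to the least-bad candidate.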
🎬 Diarization & a smarter audio pipeline
Audio is a first-class citizen now.
- `/v1/audio/diarization`, segments speech by speaker turn (sherpa-onnx + vibevoice.cpp)
- Word-level timestamps for faster-whisper
- Client cancellation for Whisper via the ggml `abort_callback`. Stop a transcription mid-flight and free the GPU.
- Stream-done metadata on `/v1/audio/transcriptions`: `segments`, `duration` and `language` on the final event.
- Audio transformations UI (LocalVQE), explore audio FX directly from the React UI
- Transcription error visibility, handler errors land in the access log and on the client
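Diarization turns plus word-level timestamps combine naturally into a per-speaker transcript. A small sketch of that join is below; the tuple shapes are assumptions for illustration, not the endpoints' actual response formats.

```python
def attribute_words(words, turns):
    """Assign each timestamped word to the speaker turn containing its midpoint.

    words: [(start, end, text)] — e.g. word-level timestamps from faster-whisper
    turns: [(start, end, speaker)] — e.g. speaker turns from diarization
    (Both shapes are assumed here for the sake of the example.)
    """
    out = []
    for ws, we, text in words:
        mid = (ws + we) / 2
        speaker = next((s for ts, te, s in turns if ts <= mid < te), "unknown")
        out.append((speaker, text))
    return out
```

Using the word midpoint (rather than its start) keeps words that straddle a turn boundary attached to the speaker who uttered most of them.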
🦙 Ollama drop-in API
Point your existing Ollama client at LocalAI. Everything keeps working. Another front door, same engine.
```shell
OLLAMA_HOST=http://localhost:8080 ollama run qwen3
```

🎬 Video Generation
The stable-diffusion.ggml backend now generates video, with curated gallery entries for Wan 2.1 FLF2V 14B 720P and Wan i2v 720p, plus a new stablediffusion-ggml-development meta backend to track the cutting edge.
🎨 React UI: total refresh
A massive UI cycle landed in 4.2:
- 💬 Chat redesign, cleaner layout, faster perceived latency, better message density
- 🎨 Editorial refresh with the Nord palette, calmer, more focused, dark-mode-first
- 🌍 Multilingual / i18n, English, Italiano, Español, Deutsch, 简体中文
- 🪪 Brandable instance, admin-configurable name, tagline, and assets (logo, favicon)
- ✏️ Interactive model config editor, autocomplete over known fields, live validation, automatic file-renaming on save
- 🧰 Backend management UX, revamped backend list with concrete versions
- 🛟 Better error UX, distributed backend management errors surface cleanly
💡 Self-host with your branding. The login page, sidebar, footer, and browser tab all pick up the instance name and logo.
chat.mp4
i18n.mp4
🔄 Backend & model lifecycle
- Backend versioning with automatic upgrade detection
- Pin models so they survive the reaper
- On-demand toggle per model to control auto-load
- Concurrency groups, per-model exclusive groups so heavy backends won't trample each other
- Universal importer, single flow that imports across most backends, with clean multi-shard GGUF handling and dedicated importers for vibevoice-cpp and whisper.cpp HF repos
importer.mp4
model-editor.mp4
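The concurrency-group idea can be pictured as one exclusive lock per group: models that share a group never load at the same time, so two heavy backends cannot trample the same GPU, while models in different groups still load in parallel. A generic sketch of the mechanism, illustrative only and not LocalAI's actual loader code:

```python
import threading
import time
from collections import defaultdict

# One lock per exclusive group: loads within a group are serialized,
# loads in different groups may proceed concurrently.
_group_locks = defaultdict(threading.Lock)


def load_exclusively(model: str, group: str, load_fn, log: list) -> None:
    """Run load_fn(model) while holding the group's exclusive lock."""
    with _group_locks[group]:
        log.append(f"loading {model}")
        load_fn(model)
        log.append(f"loaded {model}")
```

With this shape, configuring two VRAM-hungry models into the same group guarantees the second load only starts after the first has finished.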
🧪 New Backends!
| Backend | What it brings |
|---|---|
| sglang | High-throughput LLM serving + speculative decoding (EAGLE/EAGLE3/DFLASH/MTP) |
| ik-llama.cpp | ikawrakow's llama.cpp fork |
| TurboQuant | Quant-focused llama.cpp fork |
| sam.cpp | Segment Anything detection |
| Kokoros | Rust-native Kokoro TTS |
| qwen3tts.cpp | Qwen3 TTS |
| tinygrad-multimodal (experimental) | tinygrad-powered multimodal |
| vibevoice.cpp | Diarization-grade speech |
| LocalVQE | Audio transformations / FX |
| insightface | Face antispoofing |
| voice-rec | Speaker recognition / embeddings |
⚡ vLLM at parity (and beyond)
- vLLM parity with llama.cpp, same feature surface, same ergonomics
- vLLM `engine_args`, the full `AsyncEngineArgs` exposed via a generic YAML map
- Tensor-parallel distributed workers, fan a single model across nodes
- CUDA 13 builds for vLLM, vLLM-omni and sglang
- L4T arm64 (CUDA 13), vLLM/vLLM-omni/sglang variants for Jetson-class arm64
- MLX backend refactored, shared helpers and enhanced functionality
- llama.cpp `split_mode` for explicit multi-GPU placement
- Speculative decoding wired through for llama.cpp, Gemma 4 thinking support added
- Vision / mtmd marker propagated from the backend via `ModelMetadata`
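As an example of the generic map, a model config might pass vLLM's `AsyncEngineArgs` straight through. The placement of `engine_args` in the YAML below is an assumption for illustration; the option names themselves are standard vLLM engine arguments, and the model name and values are examples only.

```yaml
# Sketch of a LocalAI model config — key placement assumed, values are examples.
name: my-vllm-model
backend: vllm
parameters:
  model: Qwen/Qwen2.5-7B-Instruct
engine_args:
  tensor_parallel_size: 2       # fan the model across two workers/GPUs
  gpu_memory_utilization: 0.9
  max_model_len: 8192
```

Because the map is generic, new upstream vLLM options should be usable without waiting for LocalAI to add a dedicated config field.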
🛰️ Distributed Mode v2
Distributed mode keeps maturing. This release was a hardening pass across the orchestration loop:
- Orchestrator resilience, auto-upgrade routing, worker bind-wait, RAG-init crash, log-spam fixes
- Round-robin across replicas of the same model
- Upgrade All scoped to nodes that actually have the backend installed
- NATS install / upgrade split, `backend.upgrade` no longer piggybacks on install
- Cached-replica lookup honors NodeSelector, the reconciler no longer scales up empty backends
- VRAM/RAM reporting correct on NVIDIA unified-memory hosts
- Agent nodes, queue loops stop on teardown, dead-letter cap added
- Autoscaling, load-model extracted from `Route()` and applied during autoscale
🔐 Auth & Security
- Settings API, env-supplied `ApiKeys` are stripped before persisting (no accidental leaks)
- grpc-server hardening, removed unsafe `sprintf()` in the C++ grpc server
- OIDC, bumped `go-oidc/v3` to 3.18.0
- Security hardening pass across the codebase
- AI coding assistants policy, LocalAI now follows the Linux kernel's DCO/attribution guidelines (`Assisted-by:` trailer, no AI co-authors)
🖥️ Hardware & deployment
- CUDA 13 for vLLM, vLLM-omni, and sglang
- NVIDIA L4T arm64 (CUDA 13) for Jetson-class boards
- ROCm 7.x bumped to latest
- gfx1151 (Strix Halo / Ryzen AI MAX) support, `AMDGPU_TARGETS` exposed as a build-arg
- Intel GPU, latest oneapi-basekit (b70 support) across Intel images
- arm64 CI, cpu-whisperx and cpu-faster-whisper now ship arm64 images
- whisperx, ROCm/HIPBLAS target dropped (pinned to rocm6.4 wheels)
🛠️ Under the Hood
- Better CLI errors with actionable guidance
- golangci-lint baseline (`new-from-merge-base`) keeps drift in check
- Coding-agent discoverability, new APIs let coding agents introspect and configure LocalAI
- Autoparser, prefers backend-emitted chat deltas, correct logprob passthrough, strips partial reasoning tags during warm-up
- Reasoning + tools, no more empty content from thinking models in retry loops
- Streaming hygiene, deduped content, deduped tool calls, recovered reasoning, unique `tool_call` IDs in deferred flushes
- HTTP, handler-error status now visible in the access log + transcription error surface
- Backend monitor accepts `model` as a query parameter
- Config loader, YAML backup files are ignored
- GGUF thinking probe respects explicit `reasoning` config
- Inference defaults refreshed from Unsloth
- Embeddings on collection upload, dim changes handled gracefully
- Python backends, JIT subprocesses use `tempfile.gettempdir()` instead of hardcoded `/tmp`
- Draft model paths, relative paths now resolve against the models dir
- whisper-cpp: implement streaming transcription and context cancellation
🐞 Notable fixes
- Cascading user deletion on PostgreSQL, deleting a user removes all owned data
- Importer emits all shards for multi-part GGUF models
- Open Responses parses OpenAI-spec nested `tool_choice` and uses the correct setter
- llama-cpp: `server-chat.cpp` included in grpc-server TU, `common -> llama-common` rename, turboquant `common.h` detection
- ik-llama-cpp: adapted to `common_grammar` in `sampling.h`, patched `clip.cpp` for the new `ggml_quantize_chunk` signature
- Kokoros: trait stubs (`face_verify`, `face_analyze`, `audio_transcription_stream`), CI publish
- stable-diffusion.ggml: MP4 container forced in ffmpeg mux, new i2v options
- Gallery: orphaned meta-backend uninstall, gemma-4 URIs, flux-kontext param overrides, Wan dedup, z-image-turbo load, Qwen3.5 typo override, tag-casing normalization
- Streaming: content + tool-call dedup, reasoning recovery, unique tool-call IDs in deferred flush
- Realtime: consume ChatDeltas when the C++ autoparser clears `Response`
- Tool-calls: use `SetFunctionCallNameString` when forcing a specific tool
- Faster-whisper: cast segment timestamps to int after multiplication
- mlx-vlm: pinned to v0.4.4 to unblock CUDA builds
- vLLM: dropped flash-attn wheel to avoid torch 2.10 ABI mismatch
- Downloader: list supported URL schemes in `DownloadFile` errors
- Backend: resolve relative `draft_model` paths against the models dir
- CI: wire `AMDGPU_TARGETS` through the backend workflow, switch gallery-agent to `sigs.k8s.io/yaml`, recover rerankers + vllm-omni on aarch64, unbreak master CI for docs/kokoros/vibevoice-cpp ABI
🆕 Gallery additions
- Wan 2.1 FLF2V 14B 720P (video)
- Wan i2v 720p (image-to-video)
- stablediffusion-ggml-development meta backend
- chroma1-hd (diffusers)
- Gemma 4 (+ mmproj)
- EmbeddingGemma
- Qwen 3.5, Qwen-ASR, OCR entries for llama.cpp
- Qwen3-VL Reranker, Qwen3-VL Embedding (tagged)
- A steady stream of automated gallery-agent model additions throughout the cycle 🤖
🚀 The Complete Local Stack for Privacy-First AI
| Project | Description |
|---|---|
| LocalAI | The free, Open Source OpenAI alternative. Drop-in REST API compatible with OpenAI specs for local AI inferencing. No GPU required. |
| LocalAGI | Local AI agent management platform. Drop-in for OpenAI's Responses API, with advanced agentic capabilities and a no-code UI. |
| LocalRecall | RESTful API and knowledge-base management providing persistent memory and storage for AI agents. Pairs with LocalAI and LocalAGI. |
❤️ Thank You
LocalAI is a true FOSS movement, built by contributors, powered by community.
If you believe in privacy-first, self-hosted AI:
- ⭐ Star the repo
- 💬 Contribute code, docs, translations or feedback
- 📣 Share with others
Your support keeps this stack alive.
✅ Full Changelog
What's Changed
Bug fixes 🐛
- fix(autoscaling): extract load model from Route() and use as well when doing autoscale by @mudler in #9270
- fix(nodes): better detection if nodes goes down or model is not available by @mudler in #9274
- fix: try to add whisperx and faster-whisper for more variants by @mudler in #9278
- fix: thinking models with tools returning empty content (reasoning-only retry loop) by @mudler in #9290
- fix(streaming): deduplicate tool call emissions during streaming by @mudler in #9292
- fix(streaming): skip chat deltas for role-init elements to prevent first token duplication by @mudler in #9299
- Fix load of z-image-turbo by @thelittlefireman in #9264
- fix(agents): handle embedding model dim changes on collection upload by @mudler in #9365
- fix(gallery): correct gemma-4 model URIs returning 404 by @mvanhorn in #9379
- fix(ui): rename model config files on save to prevent duplicates by @mudler in #9388
- fix(ci): switch gallery-agent to sigs.k8s.io/yaml by @mudler in #9397
- fix(llama-cpp): rename linked target common -> llama-common by @mudler in #9408
- fix(vision): propagate mtmd media marker from backend via ModelMetadata by @mudler in #9412
- fix(turboquant): resolve common.h by detecting llama-common vs common target by @mudler in #9413
- fix(rocm): add gfx1151 support and expose AMDGPU_TARGETS build-arg by @keithmattix in #9410
- fix(kokoros): implement audio_transcription_stream trait stub by @mudler in #9422
- fix(turboquant): drop ignore-eos patch, bump fork to b8967-627ebbc by @mudler in #9423
- fix(distributed): stop queue loops on agent nodes + dead-letter cap by @mudler in #9433
- fix(gallery): allow uninstalling orphaned meta backends + force reinstall by @mudler in #9434
- fix(stable-diffusion.ggml): force mp4 container in ffmpeg mux by @mudler in #9435
- fix(settings): strip env-supplied ApiKeys from the request before persisting by @SAY-5 in #9438
- fix(api): remove duplicate /api/traces endpoint that broke React UI by @pjbrzozowski in #9427
- fix(distributed): pass ExternalURI through NATS backend install by @russell in #9446
- fix(ci): wire AMDGPU_TARGETS through backend build workflow by @russell in #9445
- fix(config): ignore yaml backup files in model loader by @leinasi2014 in #9443
- [gallery] Fix duplicate sha256 keys in Wan models by @sec171 in #9461
- fix(tests): update InstallBackend call sites for new URI/Name/Alias params by @mudler in #9467
- Fix: Add model parameter to neutts-air gallery definition by @localai-bot in #8793
- fix(gallery-agent): process blacklist command on recently-closed PRs by @mudler in #9473
- Respect explicit reasoning config during GGUF thinking probe by @leinasi2014 in #9463
- fix(streaming): dedupe content, recover reasoning, unique tool_call IDs in deferred flush by @mudler in #9470
- fix(backend-monitor): accept model as a query parameter by @Dennisadira in #9411
- fix(kokoros): Build and publish the backend images from CI/CD by @richiejp in #9487
- fix: remove unsafe sprintf() in grpc-server.cpp by @orbisai0security in #9486
- fix(kokoros): implement face_verify and face_analyze trait stubs by @mudler in #9499
- fix(ik-llama-cpp): adapt to common_grammar struct in sampling.h by @mudler in #9512
- fix(llama-cpp): include server-chat.cpp in grpc-server translation unit by @mudler in #9511
- fix(importer): emit all shards for multi-part GGUF models by @mudler in #9513
- fix(openresponses): parse OpenAI-spec nested tool_choice + use correct setter by @walcz-de in #9509
- fix: use SetFunctionCallNameString when forcing a specific tool (3 sites) by @Anai-Guo in #9526
- fix(ik-llama-cpp): patch clip.cpp for new ggml_quantize_chunk signature by @mudler in #9531
- fix(realtime): consume ChatDeltas when C++ autoparser clears Response by @richiejp in #9538
- fix: add hipblaslt library by @eglia in #9541
- fix(distributed): correct VRAM/RAM reporting on NVIDIA unified-memory hosts by @mudler in #9545
- fix(vllm): drop flash-attn wheel to avoid torch 2.10 ABI mismatch by @richiejp in #9557
- fix(mlx-vlm): pin upstream to v0.4.4 to unblock CUDA builds by @mudler in #9568
- fix(gallery): normalize inconsistent tag casing/plurals across gallery models by @Anai-Guo in #9574
- fix(gallery): correct Qwen3.5 typo in qwen3.5-27b-claude-4.6 model override (closes #9362) by @Anai-Guo in #9580
- fix(diffusers): drop compel from requirements to unblock pip resolver by @mudler in #9632
- fix(ci): fix AMDGPU_TARGETS empty-string bypass in hipblas builds by @russell in #9626
- fix(distributed): honor NodeSelector in cached-replica lookup, stop empty-backend reconciler scaleups by @localai-bot in #9652
- fix(distributed): orchestrator resilience — auto-upgrade routing, worker bind-wait, RAG-init crash, log spam by @localai-bot in #9657
- fix(faster-whisper): cast segment timestamps to int after multiplication by @arteven in #9674
- fix(python-backend): make JIT subprocesses work on hosts of any size by @richiejp in #9679
- fix(distributed): scope Upgrade All to nodes that have the backend installed by @mudler in #9678
- fix(backend): resolve relative draft_model paths against the models dir by @localai-bot in #9680
- fix: unbreak master CI (docs, kokoros, vibevoice-cpp ABI) by @localai-bot in #9682
- fix(ci): unbreak rerankers (torch bump) and vllm-omni on aarch64 by @localai-bot in #9688
- fix(distributed): round-robin replicas of the same model by @localai-bot in #9695
- fix(downloader): list supported URL schemes in DownloadFile error by @Anai-Guo in #9689
- fix(auth): cascade user deletion across all owned data on PostgreSQL by @localai-bot in #9702
- fix(http): make handler-error status visible in access log + transcription errors by @localai-bot in #9707
- fix(distributed): make backend upgrade actually re-install on workers by @localai-bot in #9708
- fix(distributed): split NATS backend.upgrade off install + dedup loads by @localai-bot in #9717
- fix(gallery): keep auto-upgrade off non-dev backends when -development is installed by @mudler in #9736
Exciting New Features 🎉
- feat(ui): Interactive model config editor with autocomplete by @richiejp in #9149
- feat: track files being staged by @mudler in #9275
- feat: Add Kokoros backend by @richiejp in #9212
- feat(api): add ollama compatibility by @mudler in #9284
- feat(sam.cpp): add sam.cpp detection backend by @mudler in #9288
- feat(swagger): update swagger by @localai-bot in #9300
- chore(qwen3-asr): pass prompt as context to transcribe by @mudler in #9301
- feat: Add toggle mechanism to enable/disable models from loading on demand by @neurocis in #9304
- feat: allow to pin models and skip from reaping by @mudler in #9309
- feat(swagger): update swagger by @localai-bot in #9310
- feat: backend versioning, upgrade detection and auto-upgrade by @mudler in #9315
- feat(swagger): update swagger by @localai-bot in #9318
- feat(qwen3tts.cpp): add new backend by @mudler in #9316
- feat(ux): backend management enhancement by @mudler in #9325
- feat(rocm): bump to 7.x by @mudler in #9323
- feat(backends): add ik-llama-cpp by @mudler in #9326
- feat(swagger): update swagger by @localai-bot in #9329
- feat(vllm): parity with llama.cpp backend by @mudler in #9328
- feat: refactor shared helpers and enhance MLX backend functionality by @mudler in #9335
- feat: wire transcription for llama.cpp, add streaming support by @mudler in #9353
- feat(backend): add turboquant llama.cpp-fork backend by @mudler in #9355
- feat(swagger): update swagger by @localai-bot in #9356
- feat(backend): add tinygrad multimodal backend (experimental) by @mudler in #9364
- feat(backends): add sglang by @mudler in #9359
- refactor(tinygrad): reuse tinygrad.apps.llm instead of vendored Transformer by @mudler in #9380
- feat(stable-diffusion.ggml): add support for video generation by @mudler in #9420
- feat(distributed): sync state with frontends, better backend management reporting by @mudler in #9426
- feat(swagger): update swagger by @localai-bot in #9431
- feat(gallery): add Wan 2.1 FLF2V 14B 720P by @mudler in #9440
- feat(gallery): add wan i2v 720p by @mudler in #9457
- feat: improve CLI error messages with actionable guidance by @localai-bot in #8880
- chore(whisperx): drop ROCm/hipblas build target by @mudler in #9474
- feat(face-recognition): add insightface/onnx backend for 1:1 verify, 1:N identify, embedding, detection, analysis by @mudler in #9480
- feat(importer): expand importer flow to almost all backends by @mudler in #9466
- feat(swagger): update swagger by @localai-bot in #9498
- feat: voice recognition by @mudler in #9500
- feat(insightface): add antispoofing (liveness) detection by @mudler in #9515
- feat(swagger): update swagger by @localai-bot in #9518
- feat: add biometrics UI by @mudler in #9524
- feat: Add Sherpa ONNX backend for ASR and TTS by @richiejp in #8523
- [intel GPU support] Use latest oneapi-basekit image for Intel images to support b70 by @arbrick in #9543
- feat(react-ui): editorial refresh with Nord palette and polished primitives by @mudler in #9550
- feat: surface distributed backend management errors by @mudler in #9552
- feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang by @mudler in #9553
- feat(llama-cpp): expose split_mode option for multi-GPU placement by @mudler in #9560
- ci(backends): build cpu-whisperx and cpu-faster-whisper for linux/arm64 by @mudler in #9573
- [intel GPU support] Use latest oneapi-basekit image for Intel images to support b70 (in more places this time) by @arbrick in #9578
- feat: Log backend exit code by @richiejp in #9581
- feat(distributed): support multiple replicas of one model on the same node by @mudler in #9583
- feat(swagger): update swagger by @localai-bot in #9587
- feat: localai assistant chat modality by @mudler in #9602
- chore: add golangci-lint with new-from-merge-base baseline by @richiejp in #9603
- feat(swagger): update swagger by @localai-bot in #9607
- feat(vllm): expose AsyncEngineArgs via generic engine_args YAML map by @richiejp in #9563
- feat(vibevoice-cpp): add purego TTS+ASR backend by @mudler in #9610
- feat: react chat redesign by @mudler in #9616
- feat(llama-cpp): bump to d775992 and adapt to spec params refactor by @mudler in #9618
- feat(backends/python): use tempfile.gettempdir() instead of hardcoded /tmp by @Anai-Guo in #9629
- feat(importers): whisper.cpp HF repos pick a quant + nest under whisper/models by @mudler in #9630
- feat(branding): admin-configurable instance name, tagline, and assets by @mudler in #9635
- feat(swagger): update swagger by @localai-bot in #9643
- feat(react-ui): add multilingual (i18n) support by @mudler in #9642
- feat(ci): allow routing apt traffic through an alternate Ubuntu mirror by @mudler in #9650
- feat: add LocalVQE backend and audio transformations UI by @richiejp in #9640
- feat(swagger): update swagger by @localai-bot in #9660
- feat(concurrency-groups): per-model exclusive groups for backend loading by @mudler in #9662
- feat(api): add /v1/audio/diarization endpoint with sherpa-onnx + vibevoice.cpp by @mudler in #9654
- feat(vllm, distributed): tensor parallel distributed workers by @richiejp in #9612
- feat: support word-level timestamps for faster-whisper by @eglia in #9621
- feat(importers): add vibevoice-cpp importer for GGUF bundles by @localai-bot in #9685
- feat(gallery): Speed up load times and clean gallery entries by @richiejp in #9211
- feat(swagger): update swagger by @localai-bot in #9699
- feat(sglang): wire engine_args, add cuda13 build, ship MTP gallery demos by @richiejp in #9686
- feat(api/transcription): include segments + duration + language on stream done event by @localai-bot in #9709
- feat(whisper): honor client cancellation via ggml abort_callback by @localai-bot in #9710
- chore: Security hardening by @richiejp in #9719
- ci: phase 1-3 of GHA free tier migration (path filter, multi-arch split prep, /mnt disk relief) by @localai-bot in #9726
- feat(swagger): update swagger by @localai-bot in #9723
- ci: pilot per-arch split + manifest merge for faster-whisper and llama-cpp-quantization by @localai-bot in #9727
- ci: finish GHA free-tier migration (per-arch fan-out, image splits, retire self-hosted, fix provenance) by @localai-bot in #9730
- ci: consolidate llama-cpp-darwin into the matrix-driven Darwin flow by @mudler in #9731
- feat(whisper-cpp): implement streaming transcription by @localai-bot in #9751
🧠 Models
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9399
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9400
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9425
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9436
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9464
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9481
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9491
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9505
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9555
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9558
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9611
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9615
- Add tags to qwen3-vl-reranker and Qwen3-VL-Embedding to the gallery by @ER-EPR in #9628
- chore(model gallery): add chroma1-hd diffusers model by @Anai-Guo in #9646
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9653
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9681
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9703
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9720
📖 Documentation and examples
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #9268
- docs(agents): capture vllm backend lessons + runtime lib packaging by @mudler in #9333
- chore(agents): Update the backend creation instructions to include Rust and extra tests by @richiejp in #9490
👒 Dependencies
- chore: ⬆️ Update ggml-org/llama.cpp to `66c4f9ded01b29d9120255be1ed8d5835bcbb51d` by @localai-bot in #9269
- chore(llama.cpp): bump to `d12cc3d1ca6bba741cd77887ac9c9ee18c8415c7` by @mudler in #9282
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `e8323cabb0e4511ba18a50b1cb34cf1f87fc71ef` by @localai-bot in #9281
- chore: ⬆️ Update ggml-org/llama.cpp to `d132f22fc92f36848f7ccf2fc9987cd0b0120825` by @localai-bot in #9302
- chore: ⬆️ Update PABannier/sam3.cpp to `01832ef85fcc8eb6488f1d01cd247f07e96ff5a9` by @localai-bot in #9311
- chore: ⬆️ Update ggml-org/llama.cpp to `e62fa13c2497b2cd1958cb496e9489e86bbd5182` by @localai-bot in #9312
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #9321
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `6b675a5ede9b0edf0a0f44191e8b79d7ef27615a` by @localai-bot in #9320
- chore: ⬆️ Update ggml-org/llama.cpp to `ff5ef8278615a2462b79b50abdf3cc95cfb31c6f` by @localai-bot in #9319
- chore: ⬆️ Update ggml-org/llama.cpp to `1e9d771e2c2f1113a5ebdd0dc15bafe57dce64be` by @localai-bot in #9330
- chore(deps): bump softprops/action-gh-release from 2 to 3 by @dependabot[bot] in #9336
- chore(deps): bump actions/upload-pages-artifact from 4 to 5 by @dependabot[bot] in #9337
- chore(deps): bump github.com/testcontainers/testcontainers-go from 0.41.0 to 0.42.0 by @dependabot[bot] in #9338
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #9346
- chore(deps): bump sentence-transformers from 5.2.3 to 5.4.0 in /backend/python/transformers by @dependabot[bot] in #9342
- chore: ⬆️ Update ggml-org/llama.cpp to `e97492369888f5311e4d1f3beb325a36bbed70e9` by @localai-bot in #9347
- chore: ⬆️ Update ikawrakow/ik_llama.cpp to `55d3c05bf7b377deaa5dc84d255d9740a345a206` by @localai-bot in #9348
- chore(deps): bump github.com/google/go-containerregistry from 0.21.3 to 0.21.5 by @dependabot[bot] in #9343
- chore(deps): bump github.com/testcontainers/testcontainers-go/modules/nats from 0.41.0 to 0.42.0 by @dependabot[bot] in #9341
- chore(deps): bump github.com/swaggo/echo-swagger from 1.4.1 to 1.5.2 by @dependabot[bot] in #9344
- chore(deps): bump github.com/charmbracelet/glamour from 0.10.0 to 1.0.0 by @dependabot[bot] in #9340
- chore: ⬆️ Update ggml-org/llama.cpp to `fae3a28070fe4026f87bd6a544aba1b2d1896566` by @localai-bot in #9357
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #9358
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #9369
- chore: ⬆️ Update ggml-org/llama.cpp to `b3d758750a268bf93f084ccfa3060fb9a203192a` by @localai-bot in #9370
- chore: ⬆️ Update ikawrakow/ik_llama.cpp to `1163af96cf6bb4a4b819f998f84c153a49768b99` by @localai-bot in #9368
- chore(deps): bump the npm_and_yarn group across 1 directory with 6 updates by @dependabot[bot] in #9373
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `c41c5ded7af85e01b7fe442ff7950c720706d53a` by @localai-bot in #9366
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #9384
- chore: ⬆️ Update ikawrakow/ik_llama.cpp to `eaf83865a132f66e8f49efe0e78491625942f068` by @localai-bot in #9382
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `a564fdf642780d1df123f1c413b19961375b8346` by @localai-bot in #9383
- chore: ⬆️ Update TheTom/llama-cpp-turboquant to `45f8a066ed5f5bb38c695cec532f6cef9f4efa9d` by @mudler in #9385
- chore: ⬆️ Update ggml-org/llama.cpp to `4fbdabdc61c04d1262b581e1b8c0c3b119f688ff` by @localai-bot in #9381
- chore: bump inference defaults from unsloth by @github-actions[bot] in #9396
- chore(deps): bump dompurify from 3.3.2 to 3.4.0 in /core/http/react-ui in the npm_and_yarn group across 1 directory by @dependabot[bot] in #9376
- chore: ⬆️ Update ggml-org/whisper.cpp to `166c20b473d5f4d04052e699f992f625ea2a2fdd` by @localai-bot in #9403
- chore: ⬆️ Update ikawrakow/ik_llama.cpp to `52efa12fdae390d1dca6ecd7ca00010fe51f651e` by @localai-bot in #9404
- chore: ⬆️ Update ggml-org/llama.cpp to `4f02d4733934179386cbc15b3454be26237940bb` by @localai-bot in #9415
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `7d33d4b2ddeafa672761a5880ec33bdff452504d` by @localai-bot in #9417
- chore: ⬆️ Update ikawrakow/ik_llama.cpp to `8befd92ea5f702494ea9813fe42a52fb015db5fe` by @localai-bot in #9418
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `44cca3d626d301e2215d5e243277e8f0e65bfa78` by @localai-bot in #9428
- chore: ⬆️ Update ggml-org/llama.cpp to `4eac5b45095a4e8a1ff1cce4f6d030e0872fb4ad` by @localai-bot in #9429
- chore: ⬆️ Update ikawrakow/ik_llama.cpp to `00ba208a5c036eee72d4a631b4f57c126095cb03` by @localai-bot in #9430
- chore: ⬆️ Update ikawrakow/ik_llama.cpp to `d4824131580b94ffa7b0e91c955e2b237c2fe16e` by @localai-bot in #9447
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #9451
- chore: ⬆️ Update ggml-org/whisper.cpp to `fc674574ca27cac59a15e5b22a09b9d9ad62aafe` by @localai-bot in #9450
- chore: ⬆️ Update ggml-org/llama.cpp to `cf8b0dbda9ac0eac30ee33f87bc6702ead1c4664` by @localai-bot in #9448
- chore(deps): bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.97.1 to 1.99.1 by @dependabot[bot] in #9452
- chore(deps): bump github.com/containerd/containerd from 1.7.30 to 1.7.31 by @dependabot[bot] in #9453
- chore(deps): bump github.com/modelcontextprotocol/go-sdk from 1.4.1 to 1.5.0 by @dependabot[bot] in #9454
- chore(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.32.14 to 1.32.16 by @dependabot[bot] in #9456
- chore(deps): bump github.com/coreos/go-oidc/v3 from 3.17.0 to 3.18.0 by @dependabot[bot] in #9455
- chore: ⬆️ Update ggml-org/llama.cpp to `5a4cd6741fc33227cdacb329f355ab21f8481de2` by @localai-bot in #9479
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `c97702e1057c2fe13a7074cd9069cb9dd6edc1bf` by @localai-bot in #9495
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #9522
- chore: ⬆️ Update ggml-org/llama.cpp to `187a45637054881ecacf17f8e2f6f8f2ba7df1c7` by @localai-bot in #9520
- chore: ⬆️ Update leejet/stable-diffusion.cpp to `b8bdffc19962be7e5a84bfefeb2e31bd885b571a` by @localai-bot in #9521
- chore(deps): bump postcss from 8.5.8 to 8.5.10 in /core/http/react-ui in the npm_and_yarn group across 1 directory by @dependabot[bot] in #9544
- chore(deps): bump rustls-webpki from 0.103.10 to 0.103.13 in /backend/rust/kokoros in the cargo group across 1 directory by @dependabot[bot] in #9546
- chore: ⬆️ Update ggml-org/llama.cpp to
361fe72acb7b9bd79059cc177cbeda99b35b5db9by @localai-bot in #9548 - chore: ⬆️ Update ikawrakow/ik_llama.cpp to
cb58a561f0c49f68b6d125cdfda037ed80433821by @localai-bot in #9549 - chore: ⬆️ Update TheTom/llama-cpp-turboquant to
67559e580b10e4e47e9a6fd6218873997976886dby @localai-bot in #9497 - chore: ⬆️ Update ikawrakow/ik_llama.cpp to
3a945af45d45936341a45bbf7deda56776a4af26by @localai-bot in #9570 - chore: ⬆️ Update TheTom/llama-cpp-turboquant to
11a241d0db78a68e0a5b99fe6f36de6683100f6aby @localai-bot in #9571 - chore: ⬆️ Update ggml-org/llama.cpp to
dcad77cc3b0865153f486327064fb0320a57a476by @localai-bot in #9572 - chore: ⬆️ Update ggml-org/llama.cpp to
f53577432541bb9edc1588c4ef45c66bf07e4468by @localai-bot in #9577 - chore: ⬆️ Update ggml-org/llama.cpp to
665abc609740d397d30c0d8ef4157dbf900bd1a3by @localai-bot in #9584 - chore: ⬆️ Update ikawrakow/ik_llama.cpp to
d6f3e4e28fbf75e6181e6ea32e734de9ce9304fdby @localai-bot in #9585 - chore: ⬆️ Update leejet/stable-diffusion.cpp to
a81677f59c92d90343aebca51dfed7decf0a0cb0by @localai-bot in #9586 - chore(deps): bump github.com/testcontainers/testcontainers-go/modules/postgres from 0.41.0 to 0.42.0 by @dependabot[bot] in #9591
- chore(deps): bump github.com/onsi/ginkgo/v2 from 2.28.1 to 2.28.2 by @dependabot[bot] in #9593
- chore(deps): bump packaging from 24.1 to 26.2 in /backend/python/coqui by @dependabot[bot] in #9594
- chore: ⬆️ Update ikawrakow/ik_llama.cpp to
453a027c17e4d63a7f16b871197a396240a65138by @localai-bot in #9608 - chore: ⬆️ Update leejet/stable-diffusion.cpp to
3d6064b37ef4607917f8acf2ca8c8906d5087413by @localai-bot in #9617 - chore: ⬆️ Update ikawrakow/ik_llama.cpp to
a8aecbf15933295af96504f9a693998322185b5cby @localai-bot in #9625 - chore: ⬆️ Update ggml-org/llama.cpp to
beb42fffa45eded44804a1fd4916146222371581by @localai-bot in #9624 - deps: update quic-go to v0.59.0 (fix session ticket panic) by @egyptianbman in #9655
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #9661
- chore: ⬆️ Update vllm-project/vllm cu130 wheel to
0.20.1by @localai-bot in #9649 - chore(deps): bump docs/themes/hugo-theme-relearn from
f69a085to8bb66faby @dependabot[bot] in #9665 - chore: ⬆️ Update ggml-org/llama.cpp to
eff06702b2a52e1020ea009ebd86cb9f5acabab5by @localai-bot in #9637 - chore: ⬆️ Update ikawrakow/ik_llama.cpp to
45dfd80371785731bc2ed05a76252497a4e7a282by @localai-bot in #9644 - chore(deps): bump actions/checkout from 4 to 6 by @dependabot[bot] in #9663
- chore: ⬆️ Update ggml-org/llama.cpp to
bbeb89d76c41bc250f16e4a6fefcc9b530d6e3f3by @localai-bot in #9676 - chore: ⬆️ Update ikawrakow/ik_llama.cpp to
8b56d813a9ed04fa7b7fe2588fddd845cf64eccbby @localai-bot in #9677 - chore: ⬆️ Update TheTom/llama-cpp-turboquant to
69d8e4be47243e83b3d0d71e932bc7aa61c644dcby @localai-bot in #9638 - chore: ⬆️ Update ggml-org/whisper.cpp to
4bf733672b2871d4153158af4f621a6dd9104f4aby @localai-bot in #9636 - chore(model-gallery): ⬆️ update checksum by @localai-bot in #9700
- chore: ⬆️ Update ikawrakow/ik_llama.cpp to
b93721902b4662f9b973b1c412006081c958d085by @localai-bot in #9697 - chore: ⬆️ Update ggml-org/llama.cpp to
2496f9c14965c39589f53eea31bdb6d762b1d360by @localai-bot in #9698 - chore: ⬆️ Update leejet/stable-diffusion.cpp to
90e87bc846f17059771efb8aaa31e9ef0cab6f78by @localai-bot in #9701 - chore(deps): bump openssl from 0.10.76 to 0.10.79 in /backend/rust/kokoros in the cargo group across 1 directory by @dependabot[bot] in #9694
- chore(deps): bump the go_modules group across 1 directory with 8 updates by @dependabot[bot] in #9705
- chore: ⬆️ Update ikawrakow/ik_llama.cpp to
9a26522af234f8db079ae3735f35ab6c20fe2c66by @localai-bot in #9713 - chore: ⬆️ Update ggml-org/llama.cpp to
05ff59cb57860cc992fc6dcede32c696efea711cby @localai-bot in #9714 - chore: ⬆️ Update ggml-org/whisper.cpp to
c81b2dabbc45484dee2ca6658cfe39c841df5c70by @localai-bot in #9712 - chore(deps): bump LocalAGI for collection rehydrate-on-init-failure fix by @localai-bot in #9721
- chore: ⬆️ Update ikawrakow/ik_llama.cpp to
98950267c67fd95937a54ebd6e3c66cf2679b710by @localai-bot in #9725 - chore: ⬆️ Update ggml-org/llama.cpp to
9f5f0e689c9e977e5f23a27e344aa36082f44738by @localai-bot in #9724 - chore: ⬆️ Update ikawrakow/ik_llama.cpp to
ab0f22b819ac57b7e7484f69c00c10fc755d5c6cby @localai-bot in #9734 - chore: ⬆️ Update ggml-org/llama.cpp to
00d56b11c3477b99bc18562dc1d1834f0d961778by @localai-bot in #9733
Other Changes
- ci: add pre-built base-grpc-builder image infrastructure (PR 1/2) by @localai-bot in #9737
- ci: refactor llama-cpp variant Dockerfiles to consume prebuilt base-grpc images (PR 2/2) by @localai-bot in #9738
- chore: ⬆️ Update ggml-org/llama.cpp to `1e5ad35d560b90a8ac447d149c8f8447ae1fcaa0` by @localai-bot in #9739
- docs(agents): update CI caching docs after the GHA-free-tier migration by @localai-bot in #9742
- ci: split backend-jobs into single-arch and multi-arch matrices by @localai-bot in #9746
- chore: ⬆️ Update ggml-org/llama.cpp to `2b2babd1243c67ca811c0a5852cedf92b1a20024` by @localai-bot in #9747
- chore: ⬆️ Update ikawrakow/ik_llama.cpp to `23127139cb6fa314899c3b5f4935b88b3374c56c` by @localai-bot in #9748
- chore: ⬆️ Update ggml-org/whisper.cpp to `c33c5618b72bb345df029b730b36bc0e369845a3` by @localai-bot in #9749
- chore: ⬆️ Update vllm-project/vllm cu130 wheel to `0.20.2` by @localai-bot in #9750
- chore: ⬆️ Update ggml-org/llama.cpp to `389ff61d77b5c71cec0cf92fe4e5d01ace80b797` by @localai-bot in #9752
New Contributors
- @neurocis made their first contribution in #9304
- @thelittlefireman made their first contribution in #9264
- @mvanhorn made their first contribution in #9379
- @keithmattix made their first contribution in #9410
- @SAY-5 made their first contribution in #9438
- @pjbrzozowski made their first contribution in #9427
- @russell made their first contribution in #9446
- @leinasi2014 made their first contribution in #9443
- @sec171 made their first contribution in #9461
- @Dennisadira made their first contribution in #9411
- @orbisai0security made their first contribution in #9486
- @Anai-Guo made their first contribution in #9526
- @arbrick made their first contribution in #9543
- @eglia made their first contribution in #9541
- @egyptianbman made their first contribution in #9655
- @arteven made their first contribution in #9674
Full Changelog: v4.1.3...v4.2.0
