github mudler/LocalAI v4.2.0


🎉 LocalAI 4.2.0 Release! 🚀

LocalAI 4.2.0 is out!

This release teaches LocalAI to see and hear. New /v1/voice/* and /v1/audio/diarization endpoints, a full face-recognition pipeline with antispoofing, word-level timestamps for faster-whisper, and a client-cancellable Whisper. There is also a drop-in Ollama API, video generation in stable-diffusion.ggml, a redesigned chat with i18n and admin-configurable branding, eleven new backends, an interactive model config editor with autocomplete, and a hardened distributed mode v2. vLLM finally hits feature parity with llama.cpp and gets tensor-parallel distributed workers.


📌 TL;DR

| Feature | Summary |
|---------|---------|
| 🎙️ Voice Recognition | New `/v1/voice/*` endpoints: verify, identify, embed and analyze speakers. |
| 👤 Face Recognition + Liveness | 1:1 verify, 1:N identify, detect, analyze, embed, and reject spoofed photos. |
| 🎬 Diarization | New `/v1/audio/diarization` endpoint: "who spoke when?" via sherpa-onnx + vibevoice.cpp. |
| 🗣️ Better Transcriptions | Word-level timestamps, client-cancellable Whisper, segments + duration + language on the stream-done event. |
| 🦙 Ollama API | Drop-in compatibility: point your Ollama client straight at LocalAI. |
| 🎬 Video Generation | stable-diffusion.ggml now generates video (i2v, first-last-frame). |
| 💬 Redesigned UI | Chat redesign, Nord palette, i18n (5 languages), admin-configurable branding. |
| ✏️ Interactive Model Editor | Autocomplete-driven config editor in the UI. |
| 📦 Universal Importer | Imports across most backends, not just llama.cpp. |
| 🚦 Concurrency Groups | Per-model exclusive groups for safe backend loading. |
| 🧪 11 New Backends | sglang, ik-llama-cpp, TurboQuant, sam.cpp, Kokoros, qwen3tts.cpp, tinygrad-multimodal, LocalVQE, vibevoice-cpp, insightface (liveness), voice-rec. |
| ⚡ vLLM @ parity | Feature parity with llama.cpp + tensor-parallel distributed workers + full engine_args. |
| 🛰️ Distributed v2 | Hardened orchestrator, round-robin replicas, scoped Upgrade All, NATS install/upgrade split. |

🚀 New Features & Major Enhancements

🎙️ Voice Recognition

LocalAI is now all ears. New /v1/voice/* endpoints let you verify, identify, analyze and embed speakers, powered by a SpeechBrain + ONNX Python backend.

  • 1:1 Verify, "is this the same speaker?"
  • 1:N Identify, "who is talking, out of my enrolled users?"
  • Embeddings, voice fingerprints for your own pipelines
  • Analyze, age, gender, emotion attributes per segment

🔥 Pairs naturally with the new diarization endpoint for full speaker pipelines.
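As a rough sketch of how a 1:1 verify request could be assembled (only the /v1/voice/* prefix is documented above, so the exact route and payload shape here are assumptions):

```python
# Hypothetical sketch: the /v1/voice/verify route and this payload shape
# are assumptions -- only the /v1/voice/* prefix is documented. Check the
# API reference for the real schema.
import base64
import json

def build_verify_body(sample_a: bytes, sample_b: bytes) -> str:
    """JSON body for a 1:1 'is this the same speaker?' check."""
    return json.dumps({
        "audio_1": base64.b64encode(sample_a).decode(),
        "audio_2": base64.b64encode(sample_b).decode(),
    })

body = json.loads(build_verify_body(b"clip-a", b"clip-b"))
print(sorted(body))  # ['audio_1', 'audio_2']
```

Since everything runs locally, the audio bytes in the body never leave your machine.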

voice.mp4

👤 Face Recognition & Antispoofing

A complete face-biometrics pipeline, built on InsightFace + ONNX.

  • 1:1 Verify, match two faces
  • 1:N Identify, resolve a face against an enrolled set
  • Detection & Analysis, find faces, extract attributes (age, gender, emotion, race)
  • Embeddings, facial fingerprints for your own stack
  • 🆕 Antispoofing (liveness), reject spoofed photos and videos

✅ Samples never leave your machine; they are processed only by the locally running backend.
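On the consuming side, a liveness gate is typically a simple threshold check. A minimal sketch, assuming the antispoofing result carries a score in [0, 1] (the real response fields are not spelled out in these notes):

```python
# Hypothetical sketch: assumes the liveness check returns a score in [0, 1]
# under a "liveness_score" field; the actual response schema may differ.
def is_live(result: dict, threshold: float = 0.5) -> bool:
    """Reject a sample unless its liveness score clears the threshold."""
    return result.get("liveness_score", 0.0) >= threshold

print(is_live({"liveness_score": 0.92}))  # True
print(is_live({"liveness_score": 0.12}))  # False
```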

face.mp4

🎬 Diarization & a smarter audio pipeline

Audio is a first-class citizen now.

  • /v1/audio/diarization, segments speech by speaker turn (sherpa-onnx + vibevoice.cpp)
  • Word-level timestamps for faster-whisper
  • Client cancellation for Whisper via the ggml abort_callback. Stop a transcription mid-flight and free the GPU.
  • Stream-done metadata on /v1/audio/transcriptions. segments, duration and language on the final event.
  • Audio transformations UI (LocalVQE), explore audio FX directly from the React UI
  • Transcription error visibility, handler errors land in the access log and on the client
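Putting diarization output to work is mostly a matter of walking speaker-labelled segments. A hedged sketch (the field names below are guesses for illustration, not the documented /v1/audio/diarization schema):

```python
# Hypothetical sketch of consuming a /v1/audio/diarization response:
# assumes speaker-labelled segments with start/end times. The field
# names are illustrative, not the documented schema.
segments = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 4.2},
    {"speaker": "SPEAKER_01", "start": 4.2, "end": 9.8},
]

def who_spoke_when(segments: list[dict]) -> list[str]:
    """Render 'who spoke when?' as human-readable lines."""
    return [f'{s["speaker"]}: {s["start"]:.1f}s -> {s["end"]:.1f}s'
            for s in segments]

for line in who_spoke_when(segments):
    print(line)
```

Feed the same turns into the voice-recognition endpoints to go from anonymous speaker labels to enrolled identities.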

🦙 Ollama drop-in API

Point your existing Ollama client at LocalAI. Everything keeps working. Another front door, same engine.

```shell
OLLAMA_HOST=http://localhost:8080 ollama run qwen3
```
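For clients that speak the Ollama REST API directly, the request body stays exactly what an Ollama server expects; only the base URL changes. A minimal sketch of a non-streaming /api/chat payload (shape per the Ollama API):

```python
# Sketch of an Ollama-style /api/chat request body, per the Ollama API.
# POST it to your LocalAI instance (e.g. http://localhost:8080/api/chat).
import json

def ollama_chat_body(model: str, prompt: str) -> bytes:
    """Build the JSON body an Ollama client would send to /api/chat."""
    return json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }).encode()

print(sorted(json.loads(ollama_chat_body("qwen3", "Hello"))))
# ['messages', 'model', 'stream']
```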

🎬 Video Generation

The stable-diffusion.ggml backend now generates video, with curated gallery entries for Wan 2.1 FLF2V 14B 720P and Wan i2v 720p, plus a new stablediffusion-ggml-development meta backend to track the cutting edge.


🎨 React UI: total refresh

A massive UI cycle landed in 4.2:

  • 💬 Chat redesign, cleaner layout, faster perceived latency, better message density
  • 🎨 Editorial refresh with the Nord palette, calmer, more focused, dark-mode-first
  • 🌍 Multilingual / i18n, English, Italiano, Español, Deutsch, 简体中文
  • 🪪 Brandable instance, admin-configurable name, tagline, and assets (logo, favicon)
  • ✏️ Interactive model config editor, autocomplete over known fields, live validation, automatic file-renaming on save
  • 🧰 Backend management UX, revamped backend list with concrete versions
  • 🛟 Better error UX, distributed backend management errors surface cleanly

💡 Self-host with your branding. The login page, sidebar, footer, and browser tab all pick up the instance name and logo.

chat.mp4
i18n.mp4

🔄 Backend & model lifecycle

  • Backend versioning with automatic upgrade detection
  • Pin models so they survive the reaper
  • On-demand toggle per model to control auto-load
  • Concurrency groups, per-model exclusive groups so heavy backends won't trample each other
  • Universal importer, single flow that imports across most backends, with clean multi-shard GGUF handling and dedicated importers for vibevoice-cpp and whisper.cpp HF repos
importer.mp4
model-editor.mp4

🧪 New Backends!

| Backend | What it brings |
|---------|----------------|
| sglang | High-throughput LLM serving + speculative decoding (EAGLE/EAGLE3/DFLASH/MTP) |
| ik-llama.cpp | ikawrakow's llama.cpp fork |
| TurboQuant | Quant-focused llama.cpp fork |
| sam.cpp | Segment Anything detection |
| Kokoros | Rust-native Kokoro TTS |
| qwen3tts.cpp | Qwen3 TTS |
| tinygrad-multimodal (experimental) | tinygrad-powered multimodal |
| vibevoice.cpp | Diarization-grade speech |
| LocalVQE | Audio transformations / FX |
| insightface | Face antispoofing |
| voice-rec | Speaker recognition / embeddings |

⚡ vLLM at parity (and beyond)

  • vLLM parity with llama.cpp, same feature surface, same ergonomics
  • vLLM engine_args, the full AsyncEngineArgs exposed via a generic YAML map
  • Tensor-parallel distributed workers, fan a single model across nodes
  • CUDA 13 builds for vLLM, vLLM-omni and sglang
  • L4T arm64 (CUDA 13), vLLM/vLLM-omni/sglang variants for Jetson-class arm64
  • MLX backend refactored, shared helpers and enhanced functionality
  • llama.cpp split_mode for explicit multi-GPU placement
  • Speculative decoding wired through for llama.cpp, Gemma 4 thinking support added
  • Vision / mtmd marker propagated from the backend via ModelMetadata
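As an illustrative fragment, the engine_args map passes settings straight through to vLLM. The argument names below come from vLLM's AsyncEngineArgs; the exact nesting in the model YAML is a sketch, and the values are examples only:

```yaml
# Sketch of a vLLM model config using the generic engine_args map.
# Field names under engine_args are vLLM AsyncEngineArgs; the nesting
# shown here is illustrative -- check the docs for the exact layout.
name: my-vllm-model
backend: vllm
parameters:
  model: Qwen/Qwen2.5-7B-Instruct
engine_args:
  tensor_parallel_size: 2      # fan the model across 2 GPUs
  max_model_len: 8192
  gpu_memory_utilization: 0.90
```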

🛰️ Distributed Mode v2

Distributed mode keeps maturing. This release was a hardening pass across the orchestration loop:

  • Orchestrator resilience, auto-upgrade routing, worker bind-wait, RAG-init crash, log-spam fixes
  • Round-robin across replicas of the same model
  • Upgrade All scoped to nodes that actually have the backend installed
  • NATS install / upgrade split, backend.upgrade no longer piggybacks on install
  • Cached-replica lookup honors NodeSelector, the reconciler no longer scales up empty backends
  • VRAM/RAM reporting correct on NVIDIA unified-memory hosts
  • Agent nodes, queue loops stop on teardown, dead-letter cap added
  • Autoscaling, load-model extracted from Route() and applied during autoscale

🔐 Auth & Security

  • Settings API, env-supplied ApiKeys are stripped before persisting (no accidental leaks)
  • grpc-server hardening, removed unsafe sprintf() in the C++ grpc server
  • OIDC, bumped go-oidc/v3 to 3.18.0
  • Security hardening pass across the codebase
  • AI coding assistants policy, LocalAI now follows the Linux kernel's DCO/attribution guidelines (Assisted-by: trailer, no AI co-authors)

🖥️ Hardware & deployment

  • CUDA 13 for vLLM, vLLM-omni, and sglang
  • NVIDIA L4T arm64 (CUDA 13) for Jetson-class boards
  • ROCm 7.x bumped to latest
  • gfx1151 (Strix Halo / Ryzen AI MAX) support, AMDGPU_TARGETS exposed as a build-arg
  • Intel GPU, latest oneapi-basekit (b70 support) across Intel images
  • arm64 CI, cpu-whisperx and cpu-faster-whisper now ship arm64 images
  • whisperx, ROCm/HIPBLAS target dropped (pinned to rocm6.4 wheels)

🛠️ Under the Hood

  • Better CLI errors with actionable guidance
  • golangci-lint baseline (new-from-merge-base) keeps drift in check
  • Coding-agent discoverability, new APIs let coding agents introspect and configure LocalAI
  • Autoparser, prefers backend-emitted chat deltas, correct logprob passthrough, strips partial reasoning tags during warm-up
  • Reasoning + tools, no more empty content from thinking models in retry loops
  • Streaming hygiene, deduped content, deduped tool calls, recovered reasoning, unique tool_call IDs in deferred flushes
  • HTTP, handler-error status now visible in the access log + transcription error surface
  • Backend monitor accepts model as a query parameter
  • Config loader, YAML backup files are ignored
  • GGUF thinking probe respects explicit reasoning config
  • Inference defaults refreshed from Unsloth
  • Embeddings on collection upload, dim changes handled gracefully
  • Python backends, JIT subprocesses use tempfile.gettempdir() instead of hardcoded /tmp
  • Draft model paths, relative paths now resolve against the models dir
  • whisper-cpp: implement streaming transcription and context cancellation

🐞 Notable fixes

  • Cascading user deletion on PostgreSQL, deleting a user removes all owned data
  • Importer emits all shards for multi-part GGUF models
  • Open Responses parses OpenAI-spec nested tool_choice and uses the correct setter
  • llama-cpp: server-chat.cpp included in grpc-server TU, common -> llama-common rename, turboquant common.h detection
  • ik-llama-cpp: adapted to common_grammar in sampling.h, patched clip.cpp for the new ggml_quantize_chunk signature
  • Kokoros: trait stubs (face_verify, face_analyze, audio_transcription_stream), CI publish
  • stable-diffusion.ggml: MP4 container forced in ffmpeg mux, new i2v options
  • Gallery: orphaned meta-backend uninstall, gemma-4 URIs, flux-kontext param overrides, Wan dedup, z-image-turbo load, Qwen3.5 typo override, tag-casing normalization
  • Streaming: content + tool-call dedup, reasoning recovery, unique tool-call IDs in deferred flush
  • Realtime: consume ChatDeltas when the C++ autoparser clears Response
  • Tool-calls: use SetFunctionCallNameString when forcing a specific tool
  • Faster-whisper: cast segment timestamps to int after multiplication
  • mlx-vlm: pinned to v0.4.4 to unblock CUDA builds
  • vLLM: dropped flash-attn wheel to avoid torch 2.10 ABI mismatch
  • Downloader: list supported URL schemes in DownloadFile errors
  • Backend: resolve relative draft_model paths against the models dir
  • CI: wire AMDGPU_TARGETS through the backend workflow, switch gallery-agent to sigs.k8s.io/yaml, recover rerankers + vllm-omni on aarch64, unbreak master CI for docs/kokoros/vibevoice-cpp ABI

🆕 Gallery additions

  • Wan 2.1 FLF2V 14B 720P (video)
  • Wan i2v 720p (image-to-video)
  • stablediffusion-ggml-development meta backend
  • chroma1-hd (diffusers)
  • Gemma 4 (+ mmproj)
  • EmbeddingGemma
  • Qwen 3.5, Qwen-ASR, OCR entries for llama.cpp
  • Qwen3-VL Reranker, Qwen3-VL Embedding (tagged)
  • A steady stream of automated gallery-agent model additions throughout the cycle 🤖

🚀 The Complete Local Stack for Privacy-First AI


LocalAI

The free, Open Source OpenAI alternative. Drop-in REST API compatible with OpenAI specs for local AI inferencing. No GPU required.

Link: https://github.com/mudler/LocalAI


LocalAGI

Local AI agent management platform. Drop-in for OpenAI's Responses API, with advanced agentic capabilities and a no-code UI.

Link: https://github.com/mudler/LocalAGI


LocalRecall

RESTful API and knowledge-base management providing persistent memory and storage for AI agents. Pairs with LocalAI and LocalAGI.

Link: https://github.com/mudler/LocalRecall


❤️ Thank You

LocalAI is a true FOSS movement, built by contributors, powered by community.

If you believe in privacy-first, self-hosted AI:

  • Star the repo
  • 💬 Contribute code, docs, translations or feedback
  • 📣 Share with others

Your support keeps this stack alive.


✅ Full Changelog


What's Changed

Bug fixes 🐛

  • fix(autoscaling): extract load model from Route() and use as well when doing autoscale by @mudler in #9270
  • fix(nodes): better detection if nodes goes down or model is not available by @mudler in #9274
  • fix: try to add whisperx and faster-whisper for more variants by @mudler in #9278
  • fix: thinking models with tools returning empty content (reasoning-only retry loop) by @mudler in #9290
  • fix(streaming): deduplicate tool call emissions during streaming by @mudler in #9292
  • fix(streaming): skip chat deltas for role-init elements to prevent first token duplication by @mudler in #9299
  • Fix load of z-image-turbo by @thelittlefireman in #9264
  • fix(agents): handle embedding model dim changes on collection upload by @mudler in #9365
  • fix(gallery): correct gemma-4 model URIs returning 404 by @mvanhorn in #9379
  • fix(ui): rename model config files on save to prevent duplicates by @mudler in #9388
  • fix(ci): switch gallery-agent to sigs.k8s.io/yaml by @mudler in #9397
  • fix(llama-cpp): rename linked target common -> llama-common by @mudler in #9408
  • fix(vision): propagate mtmd media marker from backend via ModelMetadata by @mudler in #9412
  • fix(turboquant): resolve common.h by detecting llama-common vs common target by @mudler in #9413
  • fix(rocm): add gfx1151 support and expose AMDGPU_TARGETS build-arg by @keithmattix in #9410
  • fix(kokoros): implement audio_transcription_stream trait stub by @mudler in #9422
  • fix(turboquant): drop ignore-eos patch, bump fork to b8967-627ebbc by @mudler in #9423
  • fix(distributed): stop queue loops on agent nodes + dead-letter cap by @mudler in #9433
  • fix(gallery): allow uninstalling orphaned meta backends + force reinstall by @mudler in #9434
  • fix(stable-diffusion.ggml): force mp4 container in ffmpeg mux by @mudler in #9435
  • fix(settings): strip env-supplied ApiKeys from the request before persisting by @SAY-5 in #9438
  • fix(api): remove duplicate /api/traces endpoint that broke React UI by @pjbrzozowski in #9427
  • fix(distributed): pass ExternalURI through NATS backend install by @russell in #9446
  • fix(ci): wire AMDGPU_TARGETS through backend build workflow by @russell in #9445
  • fix(config): ignore yaml backup files in model loader by @leinasi2014 in #9443
  • [gallery] Fix duplicate sha256 keys in Wan models by @sec171 in #9461
  • fix(tests): update InstallBackend call sites for new URI/Name/Alias params by @mudler in #9467
  • Fix: Add model parameter to neutts-air gallery definition by @localai-bot in #8793
  • fix(gallery-agent): process blacklist command on recently-closed PRs by @mudler in #9473
  • Respect explicit reasoning config during GGUF thinking probe by @leinasi2014 in #9463
  • fix(streaming): dedupe content, recover reasoning, unique tool_call IDs in deferred flush by @mudler in #9470
  • fix(backend-monitor): accept model as a query parameter by @Dennisadira in #9411
  • fix(kokoros): Build and publish the backend images from CI/CD by @richiejp in #9487
  • fix: remove unsafe sprintf() in grpc-server.cpp by @orbisai0security in #9486
  • fix(kokoros): implement face_verify and face_analyze trait stubs by @mudler in #9499
  • fix(ik-llama-cpp): adapt to common_grammar struct in sampling.h by @mudler in #9512
  • fix(llama-cpp): include server-chat.cpp in grpc-server translation unit by @mudler in #9511
  • fix(importer): emit all shards for multi-part GGUF models by @mudler in #9513
  • fix(openresponses): parse OpenAI-spec nested tool_choice + use correct setter by @walcz-de in #9509
  • fix: use SetFunctionCallNameString when forcing a specific tool (3 sites) by @Anai-Guo in #9526
  • fix(ik-llama-cpp): patch clip.cpp for new ggml_quantize_chunk signature by @mudler in #9531
  • fix(realtime): consume ChatDeltas when C++ autoparser clears Response by @richiejp in #9538
  • fix: add hipblaslt library by @eglia in #9541
  • fix(distributed): correct VRAM/RAM reporting on NVIDIA unified-memory hosts by @mudler in #9545
  • fix(vllm): drop flash-attn wheel to avoid torch 2.10 ABI mismatch by @richiejp in #9557
  • fix(mlx-vlm): pin upstream to v0.4.4 to unblock CUDA builds by @mudler in #9568
  • fix(gallery): normalize inconsistent tag casing/plurals across gallery models by @Anai-Guo in #9574
  • fix(gallery): correct Qwen3.5 typo in qwen3.5-27b-claude-4.6 model override (closes #9362) by @Anai-Guo in #9580
  • fix(diffusers): drop compel from requirements to unblock pip resolver by @mudler in #9632
  • fix(ci): fix AMDGPU_TARGETS empty-string bypass in hipblas builds by @russell in #9626
  • fix(distributed): honor NodeSelector in cached-replica lookup, stop empty-backend reconciler scaleups by @localai-bot in #9652
  • fix(distributed): orchestrator resilience — auto-upgrade routing, worker bind-wait, RAG-init crash, log spam by @localai-bot in #9657
  • fix(faster-whisper): cast segment timestamps to int after multiplication by @arteven in #9674
  • fix(python-backend): make JIT subprocesses work on hosts of any size by @richiejp in #9679
  • fix(distributed): scope Upgrade All to nodes that have the backend installed by @mudler in #9678
  • fix(backend): resolve relative draft_model paths against the models dir by @localai-bot in #9680
  • fix: unbreak master CI (docs, kokoros, vibevoice-cpp ABI) by @localai-bot in #9682
  • fix(ci): unbreak rerankers (torch bump) and vllm-omni on aarch64 by @localai-bot in #9688
  • fix(distributed): round-robin replicas of the same model by @localai-bot in #9695
  • fix(downloader): list supported URL schemes in DownloadFile error by @Anai-Guo in #9689
  • fix(auth): cascade user deletion across all owned data on PostgreSQL by @localai-bot in #9702
  • fix(http): make handler-error status visible in access log + transcription errors by @localai-bot in #9707
  • fix(distributed): make backend upgrade actually re-install on workers by @localai-bot in #9708
  • fix(distributed): split NATS backend.upgrade off install + dedup loads by @localai-bot in #9717
  • fix(gallery): keep auto-upgrade off non-dev backends when -development is installed by @mudler in #9736

Exciting New Features 🎉

  • feat(ui): Interactive model config editor with autocomplete by @richiejp in #9149
  • feat: track files being staged by @mudler in #9275
  • feat: Add Kokoros backend by @richiejp in #9212
  • feat(api): add ollama compatibility by @mudler in #9284
  • feat(sam.cpp): add sam.cpp detection backend by @mudler in #9288
  • feat(swagger): update swagger by @localai-bot in #9300
  • chore(qwen3-asr): pass prompt as context to transcribe by @mudler in #9301
  • feat: Add toggle mechanism to enable/disable models from loading on demand by @neurocis in #9304
  • feat: allow to pin models and skip from reaping by @mudler in #9309
  • feat(swagger): update swagger by @localai-bot in #9310
  • feat: backend versioning, upgrade detection and auto-upgrade by @mudler in #9315
  • feat(swagger): update swagger by @localai-bot in #9318
  • feat(qwen3tts.cpp): add new backend by @mudler in #9316
  • feat(ux): backend management enhancement by @mudler in #9325
  • feat(rocm): bump to 7.x by @mudler in #9323
  • feat(backends): add ik-llama-cpp by @mudler in #9326
  • feat(swagger): update swagger by @localai-bot in #9329
  • feat(vllm): parity with llama.cpp backend by @mudler in #9328
  • feat: refactor shared helpers and enhance MLX backend functionality by @mudler in #9335
  • feat: wire transcription for llama.cpp, add streaming support by @mudler in #9353
  • feat(backend): add turboquant llama.cpp-fork backend by @mudler in #9355
  • feat(swagger): update swagger by @localai-bot in #9356
  • feat(backend): add tinygrad multimodal backend (experimental) by @mudler in #9364
  • feat(backends): add sglang by @mudler in #9359
  • refactor(tinygrad): reuse tinygrad.apps.llm instead of vendored Transformer by @mudler in #9380
  • feat(stable-diffusion.ggml): add support for video generation by @mudler in #9420
  • feat(distributed): sync state with frontends, better backend management reporting by @mudler in #9426
  • feat(swagger): update swagger by @localai-bot in #9431
  • feat(gallery): add Wan 2.1 FLF2V 14B 720P by @mudler in #9440
  • feat(gallery): add wan i2v 720p by @mudler in #9457
  • feat: improve CLI error messages with actionable guidance by @localai-bot in #8880
  • chore(whisperx): drop ROCm/hipblas build target by @mudler in #9474
  • feat(face-recognition): add insightface/onnx backend for 1:1 verify, 1:N identify, embedding, detection, analysis by @mudler in #9480
  • feat(importer): expand importer flow to almost all backends by @mudler in #9466
  • feat(swagger): update swagger by @localai-bot in #9498
  • feat: voice recognition by @mudler in #9500
  • feat(insightface): add antispoofing (liveness) detection by @mudler in #9515
  • feat(swagger): update swagger by @localai-bot in #9518
  • feat: add biometrics UI by @mudler in #9524
  • feat: Add Sherpa ONNX backend for ASR and TTS by @richiejp in #8523
  • [intel GPU support] Use latest oneapi-basekit image for Intel images to support b70 by @arbrick in #9543
  • feat(react-ui): editorial refresh with Nord palette and polished primitives by @mudler in #9550
  • feat: surface distributed backend management errors by @mudler in #9552
  • feat(backends): add CUDA 13 + L4T arm64 CUDA 13 variants for vllm/vllm-omni/sglang by @mudler in #9553
  • feat(llama-cpp): expose split_mode option for multi-GPU placement by @mudler in #9560
  • ci(backends): build cpu-whisperx and cpu-faster-whisper for linux/arm64 by @mudler in #9573
  • [intel GPU support] Use latest oneapi-basekit image for Intel images to support b70 (in more places this time) by @arbrick in #9578
  • feat: Log backend exit code by @richiejp in #9581
  • feat(distributed): support multiple replicas of one model on the same node by @mudler in #9583
  • feat(swagger): update swagger by @localai-bot in #9587
  • feat: localai assistant chat modality by @mudler in #9602
  • chore: add golangci-lint with new-from-merge-base baseline by @richiejp in #9603
  • feat(swagger): update swagger by @localai-bot in #9607
  • feat(vllm): expose AsyncEngineArgs via generic engine_args YAML map by @richiejp in #9563
  • feat(vibevoice-cpp): add purego TTS+ASR backend by @mudler in #9610
  • feat: react chat redesign by @mudler in #9616
  • feat(llama-cpp): bump to d775992 and adapt to spec params refactor by @mudler in #9618
  • feat(backends/python): use tempfile.gettempdir() instead of hardcoded /tmp by @Anai-Guo in #9629
  • feat(importers): whisper.cpp HF repos pick a quant + nest under whisper/models by @mudler in #9630
  • feat(branding): admin-configurable instance name, tagline, and assets by @mudler in #9635
  • feat(swagger): update swagger by @localai-bot in #9643
  • feat(react-ui): add multilingual (i18n) support by @mudler in #9642
  • feat(ci): allow routing apt traffic through an alternate Ubuntu mirror by @mudler in #9650
  • feat: add LocalVQE backend and audio transformations UI by @richiejp in #9640
  • feat(swagger): update swagger by @localai-bot in #9660
  • feat(concurrency-groups): per-model exclusive groups for backend loading by @mudler in #9662
  • feat(api): add /v1/audio/diarization endpoint with sherpa-onnx + vibevoice.cpp by @mudler in #9654
  • feat(vllm, distributed): tensor parallel distributed workers by @richiejp in #9612
  • feat: support word-level timestamps for faster-whisper by @eglia in #9621
  • feat(importers): add vibevoice-cpp importer for GGUF bundles by @localai-bot in #9685
  • feat(gallery): Speed up load times and clean gallery entries by @richiejp in #9211
  • feat(swagger): update swagger by @localai-bot in #9699
  • feat(sglang): wire engine_args, add cuda13 build, ship MTP gallery demos by @richiejp in #9686
  • feat(api/transcription): include segments + duration + language on stream done event by @localai-bot in #9709
  • feat(whisper): honor client cancellation via ggml abort_callback by @localai-bot in #9710
  • chore: Security hardening by @richiejp in #9719
  • ci: phase 1-3 of GHA free tier migration (path filter, multi-arch split prep, /mnt disk relief) by @localai-bot in #9726
  • feat(swagger): update swagger by @localai-bot in #9723
  • ci: pilot per-arch split + manifest merge for faster-whisper and llama-cpp-quantization by @localai-bot in #9727
  • ci: finish GHA free-tier migration (per-arch fan-out, image splits, retire self-hosted, fix provenance) by @localai-bot in #9730
  • ci: consolidate llama-cpp-darwin into the matrix-driven Darwin flow by @mudler in #9731
  • feat(whisper-cpp): implement streaming transcription by @localai-bot in #9751

🧠 Models

  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9399
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9400
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9425
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9436
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9464
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9481
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9491
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9505
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9555
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9558
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9611
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9615
  • Add tags to qwen3-vl-reranker and Qwen3-VL-Embedding to the gallery by @ER-EPR in #9628
  • chore(model gallery): add chroma1-hd diffusers model by @Anai-Guo in #9646
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9653
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9681
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9703
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #9720

📖 Documentation and examples

  • docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #9268
  • docs(agents): capture vllm backend lessons + runtime lib packaging by @mudler in #9333
  • chore(agents): Update the backend creation instructions to include Rust and extra tests by @richiejp in #9490

👒 Dependencies

  • chore: ⬆️ Update ggml-org/llama.cpp to 66c4f9ded01b29d9120255be1ed8d5835bcbb51d by @localai-bot in #9269
  • chore(llama.cpp): bump to 'd12cc3d1ca6bba741cd77887ac9c9ee18c8415c7' by @mudler in #9282
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to e8323cabb0e4511ba18a50b1cb34cf1f87fc71ef by @localai-bot in #9281
  • chore: ⬆️ Update ggml-org/llama.cpp to d132f22fc92f36848f7ccf2fc9987cd0b0120825 by @localai-bot in #9302
  • chore: ⬆️ Update PABannier/sam3.cpp to 01832ef85fcc8eb6488f1d01cd247f07e96ff5a9 by @localai-bot in #9311
  • chore: ⬆️ Update ggml-org/llama.cpp to e62fa13c2497b2cd1958cb496e9489e86bbd5182 by @localai-bot in #9312
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #9321
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 6b675a5ede9b0edf0a0f44191e8b79d7ef27615a by @localai-bot in #9320
  • chore: ⬆️ Update ggml-org/llama.cpp to ff5ef8278615a2462b79b50abdf3cc95cfb31c6f by @localai-bot in #9319
  • chore: ⬆️ Update ggml-org/llama.cpp to 1e9d771e2c2f1113a5ebdd0dc15bafe57dce64be by @localai-bot in #9330
  • chore(deps): bump softprops/action-gh-release from 2 to 3 by @dependabot[bot] in #9336
  • chore(deps): bump actions/upload-pages-artifact from 4 to 5 by @dependabot[bot] in #9337
  • chore(deps): bump github.com/testcontainers/testcontainers-go from 0.41.0 to 0.42.0 by @dependabot[bot] in #9338
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #9346
  • chore(deps): bump sentence-transformers from 5.2.3 to 5.4.0 in /backend/python/transformers by @dependabot[bot] in #9342
  • chore: ⬆️ Update ggml-org/llama.cpp to e97492369888f5311e4d1f3beb325a36bbed70e9 by @localai-bot in #9347
  • chore: ⬆️ Update ikawrakow/ik_llama.cpp to 55d3c05bf7b377deaa5dc84d255d9740a345a206 by @localai-bot in #9348
  • chore(deps): bump github.com/google/go-containerregistry from 0.21.3 to 0.21.5 by @dependabot[bot] in #9343
  • chore(deps): bump github.com/testcontainers/testcontainers-go/modules/nats from 0.41.0 to 0.42.0 by @dependabot[bot] in #9341
  • chore(deps): bump github.com/swaggo/echo-swagger from 1.4.1 to 1.5.2 by @dependabot[bot] in #9344
  • chore(deps): bump github.com/charmbracelet/glamour from 0.10.0 to 1.0.0 by @dependabot[bot] in #9340
  • chore: ⬆️ Update ggml-org/llama.cpp to fae3a28070fe4026f87bd6a544aba1b2d1896566 by @localai-bot in #9357
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #9358
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #9369
  • chore: ⬆️ Update ggml-org/llama.cpp to b3d758750a268bf93f084ccfa3060fb9a203192a by @localai-bot in #9370
  • chore: ⬆️ Update ikawrakow/ik_llama.cpp to 1163af96cf6bb4a4b819f998f84c153a49768b99 by @localai-bot in #9368
  • chore(deps): bump the npm_and_yarn group across 1 directory with 6 updates by @dependabot[bot] in #9373
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to c41c5ded7af85e01b7fe442ff7950c720706d53a by @localai-bot in #9366
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #9384
  • chore: ⬆️ Update ikawrakow/ik_llama.cpp to eaf83865a132f66e8f49efe0e78491625942f068 by @localai-bot in #9382
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to a564fdf642780d1df123f1c413b19961375b8346 by @localai-bot in #9383
  • chore: ⬆️ Update TheTom/llama-cpp-turboquant to 45f8a066ed5f5bb38c695cec532f6cef9f4efa9d by @mudler in #9385
  • chore: ⬆️ Update ggml-org/llama.cpp to 4fbdabdc61c04d1262b581e1b8c0c3b119f688ff by @localai-bot in #9381
  • chore: bump inference defaults from unsloth by @github-actions[bot] in #9396
  • chore(deps): bump dompurify from 3.3.2 to 3.4.0 in /core/http/react-ui in the npm_and_yarn group across 1 directory by @dependabot[bot] in #9376
  • chore: ⬆️ Update ggml-org/whisper.cpp to 166c20b473d5f4d04052e699f992f625ea2a2fdd by @localai-bot in #9403
  • chore: ⬆️ Update ikawrakow/ik_llama.cpp to 52efa12fdae390d1dca6ecd7ca00010fe51f651e by @localai-bot in #9404
  • chore: ⬆️ Update ggml-org/llama.cpp to 4f02d4733934179386cbc15b3454be26237940bb by @localai-bot in #9415
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 7d33d4b2ddeafa672761a5880ec33bdff452504d by @localai-bot in #9417
  • chore: ⬆️ Update ikawrakow/ik_llama.cpp to 8befd92ea5f702494ea9813fe42a52fb015db5fe by @localai-bot in #9418
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 44cca3d626d301e2215d5e243277e8f0e65bfa78 by @localai-bot in #9428
  • chore: ⬆️ Update ggml-org/llama.cpp to 4eac5b45095a4e8a1ff1cce4f6d030e0872fb4ad by @localai-bot in #9429
  • chore: ⬆️ Update ikawrakow/ik_llama.cpp to 00ba208a5c036eee72d4a631b4f57c126095cb03 by @localai-bot in #9430
  • chore: ⬆️ Update ikawrakow/ik_llama.cpp to d4824131580b94ffa7b0e91c955e2b237c2fe16e by @localai-bot in #9447
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #9451
  • chore: ⬆️ Update ggml-org/whisper.cpp to fc674574ca27cac59a15e5b22a09b9d9ad62aafe by @localai-bot in #9450
  • chore: ⬆️ Update ggml-org/llama.cpp to cf8b0dbda9ac0eac30ee33f87bc6702ead1c4664 by @localai-bot in #9448
  • chore(deps): bump github.com/aws/aws-sdk-go-v2/service/s3 from 1.97.1 to 1.99.1 by @dependabot[bot] in #9452
  • chore(deps): bump github.com/containerd/containerd from 1.7.30 to 1.7.31 by @dependabot[bot] in #9453
  • chore(deps): bump github.com/modelcontextprotocol/go-sdk from 1.4.1 to 1.5.0 by @dependabot[bot] in #9454
  • chore(deps): bump github.com/aws/aws-sdk-go-v2/config from 1.32.14 to 1.32.16 by @dependabot[bot] in #9456
  • chore(deps): bump github.com/coreos/go-oidc/v3 from 3.17.0 to 3.18.0 by @dependabot[bot] in #9455
  • chore: ⬆️ Update ggml-org/llama.cpp to 5a4cd6741fc33227cdacb329f355ab21f8481de2 by @localai-bot in #9479
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to c97702e1057c2fe13a7074cd9069cb9dd6edc1bf by @localai-bot in #9495
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #9522
  • chore: ⬆️ Update ggml-org/llama.cpp to 187a45637054881ecacf17f8e2f6f8f2ba7df1c7 by @localai-bot in #9520
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to b8bdffc19962be7e5a84bfefeb2e31bd885b571a by @localai-bot in #9521
  • chore(deps): bump postcss from 8.5.8 to 8.5.10 in /core/http/react-ui in the npm_and_yarn group across 1 directory by @dependabot[bot] in #9544
  • chore(deps): bump rustls-webpki from 0.103.10 to 0.103.13 in /backend/rust/kokoros in the cargo group across 1 directory by @dependabot[bot] in #9546
  • chore: ⬆️ Update ggml-org/llama.cpp to 361fe72acb7b9bd79059cc177cbeda99b35b5db9 by @localai-bot in #9548
  • chore: ⬆️ Update ikawrakow/ik_llama.cpp to cb58a561f0c49f68b6d125cdfda037ed80433821 by @localai-bot in #9549
  • chore: ⬆️ Update TheTom/llama-cpp-turboquant to 67559e580b10e4e47e9a6fd6218873997976886d by @localai-bot in #9497
  • chore: ⬆️ Update ikawrakow/ik_llama.cpp to 3a945af45d45936341a45bbf7deda56776a4af26 by @localai-bot in #9570
  • chore: ⬆️ Update TheTom/llama-cpp-turboquant to 11a241d0db78a68e0a5b99fe6f36de6683100f6a by @localai-bot in #9571
  • chore: ⬆️ Update ggml-org/llama.cpp to dcad77cc3b0865153f486327064fb0320a57a476 by @localai-bot in #9572
  • chore: ⬆️ Update ggml-org/llama.cpp to f53577432541bb9edc1588c4ef45c66bf07e4468 by @localai-bot in #9577
  • chore: ⬆️ Update ggml-org/llama.cpp to 665abc609740d397d30c0d8ef4157dbf900bd1a3 by @localai-bot in #9584
  • chore: ⬆️ Update ikawrakow/ik_llama.cpp to d6f3e4e28fbf75e6181e6ea32e734de9ce9304fd by @localai-bot in #9585
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to a81677f59c92d90343aebca51dfed7decf0a0cb0 by @localai-bot in #9586
  • chore(deps): bump github.com/testcontainers/testcontainers-go/modules/postgres from 0.41.0 to 0.42.0 by @dependabot[bot] in #9591
  • chore(deps): bump github.com/onsi/ginkgo/v2 from 2.28.1 to 2.28.2 by @dependabot[bot] in #9593
  • chore(deps): bump packaging from 24.1 to 26.2 in /backend/python/coqui by @dependabot[bot] in #9594
  • chore: ⬆️ Update ikawrakow/ik_llama.cpp to 453a027c17e4d63a7f16b871197a396240a65138 by @localai-bot in #9608
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 3d6064b37ef4607917f8acf2ca8c8906d5087413 by @localai-bot in #9617
  • chore: ⬆️ Update ikawrakow/ik_llama.cpp to a8aecbf15933295af96504f9a693998322185b5c by @localai-bot in #9625
  • chore: ⬆️ Update ggml-org/llama.cpp to beb42fffa45eded44804a1fd4916146222371581 by @localai-bot in #9624
  • deps: update quic-go to v0.59.0 (fix session ticket panic) by @egyptianbman in #9655
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #9661
  • chore: ⬆️ Update vllm-project/vllm cu130 wheel to 0.20.1 by @localai-bot in #9649
  • chore(deps): bump docs/themes/hugo-theme-relearn from f69a085 to 8bb66fa by @dependabot[bot] in #9665
  • chore: ⬆️ Update ggml-org/llama.cpp to eff06702b2a52e1020ea009ebd86cb9f5acabab5 by @localai-bot in #9637
  • chore: ⬆️ Update ikawrakow/ik_llama.cpp to 45dfd80371785731bc2ed05a76252497a4e7a282 by @localai-bot in #9644
  • chore(deps): bump actions/checkout from 4 to 6 by @dependabot[bot] in #9663
  • chore: ⬆️ Update ggml-org/llama.cpp to bbeb89d76c41bc250f16e4a6fefcc9b530d6e3f3 by @localai-bot in #9676
  • chore: ⬆️ Update ikawrakow/ik_llama.cpp to 8b56d813a9ed04fa7b7fe2588fddd845cf64eccb by @localai-bot in #9677
  • chore: ⬆️ Update TheTom/llama-cpp-turboquant to 69d8e4be47243e83b3d0d71e932bc7aa61c644dc by @localai-bot in #9638
  • chore: ⬆️ Update ggml-org/whisper.cpp to 4bf733672b2871d4153158af4f621a6dd9104f4a by @localai-bot in #9636
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #9700
  • chore: ⬆️ Update ikawrakow/ik_llama.cpp to b93721902b4662f9b973b1c412006081c958d085 by @localai-bot in #9697
  • chore: ⬆️ Update ggml-org/llama.cpp to 2496f9c14965c39589f53eea31bdb6d762b1d360 by @localai-bot in #9698
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 90e87bc846f17059771efb8aaa31e9ef0cab6f78 by @localai-bot in #9701
  • chore(deps): bump openssl from 0.10.76 to 0.10.79 in /backend/rust/kokoros in the cargo group across 1 directory by @dependabot[bot] in #9694
  • chore(deps): bump the go_modules group across 1 directory with 8 updates by @dependabot[bot] in #9705
  • chore: ⬆️ Update ikawrakow/ik_llama.cpp to 9a26522af234f8db079ae3735f35ab6c20fe2c66 by @localai-bot in #9713
  • chore: ⬆️ Update ggml-org/llama.cpp to 05ff59cb57860cc992fc6dcede32c696efea711c by @localai-bot in #9714
  • chore: ⬆️ Update ggml-org/whisper.cpp to c81b2dabbc45484dee2ca6658cfe39c841df5c70 by @localai-bot in #9712
  • chore(deps): bump LocalAGI for collection rehydrate-on-init-failure fix by @localai-bot in #9721
  • chore: ⬆️ Update ikawrakow/ik_llama.cpp to 98950267c67fd95937a54ebd6e3c66cf2679b710 by @localai-bot in #9725
  • chore: ⬆️ Update ggml-org/llama.cpp to 9f5f0e689c9e977e5f23a27e344aa36082f44738 by @localai-bot in #9724
  • chore: ⬆️ Update ikawrakow/ik_llama.cpp to ab0f22b819ac57b7e7484f69c00c10fc755d5c6c by @localai-bot in #9734
  • chore: ⬆️ Update ggml-org/llama.cpp to 00d56b11c3477b99bc18562dc1d1834f0d961778 by @localai-bot in #9733

Other Changes

  • ci: add pre-built base-grpc-builder image infrastructure (PR 1/2) by @localai-bot in #9737
  • ci: refactor llama-cpp variant Dockerfiles to consume prebuilt base-grpc images (PR 2/2) by @localai-bot in #9738
  • chore: ⬆️ Update ggml-org/llama.cpp to 1e5ad35d560b90a8ac447d149c8f8447ae1fcaa0 by @localai-bot in #9739
  • docs(agents): update CI caching docs after the GHA-free-tier migration by @localai-bot in #9742
  • ci: split backend-jobs into single-arch and multi-arch matrices by @localai-bot in #9746
  • chore: ⬆️ Update ggml-org/llama.cpp to 2b2babd1243c67ca811c0a5852cedf92b1a20024 by @localai-bot in #9747
  • chore: ⬆️ Update ikawrakow/ik_llama.cpp to 23127139cb6fa314899c3b5f4935b88b3374c56c by @localai-bot in #9748
  • chore: ⬆️ Update ggml-org/whisper.cpp to c33c5618b72bb345df029b730b36bc0e369845a3 by @localai-bot in #9749
  • chore: ⬆️ Update vllm-project/vllm cu130 wheel to 0.20.2 by @localai-bot in #9750
  • chore: ⬆️ Update ggml-org/llama.cpp to 389ff61d77b5c71cec0cf92fe4e5d01ace80b797 by @localai-bot in #9752

New Contributors

Full Changelog: v4.1.3...v4.2.0
