Welcome to LocalAI 3.8.0!
LocalAI 3.8.0 focuses on smoothing out the user experience and exposing more power to the user without requiring restarts or complex configuration files. This release introduces a new onboarding flow and a universal model loader that handles everything from HF URLs to local files.
We’ve also improved the chat interface, addressed long-standing requests around OpenAI API compatibility (specifically SSE streaming standards), and exposed more granular controls for backends such as llama.cpp, along with better backend management.
📌 TL;DR
| Feature | Summary |
|---|---|
| Universal Model Import | Import directly from Hugging Face, Ollama, OCI, or local paths. Auto-detects backends and handles chat templates. |
| UI & Index Overhaul | New onboarding wizard, auto-model selection on boot, and a cleaner tabular view for model management. |
| MCP Live Streaming | New: Agent actions and tool calls are now streamed live via the Model Context Protocol—see reasoning in real-time. |
| Hot-Reloadable Settings | Modify watchdogs, API keys, P2P settings, and defaults without restarting the container. |
| Chat enhancements | Chat history and parallel conversations are now persisted in local storage. |
| Strict SSE Compliance | Fixed streaming format to exactly match OpenAI specs (resolves issues with LangChain/JS clients). |
| Advanced Config | Fine-tune `context_shift`, `cache_ram`, and parallel workers via YAML options. |
| Logprobs & Logitbias | Added token-level probability support for improved agent/eval workflows. |
Feature Breakdown
🚀 Universal Model Import (URL-based)
We have refactored how models are imported. You no longer need to manually write configuration files for common use cases. The new importer accepts URLs from Hugging Face, Ollama, and OCI registries, as well as local file paths, directly from the web interface.
*(Demo video: import.mp4)*
- Auto-Detection: The system attempts to identify the correct backend (e.g., `llama.cpp` vs `diffusers`) and applies native chat templates (e.g., `llama-3`, `mistral`) automatically by reading the model metadata.
- Customization during Import: You can override defaults immediately, for example, forcing a specific quantization on a GGUF file or selecting `vLLM` over `transformers`.
- Multimodal Support: Vision components (`mmproj`) are detected and configured automatically.
- File Safety: We added a safeguard to prevent the deletion of model files (blobs) if they are shared by multiple model configurations.
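If you prefer scripting to the web UI, the importer is also reachable over HTTP. The following is a minimal sketch, assuming the `/models/apply` endpoint accepts a `url` field as in earlier releases; the payload and the model URI are illustrative, so check the import documentation for the exact contract.

```python
# Minimal sketch: import a model by URI over the LocalAI HTTP API.
# Assumes LocalAI listens on localhost:8080 and that /models/apply
# accepts a "url" field (as in earlier releases); the URI below is a
# placeholder for any supported scheme (Hugging Face, OCI, Ollama,
# or a local path).
import requests

resp = requests.post(
    "http://localhost:8080/models/apply",
    json={"url": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # typically a job handle you can poll for progress
```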
🎨 Complete UI Overhaul
The web interface has been redesigned for better usability and clearer management.
*(Demo video: index.mp4)*
- Onboarding Wizard: A guided flow helps first-time users import or install a model in under 30 seconds.
- Auto-Focus & Selection: The input field captures focus automatically, and a default model is loaded on startup so you don't start in a "no model selected" state.
- Tabular Management: Models and backends are now organized in a cleaner list view, making it easier to see what is installed.
*(Demo video: manage.mp4)*
🤖 Agentic Ecosystem & MCP Live Streaming
LocalAI 3.8.0 significantly upgrades support for agentic workflows using the Model Context Protocol (MCP).
- Live Action Streaming: We have added a new endpoint to stream agent results as they happen. Instead of waiting for the final output, you can now watch the agent "think": tool calls, reasoning steps, and intermediate actions are streamed live in the UI. A client-side sketch follows the demo below.
*(Demo video: mcp.mp4)*
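For programmatic consumers, the sketch below shows one way to read the live stream. The endpoint path and chunk shape are assumptions based on LocalAI's existing OpenAI-style SSE streaming, not a confirmed contract; see the release documentation for the actual endpoint.

```python
# Sketch: watch an agent "think" by consuming the live SSE stream.
# The /mcp/v1/chat/completions path is an assumption based on LocalAI's
# existing MCP support; adjust to the documented endpoint.
import json
import requests

with requests.post(
    "http://localhost:8080/mcp/v1/chat/completions",
    json={
        "model": "my-agent-model",  # placeholder model name
        "stream": True,
        "messages": [{"role": "user", "content": "What's the weather in Rome?"}],
    },
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        # Tool calls and reasoning steps arrive as deltas before the final answer.
        print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
```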
Configuring MCP via the interface is now simplified:
*(Demo video: mcp_configuration.mp4)*
🔁 Runtime System Settings
A new Settings > System panel exposes configuration options that previously required environment variables or a restart.
*(Demo video: settings.mp4)*
- Immediate Effect: Toggling Watchdogs, P2P, and Gallery availability applies instantly.
- API Key Management: You can now generate, rotate, and expire API keys via the UI.
- Network: CORS and CSRF settings are now accessible here (note: these specific network settings still require a restart to take effect).
Note: To persist runtime settings on older LocalAI deployments, you must mount the `/configuration` directory from the container image.
⚙️ Advanced llama.cpp Configuration
For power users running large context windows or high-throughput setups, we've exposed additional underlying llama.cpp options in the YAML config. You can now tune context shifting, RAM limits for the KV cache, and parallel worker slots.
```yaml
options:
- context_shift:false                          # disable automatic context shifting
- cache_ram:-1                                 # RAM limit for the prompt cache (-1 = unlimited)
- use_jinja:true                               # use the model's native Jinja chat template
- parallel:2                                   # number of parallel worker slots
- grpc_servers:localhost:50051,localhost:50052 # distribute load across llama.cpp gRPC workers
```

📊 Logprobs & Logitbias Support
This release adds full support for logitbias and logprobs. This is critical for advanced agentic logic, Self-RAG, and evaluating model confidence / hallucination rates. It supports the OpenAI specification.
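Both knobs follow the standard OpenAI request shape, so the stock Python client works unchanged. A minimal sketch (model name and token ID are placeholders; token IDs are tokenizer-specific):

```python
# Sketch: token-level probabilities plus logit biasing via the OpenAI
# Python client pointed at LocalAI. Model name and token ID are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Answer yes or no: is water wet?"}],
    logprobs=True,
    top_logprobs=5,              # up to 5 alternatives per position
    logit_bias={"15043": -100},  # strongly suppress one (placeholder) token ID
)

# Inspect per-token confidence, e.g. for eval or Self-RAG-style gating.
for tok in resp.choices[0].logprobs.content:
    print(tok.token, tok.logprob)
```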
🛠️ Fixes & Improvements
OpenAI Compatibility:
- SSE Streaming: Fixed a critical issue where streaming responses were slightly non-compliant (e.g., sending empty content chunks or missing `finish_reason`). This resolves integration issues with `openai-node`, LangChain, and LlamaIndex; see the sketch after this list.
- Top_N Behavior: In the reranker, `top_n` can now be omitted or set to `0` to return all results, rather than defaulting to an arbitrary limit.
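In practice this means a stock streaming client now sees well-formed chunks and a terminal `finish_reason`. A minimal sketch with the OpenAI Python client (model name is a placeholder):

```python
# Sketch: standard OpenAI-style streaming against LocalAI. Strict clients
# rely on the final chunk carrying finish_reason.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Write a haiku about rain."}],
    stream=True,
)
for chunk in stream:
    choice = chunk.choices[0]
    if choice.delta.content:
        print(choice.delta.content, end="", flush=True)
    if choice.finish_reason is not None:  # set only on the last chunk
        print(f"\n[finish_reason: {choice.finish_reason}]")
```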
General Fixes:
- Model Preview: When downloading, you can now see the actual filename and size before committing to the download.
- Tool Handling: Fixed crashes when tool content is missing or malformed.
- TTS: Fixed dropdown selection states for TTS models.
- Browser Storage: Chat history is now persisted in your browser's local storage. You can switch between parallel chats, rename them, and export them to JSON.
- True Cancellation: Clicking "Stop" during a stream now correctly propagates a cancellation context to the backend (works for `llama.cpp`, `vLLM`, `transformers`, and `diffusers`). This immediately stops generation and frees up resources; a client-side sketch follows this list.
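From a client's perspective, cancellation is simply closing the stream early; the server now aborts generation instead of finishing in the background. A minimal sketch (model name is a placeholder):

```python
# Sketch: early client-side cancellation. Closing the stream drops the
# connection, which LocalAI now propagates to the backend to stop
# generation and free the slot.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Count to one million."}],
    stream=True,
)
for i, chunk in enumerate(stream):
    print(chunk.choices[0].delta.content or "", end="", flush=True)
    if i >= 20:         # e.g. the user clicked "Stop"
        stream.close()  # close the HTTP connection; generation is aborted
        break
```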
🚀 The Complete Local Stack for Privacy-First AI
| Project | Description |
|---|---|
| LocalAI | The free, Open Source OpenAI alternative. Drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required. |
| LocalAGI | Local AI agent management platform. Drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI. |
| LocalRecall | RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Works alongside LocalAI and LocalAGI. |
❤️ Thank You
Over 35,000 stars and growing. LocalAI is a true FOSS movement — built by contributors, powered by community.
If you believe in privacy-first AI:
- ✅ Star the repo
- 💬 Contribute code, docs, or feedback
- 📣 Share with others
Your support keeps this stack alive.
✅ Full Changelog
What's Changed
Bug fixes 🐛
- fix(reranker): respect `top_n` in the request by @mkhludnev in #7025
- fix(chatterbox): pin numpy by @mudler in #7198
- fix(reranker): support omitting top_n by @mkhludnev in #7199
- fix(api): SSE streaming format to comply with specification by @Copilot in #7182
- fix(edit): propagate correctly opts when reloading by @mudler in #7233
- fix(reranker): llama-cpp sort score desc, crop top_n by @mkhludnev in #7211
- fix: handle tool errors by @mudler in #7271
- fix(reranker): tests and top_n check fix #7212 by @mkhludnev in #7284
- fix the tts model dropdown to show the currently selected model by @ErixM in #7306
- fix: do not delete files if used by other configured models by @mudler in #7235
- fix(llama.cpp): handle corner cases with tool content by @mudler in #7324
Exciting New Features 🎉
- feat(llama.cpp): allow to set cache-ram and ctx_shift by @mudler in #7009
- chore: show success toast when system prompt is updated by @shohidulbari in #7131
- feat(llama.cpp): consolidate options and respect tokenizer template when enabled by @mudler in #7120
- feat: respect context and add request cancellation by @mudler in #7187
- feat(ui): add wizard when p2p is disabled by @mudler in #7218
- feat(ui): chat stats, small visual enhancements by @mudler in #7223
- chore: display file names in model preview by @shohidulbari in #7251
- feat: import models via URI by @mudler in #7245
- chore(importers): small logic enhancements by @mudler in #7262
- feat(ui): allow to cancel ops by @mudler in #7264
- feat: migrate to echo and enable cancellation of non-streaming requests by @mudler in #7270
- feat(mcp): add LocalAI endpoint to stream live results of the agent by @mudler in #7274
- chore: do not use placeholder image by @mudler in #7279
- chore: guide the user to import models by @mudler in #7280
- chore(ui): import vendored libs by @mudler in #7281
- feat(importers): add transformers and vLLM by @mudler in #7278
- feat: restyle index by @mudler in #7282
- feat: add support to logitbias and logprobs by @mudler in #7283
- feat(ui): small refinements by @mudler in #7285
- feat(index): minor enhancements by @mudler in #7288
- chore: scroll in thinking mode, better buttons placement by @mudler in #7289
- chore: small ux enhancements by @mudler in #7290
- feat(ui): add backend reinstall button by @mudler in #7305
- feat(importer): unify importing code with CLI by @mudler in #7299
- feat(ui): runtime settings by @mudler in #7320
- feat(importers): Add diffuser backend importer with ginkgo tests and UI support by @Copilot in #7316
- feat(ui): add chat history by @mudler in #7325
- feat(inpainting): add inpainting endpoint, wire ImageGenerationFunc and return generated image URL by @gmaOCR in #7328
🧠 Models
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #6972
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #6982
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #6989
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7017
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #7024
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7039
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7040
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7068
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7077
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7127
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7133
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7162
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7205
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7216
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7237
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #7248
📖 Documentation and examples
- feat: docs revamp by @mudler in #7313
- fix: Update Installer Options URL by @filipeaaoliveira in #7330
👒 Dependencies
- chore(deps): bump github.com/mudler/cogito from 0.4.0 to 0.5.0 by @dependabot[bot] in #7054
- chore(deps): bump github.com/onsi/ginkgo/v2 from 2.26.0 to 2.27.2 by @dependabot[bot] in #7056
- chore(deps): bump github.com/modelcontextprotocol/go-sdk from 1.0.0 to 1.1.0 by @dependabot[bot] in #7053
- chore(deps): bump github.com/valyala/fasthttp from 1.55.0 to 1.68.0 by @dependabot[bot] in #7057
- chore(deps): bump github.com/mudler/edgevpn from 0.31.0 to 0.31.1 by @dependabot[bot] in #7055
- chore(deps): bump github.com/containerd/containerd from 1.7.28 to 1.7.29 in the go_modules group across 1 directory by @dependabot[bot] in #7149
- chore(deps): bump appleboy/ssh-action from 1.2.2 to 1.2.3 by @dependabot[bot] in #7224
- chore(deps): bump github.com/mudler/cogito from 0.5.0 to 0.5.1 by @dependabot[bot] in #7226
- chore(deps): bump github.com/jaypipes/ghw from 0.19.1 to 0.20.0 by @dependabot[bot] in #7227
- chore(deps): bump github.com/docker/docker from 28.5.1+incompatible to 28.5.2+incompatible by @dependabot[bot] in #7228
- chore(deps): bump github.com/testcontainers/testcontainers-go from 0.38.0 to 0.40.0 by @dependabot[bot] in #7230
- chore(deps): bump github.com/ebitengine/purego from 0.9.0 to 0.9.1 by @dependabot[bot] in #7229
- chore(deps): bump fyne.io/fyne/v2 from 2.7.0 to 2.7.1 by @dependabot[bot] in #7293
- chore(deps): bump go.yaml.in/yaml/v2 from 2.4.2 to 2.4.3 by @dependabot[bot] in #7294
- chore(deps): bump github.com/alecthomas/kong from 1.12.1 to 1.13.0 by @dependabot[bot] in #7296
- chore(deps): bump google.golang.org/protobuf from 1.36.8 to 1.36.10 by @dependabot[bot] in #7295
- chore(deps): bump golang.org/x/crypto from 0.43.0 to 0.45.0 in the go_modules group across 1 directory by @dependabot[bot] in #7319
Other Changes
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #6996
- chore: ⬆️ Update ggml-org/whisper.cpp to `999a7e0cbf8484dc2cea1e9f855d6b39f34f7ae9` by @localai-bot in #6997
- chore: ⬆️ Update ggml-org/llama.cpp to `2f68ce7cfd20e9e7098514bf730e5389b7bba908` by @localai-bot in #6998
- chore: ⬆️ Update ggml-org/llama.cpp to `cd5e3b57541ecc52421130742f4d89acbcf77cd4` by @localai-bot in #7023
- chore: display warning only when directory is present by @mudler in #7050
- chore: ⬆️ Update ggml-org/llama.cpp to `c5023daf607c578d6344c628eb7da18ac3d92d32` by @localai-bot in #7069
- chore: ⬆️ Update ggml-org/llama.cpp to `ad51c0a720062a04349c779aae301ad65ca4c856` by @localai-bot in #7098
- chore: ⬆️ Update ggml-org/llama.cpp to `a44d77126c911d105f7f800c17da21b2a5b112d1` by @localai-bot in #7125
- chore: ⬆️ Update ggml-org/llama.cpp to `7f09a680af6e0ef612de81018e1d19c19b8651e8` by @localai-bot in #7156
- chore: use air to live reload in dev environment by @shohidulbari in #7186
- chore: ⬆️ Update ggml-org/llama.cpp to `65156105069fa86a4a81b6cb0e8cb583f6420677` by @localai-bot in #7184
- chore: ⬆️ Update ggml-org/llama.cpp to `333f2595a3e0e4c0abf233f2f29ef1710acd134d` by @localai-bot in #7201
- chore: ⬆️ Update ggml-org/llama.cpp to `b8595b16e69e3029e06be3b8f6635f9812b2bc3f` by @localai-bot in #7210
- chore: ⬆️ Update ggml-org/whisper.cpp to `a1867e0dad0b21b35afa43fc815dae60c9a139d6` by @localai-bot in #7231
- chore: ⬆️ Update ggml-org/llama.cpp to `13730c183b9e1a32c09bf132b5367697d6c55048` by @localai-bot in #7232
- chore: ⬆️ Update ggml-org/llama.cpp to `7d019cff744b73084b15ca81ba9916f3efab1223` by @localai-bot in #7247
- feat(swagger): update swagger by @localai-bot in #7267
- chore: ⬆️ Update ggml-org/whisper.cpp to `d9b7613b34a343848af572cc14467fc5e82fc788` by @localai-bot in #7268
- chore(deps): bump llama.cpp to `c4abcb2457217198efdd67d02675f5fddb7071c2` by @mudler in #7266
- chore: ⬆️ Update ggml-org/llama.cpp to `9b17d74ab7d31cb7d15ee7eec1616c3d825a84c0` by @localai-bot in #7273
- feat(swagger): update swagger by @localai-bot in #7276
- chore: ⬆️ Update ggml-org/llama.cpp to `662192e1dcd224bc25759aadd0190577524c6a66` by @localai-bot in #7277
- feat(swagger): update swagger by @localai-bot in #7286
- chore: ⬆️ Update ggml-org/llama.cpp to `80deff3648b93727422461c41c7279ef1dac7452` by @localai-bot in #7287
- chore(docs): improve documentation and split into sections bigger topics by @mudler in #7292
- chore: ⬆️ Update ggml-org/whisper.cpp to `b12abefa9be2abae39a73fa903322af135024a36` by @localai-bot in #7300
- chore: ⬆️ Update ggml-org/llama.cpp to `cb623de3fc61011e5062522b4d05721a22f2e916` by @localai-bot in #7301
- chore(deps): bump llama.cpp to `10e9780154365b191fb43ca4830659ef12def80f` by @mudler in #7311
- chore: ⬆️ Update ggml-org/llama.cpp to `7d77f07325985c03a91fa371d0a68ef88a91ec7f` by @localai-bot in #7314
- chore: ⬆️ Update ggml-org/whisper.cpp to `19ceec8eac980403b714d603e5ca31653cd42a3f` by @localai-bot in #7321
- chore(docs): add documentation about import by @mudler in #7315
- chore: ⬆️ Update ggml-org/llama.cpp to `dd0f3219419b24740864b5343958a97e1b3e4b26` by @localai-bot in #7322
- chore(chatterbox): bump l4t index to support more recent pytorch by @mudler in #7332
- chore: ⬆️ Update ggml-org/llama.cpp to `23bc779a6e58762ea892eca1801b2ea1b9050c00` by @localai-bot in #7331
- Revert "chore(chatterbox): bump l4t index to support more recent pytorch" by @mudler in #7333
New Contributors
- @shohidulbari made their first contribution in #7131
- @mkhludnev made their first contribution in #7025
- @ErixM made their first contribution in #7306
- @filipeaaoliveira made their first contribution in #7330
Full Changelog: v3.7.0...v3.8.0
