Welcome to LocalAI 3.8.0!
LocalAI 3.8.0 focuses on smoothing out the user experience and exposing more power to the user without requiring restarts or complex configuration files. This release introduces a new onboarding flow and a universal model loader that handles everything from HF URLs to local files.
We’ve also improved the chat interface, addressed long-standing requests around OpenAI API compatibility (specifically SSE streaming standards), and exposed more granular controls for backends such as llama.cpp, along with better backend management.
📌 TL;DR
| Feature | Summary |
|---|---|
| Universal Model Import | Import directly from Hugging Face, Ollama, OCI, or local paths. Auto-detects backends and handles chat templates. |
| UI & Index Overhaul | New onboarding wizard, auto-model selection on boot, and a cleaner tabular view for model management. |
| MCP Live Streaming | New: Agent actions and tool calls are now streamed live via the Model Context Protocol—see reasoning in real-time. |
| Hot-Reloadable Settings | Modify watchdogs, API keys, P2P settings, and defaults without restarting the container. |
| Chat enhancements | Chat history and parallel conversations are now persisted in local storage. |
| Strict SSE Compliance | Fixed streaming format to exactly match OpenAI specs (resolves issues with LangChain/JS clients). |
| Advanced Config | Fine-tune `context_shift`, `cache_ram`, and parallel workers via YAML options. |
| Logprobs & Logitbias | Added token-level probability support for improved agent/eval workflows. |
Feature Breakdown
🚀 Universal Model Import (URL-based)
We have refactored how models are imported. You no longer need to manually write configuration files for common use cases. The new importer accepts URLs from Hugging Face, Ollama, and OCI registries, as well as local file paths, directly from the web interface.
*(Demo video: import.mp4)*
- Auto-Detection: The system attempts to identify the correct backend (e.g., `llama.cpp` vs `diffusers`) and applies native chat templates (e.g., `llama-3`, `mistral`) automatically by reading the model metadata.
- Customization during Import: You can override defaults immediately, for example, forcing a specific quantization on a GGUF file or selecting `vLLM` over `transformers`.
- Multimodal Support: Vision components (`mmproj`) are detected and configured automatically.
- File Safety: We added a safeguard to prevent the deletion of model files (blobs) if they are shared by multiple model configurations.
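If you prefer scripting to the web UI, the importer is also reachable over HTTP. The following is a minimal sketch, assuming the `/models/apply` endpoint accepts a `url` field as in earlier releases; the payload and the model URI are illustrative, so check the import documentation for the exact contract.

```python
# Minimal sketch: import a model by URI over the LocalAI HTTP API.
# Assumes LocalAI listens on localhost:8080 and that /models/apply
# accepts a "url" field (as in earlier releases); the URI below is a
# placeholder for any supported scheme (Hugging Face, OCI, Ollama,
# or a local path).
import requests

resp = requests.post(
    "http://localhost:8080/models/apply",
    json={"url": "huggingface://TheBloke/Mistral-7B-Instruct-v0.2-GGUF/mistral-7b-instruct-v0.2.Q4_K_M.gguf"},
    timeout=30,
)
resp.raise_for_status()
print(resp.json())  # typically a job handle you can poll for progress
```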
🎨 Complete UI Overhaul
The web interface has been redesigned for better usability and clearer management.
*(Demo video: index.mp4)*
- Onboarding Wizard: A guided flow helps first-time users import or install a model in under 30 seconds.
- Auto-Focus & Selection: The input field captures focus automatically, and a default model is loaded on startup so you don't start in a "no model selected" state.
- Tabular Management: Models and backends are now organized in a cleaner list view, making it easier to see what is installed.
*(Demo video: manage.mp4)*
🤖 Agentic Ecosystem & MCP Live Streaming
LocalAI 3.8.0 significantly upgrades support for agentic workflows using the Model Context Protocol (MCP).
- Live Action Streaming: We have added a new endpoint to stream agent results as they happen. Instead of waiting for the final output, you can now watch the agent "think": tool calls, reasoning steps, and intermediate actions are streamed live in the UI. A client-side sketch follows the demo below.
*(Demo video: mcp.mp4)*
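For programmatic consumers, the sketch below shows one way to read the live stream. The endpoint path and chunk shape are assumptions based on LocalAI's existing OpenAI-style SSE streaming, not a confirmed contract; see the release documentation for the actual endpoint.

```python
# Sketch: watch an agent "think" by consuming the live SSE stream.
# The /mcp/v1/chat/completions path is an assumption based on LocalAI's
# existing MCP support; adjust to the documented endpoint.
import json
import requests

with requests.post(
    "http://localhost:8080/mcp/v1/chat/completions",
    json={
        "model": "my-agent-model",  # placeholder model name
        "stream": True,
        "messages": [{"role": "user", "content": "What's the weather in Rome?"}],
    },
    stream=True,
) as resp:
    for line in resp.iter_lines():
        if not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        chunk = json.loads(payload)
        # Tool calls and reasoning steps arrive as deltas before the final answer.
        print(chunk["choices"][0]["delta"].get("content", ""), end="", flush=True)
```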
Configuring MCP via the interface is now simplified:
*(Demo video: mcp_configuration.mp4)*
🔁 Runtime System Settings
A new Settings > System panel exposes configuration options that previously required environment variables or a restart.
*(Demo video: settings.mp4)*
- Immediate Effect: Toggling Watchdogs, P2P, and Gallery availability applies instantly.
- API Key Management: You can now generate, rotate, and expire API keys via the UI.
- Network: CORS and CSRF settings are now accessible here (note: these specific network settings still require a restart to take effect).
Note: To persist runtime settings on older LocalAI deployments, you must mount the `/configuration` directory from the container image.
⚙️ Advanced llama.cpp Configuration
For power users running large context windows or high-throughput setups, we've exposed additional underlying llama.cpp options in the YAML config. You can now tune context shifting, RAM limits for the KV cache, and parallel worker slots.
```yaml
options:
- context_shift:false                          # disable automatic context shifting
- cache_ram:-1                                 # RAM limit for the prompt cache (-1 = unlimited)
- use_jinja:true                               # use the model's native Jinja chat template
- parallel:2                                   # number of parallel worker slots
- grpc_servers:localhost:50051,localhost:50052 # distribute load across llama.cpp gRPC workers
```

📊 Logprobs & Logitbias Support
This release adds full support for logitbias and logprobs. This is critical for advanced agentic logic, Self-RAG, and evaluating model confidence / hallucination rates. It supports the OpenAI specification.
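Both knobs follow the standard OpenAI request shape, so the stock Python client works unchanged. A minimal sketch (model name and token ID are placeholders; token IDs are tokenizer-specific):

```python
# Sketch: token-level probabilities plus logit biasing via the OpenAI
# Python client pointed at LocalAI. Model name and token ID are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Answer yes or no: is water wet?"}],
    logprobs=True,
    top_logprobs=5,              # up to 5 alternatives per position
    logit_bias={"15043": -100},  # strongly suppress one (placeholder) token ID
)

# Inspect per-token confidence, e.g. for eval or Self-RAG-style gating.
for tok in resp.choices[0].logprobs.content:
    print(tok.token, tok.logprob)
```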
🛠️ Fixes & Improvements
OpenAI Compatibility:
- SSE Streaming: Fixed a critical issue where streaming responses were slightly non-compliant (e.g., sending empty content chunks or missing `finish_reason`). This resolves integration issues with `openai-node`, LangChain, and LlamaIndex; see the sketch after this list.
- Top_N Behavior: In the reranker, `top_n` can now be omitted or set to `0` to return all results, rather than defaulting to an arbitrary limit.
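In practice this means a stock streaming client now sees well-formed chunks and a terminal `finish_reason`. A minimal sketch with the OpenAI Python client (model name is a placeholder):

```python
# Sketch: standard OpenAI-style streaming against LocalAI. Strict clients
# rely on the final chunk carrying finish_reason.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Write a haiku about rain."}],
    stream=True,
)
for chunk in stream:
    choice = chunk.choices[0]
    if choice.delta.content:
        print(choice.delta.content, end="", flush=True)
    if choice.finish_reason is not None:  # set only on the last chunk
        print(f"\n[finish_reason: {choice.finish_reason}]")
```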
General Fixes:
- Model Preview: When downloading, you can now see the actual filename and size before committing to the download.
- Tool Handling: Fixed crashes when tool content is missing or malformed.
- TTS: Fixed dropdown selection states for TTS models.
- Browser Storage: Chat history is now persisted in your browser's local storage. You can switch between parallel chats, rename them, and export them to JSON.
- True Cancellation: Clicking "Stop" during a stream now correctly propagates a cancellation context to the backend (works for `llama.cpp`, `vLLM`, `transformers`, and `diffusers`). This immediately stops generation and frees up resources; a client-side sketch follows this list.
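From a client's perspective, cancellation is simply closing the stream early; the server now aborts generation instead of finishing in the background. A minimal sketch (model name is a placeholder):

```python
# Sketch: early client-side cancellation. Closing the stream drops the
# connection, which LocalAI now propagates to the backend to stop
# generation and free the slot.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="my-model",
    messages=[{"role": "user", "content": "Count to one million."}],
    stream=True,
)
for i, chunk in enumerate(stream):
    print(chunk.choices[0].delta.content or "", end="", flush=True)
    if i >= 20:         # e.g. the user clicked "Stop"
        stream.close()  # close the HTTP connection; generation is aborted
        break
```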
🚀 The Complete Local Stack for Privacy-First AI
| Project | Description |
|---|---|
| LocalAI | The free, Open Source OpenAI alternative. Drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required. |
| LocalAGI | Local AI agent management platform. Drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI. |
| LocalRecall | RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Works alongside LocalAI and LocalAGI. |
❤️ Thank You
Over 35,000 stars and growing. LocalAI is a true FOSS movement — built by contributors, powered by community.
If you believe in privacy-first AI:
- ✅ Star the repo
- 💬 Contribute code, docs, or feedback
- 📣 Share with others
Your support keeps this stack alive.
✅ Full Changelog
What's Changed
Bug fixes 🐛
- fix(reranker): respect `top_n` in the request by @mkhludnev in #7025
- fix(chatterbox): pin numpy by @mudler in #7198
- fix(reranker): support omitting top_n by @mkhludnev in #7199
- fix(api): SSE streaming format to comply with specification by @Copilot in #7182
- fix(edit): propagate correctly opts when reloading by @mudler in #7233
- fix(reranker): llama-cpp sort score desc, crop top_n by @mkhludnev in #7211
- fix: handle tool errors by @mudler in #7271
- fix(reranker): tests and top_n check fix #7212 by @mkhludnev in #7284
- fix the tts model dropdown to show the currently selected model by @ErixM in #7306
- fix: do not delete files if used by other configured models by @mudler in #7235
- fix(llama.cpp): handle corner cases with tool content by @mudler in #7324
Exciting New Features 🎉
- feat(llama.cpp): allow to set cache-ram and ctx_shift by @mudler in #7009
- chore: show success toast when system prompt is updated by @shohidulbari in #7131
- feat(llama.cpp): consolidate options and respect tokenizer template when enabled by @mudler in #7120
- feat: respect context and add request cancellation by @mudler in #7187
- feat(ui): add wizard when p2p is disabled by @mudler in #7218
- feat(ui): chat stats, small visual enhancements by @mudler in #7223
- chore: display file names in model preview by @shohidulbari in #7251
- feat: import models via URI by @mudler in #7245
- chore(importers): small logic enhancements by @mudler in #7262
- feat(ui): allow to cancel ops by @mudler in #7264
- feat: migrate to echo and enable cancellation of non-streaming requests by @mudler in #7270
- feat(mcp): add LocalAI endpoint to stream live results of the agent by @mudler in #7274
- chore: do not use placeholder image by @mudler in #7279
- chore: guide the user to import models by @mudler in #7280
- chore(ui): import vendored libs by @mudler in #7281
- feat(importers): add transformers and vLLM by @mudler in #7278
- feat: restyle index by @mudler in #7282
- feat: add support to logitbias and logprobs by @mudler in #7283
- feat(ui): small refinements by @mudler in #7285
- feat(index): minor enhancements by @mudler in #7288
- chore: scroll in thinking mode, better buttons placement by @mudler in #7289
- chore: small ux enhancements by @mudler in #7290
- feat(ui): add backend reinstall button by @mudler in #7305
- feat(importer): unify importing code with CLI by @mudler in #7299
- feat(ui): runtime settings by @mudler in #7320
- feat(importers): Add diffuser backend importer with ginkgo tests and UI support by @Copilot in #7316
- feat(ui): add chat history by @mudler in #7325
- feat(inpainting): add inpainting endpoint, wire ImageGenerationFunc and return generated image URL by @gmaOCR in #7328
🧠 Models
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #6972
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #6982
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #6989
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7017
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #7024
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7039
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7040
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7068
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7077
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7127
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7133
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7162
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7205
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7216
- chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7237
- chore(model-gallery): ⬆️ update checksum by @localai-bot in #7248
📖 Documentation and examples
- feat: docs revamp by @mudler in #7313
- fix: Update Installer Options URL by @filipeaaoliveira in #7330
👒 Dependencies
- chore(deps): bump github.com/mudler/cogito from 0.4.0 to 0.5.0 by @dependabot[bot] in #7054
- chore(deps): bump github.com/onsi/ginkgo/v2 from 2.26.0 to 2.27.2 by @dependabot[bot] in #7056
- chore(deps): bump github.com/modelcontextprotocol/go-sdk from 1.0.0 to 1.1.0 by @dependabot[bot] in #7053
- chore(deps): bump github.com/valyala/fasthttp from 1.55.0 to 1.68.0 by @dependabot[bot] in #7057
- chore(deps): bump github.com/mudler/edgevpn from 0.31.0 to 0.31.1 by @dependabot[bot] in #7055
- chore(deps): bump github.com/containerd/containerd from 1.7.28 to 1.7.29 in the go_modules group across 1 directory by @dependabot[bot] in #7149
- chore(deps): bump appleboy/ssh-action from 1.2.2 to 1.2.3 by @dependabot[bot] in #7224
- chore(deps): bump github.com/mudler/cogito from 0.5.0 to 0.5.1 by @dependabot[bot] in #7226
- chore(deps): bump github.com/jaypipes/ghw from 0.19.1 to 0.20.0 by @dependabot[bot] in #7227
- chore(deps): bump github.com/docker/docker from 28.5.1+incompatible to 28.5.2+incompatible by @dependabot[bot] in #7228
- chore(deps): bump github.com/testcontainers/testcontainers-go from 0.38.0 to 0.40.0 by @dependabot[bot] in #7230
- chore(deps): bump github.com/ebitengine/purego from 0.9.0 to 0.9.1 by @dependabot[bot] in #7229
- chore(deps): bump fyne.io/fyne/v2 from 2.7.0 to 2.7.1 by @dependabot[bot] in #7293
- chore(deps): bump go.yaml.in/yaml/v2 from 2.4.2 to 2.4.3 by @dependabot[bot] in #7294
- chore(deps): bump github.com/alecthomas/kong from 1.12.1 to 1.13.0 by @dependabot[bot] in #7296
- chore(deps): bump google.golang.org/protobuf from 1.36.8 to 1.36.10 by @dependabot[bot] in #7295
- chore(deps): bump golang.org/x/crypto from 0.43.0 to 0.45.0 in the go_modules group across 1 directory by @dependabot[bot] in #7319
Other Changes
- docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #6996
- chore: ⬆️ Update ggml-org/whisper.cpp to `999a7e0cbf8484dc2cea1e9f855d6b39f34f7ae9` by @localai-bot in #6997
- chore: ⬆️ Update ggml-org/llama.cpp to `2f68ce7cfd20e9e7098514bf730e5389b7bba908` by @localai-bot in #6998
- chore: ⬆️ Update ggml-org/llama.cpp to `cd5e3b57541ecc52421130742f4d89acbcf77cd4` by @localai-bot in #7023
- chore: display warning only when directory is present by @mudler in #7050
- chore: ⬆️ Update ggml-org/llama.cpp to `c5023daf607c578d6344c628eb7da18ac3d92d32` by @localai-bot in #7069
- chore: ⬆️ Update ggml-org/llama.cpp to `ad51c0a720062a04349c779aae301ad65ca4c856` by @localai-bot in #7098
- chore: ⬆️ Update ggml-org/llama.cpp to `a44d77126c911d105f7f800c17da21b2a5b112d1` by @localai-bot in #7125
- chore: ⬆️ Update ggml-org/llama.cpp to `7f09a680af6e0ef612de81018e1d19c19b8651e8` by @localai-bot in #7156
- chore: use air to live reload in dev environment by @shohidulbari in #7186
- chore: ⬆️ Update ggml-org/llama.cpp to `65156105069fa86a4a81b6cb0e8cb583f6420677` by @localai-bot in #7184
- chore: ⬆️ Update ggml-org/llama.cpp to `333f2595a3e0e4c0abf233f2f29ef1710acd134d` by @localai-bot in #7201
- chore: ⬆️ Update ggml-org/llama.cpp to `b8595b16e69e3029e06be3b8f6635f9812b2bc3f` by @localai-bot in #7210
- chore: ⬆️ Update ggml-org/whisper.cpp to `a1867e0dad0b21b35afa43fc815dae60c9a139d6` by @localai-bot in #7231
- chore: ⬆️ Update ggml-org/llama.cpp to `13730c183b9e1a32c09bf132b5367697d6c55048` by @localai-bot in #7232
- chore: ⬆️ Update ggml-org/llama.cpp to `7d019cff744b73084b15ca81ba9916f3efab1223` by @localai-bot in #7247
- feat(swagger): update swagger by @localai-bot in #7267
- chore: ⬆️ Update ggml-org/whisper.cpp to `d9b7613b34a343848af572cc14467fc5e82fc788` by @localai-bot in #7268
- chore(deps): bump llama.cpp to `c4abcb2457217198efdd67d02675f5fddb7071c2` by @mudler in #7266
- chore: ⬆️ Update ggml-org/llama.cpp to `9b17d74ab7d31cb7d15ee7eec1616c3d825a84c0` by @localai-bot in #7273
- feat(swagger): update swagger by @localai-bot in #7276
- chore: ⬆️ Update ggml-org/llama.cpp to `662192e1dcd224bc25759aadd0190577524c6a66` by @localai-bot in #7277
- feat(swagger): update swagger by @localai-bot in #7286
- chore: ⬆️ Update ggml-org/llama.cpp to `80deff3648b93727422461c41c7279ef1dac7452` by @localai-bot in #7287
- chore(docs): improve documentation and split into sections bigger topics by @mudler in #7292
- chore: ⬆️ Update ggml-org/whisper.cpp to `b12abefa9be2abae39a73fa903322af135024a36` by @localai-bot in #7300
- chore: ⬆️ Update ggml-org/llama.cpp to `cb623de3fc61011e5062522b4d05721a22f2e916` by @localai-bot in #7301
- chore(deps): bump llama.cpp to `10e9780154365b191fb43ca4830659ef12def80f` by @mudler in #7311
- chore: ⬆️ Update ggml-org/llama.cpp to `7d77f07325985c03a91fa371d0a68ef88a91ec7f` by @localai-bot in #7314
- chore: ⬆️ Update ggml-org/whisper.cpp to `19ceec8eac980403b714d603e5ca31653cd42a3f` by @localai-bot in #7321
- chore(docs): add documentation about import by @mudler in #7315
- chore: ⬆️ Update ggml-org/llama.cpp to `dd0f3219419b24740864b5343958a97e1b3e4b26` by @localai-bot in #7322
- chore(chatterbox): bump l4t index to support more recent pytorch by @mudler in #7332
- chore: ⬆️ Update ggml-org/llama.cpp to `23bc779a6e58762ea892eca1801b2ea1b9050c00` by @localai-bot in #7331
- Revert "chore(chatterbox): bump l4t index to support more recent pytorch" by @mudler in #7333
New Contributors
- @shohidulbari made their first contribution in #7131
- @mkhludnev made their first contribution in #7025
- @ErixM made their first contribution in #7306
- @filipeaaoliveira made their first contribution in #7330
Full Changelog: v3.7.0...v3.8.0
