🎉 LocalAI 3.10.0 Release! 🚀




LocalAI 3.10.0 is big on agent capabilities, multi-modal support, and cross-platform reliability.

We've added native Anthropic API support, launched a new Video Generation UI, introduced Open Responses API compatibility, and enhanced performance with a unified GPU backend system.

For a full tour, see below!


📌 TL;DR

  • Anthropic API Support: fully compatible /v1/messages endpoint for a seamless drop-in replacement for Claude.
  • Open Responses API: native support for stateful agents with tool calling, streaming, background mode, and multi-turn conversations; passes all official acceptance tests.
  • Video & Image Generation Suite: new video generation UI plus LTX-2 support for text-to-video and image-to-video.
  • Unified GPU Backends: GPU libraries (CUDA, ROCm, Vulkan) are packaged inside backend containers, so acceleration works out of the box on Nvidia, AMD, and ARM64 (experimental).
  • Tool Streaming & XML Parsing: full support for streaming tool calls and XML-formatted tool outputs.
  • System-Aware Backend Gallery: only shows backends your system can run (e.g., MLX is hidden on Linux).
  • Crash Fixes: prevents crashes on AVX-only CPUs (Intel Sandy/Ivy Bridge) and fixes VRAM reporting on AMD GPUs.
  • Request Tracing: debug agents and fine-tuning with in-memory request/response logging.
  • Moonshine Backend: ultra-fast transcription engine for low-end devices.
  • Pocket-TTS: lightweight, high-fidelity text-to-speech with voice cloning.
  • Vulkan arm64 Builds: backends and images are now also built for Vulkan on arm64.

🚀 New Features & Major Enhancements

🤖 Open Responses API: Build Smarter, Autonomous Agents

LocalAI now supports the OpenAI Responses API, enabling powerful agentic workflows locally.

  • Stateful conversations via response_id — resume and manage long-running agent sessions.
  • Background mode: Run agents asynchronously and fetch results later.
  • Streaming support for tools, images, and audio.
  • Built-in tools: Web search, file search, and computer use (via MCP integrations).
  • Multi-turn interaction with dynamic context and tool use.

✅ Ideal for developers building agents that can browse, analyze files, or interact with systems — all on your local machine.

🔧 How to Use:

  • Set response_id in your request to maintain session state across calls.
  • Use background: true to run agents asynchronously.
  • Retrieve results via GET /api/v1/responses/{response_id}.
  • Enable streaming with stream: true to receive partial responses and tool calls in real time.

📌 Tip: Use response_id to build agent orchestration systems that persist context and avoid redundant computation.
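
A minimal end-to-end sketch of those steps in Python, using requests; the host, port, and model name are assumptions, and the create endpoint is assumed to mirror the OpenAI spec at POST /v1/responses:

```python
# Sketch of the background-agent flow, assuming a LocalAI instance on
# localhost:8080 and an OpenAI-style POST /v1/responses endpoint; the
# model name is a placeholder for one installed from the gallery.
import time

import requests

BASE = "http://localhost:8080"

# Start an asynchronous agent run; background mode returns immediately.
created = requests.post(
    f"{BASE}/v1/responses",
    json={
        "model": "qwen3-4b",  # placeholder model name
        "input": "Summarize the LocalAI 3.10.0 release in one paragraph.",
        "background": True,
    },
).json()
response_id = created["id"]

# Poll the retrieval endpoint listed above until the run finishes.
while True:
    result = requests.get(f"{BASE}/api/v1/responses/{response_id}").json()
    if result.get("status") not in ("queued", "in_progress"):
        break
    time.sleep(1)

print(result)
```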

Our support passes all the official acceptance tests.

🧠 Anthropic Messages API: Clone Claude Locally

LocalAI now fully supports the Anthropic Messages API.

  • Use your LocalAI instance's /v1/messages endpoint as a drop-in replacement for Claude.
  • Full tool/function calling support, just like OpenAI.
  • Streaming and non-streaming responses.
  • Compatible with anthropic-sdk-go, LangChain, and other tooling.

🔥 Perfect for teams migrating from Anthropic to local inference with full feature parity.
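
For illustration, a hedged Python sketch of a Messages call against a local instance (host, port, and model name are assumptions):

```python
# Hedged sketch: an Anthropic-style Messages request against LocalAI.
# Host, port, and model name are assumptions; adjust for your setup.
import requests

resp = requests.post(
    "http://localhost:8080/v1/messages",
    json={
        "model": "qwen3-4b",  # placeholder model name
        "max_tokens": 256,    # required field in the Anthropic spec
        "messages": [
            {"role": "user", "content": "Hello from a local Claude client!"}
        ],
    },
).json()

# Anthropic responses carry a list of content blocks.
print(resp["content"][0]["text"])
```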


🎥 Video Generation: From Text to Video in the Web UI

  • New dedicated video generation page with intuitive controls.
  • LTX-2 is now supported.
  • Supports text-to-video and image-to-video workflows.
  • Built on top of the diffusers backend for full compatibility.

📌 How to Use:

  • Go to /video in the web UI.
  • Enter a prompt (e.g., "A cat walking on a moonlit rooftop").
  • Optionally upload an image for image-to-video generation.
  • Adjust parameters like fps, num_frames, and guidance_scale.

⚙️ Unified GPU Backends: Acceleration Works Out of the Box

A major architectural upgrade: GPU libraries (CUDA, ROCm, Vulkan) are now packaged inside backend containers.

  • Single image: you no longer need to pull a GPU-specific image for your hardware; any image works whether or not you have a GPU.
  • No more manual GPU driver setup — just run the image and get acceleration.
  • Works on Nvidia (CUDA), AMD (ROCm), and ARM64 (Vulkan).
  • Vulkan builds for arm64 are now enabled.
  • Reduced image complexity, faster builds, and consistent performance.

🚀 This means latest/master images now support GPU acceleration on all platforms — no extra config!

Note: this is experimental; please help us by filing an issue if something doesn't work!


🧩 Tool Streaming & Advanced Parsing

Enhance your agent workflows with richer tool interaction.

  • Streaming tool calls: Receive partial tool arguments in real time (e.g., input_json_delta).
  • XML-style tool call parsing: Models that return tools in XML format (<function>...</function>) are now properly parsed alongside text.
  • Works across all backends (llama.cpp, vLLM, diffusers, etc.).

💡 Enables more natural, real-time interaction with agents that use structured tool outputs.
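
As an illustration, here is how streamed tool-call arguments can be accumulated with the official openai Python client; the base URL, model, and tool definition are assumptions:

```python
# Sketch: collecting streamed tool-call argument fragments via the
# OpenAI-compatible API. Model and tool names are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="qwen3-4b",  # placeholder model name
    messages=[{"role": "user", "content": "What's the weather in Rome?"}],
    tools=[{
        "type": "function",
        "function": {
            "name": "get_weather",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
    stream=True,
)

arguments = ""
for chunk in stream:
    if not chunk.choices:
        continue
    delta = chunk.choices[0].delta
    if delta.tool_calls:
        # Argument fragments arrive incrementally while the model streams.
        arguments += delta.tool_calls[0].function.arguments or ""

print(arguments)  # e.g. {"city": "Rome"}
```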


🌐 System-Aware Backend Gallery: Only Compatible Backends Show

The backend gallery now shows only backends your system can run.

  • Auto-detects system capabilities (CPU, GPU, MLX, etc.).
  • Hides unsupported backends (e.g., MLX on Linux, CUDA on AMD).
  • Shows detected capabilities in the hero section.

🎤 New TTS Backends: Pocket-TTS

Add expressive voice generation to your apps with Pocket-TTS.

  • Real-time text-to-speech with voice cloning support (requires HF login).
  • Lightweight, fast, and open-source.
  • Available in the model gallery.

🗣️ Perfect for voice agents, narrators, or interactive assistants.
Note: Voice cloning requires HF authentication and a registered voice model.
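
A hedged sketch of synthesizing speech through the OpenAI-compatible speech endpoint; the endpoint path and model name are assumptions, so check them against your instance:

```python
# Sketch: synthesize speech with Pocket-TTS through the OpenAI-compatible
# endpoint. The endpoint path and model name are assumptions.
import requests

resp = requests.post(
    "http://localhost:8080/v1/audio/speech",
    json={"model": "pocket-tts", "input": "Hello from LocalAI!"},
)
with open("speech.wav", "wb") as f:
    f.write(resp.content)  # raw audio bytes
```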


🔍 Request Tracing: Debug Your Agents

Trace requests and responses in memory — great for fine-tuning and agent debugging.

  • Enable via runtime setting or API.
  • Logs are stored in memory and dropped once the maximum size is reached.
  • Fetch logs via GET /api/v1/trace.
  • Export to JSON for analysis.
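
For example, exporting the in-memory trace to a file for later analysis (a sketch; the output filename is arbitrary):

```python
# Sketch: fetch the in-memory request/response trace and save it as JSON.
import json

import requests

trace = requests.get("http://localhost:8080/api/v1/trace").json()
with open("trace.json", "w") as f:
    json.dump(trace, f, indent=2)
```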

🪄 New 'Reasoning' Field: Extract Thinking Steps

LocalAI now automatically detects and extracts thinking tags from model output.

  • Supports both SSE and non-SSE modes.
  • Displays reasoning steps in the chat UI (under "Thinking" tab).
  • Fixes an issue where thinking content appeared as part of the final answer.
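
In non-streaming mode the extracted steps can be read alongside the final answer; a sketch assuming the 'reasoning' field sits on the assistant message object:

```python
# Sketch: reading the extracted reasoning next to the final answer.
# Assumes the 'reasoning' field is attached to the assistant message.
import requests

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "qwen3-4b",  # placeholder reasoning-capable model
        "messages": [{"role": "user", "content": "Is 97 prime?"}],
    },
).json()

message = resp["choices"][0]["message"]
print("Reasoning:", message.get("reasoning"))  # extracted thinking steps
print("Answer:", message["content"])           # answer with thinking removed
```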

🚀 Moonshine Backend: Faster Transcription for Low-End Devices

Adds Moonshine, an ONNX-based transcription engine, for fast, lightweight speech-to-text.

  • Optimized for low-end devices (Raspberry Pi, older laptops).
  • One of the fastest transcription engines available.
  • Supports live transcription.
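
A hedged sketch using the OpenAI-compatible transcription endpoint; the model name is a placeholder for a Moonshine model installed from the gallery:

```python
# Sketch: transcribe a local audio file through the OpenAI-compatible
# endpoint; the model name is a placeholder.
import requests

with open("sample.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8080/v1/audio/transcriptions",
        files={"file": ("sample.wav", f, "audio/wav")},
        data={"model": "moonshine"},
    )
print(resp.json()["text"])
```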

🛠️ Fixes & Stability Improvements

🔧 Prevent BMI2 Crashes on AVX-Only CPUs

Fixed crashes on older Intel CPUs (Ivy Bridge, Sandy Bridge) that lack BMI2 instructions.

  • Now safely falls back to llama-cpp-fallback (SSE2 only).
  • No more EOF errors during model warmup.

✅ Ensures LocalAI runs smoothly on older hardware.


📊 Fix Swapped VRAM Usage on AMD GPUs

rocm-smi output is now parsed correctly, so used and total VRAM are reported accurately.

  • Fixes misreported memory usage on dual-Radeon setups.
  • Handles HIP_VISIBLE_DEVICES properly (e.g., when using only discrete GPU).

🚀 The Complete Local Stack for Privacy-First AI


LocalAI

The free, Open Source OpenAI alternative. Drop-in replacement REST API compatible with OpenAI specifications for local AI inferencing. No GPU required.

Link: https://github.com/mudler/LocalAI


LocalAGI

Local AI agent management platform. Drop-in replacement for OpenAI's Responses API, supercharged with advanced agentic capabilities and a no-code UI.

Link: https://github.com/mudler/LocalAGI


LocalRecall

RESTful API and knowledge base management system providing persistent memory and storage capabilities for AI agents. Works alongside LocalAI and LocalAGI.

Link: https://github.com/mudler/LocalRecall


❤️ Thank You

LocalAI is a true FOSS movement — built by contributors, powered by community.

If you believe in privacy-first AI:

  • ⭐ Star the repo
  • 💬 Contribute code, docs, or feedback
  • 📣 Share with others

Your support keeps this stack alive.


✅ Full Changelog


What's Changed

Bug fixes 🐛

  • fix(ui): correctly parse import errors by @mudler in #7726
  • fix(cli): import via CLI needs system state by @mudler in #7746
  • fix(amd-gpu): correctly show total and used vram by @mudler in #7761
  • fix: add nil checks before mergo.Merge to prevent panic in gallery model installation by @majiayu000 in #7785
  • fix: Usage for image generation is incorrect (and causes error in LiteLLM) by @majiayu000 in #7786
  • fix: propagate validation errors by @majiayu000 in #7787
  • fix: Failed to download checksums.txt when using launch to install localai by @majiayu000 in #7788
  • fix(image-gen): fix scrolling issues by @mudler in #7829
  • fix(llama.cpp/mmproj): fix loading mmproj in nested sub-dirs different from model path by @mudler in #7832
  • fix: Prevent BMI2 instruction crash on AVX-only CPUs by @coffeerunhobby in #7817
  • fix: Highly inconsistent agent response to cogito agent calling MCP server - Body "Invalid http method" by @majiayu000 in #7790
  • fix(chat/ui): record model name in history for consistency by @mudler in #7845
  • fix(ui): fix 404 on API menu link by pointing to index.html by @DEVMANISHOFFL in #7878
  • fix: BMI2 crash on AVX-only CPUs (Intel Ivy Bridge/Sandy Bridge) by @coffeerunhobby in #7864
  • fix(model): do not assume success when deleting a model process by @jroeber in #7963
  • fix(functions): do not duplicate function when valid JSON is inside XML tags by @mudler in #8043

Exciting New Features 🎉

  • feat: disable force eviction by @mudler in #7725
  • feat(api): Allow tracing of requests and responses by @richiejp in #7609
  • feat(UI): image generation improvements by @mudler in #7804
  • feat(image-gen/UI): move controls to the left, make the page more compact by @mudler in #7823
  • feat(function): Add tool streaming, XML Tool Call Parsing Support by @mudler in #7865
  • chore: Update to Ubuntu24.04 (cont #7423) by @richiejp in #7769
  • feat: package GPU libraries inside backend containers for unified base image by @Copilot in #7891
  • feat(backends): add moonshine backend for faster transcription by @mudler in #7833
  • feat: enable Vulkan arm64 image builds by @Copilot in #7912
  • feat: Add Anthropic Messages API support by @Copilot in #7948
  • feat: add tool/function calling support to Anthropic Messages API by @Copilot in #7956
  • feat(api): support 'reasoning' api field by @mudler in #7959
  • feat: Filter backend gallery by system capabilities by @Copilot in #7950
  • feat(tts): add pocket-tts backend by @mudler in #8018
  • feat(diffusers): add support to LTX-2 by @mudler in #8019
  • feat(ui): add video gen UI by @mudler in #8020
  • feat(api): add support for open responses specification by @mudler in #8063

🧠 Models

  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7801
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7807
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7816
  • Fix(gallery): Updated checksums for qwen3-vl-30b instruct & thinking by @Nold360 in #7819
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #7821
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7826
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7831
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7840
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7903
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7916
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7922
  • chore(model gallery): 🤖 add 1 new models via gallery agent by @localai-bot in #7954
  • chore(model gallery): add qwen3-coder-30b-a3b-instruct based on model request by @rampa3 in #8082

📖 Documentation and examples

  • chore(AGENTS.md): Add section to help with building backends by @richiejp in #7871
  • [gallery] add JSON schema for gallery model specification by @DEVMANISHOFFL in #7890
  • chore(doc): put alert on install.sh until is fixed by @mudler in #8042

👒 Dependencies

  • chore(deps): bump securego/gosec from 2.22.9 to 2.22.11 by @dependabot[bot] in #7774
  • chore(deps): bump google.golang.org/grpc from 1.77.0 to 1.78.0 by @dependabot[bot] in #7777
  • chore(deps): bump github.com/schollz/progressbar/v3 from 3.18.0 to 3.19.0 by @dependabot[bot] in #7775
  • chore(deps): bump github.com/modelcontextprotocol/go-sdk from 1.1.0 to 1.2.0 by @dependabot[bot] in #7776
  • chore(deps): bump dependabot/fetch-metadata from 2.4.0 to 2.5.0 by @dependabot[bot] in #7876
  • chore(deps): bump github.com/labstack/echo/v4 from 4.14.0 to 4.15.0 by @dependabot[bot] in #7875
  • chore(deps): bump protobuf from 6.33.2 to 6.33.4 in /backend/python/transformers by @dependabot[bot] in #7993
  • chore(deps): bump github.com/mudler/go-processmanager from 0.0.0-20240820160718-8b802d3ecf82 to 0.1.0 by @dependabot[bot] in #7992
  • chore(deps): bump github.com/onsi/gomega from 1.38.3 to 1.39.0 by @dependabot[bot] in #8000
  • chore(deps): bump github.com/gpustack/gguf-parser-go from 0.22.1 to 0.23.1 by @dependabot[bot] in #8001
  • chore(deps): bump fyne.io/fyne/v2 from 2.7.1 to 2.7.2 by @dependabot[bot] in #8003
  • chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.3 to 2.27.5 by @dependabot[bot] in #8004
  • chore(deps): bump torch from 2.3.1+cxx11.abi to 2.8.0 in /backend/python/rerankers in the pip group across 1 directory by @dependabot[bot] in #8066

Other Changes

  • docs: ⬆️ update docs version mudler/LocalAI by @localai-bot in #7716
  • chore: ⬆️ Update ggml-org/whisper.cpp to 6114e692136bea917dc88a5eb2e532c3d133d963 by @localai-bot in #7717
  • chore: ⬆️ Update ggml-org/llama.cpp to c18428423018ed214c004e6ecaedb0cbdda06805 by @localai-bot in #7718
  • chore: ⬆️ Update ggml-org/llama.cpp to 85c40c9b02941ebf1add1469af75f1796d513ef4 by @localai-bot in #7731
  • chore: ⬆️ Update ggml-org/llama.cpp to 7ac8902133da6eb390c4d8368a7d252279123942 by @localai-bot in #7740
  • chore: ⬆️ Update ggml-org/llama.cpp to a4bf35889eda36d3597cd0f8f333f5b8a2fcaefc by @localai-bot in #7751
  • chore: ⬆️ Update ggml-org/llama.cpp to 4ffc47cb2001e7d523f9ff525335bbe34b1a2858 by @localai-bot in #7760
  • chore(ci): be more precise when detecting existing models by @mudler in #7767
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 4ff2c8c74bd17c2cfffe3a01be77743fb3efba2f by @richiejp in #7771
  • chore: ⬆️ Update ggml-org/llama.cpp to c9a3b40d6578f2381a1373d10249403d58c3c5bd by @localai-bot in #7778
  • Revert "chore(deps): bump securego/gosec from 2.22.9 to 2.22.11" by @mudler in #7789
  • feat(swagger): update swagger by @localai-bot in #7794
  • chore: ⬆️ Update ggml-org/llama.cpp to 0f89d2ecf14270f45f43c442e90ae433fd82dab1 by @localai-bot in #7795
  • chore: ⬆️ Update ggml-org/whisper.cpp to e9898ddfb908ffaa7026c66852a023889a5a7202 by @localai-bot in #7810
  • chore: ⬆️ Update ggml-org/llama.cpp to 13814eb370d2f0b70e1830cc577b6155b17aee47 by @localai-bot in #7809
  • feat(swagger): update swagger by @localai-bot in #7820
  • chore: ⬆️ Update ggml-org/llama.cpp to ced765be44ce173c374f295b3c6f4175f8fd109b by @localai-bot in #7822
  • chore: ⬆️ Update ggml-org/llama.cpp to 706e3f93a60109a40f1224eaf4af0d59caa7c3ae by @localai-bot in #7836
  • feat(swagger): update swagger by @localai-bot in #7847
  • chore: ⬆️ Update ggml-org/llama.cpp to e57f52334b2e8436a94f7e332462dfc63a08f995 by @localai-bot in #7848
  • chore(Makefile): refactor common make targets by @mudler in #7858
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to b90b1ee9cf84ea48b478c674dd2ec6a33fd504d6 by @localai-bot in #7862
  • chore: ⬆️ Update ggml-org/llama.cpp to 4974bf53cf14073c7b66e1151348156aabd42cb8 by @localai-bot in #7861
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to c5602a676caff5fe5a9f3b76b2bc614faf5121a5 by @localai-bot in #7880
  • chore: ⬆️ Update ggml-org/whisper.cpp to 679bdb53dbcbfb3e42685f50c7ff367949fd4d48 by @localai-bot in #7879
  • chore: ⬆️ Update ggml-org/llama.cpp to e443fbcfa51a8a27b15f949397ab94b5e87b2450 by @localai-bot in #7881
  • chore(image-ui): simplify interface by @mudler in #7882
  • chore(llama.cpp/flags): simplify conditionals by @mudler in #7887
  • chore: ⬆️ Update ggml-org/llama.cpp to ccbc84a5374bab7a01f68b129411772ddd8e7c79 by @localai-bot in #7894
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 9be0b91927dfa4007d053df72dea7302990226bb by @localai-bot in #7895
  • chore(dockerfile): drop driver-requirements section by @mudler in #7907
  • chore(detection): detect GPU vendor from files present in the system by @mudler in #7908
  • chore(ci): restore building of GPU vendor images by @mudler in #7910
  • chore(Dockerfile): restore GPU vendor specific sections by @mudler in #7911
  • fix(intel): Add ARG for Ubuntu codename in Dockerfile by @mudler in #7917
  • chore: ⬆️ Update ggml-org/llama.cpp to ae9f8df77882716b1702df2bed8919499e64cc28 by @localai-bot in #7915
  • chore(ci): use latest jetpack image for l4t by @mudler in #7926
  • chore(l4t-12): do not use python 3.12 (wheels are only for 3.10) by @mudler in #7928
  • chore(docs): Add Crush and VoxInput to the integrations by @richiejp in #7924
  • Optimize GPU library copying to preserve symlinks and avoid duplicates by @Copilot in #7931
  • chore(uv): add --index-strategy=unsafe-first-match to l4t by @mudler in #7934
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 0e52afc6513cc2dea9a1a017afc4a008d5acf2b0 by @localai-bot in #7930
  • chore(ci): roll back l4t-cuda12 configurations by @mudler in #7935
  • Revert "chore(uv): add --index-strategy=unsafe-first-match to l4t" by @mudler in #7936
  • chore(deps): Bump llama.cpp to '480160d47297df43b43746294963476fc0a6e10f' by @mudler in #7933
  • chore(llama.cpp): propagate errors during model load by @mudler in #7937
  • chore: ⬆️ Update ggml-org/llama.cpp to 593da7fa49503b68f9f01700be9f508f1e528992 by @localai-bot in #7946
  • feat(swagger): update swagger by @localai-bot in #7964
  • chore: ⬆️ Update ggml-org/llama.cpp to b1377188784f9aea26b8abde56d4aee8c733eec7 by @localai-bot in #7965
  • fix(l4t-12): use pip to install python deps by @mudler in #7967
  • chore: ⬆️ Update ggml-org/llama.cpp to 0c3b7a9efebc73d206421c99b7eb6b6716231322 by @localai-bot in #7978
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 885e62ea822e674c6837a8225d2d75f021b97a6a by @localai-bot in #7979
  • chore(backends): do not bundle cuda target directory by @mudler in #7982
  • chore(vulkan): bump vulkan-sdk to 1.4.335.0 by @mudler in #7981
  • chore: ⬆️ Update ggml-org/llama.cpp to bcf7546160982f56bc290d2e538544bbc0772f63 by @localai-bot in #7991
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 7010bb4dff7bd55b03d35ef9772142c21699eba9 by @localai-bot in #8013
  • chore: ⬆️ Update ggml-org/whisper.cpp to a96310871a3b294f026c3bcad4e715d17b5905fe by @localai-bot in #8014
  • chore: ⬆️ Update ggml-org/llama.cpp to e4832e3ae4d58ac0ecbdbf4ae055424d6e628c9f by @localai-bot in #8015
  • chore: ⬆️ Update ggml-org/whisper.cpp to 47af2fb70f7e4ee1ba40c8bed513760fdfe7a704 by @localai-bot in #8039
  • chore: ⬆️ Update ggml-org/llama.cpp to d98b548120eecf98f0f6eaa1ba7e29b3afda9f2e by @localai-bot in #8040
  • fix: reduce log verbosity for /api/operations polling by @Divyanshupandey007 in #8050
  • chore: ⬆️ Update ggml-org/whisper.cpp to 2eeeba56e9edd762b4b38467bab96c2517163158 by @localai-bot in #8052
  • chore: ⬆️ Update ggml-org/llama.cpp to 785a71008573e2d84728fb0ba9e851d72d3f8fab by @localai-bot in #8053
  • fix(ci): use more beefy runner for expensive jobs by @mudler in #8065
  • Revert "chore(deps): bump torch from 2.3.1+cxx11.abi to 2.8.0 in /backend/python/rerankers in the pip group across 1 directory" by @mudler in #8072
  • chore: ⬆️ Update ggml-org/llama.cpp to 388ce822415f24c60fcf164a321455f1e008cafb by @localai-bot in #8073
  • chore: ⬆️ Update ggml-org/whisper.cpp to f53dc74843e97f19f94a79241357f74ad5b691a6 by @localai-bot in #8074
  • chore(ui): add video generation link by @mudler in #8079
  • chore: ⬆️ Update ggml-org/llama.cpp to 2fbde785bc106ae1c4102b0e82b9b41d9c466579 by @localai-bot in #8087
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 9565c7f6bd5fcff124c589147b2621244f2c4aa1 by @localai-bot in #8086

Full Changelog: v3.9.0...v3.10.0
