🦙 LocalAI v2.26.0!

Hey everyone - very excited about this release!

It contains several cleanups, performance improvements and a few breaking changes: old backends that are now superseded have been removed (for example, vall-e-x), while new backends have been added to expand the range of model architectures that LocalAI can support. While most of the changes are tested, if you encounter issues with the new or migrated backends, please file a new issue.

We also now have support for Nvidia L4T devices (for example, Nvidia AGX Orin) with specific container images. See the documentation for more details.

⚠️ Breaking Changes ⚠️

  • Several backends have been dropped and replaced for improved performance and compatibility.
  • Vall-e-x and Openvoice were deprecated and dropped.
  • The stablediffusion-NCN backend was replaced with the stablediffusion-ggml implementation.
  • The deprecated llama-ggml backend has been dropped in favor of GGUF support.
Full details below.

Backends that were dropped:

  • Vall-e-x and Openvoice: These projects went silent, and there are better alternatives now. They have been completely superseded by the CoquiTTS community fork, Kokoro, and OuteTTS.
  • Stablediffusion-NCN: This was the first variant shipped with LocalAI based on the ONNX runtime. It has now been superseded by the stablediffusion-ggml backend, which offers similar capabilities and wider support across more architectures.
  • Llama-ggml backend: This was the pre-GGUF backend, which is now deprecated. Moving forward, LocalAI will support only GGUF models.

Notable Backend Changes:

  • Mamba has moved to the transformers backend.
  • Transformers-Musicgen has moved to the transformers backend.
  • Sentencetransformers has moved to the transformers backend.

While LocalAI will try to alias these backends to the transformers backend automatically, there might be incompatibilities with your configuration files. Please open an issue if you face any problem!

New Backends:

  • Kokoro (TTS): A new backend for text-to-speech.
  • OuteTTS: A TTS backend with voice cloning capabilities.
  • Fast-Whisper: A backend designed for faster whisper model inference.

New Features 🎉

  • Lazy grammars (llama.cpp): Added grammar triggers for llama.cpp. Models trained with specific tokens can now enable grammar generation only when such tokens appear in the output: this allows precise JSON generation for tool calls, while keeping the output unconstrained when the model does not need to answer with a tool. For example, triggers can be specified in the model's config file as follows:
  function:
    grammar:
      triggers:
        word: "<tool_call>"
        at_start: true
  • Function Argument Parsing Using Named Regex: Function arguments can now be parsed with named regular expressions, simplifying function calling (see the illustrative sketch after this list).
  • Support for New Backends: Added Kokoro, OuteTTS, and Fast-Whisper backends.
  • Diffusers Update: Added support for Sana pipelines and image generation option overrides.
  • Machine Tag and Inference Timing: Allows tracking machine performance during inference.
  • Tokenization: Introduced tokenization support for llama.cpp to improve text processing.
  • AVX512: Bundled support for CPUs with the AVX512 instruction set.
  • Nvidia L4T: Support for Nvidia arm64 devices such as the Nvidia AGX Orin. See the documentation for details. TL;DR: you can start a ready-to-go container image with:
docker run -e DEBUG=true \
    -p 8080:8080 \
    -v $PWD/models:/build/models \
    -ti --restart=always --name local-ai \
    --runtime nvidia --gpus all quay.io/go-skynet/local-ai:master-nvidia-l4t-arm64-core
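
To illustrate the idea behind the named-regex argument parsing feature, here is a minimal sketch of the concept (not LocalAI's internal implementation; the <tool_call> wrapper and the group names are assumptions made for this example). A regular expression with named capture groups extracts the function name and its JSON arguments from raw model output in a single pass:

import json
import re

# Hypothetical pattern: the "<tool_call>" wrapper and the group names "name"
# and "arguments" are chosen for this example and are not LocalAI config keys.
TOOL_CALL_RE = re.compile(
    r'<tool_call>\s*\{\s*"name"\s*:\s*"(?P<name>[^"]+)"\s*,'
    r'\s*"arguments"\s*:\s*(?P<arguments>\{.*?\})\s*\}\s*</tool_call>',
    re.DOTALL,
)

def parse_tool_call(model_output: str):
    """Extract the function name and its JSON arguments from raw model output."""
    match = TOOL_CALL_RE.search(model_output)
    if match is None:
        return None  # the model answered conversationally, no tool call
    return match.group("name"), json.loads(match.group("arguments"))

print(parse_tool_call(
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Rome"}}</tool_call>'
))
# -> ('get_weather', {'city': 'Rome'})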

Bug Fixes 🐛

  • Multiple fixes to improve stability, including enabling SYCL support for stablediffusion-ggml and consistent OpenAI stop reason returns.
  • Improved context shift handling for llama.cpp and fixed gallery store overrides.

🧠 Models:

I've fine-tuned a family of models on o1-cot and function-call datasets to work closely with all of LocalAI's function-calling features. The models are tailored to be conversational while reliably executing function calls.

Enjoy! All the models are available in the LocalAI gallery (a quick usage example follows the commands below):

local-ai run LocalAI-functioncall-phi-4-v0.3
local-ai run LocalAI-functioncall-llama3.2-1b-v0.4
local-ai run LocalAI-functioncall-llama3.2-3b-v0.5
local-ai run localai-functioncall-qwen2.5-7b-v0.5
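
As a quick smoke test, these models can be called through LocalAI's OpenAI-compatible chat endpoint. The sketch below assumes a LocalAI instance listening on port 8080 and uses a made-up get_weather tool definition; adjust the model name and tool schema to your setup:

import requests

# Assumes a LocalAI instance started locally, e.g. via
# `local-ai run localai-functioncall-qwen2.5-7b-v0.5`, listening on port 8080.
# The get_weather tool is a made-up example, not part of LocalAI itself.
payload = {
    "model": "localai-functioncall-qwen2.5-7b-v0.5",
    "messages": [{"role": "user", "content": "What's the weather in Rome?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
message = resp.json()["choices"][0]["message"]
# A function-call model should return structured tool_calls rather than plain text.
print(message.get("tool_calls") or message.get("content"))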

Other models

Numerous model updates and additions:

  • New models like nightwing3-10b, rombos-qwen2.5-writer, and negative_llama_70b.
  • Updated checksum for model galleries.
  • Added icons and improved prompt templates for various models.
  • Expanded model gallery with new additions like DeepSeek-R1, Mistral-small-24b, and more.

Full changelog 👇


Breaking Changes 🛠

  • chore(vall-e-x): Drop backend by @mudler in #4619
  • feat(transformers): merge musicgen functionalities to a single backend by @mudler in #4620
  • feat(transformers): merge sentencetransformers backend by @mudler in #4624
  • chore(stablediffusion-ncn): drop in favor of ggml implementation by @mudler in #4652
  • feat(transformers): add support to Mamba by @mudler in #4669
  • chore(openvoice): drop backend by @mudler in #4673
  • chore: drop embedded models by @mudler in #4715
  • chore(llama-ggml): drop deprecated backend by @mudler in #4775
  • fix(llama.cpp): disable mirostat as default by @mudler in #2911

Bug fixes 🐛

  • fix(stablediffusion-ggml): correctly enable sycl by @mudler in #4591
  • fix(stablediffusion-ggml): enable oneapi before build by @mudler in #4593
  • fix(docs): add missing -core suffix to sycl images by @M0Rf30 in #4630
  • fix(stores): Stores fixes and testing by @richiejp in #4663
  • fix(gallery): do not return overrides and additional config by @mudler in #4768
  • fix(openai): consistently return stop reason by @mudler in #4771
  • fix(llama.cpp): improve context shift handling by @mudler in #4820

Exciting New Features 🎉

🧠 Models

  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #4580
  • chore(model gallery): add nightwing3-10b-v0.1 by @mudler in #4582
  • chore(model gallery): add qwq-32b-preview-ideawhiz-v1 by @mudler in #4583
  • chore(model gallery): add rombos-qwen2.5-writer-32b by @mudler in #4584
  • chore(model gallery): add sky-t1-32b-preview by @mudler in #4585
  • chore(model gallery): add negative_llama_70b by @mudler in #4586
  • chore(model gallery): add finemath-llama-3b by @mudler in #4587
  • chore(model gallery): add LocalAI-functioncall-phi-4-v0.1 by @mudler in #4588
  • chore(model gallery): add LocalAI-functioncall-phi-4-v0.2 by @mudler in #4589
  • chore(model gallery): add LocalAI-functioncall-phi-4-v0.3 by @mudler in #4599
  • chore(model gallery): add negative-anubis-70b-v1 by @mudler in #4600
  • chore(model gallery): add qwen2.5-72b-rp-ink by @mudler in #4601
  • chore(model gallery): add steiner-32b-preview by @mudler in #4602
  • chore(model gallery): add qwerus-7b by @mudler in #4609
  • chore(model gallery): add l3.3-ms-nevoria-70b by @mudler in #4610
  • chore(model gallery): add lb-reranker-0.5b-v1.0 by @mudler in #4611
  • chore(model gallery): add uwu-7b-instruct by @mudler in #4613
  • chore(model gallery): add drt-o1-14b by @mudler in #4614
  • chore(model gallery): add vikhr-qwen-2.5-1.5b-instruct by @mudler in #4615
  • chore: remove deprecated tinydream backend by @M0Rf30 in #4631
  • chore(model gallery): add MiniCPM-V-2.6-8b-q4_K_M by @M0Rf30 in #4633
  • chore(model gallery): add InternLM3-8b-Q4_K_M by @M0Rf30 in #4637
  • fix(model gallery): minicpm-v-2.6 is based on qwen2 by @M0Rf30 in #4638
  • chore(model gallery): update icons and add missing ones by @M0Rf30 in #4639
  • chore(model gallery): add wayfarer-12b by @mudler in #4641
  • chore(model gallery): add l3.3-70b-magnum-v4-se by @mudler in #4642
  • chore(model gallery): add l3.3-prikol-70b-v0.2 by @mudler in #4643
  • chore(model gallery): remove dead icons and update LLAVA and DeepSeek ones by @M0Rf30 in #4645
  • chore(model gallery): add sd-3.5-large-ggml by @mudler in #4647
  • chore(model gallery): add Deepseek-R1-Distill models by @M0Rf30 in #4646
  • chore(model gallery): add deepseek-r1-distill-qwen-7b by @mudler in #4660
  • chore(model gallery): add sd-1.5-ggml and sd-3.5-medium-ggml by @mudler in #4664
  • chore(model gallery): add MiniCPM-o-2.6-7.6b by @M0Rf30 in #4676
  • chore(model gallery): add DeepSeek R1 14b, 32b and 70b by @M0Rf30 in #4679
  • chore(model gallery): add flux.1, stablediffusion and whisper icons by @M0Rf30 in #4680
  • chore(model gallery): update deepseek-r1 prompt template by @M0Rf30 in #4686
  • chore(model gallery): add lamarck-14b-v0.7 by @mudler in #4687
  • chore(model gallery): add art-v0-3b by @mudler in #4688
  • chore(model gallery): add chuluun-qwen2.5-72b-v0.08 by @mudler in #4689
  • chore(model gallery): add l3.3-nevoria-r1-70b by @mudler in #4691
  • chore(model gallery): add dumpling-qwen2.5-32b by @mudler in #4692
  • chore(model gallery): add deepseek-r1-qwen-2.5-32b-ablated by @mudler in #4693
  • chore(model gallery): add confucius-o1-14b by @mudler in #4696
  • chore(model gallery): add fuseo1-deepseekr1-qwen2.5-coder-32b-preview-v0.1 by @mudler in #4697
  • chore(model gallery): add fuseo1-deepseekr1-qwen2.5-instruct-32b-preview by @mudler in #4698
  • chore(model gallery): add fuseo1-deepseekr1-qwq-32b-preview by @mudler in #4699
  • chore(model gallery): add specific message templates for llama3.2 based models by @mKenfenheuer in #4707
  • chore(model gallery): add virtuoso-lite by @mudler in #4718
  • chore(model gallery): add selene-1-mini-llama-3.1-8b by @mudler in #4719
  • chore(model gallery): add openthinker-7b by @mudler in #4720
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #4723
  • chore(model gallery): add mistral-small-24b-instruct-2501 by @mudler in #4725
  • chore(model gallery): add tinyswallow-1.5b-instruct by @mudler in #4726
  • chore(model gallery): add taid-llm-1.5b by @mudler in #4727
  • chore(model gallery): add fuseo1-deekseekr1-qwq-skyt1-32b-preview by @mudler in #4731
  • chore(model gallery): add steelskull_l3.3-damascus-r1 by @mudler in #4737
  • chore(model gallery): add thedrummer_gemmasutra-pro-27b-v1.1 by @mudler in #4738
  • chore(model gallery): add uncensoredai_uncensoredlm-deepseek-r1-distill-qwen-14b by @mudler in #4739
  • chore(model gallery): add LocalAI-functioncall-llama3.2-1b-v0.4 by @mudler in #4740
  • chore(model gallery): add fblgit_miniclaus-qw1.5b-unamgs-grpo by @mudler in #4758
  • chore(model gallery): add nohobby_l3.3-prikol-70b-v0.4 by @mudler in #4759
  • chore(model gallery): add suayptalha_maestro-10b by @mudler in #4760
  • chore(model gallery): add agi-0_art-skynet-3b by @mudler in #4763
  • chore(model gallery): add rubenroy_gilgamesh-72b by @mudler in #4764
  • chore(model gallery): add krutrim-ai-labs_krutrim-2-instruct by @mudler in #4765
  • chore(model gallery): add LocalAI-functioncall-llama3.2-3b-v0.5 by @mudler in #4766
  • chore(model gallery): add arliai_llama-3.3-70b-arliai-rpmax-v1.4 by @mudler in #4772
  • chore(model gallery): add tiger-lab_qwen2.5-32b-instruct-cft by @mudler in #4773
  • chore(model gallery): add black-ink-guild_pernicious_prophecy_70b by @mudler in #4774
  • chore(model gallery): add nohobby_l3.3-prikol-70b-v0.5 by @mudler in #4777
  • chore(model gallery): add cognitivecomputations_dolphin3.0-r1-mistral-24b by @mudler in #4778
  • chore(model gallery): add cognitivecomputations_dolphin3.0-mistral-24b by @mudler in #4779
  • chore(model gallery): add sicariussicariistuff_redemption_wind_24b by @mudler in #4781
  • chore(model gallery): add huihui-ai_deepseek-r1-distill-llama-70b-abliterated by @mudler in #4790
  • chore(model gallery): add subtleone_qwen2.5-32b-erudite-writer by @mudler in #4792
  • chore(model gallery): add ilsp_llama-krikri-8b-instruct by @mudler in #4795
  • feat: Centralized Request Processing middleware by @dave-gray101 in #3847
  • chore(model gallery): add localai-functioncall-qwen2.5-7b-v0.5 by @mudler in #4796
  • chore(model gallery): add agentica-org_deepscaler-1.5b-preview by @mudler in #4804
  • chore(model gallery): add simplescaling_s1.1-32b by @mudler in #4812
  • chore(model gallery): add theskullery_l3.3-exp-unnamed-model-70b-v0.5 by @mudler in #4813
  • chore(model gallery): add nvidia_aceinstruct-1.5b by @mudler in #4819
  • chore(model gallery): add nvidia_aceinstruct-7b by @mudler in #4821
  • chore(model gallery): add nvidia_aceinstruct-72b by @mudler in #4822
  • chore(model gallery): add sicariussicariistuff_phi-lthy4 by @mudler in #4826
  • chore(model gallery): add open-thoughts_openthinker-32b by @mudler in #4827
  • chore(model gallery): add nousresearch_deephermes-3-llama-3-8b-preview by @mudler in #4828
  • chore(model gallery): add rombo-org_rombo-llm-v3.0-qwen-32b by @mudler in #4830
  • chore(model gallery): add pygmalionai_eleusis-12b by @mudler in #4832
  • chore(model gallery): add davidbrowne17_llamathink-8b-instruct by @mudler in #4833

📖 Documentation and examples

👒 Dependencies

  • chore: ⬆️ Update ggerganov/llama.cpp to c05e8c9934f94fde49bc1bc9dc51eed282605150 by @localai-bot in #4579
  • chore(deps): bump llama.cpp to '924518e2e5726e81f3aeb2518fb85963a500e… by @mudler in #4592
  • chore(deps): Bump securego/gosec from 2.21.4 to 2.22.0 by @dependabot in #4594
  • chore: ⬆️ Update ggerganov/llama.cpp to 504af20ee4eae72080a56d59d744f6774f7901ce by @localai-bot in #4597
  • chore: ⬆️ Update ggerganov/llama.cpp to b4d92a59a20eea400d8dd30844a339b76210daa0 by @localai-bot in #4606
  • chore: ⬆️ Update ggerganov/llama.cpp to adc5dd92e8aea98f5e7ac84f6e1bc15de35130b5 by @localai-bot in #4612
  • chore: ⬆️ Update ggerganov/llama.cpp to 4dbc8b9cb71876e005724f4e8f73a3544646bcf5 by @localai-bot in #4618
  • chore(deps): Bump scipy from 1.14.0 to 1.15.1 in /backend/python/transformers by @dependabot in #4621
  • chore(llama.cpp): update dependency by @mudler in #4628
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 5eb15ef4d022bef4a391de4f5f6556e81fbb5024 by @localai-bot in #4636
  • chore: ⬆️ Update ggerganov/llama.cpp to a1649cc13f89946322358f92ea268ae1b7b5096c by @localai-bot in #4635
  • chore: ⬆️ Update ggerganov/llama.cpp to 92bc493917d43b83e592349e138b54c90b1c3ea7 by @localai-bot in #4640
  • chore(deps): Bump docs/themes/hugo-theme-relearn from 80e448e to 8dad5ee by @dependabot in #4656
  • chore: ⬆️ Update ggerganov/llama.cpp to aea8ddd5165d525a449e2fc3839db77a71f4a318 by @localai-bot in #4657
  • chore: ⬆️ Update ggerganov/llama.cpp to 6171c9d25820ccf676b243c172868819d882848f by @localai-bot in #4661
  • chore: ⬆️ Update ggerganov/llama.cpp to 6152129d05870cb38162c422c6ba80434e021e9f by @localai-bot in #4668
  • chore(parler-tts): drop backend by @mudler in #4672
  • chore: ⬆️ Update ggerganov/llama.cpp to c5d9effb49649db80a52caf5c0626de6f342f526 by @localai-bot in #4685
  • chore: ⬆️ Update ggerganov/llama.cpp to 26771a1491f3a4c3d5b99c4c267b81aca9a7dfa0 by @localai-bot in #4690
  • chore: ⬆️ Update ggerganov/llama.cpp to 178a7eb952d211b8d4232d5e50ae1b64519172a9 by @localai-bot in #4694
  • chore(deps): Bump sentence-transformers from 3.3.1 to 3.4.0 in /backend/python/transformers by @dependabot in #4702
  • chore(deps): Bump docs/themes/hugo-theme-relearn from 8dad5ee to 5bcb9fe by @dependabot in #4704
  • chore: ⬆️ Update ggerganov/llama.cpp to a4417ddda98fd0558fb4d802253e68a933704b59 by @localai-bot in #4705
  • chore(deps): Bump dependabot/fetch-metadata from 2.2.0 to 2.3.0 by @dependabot in #4701
  • chore: ⬆️ Update ggerganov/llama.cpp to cae9fb4361138b937464524eed907328731b81f6 by @localai-bot in #4711
  • chore: ⬆️ Update ggerganov/llama.cpp to eb7cf15a808d4d7a71eef89cc6a9b96fe82989dc by @localai-bot in #4717
  • chore: ⬆️ Update ggerganov/llama.cpp to 8b576b6c55bc4e6be898b47522f0ef402b93ef62 by @localai-bot in #4722
  • chore: ⬆️ Update ggerganov/llama.cpp to aa6fb1321333fae8853d0cdc26bcb5d438e650a1 by @localai-bot in #4728
  • chore: ⬆️ Update ggerganov/llama.cpp to 53debe6f3c9cca87e9520a83ee8c14d88977afa4 by @localai-bot in #4732
  • chore: ⬆️ Update ggerganov/llama.cpp to 90f9b88afb6447d3929843a2aa98c0f11074762d by @localai-bot in #4736
  • chore(deps): Bump GrantBirki/git-diff-action from 2.7.0 to 2.8.0 by @dependabot in #4746
  • chore: ⬆️ Update ggerganov/llama.cpp to 5598f475be3e31430fbe17ebb85654ec90dc201e by @localai-bot in #4757
  • chore(deps): Bump sentence-transformers from 3.4.0 to 3.4.1 in /backend/python/transformers by @dependabot in #4748
  • chore(deps): Bump docs/themes/hugo-theme-relearn from 5bcb9fe to 66bc366 by @dependabot in #4750
  • chore: ⬆️ Update ggerganov/llama.cpp to 3ec9fd4b77b6aca03a3c2bf678eae3f9517d6904 by @localai-bot in #4762
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to d46ed5e184b97c2018dc2e8105925bdb8775e02c by @localai-bot in #4769
  • chore: ⬆️ Update ggerganov/llama.cpp to d774ab3acc4fee41fbed6dbfc192b57d5f79f34b by @localai-bot in #4770
  • chore: ⬆️ Update ggerganov/llama.cpp to 8a59053f63fffc24e730cd3ea067760abfe4a919 by @localai-bot in #4776
  • chore: ⬆️ Update ggerganov/llama.cpp to d2fe216fb2fb7ca8627618c9ea3a2e7886325780 by @localai-bot in #4780
  • chore: ⬆️ Update ggerganov/llama.cpp to e6e658319952f7ad269dc11275b9edddc721fc6d by @localai-bot in #4787
  • chore: ⬆️ Update ggerganov/llama.cpp to 19d3c8293b1f61acbe2dab1d49a17950fd788a4a by @localai-bot in #4793
  • chore(deps): Bump docs/themes/lotusdocs from f5785a2 to 975da91 by @dependabot in #4801
  • chore: ⬆️ Update ggerganov/llama.cpp to 19b392d58dc08c366d0b29bd3b9c6991fa4e1662 by @localai-bot in #4803
  • chore: ⬆️ Update ggerganov/llama.cpp to 90e4dba461b07e635fd1daf3b491c978c7dd0013 by @localai-bot in #4810
  • chore: ⬆️ Update ggerganov/llama.cpp to 0fb77f821f6e70ad8b8247a97d1022f0fef78991 by @localai-bot in #4814
  • chore: ⬆️ Update ggerganov/llama.cpp to 8a8c4ceb6050bd9392609114ca56ae6d26f5b8f5 by @localai-bot in #4825
  • chore: ⬆️ Update ggerganov/llama.cpp to 300907b2110cc17b4337334dc397e05de2d8f5e0 by @localai-bot in #4829

Other Changes

New Contributors

Full Changelog: v2.25.0...v2.26.0