🦙 LocalAI v2.26.0!

Hey everyone - very excited about this release!

It contains several cleanups, performance improvements and a few breaking changes: old backends that are now superseded have been removed (for example, vall-e-x), while new backends have been added to expand the range of model architectures that LocalAI can support. While most of the changes are tested, if you encounter issues with the new or migrated backends, please file a new issue.

We also now have support for Nvidia L4T devices (for example, Nvidia AGX Orin) with specific container images. See the documentation for more details.

⚠️ Breaking Changes ⚠️

  • Several backends have been dropped and replaced for improved performance and compatibility.
  • Vall-e-x and Openvoice were deprecated and dropped.
  • The stablediffusion-NCN backend was replaced with the stablediffusion-ggml implementation.
  • The deprecated llama-ggml backend has been dropped in favor of GGUF support.
Full details below.

Backends that were dropped:

  • Vall-e-x and Openvoice: These projects went silent, and there are better alternatives now. They have been completely superseded by the CoquiTTS community fork, Kokoro, and OuteTTS.
  • Stablediffusion-NCN: This was the first variant shipped with LocalAI based on the ONNX runtime. It has now been superseded by the stablediffusion-ggml backend, which offers similar capabilities and wider support across more architectures.
  • Llama-ggml backend: This was the pre-GGUF backend, which is now deprecated. Moving forward, LocalAI will support only GGUF models.

Notable Backend Changes:

  • Mamba has moved to the transformers backend.
  • Transformers-Musicgen has moved to the transformers backend.
  • Sentencetransformers has moved to the transformers backend.

While LocalAI will try to alias these backends to the transformers backend automatically, there might be incompatibilities with your configuration files. Please open an issue if you face any problem!

New Backends:

  • Kokoro (TTS): A new backend for text-to-speech.
  • OuteTTS: A TTS backend with voice cloning capabilities.
  • Fast-Whisper: A backend designed for faster whisper model inference.

New Features 🎉

  • Lazy grammars (llama.cpp): Added grammar triggers for llama.cpp. Models trained with specific tokens can now enable grammar generation only when such tokens appear in the output: this allows precise JSON generation for tool calls, while keeping the output unconstrained when the model does not need to answer with a tool. For example, triggers can be specified in the model's config file as follows:
  function:
    grammar:
      triggers:
        word: "<tool_call>"
        at_start: true
  • Function Argument Parsing Using Named Regex: Function arguments can now be parsed with named regular expressions, simplifying function calling (see the illustrative sketch after this list).
  • Support for New Backends: Added Kokoro, OuteTTS, and Fast-Whisper backends.
  • Diffusers Update: Added support for Sana pipelines and image generation option overrides.
  • Machine Tag and Inference Timing: Allows tracking machine performance during inference.
  • Tokenization: Introduced tokenization support for llama.cpp to improve text processing.
  • AVX512: Bundled support for CPUs with the AVX512 instruction set.
  • Nvidia L4T: Support for Nvidia arm64 devices such as the Nvidia AGX Orin. See the documentation for details. TL;DR: you can start a ready-to-go container image with:
docker run -e DEBUG=true \
    -p 8080:8080 \
    -v $PWD/models:/build/models \
    -ti --restart=always --name local-ai \
    --runtime nvidia --gpus all quay.io/go-skynet/local-ai:master-nvidia-l4t-arm64-core
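
To illustrate the idea behind the named-regex argument parsing feature, here is a minimal sketch of the concept (not LocalAI's internal implementation; the <tool_call> wrapper and the group names are assumptions made for this example). A regular expression with named capture groups extracts the function name and its JSON arguments from raw model output in a single pass:

import json
import re

# Hypothetical pattern: the "<tool_call>" wrapper and the group names "name"
# and "arguments" are chosen for this example and are not LocalAI config keys.
TOOL_CALL_RE = re.compile(
    r'<tool_call>\s*\{\s*"name"\s*:\s*"(?P<name>[^"]+)"\s*,'
    r'\s*"arguments"\s*:\s*(?P<arguments>\{.*?\})\s*\}\s*</tool_call>',
    re.DOTALL,
)

def parse_tool_call(model_output: str):
    """Extract the function name and its JSON arguments from raw model output."""
    match = TOOL_CALL_RE.search(model_output)
    if match is None:
        return None  # the model answered conversationally, no tool call
    return match.group("name"), json.loads(match.group("arguments"))

print(parse_tool_call(
    '<tool_call>{"name": "get_weather", "arguments": {"city": "Rome"}}</tool_call>'
))
# -> ('get_weather', {'city': 'Rome'})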

Bug Fixes 🐛

  • Multiple fixes to improve stability, including enabling SYCL support for stablediffusion-ggml and consistent OpenAI stop reason returns.
  • Improved context shift handling for llama.cpp and fixed gallery store overrides.

🧠 Models:

I've fine-tuned a family of models on o1-cot and function-call datasets to work closely with all of LocalAI's function-calling features. The models are tailored to be conversational while reliably executing function calls.

Enjoy! All the models are available in the LocalAI gallery (a quick usage example follows the commands below):

local-ai run LocalAI-functioncall-phi-4-v0.3
local-ai run LocalAI-functioncall-llama3.2-1b-v0.4
local-ai run LocalAI-functioncall-llama3.2-3b-v0.5
local-ai run localai-functioncall-qwen2.5-7b-v0.5
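
As a quick smoke test, these models can be called through LocalAI's OpenAI-compatible chat endpoint. The sketch below assumes a LocalAI instance listening on port 8080 and uses a made-up get_weather tool definition; adjust the model name and tool schema to your setup:

import requests

# Assumes a LocalAI instance started locally, e.g. via
# `local-ai run localai-functioncall-qwen2.5-7b-v0.5`, listening on port 8080.
# The get_weather tool is a made-up example, not part of LocalAI itself.
payload = {
    "model": "localai-functioncall-qwen2.5-7b-v0.5",
    "messages": [{"role": "user", "content": "What's the weather in Rome?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
message = resp.json()["choices"][0]["message"]
# A function-call model should return structured tool_calls rather than plain text.
print(message.get("tool_calls") or message.get("content"))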

Other models

Numerous model updates and additions:

  • New models like nightwing3-10b, rombos-qwen2.5-writer, and negative_llama_70b.
  • Updated checksum for model galleries.
  • Added icons and improved prompt templates for various models.
  • Expanded model gallery with new additions like DeepSeek-R1, Mistral-small-24b, and more.

Full changelog 👇


Breaking Changes 🛠

  • chore(vall-e-x): Drop backend by @mudler in #4619
  • feat(transformers): merge musicgen functionalities to a single backend by @mudler in #4620
  • feat(transformers): merge sentencetransformers backend by @mudler in #4624
  • chore(stablediffusion-ncn): drop in favor of ggml implementation by @mudler in #4652
  • feat(transformers): add support to Mamba by @mudler in #4669
  • chore(openvoice): drop backend by @mudler in #4673
  • chore: drop embedded models by @mudler in #4715
  • chore(llama-ggml): drop deprecated backend by @mudler in #4775
  • fix(llama.cpp): disable mirostat as default by @mudler in #2911

Bug fixes 🐛

  • fix(stablediffusion-ggml): correctly enable sycl by @mudler in #4591
  • fix(stablediffusion-ggml): enable oneapi before build by @mudler in #4593
  • fix(docs): add missing -core suffix to sycl images by @M0Rf30 in #4630
  • fix(stores): Stores fixes and testing by @richiejp in #4663
  • fix(gallery): do not return overrides and additional config by @mudler in #4768
  • fix(openai): consistently return stop reason by @mudler in #4771
  • fix(llama.cpp): improve context shift handling by @mudler in #4820

Exciting New Features 🎉

🧠 Models

  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #4580
  • chore(model gallery): add nightwing3-10b-v0.1 by @mudler in #4582
  • chore(model gallery): add qwq-32b-preview-ideawhiz-v1 by @mudler in #4583
  • chore(model gallery): add rombos-qwen2.5-writer-32b by @mudler in #4584
  • chore(model gallery): add sky-t1-32b-preview by @mudler in #4585
  • chore(model gallery): add negative_llama_70b by @mudler in #4586
  • chore(model gallery): add finemath-llama-3b by @mudler in #4587
  • chore(model gallery): add LocalAI-functioncall-phi-4-v0.1 by @mudler in #4588
  • chore(model gallery): add LocalAI-functioncall-phi-4-v0.2 by @mudler in #4589
  • chore(model gallery): add LocalAI-functioncall-phi-4-v0.3 by @mudler in #4599
  • chore(model gallery): add negative-anubis-70b-v1 by @mudler in #4600
  • chore(model gallery): add qwen2.5-72b-rp-ink by @mudler in #4601
  • chore(model gallery): add steiner-32b-preview by @mudler in #4602
  • chore(model gallery): add qwerus-7b by @mudler in #4609
  • chore(model gallery): add l3.3-ms-nevoria-70b by @mudler in #4610
  • chore(model gallery): add lb-reranker-0.5b-v1.0 by @mudler in #4611
  • chore(model gallery): add uwu-7b-instruct by @mudler in #4613
  • chore(model gallery): add drt-o1-14b by @mudler in #4614
  • chore(model gallery): add vikhr-qwen-2.5-1.5b-instruct by @mudler in #4615
  • chore: remove deprecated tinydream backend by @M0Rf30 in #4631
  • chore(model gallery): add MiniCPM-V-2.6-8b-q4_K_M by @M0Rf30 in #4633
  • chore(model gallery): add InternLM3-8b-Q4_K_M by @M0Rf30 in #4637
  • fix(model gallery): minicpm-v-2.6 is based on qwen2 by @M0Rf30 in #4638
  • chore(model gallery): update icons and add missing ones by @M0Rf30 in #4639
  • chore(model gallery): add wayfarer-12b by @mudler in #4641
  • chore(model gallery): add l3.3-70b-magnum-v4-se by @mudler in #4642
  • chore(model gallery): add l3.3-prikol-70b-v0.2 by @mudler in #4643
  • chore(model gallery): remove dead icons and update LLAVA and DeepSeek ones by @M0Rf30 in #4645
  • chore(model gallery): add sd-3.5-large-ggml by @mudler in #4647
  • chore(model gallery): add Deepseek-R1-Distill models by @M0Rf30 in #4646
  • chore(model gallery): add deepseek-r1-distill-qwen-7b by @mudler in #4660
  • chore(model gallery): add sd-1.5-ggml and sd-3.5-medium-ggml by @mudler in #4664
  • chore(model gallery): add MiniCPM-o-2.6-7.6b by @M0Rf30 in #4676
  • chore(model gallery): add DeepSeek R1 14b, 32b and 70b by @M0Rf30 in #4679
  • chore(model gallery): add flux.1, stablediffusion and whisper icons by @M0Rf30 in #4680
  • chore(model gallery): update deepseek-r1 prompt template by @M0Rf30 in #4686
  • chore(model gallery): add lamarck-14b-v0.7 by @mudler in #4687
  • chore(model gallery): add art-v0-3b by @mudler in #4688
  • chore(model gallery): add chuluun-qwen2.5-72b-v0.08 by @mudler in #4689
  • chore(model gallery): add l3.3-nevoria-r1-70b by @mudler in #4691
  • chore(model gallery): add dumpling-qwen2.5-32b by @mudler in #4692
  • chore(model gallery): add deepseek-r1-qwen-2.5-32b-ablated by @mudler in #4693
  • chore(model gallery): add confucius-o1-14b by @mudler in #4696
  • chore(model gallery): add fuseo1-deepseekr1-qwen2.5-coder-32b-preview-v0.1 by @mudler in #4697
  • chore(model gallery): add fuseo1-deepseekr1-qwen2.5-instruct-32b-preview by @mudler in #4698
  • chore(model gallery): add fuseo1-deepseekr1-qwq-32b-preview by @mudler in #4699
  • chore(model gallery): add specific message templates for llama3.2 based models by @mKenfenheuer in #4707
  • chore(model gallery): add virtuoso-lite by @mudler in #4718
  • chore(model gallery): add selene-1-mini-llama-3.1-8b by @mudler in #4719
  • chore(model gallery): add openthinker-7b by @mudler in #4720
  • chore(model-gallery): ⬆️ update checksum by @localai-bot in #4723
  • chore(model gallery): add mistral-small-24b-instruct-2501 by @mudler in #4725
  • chore(model gallery): add tinyswallow-1.5b-instruct by @mudler in #4726
  • chore(model gallery): add taid-llm-1.5b by @mudler in #4727
  • chore(model gallery): add fuseo1-deekseekr1-qwq-skyt1-32b-preview by @mudler in #4731
  • chore(model gallery): add steelskull_l3.3-damascus-r1 by @mudler in #4737
  • chore(model gallery): add thedrummer_gemmasutra-pro-27b-v1.1 by @mudler in #4738
  • chore(model gallery): add uncensoredai_uncensoredlm-deepseek-r1-distill-qwen-14b by @mudler in #4739
  • chore(model gallery): add LocalAI-functioncall-llama3.2-1b-v0.4 by @mudler in #4740
  • chore(model gallery): add fblgit_miniclaus-qw1.5b-unamgs-grpo by @mudler in #4758
  • chore(model gallery): add nohobby_l3.3-prikol-70b-v0.4 by @mudler in #4759
  • chore(model gallery): add suayptalha_maestro-10b by @mudler in #4760
  • chore(model gallery): add agi-0_art-skynet-3b by @mudler in #4763
  • chore(model gallery): add rubenroy_gilgamesh-72b by @mudler in #4764
  • chore(model gallery): add krutrim-ai-labs_krutrim-2-instruct by @mudler in #4765
  • chore(model gallery): add LocalAI-functioncall-llama3.2-3b-v0.5 by @mudler in #4766
  • chore(model gallery): add arliai_llama-3.3-70b-arliai-rpmax-v1.4 by @mudler in #4772
  • chore(model gallery): add tiger-lab_qwen2.5-32b-instruct-cft by @mudler in #4773
  • chore(model gallery): add black-ink-guild_pernicious_prophecy_70b by @mudler in #4774
  • chore(model gallery): add nohobby_l3.3-prikol-70b-v0.5 by @mudler in #4777
  • chore(model gallery): add cognitivecomputations_dolphin3.0-r1-mistral-24b by @mudler in #4778
  • chore(model gallery): add cognitivecomputations_dolphin3.0-mistral-24b by @mudler in #4779
  • chore(model gallery): add sicariussicariistuff_redemption_wind_24b by @mudler in #4781
  • chore(model gallery): add huihui-ai_deepseek-r1-distill-llama-70b-abliterated by @mudler in #4790
  • chore(model gallery): add subtleone_qwen2.5-32b-erudite-writer by @mudler in #4792
  • chore(model gallery): add ilsp_llama-krikri-8b-instruct by @mudler in #4795
  • feat: Centralized Request Processing middleware by @dave-gray101 in #3847
  • chore(model gallery): add localai-functioncall-qwen2.5-7b-v0.5 by @mudler in #4796
  • chore(model gallery): add agentica-org_deepscaler-1.5b-preview by @mudler in #4804
  • chore(model gallery): add simplescaling_s1.1-32b by @mudler in #4812
  • chore(model gallery): add theskullery_l3.3-exp-unnamed-model-70b-v0.5 by @mudler in #4813
  • chore(model gallery): add nvidia_aceinstruct-1.5b by @mudler in #4819
  • chore(model gallery): add nvidia_aceinstruct-7b by @mudler in #4821
  • chore(model gallery): add nvidia_aceinstruct-72b by @mudler in #4822
  • chore(model gallery): add sicariussicariistuff_phi-lthy4 by @mudler in #4826
  • chore(model gallery): add open-thoughts_openthinker-32b by @mudler in #4827
  • chore(model gallery): add nousresearch_deephermes-3-llama-3-8b-preview by @mudler in #4828
  • chore(model gallery): add rombo-org_rombo-llm-v3.0-qwen-32b by @mudler in #4830
  • chore(model gallery): add pygmalionai_eleusis-12b by @mudler in #4832
  • chore(model gallery): add davidbrowne17_llamathink-8b-instruct by @mudler in #4833

📖 Documentation and examples

👒 Dependencies

  • chore: ⬆️ Update ggerganov/llama.cpp to c05e8c9934f94fde49bc1bc9dc51eed282605150 by @localai-bot in #4579
  • chore(deps): bump llama.cpp to '924518e2e5726e81f3aeb2518fb85963a500e… by @mudler in #4592
  • chore(deps): Bump securego/gosec from 2.21.4 to 2.22.0 by @dependabot in #4594
  • chore: ⬆️ Update ggerganov/llama.cpp to 504af20ee4eae72080a56d59d744f6774f7901ce by @localai-bot in #4597
  • chore: ⬆️ Update ggerganov/llama.cpp to b4d92a59a20eea400d8dd30844a339b76210daa0 by @localai-bot in #4606
  • chore: ⬆️ Update ggerganov/llama.cpp to adc5dd92e8aea98f5e7ac84f6e1bc15de35130b5 by @localai-bot in #4612
  • chore: ⬆️ Update ggerganov/llama.cpp to 4dbc8b9cb71876e005724f4e8f73a3544646bcf5 by @localai-bot in #4618
  • chore(deps): Bump scipy from 1.14.0 to 1.15.1 in /backend/python/transformers by @dependabot in #4621
  • chore(llama.cpp): update dependency by @mudler in #4628
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to 5eb15ef4d022bef4a391de4f5f6556e81fbb5024 by @localai-bot in #4636
  • chore: ⬆️ Update ggerganov/llama.cpp to a1649cc13f89946322358f92ea268ae1b7b5096c by @localai-bot in #4635
  • chore: ⬆️ Update ggerganov/llama.cpp to 92bc493917d43b83e592349e138b54c90b1c3ea7 by @localai-bot in #4640
  • chore(deps): Bump docs/themes/hugo-theme-relearn from 80e448e to 8dad5ee by @dependabot in #4656
  • chore: ⬆️ Update ggerganov/llama.cpp to aea8ddd5165d525a449e2fc3839db77a71f4a318 by @localai-bot in #4657
  • chore: ⬆️ Update ggerganov/llama.cpp to 6171c9d25820ccf676b243c172868819d882848f by @localai-bot in #4661
  • chore: ⬆️ Update ggerganov/llama.cpp to 6152129d05870cb38162c422c6ba80434e021e9f by @localai-bot in #4668
  • chore(parler-tts): drop backend by @mudler in #4672
  • chore: ⬆️ Update ggerganov/llama.cpp to c5d9effb49649db80a52caf5c0626de6f342f526 by @localai-bot in #4685
  • chore: ⬆️ Update ggerganov/llama.cpp to 26771a1491f3a4c3d5b99c4c267b81aca9a7dfa0 by @localai-bot in #4690
  • chore: ⬆️ Update ggerganov/llama.cpp to 178a7eb952d211b8d4232d5e50ae1b64519172a9 by @localai-bot in #4694
  • chore(deps): Bump sentence-transformers from 3.3.1 to 3.4.0 in /backend/python/transformers by @dependabot in #4702
  • chore(deps): Bump docs/themes/hugo-theme-relearn from 8dad5ee to 5bcb9fe by @dependabot in #4704
  • chore: ⬆️ Update ggerganov/llama.cpp to a4417ddda98fd0558fb4d802253e68a933704b59 by @localai-bot in #4705
  • chore(deps): Bump dependabot/fetch-metadata from 2.2.0 to 2.3.0 by @dependabot in #4701
  • chore: ⬆️ Update ggerganov/llama.cpp to cae9fb4361138b937464524eed907328731b81f6 by @localai-bot in #4711
  • chore: ⬆️ Update ggerganov/llama.cpp to eb7cf15a808d4d7a71eef89cc6a9b96fe82989dc by @localai-bot in #4717
  • chore: ⬆️ Update ggerganov/llama.cpp to 8b576b6c55bc4e6be898b47522f0ef402b93ef62 by @localai-bot in #4722
  • chore: ⬆️ Update ggerganov/llama.cpp to aa6fb1321333fae8853d0cdc26bcb5d438e650a1 by @localai-bot in #4728
  • chore: ⬆️ Update ggerganov/llama.cpp to 53debe6f3c9cca87e9520a83ee8c14d88977afa4 by @localai-bot in #4732
  • chore: ⬆️ Update ggerganov/llama.cpp to 90f9b88afb6447d3929843a2aa98c0f11074762d by @localai-bot in #4736
  • chore(deps): Bump GrantBirki/git-diff-action from 2.7.0 to 2.8.0 by @dependabot in #4746
  • chore: ⬆️ Update ggerganov/llama.cpp to 5598f475be3e31430fbe17ebb85654ec90dc201e by @localai-bot in #4757
  • chore(deps): Bump sentence-transformers from 3.4.0 to 3.4.1 in /backend/python/transformers by @dependabot in #4748
  • chore(deps): Bump docs/themes/hugo-theme-relearn from 5bcb9fe to 66bc366 by @dependabot in #4750
  • chore: ⬆️ Update ggerganov/llama.cpp to 3ec9fd4b77b6aca03a3c2bf678eae3f9517d6904 by @localai-bot in #4762
  • chore: ⬆️ Update leejet/stable-diffusion.cpp to d46ed5e184b97c2018dc2e8105925bdb8775e02c by @localai-bot in #4769
  • chore: ⬆️ Update ggerganov/llama.cpp to d774ab3acc4fee41fbed6dbfc192b57d5f79f34b by @localai-bot in #4770
  • chore: ⬆️ Update ggerganov/llama.cpp to 8a59053f63fffc24e730cd3ea067760abfe4a919 by @localai-bot in #4776
  • chore: ⬆️ Update ggerganov/llama.cpp to d2fe216fb2fb7ca8627618c9ea3a2e7886325780 by @localai-bot in #4780
  • chore: ⬆️ Update ggerganov/llama.cpp to e6e658319952f7ad269dc11275b9edddc721fc6d by @localai-bot in #4787
  • chore: ⬆️ Update ggerganov/llama.cpp to 19d3c8293b1f61acbe2dab1d49a17950fd788a4a by @localai-bot in #4793
  • chore(deps): Bump docs/themes/lotusdocs from f5785a2 to 975da91 by @dependabot in #4801
  • chore: ⬆️ Update ggerganov/llama.cpp to 19b392d58dc08c366d0b29bd3b9c6991fa4e1662 by @localai-bot in #4803
  • chore: ⬆️ Update ggerganov/llama.cpp to 90e4dba461b07e635fd1daf3b491c978c7dd0013 by @localai-bot in #4810
  • chore: ⬆️ Update ggerganov/llama.cpp to 0fb77f821f6e70ad8b8247a97d1022f0fef78991 by @localai-bot in #4814
  • chore: ⬆️ Update ggerganov/llama.cpp to 8a8c4ceb6050bd9392609114ca56ae6d26f5b8f5 by @localai-bot in #4825
  • chore: ⬆️ Update ggerganov/llama.cpp to 300907b2110cc17b4337334dc397e05de2d8f5e0 by @localai-bot in #4829

Other Changes

New Contributors

Full Changelog: v2.25.0...v2.26.0