github xorbitsai/inference v1.9.1


What's new in 1.9.1 (2025-08-30)

These are the changes in inference v1.9.1.

New features

Enhancements

  • ENH: added zero-shot and voice cloning ability for audio models by @qianduoduo0904 in #3968
  • ENH: Add Template for Qwen3 Reranker when model_engine = vllm by @zhcn000000 in #3983
  • ENH: Update the environment dependencies for cosyvoice2 by @Gmgge in #4015
  • ENH: Compat with xllamacpp 0.2.0 by @codingl2k1 in #4004
  • ENH: support chat_template_kwargs for llama.cpp by @qinxuye in #3988
  • BLD: Clean up Docker's last legacy cache and images before executing each step by @zwt-1234 in #3963
  • BLD: fix CI failures by @qinxuye in #4002

Bug fixes

  • BUG: disable flash_attention when GPU compute capability < 8.0 by @amumu96 in #3973
  • BUG: fix rerank model creation by @qinxuye in #3977

Documentation

Others

New Contributors

Full Changelog: v1.9.0...v1.9.1
