What's new in 1.9.0 (2025-08-16)

These are the changes in inference v1.9.0.

New features

FEAT: [UI] running models data display replica. by @yiboyasss in #3897
FEAT: [model] Qwen-Image by @qinxuye in #3916
FEAT: [model] gpt-oss by @qinxuye in #3924
FEAT: function calling support for deepseek-r1-0528 by @qinxuye in #3931
FEAT: Support for GLM 4.5 quantized models by @Jun-Howie in #3945
FEAT: sglang support streaming function call by @aniya105 in #3939
FEAT: parsing harmony format for gpt-oss by @qinxuye in #3948
FEAT: Add support for switching rerank model engines and support for rerank of vllm engine by @zhcn000000 in #3881
FEAT: Support GLM-4.5v by @Jun-Howie in #3957

ENH: Add qwen3 new model to tool call list by @zhcn000000 in #3900
ENH: Update chat_template for Qwen3-Coder by @Jun-Howie in #3944
ENH: add flash_attention control params attn_implementation by @amumu96 in #3951
ENH: support qwen-image gguf by @qinxuye in #3954
ENH: clean embedding model cache when using vllm engine by @amumu96 in #3956
BLD: Downgrade flash-attn to version 2.7.4 by @zwt-1234 in #3953
BLD: Add Openfst source by @zwt-1234 in #3959

Replace @torch.no_grad() with @torch.inference_mode() in Qwen3-Reranker by @yasu-oh in #3911

Full Changelog: v1.8.1...v1.9.0