What's new in 1.9.0 (2025-08-16)
These are the changes in inference v1.9.0.
New features
- FEAT: [UI] display replica in running models data by @yiboyasss in #3897
- FEAT: [model] Qwen-Image by @qinxuye in #3916
- FEAT: [model] gpt-oss by @qinxuye in #3924
- FEAT: function calling support for deepseek-r1-0528 by @qinxuye in #3931
- FEAT: Support for GLM 4.5 quantized models by @Jun-Howie in #3945
- FEAT: SGLang support for streaming function calls by @aniya105 in #3939
- FEAT: parsing harmony format for gpt-oss by @qinxuye in #3948
- FEAT: Add support for switching rerank model engines and for reranking with the vLLM engine by @zhcn000000 in #3881
- FEAT: Support GLM-4.5v by @Jun-Howie in #3957
Enhancements
- ENH: Add new Qwen3 model to tool call list by @zhcn000000 in #3900
- ENH: Update chat_template for Qwen3-Coder by @Jun-Howie in #3944
- ENH: add attn_implementation parameter to control flash attention by @amumu96 in #3951
- ENH: support qwen-image gguf by @qinxuye in #3954
- ENH: clean embedding model cache when using vllm engine by @amumu96 in #3956
- BLD: Downgrade flash-attn to version 2.7.4 by @zwt-1234 in #3953
- BLD: Add Openfst source by @zwt-1234 in #3959
Bug fixes
Documentation
- DOC: add doc about cu128 docker by @qinxuye in #3899
- DOC: Update xllamacpp doc by @codingl2k1 in #3862
Others
Full Changelog: v1.8.1...v1.9.0