What's Changed
🚀 Features
- [ascend] support dptp by @tangzhiyi11 in #4218
- Support Deepseek v32 by @grimoire in #4026
💥 Improvements
- Improve metrics by @CUHKSZzxy in #4178
- reserve blocks for dummy inputs by @grimoire in #4157
- Add vision id for Qwen3-VL by @CUHKSZzxy in #4183
- [Enhance]: Return routed experts when request canceled by @RunningLeon in #4197
- Add mm processor args for Qwen3-VL by @CUHKSZzxy in #4196
- support chat_template_kwargs in v1/chat/completions by @lvhan028 in #4201
- Refactor scheduler and engine.py by @grimoire in #4163
- update dp timeout by @grimoire in #4204
- Improve Qwen3-VL by @CUHKSZzxy in #4207
🐞 Bug fixes
- [Fix]: Split routed experts with query lens by @RunningLeon in #4180
- [Maca] fix ray and memory sync by @wanfengcxz in #4164
- Build block trie in prefill and add hit rate by @RunningLeon in #4184
- fix fope by @CUHKSZzxy in #4191
- fix hf modules read/write conflicts by multi processors by @lvhan028 in #4188
- Some Minor fix by @windreamer in #4185
- fix insecure deserialization when calling torch.load() by @lvhan028 in #4202
- Fix processor args by @CUHKSZzxy in #4200
- remove get_model_config to avoid pickle hf_config error in rpc calling by @lvhan028 in #4217
- Fix quant scale-fmt by @grimoire in #4212
- Fix requests of mix return_logprobs by @RunningLeon in #4222
- fix fillkv quant8 by @grimoire in #4229
- fix scale-fmt by @grimoire in #4230
📚 Documentations
- [Docs]: Add guide for VLMEvalKit by @CUHKSZzxy in #4156
🌐 Other
- Add FA3 by @CUHKSZzxy in #4166
- Add distributed test cases by @littlegy in #4161
- Add generate test by @littlegy in #4181
- [ci] add mllm eval by @zhulinJulia24 in #4194
- [ascend] refactor code by @yao-fengchen in #4176
- install serve.txt when building the docker image by @lvhan028 in #4219
- bump version to v0.11.1 by @lvhan028 in #4221
Full Changelog: v0.11.0...v0.11.1