What's Changed
🚀 Features
- [Feature] support qwen2.5-vl for pytorch engine by @CUHKSZzxy in #3194
- Support reward models by @lvhan028 in #3192
- Add collective communication kernels by @lzhangzz in #3163
- PytorchEngine multi-node support v2 by @grimoire in #3147
- Add flash mla by @AllentDan in #3218
- Add gemma3 implementation by @AllentDan in #3272
💥 Improvements
- remove update badwords by @grimoire in #3183
- default executor ray by @grimoire in #3210
- change ascend&camb default_batch_size to 256 by @jinminxi104 in #3251
- Tool reasoning parsers and streaming function call by @AllentDan in #3198
- remove torchelastic flag by @grimoire in #3242
- disable flashmla warning on sm<90 by @grimoire in #3271
🐞 Bug fixes
- Fix missing cli chat option by @lzhangzz in #3209
- [ascend] fix multi-card distributed inference failures by @tangzhiyi11 in #3215
- fix for small cache-max-entry-count by @grimoire in #3221
- [dlinfer] fix glm-4v graph mode on ascend by @jinminxi104 in #3235
- fix qwen2.5 pytorch engine dtype error on NPU by @tcye in #3247
- [Fix] failed to update the tokenizer's eos_token_id into stop_word list by @lvhan028 in #3257
- fix dsv3 gate scaling by @grimoire in #3263
- Fix the bug for reading dict error by @GxjGit in #3196
- Fix get ppl by @lvhan028 in #3268
📚 Documentation
- Specify lmdeploy version in benchmark guide by @lyj0309 in #3216
- [ascend] add Ascend docker image by @jinminxi104 in #3239
🌐 Other
- [ci] testcase refactoring by @zhulinJulia24 in #3151
- [ci] add testcase for native communicator by @zhulinJulia24 in #3217
- [ci] add volc evaluation testcase by @zhulinJulia24 in #3240
- [ci] remove v100 testconfig by @zhulinJulia24 in #3253
- add rdma dependencies into docker file by @CUHKSZzxy in #3262
- docs: update ascend docs for docker running by @CyCle1024 in #3266
- bump version to v0.7.2 by @lvhan028 in #3252
New Contributors
Full Changelog: v0.7.1...v0.7.2