What's Changed
🚀 Features
- Add ROCm support: installation guide and FlashAttention compatibility for AMD GPUs by @Vivicai1005 in #3925
- Support basic gpt-oss output by @irexyc in #3956
- Add FP8*(B)F16 GEMM by @lzhangzz in #3960
- Support GLM-4.5 by @CUHKSZzxy in #3863
- [Refactor]: Remove tokenizer when building engine by @RunningLeon in #3978
- Support InternVL3.5-Flash by @CUHKSZzxy in #3952
- Support gpt-oss function calling and reasoning in /v1/chat/completions by @irexyc in #3962
- Support returning stop_str in output by @lvhan028 in #3984
- Support SDAR by @grimoire in #3922
💥 Improvements
- Specify installation steps for GeForce RTX 50 series by @lvhan028 in #3947
- Cherry-pick PR-3708 to return token_id by @lvhan028 in #3976
- Optimize AsyncEngine generation method by @shell-nlp in #3982
- Use blocking sync when TP engine is idling by @lzhangzz in #3974
- Add openai_harmony to requirements by @irexyc in #4006
🐞 Bug fixes
- Fix bugs with Triton 3.4.0 by @grimoire in #3946
- Fix LongRoPE by @grimoire in #3968
- Fix TurboMind RL usage in XTuner by @irexyc in #3912
- Disable prefix caching when serving a VLM model by @lvhan028 in #3990
- Remove NCCL_LAUNCH_MODE by @irexyc in #3994
- Return the last token's logprobs, logits, and last_hidden_states when include_stop_str_in_output is requested by @lvhan028 in #4000
- [Fix] device args in chat CLI when using the PyTorch engine by @CyCle1024 in #3999
- Fix InternVL by @CUHKSZzxy in #3997
- Fix missing iterator return in SequenceManager::Erase by @irexyc in #4001
- Fix CUDA graph without warmup by @grimoire in #4005
- Fix InternVL-Flash long-context accuracy by @CUHKSZzxy in #4003
🌐 Other
- [ci] update daily test cases by @zhulinJulia24 in #3944
- [maca] change KV layout from PagedAttention to FlashAttention by @yuchiwang in #3958
- Remove cuDNN by @irexyc in #3969
- build(pypi): add CUDA 12.8 support for wheels by @windreamer in #3948
- [ci] add Ascend test by @littlegy in #3959
- Update serve requirements by @RunningLeon in #3986
- [ci] add H800 function test workflow by @zhulinJulia24 in #3985
- bump version to v0.10.1 by @lvhan028 in #3989
New Contributors
- @Vivicai1005 made their first contribution in #3925
- @shell-nlp made their first contribution in #3982
- @littlegy made their first contribution in #3959
Full Changelog: v0.10.0...v0.10.1