What's Changed
🚀 Features
- Support video inputs by @CUHKSZzxy in #4360
- feat: fully implement compressed-tensors gs32 support in TurboMind by @lapy in #4429
- Draft model update params by @CUHKSZzxy in #4452
💥 Improvements
- support qwen3.5 on volta by @grimoire in #4405
- Optimize Qwen3.5 by @lzhangzz in #4434
- Builtin mrope by @grimoire in #4393
- delete ray remote function return value by @grimoire in #4422
- support cache_seqlen on recurrent-gdr and causal-conv1d-update by @grimoire in #4417
- safe ray api by @CUHKSZzxy in #4455
- add R3 for qwen3-vl-moe models by @lvhan028 in #4457
- Align rope init in lmdeploy by @RangiLyu in #4466
- Make tilelang a Linux-only dependency (like triton) by @Copilot in #4469
- prepare chunk indices before cache initialize by @grimoire in #4458
- unify rope device by @CUHKSZzxy in #4467
- custom processor args by @CUHKSZzxy in #4472
- Assign sequential api_server ports when proxy_url is unset by @lvhan028 in #4416
- disable fla intracard_backend by @grimoire in #4482
- [Fix][Feat] Fix worker sorting with external pg bundles & Support persistent buffer for update_params by @CyCle1024 in #4397
- simplify interns1 pro codes by @CUHKSZzxy in #4480
🐞 Bug fixes
- fix test_hf_overrides for transformers>5 by @grimoire in #4418
- fix qwen3.5 pytorch multimodal inference by @CUHKSZzxy in #4430
- fix generate endpoint by @CUHKSZzxy in #4432
- Make Intern-S1-Pro compatible with Transformers 5.0+ by @lvhan028 in #4435
- fix multiround chat by @CUHKSZzxy in #4438
- fix(async_engine): make safe_run cancellation cleanup reliable with shield and SafeRunException by @lvhan028 in #4439
- release state cache by @CUHKSZzxy in #4462
- Split/tool call args json for qwen3coder tool calls (Qwen3.5) by @lapy in #4433
- fix(turbomind): fix dimension mismatch in ApplyTokenBitmaskInplace by @windreamer in #4456
- fix metrics by @CUHKSZzxy in #4410
- fix security issues by @CUHKSZzxy in #4447
- fix qwen3.5 fp8 support by @grimoire in #4470
- fix image / video resize function by @CUHKSZzxy in #4478
- fix dynamic ntk device by @CUHKSZzxy in #4483
- fix pagedattention pointer range by @grimoire in #4494
- fix glm4.7-flash by @grimoire in #4500
- Fix torch awq by @grimoire in #4503
🌐 Other
- [ci] add legacy test workflow and test config by @zhulinJulia24 in #4387
- chore: add CLAUDE.md and Claude Code skills by @CUHKSZzxy in #4413
- Fix CI errors including linting error and unit test error by @lvhan028 in #4431
- Use pyupgrade and ruff to modernize LMDeploy Python Code by @windreamer in #4392
- reduce ci memory by @irexyc in #4471
- fix: add safe.directory for git in docker workflows by @windreamer in #4474
- [ci] add nightly docker build workflow by @zhulinJulia24 in #4406
- split docker wheel preparation into staged build steps and use python 3.12 as the default version by @lvhan028 in #4476
- [Feat]: Support qwen35 with mtp by @RunningLeon in #4437
- bump version to v0.12.3 by @lvhan028 in #4493
New Contributors
Full Changelog: v0.12.2...v0.12.3