InternLM/lmdeploy v0.12.2 on GitHub

What's Changed

support glm5 by @grimoire in #4355
Qwen/InternLM/Llama Dense/MoE model fp8 quant online by @43758726 in #4324
Qwen3.5 by @grimoire in #4351
GLM-4.7-Flash Turbomind support by @lapy in #4362
Support router replay and ignore quant layer for qwen3.5 by @RunningLeon in #4394
[Feature] Add TurboMind support for Qwen3.5 models (dense + MoE) by @lapy in #4389
support repetition ngram logits processor by @grimoire in #4288

Compatible with transformers 5.0 at TurboMind side by @lvhan028 in #4304
Support fp32 head for qwen and internlm models by @RunningLeon in #4160
Reduce MLA kv-cache memory by @lzhangzz in #4373
add recurrent_gated_delta_rule kernel by @grimoire in #4376
[ascend]adapt for s1-pro dp*tp+ep by @yao-fengchen in #4380
Support glm4.7 with mtp by @RunningLeon in #4346
Faster MLA kernels by @lzhangzz in #4391
Attention kernel self-registration and decoupled dispatching by @lzhangzz in #4396

fix: change debug log from ERROR to DEBUG in RepetitionPenaltyKernel by @murray-macdonald in #4363
Fix quant config parsing for internvl awq model by @RunningLeon in #4369
Fix XGrammar bitmask initialization and add null check for gen_config in generate method by @windreamer in #4349
fix the logic of closing session by @lvhan028 in #4370
Fix authorization by @lvhan028 in #4338
Fix some minor issues and provide tests for Pipeline by @windreamer in #4365
fix dllm mask on set_step by @grimoire in #4278
fix models for transformers>=5 by @grimoire in #4381
fix exception when aborting a request by @lvhan028 in #4403
fix inference crash on V100 with qwen3.5-0.8b by @lvhan028 in #4420

ci(lint): skip flaky deadlink test for python wiki page by @windreamer in #4357
fix fa3 install by @irexyc in #4361
fix lint by @windreamer in #4375
upgrade triton and torch by @grimoire in #4379
Add speculative decoding test by @littlegy in #4377
ci: integrate clang-format lint into pre-commit hooks by @windreamer in #4390
Update dockerfile by removing cu11 and changing cu12.4 to cu12.6 by @lvhan028 in #4398
manually build dev image instead of publishing it every version by @lvhan028 in #4409
bump version to v0.12.2 by @lvhan028 in #4378

Full Changelog: v0.12.1...v0.12.2