InternLM/lmdeploy v0.13.0 on GitHub

What's Changed

[Ascend] support qwen3.5 35BA3B by @wanfengcxz in #4485
feat: Add TurboQuant (quant_policy=42) support for KV Cache Quantization by @windreamer in #4510
[refactor] [api_server] [2/N] improve tool parsers by abstracting xml parser by @lvhan028 in #4548
feat(turbomind): integrate cublasGemmGroupedBatchedEx for Qwen3.5 MoE inference on Blackwell GPUs with memory copy optimizations by @hd9568 in #4490
feat: add Anthropic-compatible serving endpoints by @lvhan028 in #4538
Support InternS2 Preview by @CUHKSZzxy in #4575

lmdeploy support kernel block size by @Tsundoku958 in #4421
Reject requests on stale session or sleeping engine by @lvhan028 in #4496
Add modern logging utils by @lzhangzz in #4486
refine dlinfer update_weights by @yao-fengchen in #4519
feat(serve): expose repetition n-gram params on OpenAI routes by @lvhan028 in #4522
Refactor step inputs by @grimoire in #4504
fix lite module for transformers>=5.0 by @43758726 in #4488
[refactor] [api_server] [1/N] Improve reasoning and tool-call parsers by @lvhan028 in #4468
fix: prevent prefill starvation under high decode load by @grimoire in #4532
Mixed modality by @CUHKSZzxy in #4531
optimize get_sorted_idx in moe by @grimoire in #4529
Map user-input session_id to internal session_id to maintain session identity by @lvhan028 in #4523
support more message item types by @CUHKSZzxy in #4501
add explicit trust_remote_code controls to resolve the security issue by @lvhan028 in #4511

[ascend] fix prefix caching by @yao-fengchen in #4448
fix update params by @CUHKSZzxy in #4514
fix ray mem leak by @grimoire in #4487
Fix mtp by @RunningLeon in #4517
fix kernel-block-size by @grimoire in #4521
fix: use is not None check for seed to prevent seed=0 being silently ignored by @kuishou68 in #4526
Fix qwen35 dp by @grimoire in #4535
Fix mtp for rl by @RunningLeon in #4520
cancel request and block new inputs when sleeping by @grimoire in #4541
Fix mp engine by @RunningLeon in #4540
Fix cache sizing and cache block layout edge cases by @grimoire in #4552
Fix qwen3.5-moe mtp with tp>1 by @RunningLeon in #4568
block_offsets padding 0 by @grimoire in #4569
hotfix: resolve test issues for v0.13.0 by @lvhan028 in #4571
ResponseParser forget to strip tag in non-stream mode by @lvhan028 in #4576
yield error when prompt processing suffers exception by @lvhan028 in #4574
Fix the reprefill of evicted seqs with invalid draft tokens by @RunningLeon in #4564
Support mtp fp8 by @RunningLeon in #4572
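
Note on #4526 (seed=0): the title describes a common Python pitfall, where a truthiness test on an optional integer treats 0 the same as "not provided". The sketch below is illustrative only and is not lmdeploy's code; the function names `make_rng_buggy` and `make_rng_fixed` are hypothetical and use only the standard library.

```python
import random
from typing import Optional


def make_rng_buggy(seed: Optional[int] = None) -> random.Random:
    # Hypothetical helper, not lmdeploy code.
    # Truthiness check: seed=0 is falsy, so it falls through to the
    # unseeded branch and the caller's seed is silently ignored.
    if seed:
        return random.Random(seed)
    return random.Random()


def make_rng_fixed(seed: Optional[int] = None) -> random.Random:
    # Hypothetical helper, not lmdeploy code.
    # Identity check: only a missing seed (None) selects the unseeded branch,
    # so seed=0 is honored like any other integer.
    if seed is not None:
        return random.Random(seed)
    return random.Random()
```

With the fixed variant, two generators created with seed=0 produce the same sequence; with the buggy variant they are seeded from system entropy and will generally differ.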

Use env LMDEPLOY_FP32_MAMBA_SSM_DTYPE to control the dtype of recurrent state by @lvhan028 in #4518
add tool and reasoning test by @littlegy in #4388
update h config and add glm4.7 mtp test by @littlegy in #4424
[ci] change test whl into python 312 and use test images by @zhulinJulia24 in #4513
[Misc] fix typos in turbomind.py and model.py by @ZhijunLStudio in #4543
[Misc] fix mutable default arguments by @ZhijunLStudio in #4544
Add docker/Dockerfile_patch; minor tweaks in messages.py and setup.py. by @lvhan028 in #4546
remove barely used skills and checkin docker-build skill by @lvhan028 in #4560
bump version to v0.13.0 by @lvhan028 in #4549
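
Note on #4544 (mutable default arguments): in Python, a default value is evaluated once, when the function is defined, so a list or dict used as a default is shared across calls. The sketch below shows the general pattern and the usual fix; it is an assumption about the kind of change involved, not code taken from the PR, and the names `append_buggy` and `append_fixed` are hypothetical.

```python
from typing import List, Optional


def append_buggy(item: str, bucket: List[str] = []) -> List[str]:
    # Hypothetical example. The default list is created once at definition
    # time, so items accumulate across calls that omit `bucket`.
    bucket.append(item)
    return bucket


def append_fixed(item: str, bucket: Optional[List[str]] = None) -> List[str]:
    # Hypothetical example. Use None as the sentinel and build a fresh list
    # inside the function, so each call starts with its own container.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket
```

Calling `append_buggy("a")` and then `append_buggy("b")` returns a list containing both items, while the same two calls to `append_fixed` each return a single-item list.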

Full Changelog: v0.12.3...v0.13.0