What's Changed
🚀 Features
- support glm-4.7-flash by @RunningLeon in #4320
- [ascend] support ep by @yao-fengchen in #3696
💥 Improvements
- fix rotary embedding for transformers v5 by @grimoire in #4303
- Improve metrics log by @CUHKSZzxy in #4297
- Support ignore layers in quant config for qwen3 models by @RunningLeon in #4293
- add custom noaux kernel by @grimoire in #4345
- fix qwen3vl with transformers v5 by @grimoire in #4348
🐞 Bug fixes
- fix tool call parser's streaming cursor by @lvhan028 in #4333
- Fix data race for guided decoding in TP mode by @lzhangzz in #4341
- fa3 check by @grimoire in #4340
- Fix time series preprocess by @CUHKSZzxy in #4339
- Fix negative KV sequence length error in Attention op by @jinminxi104 in #4316
- fix qwen3-vl-moe long context by @grimoire in #4342
- fix: move quantized norm to CPU instead of stale q_linear reference in smooth_quant by @Mr-Neutr0n in #4352
- update noaux-kernel check by @grimoire in #4358
🌐 Other
- change INPUT_CUDA_VERSION to 12.6.2 by @lvhan028 in #4322
- add Qwen3-8B accuracy evaluation in llm_compressor.md by @43758726 in #4319
- [ci] refactor ete testcase by @zhulinJulia24 in #4274
- Set alias interns1_1 for interns1_pro by @lvhan028 in #4334
- build(docker): skip FA2 when use cu13 by @windreamer in #4356
- bump version to v0.12.1 by @lvhan028 in #4350
New Contributors
- @Mr-Neutr0n made their first contribution in #4352
Full Changelog: v0.12.0...v0.12.1