What's Changed
🚀 Features
- add endpoint /abort_request by @lvhan028 in #4092
- Support Qwen3-Next by @grimoire in #4039
- Support Qwen3-VL by @CUHKSZzxy in #4093
- Support sync weights with flattened bucket tensor by @RunningLeon in #4109
- Support group router for MoE models by @RunningLeon in #4120
- [Feature]: return routed experts to reuse by @RunningLeon in #4090
- support context parallel by @irexyc in #3951
- fope by @grimoire in #4043
- [Feature]: Support speculative decoding by @RunningLeon in #3945
- MoE bf16 EP by @grimoire in #4144
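The new `/abort_request` endpoint (#4092) lets a client cancel an in-flight generation. A minimal sketch of building such a call, assuming the server's common default port 23333 and a JSON body keyed by `session_id` — both the port and the payload field name are assumptions here, so check the api_server documentation for the exact schema:

```python
import json

# Build (but do not send) an abort call for the /abort_request endpoint.
# NOTE: the host/port and the "session_id" field are assumptions, not
# taken from the release notes -- verify against the api_server docs.
API_BASE = "http://localhost:23333"  # assumed default api_server address

def build_abort_request(session_id: int) -> tuple:
    """Return the URL and JSON body for aborting one session's request."""
    url = f"{API_BASE}/abort_request"
    body = json.dumps({"session_id": session_id}).encode("utf-8")
    return url, body

url, body = build_abort_request(7)
# A client would POST `body` to `url`, e.g. with urllib.request or requests.
```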
💥 Improvements
- Enlarge gc threshold by @grimoire in #4076
- remove num_tokens from EngineOutput by @lvhan028 in #4088
- revert masking vocab_size by @lvhan028 in #4089
- feat: add json_object support in response_format by @windreamer in #4080
- support image_data input to /generate endpoint by @irexyc in #4086
- [Fix] all RayEngineWorker actors created at node 0 in RL training by @CyCle1024 in #4107
- Optimize sleep level=1 for turbomind backend by @irexyc in #4074
- [Feat] enable ascend update_params by @CyCle1024 in #4111
- Enhance request checker by @lvhan028 in #4104
- Refactor dp tp by @grimoire in #4004
- fix kernel numerical error by @grimoire in #4133
- free ray put by @grimoire in #4137
- Reduce experts cache when resize by @RunningLeon in #4138
- support interleave text and image in messages by @lvhan028 in #4141
- optimize rms norm by @grimoire in #4153
- fix evict policy by @Tsundoku958 in #4127
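Among the improvements above, #4141 adds support for interleaving text and images within a single message. A sketch of what such a payload can look like, using the OpenAI-style content-part format that compatible chat endpoints accept — the model name is a placeholder and the base64 image data is elided:

```python
import json

# An interleaved text-and-image chat payload in OpenAI-style content parts.
# "Qwen3-VL" is a placeholder model name; the data URLs are truncated stubs.
payload = {
    "model": "Qwen3-VL",
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the first image."},
                {"type": "image_url",
                 "image_url": {"url": "data:image/png;base64,..."}},
                {"type": "text", "text": "Now compare it with this one:"},
                {"type": "image_url",
                 "image_url": {"url": "data:image/png;base64,..."}},
            ],
        }
    ],
}
body = json.dumps(payload)  # what a client would POST to the chat endpoint
```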
🐞 Bug fixes
- fix type hint by @grimoire in #4078
- Fix inputs split by @RunningLeon in #4083
- add missing update_model_meta by @jinminxi104 in #4099
- Fix update_params for pytorch backend when loading vl model by @irexyc in #4101
- workaround for issue "TypeError: argument 'tokens': 'NoneType' object cannot be converted to 'PyString'" by @lvhan028 in #4103
- fix bug: schedule ratio support prefix-caching by @Tsundoku958 in #4100
- remove prefill free ratio threshold by @grimoire in #4110
- fix key error: api_server node might be removed by @lvhan028 in #4112
- Fix requests being incorrectly judged as bad requests by @lvhan028 in #4121
- fix dist config keys by @grimoire in #4125
- Fix proxy server missing media_type in streaming mode by @lvhan028 in #4130
- Fix logprobs to_tensor by @RunningLeon in #4132
- Fix cli help by @RunningLeon in #4139
- fix and optimize fill_kv_cache_quant by @grimoire in #4140
- fix: fix package deprecation introduced by CUDA 13 by @windreamer in #4117
- yield an empty list for token_ids when it runs out of tokens by @lvhan028 in #4148
- Fix interns1 routed experts outputs by @RunningLeon in #4149
- fix qwen3-30-a3b lcb-code score by @yao-fengchen in #4142
- Fix ep deployment issues by @CUHKSZzxy in #4084
- Fix dllm to not use fa3 decoding by @RunningLeon in #4159
- fix: handle non-tuple decoder outputs during Qwen-2.5 quantization by @chengyuma in #4158
- fix cu11 docker build by @CUHKSZzxy in #4165
- Fix model config by @CUHKSZzxy in #4170
- fix lora by @grimoire in #4172
- fix CMake logic to detect sm70 and sm75 by @tuilakhanh in #4175
📚 Documentations
- Update model evaluation guide by @lvhan028 in #4094
- [Docs]: Add guide for update weights by @RunningLeon in #4151
🌐 Other
- add dockerfile to build dev image by @lvhan028 in #4091
- add ascend_a3 Dockerfile by @yao-fengchen in #4097
- [ci] refactor longtext benchmark by @zhulinJulia24 in #4087
- enable metrics by default by @lvhan028 in #4108
- Replace pynvml with nvidia-ml-py in requirements by @myhloli in #4118
- [ci] add free disk before build test whl package and add session_len args in benchmark script by @zhulinJulia24 in #4136
- Add prefix-cache functionality and performance testing by @littlegy in #4119
- [ci] modify pipeline.close and add more case into pr_test by @zhulinJulia24 in #4150
- bump version to v0.11.0 by @lvhan028 in #4155
New Contributors
- @myhloli made their first contribution in #4118
- @tuilakhanh made their first contribution in #4175
Full Changelog: v0.10.2...v0.11.0