github InternLM/lmdeploy v0.5.2
LMDeploy Release V0.5.2

16 months ago

Highlight

  • LMDeploy supports Llama 3.1 and its tool calling. An example of calling "Wolfram Alpha" to perform complex mathematical calculations can be found here
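As a rough illustration of what Llama 3.1 tool calling looks like, the sketch below builds an OpenAI-style `tools` definition for a Wolfram Alpha-like function and the request body one might POST to a running `lmdeploy serve api_server` instance. The function name, parameter schema, and model id are illustrative assumptions, not taken from the linked example.

```python
import json

# Hypothetical tool schema for a "wolfram_alpha" function, written in the
# OpenAI-compatible format used for tool calling. The name and parameters
# here are illustrative, not LMDeploy's official example.
tools = [
    {
        "type": "function",
        "function": {
            "name": "wolfram_alpha",
            "description": "Evaluate a mathematical expression with Wolfram Alpha.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {
                        "type": "string",
                        "description": "The expression or question to evaluate.",
                    }
                },
                "required": ["query"],
            },
        },
    }
]

# Body one would POST to /v1/chat/completions on a running api_server;
# the model id is an assumption for illustration.
request_body = {
    "model": "meta-llama/Meta-Llama-3.1-8B-Instruct",
    "messages": [
        {"role": "user", "content": "What is the integral of x^2 from 0 to 3?"}
    ],
    "tools": tools,
}
print(json.dumps(request_body, indent=2)[:80])
```

When the model decides to call the tool, the response carries a `tool_calls` entry whose arguments the client executes before sending the result back in a follow-up message.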

What's Changed

🚀 Features

💥 Improvements

  • Remove the triton inference server backend "turbomind_backend" by @lvhan028 in #1986
  • Remove kv cache offline quantization by @AllentDan in #2097
  • Remove session_len and deprecated short names of the chat templates by @lvhan028 in #2105
  • Clarify that "n>1" in GenerationConfig is not supported yet by @lvhan028 in #2108
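The last bullet can be mirrored in a minimal sketch: a stand-in config class (not LMDeploy's actual `GenerationConfig` implementation) that accepts an OpenAI-style `n` field but rejects any value other than 1, matching the clarified behavior.

```python
from dataclasses import dataclass


@dataclass
class GenerationConfig:
    """Minimal stand-in for illustration, not LMDeploy's actual class."""

    n: int = 1               # number of completions to return (OpenAI-style)
    max_new_tokens: int = 512

    def __post_init__(self):
        # Mirrors the documented restriction: only n == 1 is supported so far.
        if self.n != 1:
            raise ValueError("`n` > 1 in GenerationConfig is not supported yet")


cfg = GenerationConfig()      # accepted: defaults to n=1
try:
    GenerationConfig(n=4)     # rejected with a clear error
except ValueError as err:
    print(err)
```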

🐞 Bug fixes

🌐 Other

Full Changelog: v0.5.1...v0.5.2
