github InternLM/lmdeploy v0.13.0

7 hours ago

What's Changed

🚀 Features

  • [Ascend] support qwen3.5 35BA3B by @wanfengcxz in #4485
  • feat: Add TurboQuant (quant_policy=42) support for KV Cache Quantization by @windreamer in #4510
  • [refactor] [api_server] [2/N] improve tool parsers by abstracting xml parser by @lvhan028 in #4548
  • feat(turbomind): integrate cublasGemmGroupedBatchedEx for Qwen3.5 MoE inference on Blackwell GPUs with memory copy optimizations by @hd9568 in #4490
  • feat: add Anthropic-compatible serving endpoints by @lvhan028 in #4538
  • Support InternS2 Preview by @CUHKSZzxy in #4575

💥 Improvements

🐞 Bug fixes

🌐 Other

New Contributors

Full Changelog: v0.12.3...v0.13.0

Don't miss a new lmdeploy release

NewReleases is sending notifications on new releases.