vllm v0.7.2

Highlights

  • Qwen2.5-VL is now supported in vLLM. Note that, for the moment, it requires installing the Hugging Face transformers library from source (#12604); see the first sketch after this list.
  • Add a Transformers backend via --model-impl=transformers. This allows vLLM to be run with arbitrary Hugging Face text models (#11330, #12785, #12727); see the second sketch after this list.
  • Performance enhancements for DeepSeek models.
    • Align KV cache entries to start at 256-byte boundaries, yielding a 43% throughput improvement (#12676)
    • Apply torch.compile to fused_moe/grouped_topk, yielding a 5% throughput improvement (#12637)
    • Enable MLA for DeepSeek VL2 (#12729)
    • Enable DeepSeek model on ROCm (#12662)
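
Because the Qwen2.5-VL support currently depends on a source build of transformers, a minimal offline-inference sketch might look like the following. The pip command and the checkpoint name Qwen/Qwen2.5-VL-7B-Instruct are illustrative assumptions, not part of this release's documentation.

```python
# Hedged sketch, not an official recipe. Assumes transformers was installed
# from source, e.g. `pip install git+https://github.com/huggingface/transformers`.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-VL-7B-Instruct", max_model_len=8192)
params = SamplingParams(temperature=0.0, max_tokens=64)

# Text-only prompt for brevity; image inputs go through vLLM's usual
# multi-modal prompt path.
outputs = llm.generate(["Briefly explain what a vision-language model does."], params)
print(outputs[0].outputs[0].text)
```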
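
And a similarly hedged sketch of the new Transformers backend, driven from the Python API; the model_impl keyword is assumed to mirror the --model-impl CLI flag quoted above, and the model name is only an example.

```python
# Hedged sketch: force the Transformers backend for an arbitrary HF text model.
# CLI equivalent (per the release notes): vllm serve <model> --model-impl transformers
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", model_impl="transformers")
out = llm.generate(["Write one sentence about KV caching."], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```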

Core Engine

  • Set the VLLM_LOGITS_PROCESSOR_THREADS environment variable to speed up structured decoding in high-batch-size scenarios (#12368); a minimal sketch follows.
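
A minimal sketch of how this might be used, assuming the offline structured-output API (GuidedDecodingParams) and an arbitrary thread count of 8; neither is specified in these notes.

```python
# Hedged sketch: the variable must be set before the engine starts.
import os
os.environ["VLLM_LOGITS_PROCESSOR_THREADS"] = "8"  # example value, not a recommendation

from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct")  # example model
params = SamplingParams(
    max_tokens=8,
    guided_decoding=GuidedDecodingParams(choice=["yes", "no"]),
)
# Large batches are where the extra logits-processor threads pay off.
outputs = llm.generate(["Is water wet?"] * 64, params)
print(outputs[0].outputs[0].text)
```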

Security Update

  • Improve hash collision avoidance in prefix caching (#12621)
  • Add SPDX-License-Identifier headers to python source files (#12628)

Other

  • Enable FusedSDPA support for Intel Gaudi (HPU) (#12359)

What's Changed

New Contributors

Full Changelog: v0.7.1...v0.7.2
