pypi vllm 0.18.1
v0.18.1

7 hours ago

This is a patch release on top of v0.18.0 to address a few issues:

  • Change default SM100 MLA prefill backend back to TRT-LLM (#38562)
  • Fix mock.patch resolution failure for standalone_compile.FakeTensorMode on Python <= 3.10 (#37158)
  • Disable monolithic TRTLLM MoE for Renormalize routing (#37605)
  • Pre-download missing FlashInfer headers in Docker build (#38391)
  • Fix DeepGemm E8M0 accuracy degradation for Qwen3.5 FP8 on Blackwell (#38083)
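
To confirm that an environment has picked up these fixes, one option is to compare the installed package version against 0.18.1. The following is a minimal sketch using only the Python standard library. The helper names (`parse_release`, `has_patch`, `installed_vllm_version`) are hypothetical and not part of vLLM, and the parser is deliberately simplified: it reads only the leading numeric components, so a pre-release such as `0.18.1rc1` would be treated the same as `0.18.1`.

```python
from importlib.metadata import PackageNotFoundError, version

# The patch release described in these notes.
PATCH_RELEASE = (0, 18, 1)


def parse_release(ver: str) -> tuple:
    """Return the leading numeric components of a version string.

    Simplified on purpose: local suffixes such as "+cu128" are dropped,
    and pre-release markers such as "rc1" are ignored.
    """
    parts = []
    for piece in ver.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_patch(installed: str, required: tuple = PATCH_RELEASE) -> bool:
    """True if the installed version is at least the patch release."""
    return parse_release(installed) >= required


def installed_vllm_version():
    """Return the installed vllm version string, or None if it is absent."""
    try:
        return version("vllm")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_vllm_version()
    if found is None:
        print("vllm is not installed in this environment")
    elif has_patch(found):
        print(f"vllm {found} includes the 0.18.1 fixes")
    else:
        print(f"vllm {found} predates 0.18.1; consider upgrading")
```

For anything stricter, such as distinguishing release candidates from final releases, a full version parser would be a better fit than this hand-rolled comparison.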
