github vllm-project/vllm v0.4.0.post1
v0.4.0.post1, restore sm70/75 support

5 months ago

Highlight

v0.4.0 lacks sm70/75 support. We released this hotfix to restore it.
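To check whether a machine is affected: sm70 and sm75 are CUDA compute capabilities 7.0 and 7.5 (Volta and Turing, for example V100 and T4). The sketch below is not part of vLLM; `sm_tag`, `needs_post1`, and `local_capabilities` are hypothetical helper names used for illustration. The only library calls assumed are PyTorch's `torch.cuda.is_available`, `torch.cuda.device_count`, and `torch.cuda.get_device_capability`.

```python
# Illustrative helpers (hypothetical names, not vLLM APIs).
AFFECTED_ARCHS = {"sm70", "sm75"}  # architectures named in this release note


def sm_tag(major: int, minor: int) -> str:
    """Format a CUDA compute capability such as (7, 5) as 'sm75'."""
    return f"sm{major}{minor}"


def needs_post1(major: int, minor: int) -> bool:
    """True if this capability is one the v0.4.0 binary does not support."""
    return sm_tag(major, minor) in AFFECTED_ARCHS


def local_capabilities():
    """Return (major, minor) for each visible GPU, or [] without PyTorch/CUDA."""
    try:
        import torch
    except ImportError:
        return []
    if not torch.cuda.is_available():
        return []
    return [
        torch.cuda.get_device_capability(i)
        for i in range(torch.cuda.device_count())
    ]


if __name__ == "__main__":
    for index, (major, minor) in enumerate(local_capabilities()):
        status = "needs v0.4.0.post1" if needs_post1(major, minor) else "not affected"
        print(f"GPU {index}: {sm_tag(major, minor)} ({status})")
```

If any GPU is reported as affected, installing the hotfix release (for example by pinning `vllm==0.4.0.post1`) should pick up the restored binaries.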

What's Changed

  • [Kernel] Layernorm performance optimization by @mawong-amd in #3662
  • [Doc] Update installation doc for build from source and explain the dependency on torch/cuda version by @youkaichao in #3746
  • [CI/Build] Make Marlin Tests Green by @robertgshaw2-neuralmagic in #3753
  • [Misc] Minor fixes in requirements.txt by @WoosukKwon in #3769
  • [Misc] Some minor simplifications to detokenization logic by @njhill in #3670
  • [Misc] Fix Benchmark TTFT Calculation for Chat Completions by @ywang96 in #3768
  • [Speculative decoding 4/9] Lookahead scheduling for speculative decoding by @cadedaniel in #3250
  • [Misc] Add support for new autogptq checkpoint_format by @Qubitium in #3689
  • [Misc] [CI/Build] Speed up block manager CPU-only unit tests ~10x by opting-out of GPU cleanup by @cadedaniel in #3783
  • [Hardware][Intel] Add CPU inference backend by @bigPYJ1151 in #3634
  • [HotFix] [CI/Build] Minor fix for CPU backend CI by @bigPYJ1151 in #3787
  • [Frontend][Bugfix] allow using the default middleware with a root path by @A-Mahla in #3788
  • [Doc] Fix vLLMEngine Doc Page by @ywang96 in #3791
  • [CI/Build] fix TORCH_CUDA_ARCH_LIST in wheel build by @youkaichao in #3801
  • Fix crash when try torch.cuda.set_device in worker by @leiwen83 in #3770
  • [Bugfix] Add __init__.py files for vllm/core/block/ and vllm/spec_decode/ by @mgoin in #3798
  • [CI/Build] 0.4.0.post1, fix sm 7.0/7.5 binary by @youkaichao in #3803

New Contributors

Full Changelog: v0.4.0...v0.4.0.post1
