Highlight
v0.4.0 lacks support for sm 7.0/7.5 (Volta/Turing) GPUs. This hotfix release restores it.
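If you are unsure whether your GPU is affected, a minimal sketch (not part of this release, assuming PyTorch with CUDA is installed) to check the device's compute capability:

```python
import torch

# Check whether the local GPU is an sm 7.0/7.5 (Volta/Turing) device,
# i.e. one affected by the v0.4.0 prebuilt-wheel issue fixed in this release.
if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability()
    print(f"Compute capability: sm_{major}{minor}")
    if (major, minor) in [(7, 0), (7, 5)]:
        # The v0.4.0 wheels do not cover this GPU; use v0.4.0.post1 or later.
        print("Affected GPU: upgrade to vLLM v0.4.0.post1 or later.")
else:
    print("No CUDA device detected.")
```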
What's Changed
- [Kernel] Layernorm performance optimization by @mawong-amd in #3662
- [Doc] Update installation doc for build from source and explain the dependency on torch/cuda version by @youkaichao in #3746
- [CI/Build] Make Marlin Tests Green by @robertgshaw2-neuralmagic in #3753
- [Misc] Minor fixes in requirements.txt by @WoosukKwon in #3769
- [Misc] Some minor simplifications to detokenization logic by @njhill in #3670
- [Misc] Fix Benchmark TTFT Calculation for Chat Completions by @ywang96 in #3768
- [Speculative decoding 4/9] Lookahead scheduling for speculative decoding by @cadedaniel in #3250
- [Misc] Add support for new autogptq checkpoint_format by @Qubitium in #3689
- [Misc] [CI/Build] Speed up block manager CPU-only unit tests ~10x by opting-out of GPU cleanup by @cadedaniel in #3783
- [Hardware][Intel] Add CPU inference backend by @bigPYJ1151 in #3634
- [HotFix] [CI/Build] Minor fix for CPU backend CI by @bigPYJ1151 in #3787
- [Frontend][Bugfix] allow using the default middleware with a root path by @A-Mahla in #3788
- [Doc] Fix vLLMEngine Doc Page by @ywang96 in #3791
- [CI/Build] fix TORCH_CUDA_ARCH_LIST in wheel build by @youkaichao in #3801
- Fix crash when try torch.cuda.set_device in worker by @leiwen83 in #3770
- [Bugfix] Add `__init__.py` files for `vllm/core/block/` and `vllm/spec_decode/` by @mgoin in #3798
- [CI/Build] 0.4.0.post1, fix sm 7.0/7.5 binary by @youkaichao in #3803
New Contributors
- @mawong-amd made their first contribution in #3662
- @Qubitium made their first contribution in #3689
- @bigPYJ1151 made their first contribution in #3634
- @A-Mahla made their first contribution in #3788
Full Changelog: v0.4.0...v0.4.0.post1