## Highlights
- Add initial TPU integration (#5292)
- Fix crashes when using FlashAttention backend (#5478)
- Fix issues when using num_devices < num_available_devices (#5473); a minimal sketch of the idea follows below
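As context for the #5473 highlight, here is a minimal sketch of a "stateless" device count: re-derive the count from `CUDA_VISIBLE_DEVICES` on every call instead of trusting a cached, context-initializing query, so runtime changes to the variable are honored. The helper name and the `torch` fallback below are illustrative assumptions, not vLLM's actual implementation.

```python
# Hypothetical sketch only; not vLLM's actual cuda_device_count_stateless.
import os


def device_count_stateless() -> int:
    """Count visible CUDA devices without relying on cached CUDA state."""
    visible = os.environ.get("CUDA_VISIBLE_DEVICES")
    if visible is not None:
        # An empty value hides every device; otherwise count the listed IDs.
        return 0 if visible.strip() == "" else len(visible.split(","))
    # Fallback for the unset case: ask the driver (assumed for illustration).
    import torch

    return torch.cuda.device_count()
```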
## What's Changed
- [CI/Build] Add `is_quant_method_supported` to control quantization test configurations by @mgoin in #5253
- Revert "[CI/Build] Add `is_quant_method_supported` to control quantization test configurations" by @simon-mo in #5463
- [CI] Upgrade codespell version. by @rkooo567 in #5381
- [Hardware] Initial TPU integration by @WoosukKwon in #5292
- [Bugfix] Add device assertion to TorchSDPA by @bigPYJ1151 in #5402
- [ci] Add AMD, Neuron, Intel tests for AWS CI and turn off default soft fail for GPU tests by @khluu in #5464
- [Kernel] Vectorized FP8 quantize kernel by @comaniac in #5396 (see the FP8 sketch after this list)
- [Bugfix] TYPE_CHECKING for MultiModalData by @kimdwkimdw in #5444
- [Frontend] [Core] Support for sharded tensorized models by @tjohnson31415 in #4990
- [misc] add hint for AttributeError by @youkaichao in #5462
- [Doc] Update debug docs by @DarkLight1337 in #5438
- [Bugfix] Fix typo in scheduler.py (requeset -> request) by @mgoin in #5470
- [Frontend] Add "input speed" to tqdm postfix alongside output speed by @mgoin in #5425 (see the tqdm sketch after this list)
- [Bugfix] Fix wrong multi_modal_input format for CPU runner by @Isotr0py in #5451
- [Core][Distributed] add coordinator to reduce code duplication in tp and pp by @youkaichao in #5293
- [ci] Use sccache to build images by @khluu in #5419
- [Bugfix] If the content starts with ":" (response of ping), client should i… by @sywangyi in #5303
- [Kernel] `w4a16` support for `compressed-tensors` by @dsikka in #5385
- [CI/Build][REDO] Add `is_quant_method_supported` to control quantization test configurations by @mgoin in #5466
- [Kernel] Tune Qwen2MoE kernel configurations with tp2,4 by @wenyujin333 in #5497
- [Hardware][Intel] Optimize CPU backend and add more performance tips by @bigPYJ1151 in #4971
- [Docs] Add 4th meetup slides by @WoosukKwon in #5509
- [Misc] Add vLLM version getter to utils by @DarkLight1337 in #5098
- [CI/Build] Simplify OpenAI server setup in tests by @DarkLight1337 in #5100
- [Doc] Update LLaVA docs by @DarkLight1337 in #5437
- [Kernel] Factor out epilogues from cutlass kernels by @tlrmchlsmth in #5391
- [MISC] Remove FP8 warning by @comaniac in #5472
- Separate dev requirements into lint and test by @Yard1 in #5474
- Revert "[Core] Remove unnecessary copies in flash attn backend" by @Yard1 in #5478
- [misc] fix format.sh by @youkaichao in #5511
- [CI/Build] Disable test_fp8.py by @tlrmchlsmth in #5508
- [Kernel] Disable CUTLASS kernels for fp8 by @tlrmchlsmth in #5505
- Add `cuda_device_count_stateless` by @Yard1 in #5473
- [Hardware][Intel] Support CPU inference with AVX2 ISA by @DamonFool in #5452
- [Bugfix] Typo fix by @AllenDou in #5507
- bump version to v0.5.0.post1 by @simon-mo in #5522
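For the vectorized FP8 quantize kernel noted above (#5396), here is a hedged sketch of the underlying per-tensor quantization math in PyTorch. It illustrates only the arithmetic, not the CUDA kernel's vectorized implementation; the function name and per-tensor scaling scheme are assumptions.

```python
# Illustrative per-tensor FP8 (e4m3) quantization sketch; not vLLM's kernel.
import torch


def fp8_quantize(x: torch.Tensor):
    finfo = torch.finfo(torch.float8_e4m3fn)  # e4m3 max magnitude: 448.0
    # Pick one scale so the largest element maps to the FP8 maximum.
    scale = x.abs().max().clamp(min=1e-12) / finfo.max
    q = (x / scale).clamp(finfo.min, finfo.max).to(torch.float8_e4m3fn)
    return q, scale  # dequantize with q.float() * scale
```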
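And for the "input speed" progress-bar change noted above (#5425), a self-contained sketch of the general technique using tqdm's `set_postfix_str`; the request data and metric labels are made up for illustration.

```python
# Hypothetical example of showing input/output token speeds in a tqdm postfix.
import time

from tqdm import tqdm

requests = [(128, 32)] * 10  # (prompt tokens, generated tokens) per request
start = time.perf_counter()
in_total = out_total = 0

with tqdm(total=len(requests), desc="Processed prompts") as pbar:
    for in_toks, out_toks in requests:
        time.sleep(0.01)  # stand-in for actual generation work
        in_total += in_toks
        out_total += out_toks
        elapsed = time.perf_counter() - start
        pbar.set_postfix_str(
            f"input: {in_total / elapsed:.1f} toks/s, "
            f"output: {out_total / elapsed:.1f} toks/s"
        )
        pbar.update(1)
```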
## New Contributors
- @kimdwkimdw made their first contribution in #5444
- @sywangyi made their first contribution in #5303
Full Changelog: v0.5.0...v0.5.0.post1