vllm-project/vllm v0.8.0rc2 on GitHub

What's Changed

[V1] Remove input cache client by @DarkLight1337 in #14864
[Misc][XPU] Use None as device capacity for XPU by @yma11 in #14932
[Doc] Add vLLM Beijing meetup slide by @heheda12345 in #14938
setup.py: drop assumption about local main branch by @russellb in #14692
[MISC] More AMD unused var clean up by @houseroad in #14926
fix minor miscalled method by @kushanam in #14327
[V1][TPU] Apply the ragged paged attention kernel fix and remove the padding. by @vanbasten23 in #14846
[Bugfix] Fix Ultravox on V1 by @DarkLight1337 in #14929
[Misc] Add --seed option to offline multi-modal examples by @DarkLight1337 in #14934
[Bugfix][ROCm] running new process using spawn method for rocm in tests. by @vllmellm in #14810
[Doc] Fix misleading log during multi-modal profiling by @DarkLight1337 in #14955
Add patch merger by @patrickvonplaten in #14957
[V1] Default MLA to V1 by @simon-mo in #14921
[Bugfix] Fix precommit - line too long in pixtral.py by @tlrmchlsmth in #14960
[Bugfix][Model] Mixtral: use unused head_dim config argument by @qtrrb in #14961
[Fix][Structured Output] using vocab_size to construct matcher by @aarnphm in #14868
[Bugfix] Make Gemma3 MM V0 only for now by @ywang96 in #14971

Full Changelog: v0.8.0rc1...v0.8.0rc2