What's Changed
- [V1] Remove input cache client by @DarkLight1337 in #14864
- [Misc][XPU] Use None as device capacity for XPU by @yma11 in #14932
- [Doc] Add vLLM Beijing meetup slide by @heheda12345 in #14938
- setup.py: drop assumption about local `main` branch by @russellb in #14692
- [MISC] More AMD unused var clean up by @houseroad in #14926
- fix minor miscalled method by @kushanam in #14327
- [V1][TPU] Apply the ragged paged attention kernel fix and remove the padding. by @vanbasten23 in #14846
- [Bugfix] Fix Ultravox on V1 by @DarkLight1337 in #14929
- [Misc] Add `--seed` option to offline multi-modal examples by @DarkLight1337 in #14934
- [Bugfix][ROCm] running new process using spawn method for rocm in tests. by @vllmellm in #14810
- [Doc] Fix misleading log during multi-modal profiling by @DarkLight1337 in #14955
- Add patch merger by @patrickvonplaten in #14957
- [V1] Default MLA to V1 by @simon-mo in #14921
- [Bugfix] Fix precommit - line too long in pixtral.py by @tlrmchlsmth in #14960
- [Bugfix][Model] Mixtral: use unused head_dim config argument by @qtrrb in #14961
- [Fix][Structured Output] using vocab_size to construct matcher by @aarnphm in #14868
- [Bugfix] Make Gemma3 MM V0 only for now by @ywang96 in #14971
New Contributors
Full Changelog: v0.8.0rc1...v0.8.0rc2