vllm v0.8.1

This release contains important bug fixes for v0.8.0. We highly recommend upgrading!

  • V1 Fixes

    • Ensure using int64 for sampled token ids (#15065)
    • Fix long dtype in topk sampling (#15049)
    • Refactor Structured Output for multiple backends (#14694)
    • Fix size calculation of processing cache (#15114)
    • Optimize Rejection Sampler with Triton Kernels (#14930)
    • Fix oracle for device checking (#15104)
  • TPU

    • Fix chunked prefill with padding (#15037)
    • Enhanced CI/CD (#15054, #14974)
  • Model

    • Re-enable Gemma3 for V1 (#14980)
    • Support LoRA in embedding models (#14935)
    • Pixtral: Remove layer instantiation duplication (#15053)
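The two dtype fixes above (#15065, #15049) both concern keeping sampled token ids in int64. As a hedged illustration of why this matters (this is not vLLM's actual code), token-id buffers are conventionally int64, and mixing in narrower ids from a sampler can trip strict dtype checks in downstream index operations:

```python
import numpy as np

# Hypothetical sketch: a running token-id buffer kept in int64, as is
# conventional for token ids in PyTorch-style frameworks.
token_buffer = np.array([101, 7592, 2088], dtype=np.int64)

# A sampler that emits int32 ids; indexing/gather ops with strict dtype
# checks can reject these, so normalize to int64 before any downstream use.
sampled = np.array([102], dtype=np.int32)
sampled64 = sampled.astype(np.int64)

token_buffer = np.concatenate([token_buffer, sampled64])
assert token_buffer.dtype == np.int64
```

NumPy is used here only for the dtype demonstration; the fixes themselves target vLLM's sampler internals.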
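The rejection sampler optimized in #14930 implements the accept/reject step of speculative decoding. A minimal pure-Python sketch of that step (the PR moves this logic into Triton kernels; the function below is an illustration, not vLLM's implementation): a draft model proposes a token with probability q(x), the target model assigns p(x); the proposal is accepted with probability min(1, p(x)/q(x)), otherwise a token is resampled from the normalized residual max(p - q, 0).

```python
import random

def rejection_sample(p, q, proposed, rng=random):
    """Accept or resample one drafted token.

    p: target-model probabilities over the vocabulary.
    q: draft-model probabilities over the vocabulary.
    proposed: token id drafted from q.
    """
    # Accept the draft token with probability min(1, p/q).
    accept_prob = min(1.0, p[proposed] / q[proposed])
    if rng.random() < accept_prob:
        return proposed
    # Otherwise resample from the residual distribution max(p - q, 0),
    # renormalized so it sums to 1.
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    total = sum(residual)
    residual = [r / total for r in residual]
    r = rng.random()
    acc = 0.0
    for token_id, w in enumerate(residual):
        acc += w
        if r < acc:
            return token_id
    return len(p) - 1
```

When p == q the draft is always accepted; when the target assigns zero mass to the proposal, it is always rejected and replaced from the residual.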

Full Changelog: v0.8.0...v0.8.1
