github vllm-project/vllm v0.6.1.post1


Highlights

This release features important bug fixes and enhancements for:

  • Pixtral models. (#8415, #8425, #8399, #8431)
    • Chunked scheduling has been turned off for vision models. Please replace --max-num-batched-tokens 16384 with --max-model-len 16384
  • Multistep scheduling. (#8417, #7928, #8427)
  • Tool use. (#8423, #8366)
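For the Pixtral flag change above, a minimal launch sketch (the model name is illustrative; substitute your own deployment's model and context length):

```shell
# Chunked scheduling is off for vision models in this release, so drop the
# batched-token cap and bound the context length instead.
# Before: vllm serve mistralai/Pixtral-12B-2409 --max-num-batched-tokens 16384
vllm serve mistralai/Pixtral-12B-2409 --max-model-len 16384
```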

Also

  • Support for multiple images in Qwen-VL (#8247)
  • Removed the engine_use_ray option (#8126)
  • Added an engine option to return only deltas or final output (#7381)
  • Added bitsandbytes support for Gemma2 (#8338)
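As a sketch of the new Gemma2 bitsandbytes path, using vLLM's standard quantization flags (the model name is illustrative, and availability depends on your hardware and installed bitsandbytes version):

```shell
# Serve Gemma2 with in-flight bitsandbytes quantization.
# Both flags are needed: one selects the quantization scheme, the other the weight loader.
vllm serve google/gemma-2-9b --quantization bitsandbytes --load-format bitsandbytes
```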

Full Changelog: v0.6.1...v0.6.1.post1
