vllm-project/vllm v0.6.3.post1

Highlights

New Models

  • Support Ministral 3B and Ministral 8B via interleaved attention (#9414)
  • Support multiple and interleaved images for Llama 3.2 (#9095)
  • Support VLM2Vec, the first multimodal embedding model in vLLM (#9303); a usage sketch follows this list
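As a rough illustration of the new multimodal embedding path, the sketch below embeds one image plus an instruction through the offline `LLM.encode` entrypoint. The checkpoint name, prompt template, image file, and constructor options are assumptions based on the VLM2Vec (Phi-3-Vision based) setup in #9303; check the docs for the exact invocation in your version.

```python
from PIL import Image
from vllm import LLM

# Assumed checkpoint and options from the VLM2Vec setup in #9303;
# adjust to the model card and docs for your vLLM version.
llm = LLM(
    model="TIGER-Lab/VLM2Vec-Full",
    trust_remote_code=True,
    max_model_len=4096,
)

image = Image.open("cherry_blossom.jpg")  # hypothetical local image
prompt = "<|image_1|> Represent the given image with the following question: What is in the image"

# encode() runs the embedding (pooling) path instead of text generation.
outputs = llm.encode({"prompt": prompt, "multi_modal_data": {"image": image}})
print(len(outputs[0].outputs.embedding))  # embedding dimension
```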

Important Bug Fixes

  • Fix chat API continuous usage stats (#9357)
  • Fix vLLM UsageInfo and logprobs None AssertionError with empty token_ids (#9034)
  • Fix Molmo text-only input bug (#9397)
  • Fix CUDA 11.8 Build (#9386)
  • Fix _version.py not found issue (#9375)

Other Enhancements

  • Remove block manager v1 and make block manager v2 default (#8704)
  • Speculative decoding: optimize ngram lookup performance (#9333); an illustrative sketch follows this list
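For intuition, ngram lookup drafting (also known as prompt lookup decoding) proposes draft tokens by matching the sequence's trailing n-gram against earlier context and reusing the tokens that followed the match; the target model then verifies the draft as in any speculative decoding scheme. The sketch below is an illustrative reimplementation of the lookup step, not vLLM's actual code; the function and parameter names are invented.

```python
from typing import Optional


def ngram_propose(
    token_ids: list[int],
    ngram_size: int = 3,
    num_draft_tokens: int = 5,
) -> Optional[list[int]]:
    """Illustrative ngram lookup: match the trailing n-gram against
    earlier context and propose the tokens that followed the match."""
    if len(token_ids) <= ngram_size:
        return None
    tail = token_ids[-ngram_size:]
    # Scan right to left so the most recent earlier match wins.
    for start in range(len(token_ids) - ngram_size - 1, -1, -1):
        if token_ids[start:start + ngram_size] == tail:
            draft = token_ids[start + ngram_size:
                              start + ngram_size + num_draft_tokens]
            return draft or None
    return None


# The trailing (1, 2, 3) also occurs earlier, so its continuation is drafted.
print(ngram_propose([5, 6, 7, 1, 2, 3, 9, 9, 1, 2, 3]))  # [9, 9, 1, 2, 3]
```

In vLLM of this era, this drafter is enabled with settings along the lines of `speculative_model="[ngram]"`, `num_speculative_tokens`, and `ngram_prompt_lookup_max`; see the speculative decoding docs for your version for the exact flags.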

Full Changelog: v0.6.3...v0.6.3.post1
