github vllm-project/vllm v0.2.6

10 months ago

Major changes

  • Fast model execution with CUDA/HIP graphs
  • W4A16 GPTQ support (thanks to @chu-tianxiang)
  • Fix memory profiling with tensor parallelism
  • Fix *.bin weight loading for Mixtral models
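As a rough illustration of how the first two items might be exercised from Python, the sketch below assembles keyword arguments for vLLM's `LLM` class. It assumes that graph capture is on by default and is disabled with `enforce_eager=True`, and that GPTQ checkpoints are selected with `quantization="gptq"`; check both against the version you have installed. The helper names `build_llm_kwargs` and `generate_once` and the model name in the usage note are hypothetical placeholders, not part of vLLM.

```python
from typing import Any, Dict


def build_llm_kwargs(
    model: str,
    use_gptq: bool = False,
    use_cuda_graph: bool = True,
    tensor_parallel_size: int = 1,
) -> Dict[str, Any]:
    """Assemble keyword arguments for vllm.LLM (hypothetical helper).

    Assumption: enforce_eager=True turns CUDA/HIP graph capture off,
    so it is set to the inverse of use_cuda_graph.
    """
    kwargs: Dict[str, Any] = {
        "model": model,
        "tensor_parallel_size": tensor_parallel_size,
        "enforce_eager": not use_cuda_graph,
    }
    if use_gptq:
        # Assumption: "gptq" is the quantization method name for W4A16 GPTQ weights.
        kwargs["quantization"] = "gptq"
    return kwargs


def generate_once(kwargs: Dict[str, Any], prompt: str) -> str:
    """Run one prompt through vLLM. Requires vLLM and a supported GPU."""
    # Imported lazily so the helper above can be used without vLLM installed.
    from vllm import LLM, SamplingParams

    llm = LLM(**kwargs)
    outputs = llm.generate([prompt], SamplingParams(max_tokens=32))
    return outputs[0].outputs[0].text
```

For example, `generate_once(build_llm_kwargs("your-org/your-gptq-model", use_gptq=True), "Hello")` would load a GPTQ checkpoint with graph capture left on, while passing `use_cuda_graph=False` falls back to eager execution, which can help when isolating graph-related issues.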

What's Changed

New Contributors

Full Changelog: v0.2.5...v0.2.6
