github vllm-project/vllm v0.2.2

10 months ago

Major changes

  • Upgrade to PyTorch v2.1 + CUDA 12.1 (a vLLM build for CUDA 11.8 is also provided)
  • Extensive refactoring for better tensor parallelism & quantization support
  • New models: Yi, ChatGLM, Phi
  • Scheduler change: the input moves from a 1D flattened tensor to a 2D tensor
  • AWQ support for all models
  • Added LogitsProcessor API
  • Preliminary support for SqueezeLLM
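
To make the scheduler item above concrete, here is a minimal sketch of the two input layouts it names: one flattened 1D sequence of tokens versus a 2D batch-by-length grid. This is an illustration in plain Python lists, not vLLM's actual code. The helper names `flatten_1d` and `pad_2d`, the padding id `PAD_ID`, and the use of right-padding are all assumptions made for the example.

```python
from typing import List, Tuple

PAD_ID = 0  # hypothetical padding token id, chosen for illustration only


def flatten_1d(seqs: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Concatenate all sequences into one 1D list and record each length."""
    flat = [tok for seq in seqs for tok in seq]
    lengths = [len(seq) for seq in seqs]
    return flat, lengths


def pad_2d(seqs: List[List[int]], pad_id: int = PAD_ID) -> List[List[int]]:
    """Right-pad every sequence to the longest one: a [batch, max_len] grid."""
    max_len = max((len(seq) for seq in seqs), default=0)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in seqs]


# Three prompts of different lengths (token ids are made up).
batch = [[11, 12, 13], [21], [31, 32]]

flat, lengths = flatten_1d(batch)   # 1D layout plus per-sequence lengths
padded = pad_2d(batch)              # 2D layout, one row per sequence
```

In the 1D layout the per-sequence lengths have to be carried alongside the tokens to recover sequence boundaries; in the 2D layout each row is one sequence, at the cost of some padding.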

Full Changelog: v0.2.1...v0.2.2
