pypi vllm 0.5.3
v0.5.3

latest releases: 0.6.3.post1, 0.6.3, 0.6.2...
3 months ago

Highlights

Model Support

  • PLACEHOLDER
  • Support Mistral-Nemo (#6548)
  • Support Chameleon (#6633, #5770)
  • Pipeline parallel support for Mixtral (#6516)

Hardware Support

Performance Enhancements

  • Add AWQ support to the Marlin kernel. This brings significant (1.5-2x) perf improvements to existing AWQ models! (#6612)
  • Progress towards refactoring for SPMD worker execution. (#6032)
  • Progress in improving prepare inputs procedure. (#6164, #6338, #6596)
  • Memory optimization for pipeline parallelism. (#6455)

Production Engine

  • Correctness testing for pipeline parallel and CPU offloading (#6410, #6549)
  • Support dynamically loading Lora adapter from HuggingFace (#6234)
  • Pipeline Parallel using stdlib multiprocessing module (#6130)

Others

  • A CPU offloading implementation, you can now use --cpu-offload-gb to control how much memory to "extend" the RAM with. (#6496)
  • The new vllm CLI is now ready for testing. It comes with three commands: serve, complete, and chat. Feedback and improvements are greatly welcomed! (#6431)
  • The wheels now build on Ubuntu 20.04 instead of 22.04. (#6517)

What's Changed

New Contributors

Full Changelog: v0.5.2...v0.5.3

Don't miss a new vllm release

NewReleases is sending notifications on new releases.