github vllm-project/vllm v0.1.4
vLLM v0.1.4


Major changes

  • From now on, vLLM ships with pre-built CUDA binaries, so users no longer have to compile vLLM's CUDA kernels on their own machines.
  • New models: InternLM, Qwen, Aquila.
  • Optimized CUDA kernels for paged attention and GELU.
  • Many bug fixes.
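With pre-built CUDA wheels, installation reduces to a plain pip command, and one of the newly supported models can be loaded directly. A minimal sketch, assuming a CUDA-capable machine and a Hugging Face Hub model id (`Qwen/Qwen-7B` is an assumption, not taken from these notes):

```shell
# Pre-built CUDA binaries: no local kernel compilation needed
pip install vllm==0.1.4

# Load one of the newly supported models (Hub id is an assumption)
# and run a single generation via vLLM's offline LLM API.
python -c "from vllm import LLM; llm = LLM(model='Qwen/Qwen-7B', trust_remote_code=True); print(llm.generate('Hello'))"
```

The `trust_remote_code=True` flag is typically needed for models like Qwen whose modeling code lives in the Hub repository rather than in Transformers itself.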

What's Changed

New Contributors

Full Changelog: v0.1.3...v0.1.4
