vllm-project/vllm v0.3.3


Major changes

  • StarCoder2 support
  • Performance optimization and LoRA support for Gemma
  • 2/3/8-bit GPTQ support
  • Integrated Marlin kernels for int4 GPTQ inference
  • Performance optimization for MoE kernel
  • [Experimental] AWS Inferentia2 support
  • [Experimental] Structured output (JSON, Regex) in OpenAI Server
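The experimental structured-output feature is requested through extra fields on the standard OpenAI chat-completions body served by vLLM's OpenAI-compatible server. A minimal sketch of such a request payload, assuming a running server; the model name and schema below are placeholders, and `guided_json`/`guided_regex` are the vLLM-specific extension fields:

```python
import json

# Hypothetical JSON schema to constrain the model's output to.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

# Request body for POST /v1/chat/completions on a vLLM
# OpenAI-compatible server. "guided_json" is a vLLM-specific
# extension field; "guided_regex" takes a regex string instead.
payload = {
    "model": "my-served-model",  # placeholder model name
    "messages": [
        {"role": "user", "content": "Describe a person as JSON."}
    ],
    "guided_json": schema,
}

body = json.dumps(payload)
```

Posting this body to `/v1/chat/completions` should constrain decoding so the response parses against the schema.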

Full Changelog: v0.3.2...v0.3.3
