github sgl-project/sglang v0.1.20
Release v0.1.20

6 months ago

Highlights

  • Enable CUDA graph by default, which brings a 1.5x to 2x speedup for small-batch decoding (#612)
  • Model support: Gemma2, MiniCPM, Qwen2 MoE
  • Docker support (#217)
  • Various latency optimizations

What's Changed

New Contributors

Full Changelog: v0.1.18...v0.1.20
