github sgl-project/sglang v0.3.0
Release v0.3.0

12 months ago

Highlights

Check out the release blog post at https://lmsys.org/blog/2024-09-04-sglang-v0-3/ for detailed instructions and descriptions of the items below.

  • Up to 7x higher throughput for DeepSeek Multi-head Latent Attention (MLA)
  • Up to 1.5x lower latency with torch.compile on small batch sizes
  • Support for interleaved text and multi-image/video in LLaVA-OneVision
  • Support for interleaved window attention and 2x longer context length in Gemma-2
  • Chunked prefill is turned on by default (you can choose to run prefill and decode separately or mix them in the same batch).
  • Added multi-GPU accuracy tests, performance tests, and nightly accuracy tests for more models.
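
To make the chunked prefill item more concrete, here is a minimal conceptual sketch in plain Python. It is not SGLang's scheduler: the function names `split_prefill` and `plan_steps`, the `mixed` flag, and the step labels are all hypothetical and exist only for illustration. The idea it shows is that a long prompt is processed in pieces of at most `chunk_size` tokens, and that decode work for already-running requests can either share a forward pass with a prefill chunk ("mixed") or run in its own pass ("separate").

```python
from typing import List


def split_prefill(prompt_len: int, chunk_size: int) -> List[range]:
    """Split a prompt of `prompt_len` tokens into chunks of at most `chunk_size` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        range(start, min(start + chunk_size, prompt_len))
        for start in range(0, prompt_len, chunk_size)
    ]


def plan_steps(prompt_len: int, chunk_size: int,
               decoding_requests: int, mixed: bool) -> List[str]:
    """Return a toy schedule with one label per forward pass.

    Assumption (illustrative only): in "mixed" mode each prefill chunk is
    batched with the decode tokens of running requests; otherwise prefill
    chunks and decode steps alternate as separate passes.
    """
    steps: List[str] = []
    for chunk in split_prefill(prompt_len, chunk_size):
        label = f"prefill[{chunk.start}:{chunk.stop}]"
        if decoding_requests and mixed:
            # One pass handles both the prefill chunk and the decode tokens.
            steps.append(f"{label} + decode x{decoding_requests}")
        else:
            steps.append(label)
            if decoding_requests:
                # Decode runs in its own pass between prefill chunks.
                steps.append(f"decode x{decoding_requests}")
    return steps


if __name__ == "__main__":
    for step in plan_steps(prompt_len=10, chunk_size=4,
                           decoding_requests=2, mixed=True):
        print(step)
```

In this toy model, chunking bounds how many prompt tokens any single pass handles, so running requests do not have to wait for an entire long prompt to be prefilled before they can generate their next token. How SGLang actually batches and orders these passes is described in the release blog post.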

Full Changelog: v0.2.13...v0.3.0
