github sgl-project/sglang v0.3.4.post1
Release v0.3.4.post1

10 months ago

Highlights

  • Hosted the first LMSYS online meetup: Efficient LLM Deployment and Serving.
    • Covered CPU overhead hiding, faster constrained decoding, and DeepSeek MLA (slides available).
  • Added an Engine API for offline inference with reduced overhead (#1614, #1567).
  • Added an overlap scheduler for reducing CPU overhead #1738
  • New models: Llama 3.2 (#1551), Qwen2-VL (#1721), OLMo (#1676), GLM 4 (#1736).
  • Added support for reward models #1525.
  • Added support for Intel XPU #1480.
  • Improved stability for greedy decoding #1589.
  • Accelerated multi-LoRA serving #1587.
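
As a rough illustration of the Engine API highlight above, the sketch below runs a small batch of prompts in-process instead of going through an HTTP server. It assumes sglang is installed with a supported accelerator, that `sgl.Engine(model_path=...)` and `engine.generate(prompts, sampling_params)` behave as in the project's offline inference example, and that each output is a dict with a `"text"` key. The helper names `run_offline` and `default_engine_factory`, the model path, and the sampling values are illustrative choices, not part of the release.

```python
from typing import Any, Callable, Dict, List, Optional


def default_engine_factory(model_path: str) -> Any:
    # Imported lazily so this module loads even where sglang is not installed.
    import sglang as sgl

    return sgl.Engine(model_path=model_path)


def run_offline(
    prompts: List[str],
    model_path: str,
    sampling_params: Optional[Dict[str, Any]] = None,
    engine_factory: Callable[[str], Any] = default_engine_factory,
) -> List[str]:
    """Generate one completion per prompt with an in-process engine (hypothetical helper)."""
    # Illustrative defaults; tune for your workload.
    params = sampling_params or {"temperature": 0.8, "top_p": 0.95}
    engine = engine_factory(model_path)
    try:
        outputs = engine.generate(prompts, params)
    finally:
        # Assumption: the engine exposes shutdown(); skip quietly if it does not.
        shutdown = getattr(engine, "shutdown", None)
        if callable(shutdown):
            shutdown()
    return [out["text"] for out in outputs]


if __name__ == "__main__":
    # The model path is an example; substitute any model you have access to.
    texts = run_offline(
        ["The capital of France is", "The future of AI is"],
        model_path="meta-llama/Meta-Llama-3.1-8B-Instruct",
    )
    for text in texts:
        print(text)
```

Passing the engine in through a factory keeps the helper easy to exercise with a stub object, which is convenient on machines without a GPU.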

What's Changed

New Contributors

Full Changelog: v0.3.2...v0.3.4.post1
