github sgl-project/sglang v0.5.7

one day ago

Highlights

  • New Model Support:
  • Model Gateway v0.3.0 Release:
    https://docs.sglang.io/advanced_features/sgl_model_gateway.html
  • Scalable pipeline parallelism with dynamic chunking support for ultra-long contexts (PP Refactor Roadmap #11857)
  • Encoder Disaggregation for Multi-modal models (Roadmap #15118)
  • SGLang-Diffusion:
    • Set --dit-layerwise-offload true to reduce peak VRAM usage by up to 30GB, and improve performance by up to 58% for all models
    • Significantly reduce the latency of Qwen-Image-Edit, making it one of the fastest among all open-source solutions. More improvements are on the way
    • Add support for AMD/4090/5090, along with additional attention choices (sage-attn, sage-attn3), more parallelism options (TP), and enhancements to the HTTP API (Google Vertex supported)
    • Cache-dit integration to improve performance by up to 165%

What's Changed

New Contributors

Full Changelog: v0.5.6...v0.5.7
