github sgl-project/sglang v0.5.8


Highlights

https://lmsys.org/blog/2026-01-16-sglang-diffusion/
https://lmsys.org/blog/2026-01-15-chunked-pipeline/
https://lmsys.org/blog/2026-01-21-novita-glm4/
https://lmsys.org/blog/2026-01-12-epd/

New Model Support

DeepSeek V3.2 Optimization

  • Context Parallelism Optimization with support for fused MoE, multi-batch, and FP8 KV cache: #13959

Flash Attention 4

  • Support for Flash Attention 4 decoding kernels: #16034

SGLang-Diffusion

  • Run sglang-diffusion with diffusers backend
  • Features: Multi-LoRA inference, SLA attention backends, warmup switch in CLI, ComfyUI Plugin
  • Performance improvements for all models

Dependencies

  • sgl-kernel updated to 0.3.21: #17075
  • CuTe DSL updated to 4.3.4: #17075
  • Added dependencies for tvm-ffi and quack-kernels: #17075
  • FlashInfer updated to 0.6.1: #15551
  • Mooncake transfer engine updated to 0.3.8.post1: #16792
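After upgrading, it can help to confirm that the local environment matches the dependency versions listed above. The following is a minimal sketch using only the standard library's importlib.metadata. The pip distribution names (sgl-kernel, flashinfer-python, mooncake-transfer-engine, nvidia-cutlass-dsl) are assumptions and should be checked against SGLang's own packaging metadata. The check uses an exact string match, which may be stricter than the project's real version constraints.

```python
from importlib import metadata

# Versions taken from the v0.5.8 dependency list.
# ASSUMPTION: these pip distribution names are not stated in the release notes;
# verify them against SGLang's packaging metadata before relying on this check.
EXPECTED = {
    "sgl-kernel": "0.3.21",
    "flashinfer-python": "0.6.1",
    "mooncake-transfer-engine": "0.3.8.post1",
    "nvidia-cutlass-dsl": "4.3.4",
}


def check_versions(expected):
    """Return {name: (expected, installed_or_None)} for every package that does not match exactly."""
    mismatches = {}
    for name, want in expected.items():
        try:
            have = metadata.version(name)  # installed version string
        except metadata.PackageNotFoundError:
            have = None  # package is not installed at all
        if have != want:
            mismatches[name] = (want, have)
    return mismatches


if __name__ == "__main__":
    problems = check_versions(EXPECTED)
    for name, (want, have) in problems.items():
        print(f"{name}: expected {want}, found {have or 'not installed'}")
    if not problems:
        print("All listed dependencies match.")
```

Packages that are missing are reported as "not installed" and are not treated as errors, so the script can also run in environments that intentionally omit optional components such as the Mooncake transfer engine.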

Security

  • Fixed urllib and gpgv vulnerabilities: #17439

What's Changed
