github sgl-project/sglang v0.4.8
Release v0.4.8

2 months ago

Highlights

OpenAI-Compatible Server Refactor

Restructured the OpenAI-compatible server to better support production and enterprise environments. Key improvements include:

  • Consistent metrics and logging for better observability and debugging.

  • Unified error handling, request validation, and processing logic for improved reliability and maintainability.

  • Improved request tracking across sessions and components.

  • Fixed bugs in embedding requests and reasoning parsers.

This work was a collaborative effort involving engineers from academic and industry institutions. Special thanks to the Oracle Cloud team and the SGLang team and community — including @slin1237, @CatherineSue, @key4ng, @JustinTong0323, @jhinpan, @yhyang201, @woodx9 and @whybeyoung — for their invaluable contributions.

DeepSeek R1 FP4 on Blackwell GPU

Added support for DeepSeek R1 with FP4 and multi-token prediction (MTP) on NVIDIA Blackwell GPUs.

  • Integrated FlashInfer NVFP4 MoE, supporting tensor parallelism (TP), expert parallelism (EP), and data parallelism (DP).

  • Supported 2-stream shared expert execution.

  • Achieved up to 90 tokens per second (TPS) per user on B200 with input sequence length 1k, output sequence length 1k, and batch size 16 (isl/osl/bs = 1k/1k/16).

Further optimization is in progress. Special thanks to the FlashInfer, NVIDIA Enterprise Products, Novita AI, DataCrunch, Google Cloud, and SGLang teams, especially @Alcanderian and @pyc96, for their critical contributions.
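
To put the per-user figure in context, the following is a rough back-of-the-envelope sketch, not a benchmark. It assumes "1k" means 1000 tokens, that every request in the batch decodes at the quoted upper-bound rate, and that prefill time is ignored. The helper names are illustrative and not part of SGLang.

```python
def decode_time_seconds(output_tokens: int, tps_per_user: float) -> float:
    """Seconds to stream output_tokens to one user at a steady per-user rate."""
    return output_tokens / tps_per_user


def aggregate_tps(tps_per_user: float, batch_size: int) -> float:
    """Total tokens per second if every request in the batch decodes at tps_per_user."""
    return tps_per_user * batch_size


TPS_PER_USER = 90.0   # upper bound quoted in the release notes
OUTPUT_TOKENS = 1000  # assumption: "1k" read as 1000 tokens
BATCH_SIZE = 16       # the "bs" value from the release notes

print(f"{decode_time_seconds(OUTPUT_TOKENS, TPS_PER_USER):.1f} s per response")
print(f"{aggregate_tps(TPS_PER_USER, BATCH_SIZE):.0f} tokens/s across the batch")
```

Under these assumptions a 1k-token response takes about 11 seconds to decode, and the batch as a whole would produce roughly 1440 tokens per second. Real numbers depend on prefill cost, scheduling, and how close each request gets to the "up to" rate.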

Breaking Change: OpenAI-Compatible API Module Moved

The sglang/srt/openai_api directory has been removed and replaced with sglang/srt/entrypoints/openai.

Update your imports to the new module path. For example:

- from sglang.srt.openai_api.protocol import Tool
+ from sglang.srt.entrypoints.openai.protocol import Tool
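
Code that has to run against both older and newer SGLang versions could try the new module path first and fall back to the old one. The following is a minimal sketch of that pattern. The two module paths and the Tool name come from the notes above; the helpers import_first_available and load_tool_class are hypothetical and not part of SGLang.

```python
import importlib

# Module paths taken from the release notes: new location first, legacy as fallback.
OPENAI_PROTOCOL_MODULES = (
    "sglang.srt.entrypoints.openai.protocol",  # v0.4.8 and later
    "sglang.srt.openai_api.protocol",          # before v0.4.8
)


def import_first_available(module_paths):
    """Return the first module in module_paths that imports (hypothetical helper)."""
    errors = []
    for path in module_paths:
        try:
            return importlib.import_module(path)
        except ImportError as exc:  # also covers ModuleNotFoundError
            errors.append(f"{path}: {exc}")
    raise ImportError(
        "none of the candidate modules could be imported: " + "; ".join(errors)
    )


def load_tool_class():
    """Look up Tool from whichever protocol module this SGLang install provides."""
    return import_first_available(OPENAI_PROTOCOL_MODULES).Tool
```

Calling load_tool_class() at startup keeps the version check in one place. Once support for releases before v0.4.8 is dropped, the fallback can be replaced with the plain import shown above.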

What's Changed

New Contributors

Full Changelog: v0.4.7...v0.4.8
