sgl-project/sglang gateway-v0.2.3 on GitHub

🚀 SGLang Model Gateway - New Release!

We're excited to announce another powerful update to SGLang Model Gateway with performance improvements and expanded database support!

✨ Headline Features

⚡ Bucket Mode Routing - 20-30% Performance Boost
Introducing our new bucket-based routing algorithm that dramatically improves performance in PD mode. See up to 20-30% improvements in TTFT (Time To First Token) and overall throughput

💾 PostgreSQL Support for Chat History Management
Flexibility in data storage! We now support PostgreSQL alongside OracleDB and in-memory storage for chat history management.

🛠️ Enhanced Model Tool & Structured Output Support

MinMax M2 model support!
Structured model output for OpenAI and gRPC router
Streaming parsing with Tool Choice in chat completions API
Tool_choice support for Responses API
OutputItemDone events with output item array storage for better observability

🐛 Stability & Quality Improvements

Multiple bug fixes for model validation, streaming logic, reasoning content indexing, and CI stability enhancements.

🔧 Code Quality Enhancements

Refactored builders for chat and responses, restructured modules for better maintainability, and consolidated error handling.

Try the latest version: pip install sglang-router --upgrade

What's Changed in Gateway

Gateway Changes (45 commits)

[model-gateway] smg release 0.2.3 (#13312) by @slin1237 in #13312
[router]Replace requests lib with openai in e2e_response_api (#13293) by @XinyueZhang369 in #13293
fix outdated router doc (#13255) by @fzyzcjy in #13255
[router][grpc] Refine docs in minimax_m2 to match other parsers (#13218) by @CatherineSue in #13218
fix: display served_model_name in /v1/models (#13155) by @Sunhaihua1 in #13155
[router] minmax-m2 xml tool parser (#13148) by @slin1237 in #13148
[router] remove worker url requirement (#13172) by @slin1237 in #13172
[router] Fix Flaky test_circuit_breaker_opens_and_recovers (#13164) by @XinyueZhang369 in #13164
[router] Add comprehensive validation to Responses API (#13127) by @key4ng in #13127
bugfix: multi-model routing for /generate api (#12979) by @SYChen123 in #12979
[router][grpc] Support vllm backend for grpc router (#13120) by @CatherineSue in #13120
[router] add minmax m2 reasoning parser (#13137) by @slin1237 in #13137
[router] Support complex assistant and tool messages in /chat/completions (#12860) by @hellodanylo in #12860
[router] move radix tree to policy crate and addreses some code styles (#13131) by @slin1237 in #13131
[Router] use call_id instead of id for matching function calls in Responses API for Harmony (#13056) by @zhaowenzi in #13056
Revert "fix: display served_model_name in /v1/models" (#13093) by @CatherineSue in #13093
fix: display served_model_name in /v1/models (#13063) by @Sunhaihua1 in #13063
[router] add postgres databases data connector (#12218) by @lengrongfu in #12218
[router][ci] Quick Improvement to make CI more stable (#12869) by @key4ng in #12869
[router][ci] Fix maturin build (#13012) by @key4ng in #13012
[router] bucket policy (#11719) by @syy-hw in #11719
[router] Switch MCP tests from DeepWiki to self-hosted Brave search server (#12849) by @key4ng in #12849
[router][grpc] Move all error logs to their call sites (#12859) by @CatherineSue in #12859
[router][grpc] Refactor: Add builders for chat and responses (#12852) by @CatherineSue in #12852
[router] Support structured model output for openai and grpc router (#12431) by @key4ng in #12431
[router][grpc] Add more mcp test cases to responses api (#12749) by @CatherineSue in #12749
fix ci (#12760) by @key4ng in #12760
Add timing metrics for requests (#12646) by @cicirori in #12646
[router][ci] Disable cache (#12752) by @key4ng in #12752
[router][grpc] Support mixin tool calls in Responses API (#12736) by @CatherineSue in #12736
Revert "[router] web_search_preview tool basic implementation" (#12716) by @key4ng in #12716
[router] add basic ci tests for gpt-oss model support (#12651) by @key4ng in #12651
[router][quick fix] Add minimal option for reasoning effort in spec (#12711) by @key4ng in #12711
[router][grpc] Make harmony parser checks recipient first before channel (#12713) by @CatherineSue in #12713
[router][ci] speed up python binding to 1.5 min (#12673) by @key4ng in #12673
[router] fix: validate HTTP status codes in health check (#12631) by @wyx-0203 in #12631
[router][grpc] Support streaming parsing with Tool Choice in chat completions API (#12677) by @CatherineSue in #12677
[router][grpc] Implement tool_choice support for Responses API (#12668) by @CatherineSue in #12668
[router][grpc] Emit OutputItemDone event and store output item array (#12656) by @CatherineSue in #12656
[router][grpc] Fix index issues in reasoning content and missing streaming events (#12650) by @CatherineSue in #12650
[router][grpc] Fix model validation, tool call check, streaming logic and misc in responses (#12616) by @CatherineSue in #12616
Support aggregating engine metrics in sgl-router (#11456) by @fzyzcjy in #11456
[router][grpc] Restructure modules and code clean up (#12598) by @CatherineSue in #12598
[router][grpc] Consolidate error messages build in error.rs (#12301) by @CatherineSue in #12301
[ci] install released version router (#12410) by @key4ng in #12410

New Contributors

@XinyueZhang369 made their first contribution in 2cdde3d46
@Sunhaihua1 made their first contribution in a06c44f90
@zhaowenzi made their first contribution in 7b877ab83
@cicirori made their first contribution in 58095cb00
@wyx-0203 made their first contribution in 3651cfbf6
@syy-hw made their first contribution in 611a4fd08
@SYChen123 made their first contribution in 4ef439054
@hellodanylo made their first contribution in d28caaf60

Paths Included

sgl-router
python/sglang/srt/grpc
python/sglang/srt/entrypoints/grpc_server.py

Full Changelog: gateway-v0.2.2...gateway-v0.2.3

sgl-project/sglang gateway-v0.2.3 Release Gateway-v0.2.3 on GitHub