🚀 SGLang Model Gateway - New Release!
We're excited to announce another powerful update to SGLang Model Gateway with performance improvements and expanded database support!
✨ Headline Features
⚡ Bucket Mode Routing - 20-30% Performance Boost
Introducing our new bucket-based routing algorithm that dramatically improves performance in PD mode. See up to 20-30% improvements in TTFT (Time To First Token) and overall throughput
💾 PostgreSQL Support for Chat History Management
Flexibility in data storage! We now support PostgreSQL alongside OracleDB and in-memory storage for chat history management.
🛠️ Enhanced Model Tool & Structured Output Support
- MinMax M2 model support!
- Structured model output for OpenAI and gRPC router
- Streaming parsing with Tool Choice in chat completions API
- Tool_choice support for Responses API
- OutputItemDone events with output item array storage for better observability
🐛 Stability & Quality Improvements
Multiple bug fixes for model validation, streaming logic, reasoning content indexing, and CI stability enhancements.
🔧 Code Quality Enhancements
Refactored builders for chat and responses, restructured modules for better maintainability, and consolidated error handling.
Try the latest version: pip install sglang-router --upgrade
What's Changed in Gateway
Gateway Changes (45 commits)
- [model-gateway] smg release 0.2.3 (#13312) by @slin1237 in #13312
- [router]Replace requests lib with openai in e2e_response_api (#13293) by @XinyueZhang369 in #13293
- fix outdated router doc (#13255) by @fzyzcjy in #13255
- [router][grpc] Refine docs in minimax_m2 to match other parsers (#13218) by @CatherineSue in #13218
- fix: display served_model_name in /v1/models (#13155) by @Sunhaihua1 in #13155
- [router] minmax-m2 xml tool parser (#13148) by @slin1237 in #13148
- [router] remove worker url requirement (#13172) by @slin1237 in #13172
- [router] Fix Flaky test_circuit_breaker_opens_and_recovers (#13164) by @XinyueZhang369 in #13164
- [router] Add comprehensive validation to Responses API (#13127) by @key4ng in #13127
- bugfix: multi-model routing for /generate api (#12979) by @SYChen123 in #12979
- [router][grpc] Support vllm backend for grpc router (#13120) by @CatherineSue in #13120
- [router] add minmax m2 reasoning parser (#13137) by @slin1237 in #13137
- [router] Support complex assistant and tool messages in /chat/completions (#12860) by @hellodanylo in #12860
- [router] move radix tree to policy crate and addreses some code styles (#13131) by @slin1237 in #13131
- [Router] use call_id instead of id for matching function calls in Responses API for Harmony (#13056) by @zhaowenzi in #13056
- Revert "fix: display served_model_name in /v1/models" (#13093) by @CatherineSue in #13093
- fix: display served_model_name in /v1/models (#13063) by @Sunhaihua1 in #13063
- [router] add postgres databases data connector (#12218) by @lengrongfu in #12218
- [router][ci] Quick Improvement to make CI more stable (#12869) by @key4ng in #12869
- [router][ci] Fix maturin build (#13012) by @key4ng in #13012
- [router] bucket policy (#11719) by @syy-hw in #11719
- [router] Switch MCP tests from DeepWiki to self-hosted Brave search server (#12849) by @key4ng in #12849
- [router][grpc] Move all error logs to their call sites (#12859) by @CatherineSue in #12859
- [router][grpc] Refactor: Add builders for chat and responses (#12852) by @CatherineSue in #12852
- [router] Support structured model output for openai and grpc router (#12431) by @key4ng in #12431
- [router][grpc] Add more mcp test cases to responses api (#12749) by @CatherineSue in #12749
- fix ci (#12760) by @key4ng in #12760
- Add timing metrics for requests (#12646) by @cicirori in #12646
- [router][ci] Disable cache (#12752) by @key4ng in #12752
- [router][grpc] Support mixin tool calls in Responses API (#12736) by @CatherineSue in #12736
- Revert "[router] web_search_preview tool basic implementation" (#12716) by @key4ng in #12716
- [router] add basic ci tests for gpt-oss model support (#12651) by @key4ng in #12651
- [router][quick fix] Add minimal option for reasoning effort in spec (#12711) by @key4ng in #12711
- [router][grpc] Make harmony parser checks recipient first before channel (#12713) by @CatherineSue in #12713
- [router][ci] speed up python binding to 1.5 min (#12673) by @key4ng in #12673
- [router] fix: validate HTTP status codes in health check (#12631) by @wyx-0203 in #12631
- [router][grpc] Support streaming parsing with Tool Choice in chat completions API (#12677) by @CatherineSue in #12677
- [router][grpc] Implement tool_choice support for Responses API (#12668) by @CatherineSue in #12668
- [router][grpc] Emit OutputItemDone event and store output item array (#12656) by @CatherineSue in #12656
- [router][grpc] Fix index issues in reasoning content and missing streaming events (#12650) by @CatherineSue in #12650
- [router][grpc] Fix model validation, tool call check, streaming logic and misc in responses (#12616) by @CatherineSue in #12616
- Support aggregating engine metrics in sgl-router (#11456) by @fzyzcjy in #11456
- [router][grpc] Restructure modules and code clean up (#12598) by @CatherineSue in #12598
- [router][grpc] Consolidate error messages build in error.rs (#12301) by @CatherineSue in #12301
- [ci] install released version router (#12410) by @key4ng in #12410
New Contributors
- @XinyueZhang369 made their first contribution in 2cdde3d46
- @Sunhaihua1 made their first contribution in a06c44f90
- @zhaowenzi made their first contribution in 7b877ab83
- @cicirori made their first contribution in 58095cb00
- @wyx-0203 made their first contribution in 3651cfbf6
- @syy-hw made their first contribution in 611a4fd08
- @SYChen123 made their first contribution in 4ef439054
- @hellodanylo made their first contribution in d28caaf60
Paths Included
sgl-routerpython/sglang/srt/grpcpython/sglang/srt/entrypoints/grpc_server.py
Full Changelog: gateway-v0.2.2...gateway-v0.2.3