SGLang Model Gateway v0.2.2 Released!
Features
Industry-First Responses API for All Models
We're bringing OpenAI's Responses API to the entire open-source ecosystem! Now enjoy native support for Llama, DeepSeek, Qwen, and more, with built-in chat history management, multi-turn conversations, and seamless MCP integration. This is the first solution to democratize advanced conversation management across all OSS models.
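As a rough illustration of the multi-turn flow described above, here is a minimal Python sketch that builds requests for the `/v1/responses` endpoint. The gateway address (`http://localhost:30000`), the model name `my-model`, and the response ID `resp_123` are placeholders chosen for this example, not values from the release. The `model`, `input`, and `previous_response_id` fields follow OpenAI's Responses API; whether a given deployment accepts every field depends on its configuration.

```python
import json
import urllib.request
from typing import Optional

# Assumed gateway address; replace with wherever your gateway listens.
GATEWAY_URL = "http://localhost:30000"


def build_responses_request(
    model: str,
    user_input: str,
    previous_response_id: Optional[str] = None,
    base_url: str = GATEWAY_URL,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /v1/responses endpoint."""
    payload = {"model": model, "input": user_input}
    if previous_response_id is not None:
        # Chain onto an earlier turn so the server can supply the stored history.
        payload["previous_response_id"] = previous_response_id
    return urllib.request.Request(
        url=f"{base_url}/v1/responses",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON body (requires a running gateway)."""
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)


# First turn, then a follow-up that refers back to it by ID.
# "my-model" and "resp_123" are hypothetical placeholders.
first = build_responses_request("my-model", "What is SGLang?")
follow_up = build_responses_request(
    "my-model",
    "Summarize that in one line.",
    previous_response_id="resp_123",
)
```

In real use, you would call `send(first)`, read the `id` field from the returned JSON, and pass it as `previous_response_id` for the next turn instead of a hard-coded value.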
Production-Ready Kubernetes Operations
Taking large-scale deployments seriously! We now support native gRPC health check endpoints, making it effortless to deploy and operate SGLang at scale on Kubernetes with proper health monitoring and orchestration.
Your Network, Your Control
- mTLS Support: Secure gateway-to-SGLang communication whether you're running on edge, remote cloud, multi-cloud, or hybrid environments. We've got you covered.
- MCP Proxy Enhancements: Configure proxies globally or per individual MCP server, putting complete network control in your hands.
Harmony Pipeline
Introducing our unified OpenAI-native architecture with GPT OSS model support for both Responses API and Chat Completion, fully integrated with MCP and intelligent storage management.
Universal Platform Support
A major leap in accessibility! SGLang Model Gateway now runs on nearly every operating system and architecture: Linux, Windows, Mac, x86, and ARM. Even better: we support all Python versions from 3.8 to 3.14 in a single wheel file, while reducing wheel size by more than 40%. Deploy anywhere, on any Python version, with unprecedented efficiency!
Additional Enhancements
- Multi-worker URL support for better load distribution
- Connection pooling and tool inventory for MCP
- Native OpenAI web search tool support and function calling for OpenAI router
Stability Improvements
We've squashed numerous bugs in background task handling, tool call IDs, conversation management, and installation dependencies.
Try it now: pip install sglang-router==0.2.2
What's Changed in Gateway
Gateway Changes (48 commits)
- [router] 0.2.2 release (#12399) by @slin1237 in #12399
- [router] web_search_preview tool basic implementation (#12290) by @key4ng in #12290
- [router] Function call support for openai router Responses API (#12386) by @key4ng in #12386
- [router] Fix safety_identifier missing (#12404) by @key4ng in #12404
- [router] use safety_identifier replace user on chat history storage (#12185) by @lengrongfu in #12185
- [router] harmony responses api streaming support (#12395) by @slin1237 in #12395
- [router] Harmony Pipeline: Chat Completion & Responses API with MCP Support (#12153) by @slin1237 in #12153
- [bug] fix router installation to include additional dependency (#12348) by @slin1237 in #12348
- [router] refactor mcp to use LRU and fix pooling bug (#12346) by @CatherineSue in #12346
- [bug] fix router pypi license file (#12345) by @slin1237 in #12345
- [router] fix router release workflow and add build test in PR (#12315) by @CatherineSue in #12315
- [Bug fix] trace: fix import error in mini_lb if sgl-router image does not install sglang (#12338) by @sufeng-buaa in #12338
- [router][grpc] Fix inconsistent behavior of conversation_id not found (#12299) by @CatherineSue in #12299
- [router] support arm, windows, mac, linux, reduce wheel size and number (#12285) by @slin1237 in #12285
- [rust][ci] Add end-to-end tests for Oracle history backend (#12233) by @key4ng in #12233
- [router] upgrade grpc dependency and py 3.13 3.14 support (#12284) by @slin1237 in #12284
- [router] Fix type unmatch during validation (#12257) by @key4ng in #12257
- [Feature] Sglang Tracing: Fine-Grained Tracking for Request Latency - Part 2 (#10804) by @sufeng-buaa in #10804
- [router] configure workflow retries and timeout based on routerConfig (#12252) by @slin1237 in #12252
- [router] use mcp struct from sdk and clean up code across codebase (#12249) by @slin1237 in #12249
- [router] remove code duplication (#12245) by @slin1237 in #12245
- [sgl-route] Optimize the use of constant slices and retain to simplif… (#12159) by @lengrongfu in #12159
- [router] Remove SharedXxxStorage type aliases to make Arc explicit (#12171) by @CatherineSue in #12171
- [router][grpc] Add ResponsesContext and fix error propagation in responses api (#12164) by @CatherineSue in #12164
- [misc][grpc] Remove duplicate log (#12168) by @CatherineSue in #12168
- [router] centralize mcp tool args handling (#12155) by @slin1237 in #12155
- [router][grpc] Fix tool call id in parse_json_schema_response (#12152) by @CatherineSue in #12152
- [router] cleaned up all the redundant comments in the config module (#12147) by @CatherineSue in #12147
- [router] MCP Manager Refactoring - Flat Architecture with Connection Pooling (#12097) by @slin1237 in #12097
- [router] Refactor data connector architecture with unified storage modules (#12096) by @key4ng in #12096
- [router][grpc] Remove gpt_oss parsers and remove _parser suffix in tool parser files (#12091) by @CatherineSue in #12091
- [router] migrate app context to builder pattern 2/n (#12089) by @slin1237 in #12089
- [router] migrate app context to builder pattern 1/n (#12086) by @slin1237 in #12086
- [router] fix ut router config init to use build pattern (#12084) by @slin1237 in #12084
- [router] implement response api get input item function and refactor input/output store (#11924) by @key4ng in #11924
- [router] Add mTLS Support for Router-to-Worker Communication (#12019) by @slin1237 in #12019
- [router] Add builder pattern for RouterConfig with zero duplication (#12030) by @slin1237 in #12030
- [router][CI] Clean up imports and prints statements in sgl-router/py_test (#12024) by @CatherineSue in #12024
- [router] change ci names and update log level in ci (#12021) by @slin1237 in #12021
- [Router] Consolidate ConnectionMode enum to core module (#11937) by @YouNeedCryDear in #11937
- [router] Add comprehensive E2E tests for Response API (#11988) by @key4ng in #11988
- [grpc] Support gRPC standard health check (#11955) by @CatherineSue in #11955
- [router] create worker removal step and clean up worker manager (#11921) by @slin1237 in #11921
- [router] Support multiple worker URLs for OpenAI router (#11723) by @key4ng in #11723
- [router][grpc] Fix background tasks stored with wrong id (#11945) by @CatherineSue in #11945
- [router] Add gRPC E2E test suite (#11790) by @key4ng in #11790
- [router][grpc] Support v1/responses API (#11926) by @CatherineSue in #11926
- Fix openai input_text type compatibility (#11935) by @key4ng in #11935
New Contributors
- @lengrongfu made their first contribution in 09af0a7b5
- @sufeng-buaa made their first contribution in ea9610600
Paths Included
- sgl-router
- python/sglang/srt/grpc
- python/sglang/srt/entrypoints/grpc_server.py
Full Changelog: gateway-v0.2.1...gateway-v0.2.2