SGLang Model Gateway v0.2.1 Released!
This release focuses on stability, cleanup, and two big new performance features.
Docs & CI
- Updated router documentation to reflect recent feature additions
Code Cleanup
- Refactored StopSequenceDecoder for cleaner incremental decoding
- Added spec.rs test harness under spec/ for structured unit tests
Bug Fixes
- Fixed a UTF-8 boundary panic in stop-sequence decoding
- Fixed gRPC client timeout configuration (now set to 1 hour)
- Fixed worker filtering, tool-choice normalization, and bootstrap-port handling
- Additional gRPC server warm-up and concurrency fixes
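
For readers unfamiliar with the class of bug behind the UTF-8 boundary fix: when generated text arrives as byte chunks, a multi-byte character can straddle two chunks, and decoding or slicing at that point fails. The snippet below is a minimal Python sketch of the general technique (buffering incomplete sequences), not the gateway's Rust implementation; the `stream_text` helper and the sample chunks are hypothetical.

```python
import codecs


def stream_text(chunks):
    """Decode byte chunks incrementally, holding back incomplete UTF-8 sequences."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    pieces = []
    for chunk in chunks:
        # Trailing bytes of a partial character are buffered inside the
        # decoder until the remaining bytes arrive in a later chunk.
        pieces.append(decoder.decode(chunk))
    # final=True raises UnicodeDecodeError if the stream ends mid-character.
    pieces.append(decoder.decode(b"", final=True))
    return pieces


# "é" is two bytes in UTF-8 (0xC3 0xA9); split it across two chunks.
data = "café!".encode("utf-8")
chunks = [data[:4], data[4:]]
pieces = stream_text(chunks)
```

Decoding `chunks[0]` on its own with `bytes.decode` would raise `UnicodeDecodeError`, which is the Python analogue of slicing a string off a character boundary.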
New Features
- Two-Level Tokenizer Caching (L0 + L1)
  - L0: exact-match cache for repeated prompts
  - L1: prefix-aware cache at special-token boundaries
- OpenAI-Style Classification API: new /v1/classifications endpoint (shout-out to yanbo for the contribution)
- Worker Management Workflow Engine: improved async registration, worker self-discovery, and health orchestration
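
As a rough illustration of the two-level tokenizer cache idea described above, here is a minimal Python sketch. It is not the gateway's implementation: the `TwoLevelTokenizerCache` class, the `encode` callable, the toy encoder, and the `<|sep|>` token are all hypothetical. It also assumes that tokenizing the text on either side of a special-token boundary separately yields the same ids as tokenizing the whole string.

```python
class TwoLevelTokenizerCache:
    """Hypothetical sketch: L0 = exact match, L1 = prefix at a special-token boundary."""

    def __init__(self, encode, special_tokens):
        self.encode = encode              # assumed callable: str -> list[int]
        self.special_tokens = special_tokens
        self.l0 = {}                      # full prompt -> token ids
        self.l1 = {}                      # boundary-terminated prefix -> token ids
        self.stats = {"l0_hits": 0, "l1_hits": 0, "misses": 0}

    def _boundaries(self, text):
        """End offsets of every special-token occurrence, longest prefix first."""
        ends = set()
        for tok in self.special_tokens:
            start = text.find(tok)
            while start != -1:
                ends.add(start + len(tok))
                start = text.find(tok, start + 1)
        return sorted(ends, reverse=True)

    def tokenize(self, text):
        # L0: the exact prompt was seen before.
        if text in self.l0:
            self.stats["l0_hits"] += 1
            return self.l0[text]

        boundaries = self._boundaries(text)
        ids = None
        # L1: reuse the longest cached prefix and encode only the remainder.
        for end in boundaries:
            prefix = text[:end]
            if prefix in self.l1:
                self.stats["l1_hits"] += 1
                ids = self.l1[prefix] + self.encode(text[end:])
                break

        if ids is None:
            self.stats["misses"] += 1
            if boundaries:
                # Encode prefix and suffix separately so the prefix can be reused.
                end = boundaries[0]
                prefix_ids = self.encode(text[:end])
                self.l1[text[:end]] = prefix_ids
                ids = prefix_ids + self.encode(text[end:])
            else:
                ids = self.encode(text)

        self.l0[text] = ids
        return ids


calls = []  # records every string the toy encoder actually processes


def toy_encode(text):
    calls.append(text)
    return [ord(ch) for ch in text]


cache = TwoLevelTokenizerCache(toy_encode, special_tokens=["<|sep|>"])
system = "You are helpful.<|sep|>"
first = cache.tokenize(system + "Hi")    # miss: encodes prefix and suffix
again = cache.tokenize(system + "Hi")    # L0 hit: nothing is encoded
second = cache.tokenize(system + "Bye")  # L1 hit: only "Bye" is encoded
```

In this toy run the shared system prompt is encoded once; the third call reuses it from L1 and encodes only the new suffix.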
What's Changed in Gateway
Gateway Changes (26 commits)
- [router] release router 0.2.1 (#11885) by @slin1237 in #11885
- [router][grpc] Fix wram-up random token ids for small models (#11887) by @CatherineSue in #11887
- [router] clean up workflow logs to debug for implementation details logs (#11886) by @slin1237 in #11886
- fix(sql-router): fix conflict port in test (#11826) by @htiennv in #11826
- [router][grpc] Remove `continue_final_message` in `ChatTemplateParams` and add `minijinja-contrib` (#11882) by @CatherineSue in #11882
- [router] remove encoding header for oai router (#11881) by @slin1237 in #11881
- [router] Worker Management Workflow Engine (#11868) by @slin1237 in #11868
- [2/2] [feature] support openai like classification api in router (#11670) by @whybeyoung in #11670
- [router] Add Configurable L0 and L1 Tokenizer Caching (#11688) by @slin1237 in #11688
- [router][grpc] Support parallel queue puts in grpc_request_manager and remove mutex for grpc_client (#11798) by @CatherineSue in #11798
- [Lint] Add `python/sglang` to ruff F401 checks and remove unused imports in files (#11685) by @CatherineSue in #11685
- [router][grpc] Remove timeout for connections and remove `max_tokens` deprecation warning log (#11775) by @CatherineSue in #11775
- [doc] update router document (#11767) by @key4ng in #11767
- [router] fix grpc client time out to 1h (#11768) by @slin1237 in #11768
- [router] Fix UTF-8 Boundary Panic in Stop Sequence Decoder (#11766) by @slin1237 in #11766
- Revert "[router] fix get_models endpoint for openai router (#11687)" (#11740) by @key4ng in #11687
- [router] Add rustfmt and set group imports by default (#11732) by @CatherineSue in #11732
- [router] add spec.rs to enables tests under spec folder (#11734) by @key4ng in #11734
- [router] Fix tool_choice normalization in ChatCompletionRequest and fix ut (#11731) by @CatherineSue in #11731
- [router][grpc] add dissag info to warm up in grpc server (#11727) by @slin1237 in #11727
- [router] fix p and d worker filtering and bootstrap port handling (#11729) by @slin1237 in #11729
- [Router] Refactor protocol definitions: split spec.rs into modular files (#11677) by @key4ng in #11677
- [router] fix get_models endpoint for openai router (#11687) by @key4ng in #11687
- [router] Refactor StopSequenceDecoder to Use Sequence for Incremental Decoding (#11676) by @slin1237 in #11676
- [router][grpc] Simplify model_id determination (#11684) by @CatherineSue in #11684
- [router] Fix response api related spec (#11621) by @key4ng in #11621
Paths Included
- sgl-router
- python/sglang/srt/grpc
- python/sglang/srt/entrypoints/grpc_server.py
Full Changelog: gateway-v0.2.0...gateway-v0.2.1