sgl-project/sglang gateway-v0.2.1 on GitHub

🚀 SGLang Model Gateway v0.2.1 Released!

This release focuses on stability, cleanup, and two big new performance features.

🧾 Docs & CI

Updated router documentation to reflect recent feature additions

🧹 Code Cleanup

Refactored StopSequenceDecoder for cleaner incremental decoding
Added spec.rs test harness under spec/ for structured unit tests

🐞 Bug Fixes

Fixed UTF-8 boundary in stop-sequence decoding
Fixed gRPC timeout configuration
Fixed worker filtering, tool-choice normalization, and bootstrap-port handling
Additional gRPC server warm-up and concurrency fixes

🌟 New Features

Two-Level Tokenizer Caching (L0 + L1)
L0: exact-match cache for repeated prompts
L1: prefix-aware cache at special-token boundaries
OpenAI-Style Classification API → new /v1/classifications endpoint, shout out to yanbo for the contribution
Worker Management Workflow Engine → improved async registration, worker self discovery, and health orchestration

What's Changed in Gateway

Gateway Changes (26 commits)

[router] release router 0.2.1 (#11885) by @slin1237 in #11885
[router][grpc] Fix wram-up random token ids for small models (#11887) by @CatherineSue in #11887
[router] clean up workflow logs to debug for implementation details logs (#11886) by @slin1237 in #11886
fix(sql-router): fix conflict port in test (#11826) by @htiennv in #11826
[router][grpc] Remove continue_final_message in ChatTemplateParams and add minijinja-contrib (#11882) by @CatherineSue in #11882
[router] remove encoding header for oai router (#11881) by @slin1237 in #11881
[router] Worker Management Workflow Engine (#11868) by @slin1237 in #11868
[2/2] [feature] support openai like classification api in router (#11670) by @whybeyoung in #11670
[router] Add Configurable L0 and L1 Tokenizer Caching (#11688) by @slin1237 in #11688
[router][grpc] Support parallel queue puts in grpc_request_manager and remove mutex for grpc_client (#11798) by @CatherineSue in #11798
[Lint] Add python/sglang to ruff F401 checks and remove unused imports in files (#11685) by @CatherineSue in #11685
[router][grpc] Remove timeout for connections and remove max_tokens deprecation warning log (#11775) by @CatherineSue in #11775
[doc] update router document (#11767) by @key4ng in #11767
[router] fix grpc client time out to 1h (#11768) by @slin1237 in #11768
[router] Fix UTF-8 Boundary Panic in Stop Sequence Decoder (#11766) by @slin1237 in #11766
Revert "[router] fix get_models endpoint for openai router (#11687)" (#11740) by @key4ng in #11687
[router] Add rustfmt and set group imports by default (#11732) by @CatherineSue in #11732
[router] add spec.rs to enables tests under spec folder (#11734) by @key4ng in #11734
[router] Fix tool_choice normalization in ChatCompletionRequest and fix ut (#11731) by @CatherineSue in #11731
[router][grpc] add dissag info to warm up in grpc server (#11727) by @slin1237 in #11727
[router] fix p and d worker filtering and bootstrap port handling (#11729) by @slin1237 in #11729
[Router] Refactor protocol definitions: split spec.rs into modular files (#11677) by @key4ng in #11677
[router] fix get_models endpoint for openai router (#11687) by @key4ng in #11687
[router] Refactor StopSequenceDecoder to Use Sequence for Incremental Decoding (#11676) by @slin1237 in #11676
[router][grpc] Simplify model_id determination (#11684) by @CatherineSue in #11684
[router] Fix response api related spec (#11621) by @key4ng in #11621

Paths Included

sgl-router
python/sglang/srt/grpc
python/sglang/srt/entrypoints/grpc_server.py

Full Changelog: gateway-v0.2.0...gateway-v0.2.1

sgl-project/sglang gateway-v0.2.1 Release Gateway-v0.2.1 on GitHub

🚀 SGLang Model Gateway v0.2.1 Released!

🧾 Docs & CI

🧹 Code Cleanup

🐞 Bug Fixes

🌟 New Features

What's Changed in Gateway

Gateway Changes (26 commits)

Paths Included

sgl-project/sglang gateway-v0.2.1
Release Gateway-v0.2.1

on GitHub