🚀 SGLang Model Gateway v0.2.4 Released!
We're excited to announce SGLang Model Gateway v0.2.4 – a massive release focused on performance, security, and production-ready observability!
✨ Headline Features
⚡ Major Performance Optimizations
We've invested heavily in performance across the entire stack:
- Optimized radix tree for cache-aware load balancing – Smarter routing decisions with lower overhead
- Tokenizer optimization – Dramatically reduced CPU and memory footprint during tokenization
- Core module optimization – HTTP and gRPC routers now run leaner and faster
- Efficient OTEL implementation – Production-grade observability with minimal performance impact
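To make the idea of cache-aware load balancing concrete, here is a minimal sketch, not the gateway's actual implementation. It uses a plain character-level prefix tree rather than a compressed radix tree. The names `PrefixRouter`, `match_threshold`, and the worker labels are hypothetical. A request goes to the worker that has already seen the longest shared prefix, and falls back to the least-loaded worker when the match is too short.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # char -> Node
    workers: set = field(default_factory=set)     # workers that have seen this prefix


class PrefixRouter:
    """Hypothetical cache-aware router: prefer the worker with the longest cached prefix."""

    def __init__(self, workers, match_threshold=0.5):
        self.root = Node()
        self.load = {w: 0 for w in workers}     # requests routed per worker
        self.match_threshold = match_threshold  # assumed tuning knob, not a real flag

    def _longest_match(self, text):
        # Walk the tree and record, per worker, how many leading characters matched.
        matched = {}
        node = self.root
        for depth, ch in enumerate(text, start=1):
            node = node.children.get(ch)
            if node is None:
                break
            for w in node.workers:
                matched[w] = depth
        return matched

    def _insert(self, text, worker):
        # Mark every prefix of `text` as seen by `worker`.
        node = self.root
        for ch in text:
            node = node.children.setdefault(ch, Node())
            node.workers.add(worker)

    def route(self, text):
        matched = self._longest_match(text)
        best = max(matched, key=matched.get, default=None)
        if best is not None and matched[best] / max(len(text), 1) >= self.match_threshold:
            worker = best  # likely cache hit on this worker
        else:
            worker = min(self.load, key=self.load.get)  # fall back to least-loaded worker
        self._insert(text, worker)
        self.load[worker] += 1
        return worker
```

In this sketch, two prompts sharing a long system prompt land on the same worker, while an unrelated prompt goes to whichever worker has handled the fewest requests.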
🔌 Industry-First WASM Middleware Support
Programmable middleware using WebAssembly! Extend your gateway with safe, isolated plugins. Build custom routing logic, transform requests/responses, or integrate proprietary systems – all without touching core code. Your gateway, your rules.
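As a rough illustration of the middleware concept only, the sketch below runs a chain of request hooks that can either modify a request or reject it. It is plain Python, not WebAssembly, and it does not reflect the gateway's plugin interface. Every name here (`Request`, `Rejection`, `limit_body`, `tag_tenant`, `run_chain`, `MAX_BODY_BYTES`, the `x-tenant` header) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

MAX_BODY_BYTES = 1 << 20  # hypothetical 1 MiB cap on bodies handed to plugins


@dataclass
class Request:
    path: str
    headers: dict = field(default_factory=dict)
    body: bytes = b""


@dataclass
class Rejection:
    status: int
    reason: str


# A middleware either mutates the request and returns None, or returns a Rejection.
Middleware = Callable[[Request], Optional[Rejection]]


def limit_body(req: Request) -> Optional[Rejection]:
    # Refuse oversized bodies before any later hook reads them.
    if len(req.body) > MAX_BODY_BYTES:
        return Rejection(413, "request body too large")
    return None


def tag_tenant(req: Request) -> Optional[Rejection]:
    # Example transformation: add a default header if the client did not send one.
    req.headers.setdefault("x-tenant", "default")
    return None


def run_chain(req: Request, chain: list):
    # Run hooks in order; the first rejection short-circuits the chain.
    for middleware in chain:
        rejection = middleware(req)
        if rejection is not None:
            return rejection
    return req
```

A real WASM plugin would run inside a sandbox with its own memory and time limits; this sketch only shows the control flow of a hook chain.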
📊 Production-Grade Observability
Full OpenTelemetry integration with distributed tracing for both HTTP and gRPC. Track requests across your entire inference stack with native trace context propagation. Finally, real visibility into your LLM infrastructure.
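Trace context propagation is commonly carried in the W3C `traceparent` header. The sketch below assumes that header format and handles version `00` only. It shows what a proxy hop does conceptually: keep the incoming trace ID, mint a new span ID, and forward the result downstream. The function names are hypothetical, and starting a new sampled trace when no valid header arrives is an assumption of this sketch, not a statement about the gateway's behavior.

```python
import re
import secrets

# W3C Trace Context, version 00: 00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>
_TRACEPARENT = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(value):
    """Return (trace_id, parent_id, flags), or None if the header is invalid."""
    m = _TRACEPARENT.fullmatch(value)
    if m is None:
        return None
    trace_id, parent_id, flags = m.groups()
    # All-zero IDs are invalid per the specification.
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return trace_id, parent_id, flags


def outgoing_headers(incoming):
    """Build the trace headers to send downstream for one proxy hop."""
    parsed = parse_traceparent(incoming.get("traceparent", ""))
    if parsed is None:
        # Assumption: start a fresh, sampled trace when no valid context arrives.
        trace_id, flags = secrets.token_hex(16), "01"
    else:
        trace_id, _, flags = parsed
    span_id = secrets.token_hex(8)  # this hop's span becomes the downstream parent
    headers = {"traceparent": f"00-{trace_id}-{span_id}-{flags}"}
    if parsed is not None and "tracestate" in incoming:
        headers["tracestate"] = incoming["tracestate"]  # pass vendor state through
    return headers
```

Because every hop reuses the same trace ID, a tracing backend can stitch the gateway span and the worker spans into one request timeline.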
⚡ Built for speed. Hardened for security. Ready for production.
Gateway Changes (98 commits)
- [model-gateway] release gateway 0.2.4 (#14763) by @slin1237 in #14763
- [Perf] Optimize radix tree for cache-aware load balancing (#14758) by @slin1237 in #14758
- [SMG] perf: optimize tokenizer for reduced CPU and memory overhead (#14752) by @slin1237 in #14752
- [model-gateway] optimize core modules (#14751) by @slin1237 in #14751
- Tiny extract select_worker_min_load (#14648) by @fzyzcjy in #14648
- [ci][smg] fix docker release ci and add it to pr test (#14683) by @slin1237 in #14683
- Tiny support sgl-router http response status code metrics (#14689) by @fzyzcjy in #14689
- [SMG]feat: implement TokenGuardBody for managing token return (#14653) by @jimmy-evo in #14653
- [model-gateway] add OTEL integration to grpc router (#14671) by @slin1237 in #14671
- Fix cache-aware router should pick min load instead of min tenant size (#14650) by @fzyzcjy in #14650
- [model-gateway] Optimize memory usage in HTTP router (#14667) by @slin1237 in #14667
- [model-gateway] fix WASM arbitrary file read security vuln (#14664) by @slin1237 in #14664
- [model-gateway] reduce cpu overhead in grpc router (#14663) by @slin1237 in #14663
- [model-gateway] reducing cpu overhead in various places (#14658) by @slin1237 in #14658
- Fix dp-aware incompatible with service-discovery (#14629) by @fzyzcjy in #14629
- Super tiny fix unused code in router (#14618) by @fzyzcjy in #14618
- [model-gateway] fix WASM unbounded request/response body read vuln (#14612) by @slin1237 in #14612
- Super tiny remove unused select_worker_pair (#14609) by @fzyzcjy in #14609
- [model-gateway] refactor otel to be more efficient (#14604) by @slin1237 in #14604
- Tiny fix missing policy decision recording (#14605) by @fzyzcjy in #14605
- [model-gateway] fix WASM memory limit per module (#14600) by @slin1237 in #14600
- [model-gateway] reorganize metrics, logging, and otel to its own module (#14590) by @slin1237 in #14590
- [model-gateway] Fixed WASM Security Vulnerability - Execution Timeout (#14588) by @slin1237 in #14588
- [model-gateway] extra accumulator and tool handler in oai router (#14587) by @slin1237 in #14587
- [Bug fix] Add /model_info endpoint to mini_lb (#14535) by @alisonshao in #14535
- [model-gateway][tracing]: implement request tracing using OpenTelemetry with trace context propagation (HTTP) (#13897) by @sufeng-buaa in #13897
- [model-gateway] fix left over sgl-router names in wasm (#14514) by @slin1237 in #14514
- [model-gateway] fix logs in smg workflow (#14513) by @slin1237 in #14513
- [model-gateway] fix left over sgl-router names to sgl-model-gateway (#14512) by @slin1237 in #14512
- [model-gateway] change sgl-router to sgl-model-gateway (#14312) by @slin1237 in #14312
- [model-gateway] Make Tokenizer Builder Aware of Env Vars Like HF_ENDPOINT (#14405) by @xuwenyihust in #14405
- Fix removing worker will make it healthy forever in prometheus metrics (#14420) by @fzyzcjy in #14420
- [model-gateway] fix server info comment (#14508) by @slin1237 in #14508
- [model-gateway] reorganized conversation handler (#14507) by @slin1237 in #14507
- [model-gateway] Add WASM support for middleware (#12471) by @tonyluj in #12471
- [model-gateway] move conversation to first class routing (#14506) by @slin1237 in #14506
- [misc] add model arch and type to server info and use it for harmony (#14456) by @slin1237 in #14456
- [model-gateway] grpc to leverage event type (#14450) by @slin1237 in #14450
- [model-gateway] add mistral 3 image processor (#14445) by @slin1237 in #14445
- [model-gateway] move all responses api event from oai to proto (#14446) by @slin1237 in #14446
- [model-gateway] move oai header util to router header util (#14441) by @slin1237 in #14441
- [model-gateway] extract conversation out of oai router (#14440) by @slin1237 in #14440
- [model-gateway] add llama4 vision image processor (#14438) by @slin1237 in #14438
- [model-gateway] introduce request ctx for oai router (#14434) by @slin1237 in #14434
- [model-gateway] add phi4 vision image processor (#14430) by @slin1237 in #14430
- Add Mistral Large 3 support. (#14213) by @dcampora in #14213
- [model-gateway] introduce provider in openai router (#14394) by @slin1237 in #14394
- [model-gateway] add phi3 vision image processor (#14381) by @slin1237 in #14381
- [model-gateway][doc] Add STDIO Explicitly to Example in README (#14393) by @xuwenyihust in #14393
- Fix sgl-router silently parse selector wrongly causing OME fail to discover pods (#14359) by @fzyzcjy in #14359
- [model-gateway] add qwen3_vl model image processor (#14377) by @slin1237 in #14377
- [model-gateway] use worker crate in openai router (#14330) by @slin1237 in #14330
- [model-gateway] add qwen2.5_vl model image processor (#14375) by @slin1237 in #14375
- [model-gateway] add qwen2_vl model image processor and tests (#14374) by @slin1237 in #14374
- [model-gateway] add llava model image processor and tests (#14371) by @slin1237 in #14371
- [model-gateway] add image processor and transformer structure (#14344) by @slin1237 in #14344
- [model-gateway] multimodality initialization (#13350) by @slin1237 in #13350
- [model-gateway] add workflow for external model providers (#14323) by @slin1237 in #14323
- [model-gateway] change rust package name to sgl-model-gateway instead (#14283) by @slin1237 in #14283
- [model-gateway] fix version output (#14276) by @slin1237 in #14276
- [model-gateway] include smg version command in py binding (#14274) by @slin1237 in #14274
- [model-gateway] add audio and moderation in model card (#14263) by @slin1237 in #14263
- [model-gateway] Add e2e tests of streaming events and tool choice for response api (#13880) by @XinyueZhang369 in #13880
- [model-gateway] Migrate Worker trait to model-aware methods (#14250) by @slin1237 in #14250
- [model-gateway] add ModelCard support to WorkerMetadata (#14243) by @slin1237 in #14243
- [model-gateway] add ModelCard and ProviderType for model configuration (#14237) by @slin1237 in #14237
- [model-gateway] add ModelType bitflags and Endpoint enum for worker (#14230) by @slin1237 in #14230
- [model-gateway] fix v1/models response format to be oai compatible (#13693) by @CatherineSue in #13693
- [model-gateway] refactor oai router 1/n (#14228) by @slin1237 in #14228
- [model-gateway] Avoid logging MCP connection token (#13887) by @xuwenyihust in #13887
- [Minor] update docs (#14212) by @merrymercy in #14212
- [model-gateway] support VL models in router (#14140) by @ooapex in #14140
- Support numactl bind for CPU and memory before process starts (#14156) by @fzyzcjy in #14156
- [model-gateway] Add version command support to SMG (#12558) by @tonyluj in #12558
- [model-gateway] allow refill rate to be zero (#14030) by @slin1237 in #14030
- [model-gateway] Fix flaky test_circuit_breaker_half_open_failure_reopens (#14019) by @XinyueZhang369 in #14019
- [model-gateway][doc] Update transport terminology to protocol in README.md (#13872) by @xuwenyihust in #13872
- [ci] allow manual label to trigger ci in rust, change ci order (#14016) by @slin1237 in #14016
- [model gateway][grpc] Add tojson filter to override minijinja's tojson (#14013) by @CatherineSue in #14013
- [model-gateway] fix xpu ci (#14012) by @slin1237 in #14012
- [model-gateway] Add PostgreSQL support to binding (#13766) by @xuwenyihust in #13766
- [Router bugfix] Fix router_manager selecting the wrong router when enable-igw. (#13572) by @SYChen123 in #13572
- [model-gateway] Refactor router e2e responses tests (#13745) by @XinyueZhang369 in #13745
- [Fix] Fix uvloop get_event_loop() is not suitable for 0.22.x (#13612) by @tom-jerr in #13612
- [misc] Rename minilb install env & remove files & fix lint (#13831) by @hnyls2002 in #13831
- [model-gateway] clean up router manager function order (#13776) by @slin1237 in #13776
- Fix url: use https://roadmap.sglang.io for roadmap (#13733) by @merrymercy in #13733
- [model-gateway] fix gateway cli arg parser to not use = (#13685) by @CatherineSue in #13685
- [model-gateway] add both python and rust cli alias (#13678) by @slin1237 in #13678
- [router][grpc] Support num_reasoning_tokens in harmony models (#13047) by @CatherineSue in #13047
- [model-gateway] use worker startup time out for worker registration (#13473) by @slin1237 in #13473
- [model-gateway] Add Gateway Release Tooling (#13420) by @slin1237 in #13420
- refactor: replace worker pool with semaphore-based concurrency in jobqueue (#13383) by @RiversJin in #13383
- [router] bindings for go (#13384) by @whybeyoung in #13384
- [model-gateway] fix SDist step readme path (#13373) by @slin1237 in #13373
- [model-gateway] remove grpc feature flag and mark as default (#13330) by @slin1237 in #13330
- [router] Fix flaky router e2e tests (#13306) by @XinyueZhang369 in #13306
- [model-gateway] move python to binding folder (#13295) by @slin1237 in #13295
New Contributors
- @tonyluj made their first contribution in 6bad6a365
- @tom-jerr made their first contribution in a95a38078
- @RiversJin made their first contribution in 2a5773440
- @jimmy-evo made their first contribution in 6f657070e
- @dcampora made their first contribution in 842807843
- @alisonshao made their first contribution in cee93a6f2
Full Changelog: gateway-v0.2.3...gateway-v0.2.4