github sgl-project/sglang gateway-v0.2.0
Release Gateway-v0.2.0

one month ago

🚀 Release: SGLang Model Gateway v0.2.0 (formerly “SGLang Router”)

🔥 What’s new

🧠 Multi-Model Inference Gateway (IGW) Mode

IGW turns one router into many — letting you manage multiple models at once, each with its own routing policy, priorities, and metadata. Think of it as running several routers under one roof, with shared reliability, observability, and API surface.
You can dynamically register models via /workers, assign labels like tier or policy, and let the gateway handle routing, health checks, and load balancing.
Whether you’re mixing Llama, Mistral, and DeepSeek, or orchestrating per-tenant routing in enterprise setups, IGW gives you total control.
Your fleet, your rules. ⚡
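
As a rough illustration of the dynamic registration described above, the sketch below builds (but does not send) a POST request to the gateway's /workers endpoint. Only the /workers path and the idea of labels such as tier or policy come from these notes. The payload field names (`url`, `model_id`, `labels`), the addresses, and the label values are assumptions made for illustration, so check the gateway documentation for the actual schema.

```python
import json
import urllib.request


def build_worker_registration(gateway_url, worker_url, model_id, labels):
    """Build (without sending) a POST request for the gateway's /workers endpoint.

    The JSON field names below are hypothetical placeholders, not the
    gateway's confirmed schema.
    """
    payload = {
        "url": worker_url,      # hypothetical field: where the worker listens
        "model_id": model_id,   # hypothetical field: model served by the worker
        "labels": labels,       # free-form labels such as tier or policy
    }
    return urllib.request.Request(
        url=gateway_url.rstrip("/") + "/workers",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder addresses and label values, for illustration only.
req = build_worker_registration(
    gateway_url="http://localhost:30000",
    worker_url="http://worker-a:8000",
    model_id="example-model",
    labels={"tier": "gold", "policy": "round_robin"},
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would actually send it. That step is left out so the sketch runs without a live gateway.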

⚡ gRPC Mode: Rust-Powered, Built for Throughput

This is the heart of 0.2.0. The new gRPC data plane runs entirely in Rust (tokenizer, reasoning parser, and tool parser included), giving you native-speed performance and lower latency.
You can connect to gRPC-based SGLang workers, stream tokens in real time, and even handle OpenAI-compatible APIs like

🌐 OpenAI-Compatible Gateway

Seamlessly proxy requests to OpenAI, while keeping data control local.
Conversation history, responses, and background jobs all flow through the gateway — same API, enterprise privacy.

💾 Pluggable History Storage

Choose between memory, none, or oracle for conversation and /v1/responses data.

  • memory: Fastest for ephemeral runs.
  • none: Zero persistence, zero latency overhead.
  • oracle: Full persistence via Oracle ATP with connection pooling and credentials support.

🧩 Pluggable MCP Integration

The gateway now natively speaks MCP across all transports (STDIO, HTTP, SSE, Streamable), so your tools can plug directly into reasoning and response loops — perfect for agentic workflows and cross-model orchestration.

🛡️ Reliability & Observability Upgrades

Built-in:

  • Retries with exponential backoff + jitter
  • Per-worker circuit breakers
  • Token-bucket rate limiting & FIFO queuing
  • Prometheus metrics for latency, load, queue depth, PD pipelines, tokenizer speed, and MCP activity
  • Structured tracing & request-ID propagation
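
To make two of the mechanisms above concrete, here is a minimal Python sketch of exponential backoff with jitter and of a token-bucket limiter. It is a generic illustration of the techniques, not the gateway's Rust implementation. The function and class names, the default parameters, and the choice of "full jitter" (a uniform delay between zero and the capped exponential ceiling) are all assumptions. Circuit breakers and FIFO queuing are not shown.

```python
import random
import time


def backoff_delays(max_retries, base=0.1, cap=5.0, rng=random.random):
    """Yield one sleep duration (seconds) per retry attempt.

    The ceiling doubles each attempt, is capped at `cap`, and the actual
    delay is drawn uniformly from [0, ceiling) ("full jitter").
    """
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        yield rng() * ceiling


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilled at `rate` tokens/second."""

    def __init__(self, rate, capacity, now=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self._now = now                # injectable clock, handy for tests
        self.last = now()

    def try_acquire(self, cost=1.0):
        """Refill based on elapsed time, then spend `cost` tokens if available."""
        current = self._now()
        elapsed = current - self.last
        self.last = current
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # caller would queue or reject the request


# Usage: up to 3 retries with jittered delays, gated by a small bucket.
bucket = TokenBucket(rate=10.0, capacity=2)
for delay in backoff_delays(max_retries=3):
    allowed = bucket.try_acquire()
    print(f"allowed={allowed} next_delay={delay:.3f}s")
```

Jitter spreads retries from many clients over time so they do not all hit a recovering worker at the same instant. The token bucket bounds the sustained request rate while still permitting short bursts.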

✨ SGLang Model Gateway v0.2.0 — built in Rust, designed for scale, ready for reasoning.

What's Changed in Gateway

Gateway Changes (238 commits)

New Contributors

Paths Included

  • sgl-router
  • python/sglang/srt/grpc
  • python/sglang/srt/entrypoints/grpc_server.py

Full Changelog: gateway-v0.1.9...gateway-v0.2.0
