sgl-project/sglang: Release gateway-v0.3.0


🚀 SGLang Model Gateway v0.3.0 Released!

We're thrilled to announce SGLang Model Gateway v0.3.0, a major release with powerful new features, architectural improvements, and important breaking changes!

⚠️ Breaking Changes

📊 Metrics Architecture Redesigned

Complete overhaul with a new six-layer metrics architecture covering protocol (HTTP/gRPC), router, worker, streaming (time to first token, TTFT, and time per output token, TPOT), circuit breaker, and policy metrics, with unified error codes.
Action Required: Update your Prometheus dashboards and alerting rules. Metric names and structure have changed.
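When rebuilding dashboards, it helps to keep the standard definitions of the two streaming latencies in mind. TTFT is the delay from request start to the first streamed token. TPOT is the average gap between later tokens, (t_last - t_first) / (n - 1). The sketch below computes both from per-token timestamps. The helper name and its inputs are illustrative assumptions, not the gateway's metric names or code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class StreamTimings:
    ttft: float            # seconds from request start to the first token
    tpot: Optional[float]  # mean seconds per output token after the first (None if only one token)


def stream_timings(request_start: float, token_times: Sequence[float]) -> StreamTimings:
    """Compute TTFT and TPOT from monotonic timestamps (hypothetical helper)."""
    if not token_times:
        raise ValueError("stream produced no tokens")
    ttft = token_times[0] - request_start
    if len(token_times) < 2:
        return StreamTimings(ttft=ttft, tpot=None)
    # TPOT averages the inter-token gaps: (last - first) / (n - 1)
    tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    return StreamTimings(ttft=ttft, tpot=tpot)
```

For example, tokens arriving at 0.5, 0.75, 1.0 and 1.25 s after a request that started at 0 s give TTFT = 0.5 s and TPOT = 0.25 s.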

🔧 UUID-Based Worker Resource Management

Workers are now identified by UUIDs instead of endpoints for cleaner resource management.
Action Required: Update any tooling or scripts that interact with the worker API.
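One way to picture the change is that a worker's identity becomes a generated UUID, and its endpoint becomes an attribute with an optional lookup index. Tooling that used to address workers by URL would first resolve the UUID and then use it for later operations. The sketch below is a hypothetical in-memory registry. `Worker`, `WorkerRegistry` and their methods are illustrative names, not the gateway's worker API.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Worker:
    url: str
    model_id: str
    worker_id: str = field(default_factory=lambda: str(uuid.uuid4()))  # identity is the UUID


class WorkerRegistry:
    """Hypothetical registry keyed by UUID, with a secondary URL index."""

    def __init__(self) -> None:
        self._by_id: Dict[str, Worker] = {}
        self._id_by_url: Dict[str, str] = {}

    def add(self, url: str, model_id: str) -> str:
        if url in self._id_by_url:
            raise ValueError(f"worker already registered: {url}")
        worker = Worker(url=url, model_id=model_id)
        self._by_id[worker.worker_id] = worker
        self._id_by_url[url] = worker.worker_id
        return worker.worker_id

    def get(self, worker_id: str) -> Optional[Worker]:
        return self._by_id.get(worker_id)

    def find_id_by_url(self, url: str) -> Optional[str]:
        # Migration helper: map an old endpoint-based reference to the new UUID
        return self._id_by_url.get(url)

    def remove(self, worker_id: str) -> bool:
        worker = self._by_id.pop(worker_id, None)
        if worker is None:
            return False
        del self._id_by_url[worker.url]
        return True
```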

✨ New Features

🌐 Unified Inference Gateway Mode (IGW)

Single gateway, entire fleet. IGW now supports ALL router types in a single deployment with Kubernetes service discovery:

  • gRPC router (PD and regular mode)
  • HTTP router (PD and regular mode)
  • OpenAI router

Auto-enabled with service discovery. Deploy once, route everything: handle all traffic patterns across your entire inference fleet from a single gateway instance.

🔤 Tokenize/Detokenize HTTP Endpoints

  • Direct HTTP endpoints for tokenization operations
  • Dynamic tokenizer control plane: add, list, get, and remove tokenizers on the fly
  • TokenizerRegistry for efficient dynamic loading
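Conceptually, a dynamic tokenizer control plane is a registry that maps a name to a tokenizer and supports add, list, get and remove at runtime. The sketch below assumes a caller-supplied loader function. It loads each tokenizer lazily on first use and guards the map with a lock so concurrent requests see a consistent view. `SimpleTokenizerRegistry` is a hypothetical Python stand-in, not the gateway's `TokenizerRegistry`, and the HTTP routes that expose these operations are not shown.

```python
import threading
from typing import Any, Callable, Dict, List, Optional


class SimpleTokenizerRegistry:
    """Hypothetical name -> tokenizer map with lazy, load-once semantics."""

    def __init__(self, loader: Callable[[str], Any]) -> None:
        self._loader = loader              # assumed: loads a tokenizer from a path or model id
        self._sources: Dict[str, str] = {}
        self._loaded: Dict[str, Any] = {}
        self._lock = threading.Lock()

    def add(self, name: str, source: str) -> None:
        with self._lock:
            if name in self._sources:
                raise KeyError(f"tokenizer {name!r} already registered")
            self._sources[name] = source

    def list_names(self) -> List[str]:
        with self._lock:
            return sorted(self._sources)

    def get(self, name: str) -> Optional[Any]:
        with self._lock:
            if name not in self._sources:
                return None
            if name not in self._loaded:
                # Load on first use only, so registering stays cheap.
                # Simplification: loading happens while holding the lock.
                self._loaded[name] = self._loader(self._sources[name])
            return self._loaded[name]

    def remove(self, name: str) -> bool:
        with self._lock:
            self._loaded.pop(name, None)
            return self._sources.pop(name, None) is not None
```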

🧠 Parser Endpoints

  • /parse/reasoning - Parse reasoning outputs
  • /parse/function_call - Parse function call responses
  • GLM-4 function call parser - contributed directly by the GLM team for the latest GLM models
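A reasoning parser separates a model's reasoning text from its final answer. The sketch below does this for a complete, non-streamed output. It assumes the model wraps its reasoning in `<think>...</think>` tags, a convention used by several open reasoning models, though the actual markers are model-specific. It is a simplified illustration, not the gateway's parser or the request/response schema of `/parse/reasoning`.

```python
from typing import Tuple

THINK_START = "<think>"   # assumed marker; real markers depend on the model
THINK_END = "</think>"


def split_reasoning(text: str) -> Tuple[str, str]:
    """Return (reasoning, normal_text) for a complete, non-streamed output."""
    end = text.find(THINK_END)
    if end == -1:
        # Simplification: with no closing tag, treat everything as normal text
        return "", text.strip()
    start = text.find(THINK_START, 0, end)
    # Some models omit the opening tag, so text before the closing tag counts as reasoning
    reasoning_begin = start + len(THINK_START) if start != -1 else 0
    reasoning = text[reasoning_begin:end]
    normal = text[end + len(THINK_END):]
    return reasoning.strip(), normal.strip()
```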

📊 Embeddings Support

Native embeddings endpoint for the gRPC router, expanding beyond text generation to embedding workloads.

🔐 Server-Side TLS Support

Secure your gateway deployments with native TLS support.

🌐 Go Implementation, contributed by the iFlytek MaaS team

A complete Go implementation of SGLang Model Gateway with an OpenAI-compatible API server, bringing SGLang to the Go ecosystem!

⚡ Major Enhancements

Control Plane - Workflow Engine

Intelligent lifecycle orchestration with:

  • DAG-based parallel execution with pre-computed dependency graphs
  • Concurrent event processing for maximum throughput
  • Modular add/remove/update workflows
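DAG-based parallel execution with a pre-computed dependency graph can be pictured as follows. The graph is resolved once into ordered "waves" of steps whose prerequisites are already satisfied, and the steps within each wave run concurrently. The sketch below uses Kahn's algorithm and a thread pool. The step names in the usage example are hypothetical and do not reflect the gateway's actual workflow definitions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Set


def plan_waves(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group steps into waves via Kahn's algorithm; deps maps step -> prerequisites."""
    unknown = {p for pre in deps.values() for p in pre} - set(deps)
    if unknown:
        raise ValueError(f"unknown prerequisites: {sorted(unknown)}")
    remaining = {step: set(pre) for step, pre in deps.items()}
    waves: List[List[str]] = []
    while remaining:
        ready = sorted(s for s, pre in remaining.items() if not pre)
        if not ready:
            raise ValueError("dependency cycle detected")
        waves.append(ready)
        for s in ready:
            del remaining[s]
        for pre in remaining.values():
            pre.difference_update(ready)
    return waves


def run_workflow(deps: Dict[str, Set[str]], steps: Dict[str, Callable[[], None]]) -> List[List[str]]:
    waves = plan_waves(deps)  # computed once up front; could be cached and reused across runs
    with ThreadPoolExecutor() as pool:
        for wave in waves:
            # Steps in the same wave do not depend on each other, so submit them together
            futures = [pool.submit(steps[name]) for name in wave]
            for f in futures:
                f.result()  # wait for the whole wave and re-raise any step's exception
    return waves
```

For instance, with hypothetical steps `detect -> {load_tokenizer, health_check} -> register`, the two middle steps run in parallel in the second wave.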

Performance Optimization

  • Lock-free data structures: DashMap for policy lookups, lock-free router snapshots
  • Reduced CPU overhead: Optimized worker registry, gRPC client fetch, and worker selection
  • Optimized router management: Improved selection algorithms and state management
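Router snapshots follow a copy-on-write idea. Readers grab a reference to an immutable snapshot without taking a lock. Writers build a new snapshot and publish it with a single reference swap, so a reader sees either the old state or the new one, never a half-applied update. The Python sketch below is only an analogy for that pattern. It relies on attribute assignment being atomic in CPython and still serializes writers with a lock, and the gateway's own implementation will differ. `RouterTable` is a hypothetical name.

```python
import threading
from types import MappingProxyType
from typing import Mapping


class RouterTable:
    """Copy-on-write map of model id -> router name (hypothetical)."""

    def __init__(self) -> None:
        self._snapshot: Mapping[str, str] = MappingProxyType({})
        self._write_lock = threading.Lock()  # serializes writers only

    def snapshot(self) -> Mapping[str, str]:
        # Readers take no lock: they get whichever immutable snapshot is current
        return self._snapshot

    def upsert(self, model_id: str, router: str) -> None:
        with self._write_lock:
            updated = dict(self._snapshot)
            updated[model_id] = router
            self._snapshot = MappingProxyType(updated)  # publish with one reference swap

    def remove(self, model_id: str) -> None:
        with self._write_lock:
            updated = dict(self._snapshot)
            updated.pop(model_id, None)
            self._snapshot = MappingProxyType(updated)
```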

Resilience & Reliability

  • Retry and circuit breaker support for OpenAI and gRPC routers
  • Enhanced circuit breaker with better state management
  • Graceful shutdown for TLS and non-TLS servers
  • Unified error responses with error codes and X-SMG-Error-Code headers
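A circuit breaker tracks failures for a worker. After a threshold of consecutive failures it "opens" and rejects requests. After a cooldown it lets a trial request through ("half-open"). A success closes it again, and a failed trial reopens it. The sketch below shows that textbook state machine. The thresholds, names and injectable clock are assumptions, not the gateway's configuration options.

```python
import time
from enum import Enum
from typing import Callable


class State(Enum):
    CLOSED = "closed"
    OPEN = "open"
    HALF_OPEN = "half_open"


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self.state = State.CLOSED
        self._failures = 0
        self._opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state is State.OPEN:
            if self._clock() - self._opened_at >= self.cooldown_s:
                # Cooldown elapsed: allow a trial request.
                # Simplification: concurrent trial requests are not capped
                self.state = State.HALF_OPEN
                return True
            return False
        return True

    def record_success(self) -> None:
        self.state = State.CLOSED
        self._failures = 0

    def record_failure(self) -> None:
        self._failures += 1
        # A failed trial reopens immediately; otherwise open once the threshold is reached
        if self.state is State.HALF_OPEN or self._failures >= self.failure_threshold:
            self.state = State.OPEN
            self._opened_at = self._clock()
```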

Infrastructure

  • Multi-architecture Docker builds (Linux, macOS, Windows, ARM)
  • Custom Prometheus duration buckets
  • Improved logging across all modules
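Custom duration buckets change where request latencies are counted in Prometheus histograms. Prometheus histogram buckets are cumulative. Each `le` bucket counts observations less than or equal to its upper bound, and a final `+Inf` bucket equals the total count. The sketch below reproduces that bucketing for an illustrative, assumed set of bounds. It does not reflect the gateway's default bucket list or its configuration syntax.

```python
import bisect
import math
from typing import Dict, Iterable, List, Sequence


def cumulative_buckets(observations: Iterable[float], bounds: Sequence[float]) -> Dict[str, int]:
    """Return Prometheus-style cumulative counts keyed by the 'le' label."""
    upper: List[float] = sorted(bounds) + [math.inf]
    counts = [0] * len(upper)
    for value in observations:
        # First bound >= value, since 'le' (less than or equal) is inclusive
        counts[bisect.bisect_left(upper, value)] += 1
    result: Dict[str, int] = {}
    running = 0
    for bound, count in zip(upper, counts):
        running += count
        result["+Inf" if math.isinf(bound) else repr(bound)] = running
    return result
```

With bounds `[0.1, 0.5, 1.0, 5.0]` and observations `[0.05, 0.1, 0.3, 2.0, 12.0]` seconds, the cumulative counts are 2, 3, 3, 4 and 5 for `le` = 0.1, 0.5, 1.0, 5.0 and `+Inf`.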

🐛 Bug Fixes & Stability

  • Fixed cache-aware routing in gRPC mode
  • Resolved load metric tracking and double-decrement issues in cache-aware load balancing
  • Improved backward compatibility for GET endpoints
  • Fixed gRPC scheduler launcher issues
  • Fixed token bucket negative duration panics
  • Resolved MCP server initialization issues
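The token bucket fix touches a classic pitfall. If the refill step computes elapsed time from timestamps that can arrive out of order, the elapsed duration can come out negative. A common defensive pattern is to read a monotonic clock, clamp elapsed time at zero, and never move the "last refill" timestamp backwards. The sketch below applies that pattern. It is a generic illustration under those assumptions, not the gateway's implementation, and its parameter names are hypothetical.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, capacity: float, refill_per_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self._clock = clock
        self._tokens = capacity
        self._last = clock()

    def _refill(self) -> None:
        now = self._clock()
        # Clamp at zero so an out-of-order reading can never remove tokens
        elapsed = max(0.0, now - self._last)
        self._tokens = min(self.capacity, self._tokens + elapsed * self.refill_per_s)
        self._last = max(self._last, now)  # keep the reference point monotonic

    def try_acquire(self, tokens: float = 1.0) -> bool:
        self._refill()
        if self._tokens >= tokens:
            self._tokens -= tokens
            return True
        return False
```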

📚 Documentation

Major documentation update with comprehensive guides, examples, and best practices for SGLang Model Gateway.

⚠️ Migration checklist:

  • Update Prometheus dashboards for new metrics
  • Update worker API integrations for UUID-based management
  • Review new error response format

⚡ Built for speed. Engineered for scale. Production-proven.

Gateway Changes (108 commits)

New Contributors

Full Changelog: gateway-v0.2.4...gateway-v0.3.0
