github llm-d/llm-d v0.7.0
Release v0.7.0

3 hours ago

LLM-D Component Summary

  • ⚠️ BREAKING CHANGE — CUDA 13.0.2 runtime: All llm-d CUDA images now ship with CUDA 13.0.2 (upgraded from 12.x). This requires NVIDIA driver 580 or later on the host. Nodes running older drivers must be upgraded before deploying v0.7.0 images.
  • UX change: because many adopters found gateways difficult to configure, the default llm-d deployment now uses "standalone mode", which relies on a generic proxy instead of the more full-featured gateway. We still recommend a full gateway for customers in production.
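
The driver requirement in the breaking-change note above can be checked on a node before the upgrade. The following is a minimal sketch, not part of the llm-d tooling: the function names and the `MIN_DRIVER_MAJOR` constant are hypothetical, and it assumes that `nvidia-smi` is on the `PATH` and that the driver version string begins with the major version (for example `580.65.06`).

```python
import shutil
import subprocess

# Taken from the release note: "NVIDIA driver 580 or later".
MIN_DRIVER_MAJOR = 580


def driver_meets_minimum(version: str, minimum_major: int = MIN_DRIVER_MAJOR) -> bool:
    """Return True if a driver version such as '580.65.06' has major >= minimum_major."""
    major = int(version.strip().split(".")[0])
    return major >= minimum_major


def installed_driver_versions() -> list[str]:
    """Ask nvidia-smi for each GPU's driver version; empty list if unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    versions = installed_driver_versions()
    if not versions:
        print("nvidia-smi unavailable or reported no GPUs; cannot check the driver")
    for v in versions:
        status = "OK" if driver_meets_minimum(v) else "too old for the v0.7.0 CUDA images"
        print(f"driver {v}: {status}")
```

Run on each GPU node, the script prints one line per GPU. Only the major version is compared, because the note states the requirement as "580 or later" without a minor version.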

Component                                         Version       Previous Version  Type
llm-d/llm-d-inference-scheduler                   v0.8.0        v0.7.1            Image
llm-d/llm-d-uds-tokenizer                         vllm-v0.19.1  v0.7.1            Image
llm-d/llm-d-kv-cache                              v0.8.0        v0.7.1            Library
llm-d/llm-d-routing-sidecar                       v0.8.0        v0.7.1            Image
llm-d/llm-d-inference-sim                         v0.8.2        v0.7.1            Image
llm-d/llm-d-cuda                                  v0.7.0        v0.6.0            Image
llm-d/llm-d-cuda (debug)                          v0.7.0        v0.6.0            Image
llm-d/llm-d-cuda-gb200                            v0.7.0        N/A               Image (New)
llm-d/llm-d-aws (EFA)                             v0.7.0        v0.6.0            Image
llm-d/llm-d-xpu                                   v0.7.0        v0.6.0            Image
llm-d/llm-d-hpu                                   v0.7.0        v0.6.0            Image
llm-d/llm-d-cpu                                   v0.7.0        v0.6.0            Image
llm-d/llm-d-rocm                                  v0.7.0        v0.6.0            Image
llm-d/llm-d-kv-cache/llmd_fs_backend_connector    v0.19.1       v0.17.1           Wheel installed in llm-d
llm-d/llm-d-workload-variant-autoscaler           v0.7.0        v0.6.0            Helm Chart + Image
llm-d-incubation/llm-d-infra (Deprecated)         N/A           v1.4.0            Helm Chart
llm-d-incubation/llm-d-modelservice (Deprecated)  N/A           v0.4.9            Helm Chart
vllm-project/vllm                                 v0.19.1       v0.17.1           Wheel installed in llm-d
kubernetes-sigs/gateway-api-inference-extension   v1.5.0        v1.4.0            Helm Chart

Infrastructure Changes

Component                     Version   Previous Version
Gateway API                   v1.5.1    v1.4.0
Istio                         1.29.1    1.28.1
agentgateway (old KGateway)   v2.2.1    v2.1.1

What's Changed

New Contributors

Full Changelog: v0.6...v0.7.0
