llm-d/llm-d v0.7.0 on GitHub

LLM-D Component Summary

⚠️ BREAKING CHANGE — CUDA 13.0.2 runtime: All llm-d CUDA images now ship with CUDA 13.0.2 (upgraded from 12.x). This requires NVIDIA driver 580 or later on the host. Nodes running older drivers must be upgraded before deploying v0.7.0 images.
UX Change - Because many adopters found gateways difficult to configure, the default llm-d deployment now uses "standalone mode", which relies on a generic proxy instead of the more full-featured gateway. We still recommend a full gateway for production deployments.
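For the CUDA 13.0.2 driver requirement noted above, a pre-upgrade check on each GPU node might look like the following. This is a minimal sketch, not part of the llm-d tooling: the function names and the `MIN_DRIVER_MAJOR` constant are hypothetical, and it assumes `nvidia-smi` is on the node's `PATH`. The only fact taken from the release note is the 580 minimum.

```python
import re
import subprocess

# From the release note: CUDA 13.0.2 images need NVIDIA driver 580 or later.
MIN_DRIVER_MAJOR = 580


def driver_major(version: str) -> int:
    """Return the major component of a driver version string such as '580.65.06'."""
    match = re.match(r"\s*(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized driver version: {version!r}")
    return int(match.group(1))


def meets_cuda13_requirement(version: str) -> bool:
    """True if the driver's major version is at least MIN_DRIVER_MAJOR."""
    return driver_major(version) >= MIN_DRIVER_MAJOR


def query_host_drivers() -> list[str]:
    """Ask nvidia-smi for the driver version of each GPU (one line per GPU).

    Assumes nvidia-smi is installed; raises if the command is missing or fails.
    """
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [line.strip() for line in out.splitlines() if line.strip()]
```

On a GPU node, `all(meets_cuda13_requirement(v) for v in query_host_drivers())` would indicate whether the host is ready for the v0.7.0 CUDA images. The version parsing is kept separate from the `nvidia-smi` call so it can be exercised on machines without a GPU.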

Component	Version	Previous Version	Type
llm-d/llm-d-inference-scheduler	`v0.8.0`	`v0.7.1`	Image
llm-d/llm-d-uds-tokenizer	`vllm-v0.19.1`	`v0.7.1`	Image
llm-d/llm-d-kv-cache	`v0.8.0`	`v0.7.1`	Library
llm-d/llm-d-routing-sidecar	`v0.8.0`	`v0.7.1`	Image
llm-d/llm-d-inference-sim	`v0.8.2`	`v0.7.1`	Image
llm-d/llm-d-cuda	`v0.7.0`	`v0.6.0`	Image
llm-d/llm-d-cuda (debug)	`v0.7.0`	`v0.6.0`	Image
llm-d/llm-d-cuda-gb200	`v0.7.0`	N/A	Image (New)
llm-d/llm-d-aws (EFA)	`v0.7.0`	`v0.6.0`	Image
llm-d/llm-d-xpu	`v0.7.0`	`v0.6.0`	Image
llm-d/llm-d-hpu	`v0.7.0`	`v0.6.0`	Image
llm-d/llm-d-cpu	`v0.7.0`	`v0.6.0`	Image
llm-d/llm-d-rocm	`v0.7.0`	`v0.6.0`	Image
llm-d/llm-d-kv-cache/llmd_fs_backend_connector	`v0.19.1`	`v0.17.1`	Wheel installed in `llm-d`
llm-d/llm-d-workload-variant-autoscaler	`v0.7.0`	`v0.6.0`	Helm Chart + Image
llm-d-incubation/llm-d-infra (Deprecated)	N/A	`v1.4.0`	Helm Chart
llm-d-incubation/llm-d-modelservice (Deprecated)	N/A	`v0.4.9`	Helm Chart
vllm-project/vllm	`v0.19.1`	`v0.17.1`	Wheel installed in `llm-d`
kubernetes-sigs/gateway-api-inference-extension	`v1.5.0`	`v1.4.0`	Helm Chart

Infrastructure Changes

Component	Version	Previous Version
Gateway API	`v1.5.1`	`v1.4.0`
Istio	`1.29.1`	`1.28.1`
agentgateway (old KGateway)	`v2.2.1`	`v2.1.1`

What's Changed

Simplify WVA guide test by @lionelvillard in #1072
fix concurrency group to sha not PR by @Gregory-Pereira in #1073
fix block-size alignment by @vMaroon in #1084
Revise maturity status and TPU VM type details by @seanhorgan in #1085
Fix formatting of automated test status in README by @seanhorgan in #1087
Fix image on pd user guide by @Edwinhr716 in #1086
Updated maturity testing level on all guides by @maugustosilva in #1094
Skip latest tag for release candidates by @diegocastanibm in #1034
fix(xpu): enable TP=2 for Qwen3-32B for fixing XPU prefix-cache test failed by @yuanwu2017 in #1081
[guides] Add a commented priorityClassName for use in nightly CI/CD by @maugustosilva in #1062
deps(actions): bump docker/build-push-action from 6 to 7 by @dependabot[bot] in #1089
deps(actions): bump actions/github-script from 7 to 8 by @dependabot[bot] in #1091
deps(actions): bump google-github-actions/auth from 2.1.12 to 3.0.0 by @dependabot[bot] in #1090
deps(actions): bump dorny/paths-filter from 3 to 4 by @dependabot[bot] in #1092
deps(actions): bump docker/login-action from 3 to 4 by @dependabot[bot] in #1093
Add shared drive for sig-rl by @petecheslock in #1109
Fix llm-d performance dashboard queries by @danehans in #1098
Fix/e2e validate single curl pod by @yuanwu2017 in #1078
Add Moreh as a contributor to the adopters list by @hhk7734 in #1111
Update guides with status badges. by @maugustosilva in #1110
fix: use vllmServe modelCommand in precise-prefix-cache-aware XPU values by @sharvil10 in #1101
Add deployment health-check smoke test for llm-d clusters by @lisperz in #767
[1/N] Documentation Revamp by @robertgshaw2-redhat in #1100
fix: split vLLM command array and fill in quickstart TODOs by @madhugoutham in #1126
add OCI well-lit-path for P/D disaggregation by @hexfusion in #1123
fix: use the proper git ref for tag extraction by @Gregory-Pereira in #1133
Update latency-predictor.md diagram by @LukeAVanDrie in #1127
docs: fix typos detected by nightly scan (issue #1056) by @ianliuy in #1135
[Docs][4/N] Update architecture/core/epp/scheduling.md docs by @ahg-g in #1122
[2/N] Proxy Doc by @robertgshaw2-redhat in #1121
[Docs][6/N]: add Istio gateway setup guide by @madhugoutham in #1146
docs: add EPP flow control reference by @LukeAVanDrie in #1130
[Docs] [3/N] Add Disaggregation Architecture Docs by @robertgshaw2-redhat in #1138
[Docs] Add glossary page by @ianliuy in #1148
Add agentgateway guide by @danehans in #1159
fix(docs): correct typos from nightly scan, add false-positive config by @ianliuy in #1160
[Docs] Guides Directory Cleanup by @robertgshaw2-redhat in #1163
[Docs][EPP] Doc for request handling and control by @zetxqx in #1128
fix(docker): add LIBRARY_PATH and ldconfig to CUDA runtime stage by @ianliuy in #1147
Update ms-inference-scheduling/values_tpu_v7.yaml to use RunAI model streamer by @amacaskill in #1102
deps(actions): bump hashicorp/setup-terraform from 3.1.2 to 4.0.0 by @dependabot[bot] in #1153
deps(actions): bump aws-actions/configure-aws-credentials from 4.3.1 to 6.1.0 by @dependabot[bot] in #1152
deps(actions): bump actions/github-script from 8 to 9 by @dependabot[bot] in #1150
deps(actions): bump j178/prek-action from 1 to 2 by @dependabot[bot] in #1151
deps(actions): bump actions/upload-artifact from 6 to 7 by @dependabot[bot] in #1149
[Docs] Remove customizing-a-guide.md by @robertgshaw2-redhat in #1165
[Docs] Move guides/benchmarks to helpers/benchmark.md by @robertgshaw2-redhat in #1164
[Docs][Istio] Align to AgentGateway Doc by @robertgshaw2-redhat in #1161
deps(docker): bump gdrcopy from v2.5.1 to v2.5.2 by @ianliuy in #1171
[Docs] Envoy Proxy -> GAIE-Conformant Proxy by @robertgshaw2-redhat in #1162
Updated the basic architecture diagram by @ahg-g in #1173
[Docs] Fix Broken Quickstart Links by @robertgshaw2-redhat in #1175
[Docs] Remove guide/prereq/infrastructure by @robertgshaw2-redhat in #1168
Added GKE gateway guide by @ahg-g in #1174
[Docs] Move Client Tools from Guides --> Helpers by @robertgshaw2-redhat in #1167
feat: add Rebellions as a supported accelerator vendor by @rebel-jinmoo in #1115
docs: rewrite predicted latency architecture and add well-lit path by @kaushikmitr in #1166
[Docs][Autoscaling][2/N] hpa/keda vs hpa design choices/features by @lionelvillard in #1157
[Docs][7/N] KV Indexer by @vMaroon in #1143
[Docs][autoscaling][1/N] Autoscaling intro by @lionelvillard in #1145
[Docs] Refine RDMA docs by @praveingk in #1181
Fixup merge conflict markers in tiered-prefix-cache/storage/README.md by @tlrmchlsmth in #1190
[Docs] fix broken link by @lionelvillard in #1191
docs: llm-d-inference-payload-processor proposal by @nilig in #1184
[Docs] Well Lit Paths Doc by @chcost in #1156
Tweaks to the latency predictor docs by @ahg-g in #1201
[Docs] Restructure Predicted Latency Well Lit Path Doc by @robertgshaw2-redhat in #1200
[Docs][Autoscaling][4/N] first pass at the HPA+IGW doc by @lionelvillard in #1192
Updating native HPA-based autoscaling guide to reference EPP instead of IGW by @ahg-g in #1212
[Docs] Rename Well-Lit-Paths -> Guides, Guides -> Resources by @robertgshaw2-redhat in #1199
deps(actions): bump actions/checkout from 4 to 6 by @dependabot[bot] in #1221
deps(actions): bump helm/kind-action from 1.12.0 to 1.14.0 by @dependabot[bot] in #1220
deps(actions): bump actions/download-artifact from 4 to 8 by @dependabot[bot] in #1219
[Docs][8/N] KV Offloader Doc by @vMaroon in #1144
[Docs] Fix Various Nits by @robertgshaw2-redhat in #1215
Update TPU recommendations in README by @seanhorgan in #1213
Fix typos by @diegocastanibm in #1223
[Docs] Remove envoy reference by @robertgshaw2-redhat in #1225
Proposal doc for Inference Resilience Operator by @aishukamal in #984
docs(epp): polish architecture and configuration guides by @LukeAVanDrie in #1227
[Docs] Consolidate Gateway Docs To Use YAML by @robertgshaw2-redhat in #1178
Update model server doc by @ahg-g in #1237
[Refactor Install]: Helm -> Kustomize for IIS by @Gregory-Pereira in #1131
Add workflow write by @diegocastanibm in #1245
fixing workflow call file pointers by @Gregory-Pereira in #1248
Added placeholder links by @ahg-g in #1246
Remove workflow permit by @diegocastanibm in #1260
fix: broken links in guides and docs by @zdtsw in #1265
Streamline optimized baseline guide by @liu-cong in #1249
Add FULL_DUPLEX_STREAMED requirement by @roytman in #1271
Update CI for optimized baseline guide by @liu-cong in #1268
Consolidated Gateway guides by @ahg-g in #1259
cleaup unused docs by @ahg-g in #1276
fixed incorrectly setting namespace to default in the optimized-baseline by @ahg-g in #1274
[Docs] Recover precise prefix cache-aware well-lit-path by @vMaroon in #1272
deps(actions): bump aquasecurity/trivy-action from 0.35.0 to 0.36.0 by @dependabot[bot] in #1275
Data layer docs by @ahg-g in #1242
precise-prefix-cache-scheduling guide helm -> kustomize by @vMaroon in #1258
Point optimized-baseline-ocp to the new reusable-nightly-e2e-openshift.yaml by @maugustosilva in #1278
Add details on gke patch for optimized baseline by @liu-cong in #1282
Added API and HTTP Headers References and updated the inferencepool architectural doc by @ahg-g in #1277
Align TPU v6/v7 optimized baseline with Qwen3-32B by @yangligt2 in #1286
fix: remove unsupported description field to fix precise-prefix-cache… by @revit13 in #1287
Add kustomize to prerequisite in client-setup/README.md by @revit13 in #1289
Add how to check gib is installed by @liu-cong in #1283
docs: add Before You Write Code contribution scope section by @hexfusion in #1214
fix: Fix typos lint by @ying-jeanne in #1267
Fix broken links to gateway guides by @ahg-g in #1291
Remove hashSeed/PYTHONHASHSEED and enable speculativeIndexing in precise-prefix-cache guide by @bongwoobak in #1290
fix: migrate e2e-optimized-baseline-xpu workflow from helmfile to kus… by @yuanwu2017 in #1251
Matrix of Badges by @diegocastanibm in #1281
docs: reorganize predicted latency docs and fix broken links by @kaushikmitr in #1247
[Docs][Autoscaling][3/N] Comprehensive WVA doc. by @lionelvillard in #1176
Ensure precise-prefix-cache-aware passes both CKS and OCP by @maugustosilva in #1296
remove support building images from upstream branches by @Gregory-Pereira in #1292
revert to pull request target by @Gregory-Pereira in #1300
[Docs] Feature matrix and artifacts by @chcost in #1185
ci: migrate e2e-prefix-cache-xpu from helmfile to kustomize by @xiaojun-zhang in #1297
docs: coordinator incubation repo proposal by @nilig in #1293
Add AWS EFS backend guide for tiered prefix cache storage by @sudoalok in #1264
Upgrade to CUDA13 and support GB200 by @tlrmchlsmth in #1134
Align WVA guide with optimized-baseline and kustomize install flow by @mamy-CS in #1285
Fix matrix by @diegocastanibm in #1314
Increase timeout in simulated accelerators by @diegocastanibm in #1316
docs: add batch-gateway well-lit path documentation by @lioraron in #1187
update the llm-d-fs-connector kustomizations for v0.8 by @effi-ofer in #1308
[Doc][Autoscaling] Remove wrong p/d support characterization by @lionelvillard in #1295
Add missing details tag in docs by @petecheslock in #1299
Transitioning to llm-d Router by @ahg-g in #1305
Update EPP image to v0.8.0 and add support for both 1.4 and 1.5 igw helm charts by @ahg-g in #1317
[Docs] PD Guide from Helm->Kustomize by @robertgshaw2-redhat in #1238
Fix precise-prefix-cache guide: remove tokenizer plugin, fix speculativeIndexing placement by @bongwoobak in #1321
Update wide-ep guide; temporarily use 0.5 image by @liu-cong in #1288
ci: install kustomize in client-setup install-deps.sh by @xiaojun-zhang in #1319
ci: migrate e2e-pd-xpu from helmfile to kustomize by @xiaojun-zhang in #1304
quick cleanup by @liu-cong in #1328
Updated the epp design drawing by @ahg-g in #1331
Added kv management umbrella docs by @ahg-g in #1332
fix: use WORKFLOW_TOKEN for fork PR /test-nightly push by @clubanderson in #1335
fix: remove invalid workflows permission, use WORKFLOW_TOKEN PAT by @clubanderson in #1336
[0.7] remove prepare data plugin by @robertgshaw2-redhat in #1337
fix: persist-credentials false so WORKFLOW_TOKEN works for fork pushes by @clubanderson in #1338
Bump deprecated kgateway path to v2.2.3 by @danehans in #1234
[Docs] Polish Getting Started and Glossary by @ahg-g in #1333
[Guides Storage]: Align tiered-prefix-cache storage guide with optimized-baseline by @kfirtoledo in #1318
Add flow control well-lit path guide. by @LukeAVanDrie in #1301
removed empty docs by @ahg-g in #1339
[Guides] Clean Up README by @robertgshaw2-redhat in #1341
[Guides] TPU PD Disaggregation to Qwen 3.5 on TPU v7 by @yangligt2 in #1327
[Guides] Remove TMP Doc by @robertgshaw2-redhat in #1343
[Bugfix] Offloading Connector by @Gregory-Pereira in #1312
[Docs] P/D Tweaks For Deprecated Features by @robertgshaw2-redhat in #1351
[Guides] Standardize H1 by @robertgshaw2-redhat in #1349
remove stale e2es by @Gregory-Pereira in #1352
[Release] Bump to GAIE v1.5.0 by @robertgshaw2-redhat in #1353
Fix AWS image by @diegocastanibm in #1356
[Docs] Added docs/well-lit-paths/README.md file by @ahg-g in #1358
[Docs] Added workload autoscaling and async processing well-lit paths docs by @ahg-g in #1359
[Docs] Clean Up Proposals by @robertgshaw2-redhat in #1342
[CI] Add predicted-latency-based-scheduling nightly E2E (OCP/GKE/CKS) by @kaushikmitr in #1347
fix: add missing versions to bug report template by @liulanze in #1357
[Guides] Add lightweight TPU landing page for P/D Disaggregation by @yangligt2 in #1346
[PD - 0.7 release] Update Image by @robertgshaw2-redhat in #1361
[Docs]: Update docs with latest data producer plugin changes by @rahulgurnani in #1313
fix: correct typos in documentation by @EzgiTastan in #1364
[Build] Use Proper Wheel Variant by @robertgshaw2-redhat in #1362
Moved the note on the intention of the guides up by @ahg-g in #1373
Fix HPU image name by @diegocastanibm in #1374
[Docs]: Metrics and Tracing by @madhugoutham in #1207
add back hpu optimized baseline by @Gregory-Pereira in #1378
[Guides] Refactor Tiered Prefix Cache by @liu-cong in #1345
[Build] DeepGEMM Version by @robertgshaw2-redhat in #1367
Update local build by @diegocastanibm in #1382
[Guides] Fix duplicate metrics service port in predicted-latency values by @kaushikmitr in #1383
[bugfix] Fix DeepGEMM JIT in llm-d builds by @robertgshaw2-redhat in #1386
[Docs] Promote wip-docs-new to production location by @chcost in #1340
deps(actions): bump github/codeql-action from 3 to 4 by @dependabot[bot] in #1394
deps(actions): bump docker/login-action from 3 to 4 by @dependabot[bot] in #1395
deps(actions): bump docker/setup-buildx-action from 3 to 4 by @dependabot[bot] in #1396
deps(actions): bump docker/build-push-action from 6 to 7 by @dependabot[bot] in #1397
deps(actions): bump aquasecurity/trivy-action from 0.35.0 to 0.36.0 by @dependabot[bot] in #1398
update: clean deprecated env variable for vllm to build AVX by @zdtsw in #1231
Polishing docs and drawings by @ahg-g in #1404
ci: migrate nightly GKE pd disaggregation test to Kustomize & Gateway… by @yangligt2 in #1403
Fix: nightly CI pipeline fails due to broken Helm values path by @weizhoublue in #1388
Fix/predicted latency duplicate metrics port by @kaushikmitr in #1406
Fix MDX syntax error in optimized-baseline guide by @chcost in #1405
[Docs] Migrate Async Processor Helmfile -> Direct Helm Application by @shimib in #1366
Add CNCF Sandbox project references to README by @alexagriffith in #1375
Enable /test-nightly build-image by @maugustosilva in #1409
one build approval per set of chagnes in a PR by @Gregory-Pereira in #1413
chore(xpu): bump VLLM_XPU_COMMIT_SHA to v0.19.1 by @xiaojun-zhang in #1418
enable security events on nightly-permissions workflow by @Gregory-Pereira in #1410
Simplified EPP components headers by @ahg-g in #1421
Make active-active HA the default for precise-prefix-cache-aware by @vMaroon in #1415
drop ci for simulated accelerators by @Gregory-Pereira in #1387
docs: document feature gate removal process by @rahulgurnani in #1419
[Docs] Make sure all references are relative by @ahg-g in #1407
deps(docker): bump LMCache from v0.4.1 to v0.4.4-cu13 by @liulanze in #1368
Added a how to configure section to the EPP docs by @ahg-g in #1425
Remove wva helmfile resources by @mamy-CS in #1424
Fix InferencePool diagram reference by @chcost in #1416
[Guides] Add TPU KV cache offloading support to tiered prefix cache by @dannawang0221 in #1411
[Docs] Clean Up Artifacts by @robertgshaw2-redhat in #1370
Unified drawings color theme, sizes and placement by @ahg-g in #1431
Added more clarifications on proxy terminology and its relation to cloud load balancers by @ahg-g in #1435
Updated feature matrix by @ahg-g in #1436
refine the terminology under router by @ahg-g in #1437
Improve clarity of RDMA and networking documentation by @alexagriffith in #1385
fixed typos by @ahg-g in #1438
remove upstream_versions by @diegocastanibm in #1439
added nightly badges for p/d on gke and predicted latency by @ahg-g in #1432
Add "llm-d.ai/subguide" tracking labels for Prefix Cache Offloading by @adinilfeld in #1428
Add a nightly P/D on GKE badge to the main matrix by @maugustosilva in #1444
AMD guides and docker update by @vcave in #1443
Bump Istio from 1.29.1 to 1.29.2 by @liulanze in #1447
Fix broken link in openshift-aws documentation by @petecheslock in #1451
Attempt to deploy full Qwen3-32B for tiered-prefix-cache by @maugustosilva in #1445
upgrade to CNCF runners by @Gregory-Pereira in #1455
add tiered prefix cache nightly for gke by @liu-cong in #1454
Revert "upgrade to CNCF runners" by @Gregory-Pereira in #1458
docs: disable OpenShift predicted-latency nightly by @kaushikmitr in #1452
optimized-baseline: mount /.triton and /.config emptyDirs for non-root pods by @rdwj in #1450
Bump agentgateway from v1.0.0 to v1.1.0 by @liulanze in #1459
Rename resources-new to resources by @chcost in #1460
Add GKE CPU offloading LMCache nightly, rename workflows, and fix GPU requirements by @liu-cong in #1457
[Guides] update precise prefix-cache routing guide's benchmark data by @vMaroon in #1462
feat: modularize GKE NCCL tuner patch into shared component by @liu-cong in #1461
Various fixes to the guides by @ahg-g in #1467
(Guides) refresh scheduling-guides benchmark reports with v0.8.0 numbers by @vMaroon in #1463
nightlies should comment status on PR by @Gregory-Pereira in #1465
remove feature-matrix.md by @ahg-g in #1474
Updated GKE's supported paths by @ahg-g in #1475
Updated GKE provider docs on what paths are supported on what hardware by @ahg-g in #1477
[Bugfix] Fix WideEP Build + IMA by @robertgshaw2-redhat in #1448
removed redundant nightly tests badges by @ahg-g in #1476
Update GKE PD nightly by @liu-cong in #1456
[Guides] Bump WideEP by @robertgshaw2-redhat in #1324
point to the async processor well lit path instead of the guide directly by @ahg-g in #1479
[Guides | Bugfix] Consistency + Fix HTTPRoutes in Gateway Mode by @Gregory-Pereira in #1348
Add recipe for llm-d-fs connector storage offloading with Lustre by @Sneha-at in #1427
Polish the main README.md file by @ahg-g in #1482
Fix: PD disaggregation CKS nightly values by @weizhoublue in #1484
Move badge matrix to release by @diegocastanibm in #1481
Add release template to repository by @maugustosilva in #1487
optimized-baseline README.md fix by @Amit-Berman in #1486
Final batch of CI/CD fixes before release v0.7.0 by @maugustosilva in #1480
Switch llm-d-cuda on wide-ep-lws to v0.7.0 by @maugustosilva in #1489
Add additional steps for release process. by @maugustosilva in #1491
Add documentation release branch step to release template by @chcost in #1492
Add Performance Highlights, v0.7 news, and badge bump to README by @chcost in #1483
Prep v0.7.0 by @diegocastanibm in #1270
fix pd.values by @diegocastanibm in #1493

New Contributors

@hhk7734 made their first contribution in #1111
@sharvil10 made their first contribution in #1101
@madhugoutham made their first contribution in #1126
@hexfusion made their first contribution in #1123
@LukeAVanDrie made their first contribution in #1127
@ianliuy made their first contribution in #1135
@amacaskill made their first contribution in #1102
@rebel-jinmoo made their first contribution in #1115
@praveingk made their first contribution in #1181
@nilig made their first contribution in #1184
@revit13 made their first contribution in #1287
@ying-jeanne made their first contribution in #1267
@xiaojun-zhang made their first contribution in #1297
@sudoalok made their first contribution in #1264
@lioraron made their first contribution in #1187
@liulanze made their first contribution in #1357
@rahulgurnani made their first contribution in #1313
@EzgiTastan made their first contribution in #1364
@weizhoublue made their first contribution in #1388
@alexagriffith made their first contribution in #1375
@adinilfeld made their first contribution in #1428
@rdwj made their first contribution in #1450
@Amit-Berman made their first contribution in #1486

Full Changelog: v0.6...v0.7.0
