LLM-D Component Summary
- ⚠️ BREAKING CHANGE — CUDA 13.0.2 runtime: All llm-d CUDA images now ship with CUDA 13.0.2 (upgraded from 12.x). This requires NVIDIA driver 580 or later on the host. Nodes running older drivers must be upgraded before deploying v0.7.0 images.
- UX Change - Because gateways are difficult to configure for many adopters, the default llm-d deployment now uses "standalone mode", which uses a generic proxy instead of the more full-featured gateway. We still recommend a full gateway for customers in production.
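
The driver requirement above can be checked on a node before rolling out the v0.7.0 CUDA images. The following is a minimal sketch, not part of llm-d: the helper names (`driver_major`, `driver_is_supported`, `installed_driver_versions`) are hypothetical, and it assumes `nvidia-smi` is on the node's `PATH` and that comparing the major version against 580 is a sufficient check.

```python
import shutil
import subprocess

# Assumption: the release note's "driver 580 or later" is checked on the major version only.
MIN_DRIVER_MAJOR = 580


def driver_major(version: str) -> int:
    """Return the major component of a driver version string such as '580.65.06'."""
    return int(version.strip().split(".")[0])


def driver_is_supported(version: str, minimum_major: int = MIN_DRIVER_MAJOR) -> bool:
    """True when the driver's major version meets the minimum."""
    return driver_major(version) >= minimum_major


def installed_driver_versions() -> list[str]:
    """Ask nvidia-smi for the driver version reported by each GPU (empty if unavailable)."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    versions = installed_driver_versions()
    if not versions:
        print("nvidia-smi not found or no GPUs reported")
    for v in versions:
        status = "ok" if driver_is_supported(v) else "upgrade required"
        print(f"driver {v}: {status}")
```

Running the script on each GPU node prints one line per GPU. Any node that reports "upgrade required" would need a driver upgrade before it can run the CUDA 13.0.2 images.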
| Component | Version | Previous Version | Type |
|---|---|---|---|
| llm-d/llm-d-inference-scheduler | v0.8.0 | v0.7.1 | Image |
| llm-d/llm-d-uds-tokenizer | vllm-v0.19.1 | v0.7.1 | Image |
| llm-d/llm-d-kv-cache | v0.8.0 | v0.7.1 | Library |
| llm-d/llm-d-routing-sidecar | v0.8.0 | v0.7.1 | Image |
| llm-d/llm-d-inference-sim | v0.8.2 | v0.7.1 | Image |
| llm-d/llm-d-cuda | v0.7.0 | v0.6.0 | Image |
| llm-d/llm-d-cuda (debug) | v0.7.0 | v0.6.0 | Image |
| llm-d/llm-d-cuda-gb200 | v0.7.0 | N/A | Image (New) |
| llm-d/llm-d-aws (EFA) | v0.7.0 | v0.6.0 | Image |
| llm-d/llm-d-xpu | v0.7.0 | v0.6.0 | Image |
| llm-d/llm-d-hpu | v0.7.0 | v0.6.0 | Image |
| llm-d/llm-d-cpu | v0.7.0 | v0.6.0 | Image |
| llm-d/llm-d-rocm | v0.7.0 | v0.6.0 | Image |
| llm-d/llm-d-kv-cache/llmd_fs_backend_connector | v0.19.1 | v0.17.1 | Wheel installed in llm-d |
| llm-d/llm-d-workload-variant-autoscaler | v0.7.0 | v0.6.0 | Helm Chart + Image |
| llm-d-incubation/llm-d-infra (Deprecated) | N/A | v1.4.0 | Helm Chart |
| llm-d-incubation/llm-d-modelservice (Deprecated) | N/A | v0.4.9 | Helm Chart |
| vllm-project/vllm | v0.19.1 | v0.17.1 | Wheel installed in llm-d |
| kubernetes-sigs/gateway-api-inference-extension | v1.5.0 | v1.4.0 | Helm Chart |
Infrastructure Changes
| Component | Version | Previous Version |
|---|---|---|
| Gateway API | v1.5.1 | v1.4.0 |
| Istio | 1.29.1 | 1.28.1 |
| agentgateway (old KGateway) | v2.2.1 | v2.1.1 |
What's Changed
- Simplify WVA guide test by @lionelvillard in #1072
- fix concurrency group to sha not PR by @Gregory-Pereira in #1073
- fix block-size alignment by @vMaroon in #1084
- Revise maturity status and TPU VM type details by @seanhorgan in #1085
- Fix formatting of automated test status in README by @seanhorgan in #1087
- Fix image on pd user guide by @Edwinhr716 in #1086
- Updated maturity testing level on all guides by @maugustosilva in #1094
- Skip latest tag for release candidates by @diegocastanibm in #1034
- fix(xpu): enable TP=2 for Qwen3-32B for fixing XPU prefix-cache test failed by @yuanwu2017 in #1081
- [guides] Add a commented `priorityClassName` for use in nightly CI/CD by @maugustosilva in #1062
- deps(actions): bump docker/build-push-action from 6 to 7 by @dependabot[bot] in #1089
- deps(actions): bump actions/github-script from 7 to 8 by @dependabot[bot] in #1091
- deps(actions): bump google-github-actions/auth from 2.1.12 to 3.0.0 by @dependabot[bot] in #1090
- deps(actions): bump dorny/paths-filter from 3 to 4 by @dependabot[bot] in #1092
- deps(actions): bump docker/login-action from 3 to 4 by @dependabot[bot] in #1093
- Add shared drive for sig-rl by @petecheslock in #1109
- Fix llm-d performance dashboard queries by @danehans in #1098
- Fix/e2e validate single curl pod by @yuanwu2017 in #1078
- Add Moreh as a contributor to the adopters list by @hhk7734 in #1111
- Update guides with status badges. by @maugustosilva in #1110
- fix: use vllmServe modelCommand in precise-prefix-cache-aware XPU values by @sharvil10 in #1101
- Add deployment health-check smoke test for llm-d clusters by @lisperz in #767
- [1/N] Documentation Revamp by @robertgshaw2-redhat in #1100
- fix: split vLLM command array and fill in quickstart TODOs by @madhugoutham in #1126
- add OCI well-lit-path for P/D disaggregation by @hexfusion in #1123
- fix: use the proper git ref for tag extraction by @Gregory-Pereira in #1133
- Update latency-predictor.md diagram by @LukeAVanDrie in #1127
- docs: fix typos detected by nightly scan (issue #1056) by @ianliuy in #1135
- [Docs][4/N] Update architecture/core/epp/scheduling.md docs by @ahg-g in #1122
- [2/N] Proxy Doc by @robertgshaw2-redhat in #1121
- [Docs][6/N]: add Istio gateway setup guide by @madhugoutham in #1146
- docs: add EPP flow control reference by @LukeAVanDrie in #1130
- [Docs] [3/N] Add Disaggregation Architecture Docs by @robertgshaw2-redhat in #1138
- [Docs] Add glossary page by @ianliuy in #1148
- Add agentgateway guide by @danehans in #1159
- fix(docs): correct typos from nightly scan, add false-positive config by @ianliuy in #1160
- [Docs] Guides Directory Cleanup by @robertgshaw2-redhat in #1163
- [Docs][EPP] Doc for request handling and control by @zetxqx in #1128
- fix(docker): add LIBRARY_PATH and ldconfig to CUDA runtime stage by @ianliuy in #1147
- Update ms-inference-scheduling/values_tpu_v7.yaml to use RunAI model streamer by @amacaskill in #1102
- deps(actions): bump hashicorp/setup-terraform from 3.1.2 to 4.0.0 by @dependabot[bot] in #1153
- deps(actions): bump aws-actions/configure-aws-credentials from 4.3.1 to 6.1.0 by @dependabot[bot] in #1152
- deps(actions): bump actions/github-script from 8 to 9 by @dependabot[bot] in #1150
- deps(actions): bump j178/prek-action from 1 to 2 by @dependabot[bot] in #1151
- deps(actions): bump actions/upload-artifact from 6 to 7 by @dependabot[bot] in #1149
- [Docs] Remove `customizing-a-guide.md` by @robertgshaw2-redhat in #1165
- [Docs] Move `guides/benchmarks` to `helpers/benchmark.md` by @robertgshaw2-redhat in #1164
- [Docs][Istio] Align to AgentGateway Doc by @robertgshaw2-redhat in #1161
- deps(docker): bump gdrcopy from v2.5.1 to v2.5.2 by @ianliuy in #1171
- [Docs] Envoy Proxy -> GAIE-Conformant Proxy by @robertgshaw2-redhat in #1162
- Updated the basic architecture diagram by @ahg-g in #1173
- [Docs] Fix Broken Quickstart Links by @robertgshaw2-redhat in #1175
- [Docs] Remove guide/prereq/infrastructure by @robertgshaw2-redhat in #1168
- Added GKE gateway guide by @ahg-g in #1174
- [Docs] Move Client Tools from Guides --> Helpers by @robertgshaw2-redhat in #1167
- feat: add Rebellions as a supported accelerator vendor by @rebel-jinmoo in #1115
- docs: rewrite predicted latency architecture and add well-lit path by @kaushikmitr in #1166
- [Docs][Autoscaling][2/N] hpa/keda vs hpa design choices/features by @lionelvillard in #1157
- [Docs][7/N] KV Indexer by @vMaroon in #1143
- [Docs][autoscaling][1/N] Autoscaling intro by @lionelvillard in #1145
- [Docs] Refine RDMA docs by @praveingk in #1181
- Fixup merge conflict markers in `tiered-prefix-cache/storage/README.md` by @tlrmchlsmth in #1190
- [Docs] fix broken link by @lionelvillard in #1191
- docs: llm-d-inference-payload-processor proposal by @nilig in #1184
- [Docs] Well Lit Paths Doc by @chcost in #1156
- Tweaks to the latency predictor docs by @ahg-g in #1201
- [Docs] Restructure Predicted Latency Well Lit Path Doc by @robertgshaw2-redhat in #1200
- [Docs][Autoscaling][4/N] first pass at the HPA+IGW doc by @lionelvillard in #1192
- Updating native HPA-based autoscaling guide to reference EPP instead of IGW by @ahg-g in #1212
- [Docs] Rename Well-Lit-Paths -> Guides, Guides -> Resources by @robertgshaw2-redhat in #1199
- deps(actions): bump actions/checkout from 4 to 6 by @dependabot[bot] in #1221
- deps(actions): bump helm/kind-action from 1.12.0 to 1.14.0 by @dependabot[bot] in #1220
- deps(actions): bump actions/download-artifact from 4 to 8 by @dependabot[bot] in #1219
- [Docs][8/N] KV Offloader Doc by @vMaroon in #1144
- [Docs] Fix Various Nits by @robertgshaw2-redhat in #1215
- Update TPU recommendations in README by @seanhorgan in #1213
- Fix typos by @diegocastanibm in #1223
- [Docs] Remove envoy reference by @robertgshaw2-redhat in #1225
- Proposal doc for Inference Resilience Operator by @aishukamal in #984
- docs(epp): polish architecture and configuration guides by @LukeAVanDrie in #1227
- [Docs] Consolidate Gateway Docs To Use YAML by @robertgshaw2-redhat in #1178
- Update model server doc by @ahg-g in #1237
- [Refactor Install]: Helm -> Kustomize for IIS by @Gregory-Pereira in #1131
- Add workflow write by @diegocastanibm in #1245
- fixing workflow call file pointers by @Gregory-Pereira in #1248
- Added placeholder links by @ahg-g in #1246
- Remove workflow permit by @diegocastanibm in #1260
- fix: broken links in guides and docs by @zdtsw in #1265
- Streamline optimized baseline guide by @liu-cong in #1249
- Add FULL_DUPLEX_STREAMED requirement by @roytman in #1271
- Update CI for optimized baseline guide by @liu-cong in #1268
- Consolidated Gateway guides by @ahg-g in #1259
- cleaup unused docs by @ahg-g in #1276
- fixed incorrectly setting namespace to default in the optimized-baseline by @ahg-g in #1274
- [Docs] Recover precise prefix cache-aware well-lit-path by @vMaroon in #1272
- deps(actions): bump aquasecurity/trivy-action from 0.35.0 to 0.36.0 by @dependabot[bot] in #1275
- Data layer docs by @ahg-g in #1242
- precise-prefix-cache-scheduling guide helm -> kustomize by @vMaroon in #1258
- Point `optimized-baseline-ocp` to the new `reusable-nightly-e2e-openshift.yaml` by @maugustosilva in #1278
- Add details on gke patch for optimized baseline by @liu-cong in #1282
- Added API and HTTP Headers References and updated the inferencepool architectural doc by @ahg-g in #1277
- Align TPU v6/v7 optimized baseline with Qwen3-32B by @yangligt2 in #1286
- fix: remove unsupported description field to fix precise-prefix-cache… by @revit13 in #1287
- Add kustomize to prerequisite in client-setup/README.md by @revit13 in #1289
- Add how to check gib is installed by @liu-cong in #1283
- docs: add Before You Write Code contribution scope section by @hexfusion in #1214
- fix: Fix typos lint by @ying-jeanne in #1267
- Fix broken links to gateway guides by @ahg-g in #1291
- Remove hashSeed/PYTHONHASHSEED and enable speculativeIndexing in precise-prefix-cache guide by @bongwoobak in #1290
- fix: migrate e2e-optimized-baseline-xpu workflow from helmfile to kus… by @yuanwu2017 in #1251
- Matrix of Badges by @diegocastanibm in #1281
- docs: reorganize predicted latency docs and fix broken links by @kaushikmitr in #1247
- [Docs][Autoscaling][3/N] Comprehensive WVA doc. by @lionelvillard in #1176
- Ensure `precise-prefix-cache-aware` passes both CKS and OCP by @maugustosilva in #1296
- remove support building images from upstream branches by @Gregory-Pereira in #1292
- revert to pull request target by @Gregory-Pereira in #1300
- [Docs] Feature matrix and artifacts by @chcost in #1185
- ci: migrate e2e-prefix-cache-xpu from helmfile to kustomize by @xiaojun-zhang in #1297
- docs: coordinator incubation repo proposal by @nilig in #1293
- Add AWS EFS backend guide for tiered prefix cache storage by @sudoalok in #1264
- Upgrade to CUDA13 and support GB200 by @tlrmchlsmth in #1134
- Align WVA guide with optimized-baseline and kustomize install flow by @mamy-CS in #1285
- Fix matrix by @diegocastanibm in #1314
- Increase timeout in simulated accelerators by @diegocastanibm in #1316
- docs: add batch-gateway well-lit path documentation by @lioraron in #1187
- update the llm-d-fs-connector kustomizations for v0.8 by @effi-ofer in #1308
- [Doc][Autoscaling] Remove wrong p/d support characterization by @lionelvillard in #1295
- Add missing details tag in docs by @petecheslock in #1299
- Transitioning to llm-d Router by @ahg-g in #1305
- Update EPP image to v0.8.0 and add support for both 1.4 and 1.5 igw helm charts by @ahg-g in #1317
- [Docs] PD Guide from Helm->Kustomize by @robertgshaw2-redhat in #1238
- Fix precise-prefix-cache guide: remove tokenizer plugin, fix speculativeIndexing placement by @bongwoobak in #1321
- Update wide-ep guide; temporarily use 0.5 image by @liu-cong in #1288
- ci: install kustomize in client-setup install-deps.sh by @xiaojun-zhang in #1319
- ci: migrate e2e-pd-xpu from helmfile to kustomize by @xiaojun-zhang in #1304
- quick cleanup by @liu-cong in #1328
- Updated the epp design drawing by @ahg-g in #1331
- Added kv management umbrella docs by @ahg-g in #1332
- fix: use WORKFLOW_TOKEN for fork PR /test-nightly push by @clubanderson in #1335
- fix: remove invalid workflows permission, use WORKFLOW_TOKEN PAT by @clubanderson in #1336
- [0.7] remove prepare data plugin by @robertgshaw2-redhat in #1337
- fix: persist-credentials false so WORKFLOW_TOKEN works for fork pushes by @clubanderson in #1338
- Bump deprecated kgateway path to v2.2.3 by @danehans in #1234
- [Docs] Polish Getting Started and Glossary by @ahg-g in #1333
- [Guides Storage]: Align tiered-prefix-cache storage guide with optimized-baseline by @kfirtoledo in #1318
- Add flow control well-lit path guide. by @LukeAVanDrie in #1301
- removed empty docs by @ahg-g in #1339
- [Guides] Clean Up README by @robertgshaw2-redhat in #1341
- [Guides] TPU PD Disaggregation to Qwen 3.5 on TPU v7 by @yangligt2 in #1327
- [Guides] Remove TMP Doc by @robertgshaw2-redhat in #1343
- [Bugfix] Offloading Connector by @Gregory-Pereira in #1312
- [Docs] P/D Tweaks For Deprecated Features by @robertgshaw2-redhat in #1351
- [Guides] Standardize H1 by @robertgshaw2-redhat in #1349
- remove stale e2es by @Gregory-Pereira in #1352
- [Release] Bump to GAIE v1.5.0 by @robertgshaw2-redhat in #1353
- Fix AWS image by @diegocastanibm in #1356
- [Docs] Added docs/well-lit-paths/README.md file by @ahg-g in #1358
- [Docs] Added workload autoscaling and async processing well-lit paths docs by @ahg-g in #1359
- [Docs] Clean Up Proposals by @robertgshaw2-redhat in #1342
- [CI] Add predicted-latency-based-scheduling nightly E2E (OCP/GKE/CKS) by @kaushikmitr in #1347
- fix: add missing versions to bug report template by @liulanze in #1357
- [Guides] Add lightweight TPU landing page for P/D Disaggregation by @yangligt2 in #1346
- [PD - 0.7 release] Update Image by @robertgshaw2-redhat in #1361
- [Docs]: Update docs with latest data producer plugin changes by @rahulgurnani in #1313
- fix: correct typos in documentation by @EzgiTastan in #1364
- [Build] Use Proper Wheel Variant by @robertgshaw2-redhat in #1362
- Moved the note on the intention of the guides up by @ahg-g in #1373
- Fix HPU image name by @diegocastanibm in #1374
- [Docs]: Metrics and Tracing by @madhugoutham in #1207
- add back hpu optimized baseline by @Gregory-Pereira in #1378
- [Guides] Refactor Tiered Prefix Cache by @liu-cong in #1345
- [Build] DeepGEMM Version by @robertgshaw2-redhat in #1367
- Update local build by @diegocastanibm in #1382
- [Guides] Fix duplicate metrics service port in predicted-latency values by @kaushikmitr in #1383
- [bugfix] Fix DeepGEMM JIT in llm-d builds by @robertgshaw2-redhat in #1386
- [Docs] Promote wip-docs-new to production location by @chcost in #1340
- deps(actions): bump github/codeql-action from 3 to 4 by @dependabot[bot] in #1394
- deps(actions): bump docker/login-action from 3 to 4 by @dependabot[bot] in #1395
- deps(actions): bump docker/setup-buildx-action from 3 to 4 by @dependabot[bot] in #1396
- deps(actions): bump docker/build-push-action from 6 to 7 by @dependabot[bot] in #1397
- deps(actions): bump aquasecurity/trivy-action from 0.35.0 to 0.36.0 by @dependabot[bot] in #1398
- update: clean deprecated env variable for vllm to build AVX by @zdtsw in #1231
- Polishing docs and drawings by @ahg-g in #1404
- ci: migrate nightly GKE pd disaggregation test to Kustomize & Gateway… by @yangligt2 in #1403
- Fix: nightly CI pipeline fails due to broken Helm values path by @weizhoublue in #1388
- Fix/predicted latency duplicate metrics port by @kaushikmitr in #1406
- Fix MDX syntax error in optimized-baseline guide by @chcost in #1405
- [Docs] Migrate Async Processor Helmfile -> Direct Helm Application by @shimib in #1366
- Add CNCF Sandbox project references to README by @alexagriffith in #1375
- Enable `/test-nightly build-image` by @maugustosilva in #1409
- one build approval per set of chagnes in a PR by @Gregory-Pereira in #1413
- chore(xpu): bump VLLM_XPU_COMMIT_SHA to v0.19.1 by @xiaojun-zhang in #1418
- enable security events on nightly-permissions workflow by @Gregory-Pereira in #1410
- Simplified EPP components headers by @ahg-g in #1421
- Make active-active HA the default for precise-prefix-cache-aware by @vMaroon in #1415
- drop ci for simulated accelerators by @Gregory-Pereira in #1387
- docs: document feature gate removal process by @rahulgurnani in #1419
- [Docs] Make sure all references are relative by @ahg-g in #1407
- deps(docker): bump LMCache from v0.4.1 to v0.4.4-cu13 by @liulanze in #1368
- Added a how to configure section to the EPP docs by @ahg-g in #1425
- Remove wva helmfile resources by @mamy-CS in #1424
- Fix InferencePool diagram reference by @chcost in #1416
- [Guides] Add TPU KV cache offloading support to tiered prefix cache by @dannawang0221 in #1411
- [Docs] Clean Up Artifacts by @robertgshaw2-redhat in #1370
- Unified drawings color theme, sizes and placement by @ahg-g in #1431
- Added more clarifications on proxy terminology and its relation to cloud load balancers by @ahg-g in #1435
- Updated feature matrix by @ahg-g in #1436
- refine the terminology under router by @ahg-g in #1437
- Improve clarity of RDMA and networking documentation by @alexagriffith in #1385
- fixed typos by @ahg-g in #1438
- remove upstream_versions by @diegocastanibm in #1439
- added nightly badges for p/d on gke and predicted latency by @ahg-g in #1432
- Add "llm-d.ai/subguide" tracking labels for Prefix Cache Offloading by @adinilfeld in #1428
- Add a nightly P/D on GKE badge to the main matrix by @maugustosilva in #1444
- AMD guides and docker update by @vcave in #1443
- Bump Istio from 1.29.1 to 1.29.2 by @liulanze in #1447
- Fix broken link in openshift-aws documentation by @petecheslock in #1451
- Attempt to deploy full Qwen3-32B for tiered-prefix-cache by @maugustosilva in #1445
- upgrade to CNCF runners by @Gregory-Pereira in #1455
- add tiered prefix cache nightly for gke by @liu-cong in #1454
- Revert "upgrade to CNCF runners" by @Gregory-Pereira in #1458
- docs: disable OpenShift predicted-latency nightly by @kaushikmitr in #1452
- optimized-baseline: mount /.triton and /.config emptyDirs for non-root pods by @rdwj in #1450
- Bump agentgateway from v1.0.0 to v1.1.0 by @liulanze in #1459
- Rename resources-new to resources by @chcost in #1460
- Add GKE CPU offloading LMCache nightly, rename workflows, and fix GPU requirements by @liu-cong in #1457
- [Guides] update precise prefix-cache routing guide's benchmark data by @vMaroon in #1462
- feat: modularize GKE NCCL tuner patch into shared component by @liu-cong in #1461
- Various fixes to the guides by @ahg-g in #1467
- (Guides) refresh scheduling-guides benchmark reports with v0.8.0 numbers by @vMaroon in #1463
- nightlies should comment status on PR by @Gregory-Pereira in #1465
- remove feature-matrix.md by @ahg-g in #1474
- Updated GKE's supported paths by @ahg-g in #1475
- Updated GKE provider docs on what paths are supported on what hardware by @ahg-g in #1477
- [Bugfix] Fix WideEP Build + IMA by @robertgshaw2-redhat in #1448
- removed redundant nightly tests badges by @ahg-g in #1476
- Update GKE PD nightly by @liu-cong in #1456
- [Guides] Bump WideEP by @robertgshaw2-redhat in #1324
- point to the async processor well lit path instead of the guide directly by @ahg-g in #1479
- [Guides | Bugfix] Consistency + Fix HTTPRoutes in Gateway Mode by @Gregory-Pereira in #1348
- Add recipe for llm-d-fs connector storage offloading with Lustre by @Sneha-at in #1427
- Polish the main README.md file by @ahg-g in #1482
- Fix: PD disaggregation CKS nightly values by @weizhoublue in #1484
- Move badge matrix to release by @diegocastanibm in #1481
- Add release template to repository by @maugustosilva in #1487
- optimized-baseline README.md fix by @Amit-Berman in #1486
- Final batch of CI/CD fixes before release v0.7.0 by @maugustosilva in #1480
- Switch `llm-d-cuda` on `wide-ep-lws` to `v0.7.0` by @maugustosilva in #1489
- Add additional steps for release process. by @maugustosilva in #1491
- Add documentation release branch step to release template by @chcost in #1492
- Add Performance Highlights, v0.7 news, and badge bump to README by @chcost in #1483
- Prep v0.7.0 by @diegocastanibm in #1270
- fix pd.values by @diegocastanibm in #1493
New Contributors
- @hhk7734 made their first contribution in #1111
- @sharvil10 made their first contribution in #1101
- @madhugoutham made their first contribution in #1126
- @hexfusion made their first contribution in #1123
- @LukeAVanDrie made their first contribution in #1127
- @ianliuy made their first contribution in #1135
- @amacaskill made their first contribution in #1102
- @rebel-jinmoo made their first contribution in #1115
- @praveingk made their first contribution in #1181
- @nilig made their first contribution in #1184
- @revit13 made their first contribution in #1287
- @ying-jeanne made their first contribution in #1267
- @xiaojun-zhang made their first contribution in #1297
- @sudoalok made their first contribution in #1264
- @lioraron made their first contribution in #1187
- @liulanze made their first contribution in #1357
- @rahulgurnani made their first contribution in #1313
- @EzgiTastan made their first contribution in #1364
- @weizhoublue made their first contribution in #1388
- @alexagriffith made their first contribution in #1375
- @adinilfeld made their first contribution in #1428
- @rdwj made their first contribution in #1450
- @Amit-Berman made their first contribution in #1486
Full Changelog: v0.6...v0.7.0