This release marks the general availability of the GPU allocation plugin.
Important: When upgrading from a previous release, please follow the upgrade instructions below.
Highlights
- Dynamic MIG device management (requires Kubernetes 1.34 or later and enabling the new DynamicMIG feature gate).
- Significantly reduced ComputeDomain formation time in large-scale clusters (~10 seconds for domains composed of thousands of nodes).
Improvements
- Added preliminary support for VFIO passthrough devices in the GPU plugin (disabled by default; enable the PassthroughSupport feature gate, see #668).
- Added the memory addressingMode as an attribute of announced GPU devices (#717).
- Added support for GPU health checks (disabled by default; enable the NVMLDeviceHealthCheck feature gate, see #689).
- Enhanced the robustness of the ComputeDomain controller when multiple replicas are run, deliberately or accidentally (#868).
- Made the ComputeDomain kubelet plugin crash (fail fast) on obvious MNNVL fabric configuration errors or degraded fabric health (#844, #865).
- Added a networkPolicy parameter to the Helm chart to support clusters with restricted networks (#708).
- Tuned binary search paths to widen Linux distribution support (#706).
All commits since the last release: v25.8.1...v25.12.0
Upgrades
First, update CRDs by running
kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-dra-driver-gpu/refs/tags/v25.12.0/deployments/helm/nvidia-dra-driver-gpu/crds/resource.nvidia.com_computedomains.yaml
Only then update the chart by running helm upgrade -i ... (instead of helm install).
New feature gates and known limitations
- For now, enabling DynamicMIG is mutually exclusive with enabling any of MPSSupport, NVMLDeviceHealthCheck, or PassthroughSupport.
- The new fail-fast behavior in the ComputeDomain (CD) kubelet plugin can be disabled via the new CrashOnNVLinkFabricErrors feature gate.
- The scalability improvements for ComputeDomains come with a number of architectural changes under the hood. To restore 25.8.x behavior, disable the ComputeDomainCliques feature gate.
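The DynamicMIG exclusivity rule above can be checked before rolling out a configuration. The following is a minimal, hypothetical Python sketch and is not part of the driver or its Helm chart: the function name conflicting_gates and the dict-of-booleans input are assumptions for illustration, and only the feature gate names come from these notes.

```python
# Hypothetical pre-flight check (not part of the driver): report enabled
# feature gates that, per the v25.12.0 release notes, cannot currently be
# combined with DynamicMIG. Function and data shapes are illustrative only.

# Gates listed in the release notes as mutually exclusive with DynamicMIG.
DYNAMIC_MIG_EXCLUSIVE = frozenset({"MPSSupport", "NVMLDeviceHealthCheck", "PassthroughSupport"})


def conflicting_gates(enabled: dict[str, bool]) -> list[str]:
    """Return the sorted names of enabled gates that conflict with DynamicMIG.

    An empty list means the combination is allowed, which is always the
    case when DynamicMIG itself is not enabled.
    """
    if not enabled.get("DynamicMIG", False):
        return []
    return sorted(name for name in DYNAMIC_MIG_EXCLUSIVE if enabled.get(name, False))


if __name__ == "__main__":
    # Example: DynamicMIG plus health checks conflicts; the unrelated
    # CrashOnNVLinkFabricErrors gate is ignored by this check.
    gates = {"DynamicMIG": True, "NVMLDeviceHealthCheck": True, "CrashOnNVLinkFabricErrors": True}
    conflicts = conflicting_gates(gates)
    if conflicts:
        print("DynamicMIG cannot be combined with: " + ", ".join(conflicts))
```

Sorting the result keeps the report deterministic, since iteration order over a frozenset is not guaranteed.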