github NVIDIA/dcgm-exporter 4.5.3-4.8.2

4 hours ago
  • Update to DCGM 4.5.3 and DCGM Exporter 4.8.2.
  • Improve GPU health metrics, including reporting GPU-wide health incidents such as fallen-off-bus XIDs.
  • Make /debug/pprof profiling endpoints opt-in via --enable-pprof / DCGM_EXPORTER_ENABLE_PPROF.
  • Add PodMapper informer caching for Kubernetes pod mapping (#626) (@jaeeyoungkim).
  • Add per-process GPU metrics for time-sharing and MIG (#594) (@krystiancastai).
  • Make Helm priorityClassName configurable with explicit defaults (#444) (@runzhliu).
  • Add MIG device support for HPC job labels (#602) (@jay-mckay).
  • Update go-dcgm field metadata handling, deprecated field alias resolution, health constants, policy registration handling, and version info APIs.
  • Document IPv6 address formats for remote hostengine and metrics listen addresses.
  • Refresh dependencies, container base images, Docker image references, Helm chart values, Kubernetes manifests, and tests for this release.
