New checks
Node:
- Disk temperature — warns when a disk temperature exceeds the configured threshold (requires SMART data).
- Disk SMART errors — detects reallocated sectors, pending sectors, offline uncorrectable sectors, UDMA CRC errors and reported uncorrectable errors. Each is a separate check with its own threshold.
- ZFS vdev state — warns when a ZFS pool vdev is in a degraded or faulted state.
- ZFS vdev I/O errors — warns when a ZFS pool vdev has accumulated read, write or checksum errors.
- ZFS pool errors — warns when a ZFS pool reports errors.
- LVM-thin metadata usage — warns when LVM-thin metadata usage is high. A full metadata volume causes data corruption.
- IOWait — warns when node IOWait (from RRD data) exceeds the configured threshold.
- Root filesystem usage — warns when the root filesystem usage exceeds the configured threshold.
- SWAP usage — warns when SWAP usage exceeds the configured threshold.
- PSI CPU / IO / Memory pressure — warns when Linux Pressure Stall Information (PSI) metrics exceed thresholds (PVE 9.0+).
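To illustrate what "PSI metrics exceed thresholds" means in practice, here is a minimal sketch that parses one line in the format the Linux kernel exposes under /proc/pressure/ and compares one averaging window against a limit. These notes do not say how cv4pve-diag obtains PSI values or which window it evaluates, so the function names, the `avg10` default and the 10.0 threshold are assumptions made for illustration only.

```python
def parse_psi_line(line: str) -> tuple[str, dict[str, float]]:
    """Split a PSI line such as 'some avg10=1.00 avg60=0.50 avg300=0.10 total=42'."""
    kind, *fields = line.split()
    values: dict[str, float] = {}
    for field in fields:
        key, _, raw = field.partition("=")
        values[key] = float(raw)
    return kind, values


def psi_exceeds(line: str, threshold_percent: float, window: str = "avg10") -> bool:
    """Return True when the chosen averaging window is above the threshold.

    The window name and the threshold are hypothetical inputs, not
    cv4pve-diag settings.
    """
    _, values = parse_psi_line(line)
    return values[window] > threshold_percent


# Sample line in the kernel's PSI format (values invented for the example).
sample = "some avg10=12.50 avg60=3.10 avg300=0.75 total=123456"
print(psi_exceeds(sample, 10.0))           # compares avg10 against 10.0
print(psi_exceeds(sample, 10.0, "avg60"))  # compares avg60 against 10.0
```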
VM/CT:
- Pending config changes — warns when a VM or container has configuration changes that require a reboot to take effect.
- VM state in snapshot — warns when a snapshot includes the RAM state, which significantly increases snapshot size and restore time.
LXC:
- Privileged container — warns when a container runs as privileged (root inside = root on host).
- Privileged container without AppArmor — critical when a privileged container also has AppArmor disabled (no kernel confinement).
- Nesting without keyctl — warns when nesting is enabled but keyctl is not (required for Docker-in-LXC).
- Raw LXC config — warns when a container has raw LXC config entries that bypass PVE abstractions.
- Swap = 0 — warns when a container has swap disabled (OOM killer risk under memory pressure).
- No hostname — info when a container has no hostname configured.
Cluster:
- No backup job — warns when no backup job is configured for any VM/CT.
- Backup job without compression — info when a backup job has no compression configured.
- Backup job without retention — warns when a backup job has no maxfiles/prune policy (storage will fill up).
- No HA resources — info when no HA resources are configured (VMs won't restart on node failure).
- No storage replication — info when no storage replication jobs exist.
- Cluster firewall disabled — warns when the cluster-level firewall is disabled.
- Cluster firewall policy — warns when inbound or outbound firewall policy is not DROP.
- root@pam without TFA — critical when the root user has no two-factor authentication configured.
- Admin users without TFA — warns when admin users have no TFA configured.
- Overly broad permissions — warns when a user has the Administrator role at root path / instead of pool- or node-scoped permissions.
- Disabled user with active API token — warns when a disabled user still has valid API tokens that should be revoked.
Improvements
- Unique error code per check — every check now has a distinct code in the format [gravity][context][0001-9999] (e.g. WN0014, CQ0001). This makes it possible to ignore individual checks precisely via ignore rules.
- macOS .pkg packages — releases now include .pkg installers for osx-x64 and osx-arm64.
- Packages for Linux — releases now include .deb and .rpm packages for amd64, arm64 and armhf/armv7hl. AUR package updated automatically on release.
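The error code format described above can be sketched as a small parser. This is only an illustration: it assumes, based on the two examples WN0014 and CQ0001, that gravity and context are each a single uppercase letter, and it does not try to say what the individual letters stand for. The names `ErrorCode` and `parse_error_code` are hypothetical and not part of cv4pve-diag.

```python
import re
from typing import NamedTuple

# Assumption: one letter for gravity, one for context, then four digits.
CODE_PATTERN = re.compile(r"([A-Z])([A-Z])(\d{4})")


class ErrorCode(NamedTuple):
    gravity: str
    context: str
    number: int


def parse_error_code(code: str) -> ErrorCode:
    """Split a code such as 'WN0014' into gravity, context and number."""
    match = CODE_PATTERN.fullmatch(code)
    if match is None:
        raise ValueError(f"not a valid error code: {code!r}")
    gravity, context, digits = match.groups()
    number = int(digits)
    if number == 0:
        # The documented range is 0001-9999, so 0000 is rejected.
        raise ValueError(f"number out of range in {code!r}")
    return ErrorCode(gravity, context, number)


print(parse_error_code("WN0014"))
print(parse_error_code("CQ0001"))
```

Because each check has its own code, a script that filters results can match on the full code rather than on message text.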
Breaking changes
- Error codes changed — all codes have been reassigned. Existing ignore rules must be updated. Run cv4pve-diag diag and check the Code column for the new values.
- Settings JSON structure changed — new top-level sections Rrd, SmartDisk, Backup, NodeStorage have been added. Regenerate your settings.json with cv4pve-diag create-settings.
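Before regenerating, it may help to confirm whether an existing settings file predates this release. The sketch below checks a loaded settings document for the four new top-level sections named above. The helper names are hypothetical, the file path is whatever you pass in, and nothing here validates the contents of each section, which these notes do not describe.

```python
import json

# Top-level sections introduced in this release, as listed in the notes above.
REQUIRED_SECTIONS = ("Rrd", "SmartDisk", "Backup", "NodeStorage")


def missing_sections(settings: dict) -> list[str]:
    """Return the new section names that are absent from the settings."""
    return [name for name in REQUIRED_SECTIONS if name not in settings]


def check_settings_file(path: str) -> list[str]:
    """Load a settings JSON file and report which new sections it lacks."""
    with open(path, encoding="utf-8") as handle:
        settings = json.load(handle)
    if not isinstance(settings, dict):
        raise ValueError("settings file must contain a JSON object")
    return missing_sections(settings)


# Example with an in-memory document that has only two of the four sections.
old_style = {"Rrd": {}, "Backup": {}}
print(missing_sections(old_style))
```

An empty result suggests the file already has the new layout; any names returned indicate the file should be regenerated.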
What's Changed
- Release v2.0.2 — new checks, unique ErrorCodes, macOS/deb/rpm packaging by @franklupo in #21
Full Changelog: v2.0.1...v2.0.2