Highlights
PVE 9.2 GA coexistence (HA CRS, custom CPU models, SDN fabrics, removed scheduling= key), runtime hardening (per-node + auth circuit breakers), Capacity Outlook rebuilt off the WMA-current endpoint onto true linear-regression forecasting, 10 customer-reported fixes batched, Docker plugin loading restored. 41 commits since v0.9.10.3.
⚖️ PVE 9.2 coexistence
- HA CRS auto-rebalance: two-layer skip logic so PegaProx LB doesn't ping-pong PVE-CRS-managed VMs. Cluster-wide gate on `/cluster/options.crs.ha-auto-rebalance` means pre-9.2 clusters behave exactly as before. `scheduling=` payload key stripped on PVE 9.2+ (key was removed in 9.2; older PVE versions still accept it).
- Surface coverage: HA resource `auto-rebalance` + `disarmed` flag, SDN fabrics with `wireguard`/`bgp` protocol + dryrun apply, SDN route-maps + prefix-lists CRUD passthrough, API-token regenerate endpoint, LXC mountpoint `idmap` + `keepattrs` fields, storage `approximate-size` badge, PBS `/storage/{id}/identity` passthrough, migration `downtime-limit` cap to 2000s, OIDC `audiences` field.
- Dynamic per-arch CPU types pulled live from `/nodes/{n}/capabilities/qemu/cpu` (replaces the hardcoded list; picks up custom CPU models on PVE 9.2+).
- New CRS settings UI panel (HA + percent thresholds + auto-rebalance tuning), also mirrored into the Native HA panel so both views stay in sync.
- VM CPU sub-options: `reported-model` + `level` inputs surfaced in the VM config dialog.
- QEMU 11 (PVE 9.2) HMP rename: V2P now falls back from `block_job_complete`/`block_job_cancel` to `job_*` so cross-hypervisor copies stop wedging on QEMU 11 hosts.
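The two-layer skip and the version-dependent payload can be pictured roughly as follows. This is a minimal sketch, not PegaProx's actual code: the function names, the dict shape assumed for the cluster options, and the default chosen for a missing per-resource flag are all assumptions made for illustration.

```python
import re


def crs_manages_vm(cluster_options, ha_resource):
    """Return True if the load balancer should leave this VM to PVE's CRS.

    Layer 1 is the cluster-wide gate, layer 2 the per-HA-resource flag.
    Assumption: options arrive as nested dicts, and a missing per-resource
    flag is treated as enabled.
    """
    crs = cluster_options.get("crs") or {}
    if not crs.get("ha-auto-rebalance"):
        return False  # gate absent or off: behave as on pre-9.2 clusters
    if ha_resource is None:
        return False  # VM is not an HA resource, so CRS does not move it
    return bool(ha_resource.get("auto-rebalance", True))


def strip_removed_keys(payload, pve_version):
    """Drop the `scheduling` key when talking to PVE 9.2 or newer."""
    match = re.match(r"(\d+)\.(\d+)", pve_version)
    if match is None:
        raise ValueError(f"unrecognised PVE version: {pve_version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    cleaned = dict(payload)  # never mutate the caller's dict
    if major_minor >= (9, 2):
        cleaned.pop("scheduling", None)
    return cleaned
```

Comparing `(major, minor)` tuples rather than raw strings avoids treating "9.10" as older than "9.2".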
⚡ Dead-node performance
- Per-node circuit breaker stops the dead-node UI hang (~30s/poll → ~4s). Breaker opens after consecutive failures, half-open probe re-arms cleanly.
- SSH connect timeout capped at 8s via socket pre-create (was effectively 30s through the paramiko command-timeout path).
- `connect_to_proxmox` now keeps a host-skip cache so it doesn't re-iterate a host that just timed out within the same request.
- Per-node fan-outs parallelised via `gevent.spawn`, so 6-node clusters no longer serialise 4s timeouts.
- Auto-recovery via PVE `/nodes` aggregate (Corosync's view): when PVE reports the node back online, breaker resets without needing a manual nudge.
- #444: Auth circuit-breaker stops stale-cred storms from DoS'ing `pveproxy` after a root-password rotation on the PVE side.
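For readers unfamiliar with the pattern, the per-node breaker described above can be sketched as a small state machine. This is an illustrative sketch only: the class name, the threshold of 3 failures, and the 30s cooldown are assumptions, not values taken from PegaProx.

```python
import time


class NodeBreaker:
    """Closed -> open after N consecutive failures -> half-open after cooldown."""

    def __init__(self, failure_threshold=3, cooldown=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold  # assumed value
        self.cooldown = cooldown                    # assumed value, in seconds
        self.clock = clock                          # injectable for testing
        self.failures = 0
        self.opened_at = None                       # None means closed

    def allow(self):
        """Return True if a request to the node may be attempted now."""
        if self.opened_at is None:
            return True
        # Open: let probes through again once the cooldown has elapsed.
        return self.clock() - self.opened_at >= self.cooldown

    def record_success(self):
        """Close the breaker; also usable as an external reset."""
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.failure_threshold:
            # Open, or re-open after a failed probe, and restart the cooldown.
            self.opened_at = self.clock()
```

Callers check `allow()` before each poll and skip the node immediately while the breaker is open, which is what removes the repeated timeout wait. An external signal, such as the node reappearing as online in the `/nodes` aggregate, can simply call `record_success()`.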
📊 Capacity Outlook rebuild
The corporate-dashboard Capacity Outlook card was presenting a WMA-current smoothed value as a forecast. A steady 60% CPU cluster was constantly flagged as "rising" because the rising threshold (70% of critical) tripped on normal mid-load. The card is now wired to `/api/clusters/<id>/insights/forecast`, which runs linear regression with R²-gating against 30 days of metric history. New glance-card UI: one number (closest-ETA days, or stable) and one subtitle (worst metric + target threshold). Per-metric drilldown moved to the Insights page, where it belongs. The duplicate `/predictive-analysis` route in `reports.py` was removed: `clusters.py` always won the URL match, so the `reports.py` copy was dead code.
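To make the difference from a smoothed current value concrete, here is a minimal sketch of R²-gated linear-regression forecasting. It does not reproduce the forecast endpoint's implementation: the function name, the `min_r2=0.5` cut-off, and the `(day, percent)` sample format are assumptions.

```python
def forecast_days_to_threshold(samples, threshold, min_r2=0.5):
    """Estimate days until a metric reaches `threshold`.

    samples: list of (day, percent) pairs, oldest first.
    Returns days counted from the last sample, or None for "stable".
    """
    n = len(samples)
    if n < 3:
        return None  # not enough history to fit a trend
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    if sxx == 0 or syy == 0:
        return None  # no time spread, or a perfectly flat metric
    slope = sxy / sxx                   # least-squares slope
    intercept = mean_y - slope * mean_x
    r2 = (sxy * sxy) / (sxx * syy)      # coefficient of determination
    if slope <= 0 or r2 < min_r2:
        return None  # falling, or too noisy to call a trend
    eta = (threshold - intercept) / slope - xs[-1]
    return max(eta, 0.0)
```

With this gating, a cluster sitting at a steady 60% yields `None` (stable) regardless of how close 60% is to any threshold, because there is no upward trend to extrapolate.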
🛠️ Customer-reported fixes
- #448 @DarmokNoob: Cross-cluster LXC replication clone-call sent `name=` (LXC schema rejects this; QEMU accepts it). Three call sites (`xcrepl`, intra-cluster `repl`, manual `clone_vm`) now branch on `vm_type` and emit `hostname=` for LXC, `name=` for QEMU.
- #446 @hugobugomugo: Encryption key rotation: column-name mismatches between the rotate transaction and the schema broke the whole pass. Aligned.
- #438 @crcro: ESXi → PVE migration: drive-mirror task logging, `dd` progress PID detection, dd-progress regex hardened. Plus the same diagnostic pattern applied to `_ssh_pipe_transfer` + monitor.
- #436 @aalandez: Snapshot-schedule race: the retention-delete pass kicked off before the create-task finished, hitting locked-snapshot errors on busy clusters. Schedule now waits for create-task completion before retention-delete.
- #422: `xcrepl` snapshot cleanup now uses `force=1` and auto-unlocks the VM on ESTALE (target unreachable mid-cleanup).
- #419 follow-ups: NodeCard netin/netout were treated as counter values (rendered absurd spikes); now interpreted as the rate the API returns. XCP-ng counterpart unified on the same rate contract.
- #417 follow-up (from @tfoks): Keystore resolver now propagates `EACCES` instead of swallowing it; bundle support-log diagnostics also detect permission-block on `journalctl`.
- #412 follow-up: `oidc_allow_private_ip` flag now covers token + userinfo endpoints, not only discovery (private-IP IdPs failed mid-flow before).
- #357 follow-up: `PEGAPROX_LOG_LEVEL` env var now also caps the per-cluster `StreamHandler`, not just the root logger.
- Datacenter status / options endpoints now return `504` instead of `500` when a host times out, and immediately pivot to a warm host instead of looping on the cold one.
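The #448 fix comes down to choosing the right parameter name per guest type. A minimal sketch of that branching, assuming a hypothetical helper name and a clone payload reduced to `newid` plus the display name:

```python
def build_clone_params(vm_type, newid, new_name):
    """Build clone parameters with the name key the guest type accepts.

    The helper name and reduced payload are assumptions for illustration;
    a real clone call carries more fields.
    """
    params = {"newid": newid}
    if vm_type == "lxc":
        params["hostname"] = new_name  # LXC schema rejects `name`
    elif vm_type == "qemu":
        params["name"] = new_name
    else:
        raise ValueError(f"unsupported vm_type: {vm_type!r}")
    return params
```

Routing all three call sites through one helper like this keeps the LXC/QEMU difference in a single place instead of repeating the branch.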
🤝 Contributor PRs
- #445 @gyptazy / credativ: VM API route auto-resolves the node from `/cluster/resources?type=vm`, so callers no longer need to look up the node first. Backward-compatible with explicit-node paths.
- #421 @gyptazy: ISO sync now refuses to run on shared cluster storage (would result in duplicate copies; PVE handles cluster-wide sync natively for shared storage).
- #439: Aikido container-autofix landed (Dockerfile hardening).
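The node auto-resolution in #445 amounts to a lookup over the `/cluster/resources?type=vm` result. A rough sketch, assuming the response has already been fetched and decoded into a list of dicts carrying `vmid`, `node`, and `type`; the function name is hypothetical:

```python
def resolve_node(resources, vmid):
    """Return the node currently hosting `vmid`, or None if it is unknown.

    resources: decoded entries from the cluster resources listing.
    """
    wanted = int(vmid)
    for entry in resources:
        if entry.get("type") not in ("qemu", "lxc"):
            continue  # ignore storage, node, and other resource types
        if int(entry.get("vmid", -1)) == wanted:
            return entry.get("node")
    return None
```

A route handler would use an explicit node from the path when one is given and fall back to this lookup otherwise, which is how explicit-node paths stay backward-compatible.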
🐳 Docker / packaging
- `Dockerfile` was missing `COPY --chown=pegaprox:pegaprox plugins/ plugins/`. The plugin loader (`pegaprox/api/plugins.py`) scanned the `plugins/` directory inside the container, found it empty, and quietly registered zero plugins. All five bundled plugins (`client_portal`, `hello_world`, `notifications`, `proxmox-ha`, `status_page`) were invisible to anyone running PegaProx on Docker. Single-line fix.
- Sponsor asset updated: `images/sponsors/sponsor3.png` swapped from the wide banner to the square Banner Oranje logo and padded square so it renders at the same footprint as the Netwolk logo (200×200) in the README and the in-app sponsor card.
🌐 i18n
29 new translation keys across 7 languages (DE / EN / FR / ES / PT / KO / IT) covering the CRS settings panel, per-VM auto-rebalance, API-token rotate, OIDC audiences, plus a fresh Capacity-Outlook block (capacityCollecting, metricAtRisk, metricsAtRisk, allMetricsStable, overThreshold, etaIn, forecastEngine, stable, metric).
💎 Platinum Sponsors
- netwolk GmbH — Swiss managed-services partner
- Expertize.nl — Dutch Proxmox specialists
Sponsor PegaProx → opencollective.com/pegaprox | pegaprox.com/#sponsor
Upgrade: in-app updater, bash update.sh, or docker compose pull && docker compose up -d.
Docker: ghcr.io/pegaprox/pegaprox:v0.9.11 (linux/amd64 + linux/arm64).