Highlights
80 commits since v0.9.11. The big-ticket items are a critical gevent-pool truthy-bug that made every fanout in PegaProx silent-sequential since the helper was written, PEGAPROX_WORKERS actually getting plumbed through to the WSGI spawn-pool (the env var only renamed a log line before this release), nine new monitoring surfaces on the Node/VM/Insights tabs, and an SSE reconnect path that gets the UI back to green within ~30s instead of sitting on a 15s cooldown.
⚡ Performance + stability
The run_concurrent helper in pegaprox/utils/concurrent.py (and a duplicate copy in core/manager.py) checked if GEVENT_POOL and GEVENT_AVAILABLE: before dispatching. The truthiness of a gevent.pool.Pool follows len(self), so on an empty pool the check was False, the helper fell through to the sequential path, and every cluster fan-out, every PBS scan, every multi-cluster aggregator ran one call at a time. The F1 numbers below show the order of magnitude that this single missing is not None check cost us in production:
- /health and /vms-backup-status parallelised: vms-backup-status 10.4s → 0.45s wall (22× on dev).
- /health storage fanout now filters ONLINE nodes first so a dead node can't drag everything into its 10s timeout.
- /cluster-health: seven SSH probes (corosync rings + pvecm quorum + 5 systemctl checks) run as parallel greenlets capped at 12s; previously they ran sequentially and could take up to 56s.
- PEGAPROX_WORKERS formula changed min(cpu*2, 8) → max(8, cpu*4) and actually wired through to spawn=Pool(workers) on WSGIServer. Before this release the env var only changed the startup log; gevent.pywsgi defaulted to unlimited greenlets.
- SSE reconnect: error-cooldown 15s → 10s, immediate metrics push after a successful reconnect. Cluster drops + comes back → corporate-dashboard tile flips green inside 30s instead of waiting out the cooldown.
- broadcast_sse now does json.dumps(..., default=str) so a stray datetime / set / bytes from a downstream serialiser doesn't silently kill the whole broadcast loop.
- Reconnect-log spam backoff: INFO on attempt #1 + every 50th (~8min), DEBUG between. Customers running with debug-logging weren't reading the actual reconnect messages anymore.
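The truthiness pitfall behind the run_concurrent fix can be reproduced without gevent installed. The sketch below is an illustration only: FakePool is a hypothetical stand-in whose truthiness follows __len__, and pick_path_buggy / pick_path_fixed are made-up names, not PegaProx functions.

```python
class FakePool:
    """Hypothetical stand-in for a greenlet pool: truthiness follows __len__."""

    def __init__(self):
        self.greenlets = set()

    def __len__(self):
        return len(self.greenlets)


def pick_path_buggy(pool, available=True):
    # An idle pool has len() == 0, so it is falsy and this check fails.
    return "parallel" if (pool and available) else "sequential"


def pick_path_fixed(pool, available=True):
    # Compare against None so an idle (empty) pool still counts as present.
    return "parallel" if (pool is not None and available) else "sequential"


idle_pool = FakePool()
print(pick_path_buggy(idle_pool))  # sequential
print(pick_path_fixed(idle_pool))  # parallel
print(pick_path_fixed(None))       # sequential
```

The same trap applies to any container-like object used as an "is it configured?" flag: an explicit is not None comparison avoids depending on how many items the object currently holds.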
📈 Monitoring expansion
Nine surfaces added across Node detail / VM detail / Insights tabs:
- Disk SMART data viewer modal: replaces the old alert(JSON.stringify(...)) button. Per-attribute table, OK/threshold colour.
- Per-NIC traffic + error/drop counters parsed out of /proc/net/dev via SSH.
- Top-N noisy neighbors card on Insights: CPU / RAM / Disk-fill / Disk-IO / Net-IO with metric switcher.
- Per-Tag and Per-Pool rollups on Insights (groups VMs by tag/pool, sums CPU/RAM/disk).
- Cluster Health panel on Node detail: corosync rings, pvecm quorum view, 5 service cards (pveproxy, pvedaemon, pve-cluster, corosync, pvestatd). Renders an explainer instead of five empty ? cards when SSH isn't configured.
- lm-sensors panel (CPU temp / fan RPM / voltages) where the host has lm-sensors installed.
- Guest-Agent enrichment expander on VM detail: kernel_version, per-NIC IPs, real filesystem fill, logged-in users, clock-skew vs host.
- Insights PDF export now carries Top Talkers + Rollups blocks.
- 200+ new i18n strings across DE / EN / FR / ES / PT / KO / IT.
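For readers curious how per-NIC counters can be pulled out of /proc/net/dev, here is a minimal sketch. It assumes the standard Linux layout (two header lines, then one line per interface with 16 numeric columns); parse_proc_net_dev and SAMPLE are hypothetical names for illustration and do not reflect the actual PegaProx parser.

```python
SAMPLE = """\
Inter-|   Receive                                                |  Transmit
 face |bytes    packets errs drop fifo frame compressed multicast|bytes    packets errs drop fifo colls carrier compressed
    lo:    1000      10    0    0    0     0          0         0     1000      10    0    0    0     0       0          0
  eth0: 5000000    4000    2    7    0     0          0         0  3000000    3500    1    0    0     0       0          0
"""


def parse_proc_net_dev(text):
    """Return {iface: counters} from the text of /proc/net/dev."""
    stats = {}
    for line in text.splitlines()[2:]:  # skip the two header lines
        if ":" not in line:
            continue
        name, _, rest = line.partition(":")
        parts = rest.split()
        if len(parts) < 16 or not all(p.isdigit() for p in parts[:16]):
            continue  # unexpected layout: skip rather than guess
        fields = [int(p) for p in parts[:16]]
        stats[name.strip()] = {
            "rx_bytes": fields[0], "rx_errs": fields[2], "rx_drop": fields[3],
            "tx_bytes": fields[8], "tx_errs": fields[10], "tx_drop": fields[11],
        }
    return stats


for iface, counters in parse_proc_net_dev(SAMPLE).items():
    print(iface, counters)
```

The counters are cumulative since boot, so a traffic rate would need two samples and a division by the interval between them.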
🛠️ New features
- ACME DNS-01 challenge support with RFC 2136 dynamic update — BIND / PowerDNS / Knot. @gyptazy PR #429, sponsored by credativ.
- PVE node subscription management — new Subscriptions tab in Datacenter with cluster-wide aggregator + per-node set / refresh / delete. @gyptazy PR #508.
- Cluster Worldmap — offline geographic view with zoom + country capitals overlay (no Mapbox dependency).
- First-run setup wizard: replaces the hardcoded pegaprox/admin default. Fresh installs go through the NOT_INITIALIZED flow and pick their own admin credentials.
🔒 Security
- 25 client error-response sites refactored through the safe_error() helper: no more raw str(e) leaking stack-fragments, internal paths, paramiko or sqlite internals to the browser.
- OIDC redirect_after open-redirect closure: //attacker.com was passing the old startsWith('/') gate. New _safe_internal_path rejects protocol-relative, CRLF, backslash, control chars. Mirrored client-side.
- PBS notification create / update mass-assignment: _sanitize_pbs_kwargs whitelist; key matches ^[a-z][a-z0-9-]{0,30}$, scalar values only.
- migrate_db._row_counts_plain SQL hotfix: v0.9.10.1 had only patched the encrypted variant with _safe_quote_ident; the plain-DB variant still concatenated table names. Aligned.
- Three defense-in-depth audit passes: RFC-1035 node-name regex on the new monitoring endpoints, payload-limit clamp on /rollups, shlex.quote on the SSH systemctl service name, PVE-boundary regex on returned node / storage names.
- All third-party Actions pinned to commit SHA + tag comment (CodeAnt #511, #512).
- Template-injection fix on the release-tag interpolation in release-images.yml (env: indirection).
- Debian cloud-image SHA512-verified before virt-customize.
- Alert email payloads HTML-escape user-controlled fields; audit log strips CR / LF / U+2028 / U+2029 from log lines.
- ACME directory URL routed through sanitize_outbound_url.
- Plain-JSON config fallback removed from 5 load paths: the encrypted DB is the single source of truth, the .json siblings were only useful during migration anyway.
🐛 Customer-reported fixes
- #413: Site Recovery. Six layers fixed across the month: broadcast_sse crash on SDN list-shape, missing get_vms shim on the Proxmox manager, target_vmid awareness for xcrepl-backed plans, qmigrate-abort detection, planned/emergency re-runnable from completed/failed status, stale-token retry in create_api_token (Proxmox occasionally returns "Token already exists" on a token that was just deleted; retry with delete-first now).
- #438 @crcro: ESXi migration. The sshfs+drive_mirror live-pivot was not updating the persisted disk reference, so the pivot succeeded on disk but PegaProx still reported the old path.
- #444: PVE auth circuit breaker. Rotating the root password on the PVE side no longer slow-DoSes pveproxy with retry storms.
- #451: Login text invisible in Modern UI corporateLight theme (foreground vs background colour from the wrong CSS var).
- #455 / #456: LXC replication race + hostname / MAC resolution.
- #484: HA maintenance flag preservation across reboot + SSE heartbeat.
- #508: Subscription-key masking on the cluster-wide aggregator (key showed up unmasked in the cluster view but was masked correctly per-node).
- #509: SQLCipher mlock ENOMEM spam in rootless containers. New PEGAPROX_CIPHER_MEMORY_SECURITY env with an auto default that detects rootless via euid + RLIMIT_MEMLOCK and skips the mlock attempt.
- #510: Per-node status log demoted info → debug (was filling up logs/<id>.log with status pings at INFO every poll).
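The auto mode from #509 can be pictured roughly as follows. This is a sketch under stated assumptions, not the shipped logic: should_skip_mlock, resolve_memory_security, and the 64 KiB needed_bytes threshold are all hypothetical, and the real detection may weigh euid and RLIMIT_MEMLOCK differently. It relies on the Unix-only resource module.

```python
import os
import resource


def should_skip_mlock(euid, memlock_soft_limit, needed_bytes=64 * 1024):
    """Heuristic: skip mlock when unprivileged and the lock limit is too small."""
    if euid == 0:
        return False  # assumption: euid 0 is treated as able to lock memory
    if memlock_soft_limit == resource.RLIM_INFINITY:
        return False  # no limit configured, mlock should not hit ENOMEM
    return memlock_soft_limit < needed_bytes  # threshold is an assumed value


def resolve_memory_security(mode="auto"):
    """Map auto / on / off to a boolean 'attempt mlock' decision."""
    if mode == "on":
        return True
    if mode == "off":
        return False
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return not should_skip_mlock(os.geteuid(), soft)


print("attempt mlock:", resolve_memory_security("auto"))
```

Keeping the decision in a pure function of (euid, limit) makes the heuristic easy to test without needing a rootless container.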
⚙️ Defaults changed
- PEGAPROX_WORKERS: formula now max(8, cpu_count * 4). Override via env still wins.
- PEGAPROX_CIPHER_MEMORY_SECURITY: new env (auto / on / off, default auto). At-rest AES-256 is unchanged; this only controls the mlock attempt on the in-memory key buffer.
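As a quick way to see what the new PEGAPROX_WORKERS default works out to on a given host, the sketch below reproduces the max(8, cpu_count * 4) formula with the env override applied first. resolve_worker_count is a hypothetical helper, and how invalid override values are handled here (falling back to the formula) is an assumption rather than documented behaviour.

```python
import os


def resolve_worker_count(environ=None, cpu_count=None):
    """Return the spawn-pool size: env override first, else max(8, cpu * 4)."""
    environ = os.environ if environ is None else environ
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    raw = environ.get("PEGAPROX_WORKERS", "").strip()
    if raw.isdigit() and int(raw) > 0:
        return int(raw)  # explicit override wins
    return max(8, cpus * 4)  # assumed fallback for missing or invalid values


print(resolve_worker_count({}, cpu_count=2))                          # 8
print(resolve_worker_count({}, cpu_count=4))                          # 16
print(resolve_worker_count({"PEGAPROX_WORKERS": "32"}, cpu_count=4))  # 32
```

Compared with the old min(cpu*2, 8) formula, 8 is now the floor rather than the ceiling, so a 4-core host goes from 8 to 16 workers.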
💎 Platinum Sponsors
- netwolk GmbH — Swiss managed-services partner
- Expertize.nl — Dutch Proxmox specialists
Massive thanks 🙌. Sponsor PegaProx → opencollective.com/pegaprox | pegaprox.com/#sponsor
Upgrade: in-app updater, bash update.sh, or docker compose pull && docker compose up -d.
Docker: ghcr.io/pegaprox/pegaprox:v0.9.12 (linux/amd64 + linux/arm64).