ProxMenux v1.2.2
Stable consolidation of the v1.2.1.x beta cycle. Four prereleases of feature work and fixes land together on the stable channel: a much more configurable Health Monitor (per-category thresholds, per-event dismiss durations, an audit log of active suppressions), a notification stack that reaches ~80 services through Apprise and persists events across Quiet Hours, automatic update detection for LXC containers, and a long list of operator-visible fixes — HTTPS terminal handshakes, kernel update detection on PVE 9.x, NVIDIA installer flow on Alpine, and a quieter Monitor process on idle hosts.
✨ What's new
Health Monitor — more configurable, more granular
Three pieces that together let an operator dial the Health Monitor to their environment instead of working around its defaults.
- Per-category thresholds. Every check the Health Monitor runs is parameterised by a pair of numbers, a Warning threshold and a Critical threshold, and both are now exposed in Settings → Health Monitor Thresholds. A homelab with a single-disk SSD may want to page earlier on capacity (75 / 90 %); a datacentre host with redundant Ceph nodes can be more relaxed on memory warnings (90 % is normal under ZFS ARC); a passively-cooled mini-PC needs lower temperature thresholds than a server with forced-air cooling. The same numbers also feed the colour ranges of the dashboard widgets (the temperature line in the disk-temperature modal, the bars on the storage cards, the chips on the CPU / memory panels), so the visual classification always matches what actually triggered the alert.
- Per-event dismiss duration. The Dismiss button on every Health Monitor alert now opens a small dropdown with three options: 24 hours, 7 days or Permanently. The 24h / 7d paths behave like the previous time-limited dismiss; Permanently persists the alert with `suppression_hours = -1`, never re-emits, and is marked with a distinct amber Permanent badge so the operator can always see which alerts are intentionally silenced. `POST /api/health/acknowledge` accepts an optional `suppression_hours` body field for this; omitting it preserves the previous behaviour (the category's configured default applies).
- Active Suppressions panel. A new section inside Settings → Health Monitor, right below the per-category suppression durations, lists every currently-silenced alert: both time-limited dismisses (with a remaining-time badge like 22h remaining) and permanent ones. Each row carries the `error_key`, category, severity and the timestamp the alert was dismissed, plus a Re-enable button that clears the acknowledgment so the alert can fire again on the next scan. The Re-enable action is gated by the Health Monitor Edit mode at the top of the card and is committed alongside any per-category changes on Save. Permanent dismisses can only be reverted from here; the dashboard intentionally does not expose a per-alert un-dismiss affordance.
- Disk I/O severity tiers. A sliding 24 h window classifies dmesg ATA / SCSI errors into silent (0–10), WARNING (11–100) and CRITICAL (more than 100, or any hard error such as UNC / Buffer I/O / Sense Key Hardware Error), so quiet days stay quiet and a single Buffer I/O event still pages immediately.
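As a rough illustration of the disk I/O tiers described above, the following sketch counts errors inside a 24 h window and maps the count to a tier. The function names and the `hard_error` flag are hypothetical, not ProxMenux's actual code, and the sketch assumes that a count of exactly 100 still falls in the WARNING tier.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=24)


def count_recent(timestamps, now, window=WINDOW):
    """Count error timestamps that fall inside the sliding window ending at `now`."""
    return sum(1 for ts in timestamps if timedelta(0) <= now - ts <= window)


def classify_disk_io(error_count, hard_error=False):
    """Map a 24 h error count to a tier (assumption: exactly 100 is still WARNING)."""
    if hard_error or error_count > 100:
        return "CRITICAL"  # any hard error escalates regardless of the count
    if error_count > 10:
        return "WARNING"
    return "silent"


now = datetime(2025, 1, 2, 12, 0)
seen = [now - timedelta(hours=1), now - timedelta(hours=23), now - timedelta(hours=25)]
tier = classify_disk_io(count_recent(seen, now))
```

With the sample data, only two of the three errors fall inside the window, so the host stays in the silent tier; passing `hard_error=True` would escalate it to CRITICAL regardless of the count.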
Notifications — Apprise, Quiet Hours buffering, AI rework
- Apprise notification channel. One Apprise URL talks to ~80 services (Pushover, ntfy, Slack, Matrix, mailto, signal, Pushbullet, Mattermost, ...) without ProxMenux needing a dedicated adapter for each. The Apprise tab now exposes full feature parity with the native channels: the same Notification Categories block, per-event sub-toggles, Quiet Hours and Daily Digest controls as Telegram, Gotify, Discord and Email. The backend already supported per-channel filtering for Apprise via the generic `channel_overrides` block; the UI just wasn't surfacing it.
- Quiet Hours buffering. Events suppressed during a channel's quiet window are now persisted to SQLite and released as a grouped summary when the window closes, instead of being silently dropped.
- AI Enhancement, redesigned. The AI Enhancement subsection in Notifications was redesigned: the muted uppercase row that testers consistently scrolled past is now a normal-case foreground label with a leading Sparkles icon and a persistent badge (green Active when AI is enabled, neutral Optional when it isn't), so the feature is visible regardless of state.
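The Quiet Hours buffering behaviour above can be pictured with a small SQLite sketch: events are inserted while the window is open and read back as one grouped summary when it closes. The table name, columns and summary format are assumptions made for illustration; the real schema is not described in these notes.

```python
import sqlite3
from collections import Counter


def open_buffer(path=":memory:"):
    """Open (or create) the buffer; the table and column names are hypothetical."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS quiet_buffer ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "channel TEXT NOT NULL, event_type TEXT NOT NULL, message TEXT NOT NULL)"
    )
    return conn


def buffer_event(conn, channel, event_type, message):
    """Persist one suppressed event instead of dropping it."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO quiet_buffer (channel, event_type, message) VALUES (?, ?, ?)",
            (channel, event_type, message),
        )


def release_summary(conn, channel):
    """Return a grouped summary for one channel and clear its buffered rows."""
    with conn:
        rows = conn.execute(
            "SELECT event_type FROM quiet_buffer WHERE channel = ? ORDER BY id",
            (channel,),
        ).fetchall()
        conn.execute("DELETE FROM quiet_buffer WHERE channel = ?", (channel,))
    if not rows:
        return None
    counts = Counter(event_type for (event_type,) in rows)
    lines = [f"{name}: {count}" for name, count in sorted(counts.items())]
    return f"{len(rows)} event(s) during Quiet Hours\n" + "\n".join(lines)
```

Buffering per channel keeps one channel's quiet window from swallowing or releasing another channel's events, which matches the per-channel wording above.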
Container updates and tooling
- LXC update detection. A new dedicated section in Settings (between Health Monitor Thresholds and Notifications) with a single toggle that gates the per-CT `apt list --upgradable` / `apk list -u` scan end-to-end. Default ON. When OFF the scan stops entirely (no `pct exec` calls), every `type=lxc` entry is purged from the managed-installs registry immediately, and the matching notification toggle in Notifications → Services disappears from the UI while preserving its stored preference. The checker also reads the mtime of the CT's package-manager metadata cache and refreshes it via `pct exec` if it's older than 24 h. On lab hardware, a Debian 12 CT with a 524-day-old cache went from "0 updates" to "117 (12 security)".
- Post-install function update detection. The Monitor tracks installed ProxMenux optimizations (Log2Ram, Memory Settings, System Limits, Logrotate, ...) and notifies when a newer version of any of them is available, with one-click apply from Settings.
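The stale-cache rule in the LXC update detection item comes down to comparing a file's mtime against a 24 h budget. The sketch below shows only that comparison, under stated assumptions: the helper names are hypothetical, and how ProxMenux actually reads the mtime inside a container is not covered here.

```python
import os
import time

MAX_CACHE_AGE_SECONDS = 24 * 60 * 60  # the 24 h limit mentioned above


def cache_is_stale(mtime, now=None, max_age=MAX_CACHE_AGE_SECONDS):
    """True when the cache was last modified more than `max_age` seconds ago."""
    now = time.time() if now is None else now
    return (now - mtime) > max_age


def path_is_stale(path, now=None):
    """Check a local path; a missing cache is treated as stale (assumption)."""
    try:
        return cache_is_stale(os.stat(path).st_mtime, now)
    except FileNotFoundError:
        return True


# A cache last refreshed 524 days ago is far past the 24 h budget.
long_ago = cache_is_stale(mtime=0, now=524 * 86400)
```

When the check reports a stale cache, the caller would refresh the package metadata before counting upgradable packages, which is why the old cache in the example above initially reported no updates.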
Hardware support
- NVIDIA driver update notifications. Kernel-aware detection of newer compatible driver versions, surfaced in the Hardware tab and as notifications when an upstream build is published.
- Coral TPU installer. Uninstall path mirroring the NVIDIA flow, and registry-driven update notifications for both the PCIe `gasket-dkms` driver (tracked against feranick/gasket-driver) and the USB `libedgetpu1` runtime (tracked via `apt`).
- Secure Gateway (Tailscale) updates. One-click Tailscale update from Settings with Last-checked / Installed / Latest indicators and notification when a new version is available.
Other improvements
- Helper-Scripts menu — richer context. Each entry now ships more useful information so it's easier to know what every script does before running it.
- Disk temperature monitoring — improved readings, smarter caching across SMART probes, redesigned history modal opening at 24 h by default with min / avg / max statistics.
- VM and LXC modal — expanded so a single panel covers the data you previously had to look up across multiple tabs.
- Page load — faster first paint and lighter network usage on the Overview, Storage and Hardware tabs.
- Security tightening — tighter authentication checks across notification, scripts and terminal endpoints, plus a more conservative default policy for new installs.
🛠️ Notable fixes
- Terminal modals on HTTPS hosts: every terminal modal (dashboard terminal, LXC terminal, script terminal) used to fail with WebSocket connection error on hosts with HTTPS enabled. Root cause: the `gevent + SSL` path stacked geventwebsocket's `WebSocketHandler` on top of flask-sock's protocol implementation, so the server emitted two consecutive `HTTP/1.1 101 Switching Protocols` headers and the browser closed the connection as a corrupt frame. Dropping `handler_class=WebSocketHandler` restores a single 101 response and the handshake completes normally.
- Health Monitor kernel updates on PVE 9.x (#208): the System Updates → Kernel / PVE row reported "Kernel/PVE up to date" on PVE 9.x hosts even when an update for the running kernel was waiting upstream. Three combined fixes: (a) the kernel-package prefix list now includes `proxmox-kernel-*` and `proxmox-firmware-*` (PVE 9.x ships kernels under `proxmox-kernel-`, not `pve-kernel-` as in 7.x / 8.x), (b) the dry-run switched from `apt-get upgrade --dry-run` to `apt-get dist-upgrade --dry-run` so kernel updates packaged as new installs are visible at all, (c) the categoriser reads `uname -r` and flags an update as a running-kernel update when the package matches the running release. The row now distinguishes "Running kernel update available (reboot required)" from "N kernel update(s) available (none for running kernel)".
- NVIDIA installer kernel compatibility, Alpine LXC and NVENC: the version menu now respects the running kernel compatibility window, only offering driver branches that won't fail to compile. Container-side userspace install reworked so it succeeds on Alpine hosts, and free-space detection works reliably across all storage layouts. When the host has the NVENC patch applied, the version menu narrows to drivers supported by the patch so reinstalling never silently loses it.
- Apprise integration hardening, three independent fixes:
  - Mobile overflow on narrow viewports in the Apprise URL row (placeholder reduced to a single concise example, input wrapper enforces `min-w-0 / flex-1 / shrink-0`, examples paragraph uses `break-all min-w-0`).
  - Backend whitelist rejecting Apprise with HTTP 400. The notifications-test validator's hard-coded channel set (`{telegram, gotify, discord, email, all}`) was missing `apprise`, so every Apprise test or send returned 400 Invalid channel before the library was even invoked. The whitelist is now derived live from `notification_channels.CHANNEL_TYPES` so adding a new channel cannot silently regress this validator again.
  - Apprise error reporting. When a destination (`jsons://`, `ntfy://`, `slack://`, ...) returns a non-2xx response, the channel now captures Apprise's internal logger during `notify()` and surfaces the real HTTP status plus the destination's response body (capped at 300 chars) instead of the opaque "Apprise rejected the notification (transport failure)" message.
- fail2ban-client subprocess storm: the cache wrapper around `_f2b_get_banned_ips()` only updated its timestamp on success, so on hosts where `fail2ban-client` returned `ENOENT` (binary not installed) the function fell through the cache check on every single HTTP request and fired 250+ failed `execve` calls in a 10-minute window. `shutil.which('fail2ban-client')` is now resolved once at module load and the cache timestamp is updated unconditionally.
- smartctl scheduler collision: disk SMART temperature polling, CPU temperature read and latency probe used to fire at the same offset within each minute, producing a measurable CPU / IO spike when all subprocesses spawned together. The polls are now staggered (latency, then CPU temperature, then disk SMART) while preserving the per-disk 60 s cadence.
- LXC inventory subprocess: the mount monitor used to call `lxc-info -n <vmid> -p` for every running CT just to get its PID. It now reads `/proc/<lxc-start-pid>/task/<lxc-start-pid>/children` directly and falls back to `lxc-info` only when `/proc` reads fail, eliminating one subprocess per CT per scan cycle.
- Browser-translated terminal pages: the terminal panel used to lose its WebSocket connection when the user enabled the browser's auto-translate feature, because the translator moved DOM nodes that React still held refs to. Added `translate="no"` on the terminal container divs so the translator skips the embedded tty entirely.
- Active Suppressions UX: re-enables are now queued (green border + strike-through on the row + button label changes to Undo) and applied atomically when the user clicks Save, alongside any per-category dropdown changes. The list also refreshes automatically when an alert is dismissed from the dashboard while the Settings page is already open, via a `health-suppression-changed` browser event plus listeners on window `focus` and document `visibilitychange`.
- Minor stability: `ATA` disk errors are now recorded in `disk_observations` before the SMART gate (transient errors that don't yet trip SMART still build the per-disk history); the Quiet Hours toggle persists correctly after a refresh; the Login screen no longer swallows a 401 forever after a brief stale-token state; PVE webhook URLs follow the active SSL state automatically; `log2ram` restarts after a configured size change.
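The fail2ban fix listed above follows a general pattern: resolve the binary once, and stamp the cache whether or not the lookup succeeded, so a failing lookup is not retried on every request. The sketch below illustrates that pattern only; the class, the TTL value and the placeholder fetcher are hypothetical and do not reproduce `_f2b_get_banned_ips()`.

```python
import shutil
import time

# Resolved once at import time; None when the binary is not installed.
FAIL2BAN_CLIENT = shutil.which("fail2ban-client")


class BannedIpCache:
    """TTL cache that records the check time even when the fetch fails."""

    def __init__(self, fetch, ttl=30.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl  # hypothetical TTL, not the project's value
        self._clock = clock
        self._checked_at = None
        self._value = []

    def get(self):
        now = self._clock()
        if self._checked_at is not None and now - self._checked_at < self._ttl:
            return self._value  # served from cache, no subprocess
        try:
            self._value = self._fetch()
        except Exception:
            self._value = []  # a failed lookup is cached as "no bans"
        finally:
            self._checked_at = now  # updated unconditionally
        return self._value


def fetch_banned_ips():
    """Placeholder fetcher: skips the subprocess when the binary is absent."""
    if FAIL2BAN_CLIENT is None:
        return []
    raise NotImplementedError("real fail2ban-client query omitted in this sketch")
```

Setting the timestamp in `finally` is the part that matters: with the earlier success-only update, a fetcher that always raised would run on every call instead of once per TTL.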
⬆️ Upgrading from v1.2.1
ProxMenux notifies stable users automatically on the next menu launch. The Monitor service restarts in-place — no host reboot is needed for the upgrade itself. If you were running a 1.2.1.x beta, the same menu flow detects that you are now on the published stable channel and offers to switch you off the beta installer.
If you customised any Health Monitor settings before upgrading, they are preserved verbatim — the new Health Monitor Thresholds panel adds new defaults but does not overwrite existing values. The per-category suppression durations you had configured continue to apply as the default when a per-event Dismiss is fired without an explicit window choice.
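The fallback described above, where the category default applies when no explicit window is chosen, can be sketched as follows. Only the `suppression_hours` field and the `-1` sentinel come from these notes; the function names and the shape of the request body are assumptions.

```python
from datetime import datetime, timedelta

PERMANENT = -1  # sentinel used for permanent dismisses


def effective_suppression_hours(body, category_default):
    """Use the per-event value when present, else the category's default."""
    hours = body.get("suppression_hours")
    return category_default if hours is None else hours


def is_suppressed(dismissed_at, hours, now):
    """A permanent dismiss never expires; a timed one expires after `hours`."""
    if hours == PERMANENT:
        return True
    return now < dismissed_at + timedelta(hours=hours)


dismissed = datetime(2025, 1, 1, 8, 0)
default_hours = effective_suppression_hours({}, category_default=24)
still_quiet = is_suppressed(dismissed, default_hours, datetime(2025, 1, 1, 20, 0))
```

A body without the field falls back to the 24 h category default in this example, so the alert is still suppressed twelve hours after the dismiss and would be eligible to fire again once the window has passed.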
🙏 Acknowledgments
This release would not look the way it does without the contributions and feedback from the community. Special thanks to:
Code contributors
@jcastro landed five direct improvements that ship with v1.2.2:
- Select VM ISOs from all ISO storages: new shared helper `scripts/global/iso_storage_helpers.sh` plus integration in `vm_creator.sh`, `select_linux_iso.sh` and `select_windows_iso.sh`, so the ISO picker now reads from every storage tagged as ISO content instead of being pinned to `local`. Commit `092b548d`.
- Release channel switcher in Settings: a proper menu under `scripts/menus/config_menu.sh` to flip between the stable and beta install channels in-place, with the right `version.txt` / `beta_version.txt` handling on each side. Commit `f8a8c43d`.
- ZFS autotrim in the auto post-install: `auto_post_install.sh` now enables `autotrim=on` on root ZFS pools by default (with the matching disable in the uninstall path), so SSD-backed installs reclaim freed space without manual intervention. Commit `8877f987`.
- Webhook loopback detection + update handoff: `flask_notification_routes.py` correctly classifies `127.0.0.1` / `localhost` webhooks as loopback, and the `menu` script's update handoff no longer flakes on edge cases. Commit `70ab072c`.
- Figurine bumped to 2.0.0: banner tool refresh in `customizable_post_install.sh`, with the doc page updated to match. Commit `aba94028`.
@pespinel fixed a beta-installer regression that broke service paths after the move to the new runtime layout — install_proxmenux_beta.sh now resolves the right systemd unit paths on first install and on update. Commit 0daab74a.
Field reports that shaped the GPU & Coral work
@ghosthvj's detailed reports and suggestions on the hardware passthrough flow drove the round of improvements that ship in v1.2.2 for the three GPU scripts:
- `scripts/gpu_tpu/nvidia_installer.sh`: kernel-aware version menu, Alpine LXC userspace support, NVENC-patch awareness, uninstall feedback, free-space detection fixes
- `scripts/gpu_tpu/switch_gpu_mode.sh`: orphan audio cascade on detach, precise hostpci regex, vfio.conf cascade extension (the full GPU + audio companion lifecycle hardening described in the GPU + Audio Passthrough section above)
- `scripts/gpu_tpu/add_gpu_vm.sh`: iGPU audio-companion checklist on attach, two-pass scan that protects the HDMI audio of other dGPUs left in the VM
Coral TPU on LXC — latest upstream drivers
The Coral installer for LXC (scripts/gpu_tpu/install_coral_lxc.sh) was rewritten end-to-end to install the latest upstream gasket-dkms driver and libedgetpu1 runtime (220 lines added, 150 removed). Coral M.2 / mPCIe modules that previously failed on PVE 9 kernels now install and bind cleanly, and the registry-driven update notifications introduced in v1.2.1.2 keep both packages fresh going forward.
Everyone else
A huge thank you to every user who opened an issue, commented in GitHub Discussions, reported a bug on the community channel, or just stopped by to say what worked and what didn't on their hardware. Most of the internal improvements in this release — the smartctl scheduler stagger, the fail2ban cache fix, the lxc-info /proc replacement, the HTTPS terminal handshake, the kernel-update detection on PVE 9.x, the Apprise wiring — started as a report from somebody running into the issue. Keep them coming.