github MacRimi/ProxMenux v1.2.2.1-beta

pre-release, 8 hours ago

ProxMenux v1.2.2.1-beta

Version label bumped to v1.2.2.1-beta across the UI

AppImage/package.json, the release-notes modal constant, and the three hardcoded version strings in the UI (login footer, dashboard footer, and storage-report footer) all now read v1.2.2.1-beta. The release-notes modal's CURRENT_VERSION_FEATURES array was also rewritten to advertise this beta's three highlights (header Critical fix, auto-reconcile, and Send Test consistency) instead of carrying over the v1.2.2 stable list.

Fix — Coral PCIe (gasket-dkms) driver: false-positive "update available" notification after re-install

Issue: The Coral TPU driver update — v1.0-18.x notification kept firing every cycle, even immediately after running the installer.

Root cause: install_coral.sh uses dkms install gasket/1.0 from a Git clone of feranick/gasket-driver, so the version registered with DKMS is always the bare upstream 1.0 (no patch level). Because the script doesn't ship a .deb, dpkg-query gasket-dkms returns nothing, so managed_installs._detect_coral_host fell back to dkms status, read 1.0, and the update comparator's "no -N segment ranks as 0" rule made any feranick tag (1.0-18.x) look newer.

Fix: install_coral.sh now resolves the latest feranick release tag via the GitHub API right after the DKMS install succeeds and writes it to /var/lib/proxmenux/coral_gasket_version. The full uninstall path removes the marker so a clean wipe leaves no stale signal. _detect_coral_host now reads that marker first as the authoritative installed version; the existing dpkg / dkms fallbacks remain in place for installations that pre-date this change.

Existing installations need to re-run the installer once for the marker to land — or, as a one-off recovery, write the current feranick tag manually to that path. The detector treats both as equivalent.
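
The false positive and the marker-first lookup can be illustrated with a minimal sketch. The names parse_version, is_newer, and detect_installed are hypothetical (they are not ProxMenux functions), the tag 1.0-18.3 is an invented example of the 1.0-18.x shape, and the comparator is only assumed to behave like the "no -N segment ranks as 0" rule described above.

```python
import re
from pathlib import Path
from typing import Callable, Optional, Sequence


def parse_version(version: str) -> tuple:
    """Split '1.0-18.3' into ((1, 0), (18, 3)); a missing '-N' part ranks as (0,)."""
    base, _, patch = version.lstrip("v").partition("-")

    def nums(text: str) -> tuple:
        return tuple(int(n) for n in re.findall(r"\d+", text))

    return nums(base), nums(patch) or (0,)


def is_newer(candidate: str, installed: str) -> bool:
    """True when the candidate tag outranks the installed version."""
    return parse_version(candidate) > parse_version(installed)


def detect_installed(
    marker: Path, fallbacks: Sequence[Callable[[], Optional[str]]]
) -> Optional[str]:
    """Prefer the marker file; fall back to other probes (dpkg, dkms) in order."""
    try:
        text = marker.read_text().strip()
        if text:
            return text
    except OSError:
        pass  # marker missing or unreadable: installation pre-dates the marker
    for probe in fallbacks:
        found = probe()
        if found:
            return found
    return None
```

With the bare DKMS version, is_newer("1.0-18.3", "1.0") is true on every cycle, which is the repeated notification. Once the marker holds the tag itself, the same comparison is false.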

Fix #226 — Notification "Send Test" buttons: consistent position and channel color

Every notification channel's Send Test button now sits on the left side of the form and is tinted with the channel's brand color, matching the active-tab text. Telegram is blue, Gotify green, Discord indigo, Email amber, and Apprise cyan — all with white text. Previously, the Apprise button was right-aligned and the only one in color, while the other four were left-aligned and neutral. The fix unifies all five buttons to the same wrapper layout and applies channel-specific brand colors across the row.

Fix #228 — Header "Critical" badge now respects user dismissals

Issue: After a user permanently dismissed every CRITICAL alert in a category, the popup at the bottom of the page correctly displayed "0 Critical" while the badge in the top right kept showing Critical.

Root cause: Every _check_* in the health monitor was already pushing its raw status into the critical_issues, warning_issues, and info_issues lists before the existing dismiss-aware merge ran, so the global overall status was computed off a pre-filter view. The popup compensated for this on the client side, which caused the two surfaces to disagree.

Fix: The rollup now runs a final pass — _apply_dismiss_aware_status — over details[<category>] after the checks complete, demotes a block to OK when every underlying error_key is acknowledged (annotating dismissed/permanent markers per entry), and rebuilds the issue lists from those post-filter statuses. The badge, the popup, future notifications, and any external API consumer of /api/system-info now see the same dismiss-aware view. The change is symmetric: a CRITICAL alert that has not been dismissed still pushes the badge to Critical exactly as before.
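
A minimal sketch of such a final pass, under assumptions: the shape of details (a status plus a list of error_keys per category) and the function name apply_dismiss_aware_status are illustrative only, and the real _apply_dismiss_aware_status may store more per entry.

```python
def apply_dismiss_aware_status(details: dict, dismissed: set):
    """Demote fully-acknowledged categories to OK, then rebuild the issue lists.

    details:   {category: {"status": str, "error_keys": [str, ...]}}  (assumed shape)
    dismissed: set of error_keys the user has acknowledged
    """
    issues = {"CRITICAL": [], "WARNING": [], "INFO": []}
    for category, block in details.items():
        keys = block.get("error_keys", [])
        # Annotate which entries were dismissed so the UI can show them as such.
        block["dismissed"] = [k for k in keys if k in dismissed]
        # Demote only when every underlying key is acknowledged.
        if keys and all(k in dismissed for k in keys):
            block["status"] = "OK"
        if block["status"] in issues:
            issues[block["status"]].append(category)
    # Overall status is the highest severity that still has an entry.
    overall = next(
        (level for level in ("CRITICAL", "WARNING", "INFO") if issues[level]), "OK"
    )
    return overall, issues
```

Because the issue lists are rebuilt after the demotion, every consumer of the rollup reads the same post-filter result. A category with even one undismissed key keeps its original status.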

Health Monitor auto-reconcile — PVE storage removed, orphan remote mounts, deleted CT mounts

_cleanup_stale_resources (the periodic resolver that auto-clears errors for resources that no longer exist) gained three new cases that previously left alerts pinned forever:

  • PVE storage removed from pvesm. Errors keyed storage_unavailable_<id> and pve_storage_full_<id> now resolve automatically when the matching storage no longer appears in pvesm status. The check is gated on pvesm actually returning a non-empty list, so a momentary timeout doesn't wrongly clear genuine errors.
  • Orphan remote mounts. mount_<status>_<target> errors resolve when the target is no longer present in /proc/mounts as an NFS/CIFS/SMB entry. LXC-scoped entries (ct123:/mnt/foo) are deliberately left to the existing VM/CT cleanup path so the host-side reconciler doesn't poke around inside a stopped container.
  • LXC mount errors when the CT was deleted. lxc_mount_<vmid>_<mount> errors resolve when the matching VMID is no longer reported by qm/pct config. The existing VM/CT block was matching on vm_/ct_/vmct_ prefixes only, so the LXC-mount-capacity check fell through that filter and its alerts persisted across CT deletion.
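
The storage and LXC-mount cases above can be sketched as follows. This is an illustration, not the code of _cleanup_stale_resources: the function name stale_keys is hypothetical, and the sketch assumes the current storage IDs and guest VMIDs have already been collected by the caller. The remote-mount case is left out because the exact layout of mount_<status>_<target> keys is not specified here.

```python
import re


def stale_keys(error_keys, pvesm_ids, guest_vmids):
    """Return the error keys whose underlying resource no longer exists.

    pvesm_ids:   set of storage IDs currently reported by pvesm status
    guest_vmids: set of VMIDs (int) that still exist on the host
    """
    stale = []
    for key in error_keys:
        match = re.match(r"(?:storage_unavailable|pve_storage_full)_(.+)$", key)
        if match:
            # Gate on a non-empty list: an empty result may just be a timeout.
            if pvesm_ids and match.group(1) not in pvesm_ids:
                stale.append(key)
            continue
        match = re.match(r"lxc_mount_(\d+)_", key)
        if match and int(match.group(1)) not in guest_vmids:
            stale.append(key)
    return stale
```

Passing an empty pvesm_ids set leaves every storage error in place, which mirrors the gating described in the first bullet.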

Dashboard restyle — Overview / Storage / Network / VMs & LXCs header cards

The four header cards at the top of each main tab were rebuilt to reduce the visual fatigue of the original Card + Progress block pattern and to align them with the rest of the Monitor's design language. The data they expose is unchanged; only the presentation is new.

The usage cards (CPU Usage, Memory, Local Used, Remote Used, Total CPU Allocated, and Total Memory) pair a 72 px SVG donut on the left with two mini-rows on the right — each row a label/value pair stacked over a 1.5 px progress bar. The trailing summary row (Cores, Total, etc.) sits flush below the last bar without a separator. The count cards (Active VM & LXC on Overview, Total VMs & LXCs on the VM tab) use a single 4xl headline with a running badge on the right, then secondary pills (9 VMs, 4 LXC, 10 stopped) underneath. Network Traffic shows ↓ Down / ↑ Up headlines with a two-segment proportional bar and a Down % / Up % legend. Network Status moved from a stacked-label grid to inline Label: value rows so Hostname, DNS, Errors, and Domain each fit on one line. Total Storage computes total capacity from totalLocalCapacity + totalRemoteCapacity (formatted through the existing formatStorage) instead of the previous storageData.total which read independently of the bar's breakdown; the Local / Remote / Free legend stacks vertically so values never wrap.

Total CPU Allocated drops the host-thread row — that information is now in the Overview's CPU Usage card as Cores 8/16 (physical / logical, surfaced through cpu_count(logical=False) and cpu_count(logical=True)) — and exposes the two guest-side numbers operators actually want: Configured (sum of maxcpu across every VM/LXC) and In use (same sum restricted to status === "running"). maxcpu was added to the /api/vms payload from pvesh /cluster/resources.
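
The two guest-side numbers reduce to a pair of sums. The sketch below is in Python for brevity (the card itself lives in the web UI), and it assumes each guest entry carries maxcpu and status fields as described above; cpu_allocation is a hypothetical name.

```python
def cpu_allocation(guests):
    """Sum maxcpu across all guests, and across running guests only."""
    configured = sum(g.get("maxcpu", 0) for g in guests)
    in_use = sum(g.get("maxcpu", 0) for g in guests if g.get("status") == "running")
    return {"configured": configured, "in_use": in_use}


# Example payload shaped like a trimmed /api/vms response (values are made up).
guests = [
    {"vmid": 100, "status": "running", "maxcpu": 4},
    {"vmid": 101, "status": "stopped", "maxcpu": 8},
    {"vmid": 200, "status": "running", "maxcpu": 2},
]
allocation = cpu_allocation(guests)
```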

All four grids collapse to a single column under 640 px (grid-cols-1 sm:grid-cols-2 lg:grid-cols-4); the donuts uniformly use 72 × 72 SVG at every viewport. Badges in the four restyled cards no longer carry uppercase tracking-wider and use the default Badge sizing so they match the rest of the dashboard.

Backend — CPU user / system breakdown samples once per 5 s with the same baseline as cpu_percent

/api/system now exposes cpu_user and cpu_system alongside cpu_usage so the CPU Usage card can drive the User and System mini-bars under the donut. The first iteration called psutil.cpu_times_percent(interval=0) from the request handler. On a fresh process under gevent that returns 0 % for both categories regardless of actual load, because no earlier call had primed the baseline (the same root cause that made the API stop calling psutil.cpu_percent directly and read from the sampler cache instead). The fix moves the breakdown sampling into health_monitor._sample_cpu_usage, the same 5 s tick that already primes cpu_percent, so the three values share one baseline window and read consistently every cycle. The handler now pulls value / user / system from the latest state_history['cpu_usage'] entry. The memory_cached_gb field added in the same pass (RAM cached + buffers) feeds the Memory card's Cached row.
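
Why a shared baseline matters can be shown without psutil. The sketch below is a simplified stand-in: CpuTimes keeps only three cumulative counters, CpuSampler is a hypothetical class, and read_times is an injected callable that in practice would wrap something like psutil.cpu_times(). All three percentages come from the same pair of snapshots, so they always describe the same window.

```python
from dataclasses import dataclass


@dataclass
class CpuTimes:
    """Cumulative CPU seconds, reduced to three counters for illustration."""
    user: float
    system: float
    idle: float


class CpuSampler:
    def __init__(self, read_times):
        self._read = read_times
        self._prev = read_times()  # prime the baseline once, at start-up
        self.latest = {"value": 0.0, "user": 0.0, "system": 0.0}

    def sample(self):
        """Call on a fixed tick; request handlers only read self.latest."""
        cur = self._read()
        d_user = cur.user - self._prev.user
        d_system = cur.system - self._prev.system
        d_idle = cur.idle - self._prev.idle
        total = d_user + d_system + d_idle
        self._prev = cur
        if total > 0:
            self.latest = {
                "value": round(100 * (d_user + d_system) / total, 1),
                "user": round(100 * d_user / total, 1),
                "system": round(100 * d_system / total, 1),
            }
        return self.latest
```

A handler that takes its own snapshot on demand has no previous sample to diff against, which is the 0 % symptom described above. Reading the cached result of the periodic tick avoids that.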

Storage card → "Storage Used", Network Status stacked layout, 24 h chart smoothing

Three small follow-ups on the dashboard restyle:

  • "Total Storage" → "Storage Used". The first card on the Storage tab kept the original "Total Storage" headline with the total capacity figure (Local capacity + Remote capacity), which read confusingly next to a stacked bar that visualizes used portions. The headline now shows the sum of used storage (totalLocalUsed + totalRemoteUsed through formatStorage) and the title reads Storage Used. The bar and the Local / Remote / Free legend below it are unchanged — they were already representing the used breakdown.
  • Network Status: stacked cells + packet-loss color. The 2 × 2 grid that exposes Hostname, DNS, Errors, and Domain was inline (Label: value on a single row) which truncates long hostnames or full IPv6 DNS addresses. It now stacks the label on top and the value below in each cell so neither column ever wraps. The "Packet Loss" figure picks up a color ramp that mirrors the existing healthy / warning / critical thresholds: blue at 0 %, yellow above 0 %, orange at ≥ 1 %, and red at ≥ 5 %.
  • 24 h CPU & Network charts: backend downsample to 5-min buckets. The "24 Hours" timeframe on Overview's CPU Usage & Load Average chart and the Network Traffic chart was rendering ~1440 raw minute-level samples from Proxmox RRD, which plots as a dense thicket of vertical spikes. The 1 h and 7 d views were unaffected because their step sizes already produce ~50–300 points. /api/node/metrics now groups consecutive RRD points into 5-min buckets when timeframe=day, averaging each numeric field per bucket — the same shape get_temperature_history uses for its 24 h view, so the look is consistent across every 24 h chart on the dashboard. ~288 points, smooth, hour-granular labels intact.
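
The 5-min bucketing in the last bullet can be sketched as follows. This is an assumption-laden illustration rather than the /api/node/metrics code: downsample is a hypothetical name, points are assumed to be dicts with a Unix time field, and buckets are aligned by flooring the timestamp, which may differ from how consecutive RRD points are actually grouped.

```python
def downsample(points, bucket_seconds=300):
    """Average every numeric field of the points that fall into each time bucket."""
    buckets = {}
    for point in points:
        start = int(point["time"]) // bucket_seconds * bucket_seconds
        buckets.setdefault(start, []).append(point)

    result = []
    for start in sorted(buckets):
        group = buckets[start]
        row = {"time": start}
        fields = {
            name
            for point in group
            for name, value in point.items()
            if name != "time" and isinstance(value, (int, float))
        }
        for name in sorted(fields):
            # Skip gaps (None) so a missing sample does not drag the average down.
            values = [p[name] for p in group if isinstance(p.get(name), (int, float))]
            row[name] = sum(values) / len(values)
        result.append(row)
    return result


# 24 h of minute-level samples collapses to 288 five-minute points.
minute_points = [{"time": i * 60, "cpu": float(i % 5)} for i in range(1440)]
day_points = downsample(minute_points)
```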

Coral on LXC — non-Debian containers get an opt-in passthrough-only mode

scripts/gpu_tpu/install_coral_lxc.sh previously refused to run inside any container without apt-get, on the grounds that Google's libedgetpu APT repo only ships for Debian/Ubuntu and the rest of the script would crash with cryptic errors. That is a clean abort but leaves operators with Alpine / Arch / RHEL / SUSE containers — typically the Frigate-in-Docker setups, whose app image bundles the runtime — without any path to use the script for the device-passthrough half, which is distro-agnostic.

The script now detects the container distro by reading /etc/os-release (matching both ID and ID_LIKE, classifying into debian / alpine / arch / rhel / suse) before the apt-get block runs. If the family isn't debian, a whiptail prompt explains the situation, names the detected distro, points at the /etc/pve/lxc/<ctid>.conf that has already been written, and asks the operator whether to continue in passthrough-only mode (skip the runtime install, return success) or abort. The default button is No, so a user who just presses Enter or Esc lands on the exact same msg_error + return 1 path as the legacy script — the Debian/Ubuntu happy path is byte-for-byte unchanged.
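
The classification step can be sketched like this. The real check lives in a shell script; the Python below only illustrates the ID / ID_LIKE matching, and the FAMILIES table, the "unknown" fallback, and both function names are assumptions made for the example.

```python
# Hypothetical mapping from os-release tokens to the five families named above.
FAMILIES = {
    "debian": {"debian", "ubuntu"},
    "alpine": {"alpine"},
    "arch": {"arch"},
    "rhel": {"rhel", "fedora", "centos"},
    "suse": {"suse", "opensuse"},
}


def parse_os_release(text):
    """Parse KEY=value lines, dropping comments and surrounding quotes."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, raw = line.partition("=")
        fields[key] = raw.strip().strip("\"'")
    return fields


def detect_family(text):
    """Check ID first, then each ID_LIKE token; return 'unknown' if nothing matches."""
    fields = parse_os_release(text)
    tokens = [fields.get("ID", "").lower()] + fields.get("ID_LIKE", "").lower().split()
    for token in tokens:
        for family, ids in FAMILIES.items():
            if token in ids:
                return family
    return "unknown"
```

Checking ID_LIKE as well as ID is what lets derivatives resolve to their parent family, for example a container whose ID is rocky but whose ID_LIKE lists rhel.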

The coral-tpu-lxc doc (EN + ES) was updated to describe the new prompt — the Debian / Ubuntu containers only section is now Non-Debian containers — passthrough-only mode, and the troubleshooting entry that previously called out Alpine as "install fails" now covers all four non-Debian families with guidance on installing the runtime manually or running an app container that bundles it (Frigate Docker being the canonical example, exposing the Coral with --device /dev/apex_0:/dev/apex_0 for M.2 or the existing USB bind mount).

Overview — top processes by CPU / Memory, plus per-process drill-down

The CPU Usage and Memory cards on the Overview tab are now clickable. Each one opens a sortable list of the top 25 processes (by pcpu or by RSS / pmem), refreshing every 5 s while the dialog is open — the CPU card sorts by %CPU, the Memory card sorts by resident memory. The list is generated server-side from ps -eo pid,user,pcpu,pmem,rss,comm and exposed at /api/processes?sort=cpu|mem&limit=25, so the polling cost stays on the host instead of asking each client to scrape /proc over HTTP. A search box filters by command, user, or PID without re-fetching, and the primary metric column carries an inline progress bar scaled to the largest value in the filtered set so visual ranking is preserved even when no process is near 100%.
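
A rough sketch of the server-side list and the relative progress bar follows. The function names and the SAMPLE output are invented for illustration; only the ps column order (pid, user, pcpu, pmem, rss, comm) is taken from the text above, and the bar scaling is shown in Python even though the real one runs in the browser.

```python
def parse_ps(output):
    """Parse the output of: ps -eo pid,user,pcpu,pmem,rss,comm"""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header row
        parts = line.split(None, 5)
        if len(parts) < 6:
            continue
        pid, user, pcpu, pmem, rss, comm = parts
        rows.append({
            "pid": int(pid), "user": user, "cpu": float(pcpu),
            "mem": float(pmem), "rss_kb": int(rss), "command": comm.strip(),
        })
    return rows


def top_processes(rows, sort="cpu", limit=25):
    """Sort by %CPU or by resident memory, highest first."""
    key = "cpu" if sort == "cpu" else "rss_kb"
    return sorted(rows, key=lambda r: r[key], reverse=True)[:limit]


def bar_widths(rows, key):
    """Scale each bar to the largest value in the (filtered) set, as a percentage."""
    peak = max((r[key] for r in rows), default=0)
    return [round(100 * r[key] / peak) if peak else 0 for r in rows]


SAMPLE = """  PID USER     %CPU %MEM   RSS COMMAND
    1 root      0.0  0.1 12000 systemd
 1234 root     12.5  3.2 524288 kvm
 2222 www-data  4.0  8.0 1310720 python3
"""
by_cpu = top_processes(parse_ps(SAMPLE), sort="cpu")
```

Scaling against the peak of the filtered set rather than against 100 keeps the ranking visible when every process sits at a few percent.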

The modal header, dot, and accent text inherit the color of the card that opened it — blue (#3b82f6) for CPU, indigo (#6366f1) for Memory — and rows pick up the same bg-white/5 hover treatment used by the disk-temperature card so the surface feels consistent with the rest of the dashboard. Under 640 px, the PID and User columns drop out so Command / CPU / Memory still fit on a phone screen without horizontal scroll.

Clicking any row opens a second modal with the full live picture of that one process: command line, executable path, working directory, parent (PPid + parent comm), state (R/S/D/Z...), thread count, open FD count, RSS / VSize / Swap, I/O read / write bytes, start time, and elapsed runtime. Sourced from /api/processes/<pid> which reads /proc/<pid>/{cmdline,exe,cwd,status,io,fd,comm} directly and calls ps -o lstart=,etime=,pcpu=,pmem= -p <pid> for the live fields the kernel doesn't expose in /proc. UID and GID are resolved to user / group names through pwd.getpwuid / grp.getgrgid. The detail modal refreshes every 3 s while open; if the process exits mid-modal, the next refresh surfaces Process exited instead of stale data.
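
Part of the drill-down comes from /proc/<pid>/status, whose "Key: value" layout is straightforward to parse. The sketch below works on the file's text so it can be tried without a live process; parse_proc_status and summarize are hypothetical helpers, and the selection of fields is only a subset of what the modal shows.

```python
def parse_proc_status(text):
    """Turn the 'Key:<tab>value' lines of /proc/<pid>/status into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def summarize(status):
    """Pick out a few fields; Vm* entries are absent for kernel threads."""
    def kilobytes(name):
        value = status.get(name, "")
        return int(value.split()[0]) if value else None

    return {
        "name": status.get("Name"),
        "state": status.get("State", "")[:1],        # R, S, D, Z, ...
        "ppid": int(status.get("PPid", 0)),
        "threads": int(status.get("Threads", 0)),
        "uid": int(status.get("Uid", "0").split()[0]),  # first column is the real UID
        "rss_kb": kilobytes("VmRSS"),
        "vsize_kb": kilobytes("VmSize"),
        "swap_kb": kilobytes("VmSwap"),
    }


SAMPLE_STATUS = (
    "Name:\tnginx\nState:\tS (sleeping)\nPid:\t812\nPPid:\t1\n"
    "Uid:\t33\t33\t33\t33\nVmSize:\t   65536 kB\nVmRSS:\t    5120 kB\n"
    "VmSwap:\t       0 kB\nThreads:\t4\n"
)
info = summarize(parse_proc_status(SAMPLE_STATUS))
```

On a live system, reading the file can fail when the process has exited between two refreshes. Catching that error is one way to surface a "Process exited" state instead of stale data.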
