ProxMenux v1.2.1
Maintenance release on top of v1.2.0 focused on three community-reported
areas: full SR-IOV awareness across the GPU/PCI subsystem, GPU + audio
companion handling during passthrough attach and detach (Intel iGPU with
chipset audio, discrete cards with HDMI audio, mixed-GPU VMs), and
compatibility fixes for the AI notification providers (OpenAI-compatible
custom endpoints such as LiteLLM/MLX/LM Studio, OpenAI reasoning models,
and Gemini 2.5+/3.x thinking models). Also bundles quality-of-life fixes
in the NVIDIA installer, the disk health monitor, and the LXC lifecycle
helpers used by the passthrough wizards.
Main changes in v1.2.1
SR-IOV Awareness Across the GPU Subsystem
Intel i915-sriov-dkms and AMD MxGPU split a GPU's Physical Function (PF)
into Virtual Functions (VFs) that can be assigned independently to LXCs and
VMs. Until now ProxMenux had zero SR-IOV awareness: it treated VFs and PFs
identically, which could rewrite vfio.conf with the PF's vendor:device ID,
collapse the VF tree on the next boot, and leave users unable to start their
guests. Every path that could have disrupted an active VF tree has been
audited and hardened.
Detection helpers — new _pci_is_vf, _pci_has_active_vfs,
_pci_sriov_role, _pci_sriov_filter_array in
scripts/global/pci_passthrough_helpers.sh. The HTTP/JSON equivalent lives
in the Flask GPU route, so the Monitor UI reads VF/PF state directly from
sysfs (physfn, sriov_totalvfs, sriov_numvfs, virtfn*).
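The sysfs markers listed above are enough to sketch the two core checks. This is a hypothetical re-implementation for illustration (the real helpers in scripts/global/pci_passthrough_helpers.sh may differ in detail; SYSFS_ROOT is made overridable here purely so the logic can be exercised against a fake tree):

```shell
SYSFS_ROOT="${SYSFS_ROOT:-/sys/bus/pci/devices}"

_pci_is_vf() {
    # A Virtual Function exposes a 'physfn' symlink pointing at its parent PF.
    [ -e "$SYSFS_ROOT/$1/physfn" ]
}

_pci_has_active_vfs() {
    # A PF with SR-IOV enabled reports a non-zero sriov_numvfs.
    local n
    n=$(cat "$SYSFS_ROOT/$1/sriov_numvfs" 2>/dev/null || echo 0)
    [ "${n:-0}" -gt 0 ]
}
```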
Pre-start hook (gpu_hook_guard_helpers.sh) — the VM pre-start guard
now recognises Virtual Functions. Both the slot-only syntax branch (where
it used to iterate every function of the slot and demand vfio-pci
everywhere) and the full-BDF branch skip VFs, so Proxmox can perform its
per-VF vfio-pci rebind as usual. The false "GPU passthrough device is not
ready" block on SR-IOV VMs is gone.
Mode-switch scripts refuse SR-IOV operations — switch_gpu_mode.sh,
switch_gpu_mode_direct.sh, add_gpu_vm.sh, add_gpu_lxc.sh,
vm_creator.sh, synology.sh, zimaos.sh, add_controller_nvme_vm.sh.
Selecting a VF or a PF with active VFs now triggers a clear "SR-IOV
Configuration Detected" dialog and aborts before any host-side VFIO
rewrite. For the VM-creation wizards the message is delivered through
whiptail so it interrupts the mid-flow output and is acknowledged
explicitly, followed by a per-device msg_warn line for the log trail.
New "SR-IOV active" state in the Monitor UI — the GPU card in the
Hardware page gains a third visual state with a dedicated teal colour,
an in-line SR-IOV ×N pill (or SR-IOV VF for a Virtual Function), and
dashed/faded LXC and VM branches. The Edit button is hidden because the
state is hardware-managed.
Modal dashboard for SR-IOV GPUs — opening the modal for a Physical
Function with active VFs now shows:
- An aggregate-metrics banner ("Metrics below reflect the Physical Function
  (aggregate across N VFs)")
- Normal GPU real-time telemetry for the PF
- A Virtual Functions table, one row per VF, with the current driver
  (i915, vfio-pci, unbound) and the specific VM or LXC that consumes it,
  including running/stopped state. Consumers are discovered by
  cross-referencing hostpci entries and /dev/dri/renderDN mount lines
  against the VF's BDF and DRM render node.
Opening the modal for a Virtual Function shows its parent PF (clickable
to navigate back to the PF's modal), current driver, and consumer.
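The cross-reference against guest configs can be sketched like this. The helper name _conf_references_bdf is hypothetical, but it uses the same anchored BDF shape described later in these notes:

```shell
_conf_references_bdf() {
    # $1 = BDF such as 0000:00:02.1, $2 = path to a qm/pct config file
    local slot_fn="${1#0000:}"          # drop the optional domain prefix
    local re="${slot_fn//./\\.}"        # escape the function dot for the regex
    grep -Eq '^hostpci[0-9]+:[[:space:]]*(0000:)?'"$re"'([,[:space:]]|$)' "$2"
}
```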
VM Conflict Policy popup no longer fires for SR-IOV VFs — the regex
in detect_affected_vms_for_selected matched the slot (00:02) against
VMs that had a VF (00:02.1) assigned, producing a confusing "Keep GPU
in VM config" dialog. With the SR-IOV gate upstream, the flow never
reaches that code path for SR-IOV slots, so the false conflict is gone.
Thanks to the community bug report that surfaced this whole area and
included the pointer to /sys/bus/pci/devices/<BDF>/physfn as the VF
marker.
AI Provider Compatibility — OpenAI-Compatible, Reasoning, and Thinking Models
Three coordinated fixes that unblock model categories previously rejected
by the notification enhancement pipeline.
OpenAI-compatible endpoints (LiteLLM, MLX, LM Studio, vLLM, LocalAI,
Ollama-proxy, ...) — the provider's list_models() used to require
"gpt" in every model name, so local setups serving mlx-community/...,
Qwen3-..., mistralai/... saw an empty model list. When a Custom Base
URL is set, the "gpt" substring check is now skipped and
EXCLUDED_PATTERNS (embeddings, whisper, tts, dall-e) is the only
filter. The Flask route layer also stops intersecting the result against
verified_ai_models.json for custom endpoints — the verified list only
describes OpenAI's official model IDs and was erasing every local model
the user actually served.
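The relaxed filter can be sketched as follows. This is a shell sketch for illustration only: the real logic is Python in the provider, and the pattern list mirrors the categories named above rather than the exact EXCLUDED_PATTERNS contents:

```shell
filter_models() {
    # Keep every advertised model id except embedding/audio/image families.
    local m
    for m in "$@"; do
        case "$m" in
            *embedding*|*whisper*|*tts*|*dall-e*) ;;   # excluded families
            *) printf '%s\n' "$m" ;;                   # no "gpt" requirement
        esac
    done
}
```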
OpenAI reasoning models (o1, o3, o3-mini, o4-mini, gpt-5,
gpt-5-mini, gpt-5.1, gpt-5.2-pro, gpt-5.4-nano, etc., excluding
the *-chat-latest variants) — these use a stricter API contract that
requires max_completion_tokens instead of max_tokens and does not
accept temperature. Sending the classic chat parameters produced HTTP
400 Bad Request for every one of them. A detector in openai_provider.py
now branches the payload accordingly. It also sets
reasoning_effort: "minimal" for these models: left at the default,
they spend their output token budget on internal reasoning and return
an empty reply for short requests such as notification translation,
while minimal keeps that overhead low so the visible response fits
inside the notification budget.
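The family branch can be sketched like this (illustrative; the real detector lives in openai_provider.py and its exact prefix list may differ):

```shell
_is_reasoning_model() {
    case "$1" in
        *-chat-latest) return 1 ;;      # chat variants keep the classic contract
        o1*|o3*|o4*|gpt-5*) return 0 ;; # reasoning families
        *) return 1 ;;
    esac
}
# Reasoning models get max_completion_tokens (and no temperature) plus
# reasoning_effort: "minimal"; everything else keeps max_tokens + temperature.
```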
Gemini 2.5+/3.x thinking models — gemini-2.5-flash, 2.5-pro,
gemini-3-pro-preview, gemini-3.1-pro-preview, and others have
internal "thinking" enabled by default. With the small token budget used
for notification enrichment (≤250 tokens), the thinking budget consumed
the entire allowance and the model returned empty output with
finishReason: MAX_TOKENS. gemini_provider.py now sets
thinkingConfig.thinkingBudget: 0 for non-lite variants of 2.5+ and
3.x, so the available tokens go to the user-visible response. Lite
variants (which don't have thinking enabled) are untouched.
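The gate can be sketched like this (illustrative; the real logic is Python in gemini_provider.py, and this sketch only covers the model names mentioned above):

```shell
_gemini_zero_thinking_budget() {
    # Return 0 when thinkingConfig.thinkingBudget: 0 should be added.
    case "$1" in
        *-lite*) return 1 ;;                           # lite: no thinking
        gemini-2.5-*|gemini-3-*|gemini-3.*) return 0 ;;
        *) return 1 ;;
    esac
}
```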
Thanks to the community bug report about LiteLLM → MLX that prompted the
first fix and led to auditing the other two.
Verified AI Models Refresh
AppImage/config/verified_ai_models.json refreshed for the providers we
re-tested against live APIs:
| Provider | New recommended | Notes |
|---|---|---|
| OpenAI | gpt-4.1-nano | gpt-4.1-nano, gpt-4.1-mini, gpt-4o-mini, gpt-4.1, gpt-4o, gpt-5-chat-latest, plus the new gpt-5.4-nano / gpt-5.4-mini from the 2026-03 generation. Dated snapshots and legacy models (gpt-3.5-turbo*, gpt-4-0613) are excluded in favour of stable aliases. Reasoning models (o*, gpt-5 non-chat) are supported by the code but not listed by default — they are slower/costlier without improving notification quality. |
| Gemini | gemini-2.5-flash-lite | gemini-2.5-flash-lite, gemini-2.5-flash (works now with the thinking-budget fix), gemini-3-flash-preview. The gemini-flash-latest / gemini-flash-lite-latest aliases are intentionally omitted — they resolved to different underlying models across verifier runs and produced timeouts in some regions. Pro variants reject thinkingBudget=0 and are overkill for the notification-translation use case. |
| Groq / Anthropic / OpenRouter | unchanged | Marked with a _note — will be re-verified as soon as keys are available. |
A new private maintenance tool (kept out of the AppImage) re-runs a
standardised translate+explain test against every model each provider
advertises, classifies pass / warn / fail, and prints a ready-to-paste
JSON snippet. Re-run before each ProxMenux release to keep the list
current.
Disk Health Monitor — Observation Persistence in the Journal Watcher
A latent bug in notification_events.py::_check_disk_io meant real-time
kernel I/O errors caught by the journal watcher were surfaced as
notifications but never written to the permanent per-disk observations
table. In practice the parallel periodic dmesg scan usually wrote the
observation shortly after, so no data was lost in typical cases — but
under timing edge cases (stale dmesg window, service restart right
after the error, buffer rotation) the observation could go missing.
The journal watcher now records the observation before the 24h
notification cooldown gate, using the same family-based signature
classification (io_<disk>_ata_connection_error,
io_<disk>_block_io_error, io_<disk>_ata_failed_command) as the
periodic scan. Both paths now deduplicate into the same row via the
UPSERT in record_disk_observation, so occurrence counts are accurate
regardless of which detector fired first.
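The shared family-based signature scheme can be sketched like this. The kernel-message substrings used here are illustrative assumptions, not the exact strings matched by notification_events.py:

```shell
_io_signature() {
    # Map a kernel log line to the per-disk signature family.
    local disk="$1" line="$2"
    case "$line" in
        *"failed command"*) printf 'io_%s_ata_failed_command\n' "$disk" ;;
        *"link"*|*"reset"*) printf 'io_%s_ata_connection_error\n' "$disk" ;;
        *"I/O error"*)      printf 'io_%s_block_io_error\n' "$disk" ;;
    esac
}
```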
NVIDIA Installer Polish
lsmod race condition silenced — during reinstall, the module-unload
verification in unload_nvidia_modules produced spurious
lsmod: ERROR: could not open '/sys/module/nvidia_uvm/holders' errors
because lsmod reads /proc/modules and then opens each module's
holders/ directory, which disappears transiently while the module is
being removed. The check now reads /proc/modules directly and inserts
short sleeps to let the kernel finalise the unload before re-verifying.
Applied in the same spirit to the four other lsmod call sites in the
script.
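The race-free check can be sketched like this. /proc/modules is a plain text read, so it cannot hit the transient holders/ directory that lsmod opens per module; the optional second argument exists only to make the sketch testable (the real code also retries with short sleeps):

```shell
_module_loaded() {
    # Module names in /proc/modules use underscores, not hyphens.
    local name="${1//-/_}" modfile="${2:-/proc/modules}"
    grep -q "^${name}[[:space:]]" "$modfile"
}
```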
Dialog → whiptail in the LXC update flow — the "Insufficient Disk
Space" message in update_lxc_nvidia and the "Update NVIDIA in LXC
Containers" confirmation now use whiptail-style dialogs consistent
with the rest of the in-flow messaging, avoiding the visual break that
dialog --msgbox caused when rendered mid-sequence in the
container-update phase.
GPU + Audio Passthrough — Full Lifecycle Hardening
A round of fixes around how GPU passthrough handles its audio companion
device. Previously, only the .1 sibling of a discrete GPU was picked
up automatically; Intel iGPU passthrough to a VM — where the audio
lives separately on the chipset at 00:1f.3 and not at 00:02.1 — was
silently skipped. On detach, the old sed that wiped hostpci lines by
slot substring could also remove an unrelated GPU whose BDF happened to
contain the search slot as a substring (e.g. slot 00:02 matching
inside 0000:02:00.0). Both paths are now robust.
iGPU audio-companion checklist on attach —
add_gpu_vm.sh::detect_optional_gpu_audio keeps the auto-include fast
path for the classic .1 sibling (discrete NVIDIA / AMD with HDMI
audio on the card). When no .1 audio exists, the script now scans
sysfs for every PCI audio controller on the host, skips anything
already covered by the GPU's IOMMU group, and asks the user via a
_pmx_checklist (dialog in standalone mode, whiptail in wizard
mode called from vm_creator/synology/zimaos) which ones to pass
through alongside the GPU. Each entry displays its current host driver
(snd_hda_intel, snd_hda_codec_*, etc.) so the decision is
informed. Default is none.
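The host-wide audio scan can be sketched as follows (PCI audio controllers report class 0x0403xx in sysfs). The root-directory parameter exists only for testability; the real code reads /sys/bus/pci/devices, filters out the GPU's IOMMU group, and also reports each device's bound driver:

```shell
list_audio_controllers() {
    local root="${1:-/sys/bus/pci/devices}" dev cls
    for dev in "$root"/*; do
        cls=$(cat "$dev/class" 2>/dev/null) || continue
        case "$cls" in
            0x0403*) printf '%s\n' "${dev##*/}" ;;   # audio device class
        esac
    done
}
```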
Orphan audio cascade on detach — when the user picks
"Remove GPU from VM config" during a mode switch, the scripts now
follow up with a targeted cleanup:
- switch_gpu_mode.sh, switch_gpu_mode_direct.sh and
  add_gpu_vm.sh::cleanup_vm_config (source-VM cleanup on the "move GPU"
  flow) all call the shared helper _vm_list_orphan_audio_hostpci from
  pci_passthrough_helpers.sh.
- The helper uses a two-pass scan of the VM config: pass 1 records the
  slot bases of display/3D hostpci entries; pass 2 classifies audio
  entries and skips any audio whose slot still has a display sibling in
  the same VM. This protects the HDMI audio of other dGPUs left in the
  VM. Previously the bare substring match would have flagged NVIDIA's
  02:00.1 as orphan when detaching an Intel iGPU at 00:02.0.
- The interactive switch flow confirms removals with a dialog checklist
  (default ON). The web variant auto-removes without prompting — the
  runner has no good way to render a checklist — and logs every BDF it
  touched.
vfio.conf cascade extension — for each audio removed by the
cascade, the switch-mode scripts now check whether its BDF is still
referenced by any other VM via _pci_bdf_in_any_vm. If nothing else
uses it, the vendor:device is appended to SELECTED_IOMMU_IDS before
the /etc/modprobe.d/vfio.conf update runs. That closes the loop for
the Intel iGPU case: 8086:51c8 (PCH HD Audio) is now pulled from
vfio.conf alongside 8086:46a3 (iGPU) when both leave VM mode and
no other VM references them. If another VM still uses the audio, the
ID is deliberately kept — no breaking side effects on other VMs.
add_gpu_vm.sh does NOT extend the cleanup because the GPU is being
moved (still in use elsewhere) — vfio.conf IDs must remain.
Precise hostpci removal regex — every inline sed used to detach a
GPU from a VM config previously matched the slot as a free substring:
/^hostpci[0-9]+:.*${slot}/d
For slot=00:02 that regex matches the substring inside
0000:02:00.0 (an unrelated NVIDIA dGPU at slot 02:00) because the
characters 00:02 appear within the longer BDF. The fix anchors the
match to the real BDF shape — optional 0000:, exact slot, required
\.[0-7], and a trailing delimiter:
/^hostpci[0-9]+:[[:space:]]*(0000:)?${slot}\.[0-7]([,[:space:]]|$)/d
Applied in switch_gpu_mode.sh, switch_gpu_mode_direct.sh and
add_gpu_vm.sh::cleanup_vm_config. The awk-based helper in
vm_storage_helpers.sh::_remove_pci_slot_from_vm_config (used by the
NVMe wizards) already used the correct pattern and did not need
changes.
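The difference can be demonstrated directly. With slot=00:02, the old substring pattern would have deleted both lines below (00:02 occurs inside 0000:02:00.0), while the anchored pattern removes only the iGPU entry:

```shell
slot='00:02'
printf 'hostpci0: 0000:00:02.0,pcie=1\nhostpci1: 0000:02:00.0,pcie=1\n' \
  | sed -E '/^hostpci[0-9]+:[[:space:]]*(0000:)?'"$slot"'\.[0-7]([,[:space:]]|$)/d'
# Only the unrelated dGPU line at 02:00 survives.
```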
LXC Lifecycle Helper — Timeout-Safe Stop
A plain pct stop can hang indefinitely when the container has a
stale lock from a previous aborted operation, when processes inside
(Plex, Jellyfin, databases) ignore TERM and fall into
uninterruptible-sleep while the GPU they were using is yanked out, or
when pct shutdown --timeout isn't enforced by pct itself. Field
reports of 5+ min waits during GPU mode switches made this a real UX
hazard.
New shared helper _pmx_stop_lxc <ctid> [log_file] in
pci_passthrough_helpers.sh:
- Returns 0 immediately if the container is not running.
- Best-effort pct unlock (silent on failure) — most containers aren't
  actually locked; we only care about the cases where they are.
- pct shutdown --forceStop 1 --timeout 30 wrapped in an external
  timeout 45, so we never wait longer than that for the graceful phase,
  even if pct stalls on backend I/O.
- Verifies actual status via pct status — pct can return non-zero while
  the container is in fact stopped.
- If still running, pct stop wrapped in timeout 60, then verifies again.
- Returns 1 only if the container is truly stuck after ~107 s total
  (processes in D state — requires manual intervention, but the wizard
  moves on instead of hanging).
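The sequence above can be condensed into a sketch like this (illustrative; the real helper also writes to an optional log file and paces the phases slightly differently):

```shell
_pmx_stop_lxc() {
    local ctid="$1"
    pct status "$ctid" 2>/dev/null | grep -q running || return 0   # not running
    pct unlock "$ctid" >/dev/null 2>&1 || true                     # best-effort
    timeout 45 pct shutdown "$ctid" --forceStop 1 --timeout 30 >/dev/null 2>&1 || true
    pct status "$ctid" 2>/dev/null | grep -q running || return 0   # graceful stop worked
    timeout 60 pct stop "$ctid" >/dev/null 2>&1 || true            # hard stop
    pct status "$ctid" 2>/dev/null | grep -q running && return 1   # truly stuck
    return 0
}
```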
Wired into the three GPU-mode paths that stop LXCs during a switch:
switch_gpu_mode.sh, switch_gpu_mode_direct.sh, and
add_gpu_vm.sh::cleanup_lxc_configs.
add_gpu_vm.sh Reboot Prompt Stability
The final "Reboot Required" prompt of the GPU-to-VM assignment wizard
was triggering spurious reboots in certain menu-chain invocations
(menu → main_menu → hw_grafics_menu → add_gpu_vm). With the
_pmx_yesno helper it sometimes returned exit 0 without the user
having actually confirmed, calling reboot immediately. With a bare
read in its place the process would get SIGTTIN-suspended when the
menu chain detached the script from the terminal's foreground process
group, leaving [N]+ Stopped menu on the parent shell with no chance
to answer.
The prompt now uses whiptail --yesno invoked directly (the pattern
verified to work reliably in that menu chain) and inserts a
Press Enter to continue ... read -r pause between the "Yes" answer
and the actual reboot call — so an accidental Enter on the confirm
button cannot trigger an immediate reboot without a visible
confirmation step first.
Thank you for using ProxMenux, and especially to the users who
reported the SR-IOV, LiteLLM and GPU-audio cases — these improvements
exist because of detailed, reproducible reports. Feel free to report
issues or suggest improvements 🙌.
