mescon/Healarr v1.3.12 on GitHub

A full safety audit of the remediation pipeline (15 verified findings, fixed across seven PRs), two fixes for a live user issue, and a scan-control UX overhaul. The common thread: Healarr must never delete the wrong file, and the UI must show what the scanner is actually doing.

Fixed

Remediation consent can no longer be invented (#329). Auto-remediate and dry-run now always resolve from the scan path's current configuration (matched against the file's path), never from values embedded in old events. A remediation retried after a config change follows today's config, and when no scan path matches the file anymore, Healarr refuses to delete (treats it as dry-run). The stuck-remediation monitor also respects a user's "ignore" veto instead of resurrecting the remediation.
Tool failures can never be classified as file corruption (#330). When ffmpeg/HandBrake/mediainfo itself fails (missing binary, OOM-kill, exec error), the verdict is a recoverable infrastructure error, never CorruptStream. A table test pins 15 distinct failure shapes to the safe side, and an unregistered error type now fails loudly in tests instead of silently defaulting.
Scan lifecycle integrity (#332). Resume progress is persisted from the parallel scanner's contiguous-done watermark (workers complete out of order; the old counter could make a resumed scan silently skip unfinished files), shutdown-vs-cancel races resolve deterministically, a panicking worker no longer hangs the scan, and aborted scans are not resurrected as interrupted.
Verification failures are triaged before counting against the retry budget (#333). A NAS hiccup during post-remediation verification used to burn a retry (or worse, fail the remediation); recoverable infrastructure errors are now retried with a delay and never emit a terminal event, and verification queries the *arr with the *arr's own path view via the path mapper.
Webhook scans wait out unstable files (#334). A file whose size is still changing at webhook time (import still copying) is deferred for rescan instead of scanned mid-write, a true-corruption verdict is re-probed once after a stability delay before being trusted, and the duplicate-journey check re-checks the live database, not just a scan-start snapshot.
The corruption state machine ignores notification bookkeeping (#335). A "remediation complete" notification used to overwrite the corruption's state (knocking it out of the resolved filter forever), and a broken notification provider could exhaust the retry budget by itself. The summary trigger now only follows lifecycle events, retry counting uses one explicit list everywhere, and a migration repairs rows already clobbered. The recurring-corruption loop-breaker is also media-keyed now, so a renamed file (the Tdarr/AV1 scenario) can't evade it, while a deliberate manual retry overrides a paused loop-breaker.
Control-plane consistency (#336). The dashboard state counts and the /corruptions filters are built from one shared state-to-bucket mapping (nine states used to fall through one view or the other), pause-all/cancel-all act on the statuses scans actually report (they matched nothing before), overlapping scans (a path and its parent/child) can no longer run concurrently, webhook rate limiting no longer starves bursty senders forever (a season-pack import's webhooks used to 429 until restart), and scan retention now actually prunes aborted/interrupted rows.
Media lookup works for Windows *arr paths (#337). With Radarr/Sonarr on Windows reporting UNC paths (\\server\share\Movies\...), the fallback media matcher split paths with platform-locked filepath calls and a hardcoded /, so remediation failed with "media not found for path" and the corruption stuck at Deletion Failed. All *arr-path matching now goes through the shared separator-agnostic helpers (fourth and hopefully last member of the separator bug class: #298, #305, #322). Reported by alex882001 in #331.
/api/health reports corruptions that still need attention (#337). pending_corruptions only counted freshly-detected corruptions, so one stuck mid-remediation (e.g. Deletion Failed) reported 0 and homepage dashboards showed all-clear while remediation was failing. It now counts everything not yet resolved or ignored. Also from #331.
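
To illustrate the contiguous-done watermark from #332, here is a minimal Go sketch. It is not Healarr's actual code: the type and method names (watermark, MarkDone, ResumeFrom) are hypothetical. The idea is that workers report finished file indexes in any order, and only the length of the fully finished prefix is safe to persist as the resume point.

```go
package main

import "sync"

// watermark tracks which file indexes are finished and exposes the
// "contiguous-done" prefix. All names here are hypothetical.
type watermark struct {
	mu   sync.Mutex
	done map[int]bool // finished indexes at or above next
	next int          // every index below next is finished
}

func newWatermark() *watermark {
	return &watermark{done: make(map[int]bool)}
}

// MarkDone records that file i finished and advances the watermark
// across any run of finished indexes that is now contiguous.
func (w *watermark) MarkDone(i int) {
	w.mu.Lock()
	defer w.mu.Unlock()
	if i < w.next {
		return // already covered by the watermark
	}
	w.done[i] = true
	for w.done[w.next] {
		delete(w.done, w.next)
		w.next++
	}
}

// ResumeFrom returns the index a resumed scan should start at.
func (w *watermark) ResumeFrom() int {
	w.mu.Lock()
	defer w.mu.Unlock()
	return w.next
}
```

With files 0 and 2 finished but file 1 still in flight, ResumeFrom reports 1. A plain completed-files counter would report 2, and a scan resumed from there would skip file 1.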
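
The separator bug class behind #337 follows from Go's path/filepath package splitting on the host OS separator only, so on a non-Windows host a UNC path is not split on its backslashes. The sketch below shows one way separator-agnostic matching can look. splitAny and sameTail are hypothetical helpers for illustration, not Healarr's shared helpers, and the comparison is case-sensitive as a simplifying assumption.

```go
package main

import "strings"

// splitAny splits a path on both '/' and '\', dropping empty segments,
// so UNC, Windows, and POSIX paths all yield the same kind of slice.
func splitAny(p string) []string {
	return strings.FieldsFunc(p, func(r rune) bool {
		return r == '/' || r == '\\'
	})
}

// sameTail reports whether the last n path segments of a and b are equal,
// regardless of which separator each path uses.
func sameTail(a, b string, n int) bool {
	as, bs := splitAny(a), splitAny(b)
	if n <= 0 || len(as) < n || len(bs) < n {
		return false
	}
	for i := 1; i <= n; i++ {
		if as[len(as)-i] != bs[len(bs)-i] {
			return false
		}
	}
	return true
}
```

Comparing trailing segments (for example the folder and file name) lets a path reported by a Windows *arr line up with the same file as seen from a container mount, without assuming either side's separator.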

Added

Scan controls where the scans are (#338). Per-scan Pause/Resume buttons (the API existed since the beginning; no UI ever exposed it) on the Dashboard's Active Scans rows and the Scan Details header, plus Pause all / Resume all / Cancel all on the Active Scans header itself instead of only on the Config page. Pause/resume now also persist to the database, so a paused scan still shows as paused after a reload.

Changed

Scan statuses are consistent and human-readable (#338). Pages used to check for the literal status running, but a scan's database row advances to scanning moments after start (and can sit at paused), so a live scan showed a Rescan button instead of Cancel and Scan Details stopped its live refresh. Active-scan detection is now shared across all pages, raw statuses like enumerating render as friendly labels ("Counting files"), paused scans get a distinct badge, and the sidebar's Active Scans count updates instantly via websocket instead of lagging up to 30 seconds.
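
A minimal sketch of what shared active-scan detection and friendly labels can look like, written in Go for illustration. The function and variable names are hypothetical, the status set contains only the statuses named above (treating enumerating as active is an assumption), and only the "Counting files" label is taken from these notes.

```go
package main

// activeStatuses lists the statuses named in these notes that mean a
// scan is still live. The real set in Healarr may contain more entries.
var activeStatuses = map[string]bool{
	"running":     true,
	"enumerating": true,
	"scanning":    true,
	"paused":      true,
}

// friendlyLabels maps raw statuses to display text. Only the
// "enumerating" label comes from the release notes; the rest are assumed.
var friendlyLabels = map[string]string{
	"enumerating": "Counting files",
	"scanning":    "Scanning",
	"paused":      "Paused",
}

// isActive is the single check every page would share, instead of each
// page comparing against the literal "running".
func isActive(status string) bool {
	return activeStatuses[status]
}

// label returns the friendly text for a status, falling back to the raw
// status when no label is registered.
func label(status string) string {
	if l, ok := friendlyLabels[status]; ok {
		return l
	}
	return status
}
```

Keeping the set in one place means a status added later only has to be registered once for every page to treat the scan as live.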