github derekshreds/Snacks v2.8.1
Snacks v2.8.1

5 hours ago

Automated Video Library Encoder

A patch release focused on cluster reliability and UI cost. The release closes a class of orphan-job bugs by giving the master-side completion path the same videoCopy keep/delete signal the local path uses, snapshotting per-job options at dispatch time so a settings save mid-encode can't flip the keep/delete decision, and adding a worker → master GET /api/cluster/jobs/{id} probe so a worker's pending-completion queue can distinguish "master is still tracking this" from "master forgot — drop it locally" instead of re-POSTing forever.

Worker-side cleanup DELETEs now retry on a 5s/15s/45s ladder instead of fire-and-forget, the watchdog can recover from a wedged _activeDownloads entry instead of deadlocking on it, and the Re-evaluate Queue Skipped→Unseen flip no longer wipes LastScannedAt (so it doesn't trigger a probe storm).

The cluster slot reconciler is extracted to a unit-testable static, the dispatch loop pre-claims _activeUploads synchronously to close a slot-double-booking gap, and a new /api/cluster/cluster-state endpoint lets a worker's UI proxy the master's authoritative node list / settings / version info instead of rendering its stale local _nodes map. The dashboard now also patches itself live on a worker via the worker's own SignalR hub.

CSS-side, the work-item list and progress bars move their hot animations to composite-only properties (transform/opacity on pseudo-elements) plus content-visibility for offscreen rows, dropping per-frame paint cost dramatically on long queues.


Bug Fixes & Reliability

Cluster keep/delete decision shared between local and remote paths

  • videoCopy flag now flows worker → master — the worker's JobCompletion payload (and its persisted PendingCompletion row) carries a VideoCopy boolean set from WorkItem.OutputUsedVideoCopy. The master's HandleRemoteCompletion consumes it directly instead of recomputing mux-pass eligibility from options + probe. Without this, the HEVC-at-target-bitrate copy path (which CalculateBitrates enables independently of EncodingMode) was invisible to the master's recompute and a remote mux-pass output whose size landed at-or-above the source was silently deleted as "no savings."
  • New shared ShouldKeepEncodedOutput / IsMuxPass helpers — both the local ConvertVideoAsync and the cluster HandleRemoteCompletion paths now call the same helper, so the keep/delete predicate (smaller-than-source, or configured audio fan-out, or video-copy mux pass) can't drift apart again. Backed by a new EncodedOutputKeepDecisionTests suite.
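
The shared predicate described above can be modeled as a pure function. The following is a minimal Python sketch rather than the project's C# helper: the function and parameter names are hypothetical, and only the three keep conditions (smaller than source, configured audio fan-out, video-copy mux pass) come from these notes.

```python
def should_keep_encoded_output(
    output_size: int,
    source_size: int,
    audio_fanout_configured: bool,
    video_copy: bool,
) -> bool:
    """Return True when the encoded output should be kept."""
    # Condition 1: the encode actually saved space.
    if output_size < source_size:
        return True
    # Condition 2: audio fan-out is configured, so size is not the criterion.
    if audio_fanout_configured:
        return True
    # Condition 3: a video-copy mux pass may land at or above the source size.
    return video_copy
```

Because both the local and the remote path would call one function like this, a mux-pass output that is not smaller than the source is kept on either path as long as the video-copy flag reaches the caller.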

Per-job options snapshot at dispatch

  • _dispatchedOptions snapshot keyed by jobId — the master now snapshots the encoder options each remote job was dispatched under and reads them back in the completion path via ResolveJobOptions(jobId). Previously every read of _transcodingService.GetLastOptions() picked up whatever was current on the master at that moment, so a settings save / Re-evaluate Queue / any in-flight options mutation between dispatch and completion silently flipped keep/delete and output-path decisions for jobs already encoding. Crash-recovery rebuilds the snapshot from the persisted folder/node assignment.
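
The snapshot-at-dispatch idea can be illustrated with a small dictionary keyed by job id. This is a Python sketch under stated assumptions: the class and method names are hypothetical, and the fallback to live options when no snapshot exists is an assumption for illustration, not something these notes confirm.

```python
import copy
from typing import Any, Callable, Dict

Options = Dict[str, Any]


class DispatchedOptions:
    """Hypothetical stand-in for the master's per-job options snapshot."""

    def __init__(self, get_current: Callable[[], Options]) -> None:
        self._get_current = get_current
        self._by_job: Dict[str, Options] = {}

    def snapshot(self, job_id: str) -> None:
        # Deep-copy at dispatch time so later settings saves cannot leak in.
        self._by_job[job_id] = copy.deepcopy(self._get_current())

    def resolve(self, job_id: str) -> Options:
        # Assumed fallback: use the live options when no snapshot exists.
        snap = self._by_job.get(job_id)
        return snap if snap is not None else self._get_current()

    def clear(self, job_id: str) -> None:
        # Called from every terminal path so the map does not grow forever.
        self._by_job.pop(job_id, None)
```

A job dispatched before a settings save keeps resolving to the options it was dispatched under until its entry is cleared.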

Worker pending-completion probe + master lookup endpoint

  • New GET /api/cluster/jobs/{jobId} endpoint — workers ask the master "do you still have this job tracked?" before re-POSTing a pending completion. Always returns 200 with { tracked, phase, recovering }. A real 404 means "endpoint doesn't exist on this master" (older build), so the worker falls back to the pre-probe behavior of just re-POSTing instead of dropping a legitimate completion. recovering: true tells the worker to skip this heartbeat because the master is still rebuilding _remoteJobs from the DB after a restart.
  • Workers drop on tracked: false, retry on tracked: true — a worker holding a persisted pending completion now drops the entry locally (and cleans the temp dir + cached files) when the master reports no record. Without the probe, an orphaned completion sat in the worker's pending-completions.json forever, re-POSTing on every heartbeat and never reclaiming disk.
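
The worker-side decision after probing the master can be summarized as a small pure function. This Python sketch is illustrative only: the function name and the returned action strings are hypothetical, and treating non-200/non-404 responses as "skip this heartbeat" is an assumption not stated in the notes.

```python
from typing import Mapping, Optional


def decide_pending_completion(
    status_code: int, body: Optional[Mapping[str, object]]
) -> str:
    """Map a probe response to one of: 'repost', 'skip', 'drop'."""
    if status_code == 404:
        # Older master without the probe endpoint: keep the pre-probe behavior.
        return "repost"
    if status_code != 200 or body is None:
        # Assumption: any other failure is transient, so try again next tick.
        return "skip"
    if body.get("recovering"):
        # Master is still rebuilding its job table after a restart.
        return "skip"
    if body.get("tracked"):
        # Master still wants this completion.
        return "repost"
    # Master has no record: drop locally and reclaim disk.
    return "drop"
```

The ordering matters: the 404 and recovering checks run before the tracked check, so a legitimate completion is never dropped because of an old master or a master that is mid-recovery.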

Worker-cleanup DELETE retries

  • DeleteWorkerResourceWithRetryAsync (5s/15s/45s ladder) — every "tell the worker to drop its cached output / job state" path (failure handling, requeue, cancellation, finalize, no-savings finalize) is now routed through this retry helper instead of fire-and-forget try { ... } catch { }. A network blip on the single attempt previously left workers holding orphan jobs indefinitely; the worker's own duplicate-detection makes the retry idempotent. 404 from the worker is treated as success (already dropped).
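
A retry ladder of this shape can be sketched as follows. This is a Python illustration, not the project's helper: the function name, the injectable sleep, and the reading of the ladder as "one initial attempt, then retries after 5s, 15s, and 45s" are assumptions. Only the delay values and the 404-is-success rule come from the notes.

```python
import time
from typing import Callable, Sequence


def delete_with_retry(
    send_delete: Callable[[], int],
    delays: Sequence[float] = (5, 15, 45),
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call send_delete (returns an HTTP status) until it succeeds or the ladder ends."""
    attempts = len(delays) + 1  # assumed: initial try plus one retry per delay
    for attempt in range(attempts):
        try:
            status = send_delete()
            # 404 means the worker already dropped the resource.
            if 200 <= status < 300 or status == 404:
                return True
        except OSError:
            pass  # network blip; fall through to the backoff
        if attempt < len(delays):
            sleep(delays[attempt])
    return False
```

Retrying is only safe because the DELETE is idempotent on the worker side, which the notes attribute to the worker's own duplicate detection.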

Watchdog can recover from wedged downloads

  • HandleNodeFailureAsync no longer deadlocks on a leaked _activeDownloads entry — the previous unconditional early-exit on _activeDownloads.ContainsKey(jobId) was a deadlock: a hung HttpClient.SendAsync never throws, so the finally that clears _activeDownloads never runs, and every subsequent watchdog tick returned right there. The check now respects a 2-minute freshness window keyed off WorkItem.LastUpdatedAt; a download that hasn't advanced in that window is force-cleared so the work item can recover.
  • Download path Touches LastUpdatedAt on every chunk + retry — ClusterFileTransferService.DownloadOutputAsync now calls workItem.Touch() on every successful chunk and on every retry. Without this, a download that was making progress but couldn't compute a percent (no X-Total-Size header, or a status not yet flipped to Downloading) was invisible to the watchdog's freshness gate; a long backoff (12+ failures × 60s) could otherwise wedge the freshness window on an actively-retrying download.
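
The freshness gate described in the two bullets above can be sketched as a pure check. This Python sketch uses hypothetical names, and whether the boundary at exactly two minutes counts as fresh is an assumption. The 2-minute window and the idea of keying it off the last-updated timestamp come from the notes.

```python
from datetime import datetime, timedelta

FRESHNESS_WINDOW = timedelta(minutes=2)


def should_skip_failure_handling(
    download_active: bool, last_updated_at: datetime, now: datetime
) -> bool:
    """True when the watchdog should leave the job alone because its download is live."""
    if not download_active:
        return False
    # A download that touched the work item recently is still making progress.
    # Anything older is treated as wedged and falls through to recovery.
    return now - last_updated_at <= FRESHNESS_WINDOW
```

This only works if the download path refreshes the timestamp on every chunk and every retry; otherwise a healthy but slow transfer would look stale to the gate.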

Finalize / no-savings completion cleanup ordering

  • FinalizeCompletionAsync wraps the WAL transition + side effects in try/catch/finally — any throw in the WAL transition or output placement now (a) marks the work item Failed with a concrete error message, (b) clears the assignment so the watchdog can see it, and (c) still runs the worker-side DELETE and slot release. Without this, a duplicate Downloading→Completed transition from a heartbeat re-fire would leave the worker holding the job forever and the slot reserved.
  • No-savings finalize is also wrapped in try/finally — a DB / SignalR / ledger throw in the no-savings path no longer orphans the work item in _remoteJobs with Status=Completed (which the watchdog wouldn't sweep because it only sweeps active states). The cleanup block now also clears _activeUploads, _dispatchedOptions, and the _idle_grace_ / _validation_ retry-count entries that the success path was already clearing.
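
The cleanup ordering above follows a familiar try/except/finally shape. The following Python sketch is a simplified model with hypothetical names: `transition` stands in for the WAL transition plus output placement, and `cleanup` stands in for the worker-side DELETE and slot release.

```python
from typing import Callable, Dict


def finalize_completion(
    item: Dict[str, object],
    transition: Callable[[], None],
    cleanup: Callable[[], None],
) -> None:
    """Finalize a job; cleanup runs whether or not the transition throws."""
    try:
        transition()
        item["status"] = "Completed"
    except Exception as exc:
        # Surface a concrete error and clear the assignment so a watchdog can see it.
        item["status"] = "Failed"
        item["error"] = f"finalize failed: {exc}"
        item["assigned_node"] = None
    finally:
        # Always release worker-side state and the slot.
        cleanup()
```

The key property is that a throw in the middle of finalization can no longer skip the release step, which is what previously left the worker holding the job and the slot reserved.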

Re-evaluate doesn't wipe LastScannedAt

  • ReevaluateUnseenAsync keeps LastScannedAt — flipping a row Skipped→Unseen no longer sets its LastScannedAt to null. The cached AudioStreams / SubtitleStreams / Bitrate / Codec are still on the row, and the scanner's freshness check uses LastScannedAt to skip re-probing. Wiping it forced a probe storm on every Re-evaluate, which turned the button into an expensive recovery action that was painful to use.

Cluster Slot Accounting

SlotReconciler extracted to a unit-testable static

  • Snacks/Services/SlotReconciler.cs (new) — the per-entry decision the master makes when reconciling a worker's optimistic ActiveJobs list against the worker's heartbeat report (keep on encoding, keep on master-side upload, keep on master-side download, keep on _remoteJobs ownership, drop otherwise) is now a pure static (ShouldPreserveEntry) so the rules can be exercised directly from SlotReconcilerTests without spinning up ClusterService. The behavior is unchanged from v2.8.0 — only the location moved.
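
The per-entry rule lends itself to a pure function that can be tested without any service state. This Python sketch mirrors the four keep cases and the drop case listed above. The signature is hypothetical and does not reflect the actual parameters of ShouldPreserveEntry.

```python
from typing import Collection


def should_preserve_entry(
    job_id: str,
    worker_reported_encoding: Collection[str],
    active_uploads: Collection[str],
    active_downloads: Collection[str],
    remote_jobs: Collection[str],
) -> bool:
    """Decide whether an optimistic ActiveJobs entry survives reconciliation."""
    return (
        job_id in worker_reported_encoding  # worker says it is encoding
        or job_id in active_uploads         # master is still uploading the source
        or job_id in active_downloads       # master is downloading the output
        or job_id in remote_jobs            # master still owns the job
    )
```

With the rule isolated like this, each case can be asserted directly, which is the point of the extraction.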

Dispatch loop pre-claims _activeUploads synchronously

  • Pre-claim under _dispatchLock closes a heartbeat-timing gap — the gap between adding a slot to bestNode.ActiveJobs and the Task.Run-ed DispatchToNodeAsync body actually running was wide enough for a heartbeat reconciliation between awaits to strip the entry, after which the next dispatch tick would double-book the slot. The dispatch loop now inserts into _activeUploads synchronously alongside ActiveJobs.Add. DispatchToNodeAsync takes a new preClaimed: true flag so its own TryAdd doesn't falsely trip the duplicate-detection block. An outer try/catch around the Task.Run body releases both the _activeUploads entry and the ActiveJobs slot if the dispatch task throws synchronously before the body's own cleanup paths register.
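
The pre-claim pattern can be modeled with two sets guarded by one lock. This Python sketch is a simplified, synchronous model with hypothetical names. The real code dispatches via Task.Run, which is not reproduced here.

```python
import threading
from typing import Callable, Set


class Dispatcher:
    """Hypothetical model of the master's dispatch bookkeeping."""

    def __init__(self) -> None:
        self._dispatch_lock = threading.Lock()
        self.active_jobs: Set[str] = set()     # stands in for bestNode.ActiveJobs
        self.active_uploads: Set[str] = set()  # stands in for _activeUploads

    def pre_claim(self, job_id: str) -> bool:
        # Both sets change in one critical section, so a reconciliation pass
        # that checks active_uploads finds the claim as soon as the slot exists.
        with self._dispatch_lock:
            if job_id in self.active_uploads:
                return False
            self.active_jobs.add(job_id)
            self.active_uploads.add(job_id)
            return True

    def release(self, job_id: str) -> None:
        with self._dispatch_lock:
            self.active_jobs.discard(job_id)
            self.active_uploads.discard(job_id)

    def dispatch(
        self, job_id: str, body: Callable[[str], None], pre_claimed: bool = False
    ) -> bool:
        # Without pre_claimed, the claim here doubles as duplicate detection.
        if not pre_claimed and not self.pre_claim(job_id):
            return False
        try:
            body(job_id)
            return True
        except Exception:
            # Release both entries if the dispatch body fails.
            self.release(job_id)
            return False
```

The `pre_claimed` flag exists so the dispatch body does not mistake the caller's own claim for a duplicate.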

Receiving-phase chips show the right device on workers

  • ClusterController.RegisterMetadata records the assigned device — the master sends a JobMetadata register call before the first chunk arrives. The metadata's DeviceId is now stashed in ClusterNodeJobService._receivingDeviceIds and surfaced through GetActiveJobs so a worker's self-card chip counts the slot as occupied during the upload phase, not only once encoding begins. The receiving-state cleanup paths (ClearReceivingJob, ExpireStaleReceiving, RegisterEncodingJob) all clear the entry symmetrically.

Worker UI parity with master

/api/cluster/cluster-state endpoint

  • New master endpoint returns the master's authoritative cluster view — the full nodes list (including the master itself, with runtime status / per-device slot fill / paused flag / completed + failed counts stamped on BuildSelfNode) plus the master-side nodeSettings (concurrency overrides + 4K routing flags). Workers proxy this through their own admin status so a browser viewing a worker's UI sees the same cluster state the master sees instead of the worker's stale local _nodes map (which only contains peers the worker discovered directly and is never reconciled against the master's heartbeats).

Worker proxy in ClusterAdminController.GetStatus / GetNodeSettings

  • Worker /api/cluster-admin/status proxies the master view — when running as a worker, the nodes list and nodeSettings come from the master's cluster-state (cached against the worker's own heartbeat poll, with a one-shot fallback fetch on cold start, and a final fallback to the worker's local view on master-fetch failure so the page still loads during a transient master outage). localActiveJobs on a worker now sources from ClusterService.GetActiveJobs() instead of TranscodingService.GetActiveLocalJobs(), so the self-card chips reflect a slot the moment a file starts uploading — not only once encoding begins.
  • selfVersion is now part of the status payload — the dashboard renders • v{version} next to the self-card status string and next to every remote node's status string (read from the version field already present on ClusterNode).
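
The three-level fallback a worker uses to build its status view (heartbeat cache, one-shot fetch, local view) can be sketched as below. This is a Python illustration with hypothetical names. The returned source label exists only to make the chosen branch visible and is not part of any real payload.

```python
from typing import Callable, List, Optional, Tuple


def resolve_cluster_view(
    cached: Optional[List[str]],
    fetch_master: Callable[[], List[str]],
    local_view: List[str],
) -> Tuple[List[str], str]:
    """Pick the node list a worker should show, plus where it came from."""
    if cached is not None:
        return cached, "cache"  # populated by the heartbeat poll
    try:
        return fetch_master(), "master"  # cold start: one-shot fetch
    except OSError:
        return local_view, "local"  # master unreachable: page still loads
```

The last branch trades accuracy for availability: during a master outage the worker page shows its own possibly stale view rather than failing to load.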

Worker hub mirrors master deltas

  • RefreshMasterClusterStateAsync runs on every worker heartbeat tick — polls the master's cluster-state, diffs against a worker-local cache, and re-broadcasts deltas (WorkerConnected / WorkerUpdated / WorkerDisconnected / NodeSettingsChanged) on the worker's own SignalR hub. The dashboard's existing handlers patch a worker UI live without the user having to refresh. First successful refresh after startup populates the cache silently to avoid duplicate "node connected" toasts on a freshly-loaded page.
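
The poll, diff, and re-broadcast loop can be modeled as a cache that returns the events to emit. This Python sketch uses hypothetical names and covers only node deltas, not NodeSettingsChanged. The event names and the silent first refresh come from the notes.

```python
from typing import Dict, List, Optional, Tuple

Node = Dict[str, object]


class MasterStateMirror:
    """Hypothetical worker-local cache of the master's node list."""

    def __init__(self) -> None:
        self._cache: Optional[Dict[str, Node]] = None

    def refresh(self, nodes: Dict[str, Node]) -> List[Tuple[str, str]]:
        previous = self._cache
        # Store copies so later mutation of the input cannot alter the cache.
        self._cache = {node_id: dict(node) for node_id, node in nodes.items()}
        if previous is None:
            return []  # first refresh populates the cache silently
        events: List[Tuple[str, str]] = []
        for node_id, node in nodes.items():
            if node_id not in previous:
                events.append(("WorkerConnected", node_id))
            elif previous[node_id] != node:
                events.append(("WorkerUpdated", node_id))
        for node_id in previous:
            if node_id not in nodes:
                events.append(("WorkerDisconnected", node_id))
        return events
```

Returning an empty list on the first refresh is what avoids a burst of "node connected" toasts on a freshly loaded page.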

Dashboard CSS — composite-only animations

Status-badge halo pulse

  • pulse-badge keyframes now animate transform + opacity instead of box-shadow spread — the halo is now produced by an ::after pseudo-element with a static box-shadow whose transform and opacity animate. Both are composite-only, so the browser never repaints the halo; the compositor only transforms and fades the already-painted layer. position: relative; isolation: isolate; on the badge contains the pseudo's z-index: -1. Status-Uploading / -Downloading / -Processing all share the new pseudo and the same animation.

Progress-bar shimmer + shine

  • Shimmer + shine carried by a single composite-only translateX overlay — the original background-position shimmer (paint per frame) on the bar itself is dropped; the moving highlight is now an ::after whose translateX animates with will-change: transform; contain: paint;. Gradient stops are slightly widened so the band's plateau is more substantial — it now carries both the original shine sweep and the subtle motion the shimmer was producing on the bar.

Work-item rows

  • transition narrowed from all to transform, box-shadow — transition: all was triggering transitions over every computed-style delta the browser saw on every SignalR-driven attribute change (class flips, style updates from updateWorkItemDom). Narrowed to the two properties the hover state actually animates.
  • content-visibility: auto + contain-intrinsic-size: 0 8rem — offscreen work-item rows now skip layout/paint/animation work entirely. Onscreen rows render exactly as before; offscreen rows cost nothing until they scroll into view. The intrinsic size keeps scrollbar height stable and prevents reflow as rows enter/exit.

Files Changed

Cluster keep/delete + per-job options

  • Snacks/Models/JobAssignment.cs — JobCompletion.VideoCopy field for worker → master propagation
  • Snacks/Models/PendingCompletion.cs — VideoCopy persisted so a worker-restart-then-retry sends the correct flag
  • Snacks/Models/WorkItem.cs — OutputUsedVideoCopy set by ConvertVideoAsync when the output is kept
  • Snacks/Services/TranscodingService.cs — shared ShouldKeepEncodedOutput + IsMuxPass helpers; HandleRemoteCompletion accepts the worker's videoCopy; local path stashes the actual outcome on the work item for cluster forwarding
  • Snacks/Services/ClusterService.cs — _dispatchedOptions snapshot dictionary; ResolveJobOptions helper used everywhere _transcodingService.GetLastOptions() was previously read; cleanup symmetric across success/failure/cancel/recovery paths
  • Snacks/Services/ClusterNodeJobService.cs — PersistCompletedJobAsync accepts and persists videoCopy; the heartbeat retry flow probes the master before re-POSTing
  • Snacks/Controllers/ClusterController.cs — new GET jobs/{jobId} lookup endpoint; /complete now reads videoCopy from the body
  • Snacks.Tests/Pipeline/EncodedOutputKeepDecisionTests.cs (new) — covers the shared keep/delete predicate
  • Snacks.Tests/Settings/SettingsRoundTripTests.cs — adjusted for the new options shape

Worker DELETE retries + watchdog freshness

  • Snacks/Services/ClusterService.cs — DeleteWorkerResourceWithRetryAsync (5s/15s/45s) replaces every fire-and-forget try { DeleteAsync } catch { }; HandleNodeFailureAsync honors a 2-minute freshness window before force-clearing wedged _activeDownloads
  • Snacks/Services/ClusterFileTransferService.cs — workItem.Touch() on every chunk and on every retry so the watchdog freshness window doesn't expire on actively-retrying downloads
  • Snacks/Services/TranscodingService.cs — FinalizeCompletionAsync wraps WAL + side effects in try/catch/finally with a Failed fallback; no-savings finalize wrapped in try/finally with full cleanup
  • Snacks/Data/MediaFileRepository.cs — ReevaluateUnseenAsync no longer wipes LastScannedAt

Slot reconciler extraction

  • Snacks/Services/SlotReconciler.cs (new) — ShouldPreserveEntry pure static
  • Snacks/Services/ClusterService.cs — heartbeat reconciliation loop now calls SlotReconciler.ShouldPreserveEntry; dispatch pre-claim under _dispatchLock; DispatchToNodeAsync takes preClaimed
  • Snacks.Tests/Cluster/SlotReconcilerTests.cs (new) — covers all four preserve cases (encoding / uploading / downloading / _remoteJobs ownership) + the stale-drop case
  • Snacks/Services/ClusterNodeJobService.cs — _receivingDeviceIds map; RegisterReceivingDevice capture; receiving-phase ActiveJobInfo carries the assigned DeviceId
  • Snacks/Controllers/ClusterController.cs — RegisterMetadata calls RegisterReceivingDevice before the first chunk arrives

Worker UI master-state proxy

  • Snacks/Controllers/ClusterController.cs — new GET /api/cluster/cluster-state returning master's nodes (with BuildSelfNode enriched with runtime status) + nodeSettings
  • Snacks/Controllers/ClusterAdminController.cs — GetStatus proxies the master view on workers (cache + cold-start fetch + local fallback); GetNodeSettings proxies likewise; selfVersion added to the status payload; worker localActiveJobs sourced from ClusterService.GetActiveJobs()
  • Snacks/Services/ClusterService.cs — _cachedMasterNodes / _cachedMasterNodeSettings cache; RefreshMasterClusterStateAsync polled on every heartbeat tick; broadcasts WorkerConnected / WorkerUpdated / WorkerDisconnected / NodeSettingsChanged deltas on the worker's own hub; helpers (IsLocalEncodingEnabled, LocalCompletedJobs, LocalFailedJobs, GetEnrichedSelfActiveJobs) consumed by cluster-state
  • Snacks/wwwroot/js/cluster/cluster-dashboard.js — _selfVersion pulled from status; self-card and remote-node-card status lines render • v{version}

CSS

  • Snacks/wwwroot/css/site.css — progress-bar shimmer/shine moved to a composite-only ::after translateX overlay; status-badge halo moved to an ::after pseudo with composite-only transform + opacity keyframes; .work-item transition narrowed to transform, box-shadow; .work-item gains content-visibility: auto + contain-intrinsic-size: 0 8rem

Version bumps

  • Snacks/Controllers/HomeController.cs
  • Snacks/Services/ClusterDiscoveryService.cs — protocol version bump to 2.8.1
  • Snacks/Views/Shared/_Layout.cshtml
  • README.md
  • build-and-export.bat
  • electron-app/package.json / package-lock.json

Full documentation: README.md
