Snacks v2.8.1
Automated Video Library Encoder
A patch release focused on cluster reliability and UI cost. The release closes a class of orphan-job bugs by giving the master-side completion path the same videoCopy keep/delete signal the local path uses, snapshotting per-job options at dispatch time so a settings save mid-encode can't flip the keep/delete decision, and adding a worker → master GET /api/cluster/jobs/{id} probe so a worker's pending-completion queue can distinguish "master is still tracking this" from "master forgot — drop it locally" instead of re-POSTing forever. Worker-side cleanup DELETEs now retry on a 5s/15s/45s ladder instead of fire-and-forget, the watchdog can recover from a wedged _activeDownloads entry instead of deadlocking on it, and the Re-evaluate Queue Skipped→Unseen flip no longer wipes LastScannedAt (so it doesn't trigger a probe storm). The cluster slot reconciler is extracted to a unit-testable static, the dispatch loop pre-claims _activeUploads synchronously to close a slot-double-booking gap, and a new /api/cluster/cluster-state endpoint lets a worker's UI proxy the master's authoritative node list / settings / version info instead of rendering its stale local _nodes map. The dashboard now also patches itself live on a worker via the worker's own SignalR hub. CSS-side, the work-item list and progress bars move their hot animations to composite-only properties (transform/opacity on pseudo-elements) plus content-visibility for offscreen rows, dropping per-frame paint cost dramatically on long queues.
Bug Fixes & Reliability
Cluster keep/delete decision shared between local and remote paths
- `videoCopy` flag now flows worker → master: the worker's `JobCompletion` payload (and its persisted `PendingCompletion` row) carries a `VideoCopy` boolean set from `WorkItem.OutputUsedVideoCopy`. The master's `HandleRemoteCompletion` consumes it directly instead of recomputing mux-pass eligibility from options + probe. Without this, the HEVC-at-target-bitrate copy path (which `CalculateBitrates` enables independently of `EncodingMode`) was invisible to the master's recompute and a remote mux-pass output whose size landed at-or-above the source was silently deleted as "no savings."
- New shared `ShouldKeepEncodedOutput` / `IsMuxPass` helpers: both the local `ConvertVideoAsync` and the cluster `HandleRemoteCompletion` paths now call the same helper, so the keep/delete predicate (smaller-than-source, or configured audio fan-out, or video-copy mux pass) can't drift apart again. Backed by a new `EncodedOutputKeepDecisionTests` suite.
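The three-way predicate described above can be sketched roughly as follows. This is an illustrative TypeScript sketch, not the project's code: the `EncodeOutcome` shape and its field names are assumptions made up for the example; only the three keep conditions come from the notes.

```typescript
// Hypothetical input shape; field names are assumptions for illustration only.
interface EncodeOutcome {
  sourceBytes: number;
  outputBytes: number;
  audioFanOutConfigured: boolean; // extra audio outputs were requested
  videoCopy: boolean;             // video stream was copied (mux pass), not re-encoded
}

// Keep the output when any one of the three conditions named in the notes holds.
function shouldKeepEncodedOutput(o: EncodeOutcome): boolean {
  const smallerThanSource = o.outputBytes < o.sourceBytes;
  return smallerThanSource || o.audioFanOutConfigured || o.videoCopy;
}

// A mux-pass output that landed slightly above the source size is still kept,
// while the same sizes without the video-copy flag count as "no savings".
const muxPass: EncodeOutcome = {
  sourceBytes: 1000,
  outputBytes: 1010,
  audioFanOutConfigured: false,
  videoCopy: true,
};
const noSavings: EncodeOutcome = { ...muxPass, videoCopy: false };
console.log(shouldKeepEncodedOutput(muxPass), shouldKeepEncodedOutput(noSavings));
```

Because both completion paths would call one function, the worker-reported `videoCopy` flag only has to be threaded into a single place.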
Per-job options snapshot at dispatch
- `_dispatchedOptions` snapshot keyed by `jobId`: the master now snapshots the encoder options each remote job was dispatched under and reads them back in the completion path via `ResolveJobOptions(jobId)`. Previously every read of `_transcodingService.GetLastOptions()` picked up whatever was current on the master at that moment, so a settings save / Re-evaluate Queue / any in-flight options mutation between dispatch and completion silently flipped keep/delete and output-path decisions for jobs already encoding. Crash-recovery rebuilds the snapshot from the persisted folder/node assignment.
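A minimal sketch of the snapshot-at-dispatch idea, in TypeScript for illustration. The `EncoderOptions` fields, the `DispatchTracker` class, and the fallback to current options when no snapshot exists are all assumptions for the example, not the actual `ClusterService` behavior.

```typescript
// Hypothetical options shape; the real options object has different fields.
interface EncoderOptions {
  targetBitrateKbps: number;
  deleteOriginal: boolean;
}

class DispatchTracker {
  private dispatched = new Map<string, EncoderOptions>();

  constructor(private current: EncoderOptions) {}

  // A settings save replaces the live options but must not touch snapshots.
  saveSettings(next: EncoderOptions): void {
    this.current = next;
  }

  // Copy the options at dispatch time so later saves cannot mutate them.
  dispatch(jobId: string): void {
    this.dispatched.set(jobId, { ...this.current });
  }

  // Completion path reads the snapshot; falling back to the live options
  // when none exists is an assumption made for this sketch.
  resolveJobOptions(jobId: string): EncoderOptions {
    return this.dispatched.get(jobId) ?? this.current;
  }

  // Cleanup once the job is finished, failed, or cancelled.
  release(jobId: string): void {
    this.dispatched.delete(jobId);
  }
}
```

With this shape, a job dispatched before a settings save keeps resolving to the options it was dispatched under until `release` is called.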
Worker pending-completion probe + master lookup endpoint
- New `GET /api/cluster/jobs/{jobId}` endpoint: workers ask the master "do you still have this job tracked?" before re-POSTing a pending completion. Always returns 200 with `{ tracked, phase, recovering }`. A real 404 means "endpoint doesn't exist on this master" (older build), so the worker falls back to the pre-probe behavior of just re-POSTing instead of dropping a legitimate completion. `recovering: true` tells the worker to skip this heartbeat because the master is still rebuilding `_remoteJobs` from the DB after a restart.
- Workers drop on `tracked: false`, retry on `tracked: true`: a worker holding a persisted pending completion now drops the entry locally (and cleans the temp dir + cached files) when the master reports no record. Without the probe, an orphaned completion sat in the worker's `pending-completions.json` forever, re-POSTing on every heartbeat and never reclaiming disk.
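The worker-side decision table implied by the two bullets above could look like this. It is a sketch under stated assumptions: the `ProbeResult` and `Action` types are hypothetical, and treating any status other than 200 or 404 as "skip and try again next heartbeat" is a guess, since the notes do not say what happens in that case.

```typescript
type ProbeBody = { tracked: boolean; phase: string | null; recovering: boolean };
type ProbeResult = { status: number; body?: ProbeBody };
type Action = "repost" | "drop" | "skip";

function decidePendingCompletion(probe: ProbeResult): Action {
  // Older master without the lookup endpoint: fall back to plain re-POST.
  if (probe.status === 404) return "repost";
  // Assumption: any other non-200 answer is treated as transient.
  if (probe.status !== 200 || !probe.body) return "skip";
  // Master is still rebuilding its job table after a restart.
  if (probe.body.recovering) return "skip";
  // Tracked: retry the completion POST. Not tracked: drop it and reclaim disk.
  return probe.body.tracked ? "repost" : "drop";
}

console.log(decidePendingCompletion({ status: 200, body: { tracked: false, phase: null, recovering: false } }));
```

Keeping the 404 branch separate from `tracked: false` is what prevents a worker from discarding a legitimate completion when it talks to an older master.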
Worker-cleanup DELETE retries
- `DeleteWorkerResourceWithRetryAsync` (5s/15s/45s ladder): every "tell the worker to drop its cached output / job state" path (failure handling, requeue, cancellation, finalize, no-savings finalize) is now routed through this retry helper instead of fire-and-forget `try { ... } catch { }`. A network blip on the single attempt previously left workers holding orphan jobs indefinitely; the worker's own duplicate-detection makes the retry idempotent. 404 from the worker is treated as success (already dropped).
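A rough TypeScript sketch of a retry ladder with 404-as-success. The function name, the injected `attempt` and `sleep` callbacks, and the choice of one initial attempt followed by one retry per ladder step are assumptions for illustration; the notes only specify the 5s/15s/45s delays and the 404 rule.

```typescript
const LADDER_MS = [5000, 15000, 45000];

// `attempt` resolves to an HTTP status code and rejects on a network error.
// `sleep` is injectable so the ladder can be exercised without real timers.
async function deleteWithRetry(
  attempt: () => Promise<number>,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<boolean> {
  for (let i = 0; ; i++) {
    try {
      const status = await attempt();
      // 2xx means deleted; 404 means the worker already dropped it.
      if ((status >= 200 && status < 300) || status === 404) return true;
    } catch {
      // Network blip: fall through to the next rung of the ladder.
    }
    if (i >= LADDER_MS.length) return false; // ladder exhausted
    await sleep(LADDER_MS[i]);
  }
}
```

The retry is only safe because the receiving side is idempotent, which the notes attribute to the worker's own duplicate detection.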
Watchdog can recover from wedged downloads
- `HandleNodeFailureAsync` no longer deadlocks on a leaked `_activeDownloads` entry: the previous unconditional early-exit on `_activeDownloads.ContainsKey(jobId)` was a deadlock. A hung `HttpClient.SendAsync` never throws, so the `finally` that clears `_activeDownloads` never runs, and every subsequent watchdog tick returned right there. The check now respects a 2-minute freshness window keyed off `WorkItem.LastUpdatedAt`; a download that hasn't advanced in that window is force-cleared so the work item can recover.
- Download path touches `LastUpdatedAt` on every chunk + retry: `ClusterFileTransferService.DownloadOutputAsync` now calls `workItem.Touch()` on every successful chunk and on every retry. Without this, a download that was making progress but couldn't compute a percent (no `X-Total-Size` header, or a status not yet flipped to `Downloading`) was invisible to the watchdog's freshness gate; a long backoff (12+ failures × 60s) could otherwise wedge the freshness window on an actively-retrying download.
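The freshness gate can be pictured with a small sketch like the one below. Names such as `TrackedDownload`, `touch`, and `onWatchdogTick` are hypothetical; only the 2-minute window and the touch-on-chunk/retry behavior come from the notes.

```typescript
const FRESHNESS_WINDOW_MS = 2 * 60 * 1000;

interface TrackedDownload {
  lastUpdatedAt: number; // epoch milliseconds of the last chunk or retry
}

// Called by the download path on every successful chunk and every retry.
function touch(d: TrackedDownload, now: number): void {
  d.lastUpdatedAt = now;
}

// Watchdog tick: wait while the download is fresh, otherwise force-clear
// the leaked entry so failure handling can proceed.
function onWatchdogTick(
  active: Map<string, TrackedDownload>,
  jobId: string,
  now: number,
): "wait" | "recover" {
  const d = active.get(jobId);
  if (d && now - d.lastUpdatedAt < FRESHNESS_WINDOW_MS) return "wait";
  active.delete(jobId);
  return "recover";
}
```

The old behavior corresponds to returning "wait" whenever the map contains the job, which is why a hung request could block recovery indefinitely.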
Finalize / no-savings completion cleanup ordering
- `FinalizeCompletionAsync` wraps the WAL transition + side effects in try/catch/finally: any throw in the WAL transition or output placement now (a) marks the work item `Failed` with a concrete error message, (b) clears the assignment so the watchdog can see it, and (c) still runs the worker-side `DELETE` and slot release. Without this, a duplicate `Downloading→Completed` transition from a heartbeat re-fire would leave the worker holding the job forever and the slot reserved.
- No-savings finalize is also wrapped in try/finally: a DB / SignalR / ledger throw in the no-savings path no longer orphans the work item in `_remoteJobs` with `Status=Completed` (which the watchdog wouldn't sweep because it only sweeps active states). The cleanup block now also clears `_activeUploads`, `_dispatchedOptions`, and the _idle_grace_/_validation_retry-count entries that the success path was already clearing.
Re-evaluate doesn't wipe LastScannedAt
- `ReevaluateUnseenAsync` keeps `LastScannedAt`: flipping a row Skipped→Unseen no longer null-ifies its `LastScannedAt`. The cached `AudioStreams` / `SubtitleStreams` / `Bitrate` / `Codec` are still on the row, and the scanner's freshness check uses `LastScannedAt` to skip re-probing. Wiping it forced a probe storm on every Re-evaluate, which was the expensive recovery action that made the button painful to use.
Cluster Slot Accounting
SlotReconciler extracted to a unit-testable static
- `Snacks/Services/SlotReconciler.cs` (new): the per-entry decision the master makes when reconciling a worker's optimistic `ActiveJobs` list against the worker's heartbeat report (keep on encoding, keep on master-side upload, keep on master-side download, keep on `_remoteJobs` ownership, drop otherwise) is now a pure static (`ShouldPreserveEntry`) so the rules can be exercised directly from `SlotReconcilerTests` without spinning up `ClusterService`. The behavior is unchanged from v2.8.0; only the location moved.
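As an illustration of why a pure function makes these rules easy to test, here is a TypeScript sketch of the four preserve cases. The `ReconcileContext` shape and its set-based fields are assumptions for the example and do not mirror the real `ShouldPreserveEntry` signature.

```typescript
// Hypothetical inputs: each set holds job ids known from one source.
interface ReconcileContext {
  reportedByWorker: Set<string>; // worker's heartbeat says it is encoding these
  activeUploads: Set<string>;    // master is still uploading the source
  activeDownloads: Set<string>;  // master is still downloading the output
  remoteJobs: Set<string>;       // master still owns the job record
}

// Pure decision: no I/O and no shared state, so it can be tested directly.
function shouldPreserveEntry(jobId: string, ctx: ReconcileContext): boolean {
  return (
    ctx.reportedByWorker.has(jobId) ||
    ctx.activeUploads.has(jobId) ||
    ctx.activeDownloads.has(jobId) ||
    ctx.remoteJobs.has(jobId)
  );
}

const ctx: ReconcileContext = {
  reportedByWorker: new Set(["job-a"]),
  activeUploads: new Set(["job-b"]),
  activeDownloads: new Set<string>(),
  remoteJobs: new Set(["job-c"]),
};
const optimistic = ["job-a", "job-b", "job-c", "job-stale"];
const reconciled = optimistic.filter((id) => shouldPreserveEntry(id, ctx));
console.log(reconciled);
```

Only `job-stale` is dropped here, since none of the four sources vouches for it.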
Dispatch loop pre-claims _activeUploads synchronously
- Pre-claim under `_dispatchLock` closes a heartbeat-timing gap: the gap between adding a slot to `bestNode.ActiveJobs` and the `Task.Run`-ed `DispatchToNodeAsync` body actually running was wide enough for a heartbeat reconciliation between awaits to strip the entry, after which the next dispatch tick would double-book the slot. The dispatch loop now inserts into `_activeUploads` synchronously alongside `ActiveJobs.Add`. `DispatchToNodeAsync` takes a new `preClaimed: true` flag so its own `TryAdd` doesn't falsely trip the duplicate-detection block. An outer `try/catch` around the `Task.Run` body releases both the `_activeUploads` entry and the `ActiveJobs` slot if the dispatch task throws synchronously before the body's own cleanup paths register.
Receiving-phase chips show the right device on workers
- `ClusterController.RegisterMetadata` records the assigned device: the master sends a `JobMetadata` register call before the first chunk arrives. The metadata's `DeviceId` is now stashed in `ClusterNodeJobService._receivingDeviceIds` and surfaced through `GetActiveJobs` so a worker's self-card chip counts the slot as occupied during the upload phase, not only once encoding begins. The receiving-state cleanup paths (`ClearReceivingJob`, `ExpireStaleReceiving`, `RegisterEncodingJob`) all clear the entry symmetrically.
Worker UI parity with master
/api/cluster/cluster-state endpoint
- New master endpoint returns the master's authoritative cluster view: the full `nodes` list (including the master itself, with runtime status / per-device slot fill / paused flag / completed + failed counts stamped on `BuildSelfNode`) plus the master-side `nodeSettings` (concurrency overrides + 4K routing flags). Workers proxy this through their own admin status so a browser viewing a worker's UI sees the same cluster state the master sees instead of the worker's stale local `_nodes` map (which only contains peers the worker discovered directly and is never reconciled against the master's heartbeats).
Worker proxy in ClusterAdminController.GetStatus / GetNodeSettings
- Worker `/api/cluster-admin/status` proxies the master view: when running as a worker, the `nodes` list and `nodeSettings` come from the master's `cluster-state` (cached against the worker's own heartbeat poll, with a one-shot fallback fetch on cold start, and a final fallback to the worker's local view on master-fetch failure so the page still loads during a transient master outage). `localActiveJobs` on a worker now sources from `ClusterService.GetActiveJobs()` instead of `TranscodingService.GetActiveLocalJobs()`, so the self-card chips reflect a slot the moment a file starts uploading, not only once encoding begins.
- `selfVersion` is now part of the status payload: the dashboard renders `• v{version}` next to the self-card status string and next to every remote node's status string (read from the `version` field already present on `ClusterNode`).
Worker hub mirrors master deltas
- `RefreshMasterClusterStateAsync` runs on every worker heartbeat tick: polls the master's `cluster-state`, diffs against a worker-local cache, and re-broadcasts deltas (`WorkerConnected` / `WorkerUpdated` / `WorkerDisconnected` / `NodeSettingsChanged`) on the worker's own SignalR hub. The dashboard's existing handlers patch a worker UI live without the user having to refresh. First successful refresh after startup populates the cache silently to avoid duplicate "node connected" toasts on a freshly-loaded page.
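The poll-diff-rebroadcast step might be sketched like this. The `NodeView` fields, the `diffNodes` helper, and the use of a `null` cache to represent "no refresh yet" are assumptions for illustration; `NodeSettingsChanged` is left out to keep the sketch short.

```typescript
// Hypothetical node shape; the real node model carries more fields.
interface NodeView {
  id: string;
  status: string;
  version: string;
}

type NodeEvent = "WorkerConnected" | "WorkerUpdated" | "WorkerDisconnected";
type Delta = { event: NodeEvent; node: NodeView };

// Compare the freshly polled node list with the cached one and list the
// events to re-broadcast. A null cache means this is the first refresh,
// which populates silently and emits nothing.
function diffNodes(cached: Map<string, NodeView> | null, fresh: NodeView[]): Delta[] {
  if (cached === null) return [];
  const deltas: Delta[] = [];
  const seen = new Set<string>();
  fresh.forEach((node) => {
    seen.add(node.id);
    const prev = cached.get(node.id);
    if (!prev) {
      deltas.push({ event: "WorkerConnected", node });
    } else if (prev.status !== node.status || prev.version !== node.version) {
      deltas.push({ event: "WorkerUpdated", node });
    }
  });
  cached.forEach((prev, id) => {
    if (!seen.has(id)) deltas.push({ event: "WorkerDisconnected", node: prev });
  });
  return deltas;
}
```

After each call the caller would replace its cache with the fresh list, so the next tick diffs against what the dashboard has already been told.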
Dashboard CSS — composite-only animations
Status-badge halo pulse
- `pulse-badge` keyframes now animate `transform` + `opacity` instead of `box-shadow` spread: the halo is now produced by an `::after` pseudo-element with a static `box-shadow` whose transform and opacity animate. Both are composite-only, so the GPU never repaints the halo. `position: relative; isolation: isolate;` on the badge contains the pseudo's `z-index: -1`. Status-Uploading / -Downloading / -Processing all share the new pseudo and the same animation.
Progress-bar shimmer + shine
- Shimmer + shine carried by a single composite-only `translateX` overlay: the original `background-position` shimmer (paint per frame) on the bar itself is dropped; the moving highlight is now an `::after` whose translateX animates with `will-change: transform; contain: paint;`. Gradient stops are slightly widened so the band's plateau is more substantial, and it now carries both the original shine sweep and the subtle motion the shimmer was producing on the bar.
Work-item rows
- `transition` narrowed from `all` to `transform, box-shadow`: `transition: all` was triggering transitions over every computed-style delta the browser saw on every SignalR-driven attribute change (class flips, style updates from `updateWorkItemDom`). Narrowed to the two properties the hover state actually animates.
- `content-visibility: auto` + `contain-intrinsic-size: 0 8rem`: offscreen work-item rows now skip layout/paint/animation work entirely. Onscreen rows render exactly as before; offscreen rows cost nothing until they scroll into view. The intrinsic size keeps scrollbar height stable and prevents reflow as rows enter/exit.
Files Changed
Cluster keep/delete + per-job options
- `Snacks/Models/JobAssignment.cs`: `JobCompletion.VideoCopy` field for worker → master propagation
- `Snacks/Models/PendingCompletion.cs`: `VideoCopy` persisted so a worker-restart-then-retry sends the correct flag
- `Snacks/Models/WorkItem.cs`: `OutputUsedVideoCopy` set by `ConvertVideoAsync` when the output is kept
- `Snacks/Services/TranscodingService.cs`: shared `ShouldKeepEncodedOutput` + `IsMuxPass` helpers; `HandleRemoteCompletion` accepts the worker's `videoCopy`; local path stashes the actual outcome on the work item for cluster forwarding
- `Snacks/Services/ClusterService.cs`: `_dispatchedOptions` snapshot dictionary; `ResolveJobOptions` helper used everywhere `_transcodingService.GetLastOptions()` was previously read; cleanup symmetric across success/failure/cancel/recovery paths
- `Snacks/Services/ClusterNodeJobService.cs`: `PersistCompletedJobAsync` accepts and persists `videoCopy`; the heartbeat retry flow probes the master before re-POSTing
- `Snacks/Controllers/ClusterController.cs`: new `GET jobs/{jobId}` lookup endpoint; `/complete` now reads `videoCopy` from the body
- `Snacks.Tests/Pipeline/EncodedOutputKeepDecisionTests.cs` (new): covers the shared keep/delete predicate
- `Snacks.Tests/Settings/SettingsRoundTripTests.cs`: adjusted for the new options shape
Worker DELETE retries + watchdog freshness
- `Snacks/Services/ClusterService.cs`: `DeleteWorkerResourceWithRetryAsync` (5s/15s/45s) replaces every fire-and-forget `try { DeleteAsync } catch { }`; `HandleNodeFailureAsync` honors a 2-minute freshness window before force-clearing wedged `_activeDownloads`
- `Snacks/Services/ClusterFileTransferService.cs`: `workItem.Touch()` on every chunk and on every retry so the watchdog freshness window doesn't expire on actively-retrying downloads
- `Snacks/Services/TranscodingService.cs`: `FinalizeCompletionAsync` wraps WAL + side effects in try/catch/finally with a `Failed` fallback; no-savings finalize wrapped in try/finally with full cleanup
- `Snacks/Data/MediaFileRepository.cs`: `ReevaluateUnseenAsync` no longer wipes `LastScannedAt`
Slot reconciler extraction
- `Snacks/Services/SlotReconciler.cs` (new): `ShouldPreserveEntry` pure static
- `Snacks/Services/ClusterService.cs`: heartbeat reconciliation loop now calls `SlotReconciler.ShouldPreserveEntry`; dispatch pre-claim under `_dispatchLock`; `DispatchToNodeAsync` takes `preClaimed`
- `Snacks.Tests/Cluster/SlotReconcilerTests.cs` (new): covers all four preserve cases (encoding / uploading / downloading / `_remoteJobs` ownership) + the stale-drop case
- `Snacks/Services/ClusterNodeJobService.cs`: `_receivingDeviceIds` map; `RegisterReceivingDevice` capture; receiving-phase `ActiveJobInfo` carries the assigned `DeviceId`
- `Snacks/Controllers/ClusterController.cs`: `RegisterMetadata` calls `RegisterReceivingDevice` before the first chunk arrives
Worker UI master-state proxy
- `Snacks/Controllers/ClusterController.cs`: new `GET /api/cluster/cluster-state` returning master's `nodes` (with `BuildSelfNode` enriched with runtime status) + `nodeSettings`
- `Snacks/Controllers/ClusterAdminController.cs`: `GetStatus` proxies the master view on workers (cache + cold-start fetch + local fallback); `GetNodeSettings` proxies likewise; `selfVersion` added to the status payload; worker `localActiveJobs` sourced from `ClusterService.GetActiveJobs()`
- `Snacks/Services/ClusterService.cs`: `_cachedMasterNodes` / `_cachedMasterNodeSettings` cache; `RefreshMasterClusterStateAsync` polled on every heartbeat tick; broadcasts `WorkerConnected` / `WorkerUpdated` / `WorkerDisconnected` / `NodeSettingsChanged` deltas on the worker's own hub; helpers (`IsLocalEncodingEnabled`, `LocalCompletedJobs`, `LocalFailedJobs`, `GetEnrichedSelfActiveJobs`) consumed by `cluster-state`
- `Snacks/wwwroot/js/cluster/cluster-dashboard.js`: `_selfVersion` pulled from status; self-card and remote-node-card status lines render `• v{version}`
CSS
- `Snacks/wwwroot/css/site.css`: progress-bar shimmer/shine moved to a composite-only `::after` translateX overlay; status-badge halo moved to an `::after` pseudo with composite-only `transform` + `opacity` keyframes; `.work-item` transition narrowed to `transform, box-shadow`; `.work-item` gains `content-visibility: auto` + `contain-intrinsic-size: 0 8rem`
Version bumps
- `Snacks/Controllers/HomeController.cs`
- `Snacks/Services/ClusterDiscoveryService.cs`: protocol version bump to 2.8.1
- `Snacks/Views/Shared/_Layout.cshtml`
- `README.md`
- `build-and-export.bat`
- `electron-app/package.json` / `package-lock.json`
Full documentation: README.md