Snacks v2.2.2
Automated Video Library Encoder
Patch release fixing cluster node state machine race conditions that caused uploads and downloads to collide on the same worker node, and several related state-tracking bugs across the dispatch, heartbeat, and recovery code paths.
Bug Fixes
Node State Machine Overhaul
- New Uploading and Downloading node states -- the NodeStatus enum now includes Uploading and Downloading in addition to Online, Busy, Offline, Unreachable, and Paused. These states are set by the master when a file transfer begins and are owned exclusively by the transfer codepath -- the heartbeat loop cannot override them.
- Heartbeat no longer clobbers active transfers -- previously the heartbeat would see a node as "idle" (no currentJobId reported) during a master-side upload or download and reset ActiveWorkItemId to null. The next dispatch cycle would then assign a new job to that node, causing both the upload and download to run simultaneously in a loop. The heartbeat now skips nodes in Uploading or Downloading state entirely.
- Pause no longer overrides transfer state -- the isPaused heartbeat check was evaluated before the transfer state guard, allowing a pause during an active upload/download to clobber the node status. The transfer guard now takes priority.
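The check ordering described above can be modeled in a few lines. This is a minimal Python sketch, not the ClusterService.cs code: the enum member names come from these notes, while the numeric values, the Node fields, and the apply_heartbeat helper are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class NodeStatus(Enum):
    # Member names follow the release notes; the numeric values are assumed.
    ONLINE = auto()
    BUSY = auto()
    OFFLINE = auto()
    UNREACHABLE = auto()
    PAUSED = auto()
    UPLOADING = auto()
    DOWNLOADING = auto()


# States owned by the transfer codepath.
TRANSFER_STATES = {NodeStatus.UPLOADING, NodeStatus.DOWNLOADING}


@dataclass
class Node:
    status: NodeStatus = NodeStatus.ONLINE
    active_work_item_id: Optional[str] = None


def apply_heartbeat(node: Node, current_job_id: Optional[str], is_paused: bool) -> None:
    """Apply one heartbeat reply to a node (hypothetical helper)."""
    # Transfer guard comes first, so neither the pause check nor the
    # idle check can touch a node that is mid-transfer.
    if node.status in TRANSFER_STATES:
        return
    if is_paused:
        node.status = NodeStatus.PAUSED
        return
    if current_job_id is None:
        # The worker reports no job and no transfer is active: mark it idle.
        node.status = NodeStatus.ONLINE
        node.active_work_item_id = None
    else:
        node.status = NodeStatus.BUSY
        node.active_work_item_id = current_job_id
```

With this ordering, a heartbeat that reports no job and a pause request both leave an Uploading node untouched, which is the behaviour the three fixes above describe.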
Upload Lifecycle
- _activeUploads key leak fixed -- when a work item ID was swapped to a reused database ID mid-upload, the original key was never removed from _activeUploads and the finally block removed the wrong key. Upload tracking now correctly migrates the key on ID swap.
- Duplicate upload guard cleans up -- if _activeUploads.TryAdd rejects a duplicate, the node is now reset to Online and the work item is requeued instead of being silently dropped.
- Node timeout during upload requeues the work item -- previously, a node timeout cancelled the upload CTS, but DispatchToNodeAsync treated timeout-triggered cancellation identically to user cancellation and did not requeue. The work item was silently lost. Timeout cancellations are now detected via node.Status == Unreachable and properly requeued.
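The three upload fixes can be sketched together in Python. This is an illustrative model under stated assumptions, not the actual DispatchToNodeAsync code: a plain dict stands in for _activeUploads, status values are plain strings, and run_upload, UploadCancelled, and the transfer callback are hypothetical names.

```python
from typing import Callable, Dict, List, Optional


class UploadCancelled(Exception):
    """Stand-in for cancellation of the upload token (hypothetical)."""


class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.status = "Online"


def run_upload(
    active: Dict[str, str],
    queue: List[str],
    node: Node,
    item_id: str,
    reused_id: Optional[str] = None,
    transfer: Optional[Callable[[Node], None]] = None,
) -> bool:
    """Track one upload; returns True only if it ran to completion."""
    if item_id in active:
        # Duplicate guard (a rejected TryAdd): reset the node and requeue.
        node.status = "Online"
        queue.append(item_id)
        return False
    active[item_id] = node.name
    key = item_id
    node.status = "Uploading"
    try:
        if reused_id is not None:
            # ID swap: migrate the tracking entry so cleanup hits the live key.
            active[reused_id] = active.pop(key)
            key = reused_id
        if transfer is not None:
            transfer(node)
        return True  # the status after a successful upload is left to the caller
    except UploadCancelled:
        # A node timeout marks the node Unreachable before cancelling;
        # a user cancellation does not, so only timeouts are requeued.
        if node.status == "Unreachable":
            queue.append(key)
        return False
    finally:
        active.pop(key, None)  # removes the current key, never a stale one
```

Tracking the current key in a local variable is what lets the cleanup step remove the right entry after a swap.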
Download Lifecycle
- Node no longer released during download retries -- RetryDownloadAsync and DownloadOutputAsync schedule delayed retries (5s and 60s respectively), but the finally block previously released the node immediately. During the delay, dispatch could assign new work to the same node. The finally block now only releases the node when the job is fully complete (removed from _remoteJobs).
- Early return in HandleRemoteCompletionAsync no longer leaves node stuck -- if the job was concurrently cancelled and removed from _remoteJobs, the method returned before the try/finally block, leaving the node permanently stuck in Downloading state. The early return now detects and releases stuck nodes.
- Max download retries releases the node -- when download retries are exhausted, the node's ActiveWorkItemId and status are now properly cleared before requeuing.
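One way to picture the download fixes is a single release rule: the node is freed only once the job is no longer tracked. The Python sketch below models that rule under assumptions. A dict of retry counts stands in for _remoteJobs, retries are counted rather than scheduled with real delays, and the MAX_DOWNLOAD_RETRIES value, the release helper, and the download callback are hypothetical.

```python
from typing import Callable, Dict, List, Optional

MAX_DOWNLOAD_RETRIES = 3  # assumed limit; the notes do not state the real value


class Node:
    def __init__(self, job_id: Optional[str]) -> None:
        self.status = "Busy"
        self.active_work_item_id = job_id


def release(node: Node) -> None:
    node.status = "Online"
    node.active_work_item_id = None


def handle_remote_completion(
    node: Node,
    job_id: str,
    remote_jobs: Dict[str, int],
    queue: List[str],
    download: Callable[[str], bool],
) -> None:
    if job_id not in remote_jobs:
        # Early return: the job was cancelled concurrently. Release the
        # node if it is still pinned to this job instead of leaving it stuck.
        if node.active_work_item_id == job_id:
            release(node)
        return
    node.status = "Downloading"
    try:
        if download(job_id):
            del remote_jobs[job_id]  # fully complete
        else:
            remote_jobs[job_id] += 1  # a delayed retry would be scheduled here
            if remote_jobs[job_id] >= MAX_DOWNLOAD_RETRIES:
                del remote_jobs[job_id]  # retries exhausted
                queue.append(job_id)     # requeue the work item
    finally:
        # Release only when the job is gone; while a retry is pending the
        # node stays in Downloading so dispatch cannot hand it new work.
        if job_id not in remote_jobs:
            release(node)
```

While a retry is pending the job is still tracked, so the node keeps its Downloading status through the delay.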
Recovery Path
- Recovery downloads set Downloading state -- RecoverRemoteJobsAsync calls HandleRemoteCompletionAsync for recovered jobs but previously never set the node to Downloading first, leaving the download unprotected from heartbeat interference. Recovery now sets the node state before initiating the download.
- Recovery upload failures release the node -- all error paths in the recovery upload (duplicate guard, verification failure, exceptions) now reset the node to Online instead of leaving it stuck in Uploading.
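The recovery fixes follow the same pattern and can be sketched as below. This is a hedged Python model, not RecoverRemoteJobsAsync itself: the function names, the set standing in for the active-upload tracking, and the send, verify, and handle_completion callbacks are all assumptions.

```python
from typing import Callable, Set


class Node:
    def __init__(self) -> None:
        self.status = "Online"


def recover_download(node: Node, job_id: str,
                     handle_completion: Callable[[Node, str], None]) -> None:
    # Set the state before the download starts so a heartbeat that
    # arrives mid-download sees a transfer state and skips the node.
    node.status = "Downloading"
    handle_completion(node, job_id)


def recover_upload(node: Node, active: Set[str], item_id: str,
                   send: Callable[[str], None],
                   verify: Callable[[str], bool]) -> bool:
    node.status = "Uploading"
    ok = False
    try:
        if item_id in active:
            return False  # duplicate guard
        active.add(item_id)
        try:
            send(item_id)
            ok = verify(item_id)  # False means verification failed
        finally:
            active.discard(item_id)
        return ok
    except Exception:
        return False  # any failure during send or verify
    finally:
        # Single exit point: every failed path resets the node to Online.
        if not ok:
            node.status = "Online"
```

Routing all three failure modes through one finally block means a new error path cannot leave the node in Uploading by omission.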
Frontend
- Worker node panel shows new states -- the cluster worker node UI now displays "Uploading" and "Downloading" status with the correct enum values and colors.
Files Changed
Modified Files
- Snacks/Models/ClusterNode.cs -- added Uploading and Downloading to NodeStatus enum
- Snacks/Services/ClusterService.cs -- heartbeat state priority, upload/download lifecycle fixes, recovery path cleanup, node timeout requeue
- Snacks/wwwroot/js/transcoding.js -- updated NodeStatus enum mapping for new states
- Snacks/Controllers/HomeController.cs -- version bump
- Snacks/Services/ClusterDiscoveryService.cs -- version bump
- electron-app/package.json -- version bump
- README.md -- version bump
Full documentation: README.md