Snacks v2.10.0
Automated Video Library Encoder
A minor release built around fixing cluster network saturation. Every per-slot reservation in the cluster is now owned by a single authoritative `SlotLedger`, replacing the historical scheme in which slot truth was scattered across `ClusterNode.ActiveJobs`, `_activeUploads`, `_activeDownloads`, `_remoteJobs`, and worker heartbeats, any one of which could silently lose an entry and cause double-dispatch. A new master-side `TransferThrottle` gates how many uploads/downloads can be in flight (cluster-wide and per-node) and how fast they go (token-bucket MB/s caps), configured from a new Networking tab in Settings. The default per-node upload cap of 1 directly addresses the user-reported saturation pattern; settings changes take effect on the next chunk without rebuilding any state.

A new Cluster Logs page (`/cluster-logs`) surfaces a live tail of any node's operations log with a one-click ZIP export; the master proxies remote nodes' logs through the cluster shared-secret channel, so the operator never has to ssh into a worker. Two encoder/cluster fixes ride along: NVIDIA encodes now explicitly pin the NVDEC (cuvid) decoder instead of relying on `-hwaccel cuda`'s auto-attach, which silently fell back to a software decoder on some driver/setup combinations (datacenter Windows drivers, vGPU profiles) and pegged the CPU while NVENC kept the GPU encode path going; and the master now pulls subtitle sidecar files (`.srt` / `.ass` / `.vtt`) from the worker after a remote encode so they ride home alongside the main output before the worker temp dir is wiped on cleanup.
Authoritative slot ledger
SlotLedger — single source of truth for cluster slots
- `Snacks/Services/Slots/SlotLedger.cs` (new) — every per-slot reservation in the cluster lives in one `ConcurrentDictionary<jobId, SlotReservation>` behind a write-lock. `TryReserve` is an atomic capacity-check + insert (capacity is resolved through an injected `(nodeId, deviceId) => int` so the ledger doesn't have to know about `NodeSettings` / `HardwareDevice`), `Release` is idempotent and emits one `ReleaseReason`-attributed log line, and `UpdateProgress` is a heartbeat hook that mutates only Phase / Progress and never deletes. The dashboard chip list is no longer a write target — `Snapshot(nodeId)` materialises a wire-compatible `ActiveJobInfo` list at broadcast time, so worker heartbeats can never silently lose a slot the master is still using. A sketch of the contract follows this list.
- `Snacks/Services/Slots/SlotPhase.cs` (new) — `Reserved → Uploading → Encoding → Downloading → Completing` lifecycle states with `ToWireString` / `FromWireString` mappers so the dashboard's existing phase strings (Uploading / Encoding / Downloading) keep rendering unchanged. The pre-transfer `Reserved` phase surfaces as `Uploading` until bytes flow.
- `Snacks/Services/Slots/SlotReservation.cs` (new) — `JobId` / `NodeId` / `DeviceId` are immutable for the row's lifetime; `FileName` / `Phase` / `Progress` / `PhaseEnteredAt` are mutable through ledger methods only.
- `Snacks/Services/Slots/ReleaseReason.cs` (new) — `Completed` / `NoSavings` / `Cancelled` / `NodeFailed` / `DispatchThrew` / `ValidationFailed` / `DownloadRetriesExhausted` / `Recovered`. Logged on every release so the lifecycle of any leaked-or-not slot is auditable from the operations log alone.
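The contract in miniature, assuming a plain dictionary behind one lock stands in for the shipped `ConcurrentDictionary`; beyond the `TryReserve` / `Release` / `UpdateProgress` / `Snapshot` names, the never-delete heartbeat rule, and the `ReleaseReason` values, everything here is illustrative, not the shipped implementation:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum ReleaseReason { Completed, NoSavings, Cancelled, NodeFailed, DispatchThrew, ValidationFailed, DownloadRetriesExhausted, Recovered }

public sealed class SlotReservation
{
    public required string JobId { get; init; }      // immutable for the row's lifetime
    public required string NodeId { get; init; }
    public required string DeviceId { get; init; }
    public string FileName { get; set; } = "";
    public string Phase { get; set; } = "Reserved";  // mutated only through ledger methods
    public int Progress { get; set; }
}

public sealed class SlotLedgerSketch
{
    private readonly Dictionary<string, SlotReservation> _rows = new();
    private readonly object _gate = new();
    private readonly Func<string, string, int> _capacity; // injected (nodeId, deviceId) => int

    public SlotLedgerSketch(Func<string, string, int> capacity) => _capacity = capacity;

    public bool TryReserve(string nodeId, string deviceId, string jobId, string fileName)
    {
        lock (_gate) // capacity check + insert are a single critical section
        {
            if (_rows.ContainsKey(jobId)) return false; // already reserved somewhere
            int inUse = _rows.Values.Count(r => r.NodeId == nodeId && r.DeviceId == deviceId);
            if (inUse >= _capacity(nodeId, deviceId)) return false;
            _rows[jobId] = new SlotReservation { JobId = jobId, NodeId = nodeId, DeviceId = deviceId, FileName = fileName };
            return true;
        }
    }

    public void Release(string jobId, ReleaseReason reason)
    {
        lock (_gate) // idempotent: a second release of the same job is a no-op
            if (_rows.Remove(jobId))
                Console.WriteLine($"slot released job={jobId} reason={reason}"); // stands in for the operations log
    }

    // Heartbeat hook: mutates Phase/Progress only, never deletes a row.
    public bool UpdateProgress(string jobId, string phase, int progress)
    {
        lock (_gate)
        {
            if (!_rows.TryGetValue(jobId, out var row)) return false; // anomaly: no matching reservation
            row.Phase = phase;
            row.Progress = progress;
            return true;
        }
    }

    // Dashboard projection, materialised at broadcast time from ledger truth.
    public IReadOnlyList<SlotReservation> Snapshot(string nodeId)
    {
        lock (_gate) return _rows.Values.Where(r => r.NodeId == nodeId).ToList();
    }
}
```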
Heartbeat reconcile is observation-only
`ReconcileMultiSlotHeartbeatAsync` no longer rewrites `ClusterNode.ActiveJobs` — the legacy `SlotReconciler.ShouldPreserveEntry` ladder (and its preserve-rule unit tests) is removed. A heartbeat now updates `Phase` / `Progress` on the ledger rows the worker is reporting, surfaces an anomaly log line for any job the worker reports without a matching reservation, and rebuilds `ClusterNode.ActiveJobs` from `_slotLedger.Snapshot(nodeId)` so the dashboard sees ledger truth on every tick. This eliminates Race A (a heartbeat strips an entry between dispatch and the dispatch task running on the thread pool) and Race B (recovery-time double-attach) by construction.

- `Snacks/Services/SlotReconciler.cs` (deleted) — along with its companion `Snacks.Tests/Cluster/SlotReconcilerTests.cs`. Replaced wholesale by the ledger contract and its own test suite.
Atomic reservation at dispatch
- `ClusterService.ProcessQueueAsync` — dispatch now calls `_slotLedger.TryReserve(nodeId, deviceId, jobId, fileName)` instead of synthesising an `ActiveJobInfo` and pushing it onto the node. On a lost CAS race (another tick / recovery path grabbed the same slot first) the work item is silently re-queued and the dispatch loop tries the next candidate. The phase transitions to `Uploading` immediately after the reservation, and a legacy projection entry is stamped on `ClusterNode.ActiveJobs` so any UI broadcast between dispatch and the next heartbeat shows the new reservation without waiting. See the sketch after this list.
- Dispatch-pool filter accepts `Uploading` / `Downloading` — `IsDispatchableStatus` now allows the new transient transfer statuses in addition to `Online` / `Busy`. Without this, the very first dispatch in a tick would knock the node out of the candidate pool for the rest of that tick — silently serialising every multi-slot node to one upload at a time.
- `ReleaseActiveSlot(node, jobId, reason)` — every failure / cancel / completion path now passes the reason it's releasing (`NodeFailed`, `DispatchThrew`, `Cancelled`, etc.) so the log line attributes the cause.
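A compressed sketch of the dispatch step, reusing `SlotLedgerSketch` from the ledger sketch above; the queue and candidate record types are illustrative stand-ins, not the shipped ones:

```csharp
using System.Collections.Generic;

// Hypothetical stand-ins for the real queue/candidate types.
public sealed record DispatchCandidate(string NodeId, string DeviceId);
public sealed record QueuedJob(string JobId, string FileName);

public static class DispatchSketch
{
    // Returns the candidate that won the reservation, or null to re-queue the job.
    public static DispatchCandidate? TryDispatch(
        SlotLedgerSketch ledger, QueuedJob job, IEnumerable<DispatchCandidate> candidates)
    {
        foreach (var c in candidates)
        {
            // Atomic reserve: losing the race (another tick or the recovery
            // path grabbed the slot first) just moves on to the next candidate.
            if (!ledger.TryReserve(c.NodeId, c.DeviceId, job.JobId, job.FileName))
                continue;
            ledger.UpdateProgress(job.JobId, "Uploading", 0); // phase flips immediately after the reserve
            return c;
        }
        return null; // no capacity anywhere: the work item is silently re-queued
    }
}
```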
Persisted device assignment for restart recovery
- `MediaFile.DispatchedDeviceId` (new column) — the `HardwareDevice.DeviceId` the master allocated at dispatch time (`intel`, `nvidia`, `cpu`, …) is persisted alongside the existing `Remote*` fields so `SlotLedger` recovery rebuilds the same per-device occupancy after a master restart without having to query worker heartbeats. Cleared on completion / failure / re-queue alongside the other Remote fields.
- Migration `20260505003522_AddMediaFileDispatchedDeviceId` — adds the column and is auto-applied at startup. A sketch of its shape follows this list.
- `MediaFileRepository.AssignToRemoteNodeAsync(..., string? dispatchedDeviceId = null)` — optional parameter so older callers leave the field untouched; `ClearRemoteAssignmentAsync` always clears it so a re-queued job picks a fresh slot on the next tick.
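For reference, a column-add migration of this shape in EF Core looks roughly like the following; the migration name is from the notes, while the table name and column type are assumptions:

```csharp
using Microsoft.EntityFrameworkCore.Migrations;

public partial class AddMediaFileDispatchedDeviceId : Migration
{
    protected override void Up(MigrationBuilder migrationBuilder) =>
        migrationBuilder.AddColumn<string>(
            name: "DispatchedDeviceId",
            table: "MediaFiles",   // assumed table name
            type: "TEXT",          // assumed column type
            nullable: true);       // null = not currently dispatched to a device

    protected override void Down(MigrationBuilder migrationBuilder) =>
        migrationBuilder.DropColumn(name: "DispatchedDeviceId", table: "MediaFiles");
}
```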
Tests
- `Snacks.Tests/Cluster/SlotLedgerTests.cs` (new) — pins capacity gating, idempotent release, observation-only heartbeat updates, the dashboard projection, and the migration's contract that heartbeats can no longer delete reservations. Race A and Race B (the cases the legacy preserve-rule reconciler indirectly guarded) become trivially safe under the ledger and have explicit regression tests.
Transfer throttle + Networking settings
TransferThrottle — concurrency + bandwidth gate
- `Snacks/Services/Cluster/TransferThrottle.cs` (new) — two orthogonal layers, both master-side. Concurrency caps track in-flight uploads/downloads via atomic CAS-only counters (cluster-wide and per-node), with the cap re-read from the live settings snapshot on every acquire attempt — settings changes take effect on the next acquire without rebuilding any state, and holders that are over-cap stay in flight until they release naturally. Bandwidth caps use `TokenBucketRateLimiter` with a 100 ms replenish cadence; chunks acquire tokens equal to their byte count before the master sends them. Acquires are sliced into 1 MB pieces so a single call never asks the bucket for more permits than its `TokenLimit`, which means a settings change mid-transfer (raise, lower, or unlimited) takes effect on the very next slice rather than killing the upload. Per-node buckets are rebuilt only when the cap numerically changes — saving an unrelated setting (e.g. chunk size) doesn't disrupt an active transfer's pacing. A sketch of the slice loop follows this list.
- Cap of `0` = unlimited for every layer; the corresponding gate is bypassed entirely (no acquire).
- Orthogonal to the `SlotLedger` — the ledger decides whether a job can be dispatched at all (per-device hardware capacity); the throttle decides how many of those dispatched jobs can transfer bytes concurrently and how fast.
- `ForgetNode(nodeId)` — drops per-node counters and rate limiters when a node disconnects permanently so a rejoin starts from a clean slate.
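The slice loop is the part worth seeing in code. A minimal sketch built on .NET's `System.Threading.RateLimiting.TokenBucketRateLimiter`, assuming the documented 100 ms cadence and 1 MB slices; the class shape and option values are illustrative, not the shipped implementation:

```csharp
using System;
using System.Threading;
using System.Threading.RateLimiting;
using System.Threading.Tasks;

public sealed class BandwidthGateSketch
{
    private const int SliceBytes = 1 << 20;            // 1 MB acquire slices
    private volatile TokenBucketRateLimiter? _limiter; // swapped atomically on a cap change

    public void SetCapMBps(int mbps)                   // 0 = unlimited: drop the gate entirely
    {
        if (mbps == 0) { _limiter = null; return; }
        long bytesPerSecond = (long)mbps << 20;
        int perPeriod = (int)Math.Min(int.MaxValue, bytesPerSecond / 10);
        _limiter = new TokenBucketRateLimiter(new TokenBucketRateLimiterOptions
        {
            TokenLimit = Math.Max(SliceBytes, perPeriod),   // a 1 MB slice always fits in the bucket
            TokensPerPeriod = perPeriod,                    // replenished every 100 ms
            ReplenishmentPeriod = TimeSpan.FromMilliseconds(100),
            QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
            QueueLimit = int.MaxValue,
            AutoReplenishment = true,
        });
    }

    // Acquire tokens equal to the chunk's byte count, in 1 MB slices, so a
    // single call never asks the bucket for more permits than its TokenLimit.
    public async Task AcquireAsync(long chunkBytes, CancellationToken ct)
    {
        for (long remaining = chunkBytes; remaining > 0; remaining -= SliceBytes)
        {
            var limiter = _limiter;            // re-read per slice: a live cap change applies here
            if (limiter is null) return;       // went unlimited mid-transfer
            int permits = (int)Math.Min(SliceBytes, remaining);
            using var lease = await limiter.AcquireAsync(permits, ct); // waits for replenishment
        }
    }
}
```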
NetworkingSettings — the configuration surface
- `Snacks/Models/NetworkingSettings.cs` (new) — `MaxConcurrentUploads` / `MaxConcurrentUploadsPerNode` / `MaxConcurrentDownloads` / `MaxConcurrentDownloadsPerNode` / `MaxUploadMBps` / `MaxUploadMBpsPerNode` / `MaxDownloadMBps` / `MaxDownloadMBpsPerNode` / `ChunkSizeMB`. All caps default to `0` (unlimited) so an empty config preserves pre-2.10 behaviour — except `MaxConcurrentUploadsPerNode` / `MaxConcurrentDownloadsPerNode`, which default to `1` so a node with multiple device slots receives one file at a time.
- `Snacks/Services/NetworkingSettingsService.cs` (new) — loads / persists `${SNACKS_WORK_DIR}/config/networking.json`; atomic write-and-replace; fires a `Changed` event after every successful save so the throttle rebuilds rate limiters without a service restart. Validation rejects negative caps and clamps `ChunkSizeMB` to `[4, 256]`. An illustrative `networking.json` follows this list.
- `Snacks/Controllers/NetworkingController.cs` (new) — `GET /api/networking` and `POST /api/networking`. Validation errors return 400 so the UI can surface them inline.
- `Snacks/Views/Shared/_NetworkingSettings.cshtml` (new) — a Networking tab partial under Settings. Concurrency caps, bandwidth caps, and chunk size in three sections, each with the "0 = unlimited" affordance.
- `Snacks/wwwroot/js/settings/panels/networking-panel.js` (new) — load / save logic. Idempotent init; lazy first-fetch on activation; inline save-status feedback.
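For orientation, an illustrative `networking.json`, assuming the JSON property names mirror the C# members one-to-one; apart from the per-node `1` defaults, the non-zero values shown are examples, not defaults:

```json
{
  "MaxConcurrentUploads": 0,
  "MaxConcurrentUploadsPerNode": 1,
  "MaxConcurrentDownloads": 0,
  "MaxConcurrentDownloadsPerNode": 1,
  "MaxUploadMBps": 0,
  "MaxUploadMBpsPerNode": 40,
  "MaxDownloadMBps": 0,
  "MaxDownloadMBpsPerNode": 0,
  "ChunkSizeMB": 50
}
```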
Wired into the transfer paths
- `ClusterFileTransferService` reads `_throttle.ChunkSizeBytes` — the historical 50 MB constant is now the fallback when no throttle is supplied (slave-only deployments). Master uploads / downloads ask the throttle for the current chunk size on every transfer, and call `AcquireUploadBandwidthAsync` / `AcquireDownloadBandwidthAsync` per chunk so caps apply at byte granularity. The throttle handle is acquired around the upload via `AcquireUploadAsync` and disposed in `finally` so a thrown exception releases the slot. A sketch of the per-chunk gating follows this list.
- `ClusterService` injects the throttle — `TransferThrottle?` is constructor-injected (null on slave-only deployments) and threaded into `ClusterFileTransferService`. The download-side gate fires from `DispatchToNodeAsync` after the encode completes and before the master pulls the encoded output back.
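A sketch of the upload path's gating, assuming a throttle surface with the names from the notes; the interface, the `sendChunk` delegate, and all signatures here are assumed shapes, not the shipped ones:

```csharp
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Assumed throttle surface; the shipped TransferThrottle's exact signatures are not documented here.
public interface ITransferThrottle
{
    int ChunkSizeBytes { get; }
    Task<IAsyncDisposable> AcquireUploadAsync(string nodeId, CancellationToken ct);
    Task AcquireUploadBandwidthAsync(string nodeId, int bytes, CancellationToken ct);
}

public static class UploadPathSketch
{
    public static async Task UploadAsync(
        ITransferThrottle? throttle, string nodeId, Stream source,
        Func<ReadOnlyMemory<byte>, CancellationToken, Task> sendChunk, CancellationToken ct)
    {
        // Slave-only deployments pass no throttle and keep the historical 50 MB chunks.
        int chunkSize = throttle?.ChunkSizeBytes ?? 50 * 1024 * 1024;
        IAsyncDisposable? slot = throttle is null
            ? null
            : await throttle.AcquireUploadAsync(nodeId, ct);   // concurrency gate around the whole upload
        try
        {
            var buffer = new byte[chunkSize];
            int read;
            while ((read = await source.ReadAsync(buffer, ct)) > 0)
            {
                if (throttle is not null)                      // bandwidth gate fires per chunk, byte-granular
                    await throttle.AcquireUploadBandwidthAsync(nodeId, read, ct);
                await sendChunk(buffer.AsMemory(0, read), ct);
            }
        }
        finally
        {
            if (slot is not null) await slot.DisposeAsync();   // a thrown exception still frees the slot
        }
    }
}
```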
Tests
- `Snacks.Tests/Cluster/TransferThrottleTests.cs` (new) — pins concurrency cap enforcement, the unlimited-cap fast path, settings live-mutation taking effect on the next acquire without ratcheting, the slice-loop's tolerance to a limiter swap mid-transfer, per-node counter scoping, and the "lower a cap from 4 to 1; existing holders run to completion; new acquires queue" semantics that make this safe to live-tune.
Cluster Logs page
/cluster-logs — live tail and ZIP export, any node
- `Snacks/Controllers/ClusterLogsController.cs` (new) — pure view route at `/cluster-logs`; the data is served by `DiagnosticsController`'s now-cluster-aware endpoints.
- `Snacks/Views/ClusterLogs/Index.cshtml` (new) — node picker, line-count selector (100 / 200 / 500 / 1000 / 5000), auto-refresh toggle (defaults on at 5 s polling), Refresh button, Download .zip button. Tail rendered in a monospaced `<pre>` with virtualised scroll.
- `Snacks/wwwroot/js/cluster-logs/cluster-logs.js` (new) — registers as the page handler for `/cluster-logs`; polls `GET /api/diagnostics/log?nodeId=…&lines=…` every 5 s; cancels in-flight fetches on node / lines change; rebuilds the download anchor's `href` on every selection change so the ZIP comes from the right node.
- Logs nav button in the header — sits between Dashboard and the Pause / Settings / Browse Library cluster.
DiagnosticsController is now cluster-aware
- `GET /api/diagnostics/log` and `GET /api/diagnostics/logs.zip` accept `?nodeId=` — when omitted (or matching the local node) the request is served from the local `logs/` directory; when the id is a remote node the request is proxied to that node's `/api/cluster/diagnostics/*` mirror over the cluster shared-secret channel. The pattern matches `DashboardController`'s remote aggregations.
- `ClusterController` mirrors `GET diagnostics/log` and `GET diagnostics/logs.zip` — the mirror endpoints the master proxies to. Logs may contain file paths and job names but no credentials, so the existing cluster shared-secret gate is sufficient.
- Proxy responses surface 503 with `{ nodeId, hostname, lastSeen }` when a remote node is offline so the polling UI keeps its last good snapshot and recovers automatically when the node reappears.
- ZIP proxy streams the upstream body straight through to `Response.Body` without buffering server-side, and carries through the upstream `Content-Disposition` so the saved file is labeled with the remote node's hostname. See the sketch after this list.
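A sketch of the streaming proxy path. `IsLocalNode`, `NodeBaseUrl`, `WriteLocalZipAsync`, and `_clusterClient` are hypothetical helpers standing in for the shared-secret plumbing; only the endpoint shapes and the no-buffering behaviour come from the notes:

```csharp
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

public class DiagnosticsProxySketchController : ControllerBase
{
    private readonly HttpClient _clusterClient = new(); // assumed: carries the cluster shared secret

    [HttpGet("api/diagnostics/logs.zip")]
    public async Task GetLogsZip([FromQuery] string? nodeId, CancellationToken ct)
    {
        if (IsLocalNode(nodeId)) { await WriteLocalZipAsync(ct); return; }

        // ResponseHeadersRead: start relaying as soon as the headers arrive,
        // never buffering the archive server-side.
        using var upstream = await _clusterClient.GetAsync(
            $"{NodeBaseUrl(nodeId!)}/api/cluster/diagnostics/logs.zip",
            HttpCompletionOption.ResponseHeadersRead, ct);

        Response.StatusCode = (int)upstream.StatusCode;
        if (upstream.Content.Headers.ContentDisposition is { } cd)
            Response.Headers["Content-Disposition"] = cd.ToString(); // keep the remote hostname label
        Response.ContentType = "application/zip";

        await upstream.Content.CopyToAsync(Response.Body, ct);       // straight through to the client
    }

    private bool IsLocalNode(string? nodeId) => string.IsNullOrEmpty(nodeId);    // stand-in
    private string NodeBaseUrl(string nodeId) => $"http://{nodeId}";             // stand-in
    private Task WriteLocalZipAsync(CancellationToken ct) => Task.CompletedTask; // stand-in
}
```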
Shared LogArchiveService
- `Snacks/Services/LogArchiveService.cs` (new) — `ReadLatestLogTail(logsDir, lines)` and `WriteLogsZip(stream, logsDir)`. Reads with `FileShare.ReadWrite` so it doesn't fight Serilog's writer; a per-file 50 MB skip in the ZIP path bounds the worst case to ≈ 70 MB (7 daily rolls × 10 MB cap + per-job FFmpeg logs); a file disappearing mid-enumeration (Serilog rolling at exactly the wrong moment) is swallowed and skipped. Used by both the local-side `DiagnosticsController` and the worker-side `ClusterController` mirrors so the master can re-stream a remote node's logs without duplicating the read logic.
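A sketch of the tail read. The `ReadLatestLogTail(logsDir, lines)` shape and the `FileShare.ReadWrite` open come from the notes; the body, return type, and `*.log` glob are illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LogTailSketch
{
    public static string[] ReadLatestLogTail(string logsDir, int lines)
    {
        var latest = new DirectoryInfo(logsDir)
            .GetFiles("*.log")
            .OrderByDescending(f => f.LastWriteTimeUtc)
            .FirstOrDefault();
        if (latest is null) return Array.Empty<string>();

        // FileShare.ReadWrite: open the file Serilog is still appending to
        // without fighting its writer.
        using var fs = new FileStream(latest.FullName, FileMode.Open, FileAccess.Read, FileShare.ReadWrite);
        using var reader = new StreamReader(fs);

        var ring = new Queue<string>(lines);
        for (string? line = reader.ReadLine(); line is not null; line = reader.ReadLine())
        {
            if (ring.Count == lines) ring.Dequeue(); // keep only the last N lines
            ring.Enqueue(line);
        }
        return ring.ToArray();
    }
}
```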
Frontend plumbing
- `Snacks/wwwroot/js/api.js` — new `diagnosticsApi.getLogTail(nodeId, lines)` and `diagnosticsApi.logsZipUrl(nodeId)`; new `networkingApi.getConfig()` / `saveConfig()` for the Networking panel.
- `Snacks/wwwroot/js/utils/download.js` (new) — `streamDownload(url, button, fallbackName)` shared utility that fetches a URL as a Blob, surfaces a spinner on the triggering button while the request is in flight, and saves the body via a synthesised anchor click (honoring `Content-Disposition`). For endpoints whose first byte takes seconds — e.g. the master proxying a worker's log archive — a plain `<a download>` shows no feedback and feels broken.
NVIDIA NVDEC explicit decoder attach
-c:v <codec>_cuvid instead of relying on -hwaccel cuda auto-attach
- `TranscodingService.ConvertVideoAsync`, nvidia path — `-hwaccel cuda` is just a hint. On some driver / setup combinations (datacenter Windows drivers, vGPU profiles) ffmpeg silently falls back to a software decoder while NVENC keeps working, which pegs the CPU on what should be a GPU-only encode. The encode path now explicitly emits `-c:v <codec>_cuvid` (`h264_cuvid`, `hevc_cuvid`, `av1_cuvid`, `vp9_cuvid`, `vp8_cuvid`, `vc1_cuvid`, `mpeg2_cuvid`, `mpeg4_cuvid`, `mjpeg_cuvid`) ahead of the input so NVDEC engagement is deterministic. Skipped on the mux pass (no decode), on software fallback, or when `forceSwDecode=true` from the retry chain.
- `GetNvidiaInputDecoder(string? sourceCodec)` — internal static mapper exposed for the unit tests; sketched below.
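A sketch of the mapper. The cuvid decoder names come from the notes; the probe-name keys on the left are assumptions about what the source-codec string looks like:

```csharp
internal static class NvidiaDecoderMap
{
    internal static string? GetNvidiaInputDecoder(string? sourceCodec) => sourceCodec switch
    {
        "h264" => "h264_cuvid",
        "hevc" or "h265" => "hevc_cuvid",
        "av1" => "av1_cuvid",
        "vp9" => "vp9_cuvid",
        "vp8" => "vp8_cuvid",
        "vc1" => "vc1_cuvid",
        "mpeg2video" => "mpeg2_cuvid",
        "mpeg4" => "mpeg4_cuvid",
        "mjpeg" => "mjpeg_cuvid",
        _ => null, // unknown source codec: no explicit decoder, let -hwaccel cuda auto-attach
    };
}
```

The mapped decoder lands ahead of the input, e.g. `ffmpeg -c:v hevc_cuvid -i in.mkv -c:v hevc_nvenc out.mkv` — an illustrative shape, not the full argument list the service builds.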
Retry path drops the explicit decoder on cuvid failures
`HandleConversionFailure` Retry 3 — the existing "software decode + VAAPI encode" hwaccel-error retry now also fires for NVIDIA, dropping the explicit `_cuvid` decoder and falling back to `-hwaccel cuda`'s auto-attach (which can in turn fall back to a software decoder while NVENC keeps the GPU encode). Triggered on `cuvid` / `nvcuvid` / `ffnvcodec` / `Failed to get HW config` errors in addition to the existing VAAPI error patterns.
Tests
- `Snacks.Tests/Video/HardwareEncoderTests.cs` — pins the codec → cuvid mapping for every codec the matrix supports, plus the no-mapping fallthrough for unknown source codecs.
Subtitle sidecar pull-back
Master fetches .srt / .ass / .vtt from the worker after a remote encode
- `ClusterController` exposes `GET /api/cluster/files/{jobId}/sidecars` — lists subtitle sidecar files (`.srt` / `.ass` / `.vtt`) the worker wrote alongside the encoded output in the job's temp directory. Filtered to known subtitle extensions; basenames only.
- `ClusterController` exposes `GET /api/cluster/files/{jobId}/sidecars/{name}` — streams a single sidecar by basename. Defense in depth: rejects path-traversal sequences, rejects invalid filename characters, restricts extensions to the same subtitle whitelist, and re-roots the requested name under the job's temp directory with a `StartsWith` assertion. A sketch of the validation follows this list.
- `ClusterService.DownloadSidecarsFromNodeAsync` — called immediately after the main download succeeds and before `CleanupFiles` wipes the worker temp dir. Best-effort: any per-file failure is logged and skipped (the encode itself succeeded; the worker's copy is reaped on cleanup either way). Writes sidecars under the master's destination dir using the worker's basenames unchanged so `TranscodingService.HandleOutputPlacement` → `MoveSidecarsAlongsideAsync` picks them up automatically when it moves or renames the main output.
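A sketch of the name validation. The whitelist, traversal rejection, and `StartsWith` re-root check come from the notes; the method name and return convention are illustrative:

```csharp
using System;
using System.IO;
using System.Linq;

public static class SidecarPathSketch
{
    private static readonly string[] SidecarExtensions = { ".srt", ".ass", ".vtt" };

    // Returns the re-rooted full path, or null if the name fails any check.
    public static string? ResolveSidecarPath(string jobTempDir, string name)
    {
        if (name.Contains("..") || name.IndexOfAny(Path.GetInvalidFileNameChars()) >= 0)
            return null;                                   // traversal sequence or invalid character
        if (!SidecarExtensions.Contains(Path.GetExtension(name), StringComparer.OrdinalIgnoreCase))
            return null;                                   // not on the subtitle whitelist

        // Re-root under the job's temp dir, then assert it stayed there.
        string full = Path.GetFullPath(Path.Combine(jobTempDir, name));
        string root = Path.GetFullPath(jobTempDir) + Path.DirectorySeparatorChar;
        return full.StartsWith(root, StringComparison.Ordinal) ? full : null;
    }
}
```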
Files Changed
Slot ledger
- `Snacks/Services/Slots/SlotLedger.cs` (new) — authoritative reservation store
- `Snacks/Services/Slots/SlotPhase.cs` (new) — `Reserved → Uploading → Encoding → Downloading → Completing` + wire-string mappers
- `Snacks/Services/Slots/SlotReservation.cs` (new)
- `Snacks/Services/Slots/ReleaseReason.cs` (new)
- `Snacks/Services/SlotReconciler.cs` (deleted)
- `Snacks/Services/ClusterService.cs` — atomic `TryReserve` at dispatch; `IsDispatchableStatus` accepts `Uploading` / `Downloading`; `ReleaseActiveSlot` takes a `ReleaseReason`; observation-only heartbeat reconcile rebuilding `ActiveJobs` from the ledger snapshot
- `Snacks/Models/MediaFile.cs` — `DispatchedDeviceId` column
- `Snacks/Data/MediaFileRepository.cs` — `AssignToRemoteNodeAsync` accepts `dispatchedDeviceId`; clear-assignment paths null it
- `Snacks/Data/Migrations/20260505003522_AddMediaFileDispatchedDeviceId.cs` (new) + Designer
- `Snacks/Program.cs` — DI registration; `SlotLedger` exposed via `ClusterService.SlotLedger`
- `Snacks.Tests/Cluster/SlotLedgerTests.cs` (new)
- `Snacks.Tests/Cluster/SlotReconcilerTests.cs` (deleted)
Transfer throttle + Networking settings
- `Snacks/Services/Cluster/TransferThrottle.cs` (new) — concurrency counters + token-bucket bandwidth limiters
- `Snacks/Models/NetworkingSettings.cs` (new)
- `Snacks/Services/NetworkingSettingsService.cs` (new)
- `Snacks/Controllers/NetworkingController.cs` (new)
- `Snacks/Views/Shared/_NetworkingSettings.cshtml` (new)
- `Snacks/Views/Shared/_AppModals.cshtml` — Networking tab nav + pane
- `Snacks/wwwroot/js/settings/panels/networking-panel.js` (new)
- `Snacks/wwwroot/js/main.js` — wires `initNetworkingPanel`
- `Snacks/wwwroot/js/api.js` — `networkingApi`
- `Snacks/Services/ClusterFileTransferService.cs` — chunk size from throttle; per-chunk `AcquireUploadBandwidthAsync` / `AcquireDownloadBandwidthAsync`
- `Snacks/Services/ClusterService.cs` — throttle injection; `AcquireUploadAsync` / `AcquireDownloadAsync` around transfer paths
- `Snacks.Tests/Cluster/TransferThrottleTests.cs` (new)
Cluster Logs page
- `Snacks/Controllers/ClusterLogsController.cs` (new)
- `Snacks/Views/ClusterLogs/Index.cshtml` (new)
- `Snacks/wwwroot/js/cluster-logs/cluster-logs.js` (new)
- `Snacks/Services/LogArchiveService.cs` (new) — shared tail + ZIP read logic
- `Snacks/Controllers/DiagnosticsController.cs` — `?nodeId=` proxying to remote nodes
- `Snacks/Controllers/ClusterController.cs` — worker-side `diagnostics/log` + `diagnostics/logs.zip` mirrors
- `Snacks/Views/Shared/_Layout.cshtml` — Logs nav button
- `Snacks/wwwroot/js/api.js` — `diagnosticsApi.getLogTail` / `logsZipUrl`
- `Snacks/wwwroot/js/utils/download.js` (new) — `streamDownload` helper
- `Snacks/Program.cs` — `LogArchiveService` registration
NVIDIA NVDEC
- `Snacks/Services/TranscodingService.cs` — explicit `-c:v <codec>_cuvid` on the nvidia path; `GetNvidiaInputDecoder` mapper; retry-3 drops the explicit decoder on cuvid/nvcuvid/ffnvcodec failures
- `Snacks.Tests/Video/HardwareEncoderTests.cs` — codec → cuvid mapping pins
Subtitle sidecar pull-back
- `Snacks/Controllers/ClusterController.cs` — `GET files/{jobId}/sidecars` and `GET files/{jobId}/sidecars/{name}`
- `Snacks/Services/ClusterService.cs` — `DownloadSidecarsFromNodeAsync` after the main download
Version bumps
- `Snacks/Controllers/HomeController.cs` — health endpoint version
- `Snacks/Services/ClusterDiscoveryService.cs` — `ClusterVersion` protocol bump to 2.10.0
- `Snacks/Views/Shared/_Layout.cshtml` — footer version
- `README.md` — badge + footer
- `build-and-export.bat` — Docker tag version
- `electron-app/package.json` / `package-lock.json`
Full documentation: README.md