Snacks v2.10.0

Automated Video Library Encoder

A minor release built around fixing cluster network saturation.

Every per-slot reservation in the cluster is now owned by a single authoritative SlotLedger, replacing the historical scheme where slot truth was scattered across ClusterNode.ActiveJobs, _activeUploads, _activeDownloads, _remoteJobs, and worker heartbeats, any one of which could silently lose an entry and cause double-dispatch.

A new master-side TransferThrottle gates how many uploads/downloads can be in flight (cluster-wide and per-node) and how fast they go (token-bucket MB/s caps), configured from a new Networking tab in Settings. The default per-node upload cap of 1 directly addresses the user-reported saturation pattern, and settings changes take effect on the next chunk without rebuilding any state.

A new Cluster Logs page (/cluster-logs) surfaces a live tail of any node's operations log with a one-click ZIP export; the master proxies remote nodes' logs through the cluster shared-secret channel, so the operator never has to ssh into a worker.

Two encoder/cluster fixes ride along. NVIDIA encodes now explicitly pin the NVDEC (cuvid) decoder instead of relying on -hwaccel cuda's auto-attach, which silently fell back to a software decoder on some driver/setup combinations (datacenter Windows drivers, vGPU profiles), pegging the CPU while NVENC kept the GPU encode path going. And the master now pulls subtitle sidecar files (.srt / .ass / .vtt) from the worker after a remote encode, so they ride home alongside the main output before the worker temp dir is wiped on cleanup.


Authoritative slot ledger

SlotLedger — single source of truth for cluster slots

  • Snacks/Services/Slots/SlotLedger.cs (new) — every per-slot reservation in the cluster lives in one ConcurrentDictionary<jobId, SlotReservation> behind a write-lock. TryReserve is an atomic capacity-check + insert (capacity is resolved through an injected (nodeId, deviceId) => int so the ledger doesn't have to know about NodeSettings / HardwareDevice), Release is idempotent and emits one ReleaseReason-attributed log line, UpdateProgress is a heartbeat hook that mutates only Phase / Progress and never deletes. The dashboard chip list is no longer a write target — Snapshot(nodeId) materialises a wire-compatible ActiveJobInfo list at broadcast time so worker heartbeats can never silently lose a slot the master is still using.
  • Snacks/Services/Slots/SlotPhase.cs (new) — Reserved → Uploading → Encoding → Downloading → Completing lifecycle states with ToWireString / FromWireString mappers so the dashboard's existing phase strings (Uploading / Encoding / Downloading) keep rendering unchanged. The pre-transfer Reserved phase surfaces as Uploading until bytes flow.
  • Snacks/Services/Slots/SlotReservation.cs (new) — JobId / NodeId / DeviceId are immutable for the row's lifetime; FileName / Phase / Progress / PhaseEnteredAt are mutable through ledger methods only.
  • Snacks/Services/Slots/ReleaseReason.cs (new) — Completed / NoSavings / Cancelled / NodeFailed / DispatchThrew / ValidationFailed / DownloadRetriesExhausted / Recovered. Logged on every release so the lifecycle of any leaked-or-not slot is auditable from the operations log alone.
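The reservation contract above can be sketched compactly. The following is a minimal, hypothetical Python stand-in (the shipped code is C#; the names, shapes, and print-based logging here are illustrative assumptions, not the real API):

```python
import threading

class SlotLedger:
    """Toy analogue of the authoritative slot ledger."""

    def __init__(self, capacity_resolver):
        # capacity_resolver: (node_id, device_id) -> int, injected so the
        # ledger stays ignorant of NodeSettings / HardwareDevice
        self._capacity = capacity_resolver
        self._lock = threading.Lock()
        self._reservations = {}  # job_id -> reservation dict

    def try_reserve(self, node_id, device_id, job_id, file_name):
        """Atomic capacity-check + insert; False means the slot was lost."""
        with self._lock:
            in_use = sum(1 for r in self._reservations.values()
                         if r["node_id"] == node_id and r["device_id"] == device_id)
            if in_use >= self._capacity(node_id, device_id):
                return False
            self._reservations[job_id] = {
                "node_id": node_id, "device_id": device_id,
                "file_name": file_name, "phase": "Reserved", "progress": 0.0,
            }
            return True

    def update_progress(self, job_id, phase, progress):
        """Heartbeat hook: mutates phase/progress only, never deletes."""
        with self._lock:
            r = self._reservations.get(job_id)
            if r is not None:
                r["phase"], r["progress"] = phase, progress

    def release(self, job_id, reason):
        """Idempotent: releasing an already-released slot is a no-op."""
        with self._lock:
            if self._reservations.pop(job_id, None) is not None:
                print(f"released {job_id}: {reason}")

    def snapshot(self, node_id):
        """Read-only projection for dashboard broadcasts."""
        with self._lock:
            return [dict(r, job_id=j) for j, r in self._reservations.items()
                    if r["node_id"] == node_id]
```

The single lock is the point: the capacity check and the insert happen atomically, which is what removes the double-dispatch window, and heartbeats only ever touch `update_progress`.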

Heartbeat reconcile is observation-only

  • ReconcileMultiSlotHeartbeatAsync no longer rewrites ClusterNode.ActiveJobs — the legacy SlotReconciler.ShouldPreserveEntry ladder (and its preserve-rule unit tests) is removed. A heartbeat now updates Phase / Progress on ledger rows the worker is reporting, surfaces an anomaly log line for jobs the worker reports without a matching reservation, and rebuilds ClusterNode.ActiveJobs from _slotLedger.Snapshot(nodeId) so the dashboard sees ledger truth on every tick. This eliminates Race A (heartbeat strips an entry between dispatch and the dispatch task running on the thread pool) and Race B (recovery-time double-attach) by construction.
  • Snacks/Services/SlotReconciler.cs (deleted) — and its companion Snacks.Tests/Cluster/SlotReconcilerTests.cs. Replaced wholesale by the ledger contract and its own test suite.

Atomic reservation at dispatch

  • ClusterService.ProcessQueueAsync — dispatch now calls _slotLedger.TryReserve(nodeId, deviceId, jobId, fileName) instead of synthesising an ActiveJobInfo and pushing it onto the node. On a lost CAS race (another tick / recovery path grabbed the same slot first) the work item is silently re-queued and the dispatch loop tries the next candidate. Phase transitions to Uploading immediately after the reservation, and a legacy projection entry is stamped on ClusterNode.ActiveJobs so any UI broadcast between dispatch and the next heartbeat shows the new reservation without waiting.
  • Dispatch-pool filter accepts Uploading / Downloading — IsDispatchableStatus now allows the new transient transfer statuses in addition to Online / Busy. Without it, the very first dispatch in a tick would knock the node out of the candidate pool for the rest of that tick — silently serialising every multi-slot node to one upload at a time.
  • ReleaseActiveSlot(node, jobId, reason) — every failure / cancel / completion path now passes the reason it's releasing (NodeFailed, DispatchThrew, Cancelled, etc.) so the log line attributes the cause.

Persisted device assignment for restart recovery

  • MediaFile.DispatchedDeviceId (new column) — the HardwareDevice.DeviceId the master allocated at dispatch time (intel, nvidia, cpu, …) is persisted alongside the existing Remote* fields so SlotLedger recovery rebuilds the same per-device occupancy after a master restart without having to query worker heartbeats. Cleared on completion / failure / re-queue alongside the other Remote fields.
  • Migration 20260505003522_AddMediaFileDispatchedDeviceId — adds the column and is auto-applied at startup.
  • MediaFileRepository.AssignToRemoteNodeAsync(..., string? dispatchedDeviceId = null) — optional parameter so older callers leave the field untouched; ClearRemoteAssignmentAsync always clears it so a re-queued job picks a fresh slot on the next tick.

Tests

  • Snacks.Tests/Cluster/SlotLedgerTests.cs (new) — pins capacity gating, idempotent release, observation-only heartbeat updates, the dashboard projection, and the new contract that heartbeats can no longer delete reservations. Race A and Race B (the cases the legacy preserve-rule reconciler indirectly guarded) become trivially safe under the ledger and have explicit regression tests.

Transfer throttle + Networking settings

TransferThrottle — concurrency + bandwidth gate

  • Snacks/Services/Cluster/TransferThrottle.cs (new) — two orthogonal layers, both master-side. Concurrency caps track in-flight uploads/downloads via atomic CAS-only counters (cluster-wide and per-node), with the cap re-read from the live settings snapshot on every acquire attempt; settings changes take effect on the next acquire without rebuilding any state, and holders that are over-cap stay in flight until they release naturally. Bandwidth caps use TokenBucketRateLimiter with a 100 ms replenish cadence; chunks acquire tokens equal to their byte count before the master sends them. Acquires are sliced into 1 MB pieces so a single call never asks the bucket for more permits than its TokenLimit, which means a settings change mid-transfer (raise, lower, or unlimited) takes effect on the very next slice rather than killing the upload. Per-node buckets are rebuilt only when the cap numerically changes — saving an unrelated setting (e.g. chunk size) doesn't disrupt an active transfer's pacing.
  • Cap of 0 = unlimited for every layer; the corresponding gate is bypassed entirely (no acquire). Orthogonal to the SlotLedger — the ledger decides whether a job can be dispatched at all (per-device hardware capacity); the throttle decides how many of those dispatched jobs can transfer bytes concurrently and how fast.
  • ForgetNode(nodeId) — drops per-node counters and rate limiters when a node disconnects permanently so a rejoin starts from a clean slate.
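The 1 MB slice loop and per-slice cap re-read described above can be sketched as follows. This is a toy Python stand-in (the real implementation uses .NET's TokenBucketRateLimiter; the bucket class and helper names here are assumptions):

```python
import time

class TokenBucket:
    """Minimal token bucket: capacity equals the per-second rate."""
    def __init__(self, bytes_per_sec):
        self.rate = bytes_per_sec
        self.tokens = bytes_per_sec
        self.last = time.monotonic()

    def acquire(self, n):
        while True:
            now = time.monotonic()
            self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= n:
                self.tokens -= n
                return
            time.sleep((n - self.tokens) / self.rate)  # wait for refill

SLICE = 1 * 1024 * 1024  # 1 MB: never ask the bucket for more than its limit

def send_chunk(chunk, get_bucket, send):
    """Acquire bandwidth in 1 MB slices. get_bucket() is re-read on every
    slice, so swapping the bucket (a live settings change) takes effect
    mid-transfer instead of killing the upload."""
    sent = 0
    while sent < len(chunk):
        piece = chunk[sent:sent + SLICE]
        bucket = get_bucket()          # live settings snapshot
        if bucket is not None:         # cap of 0 => no bucket => unlimited
            bucket.acquire(len(piece))
        send(piece)
        sent += len(piece)
```

Slicing also explains why a cap raise or drop never strands an in-flight chunk: at worst one 1 MB slice goes out at the old pace.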

NetworkingSettings — the configuration surface

  • Snacks/Models/NetworkingSettings.cs (new) — MaxConcurrentUploads / MaxConcurrentUploadsPerNode / MaxConcurrentDownloads / MaxConcurrentDownloadsPerNode / MaxUploadMBps / MaxUploadMBpsPerNode / MaxDownloadMBps / MaxDownloadMBpsPerNode / ChunkSizeMB. All caps default to 0 (unlimited) so an empty config preserves pre-2.10 behaviour — except MaxConcurrentUploadsPerNode / MaxConcurrentDownloadsPerNode, which default to 1 so a node with multiple device slots receives one file at a time.
  • Snacks/Services/NetworkingSettingsService.cs (new) — loads / persists ${SNACKS_WORK_DIR}/config/networking.json; atomic write-and-replace; fires a Changed event after every successful save so the throttle rebuilds rate limiters without a service restart. Validation rejects negative caps and clamps ChunkSizeMB to [4, 256].
  • Snacks/Controllers/NetworkingController.cs (new) — GET /api/networking and POST /api/networking. Validation errors return 400 so the UI can surface them inline.
  • Snacks/Views/Shared/_NetworkingSettings.cshtml (new) — a Networking tab partial under Settings. Concurrency caps, bandwidth caps, and chunk size in three sections, each with the "0 = unlimited" affordance.
  • Snacks/wwwroot/js/settings/panels/networking-panel.js (new) — load / save logic. Idempotent init; lazy first-fetch on activation; inline save-status feedback.
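For orientation, an illustrative ${SNACKS_WORK_DIR}/config/networking.json at the documented defaults (all caps unlimited except the per-node transfer caps; ChunkSizeMB shown at the historical 50 MB value; the exact property casing in the persisted file is an assumption):

```json
{
  "MaxConcurrentUploads": 0,
  "MaxConcurrentUploadsPerNode": 1,
  "MaxConcurrentDownloads": 0,
  "MaxConcurrentDownloadsPerNode": 1,
  "MaxUploadMBps": 0,
  "MaxUploadMBpsPerNode": 0,
  "MaxDownloadMBps": 0,
  "MaxDownloadMBpsPerNode": 0,
  "ChunkSizeMB": 50
}
```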

Wired into the transfer paths

  • ClusterFileTransferService reads _throttle.ChunkSizeBytes — the historical 50 MB constant is now the fallback when no throttle is supplied (slave-only deployments). Master uploads / downloads ask the throttle for the current chunk size on every transfer, and call AcquireUploadBandwidthAsync / AcquireDownloadBandwidthAsync per chunk so caps apply at byte granularity. The throttle handle is acquired around the upload via AcquireUploadAsync and disposed in finally so a thrown exception releases the slot.
  • ClusterService injects the throttle — TransferThrottle? is constructor-injected (null on slave-only deployments) and threaded into ClusterFileTransferService. The download-side gate fires from DispatchToNodeAsync after the encode completes and before the master pulls the encoded output back.

Tests

  • Snacks.Tests/Cluster/TransferThrottleTests.cs (new) — pins concurrency cap enforcement, the unlimited-cap fast path, settings live-mutation taking effect on the next acquire without ratcheting, the slice-loop's tolerance to a limiter swap mid-transfer, per-node counter scoping, and the "lower a cap from 4 to 1; existing holders run to completion; new acquires queue" semantics that make this safe to live-tune.

Cluster Logs page

/cluster-logs — live tail and ZIP export, any node

  • Snacks/Controllers/ClusterLogsController.cs (new) — pure view route at /cluster-logs; the data is served by DiagnosticsController's now-cluster-aware endpoints.
  • Snacks/Views/ClusterLogs/Index.cshtml (new) — node picker, line-count selector (100 / 200 / 500 / 1000 / 5000), auto-refresh toggle (defaults on at 5 s polling), Refresh button, Download .zip button. Tail rendered in a monospaced <pre> with virtualised scroll.
  • Snacks/wwwroot/js/cluster-logs/cluster-logs.js (new) — registers as the page handler for /cluster-logs; polls GET /api/diagnostics/log?nodeId=…&lines=… every 5s; cancels in-flight fetches on node / lines change; rebuilds the download anchor's href on every selection change so the ZIP comes from the right node.
  • Logs nav button in the header — sits between Dashboard and the Pause / Settings / Browse Library cluster.

DiagnosticsController is now cluster-aware

  • GET /api/diagnostics/log and GET /api/diagnostics/logs.zip accept ?nodeId= — when omitted (or matching the local node) the request is served from the local logs/ directory; when the id is a remote node the request is proxied to that node's /api/cluster/diagnostics/* mirror over the cluster shared-secret channel. Pattern matches DashboardController's remote aggregations.
  • ClusterController mirrors GET diagnostics/log and GET diagnostics/logs.zip — the mirror endpoints the master proxies to. Logs may contain file paths and job names but no credentials, so the existing cluster shared-secret gate is sufficient.
  • Proxy responses surface 503 with { nodeId, hostname, lastSeen } when a remote node is offline so the polling UI keeps its last good snapshot and recovers automatically when the node reappears.
  • ZIP proxy streams the upstream body straight through to Response.Body without buffering server-side, and carries through the upstream Content-Disposition so the saved file is labeled with the remote node's hostname.

Shared LogArchiveService

  • Snacks/Services/LogArchiveService.cs (new) — ReadLatestLogTail(logsDir, lines) and WriteLogsZip(stream, logsDir). Reads with FileShare.ReadWrite so it doesn't fight Serilog's writer; per-file 50 MB skip in the ZIP path bounds the worst case to ≈ 70 MB (7 daily rolls × 10 MB cap + per-job FFmpeg logs); a file disappearing mid-enumeration (Serilog rolling at exactly the wrong moment) is swallowed and skipped. Used by both the local-side DiagnosticsController and the worker-side ClusterController mirrors so the master can re-stream a remote node's logs without duplicating the read logic.

Frontend plumbing

  • Snacks/wwwroot/js/api.js — new diagnosticsApi.getLogTail(nodeId, lines) and diagnosticsApi.logsZipUrl(nodeId); new networkingApi.getConfig() / saveConfig() for the Networking panel.
  • Snacks/wwwroot/js/utils/download.js (new) — streamDownload(url, button, fallbackName) shared utility that fetches a URL as a Blob, surfaces a spinner on the triggering button while the request is in flight, and saves the body via a synthesised anchor click (honoring Content-Disposition). For endpoints whose first byte takes seconds — e.g. the master proxying a worker's log archive — a plain <a download> shows no feedback and feels broken.

NVIDIA NVDEC explicit decoder attach

-c:v <codec>_cuvid instead of relying on -hwaccel cuda auto-attach

  • TranscodingService.ConvertVideoAsync nvidia path — -hwaccel cuda is just a hint. On some driver / setup combinations (datacenter Windows drivers, vGPU profiles) ffmpeg silently falls back to a software decoder while NVENC keeps working, which pegs the CPU on what should be a GPU-only encode. The encode path now explicitly emits -c:v <codec>_cuvid (h264_cuvid, hevc_cuvid, av1_cuvid, vp9_cuvid, vp8_cuvid, vc1_cuvid, mpeg2_cuvid, mpeg4_cuvid, mjpeg_cuvid) ahead of the input so NVDEC engagement is deterministic. Skipped on the mux pass (no decode), software fallback, or when forceSwDecode=true from the retry chain.
  • GetNvidiaInputDecoder(string? sourceCodec) — internal static mapper exposed for the unit tests.
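The mapping is mechanical enough to sketch. A hypothetical Python rendering (the _cuvid names come from the notes; the "mpeg2video" source-codec key and the fallback argument shapes are assumptions):

```python
# Source codec (ffmpeg probe name) -> explicit NVDEC decoder.
CUVID_DECODERS = {
    "h264": "h264_cuvid", "hevc": "hevc_cuvid", "av1": "av1_cuvid",
    "vp9": "vp9_cuvid", "vp8": "vp8_cuvid", "vc1": "vc1_cuvid",
    "mpeg2video": "mpeg2_cuvid", "mpeg4": "mpeg4_cuvid", "mjpeg": "mjpeg_cuvid",
}

def get_nvidia_input_decoder(source_codec):
    """Return the explicit -c:v value, or None for unmapped codecs."""
    if not source_codec:
        return None
    return CUVID_DECODERS.get(source_codec.lower())

def build_input_args(source_codec, force_sw_decode=False):
    """Input-side ffmpeg args: explicit decoder goes ahead of -i; skipped
    entirely when the retry chain has forced software decode."""
    if force_sw_decode:
        return []  # let ffmpeg pick a software decoder
    dec = get_nvidia_input_decoder(source_codec)
    # Unmapped codecs keep the old auto-attach behaviour.
    return ["-c:v", dec] if dec else ["-hwaccel", "cuda"]
```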

Retry path drops the explicit decoder on cuvid failures

  • HandleConversionFailure Retry 3 — the existing "software decode + VAAPI encode" hwaccel-error retry now also fires for NVIDIA, dropping the explicit _cuvid decoder and falling back to -hwaccel cuda's auto-attach (which can in turn fall back to a software decoder while NVENC keeps the GPU encode). Triggered on cuvid / nvcuvid / ffnvcodec / Failed to get HW config errors in addition to the existing VAAPI error patterns.
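The trigger condition amounts to substring matching on ffmpeg's error output. A hedged sketch covering just the new NVIDIA patterns (case-insensitive matching is an assumption about the real ladder):

```python
# Error substrings that promote a failed NVIDIA encode to the
# software-decode retry (patterns quoted from the notes).
CUVID_RETRY_PATTERNS = ("cuvid", "nvcuvid", "ffnvcodec", "failed to get hw config")

def should_retry_without_cuvid(stderr: str) -> bool:
    text = stderr.lower()
    return any(p in text for p in CUVID_RETRY_PATTERNS)
```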

Tests

  • Snacks.Tests/Video/HardwareEncoderTests.cs — pins the codec → cuvid mapping for every codec the matrix supports, plus the no-mapping fallthrough for unknown source codecs.

Subtitle sidecar pull-back

Master fetches .srt / .ass / .vtt from the worker after a remote encode

  • ClusterController exposes GET /api/cluster/files/{jobId}/sidecars — lists subtitle sidecar files (.srt / .ass / .vtt) the worker wrote alongside the encoded output in the job's temp directory. Filtered to known subtitle extensions; basenames only.
  • ClusterController exposes GET /api/cluster/files/{jobId}/sidecars/{name} — streams a single sidecar by basename. Defense in depth: rejects path-traversal sequences, rejects invalid filename characters, restricts extensions to the same subtitle whitelist, and re-roots the requested name under the job's temp directory with a StartsWith assertion.
  • ClusterService.DownloadSidecarsFromNodeAsync — called immediately after the main download succeeds and before CleanupFiles wipes the worker temp dir. Best-effort: any per-file failure is logged and skipped (the encode itself succeeded; the worker's copy is reaped on cleanup either way). Writes sidecars under the master's destination dir using the worker's basenames unchanged so TranscodingService.HandleOutputPlacementMoveSidecarsAlongsideAsync picks them up automatically when it moves or renames the main output.

Files Changed

Slot ledger

  • Snacks/Services/Slots/SlotLedger.cs (new) — authoritative reservation store
  • Snacks/Services/Slots/SlotPhase.cs (new) — Reserved → Uploading → Encoding → Downloading → Completing + wire-string mappers
  • Snacks/Services/Slots/SlotReservation.cs (new)
  • Snacks/Services/Slots/ReleaseReason.cs (new)
  • Snacks/Services/SlotReconciler.cs (deleted)
  • Snacks/Services/ClusterService.cs — atomic TryReserve at dispatch; IsDispatchableStatus accepts Uploading/Downloading; ReleaseActiveSlot takes a ReleaseReason; observation-only heartbeat reconcile rebuilding ActiveJobs from the ledger snapshot
  • Snacks/Models/MediaFile.cs — DispatchedDeviceId column
  • Snacks/Data/MediaFileRepository.cs — AssignToRemoteNodeAsync accepts dispatchedDeviceId; clear-assignment paths null it
  • Snacks/Data/Migrations/20260505003522_AddMediaFileDispatchedDeviceId.cs (new) + Designer
  • Snacks/Program.cs — DI registration; SlotLedger exposed via ClusterService.SlotLedger
  • Snacks.Tests/Cluster/SlotLedgerTests.cs (new)
  • Snacks.Tests/Cluster/SlotReconcilerTests.cs (deleted)

Transfer throttle + Networking settings

  • Snacks/Services/Cluster/TransferThrottle.cs (new) — concurrency counters + token-bucket bandwidth limiters
  • Snacks/Models/NetworkingSettings.cs (new)
  • Snacks/Services/NetworkingSettingsService.cs (new)
  • Snacks/Controllers/NetworkingController.cs (new)
  • Snacks/Views/Shared/_NetworkingSettings.cshtml (new)
  • Snacks/Views/Shared/_AppModals.cshtml — Networking tab nav + pane
  • Snacks/wwwroot/js/settings/panels/networking-panel.js (new)
  • Snacks/wwwroot/js/main.js — wires initNetworkingPanel
  • Snacks/wwwroot/js/api.js — networkingApi
  • Snacks/Services/ClusterFileTransferService.cs — chunk size from throttle; per-chunk AcquireUploadBandwidthAsync / AcquireDownloadBandwidthAsync
  • Snacks/Services/ClusterService.cs — throttle injection; AcquireUploadAsync / AcquireDownloadAsync around transfer paths
  • Snacks.Tests/Cluster/TransferThrottleTests.cs (new)

Cluster Logs page

  • Snacks/Controllers/ClusterLogsController.cs (new)
  • Snacks/Views/ClusterLogs/Index.cshtml (new)
  • Snacks/wwwroot/js/cluster-logs/cluster-logs.js (new)
  • Snacks/Services/LogArchiveService.cs (new) — shared tail + ZIP read logic
  • Snacks/Controllers/DiagnosticsController.cs — ?nodeId= proxying to remote nodes
  • Snacks/Controllers/ClusterController.cs — worker-side diagnostics/log + diagnostics/logs.zip mirrors
  • Snacks/Views/Shared/_Layout.cshtml — Logs nav button
  • Snacks/wwwroot/js/api.js — diagnosticsApi.getLogTail / logsZipUrl
  • Snacks/wwwroot/js/utils/download.js (new) — streamDownload helper
  • Snacks/Program.cs — LogArchiveService registration

NVIDIA NVDEC

  • Snacks/Services/TranscodingService.cs — explicit -c:v <codec>_cuvid on the nvidia path; GetNvidiaInputDecoder mapper; retry-3 drops the explicit decoder on cuvid/nvcuvid/ffnvcodec failures
  • Snacks.Tests/Video/HardwareEncoderTests.cs — codec → cuvid mapping pins

Subtitle sidecar pull-back

  • Snacks/Controllers/ClusterController.cs — GET files/{jobId}/sidecars and GET files/{jobId}/sidecars/{name}
  • Snacks/Services/ClusterService.cs — DownloadSidecarsFromNodeAsync after main download

Version bumps

  • Snacks/Controllers/HomeController.cs — health endpoint version
  • Snacks/Services/ClusterDiscoveryService.cs — ClusterVersion protocol bump to 2.10.0
  • Snacks/Views/Shared/_Layout.cshtml — footer version
  • README.md — badge + footer
  • build-and-export.bat — Docker tag version
  • electron-app/package.json / package-lock.json

Full documentation: README.md
