Snacks v2.7.0
Automated Video Library Encoder
A minor release that adds per-device concurrency — every node now exposes its hardware encoders (NVIDIA / Intel / AMD / Apple / CPU) as discrete dispatch slots, with user-tunable caps per device, so a single beefy box can run several encodes at once instead of a one-job-at-a-time bottleneck. Pairs with a new Encode Dashboard at /dashboard — a persistent ledger of every completed encode with hero stats, savings-over-time, per-device utilization, codec mix, node throughput, top wins, and a recent-activity table. The release also lands a stuck-job watchdog (per-job and master-wide) that recovers items orphaned in Processing / Uploading / Downloading, an OCR slot lock so parallel encodes can't race the shared Tesseract engine, UTC-correct timestamps across the API/SignalR boundary, and an SPA shell so navigating between the queue and dashboard pages keeps SignalR alive instead of tearing down state.
New Features
Per-Device Concurrency
- Hardware as discrete slots -- a worker now reports a Devices[] list (one entry per vendor family it can drive concurrently: nvidia, intel, amd, apple, cpu) instead of a single GpuVendor string. Each device carries a DefaultConcurrency (NVIDIA defaults to 2, QSV/AMF/VideoToolbox to 1, CPU to a fraction of logical cores), and the master treats every (node, device) pair as a slot pool the dispatcher draws from. A node with NVENC + iGPU now actually runs two encodes at once instead of serializing on a single currentJobId.
- Master-side scoring with vendor preference + load-spread -- when picking where to send a job, the master scores every free slot across the cluster on codec compatibility (unsupported devices are rejected), the user's HardwareAcceleration preference (auto / none / a specific vendor), and a small headroom bonus that lightly prefers the device with more free slots, so two NVENC jobs queued ahead of yours don't both pile onto card 0. CPU is reserved for explicit "Software" jobs and for "auto" jobs on nodes with no hardware encoder at all. Under "auto" with detected hardware, CPU is excluded outright, so a job won't silently land on a slow software encode while a GPU sits idle.
- Per-node device overrides -- the cluster override dialog now has a Hardware Concurrency section listing every detected device on the target node, with a per-device enable toggle and a MaxConcurrency cap. Useful for "let this iGPU encode, but only one job at a time", "disable my AMD card outright while I use it for gaming", or "raise NVIDIA from 2→3 on the box that can actually take it". Saved overrides are pushed to all clients via SignalR, so chip rendering reflects the change immediately rather than on the next reload.
- CPU is never a chip, always a fallback, pinned to 1 -- the per-device chip strip on each node card shows hardware encoders only (NVIDIA 0/2, INTEL 1/1). CPU is treated as the implicit fallback and doesn't get a chip; it would appear on every card and dilute the signal. CPU is also exempt from per-node overrides: it's hidden from the override dialog and pinned to a single concurrent encode regardless of any saved MaxConcurrency value, since CPU exists only as a fallback destination (explicit "Software" jobs, or "auto" jobs on nodes without a hardware encoder) and parallel software encodes would just thrash. Active CPU jobs still appear in the per-node active-jobs list (prefixed [cpu]).
- Standalone gets the same control panel -- "Edit Hardware Settings" on the local-node card opens the same dialog in standalone mode with only the Hardware Concurrency section visible (4K dispatch rules and encoding overrides only apply during cluster dispatch). A user with a single beefy desktop can now tell Snacks to run 2 NVENC encodes at once without pretending to be a cluster.
- Live capacity changes wake the running scheduler -- raising a cap from 1→2 mid-run used to take effect only after the currently running encode finished, because the local scheduler was parked in WhenAny(inflight). The scheduler now races in-flight task completions against an explicit settings-change wake signal, so the new cap is honored on the next tick. The cluster dispatcher gets the same treatment: settings mutations now trigger an immediate dispatch pass instead of waiting for the next 2s tick.
- Per-job device pinning end-to-end -- the master stamps the chosen DeviceId and the effective MaxConcurrency into the worker's JobMetadata envelope. The worker's slot pool grows on demand to honor the master's cap (so a fixed-size semaphore can't 503 a job the master legitimately scheduled under a higher override), and the worker pins the encode to that device family by overriding EncoderOptions.HardwareAcceleration. The same DispatchedDeviceId flows into the encode-history ledger, so dashboard analytics can attribute the work even after the slot has been released.
- Cancel scoping -- killing a single remote job no longer unwinds peer encodes on other slots. Each in-flight encode (local and remote) carries its own CancellationTokenSource, so a master-issued cancel kills only that specific job; everything else on the node keeps running.
- Legacy single-slot path retired -- the old currentJobId / _currentRemoteJob / _activeProcess fields are gone. The legacy heartbeat shape (currentJobId, progress, completedJobId, receivingJobId) is still emitted alongside the new multi-slot fields (activeJobs[], completedJobIds[], receivingJobIds[]) so older nodes can still read it, but the master no longer falls back to single-slot dispatch.
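The slot-scoring rules above can be sketched in a few lines of Python. The sketch assumes a flat list of slots, one per (node, device) pair, each with a cap and an in-use count. The Slot type and the numeric weights are illustrative assumptions, not Snacks' C# ClusterService code: 10 for any hardware slot, +100 for an explicit vendor match, +0.1 per free slot. Treating a specific vendor as a bonus rather than a hard filter is also an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Slot:
    node: str
    device: str              # "nvidia", "intel", "amd", "apple" or "cpu"
    codecs: frozenset        # codecs this device can encode
    cap: int                 # effective MaxConcurrency after overrides
    in_use: int              # encodes currently running on this device
    node_has_hardware: bool  # does the node expose any non-CPU device?

    @property
    def free(self) -> int:
        return self.cap - self.in_use


def score_slot(slot: Slot, codec: str, preference: str) -> Optional[float]:
    """Score a slot for a job, or return None if the slot is ineligible.

    preference is "auto", "none" (software) or a vendor id such as "nvidia".
    """
    if slot.free <= 0 or codec not in slot.codecs:
        return None
    if slot.device == "cpu":
        # CPU only for explicit software jobs, or "auto" on a node with no hardware.
        if preference == "none" or (preference == "auto" and not slot.node_has_hardware):
            return 1.0 + 0.1 * slot.free
        return None
    if preference == "none":
        return None  # software was requested explicitly
    score = 10.0
    if preference == slot.device:
        score += 100.0  # assumed weight for an explicit vendor match
    return score + 0.1 * slot.free  # small headroom bonus spreads load


def pick_slot(slots: List[Slot], codec: str, preference: str) -> Optional[Slot]:
    """Return the highest-scoring eligible slot (first one wins ties)."""
    best, best_score = None, None
    for slot in slots:
        s = score_slot(slot, codec, preference)
        if s is not None and (best_score is None or s > best_score):
            best, best_score = slot, s
    return best


CODECS = frozenset({"h264", "hevc", "av1"})
slots = [
    Slot("a", "nvidia", CODECS, cap=2, in_use=1, node_has_hardware=True),
    Slot("b", "nvidia", CODECS, cap=2, in_use=0, node_has_hardware=True),
    Slot("b", "cpu", CODECS, cap=1, in_use=0, node_has_hardware=True),
]
best = pick_slot(slots, "hevc", "auto")
print(best.node, best.device)  # b nvidia
```

Under "auto", the CPU slot on node b is ineligible because that node has an NVENC device. The headroom bonus tips the choice toward node b's idle card rather than node a's half-busy one.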
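The wake-signal fix can be sketched with asyncio. Instead of waiting only on in-flight jobs (the analogue of WhenAny(inflight)), the loop also waits on a settings-change event, so raising the cap starts the next job right away. This is a hypothetical Python analogue of the WakeScheduler / WaitForSchedulerProgressAsync idea, not their implementation. The Scheduler class and the timings are made up for the demo.

```python
import asyncio


class Scheduler:
    """Run queued jobs with a concurrency cap that can change mid-run."""

    def __init__(self, cap: int) -> None:
        self.cap = cap
        self._wake = asyncio.Event()

    def set_cap(self, cap: int) -> None:
        self.cap = cap
        self._wake.set()  # wake the loop now instead of after the next job ends

    async def run(self, jobs) -> None:
        pending = list(jobs)  # coroutine factories, started in order
        inflight = set()
        while pending or inflight:
            while pending and len(inflight) < self.cap:
                inflight.add(asyncio.ensure_future(pending.pop(0)()))
            wake = asyncio.ensure_future(self._wake.wait())
            # Race job completions against the settings-change signal.
            done, _ = await asyncio.wait(inflight | {wake},
                                         return_when=asyncio.FIRST_COMPLETED)
            if wake not in done:
                wake.cancel()
            self._wake.clear()
            inflight -= done


async def demo() -> list:
    log = []

    async def job(name: str) -> None:
        log.append(f"start {name}")
        await asyncio.sleep(0.2)  # stands in for an encode
        log.append(f"end {name}")

    sched = Scheduler(cap=1)
    runner = asyncio.ensure_future(sched.run([lambda: job("a"), lambda: job("b")]))
    await asyncio.sleep(0.05)
    sched.set_cap(2)  # raise the cap while "a" is still running
    await runner
    return log


log = asyncio.run(demo())
print(log)  # ['start a', 'start b', 'end a', 'end b']
```

Without the wake event, "b" would only start after "a" finished, giving start a, end a, start b, end b.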
Encode Dashboard (/dashboard)
- Persistent encode-history ledger -- a new EncodeHistory SQLite table records one row per completed encode, including "no savings" outcomes where the output was discarded. Each row captures original/encoded bytes and bytes saved, original/encoded codec, source/output bitrate, duration, encode wall-clock time, dispatched device, node hostname/ID, was-remote, is-4K, started/completed timestamps, and an Outcome marker. The table is append-only by design and failed encodes are not recorded: it's a ledger of completed work, not an error log. The new EF Core migration 20260429014928_AddEncodeHistory adds the table.
- Hero strip + range picker -- four headline cards (Bytes Saved, Files Encoded, Encode Time, Avg Compression) with sparkline trends, plus a 7d / 30d / 90d / 1y range picker that re-queries every panel.
- Per-device + per-node analytics -- a Device Workload stripe shows the total work routed to each DeviceId over the window (NVIDIA / Intel / AMD / Apple / CPU). A Node Throughput leaderboard ranks every node in the cluster by completed-encode count and bytes saved, so you can see whether the new GPU is actually pulling its weight.
- Codec mix donut + savings-over-time + recent + top-savings -- a codec donut showing how much of the library has been migrated off h264, a continuous daily savings line chart with empty-day backfill, a recent-encodes activity table, and a "biggest wins" leaderboard ordered by absolute bytes saved.
- Worker dashboard transparently proxies to master -- workers in node mode have an empty local ledger by design (every completed encode is persisted on the master only). When a worker's /api/dashboard/* handler is hit, it proxies to the master's /api/cluster/dashboard/* mirror over the cluster shared-secret channel and streams the JSON response back verbatim, so the dashboard frontend never has to know which side it's talking to. When the master is unreachable, the proxy falls back to an empty payload (HTTP 200) so the chart renderer draws an empty state instead of erroring.
- Clear dashboard data -- a new "Clear Dashboard Data" button in Advanced Settings wipes the ledger after explicit confirmation. On a worker the request is proxied to the master (the worker's own ledger is empty); on a master or standalone node the deletion runs locally. Either way, a SignalR EncodeHistoryCleared broadcast tells every connected client to repaint to zero.
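The "empty-day backfill" on the savings-over-time chart amounts to emitting a zero for every calendar day in the range that has no ledger rows, so the line stays continuous. The small sketch below assumes rows arrive as (completed_at, bytes_saved) pairs and are bucketed by UTC date. The function name and the bucketing rule are assumptions, not EncodeHistoryRepository's actual query.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta, timezone


def savings_over_time(rows, start: date, end: date):
    """Sum bytes saved per UTC day and emit a zero for days with no encodes."""
    totals = defaultdict(int)
    for completed_at, bytes_saved in rows:
        totals[completed_at.astimezone(timezone.utc).date()] += bytes_saved
    series = []
    day = start
    while day <= end:
        series.append((day, totals.get(day, 0)))  # backfill empty days with 0
        day += timedelta(days=1)
    return series


rows = [
    (datetime(2026, 4, 1, 10, 0, tzinfo=timezone.utc), 500),
    (datetime(2026, 4, 1, 22, 0, tzinfo=timezone.utc), 250),
    (datetime(2026, 4, 3, 8, 0, tzinfo=timezone.utc), 1000),
]
for day, saved in savings_over_time(rows, date(2026, 4, 1), date(2026, 4, 4)):
    print(day.isoformat(), saved)
# 2026-04-01 750
# 2026-04-02 0
# 2026-04-03 1000
# 2026-04-04 0
```

Doing the backfill after aggregation keeps the database query simple (one GROUP BY day). The gaps are then filled in a single linear pass over the range.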
SPA Shell
- Navigating between Queue and Dashboard no longer reloads the page -- click Queue or Dashboard in the navbar and the shell intercepts the click, fetches the new route, and swaps #page-content instead of doing a full document reload. The SignalR connection, modals, and cluster dashboard state all survive the navigation. popstate is wired up so back/forward also route through the SPA. External links and modifier-clicked links (Cmd+click, etc.) fall through to the browser's default behaviour.
- Page lifecycle hooks -- pages register mount/unmount callbacks against their data-page name, and the shell drives the lifecycle so timers and SignalR subscriptions cycle cleanly across navigations instead of stacking up.
- Shared modals partial -- the Library Browser, Analyze Results, Folder Picker, Settings, and Confirm modals moved out of the queue page into a layout-level _AppModals.cshtml partial. Every page in the SPA shell can now open them without the modals being re-rendered on each route change.
- Off-page resilience -- the queue manager and cluster dashboard now no-op their DOM updates when the queue page isn't mounted; the in-memory work-item map and worker list stay current, and the next mount paints them in. This saves a few wasted DOM lookups per SignalR event while the user is looking at the dashboard.
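The mount/unmount contract is essentially a registry keyed by page name, where navigation always unmounts the current page before mounting the next one. The sketch below models that contract in Python for illustration only. The real shell lives in navigation.js, and PageShell, register, and navigate are hypothetical names. The counters stand in for timers or SignalR subscriptions and show that repeated navigation leaves exactly one page's resources alive instead of stacking them.

```python
from typing import Callable, Dict, Optional, Tuple

Hook = Callable[[], None]


class PageShell:
    """Drive per-page mount/unmount hooks across navigations."""

    def __init__(self) -> None:
        self._pages: Dict[str, Tuple[Hook, Hook]] = {}
        self.current: Optional[str] = None

    def register(self, name: str, mount: Hook, unmount: Hook) -> None:
        self._pages[name] = (mount, unmount)

    def navigate(self, name: str) -> None:
        if name == self.current:
            return
        if self.current in self._pages:
            self._pages[self.current][1]()  # tear down the old page first
        self.current = name
        # (the real shell would swap #page-content here)
        if name in self._pages:
            self._pages[name][0]()


active = {"queue": 0, "dashboard": 0}  # live timers/subscriptions per page
shell = PageShell()
for page in active:
    shell.register(page,
                   mount=lambda p=page: active.__setitem__(p, active[p] + 1),
                   unmount=lambda p=page: active.__setitem__(p, active[p] - 1))

for page in ["queue", "dashboard", "queue", "dashboard", "queue"]:
    shell.navigate(page)
print(active)  # {'queue': 1, 'dashboard': 0}
```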
Bug Fixes & Reliability
Job Watchdog
- New per-job watchdog inside the encode pipeline -- a 30-second tick runs alongside every encode and aborts the job if no log line, status change, progress packet, or transfer-progress update has refreshed WorkItem.LastUpdatedAt for 15 minutes. This defends against hangs in pre-encode stages (hardware-accel detection, FFprobe, crop-detect) that FFmpeg's own no-output stall detection can't catch, because they run before FFmpeg is even launched. Aborting cancels the job's CTS, which unwinds OCR, sidecar extraction, and tessdata-download child processes through the same hook a user-issued cancel uses.
- Master-wide stuck-item watchdog -- a 30-second tick on the master scans every work item in Processing / Uploading / Downloading for items whose LastUpdatedAt is more than 5 minutes old. Three rescue cases: (a) assigned to a ghost node (the node it was sent to no longer exists in the cluster registry), so it's requeued via HandleNodeFailureAsync; (b) orphaned local-side (no node assignment, not running in any of the master's active local slots, and not in _remoteJobs), so it's requeued and the DB's remote-assignment marker is cleared; (c) stalled remote (tracked in _remoteJobs but no progress for 10+ minutes), so it's requeued. Case (c) waits the full 10 minutes so it cooperates with, and fires after, the existing 100-second grace counter. The first tick is deferred 60s to let recovery settle, and the watchdog only runs once recovery has completed.
- LastUpdatedAt is bumped on every sign of life -- the status setter, progress setter, transfer-progress setter, and every log line all Touch() the work item, so the watchdog doesn't kill jobs that are emitting output but no formal progress ticks (e.g. crop-detect, OCR pre-pass, hardware-accel probing). The timestamp is in-memory only and deliberately not persisted; a master restart re-bases it from recovery.
- Requeue jobs on removed nodes -- when the heartbeat reconciler removes an unreachable node, any remote jobs still assigned to it are now requeued through HandleNodeFailureAsync. Previously, jobs whose owning node disappeared lingered in _remoteJobs forever: a permanent orphan state that even the watchdog couldn't see, because the items still had a _remoteJobs entry.
- Stale _activeUploads entries no longer silently drop dispatches -- a duplicate-dispatch guard added in v2.5 had a hole: when a prior dispatch attempt aborted without cleaning up, the next attempt would see the stale _activeUploads entry and return without requeuing. That orphaned the work item with no node assignment, no _remoteJobs entry, and no place on the queue. The guard now distinguishes a real concurrent dispatch (item is in _remoteJobs → skip silently; the in-flight dispatch wins) from a stale entry (item is not in _remoteJobs → clear the entry and requeue so the next tick can retry).
- Idle-grace tells the worker to kill its straggler before requeuing -- when the master decides a remote node has gone idle on a job (the grace window is now 10 heartbeats, up from 3, so transient SignalR blips don't cost the job), it now issues a DELETE for the job ID against the worker before requeuing. Without this, a confused worker that was still encoding silently could double-process the same job after the master had sent it elsewhere.
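The per-job watchdog is a silence detector. A periodic tick compares the current time with the last Touch() and cancels the job once the gap exceeds the limit. The asyncio sketch below is scaled down: it uses a 20 ms tick and a 100 ms limit instead of 30 s and 15 min, and it assumes an asyncio.Event can stand in for the job's CancellationTokenSource. WorkItem, job_watchdog, and fake_encode are hypothetical names.

```python
import asyncio
import time


class WorkItem:
    def __init__(self) -> None:
        self.last_updated_at = time.monotonic()

    def touch(self) -> None:
        """Called on every sign of life: status, progress, transfer progress, log line."""
        self.last_updated_at = time.monotonic()


async def job_watchdog(item: WorkItem, cancel: asyncio.Event,
                       tick: float, silence_limit: float) -> bool:
    """Return True if the watchdog aborted the job, False if someone else cancelled first."""
    while not cancel.is_set():
        await asyncio.sleep(tick)
        if time.monotonic() - item.last_updated_at > silence_limit:
            cancel.set()  # stands in for cancelling the job's CTS
            return True
    return False


async def fake_encode(item: WorkItem, cancel: asyncio.Event, chatty_steps: int) -> None:
    for _ in range(chatty_steps):
        await asyncio.sleep(0.01)
        item.touch()          # e.g. a crop-detect log line
    await cancel.wait()       # then hangs until something cancels it


async def demo() -> bool:
    item, cancel = WorkItem(), asyncio.Event()
    watchdog = asyncio.ensure_future(
        job_watchdog(item, cancel, tick=0.02, silence_limit=0.1))
    await fake_encode(item, cancel, chatty_steps=10)
    return await watchdog


aborted = asyncio.run(demo())
print(aborted)  # True
```

While the fake encode keeps emitting "log lines", the gap never approaches the limit. The watchdog only fires once the job goes quiet, which is why log lines must Touch() the item too.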
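The master-wide watchdog's three rescue cases can be written as a pure classification function, which makes the thresholds easy to test. This sketch assumes three inputs: a set of known node IDs, the master's active local job IDs, and the IDs tracked in _remoteJobs. The Item type and the return labels are hypothetical, and the actual requeue calls (HandleNodeFailureAsync and so on) are left to the caller.

```python
from dataclasses import dataclass
from typing import Optional, Set

STUCK_AFTER = 5 * 60            # seconds since LastUpdatedAt before an item is examined
STALLED_REMOTE_AFTER = 10 * 60  # remote items get the longer window
WATCHED = ("Processing", "Uploading", "Downloading")


@dataclass
class Item:
    id: str
    status: str
    assigned_node: Optional[str]
    idle_seconds: float  # now - LastUpdatedAt


def classify(item: Item, known_nodes: Set[str], local_active: Set[str],
             remote_jobs: Set[str]) -> Optional[str]:
    """Return which rescue case applies, or None to leave the item alone."""
    if item.status not in WATCHED or item.idle_seconds <= STUCK_AFTER:
        return None
    if item.assigned_node is not None and item.assigned_node not in known_nodes:
        return "ghost-node"       # (a) requeue via node-failure handling
    if (item.assigned_node is None and item.id not in local_active
            and item.id not in remote_jobs):
        return "orphaned-local"   # (b) requeue and clear the remote-assignment marker
    if item.id in remote_jobs and item.idle_seconds >= STALLED_REMOTE_AFTER:
        return "stalled-remote"   # (c) requeue
    return None


known = {"n1"}
print(classify(Item("j1", "Processing", "gone-node", 400), known, set(), set()))  # ghost-node
```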
Original-Language Pre-Resolution
- Master resolves Sonarr/Radarr lookup before dispatch -- the KeepOriginalLanguage option needs to talk to the master's configured Sonarr/Radarr instances and match the source path against a media root. Workers can't run that lookup themselves: their workItem.Path is a temp upload path that doesn't match any configured root, and the folder-name fallback hits a UUID job dir. The master now pre-resolves the original language against the real source path and merges it into AudioLanguagesToKeep / SubtitleLanguagesToKeep before shipping options to the worker. It then disables KeepOriginalLanguage on the clone so the worker can't re-attempt the lookup against its temp path.
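The pre-resolution step has three parts: look up the original language against the master's real source path, merge it into both keep-lists, and hand the worker a clone with KeepOriginalLanguage turned off. The Python sketch below assumes options are an immutable record and that the Sonarr/Radarr lookup is passed in as a callable. The EncoderOptions field names and clone_options_for_worker are illustrative stand-ins for CloneOptionsForWorkerAsync. Disabling the flag even when the lookup finds nothing is also an assumption.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Tuple


@dataclass(frozen=True)
class EncoderOptions:
    keep_original_language: bool = False
    audio_languages_to_keep: Tuple[str, ...] = ()
    subtitle_languages_to_keep: Tuple[str, ...] = ()


def _merge(langs: Tuple[str, ...], extra: Optional[str]) -> Tuple[str, ...]:
    return langs if extra is None or extra in langs else langs + (extra,)


def clone_options_for_worker(options: EncoderOptions, source_path: str,
                             lookup: Callable[[str], Optional[str]]) -> EncoderOptions:
    """Resolve the original language on the master and ship a worker-safe clone."""
    if not options.keep_original_language:
        return options
    original = lookup(source_path)  # runs against the real path, not the worker's temp path
    return replace(
        options,
        audio_languages_to_keep=_merge(options.audio_languages_to_keep, original),
        subtitle_languages_to_keep=_merge(options.subtitle_languages_to_keep, original),
        keep_original_language=False,  # the worker must not retry the lookup
    )


opts = EncoderOptions(True, ("eng",), ("eng",))
lookup = {"/media/anime/Show/S01E01.mkv": "jpn"}.get  # hypothetical lookup result
worker_opts = clone_options_for_worker(opts, "/media/anime/Show/S01E01.mkv", lookup)
print(worker_opts.audio_languages_to_keep)  # ('eng', 'jpn')
```

Because the clone is a new object, the master's own options keep KeepOriginalLanguage enabled for any local encode of the same item.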
OCR Slot Lock
- Tesseract is now serialised at the movie level on multi-slot nodes -- multi-slot encoding can run several jobs in parallel on a single node, but Tesseract's engine state isn't safe to drive concurrently, and the OCR pipeline runs one cue at a time anyway. Two parallel encodes hitting the shared engine cache produced cross-job state corruption and interleaved log output. NativeOcrService now exposes a node-wide AcquireOcrSlotAsync that the subtitle-extraction service holds for the full bitmap pass of a movie. A parallel encode's OCR work waits its turn behind a "waiting for OCR" log line instead of racing on the engines. Text streams are still extracted by ffmpeg directly and skip the lock.
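A node-wide OCR slot is a one-permit semaphore held for a whole movie's bitmap pass, with a log line when a caller has to queue. The asyncio sketch below assumes that shape. OcrService and acquire_ocr_slot are hypothetical analogues of NativeOcrService.AcquireOcrSlotAsync, not its code. Two concurrent "movies" end up OCRing strictly one after the other.

```python
import asyncio
from contextlib import asynccontextmanager
from typing import List, Optional


class OcrService:
    def __init__(self) -> None:
        self._slot = asyncio.Semaphore(1)  # one OCR pass per node at a time
        self._holder: Optional[str] = None
        self.log: List[str] = []

    @asynccontextmanager
    async def acquire_ocr_slot(self, label: str):
        if self._slot.locked():
            self.log.append(f"{label}: waiting for OCR (held by {self._holder})")
        async with self._slot:
            self._holder = label
            try:
                yield
            finally:
                self._holder = None


async def ocr_movie(ocr: OcrService, label: str, cues: int, order: List[str]) -> None:
    async with ocr.acquire_ocr_slot(label):  # held for the whole bitmap pass
        for i in range(cues):
            order.append(f"{label}:{i}")
            await asyncio.sleep(0.01)  # one cue at a time


async def demo():
    ocr, order = OcrService(), []
    await asyncio.gather(ocr_movie(ocr, "movie-a", 3, order),
                         ocr_movie(ocr, "movie-b", 3, order))
    return ocr.log, order


log, order = asyncio.run(demo())
print(log)    # ['movie-b: waiting for OCR (held by movie-a)']
print(order)  # ['movie-a:0', 'movie-a:1', 'movie-a:2', 'movie-b:0', 'movie-b:1', 'movie-b:2']
```

Holding the slot per movie rather than per cue keeps one movie's cues contiguous. It also matches the "one cue at a time anyway" observation: there is no parallelism to lose.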
UTC Timestamps Across the API/SignalR Boundary
- No more "-18000s ago" labels -- SQLite stores DateTime as TEXT, and EF Core hands values back with DateTimeKind.Unspecified. The default System.Text.Json output for those values has no timezone marker, so new Date() in the browser interprets them as local time, and the dashboard's relative-time labels drifted by the user's timezone offset (e.g. "-18000s ago" for the five-hour gap between CDT and UTC). A new UtcDateTimeConverter / NullableUtcDateTimeConverter pair is wired into both the MVC JSON pipeline and the SignalR JSON protocol to coerce every DateTime to UTC ISO-8601 with the Z suffix on the wire.
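The converter's job is to treat a kind-less timestamp as UTC and always write the Z suffix. The Python analogue below assumes a naive datetime plays the role of DateTimeKind.Unspecified and that timezone-aware values should be converted to UTC. to_utc_iso is a hypothetical name, not the C# UtcDateTimeConverter.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def to_utc_iso(value: Optional[datetime]) -> Optional[str]:
    """Serialize a timestamp as ISO-8601 UTC with a trailing Z."""
    if value is None:
        return None  # the nullable converter's case
    if value.tzinfo is None:
        # Like DateTimeKind.Unspecified from SQLite: the stored value is already UTC.
        value = value.replace(tzinfo=timezone.utc)
    else:
        value = value.astimezone(timezone.utc)
    return value.isoformat(timespec="milliseconds").replace("+00:00", "Z")


print(to_utc_iso(datetime(2026, 4, 29, 1, 49, 28)))  # 2026-04-29T01:49:28.000Z
cdt = timezone(timedelta(hours=-5))
print(to_utc_iso(datetime(2026, 4, 28, 20, 49, 28, tzinfo=cdt)))  # 2026-04-29T01:49:28.000Z
```

Without the Z, JavaScript's Date parses a date-time string like "2026-04-29T01:49:28" as local time. That is the five-hour skew described above for a browser in CDT.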
Worker-Local UI Cleanup
- No more synthetic "Uploading 100%" stuck state -- when an autonomous-encoding accept moved into the OCR pre-pass, the synthetic 100% Uploading frame the controller broadcast at receive time was the last broadcast on the worker's hub for that job until encode completion, so the worker's local UI sat at "Uploading 100%" for the duration. The accept path now broadcasts a clean Processing/Encoding handover frame so the card transitions out of the upload state immediately. The reject path broadcasts a Cancelled frame so the orphan card actually disappears from the UI on rejection instead of sitting permanently at 100%.
- Worker-local broadcasts no longer mislabel themselves as "remote" -- the assignedNodeName: "master" field was being sent on transfer-progress and download-progress broadcasts that fire on the worker's own hub, where the work item is being processed locally. The "Processing on remote node X" badge would then appear on a card the worker was encoding itself. The field is now omitted from worker-local broadcasts.
- Node badge appears on cards dispatched after their initial render -- the badge update path used to require a pre-existing .badge.bg-secondary element on the card, so cards that were rendered without an assignment and dispatched later had no badge until a full reload. The renderer now creates the badge on demand and inserts it next to the status badge, in the same row the initial layout uses.
Files Changed
Per-device concurrency
- Snacks/Models/HardwareDevice.cs (new) -- HardwareDevice (id, display name, supported codecs, encoders, default concurrency, isHardware) and ActiveJobInfo (per-job heartbeat snapshot)
- Snacks/Models/NodeSettings.cs -- new DeviceSettings map (DeviceConcurrencySetting per device id) for per-node enable/disable + max-concurrency overrides
- Snacks/Models/ClusterNode.cs -- Capabilities.Devices[]; legacy ActiveWorkItemId now derived from the first ActiveJobs[] entry
- Snacks/Models/JobMetadata.cs -- new DeviceId + DeviceMaxConcurrency fields shipped from master to worker
- Snacks/Models/WorkItem.cs -- new DispatchedDeviceId (captured at dispatch time for ledger attribution); LastUpdatedAt + Touch() (watchdog hooks)
- Snacks/Services/ClusterDiscoveryService.cs -- capability advertisement now includes the Devices list; Status always reported as Online (master infers Busy from per-device occupancy)
- Snacks/Services/ClusterService.cs -- per-(node, device) slot pools; codec/vendor/load-spread scoring; CPU-as-fallback gate; EffectiveDeviceCapacity; settings-change broadcast (NodeSettingsChanged); legacy single-slot dispatch path retired
- Snacks/Services/ClusterNodeJobService.cs -- per-job ActiveRemoteJob records keyed by job id; per-device DeviceSlotPool that grows on demand to honor master-set caps; one CTS per job
- Snacks/Services/TranscodingService.cs -- per-job ActiveLocalJob records (replaces single _activeProcess / _activeWorkItem); local device-slot acquisition; WakeScheduler + WaitForSchedulerProgressAsync so settings changes mid-run take effect immediately
- Snacks/wwwroot/js/cluster/cluster-dashboard.js -- per-device chip rendering with effective caps from NodeSettings; cached node settings updated by NodeSettingsChanged; redraw() for SPA mount
- Snacks/wwwroot/js/cluster/override-dialog.js -- new Hardware Concurrency section (per-device enable + max-concurrency); standalone-mode hardware-only variant
- Snacks/wwwroot/css/site.css -- .device-chip-mini, .cluster-card-chips, dashboard chart styles
- Snacks/wwwroot/js/core/signalr-client.js -- NodeSettingsChanged handler
Encode dashboard
- Snacks/Models/EncodeHistory.cs (new) -- ledger row schema
- Snacks/Data/EncodeHistoryRepository.cs (new) -- summary, savings-over-time (with empty-day backfill), device utilization, codec mix, node throughput, recent, top savings, ClearAllAsync
- Snacks/Data/Migrations/20260429014928_AddEncodeHistory.{cs,Designer.cs} (new) + SnacksDbContextModelSnapshot.cs -- EF Core migration adding the EncodeHistory table
- Snacks/Data/SnacksDbContext.cs -- DbSet wiring + indexes for the ledger
- Snacks/Controllers/DashboardController.cs (new) -- /dashboard page + /api/dashboard/* JSON endpoints + worker-side proxy to master + clear-history with SignalR broadcast
- Snacks/Controllers/ClusterController.cs -- /api/cluster/dashboard/* mirror so workers can proxy in; DELETE /api/cluster/dashboard/history
- Snacks/Views/Dashboard/Index.cshtml (new) -- hero strip, charts, device stripe, node leaderboard, recent activity, top savings
- Snacks/wwwroot/js/dashboard/dashboard.js (new) -- hand-rolled SVG chart rendering, range picker, panel data fetches
- Snacks/wwwroot/js/settings/panels/advanced-panel.js -- "Clear Dashboard Data" wiring with confirm modal
SPA shell
- Snacks/Views/Shared/_Layout.cshtml -- #page-content swap target; nav links carry data-spa-link; _AppModals partial included at layout level; main.js promoted to module
- Snacks/Views/Shared/_AppModals.cshtml (new) -- shared modal DOM extracted from Index.cshtml (Library, Analyze, Folder Picker, Settings, Confirm)
- Snacks/Views/Home/Index.cshtml -- modals removed, page content only
- Snacks/wwwroot/js/core/navigation.js (new) -- click interceptor, fetch-and-swap, popstate, page mount/unmount lifecycle
- Snacks/wwwroot/js/main.js -- queue + dashboard pages registered with the shell; clusterDashboard.redraw() on queue mount
- Snacks/wwwroot/js/queue/queue-manager.js -- bail when queue containers aren't in the DOM (off-page); throttled refresh after status transitions to fill freed slots
- Snacks/Controllers/DashboardController.cs / Snacks/Controllers/ClusterController.cs -- shared layout means the dashboard page is now reachable via SPA fetch as well as direct nav
Job watchdog
- Snacks/Services/ClusterService.cs -- master-wide RunStuckItemWatchdogAsync (30s tick); requeue on node removal; stale _activeUploads handling that distinguishes real concurrent dispatches from orphans; idle-grace bump 3→10 with worker DELETE before requeue
- Snacks/Services/TranscodingService.cs -- per-job 30s watchdog inside ConvertVideoAsync that aborts on 15 min of silence; LogAsync touches LastUpdatedAt
- Snacks/Models/WorkItem.cs -- LastUpdatedAt (touched on status/progress/transfer-progress/log) + Touch() helper
Original-language passthrough
- Snacks/Services/ClusterService.cs -- new CloneOptionsForWorkerAsync resolves KeepOriginalLanguage against the master's integrations before shipping to the worker; merges into keep-lists; disables the flag on the clone
OCR slot lock
- Snacks/Services/Ocr/NativeOcrService.cs -- node-wide _ocrSlot semaphore; AcquireOcrSlotAsync with holder label and queued-behind log line; OcrSlotReleaser IDisposable
- Snacks/Services/SubtitleExtractionService.cs -- sidecar pass acquires the slot lazily on the first bitmap stream and releases it after the loop; OCR-mux pass holds the slot for the full bitmap pass
UTC datetime serialization
- Snacks/Json/UtcDateTimeConverter.cs (new) -- UtcDateTimeConverter + NullableUtcDateTimeConverter coerce DateTimeKind.Unspecified to UTC and emit ISO-8601 with a Z suffix
- Snacks/Program.cs -- both converters registered on the MVC JSON pipeline and the SignalR JSON protocol
Worker-local UI cleanup
- Snacks/Controllers/ClusterController.cs -- accept path broadcasts a Processing/Encoding handover; reject path broadcasts Cancelled; assignedNodeName: "master" removed from worker-local broadcasts
- Snacks/wwwroot/js/queue/work-item-renderer.js -- node badge created on demand for cards dispatched after their initial render
Version Bumps
- Snacks/Controllers/HomeController.cs
- Snacks/Services/ClusterDiscoveryService.cs -- protocol version bump to 2.7.0
- Snacks/Views/Shared/_Layout.cshtml
- README.md
- build-and-export.bat
- electron-app/package.json / package-lock.json
Full documentation: README.md