Snacks v2.7.0

Automated Video Library Encoder

A minor release that adds per-device concurrency — every node now exposes its hardware encoders (NVIDIA / Intel / AMD / Apple / CPU) as discrete dispatch slots, with user-tunable caps per device, so a single beefy box can run several encodes at once instead of a one-job-at-a-time bottleneck. Pairs with a new Encode Dashboard at /dashboard — a persistent ledger of every completed encode with hero stats, savings-over-time, per-device utilization, codec mix, node throughput, top wins, and a recent-activity table. The release also lands a stuck-job watchdog (per-job and master-wide) that recovers items orphaned in Processing / Uploading / Downloading, an OCR slot lock so parallel encodes can't race the shared Tesseract engine, UTC-correct timestamps across the API/SignalR boundary, and an SPA shell so navigating between the queue and dashboard pages keeps SignalR alive instead of tearing down state.

New Features

Per-Device Concurrency

Hardware as discrete slots -- a worker now reports a Devices[] list (one entry per vendor family it can drive concurrently — nvidia, intel, amd, apple, cpu) instead of a single GpuVendor string. Each device carries a DefaultConcurrency (NVIDIA defaults to 2, QSV/AMF/VideoToolbox to 1, CPU to a fraction of logical cores) and the master treats every (node, device) pair as a slot pool the dispatcher draws from. A node with NVENC + iGPU now actually runs two encodes at once instead of serializing on a single currentJobId.
Master-side scoring with vendor preference + load-spread -- when picking where to send a job, the master scores every free slot across the cluster: codec compatibility (rejects unsupported), the user's HardwareAcceleration preference (auto / none / specific vendor), and a small headroom bonus that lightly prefers the device with more free slots, so two NVENC jobs dispatched back to back don't both pile onto card 0. CPU is reserved for explicit "Software" jobs and for "auto" jobs on nodes with no hardware encoder at all; under "auto" with detected hardware, CPU is excluded outright so a job won't silently land on a slow software encode while a GPU sits idle.
Per-node device overrides -- the cluster override dialog now has a Hardware Concurrency section listing every detected device on the target node, with a per-device enable toggle and a MaxConcurrency cap. Useful for "let this iGPU encode but only one job at a time", "disable my AMD card outright while I drive it for gaming", or "raise NVIDIA from 2→3 on the box that can actually take it". Saved overrides push to all clients via SignalR so chip rendering reflects the change immediately, not on the next reload.
CPU is never a chip, always a fallback, pinned to 1 -- the per-device chip strip on each node card shows hardware encoders only (NVIDIA 0/2, INTEL 1/1); CPU is treated as the implicit fallback and doesn't earn a chip — it would just appear on every card and dilute the signal. CPU is also exempt from per-node overrides — it's hidden from the override dialog and pinned to a single concurrent encode regardless of any saved MaxConcurrency value, since CPU exists only as a destination for explicit "Software" jobs and parallel software encodes would just thrash. Active CPU jobs still appear in the per-node active-jobs list (prefixed [cpu]).
Standalone gets the same control panel -- "Edit Hardware Settings" on the local-node card opens the same dialog in standalone mode with only the Hardware Concurrency section visible (4K dispatch rules and encoding overrides only fire during cluster dispatch). A user with a single beefy desktop can now tell Snacks to run 2 NVENC encodes at once without pretending to be a cluster.
Live capacity changes wake the running scheduler -- raising a cap from 1→2 mid-run used to only take effect after the currently running encode finished, because the local scheduler was parked in WhenAny(inflight). The scheduler now races inflight task completions against an explicit settings-change wake signal so the new cap is honored on the next tick. Same for the cluster dispatcher: settings mutations now fire a fresh dispatch pass instead of waiting for the next 2s tick.
Per-job device pinning end-to-end -- the master stamps the chosen DeviceId and effective MaxConcurrency into the worker's JobMetadata envelope; the worker's slot pool grows on demand to honor the master's cap (so a fixed-size semaphore can't 503 a job the master legitimately scheduled under a higher override) and pins the encode to that device family by overriding EncoderOptions.HardwareAcceleration. The same DispatchedDeviceId flows into the encode-history ledger so dashboard analytics attribute the work even after the slot has been released.
Cancel scoping -- killing a single remote job no longer unwinds peer encodes on other slots. Each in-flight encode (local and remote) carries its own CancellationTokenSource so a master-issued cancel only kills that specific job; everything else on the node keeps running.
Legacy single-slot path retired -- the old currentJobId / _currentRemoteJob / _activeProcess fields are gone. The legacy heartbeat shape (currentJobId, progress, completedJobId, receivingJobId) is still emitted alongside the new multi-slot fields (activeJobs[], completedJobIds[], receivingJobIds[]) so older nodes still see something, but the master no longer falls back to single-slot dispatch.
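The master-side scoring and CPU-fallback rules described in this section can be sketched roughly as follows. This is an illustrative JavaScript model, not the ClusterService code: the object shapes, the function names (effectiveCapacity, scoreSlot, pickSlot), the base score of 100, and the strict match for a specific vendor preference are all assumptions made for the example.

```javascript
// Hypothetical shapes (not confirmed Snacks types):
//   device:   { id, codecs: string[], isHardware, defaultConcurrency }
//   override: { enabled, maxConcurrency }   (optional, per device id)
//   node:     { id, devices, active: { [deviceId]: count }, overrides }

// Effective cap: CPU is pinned to 1, a disabled device gets 0,
// otherwise a saved override wins over the advertised default.
function effectiveCapacity(device, override) {
  if (device.id === "cpu") return 1;
  if (override && override.enabled === false) return 0;
  if (override && Number.isInteger(override.maxConcurrency)) {
    return override.maxConcurrency;
  }
  return device.defaultConcurrency;
}

// Score one (node, device) slot for a job; null means "not eligible".
function scoreSlot(job, node, device, activeCount) {
  const cap = effectiveCapacity(device, node.overrides?.[device.id]);
  const free = cap - activeCount;
  if (free <= 0) return null;                            // no free slot
  if (!device.codecs.includes(job.codec)) return null;   // unsupported codec

  const pref = job.hardwareAcceleration; // "auto" | "none" | vendor id
  const nodeHasHardware = node.devices.some((d) => d.isHardware);

  if (pref === "none") {
    if (device.id !== "cpu") return null;                // explicit software job
  } else if (pref === "auto") {
    if (device.id === "cpu" && nodeHasHardware) return null; // never shadow a GPU
  } else if (device.id !== pref) {
    return null;                                         // assumed: strict vendor match
  }

  // Small headroom bonus: more free slots scores slightly higher.
  return 100 + free;
}

// Pick the best-scoring free slot across the cluster, or null to keep waiting.
function pickSlot(job, nodes) {
  let best = null;
  for (const node of nodes) {
    for (const device of node.devices) {
      const active = node.active?.[device.id] ?? 0;
      const score = scoreSlot(job, node, device, active);
      if (score !== null && (best === null || score > best.score)) {
        best = { nodeId: node.id, deviceId: device.id, score };
      }
    }
  }
  return best;
}
```

Under these assumptions, an "auto" job on a node whose NVIDIA slots are all busy returns null (the job waits) rather than falling through to CPU, and raising that device's override cap makes the same call return the NVIDIA slot.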

Encode Dashboard (/dashboard)

Persistent encode-history ledger -- a new EncodeHistory SQLite table records one row per completed encode (or "no savings" outcome where the output was discarded). Captures original/encoded bytes + bytes saved, original/encoded codec, source/output bitrate, duration, encode wall-clock, dispatched device, node hostname/ID, was-remote, is-4K, started/completed timestamps, and an Outcome marker. Append-only by design — failed encodes are not recorded; this is a ledger of completed work, not an error log. New EF Core migration 20260429014928_AddEncodeHistory adds the table.
Hero strip + range picker -- four headline cards (Bytes Saved, Files Encoded, Encode Time, Avg Compression) with sparkline trends, plus a 7d / 30d / 90d / 1y range picker that re-queries every panel.
Per-device + per-node analytics -- a Device Workload stripe shows total work routed to each DeviceId over the window (NVIDIA / Intel / AMD / Apple / CPU). A Node Throughput leaderboard ranks every node in the cluster by completed-encode count and bytes saved so you can see whether the new GPU is actually pulling its weight.
Codec mix donut + savings-over-time + recent + top-savings -- a codec donut that surfaces how much of the library has been migrated off h264, a continuous daily savings line chart with empty-day backfill, a recent-encodes activity table, and a "biggest wins" leaderboard ordered by absolute bytes saved.
Worker dashboard transparently proxies to master -- workers in node mode have an empty local ledger by design (every completed encode is persisted on the master only). When a worker's /api/dashboard/* handler is hit, it proxies to the master's /api/cluster/dashboard/* mirror over the cluster shared-secret channel and streams the JSON response back verbatim. The dashboard's frontend never has to know which side it's talking to. Falls back to an empty payload (200) when the master is unreachable so the chart renderer draws an empty state instead of erroring.
Clear dashboard data -- new "Clear Dashboard Data" button in Advanced Settings wipes the ledger after explicit confirmation. On a worker the request proxies to the master (the worker's own ledger is empty); on a master/standalone the deletion runs locally. Either way, a SignalR EncodeHistoryCleared broadcast tells every connected client to repaint to zero.
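The "empty-day backfill" behind the savings-over-time chart can be illustrated with a small sketch. The notes place the real logic in EncodeHistoryRepository on the server; the JavaScript below is only a model of the idea, and the function name backfillDailySavings and the row shape are assumptions.

```javascript
// rows: [{ day: "YYYY-MM-DD", bytesSaved: number }], any order, gaps allowed.
// Returns one point per UTC day from startDay to endDay inclusive, with 0
// for days that had no completed encodes, so the line chart stays continuous.
function backfillDailySavings(rows, startDay, endDay) {
  // Sum rows that fall on the same day.
  const byDay = new Map();
  for (const row of rows) {
    byDay.set(row.day, (byDay.get(row.day) ?? 0) + row.bytesSaved);
  }

  const series = [];
  const cursor = new Date(`${startDay}T00:00:00Z`);
  const end = new Date(`${endDay}T00:00:00Z`);
  while (cursor <= end) {
    const day = cursor.toISOString().slice(0, 10);
    series.push({ day, bytesSaved: byDay.get(day) ?? 0 });
    cursor.setUTCDate(cursor.getUTCDate() + 1); // step one UTC day
  }
  return series;
}
```

Walking the range in UTC keeps the day buckets stable regardless of the viewer's timezone.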

SPA Shell

Navigating between Queue and Dashboard no longer reloads the page -- click Queue or Dashboard in the navbar and the shell intercepts the click, fetches the new route, and swaps #page-content instead of doing a full document reload. The SignalR connection, modals, and cluster dashboard state all survive the navigation. popstate is wired so back/forward also routes through the SPA. External links and modifier-clicked links (Cmd+click etc.) fall through to the browser's default behaviour.
Page lifecycle hooks -- pages register mount / unmount callbacks against their data-page name; the shell drives the lifecycle so timers and SignalR subscriptions cycle cleanly across navigations instead of stacking up.
Shared modals partial -- the Library Browser, Analyze Results, Folder Picker, Settings, and Confirm modals moved out of the queue page and into a layout-level _AppModals.cshtml partial. Every page in the SPA shell can now open them without the modals having to be re-rendered on each route change.
Off-page resilience -- the queue manager and cluster dashboard now no-op their DOM updates when the queue page isn't mounted; the in-memory work-item map and worker list stay current, and the next mount paints them in. Saves a few wasted DOM lookups per SignalR event when the user is staring at the dashboard.
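The page lifecycle hooks described above amount to a small registry that unmounts the outgoing page before mounting the incoming one. The sketch below leaves out the click interception and the #page-content swap and models only the mount / unmount cycle; createShell, registerPage, and navigate are hypothetical names, not the navigation.js API.

```javascript
// Minimal lifecycle registry: pages register hooks by name, and the shell
// guarantees unmount(old) runs before mount(new) on every navigation.
function createShell() {
  const pages = new Map(); // page name -> { mount?, unmount? }
  let current = null;

  return {
    registerPage(name, hooks) {
      pages.set(name, hooks);
    },
    // Assumed to be called after the new route's markup has been swapped in.
    navigate(name) {
      if (current !== null) pages.get(current)?.unmount?.();
      current = pages.has(name) ? name : null;
      if (current !== null) pages.get(current).mount?.();
    },
    get currentPage() {
      return current;
    },
  };
}
```

Because every mount is paired with an unmount, a page that starts a timer or a SignalR subscription in mount and tears it down in unmount never ends up with stacked handlers after repeated navigation.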

Bug Fixes & Reliability

Job Watchdog

New per-job watchdog inside the encode pipeline -- a 30-second tick alongside every running encode aborts the job if no log line, status change, progress packet, or transfer-progress update has refreshed WorkItem.LastUpdatedAt for 15 minutes. Defends against hangs in pre-encode stages (hardware-accel detection, FFprobe, crop-detect) that FFmpeg's own no-output stall detection can't catch because they run before FFmpeg is even launched. Aborting cancels the job's CTS, which unwinds OCR / sidecar extraction / tessdata-download child processes through the same hook a user-issued cancel uses.
Master-wide stuck-item watchdog -- a 30-second tick on the master scans every work item in Processing / Uploading / Downloading for items orphaned past 5 minutes by LastUpdatedAt. Three rescue cases: (a) assigned to a ghost node — the node it was sent to no longer exists in the cluster registry, so requeue via HandleNodeFailureAsync; (b) orphaned local-side — no node assignment, not running in any of the master's active local slots, and not in _remoteJobs, so requeue and clear the DB's remote-assignment marker; (c) stalled remote — tracked in _remoteJobs but no progress for 10+ minutes, so requeue (deferred to 10 min so this check cooperates with — and fires after — the existing 100-second grace counter). First tick deferred 60s to give recovery time to settle; only runs after recovery completes.
LastUpdatedAt is bumped on every sign of life -- status setter, progress setter, transfer-progress setter, and every log line all Touch() the work item so the watchdog doesn't kill jobs that are emitting output but no formal progress ticks (e.g. crop-detect, OCR pre-pass, hardware-accel probing). In-memory only — deliberately not persisted; a master restart re-bases the timestamp from recovery.
Requeue jobs on removed nodes -- when the heartbeat reconciler removes an unreachable node, any remote jobs still assigned to it are now requeued through HandleNodeFailureAsync. Without this, jobs whose owning node disappeared lingered in _remoteJobs forever — a permanent orphan state that even the watchdog couldn't see because the items still had a _remoteJobs entry.
Stale _activeUploads entries no longer silently drop dispatches -- a duplicate-dispatch guard added in v2.5 had a hole: when the prior dispatch attempt aborted but didn't clean up, the next attempt would see the stale _activeUploads entry and return without requeuing — orphaning the work item with no node assignment, no _remoteJobs entry, and not on the queue. The guard now distinguishes "real concurrent dispatch" (item is in _remoteJobs → skip silently, the in-flight wins) from "stale entry" (item is not in _remoteJobs → clear the entry and requeue so the next tick can retry).
Idle-grace tells the worker to kill its straggler before requeuing -- when the master decides a remote node has gone idle on a job (the grace window is now 10 heartbeats, up from 3, so transient SignalR blips don't cost the job), it issues a DELETE for the job ID against the worker before requeuing. Without this, a confused worker that was still silently encoding could double-process the same job after the master sent it elsewhere.
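The three rescue cases of the master-wide stuck-item watchdog can be read as a classification over each work item. The sketch below models that decision as a pure function. The thresholds (5 and 10 minutes) and the watched statuses come from the notes above; the function name, the item fields, and the ctx sets standing in for the cluster registry, local slots, and _remoteJobs are assumptions.

```javascript
const STUCK_AFTER_MS = 5 * 60 * 1000;          // orphan threshold
const REMOTE_STALL_AFTER_MS = 10 * 60 * 1000;  // stalled-remote threshold
const WATCHED = new Set(["Processing", "Uploading", "Downloading"]);

// ctx: { knownNodeIds: Set, localActiveIds: Set, remoteJobIds: Set }
// Returns "ghost-node", "orphaned-local", "stalled-remote", or null (leave alone).
function classifyStuckItem(item, ctx, now) {
  if (!WATCHED.has(item.status)) return null;

  const silentFor = now - item.lastUpdatedAt;
  if (silentFor < STUCK_AFTER_MS) return null; // recent sign of life

  // (a) assigned to a node that is no longer in the registry
  if (item.assignedNodeId && !ctx.knownNodeIds.has(item.assignedNodeId)) {
    return "ghost-node";
  }

  const isRemote = ctx.remoteJobIds.has(item.id);

  // (b) no assignment, not in a local slot, not tracked as remote
  if (!item.assignedNodeId && !ctx.localActiveIds.has(item.id) && !isRemote) {
    return "orphaned-local";
  }

  // (c) tracked as remote but silent past the longer threshold
  if (isRemote && silentFor >= REMOTE_STALL_AFTER_MS) return "stalled-remote";

  return null;
}
```

In this model every non-null result would lead to a requeue, and a remote job that has been silent for between 5 and 10 minutes is deliberately left to the grace counter.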

Original-Language Pre-Resolution

Master resolves Sonarr/Radarr lookup before dispatch -- the KeepOriginalLanguage option needs to talk to the master's configured Sonarr/Radarr instances and match the source path against a media root. Workers can't run that lookup themselves: their workItem.Path is a temp upload path that doesn't match any configured root, and the folder-name fallback hits a UUID job dir. The master now pre-resolves the original language against the real source path and merges it into AudioLanguagesToKeep / SubtitleLanguagesToKeep before shipping options to the worker, then disables KeepOriginalLanguage on the clone so the worker can't re-attempt the lookup against its temp path.
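The clone-and-merge step can be sketched as below. It assumes the Sonarr/Radarr lookup has already produced a language code on the master and shows only how that result might be folded into the keep-lists. The camel-cased field names mirror the options named above, while the function signature and the choice to always clear the flag on the clone are assumptions.

```javascript
// options: { keepOriginalLanguage, audioLanguagesToKeep, subtitleLanguagesToKeep, ... }
// originalLanguage: language code resolved on the master, or null if unresolved.
function cloneOptionsForWorker(options, originalLanguage) {
  // Copy the lists so the master's own options are never mutated.
  const clone = {
    ...options,
    audioLanguagesToKeep: [...options.audioLanguagesToKeep],
    subtitleLanguagesToKeep: [...options.subtitleLanguagesToKeep],
  };

  if (options.keepOriginalLanguage && originalLanguage) {
    for (const list of [clone.audioLanguagesToKeep, clone.subtitleLanguagesToKeep]) {
      if (!list.includes(originalLanguage)) list.push(originalLanguage);
    }
  }

  // The worker only sees a temp upload path, so it must not retry the lookup.
  clone.keepOriginalLanguage = false;
  return clone;
}
```

Working on a copy matters here: the master's own options keep KeepOriginalLanguage enabled for the next job, while the worker receives a fully resolved, lookup-free set.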

OCR Slot Lock

Tesseract is now serialised at the movie level on multi-slot nodes -- multi-slot encoding can run several jobs in parallel on a single node, but Tesseract's engine state isn't safe to drive concurrently and the OCR pipeline runs one cue at a time anyway. Two parallel encodes hitting the shared engine cache produced cross-job state corruption and interleaved log output. NativeOcrService now exposes a node-wide AcquireOcrSlotAsync that the subtitle-extraction service holds for the full bitmap pass of a movie — a parallel encode's OCR work waits its turn behind a "waiting for OCR" log line instead of racing on the engines. Text streams are still extracted by ffmpeg directly and skip the lock.
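The slot lock behaves like a node-wide mutex with a releaser handle. The JavaScript sketch below models that pattern with a promise chain. It is not the NativeOcrService implementation (which the file list describes as a semaphore with an IDisposable releaser), and createOcrSlot and the log wording are hypothetical.

```javascript
// Minimal async mutex: each caller waits for the previous holder to release.
function createOcrSlot(log = () => {}) {
  let tail = Promise.resolve(); // resolves when the most recent holder releases
  let pending = 0;              // current holder plus queued waiters

  return async function acquireOcrSlot(label) {
    if (pending > 0) log(`${label}: waiting for OCR`);
    pending += 1;

    let release;
    const released = new Promise((resolve) => { release = resolve; });
    const previous = tail;
    tail = released;            // the next caller queues behind this one
    await previous;             // wait for our turn

    let done = false;
    return () => {              // releaser: safe to call more than once
      if (done) return;
      done = true;
      pending -= 1;
      release();
    };
  };
}
```

A caller would hold the returned releaser for the whole bitmap pass and invoke it in a finally block, so a failed OCR run cannot leave the slot locked.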

UTC Timestamps Across the API/SignalR Boundary

No more "-18000s ago" labels -- SQLite stores DateTime as TEXT and EF Core hands values back with DateTimeKind.Unspecified. The default System.Text.Json output for those values has no timezone marker, so new Date() in the browser interprets them as local time — the dashboard's relative-time labels then drifted by the user's timezone offset (e.g. "-18000s ago" for a CDT/UTC five-hour gap). A new UtcDateTimeConverter / NullableUtcDateTimeConverter pair is wired into both the MVC JSON pipeline and the SignalR JSON protocol to coerce every DateTime to UTC ISO-8601 with the Z suffix on the wire.
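The browser-side half of this bug is easy to reproduce: a date-time string without a zone marker is parsed by new Date() as local time, while the same string with a Z suffix is parsed as UTC. The sketch below demonstrates the drift. toUtcIso is a hypothetical client-side helper that only mirrors the wire format described above; the actual fix is the server-side converter pair.

```javascript
// Treat an offset-less timestamp as UTC by appending "Z" before parsing.
// Illustrative only: in Snacks the coercion happens on the server.
function toUtcIso(value) {
  const hasZone = /(Z|[+-]\d{2}:\d{2})$/.test(value);
  return new Date(hasZone ? value : `${value}Z`).toISOString();
}

const stored = "2026-04-29T01:49:28";      // no zone marker, as sent before the fix
const asLocal = new Date(stored);          // parsed as local time by the browser
const asUtc = new Date(toUtcIso(stored));  // unambiguous once "Z" is present

// The gap between the two readings is the viewer's UTC offset in seconds
// (18000 for a viewer five hours behind UTC, 0 for a viewer on UTC).
const driftSeconds = (asLocal.getTime() - asUtc.getTime()) / 1000;
console.log(`drift: ${driftSeconds}s`);
```

For a viewer five hours behind UTC the local reading lands five hours in the future, which is why a freshly written row rendered as "-18000s ago".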

Worker-Local UI Cleanup

No more synthetic "Uploading 100%" stuck-state -- when an autonomous-encoding accept moved into the OCR pre-pass, the synthetic 100% Uploading frame the controller broadcast at receive time was the last broadcast on the worker's hub for that job until encode completion — the worker's local UI sat at "Uploading 100%" for the duration. The accept path now broadcasts a clean Processing/Encoding handover frame so the card transitions out of the upload state immediately. The reject path broadcasts a Cancelled frame so the orphan card actually disappears from the UI on rejection instead of sitting permanently at 100%.
Worker-local broadcasts no longer mislabel themselves as "remote" -- the assignedNodeName: "master" field was being sent on transfer-progress and download-progress broadcasts that fire on the worker's own hub, where the work item is being processed locally. The "Processing on remote node X" badge would then appear on a card the worker is encoding itself. The field is now omitted from worker-local broadcasts.
Node badge appears on cards dispatched after their initial render -- the badge update path used to require a pre-existing .badge.bg-secondary element on the card; cards that started without an assignment and got dispatched mid-render had no badge until a full reload. The renderer now creates the badge on demand and inserts it next to the status badge in the same row the initial layout uses.

Files Changed

Per-device concurrency

Snacks/Models/HardwareDevice.cs (new) -- HardwareDevice (id, display name, supported codecs, encoders, default concurrency, isHardware) and ActiveJobInfo (per-job heartbeat snapshot)
Snacks/Models/NodeSettings.cs -- new DeviceSettings map (DeviceConcurrencySetting per device id) for per-node enable/disable + max-concurrency overrides
Snacks/Models/ClusterNode.cs -- Capabilities.Devices[]; legacy ActiveWorkItemId now derived from ActiveJobs[] first entry
Snacks/Models/JobMetadata.cs -- new DeviceId + DeviceMaxConcurrency fields shipped from master to worker
Snacks/Models/WorkItem.cs -- new DispatchedDeviceId (captured at dispatch time for ledger attribution); LastUpdatedAt + Touch() (watchdog hooks)
Snacks/Services/ClusterDiscoveryService.cs -- capability advertisement now includes Devices list; Status always reported as Online (master infers Busy from per-device occupancy)
Snacks/Services/ClusterService.cs -- per-(node, device) slot pools; codec/vendor/load-spread scoring; CPU-as-fallback gate; EffectiveDeviceCapacity; settings-change broadcast (NodeSettingsChanged); legacy single-slot dispatch path retired
Snacks/Services/ClusterNodeJobService.cs -- per-job ActiveRemoteJob records keyed by job id; per-device DeviceSlotPool that grows on demand to honor master-set caps; one CTS per job
Snacks/Services/TranscodingService.cs -- per-job ActiveLocalJob records (replaces single _activeProcess/_activeWorkItem); local device-slot acquisition; WakeScheduler + WaitForSchedulerProgressAsync so settings changes mid-run take effect immediately
Snacks/wwwroot/js/cluster/cluster-dashboard.js -- per-device chip rendering with effective caps from NodeSettings; cached node-settings updated by NodeSettingsChanged; redraw() for SPA mount
Snacks/wwwroot/js/cluster/override-dialog.js -- new Hardware Concurrency section (per-device enable + max-concurrency); standalone-mode hardware-only variant
Snacks/wwwroot/css/site.css -- .device-chip-mini, .cluster-card-chips, dashboard chart styles
Snacks/wwwroot/js/core/signalr-client.js -- NodeSettingsChanged handler

Encode dashboard

Snacks/Models/EncodeHistory.cs (new) -- ledger row schema
Snacks/Data/EncodeHistoryRepository.cs (new) -- summary, savings-over-time (with empty-day backfill), device utilization, codec mix, node throughput, recent, top savings, ClearAllAsync
Snacks/Data/Migrations/20260429014928_AddEncodeHistory.{cs,Designer.cs} (new) + SnacksDbContextModelSnapshot.cs -- EF Core migration adding the EncodeHistory table
Snacks/Data/SnacksDbContext.cs -- DbSet wiring + indexes for the ledger
Snacks/Controllers/DashboardController.cs (new) -- /dashboard page + /api/dashboard/* JSON endpoints + worker-side proxy to master + clear-history with SignalR broadcast
Snacks/Controllers/ClusterController.cs -- /api/cluster/dashboard/* mirror so workers can proxy in; DELETE /api/cluster/dashboard/history
Snacks/Views/Dashboard/Index.cshtml (new) -- hero strip, charts, device stripe, node leaderboard, recent activity, top savings
Snacks/wwwroot/js/dashboard/dashboard.js (new) -- hand-rolled SVG chart rendering, range picker, panel data fetches
Snacks/wwwroot/js/settings/panels/advanced-panel.js -- "Clear Dashboard Data" wiring with confirm modal

SPA shell

Snacks/Views/Shared/_Layout.cshtml -- #page-content swap target; nav links carry data-spa-link; _AppModals partial included at layout level; main.js promoted to module
Snacks/Views/Shared/_AppModals.cshtml (new) -- shared modal DOM extracted from Index.cshtml (Library, Analyze, Folder Picker, Settings, Confirm)
Snacks/Views/Home/Index.cshtml -- modals removed, page content only
Snacks/wwwroot/js/core/navigation.js (new) -- click interceptor, fetch-and-swap, popstate, page mount/unmount lifecycle
Snacks/wwwroot/js/main.js -- queue + dashboard pages registered with the shell; clusterDashboard.redraw() on queue mount
Snacks/wwwroot/js/queue/queue-manager.js -- bail when queue containers aren't in the DOM (off-page); throttled refresh after status transitions to fill freed slots
Snacks/Controllers/DashboardController.cs / Snacks/Controllers/ClusterController.cs -- shared layout means the dashboard page is now reachable via SPA fetch as well as direct nav

Job watchdog

Snacks/Services/ClusterService.cs -- master-wide RunStuckItemWatchdogAsync (30s tick); requeue on node removal; stale-_activeUploads guard now distinguishes real concurrent dispatch from orphaned entries; idle-grace bump 3→10 with worker DELETE before requeue
Snacks/Services/TranscodingService.cs -- per-job 30s watchdog inside ConvertVideoAsync that aborts on 15min silence; LogAsync touches LastUpdatedAt
Snacks/Models/WorkItem.cs -- LastUpdatedAt (touched on status/progress/transfer-progress/log) + Touch() helper

Original-language pre-resolution

Snacks/Services/ClusterService.cs -- new CloneOptionsForWorkerAsync resolves KeepOriginalLanguage against the master's integrations before shipping to the worker; merges into keep-lists; disables the flag on the clone

OCR slot lock

Snacks/Services/Ocr/NativeOcrService.cs -- node-wide _ocrSlot semaphore; AcquireOcrSlotAsync with holder-label and queued-behind log line; OcrSlotReleaser IDisposable
Snacks/Services/SubtitleExtractionService.cs -- sidecar pass acquires the slot lazily on the first bitmap stream and releases after the loop; OCR-mux pass holds the slot for the full bitmap pass

UTC datetime serialization

Snacks/Json/UtcDateTimeConverter.cs (new) -- UtcDateTimeConverter + NullableUtcDateTimeConverter coerce DateTimeKind.Unspecified to UTC and emit ISO-8601 with Z suffix
Snacks/Program.cs -- both converters registered on the MVC JSON pipeline and the SignalR JSON protocol

Worker-local UI cleanup

Snacks/Controllers/ClusterController.cs -- accept path broadcasts Processing/Encoding handover; reject path broadcasts Cancelled; assignedNodeName: "master" removed from worker-local broadcasts
Snacks/wwwroot/js/queue/work-item-renderer.js -- node badge created on demand for cards dispatched after their initial render

Version Bumps

Snacks/Controllers/HomeController.cs
Snacks/Services/ClusterDiscoveryService.cs -- protocol version bump to 2.7.0
Snacks/Views/Shared/_Layout.cshtml
README.md
build-and-export.bat
electron-app/package.json / package-lock.json

Full documentation: README.md

derekshreds/Snacks v2.7.0 on GitHub