v1.5.0-rc.20
[1.5.0-rc.20] — 2026-05-14
Added
- Fleet-aggregate stats subsystem (commits feature/v1.5-rc17). A new ContainerStatsAggregator polls each locally-monitored container once per tick (default 10 s) and computes a fleet-wide ContainerStatsSummary (total CPU%, total memory, top-N rows). Two new endpoints, GET /api/v1/stats/summary and GET /api/v1/stats/summary/stream, expose the current snapshot and a live SSE feed. The dashboard Resource Usage widget now consumes the SSE stream directly. This fixes the regression (introduced in rc.13 by the ?touch=false workaround) where the widget showed zeros because the per-container cache was never warmed. The legacy GET /api/v1/containers/stats endpoint and the client-side summarizeContainerResourceUsage rollup have been removed.
- Per-container update locks (commit 761fb834). A new keyed LockManager primitive in app/updates/lock-primitives.ts replaces the module-level pLimit(1) that was serialising every container update across the entire process. Lock keys are derived per container (and per compose project for Dockercompose). As a result, two unrelated containers can now pull and recreate concurrently, while two services in the same compose project still serialise correctly. The lock primitive lives in its own pure-logic file with full unit tests. The docker trigger and compose subclass derive the lock key set via a new getUpdateLockKeys(container) method.
- Restart recovery for queued and pulling updates (commit 00788b13). Startup reconciliation in app/store/update-operation.ts is now selective. Operations with status=queued stay queued for the recovery dispatcher to pick up, and phase=pulling rows are reset to queued (pull is idempotent). All other in-progress phases (prepare, renamed, new-created, old-stopped, new-started, health-gate, rollback-*) remain marked failed, because they leave inconsistent state that an operator should review. A new app/updates/recovery.ts module runs once after registry.init(). It re-resolves the trigger and container for each queued operation and dispatches them through the existing fire-and-forget pipeline. Operations whose container or trigger no longer exists are marked failed with an explanatory lastError, so they don't sit in the queue forever.
- Notification outbox with retry and dead-letter queue (commits a9561d93, 7d2ef6eb, b215d295, ce26bece). A new notificationOutbox LokiJS collection (app/store/notification-outbox.ts) and a matching app/notifications/outbox-worker.ts background worker provide durable retry semantics for notification dispatch. Trigger.dispatchContainerForEvent now optimistically calls this.trigger(container) directly. On failure, the delivery intent is persisted to the outbox, and the worker retries on a periodic drain with exponential backoff + jitter. After a configurable number of failed attempts (default 5), entries transition to the dead-letter queue. Delivered and dead-letter entries are auto-purged once past their TTL (default 30 days). A new /api/notifications/outbox REST surface lets operators list entries (?status= filter), retry from the DLQ (POST /:id/retry), or discard them (DELETE /:id). A new base method, Trigger.dispatchOutboxEntry(entry), is the worker's delivery hook; subclasses can override it.
- Notification outbox UI (commit feature/v1.5-rc17). A new Notification outbox page (route /notifications/outbox, nav under Settings) consumes the existing /api/notifications/outbox REST surface. Operators can use it to review the dead-letter queue, retry stuck deliveries, or discard dead entries from the UI. Status tabs (Dead-letter / Pending / Delivered) follow the same query-param convention (?status=) used by the rest of the list views, and per-bucket counts render as inline badges. Retry is shown only on dead-letter rows; Discard is available everywhere. The new ui/src/services/notification-outbox.ts mirrors the API exactly.
- Cancel queued or in-flight updates (commits 4b79e3ac, 79487115). POST /api/operations/:id/cancel now accepts both queued and in-progress operations. Queued ops are marked failed immediately with lastError: 'Cancelled by operator' (200). In-progress ops are flagged via a new cancelRequested field on the operation row, and the endpoint returns 202 Accepted. The lifecycle observes the flag at three safe checkpoints:
  - after pull and before rename (clean abort, no rollback needed);
  - before creating the replacement container;
  - before stopping the old container.

  Cancellations therefore either short-circuit cleanly or fall through the existing rollback path that renames the container back. The rollback path tags the rollback reason as cancelled, so the audit trail distinguishes operator cancellations from real failures. Already-terminal ops still return 409 Conflict. The container row's Cancel action is now visible for both queued and in-progress operations. The toast says "Cancelled" for the immediate path and "Cancellation requested" for the in-progress path.
- Global concurrent-update cap (DD_UPDATE_MAX_CONCURRENT). A new counting semaphore (the Semaphore class in app/updates/lock-primitives.ts) provides a configurable global gate on how many update lifecycles run simultaneously across the entire controller instance.
  - The default 0 means unlimited, so there is no behavior change on upgrade.
  - A positive integer N means at most N updates run concurrently.
  - Negative or non-integer values fail fast at startup with a descriptive error.

  The cap layers on top of the existing per-container and per-compose-project locks; it does not replace them. Operations waiting on the cap remain in queued status. Scope is per controller instance; distributed agent hosts have independent counters by design. Self-update operations bypass the global cap: they take per-container locks but never wait on the global semaphore. This prevents a full update queue from starving an admin-triggered self-update.
- Health-gate SSE heartbeat (DD_UPDATE_HEALTH_GATE_HEARTBEAT_MS). While drydock waited for a new container to pass its health gate, the SSE pipeline was silent for the entire wait: the UI received no events between phase: 'health-gate' and phase: 'health-gate-passed'. For images with long healthcheck intervals (e.g. vaultwarden's 60 s check), the UI had to rely on the REST reconciliation poll if the SSE connection was interrupted during that window. A periodic heartbeat now re-emits phase: 'health-gate' at a configurable interval (default 10 s). DD_UPDATE_HEALTH_GATE_HEARTBEAT_MS=0 disables heartbeats entirely, and values below 1000 ms or non-integers fail fast at startup. The heartbeat is cancelled as soon as the wait resolves in any direction (success, timeout, or unhealthy), so it never preempts the terminal event. No new phases are introduced, and existing UI consumers accept the re-emitted event unchanged.
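The outbox retry schedule described above can be sketched as a pure function. It covers exponential backoff with jitter and dead-lettering after a configurable number of failed attempts (default 5). This is an illustrative sketch, not the code in app/notifications/outbox-worker.ts. The OutboxEntry shape, the scheduleAfterFailure name, the 1 s base delay, the 15-minute cap, and the choice of "full jitter" are all assumptions.

```typescript
// Hypothetical shape of an outbox row; field names are illustrative only.
interface OutboxEntry {
  attempts: number;
  status: 'pending' | 'delivered' | 'dead-letter';
  nextAttemptAt: number; // epoch milliseconds
}

interface RetryPolicy {
  maxAttempts: number; // dead-letter after this many failures (the changelog's default is 5)
  baseDelayMs: number; // assumed base delay
  maxDelayMs: number; // assumed upper bound on a single backoff
}

const DEFAULT_POLICY: RetryPolicy = { maxAttempts: 5, baseDelayMs: 1_000, maxDelayMs: 15 * 60_000 };

// "Full jitter": a uniformly random delay in [0, min(cap, base * 2^(attempt - 1))).
function backoffDelayMs(attempt: number, policy: RetryPolicy, random: () => number = Math.random): number {
  const exponential = policy.baseDelayMs * 2 ** (attempt - 1);
  const ceiling = Math.min(policy.maxDelayMs, exponential);
  return Math.floor(random() * ceiling);
}

// Returns the updated entry after one failed delivery attempt.
function scheduleAfterFailure(
  entry: OutboxEntry,
  nowMs: number,
  policy: RetryPolicy = DEFAULT_POLICY,
  random: () => number = Math.random,
): OutboxEntry {
  const attempts = entry.attempts + 1;
  if (attempts >= policy.maxAttempts) {
    // Persistently failing entries move to the dead-letter queue for operator review.
    return { ...entry, attempts, status: 'dead-letter' };
  }
  return {
    ...entry,
    attempts,
    status: 'pending',
    nextAttemptAt: nowMs + backoffDelayMs(attempts, policy, random),
  };
}
```

Injecting random keeps the schedule deterministic in tests. Full jitter spreads out retries from many entries that failed at the same moment, so a recovering notifier is not hit by a synchronized wave.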
Changed
- Crowdin export configuration aligned with app locale folders. Crowdin now maps language codes such as es-ES onto the locale folder IDs the UI actually loads (for example es). It also only downloads languages exposed in the locale picker. A new config guard test prevents future sync PRs from adding ignored region-coded folders. The new auto-hidden-columns tooltip source also avoids the English-only column(s) punctuation that triggered Crowdin QA warnings for translated strings.
- Shared DataTable column sizing overhaul (commit 596adcd2). All first-party table surfaces now route through the shared DataTable component. They use numeric sizing metadata (size, minSize, maxSize, flex, priority, overflow, autoSize) instead of ad-hoc string widths. Tables now:
  - render a stable <colgroup>;
  - keep actions in an independent sticky/fixed managed column;
  - support pointer and keyboard column resizing;
  - autosize a column to its visible content on double-click;
  - persist manual and autosized widths per table via browser preferences.

  Containers uses the sizing data for responsive auto-hide math, so narrow widths hide lower-priority metadata instead of shrinking columns below readable minimums. The config webhook endpoint list was migrated too. A new architecture test fails if raw <table> markup or string column widths reappear in ui/src.
- Watcher dispatch is fully fire-and-forget (commit 5cfa2286). Trigger.runUpdateAvailableSimpleTrigger and runAcceptedUpdateBatch previously awaited runAcceptedContainerUpdates, so a slow update lifecycle stalled the next watcher tick. The API path was already fire-and-forget, and the watcher path now matches. A new dispatchAccepted(accepted) helper centralises the void runAcceptedContainerUpdates(...).catch(() => undefined) pattern across all four call sites. Per-operation failures are still terminalised inside the lifecycle handler, so swallowing the dispatch chain's rejection loses no observable information.
- Security alert emit is non-blocking inside the update lifecycle (commit 6c5198dd). SecurityGate.maybeEmitHighSeverityAlert was awaited inside evaluateScanOutcome, which itself runs on the update lifecycle's critical path. With multiple notifiers registered for security alerts, the await chained sequential provider calls (SMTP, Slack, HTTP, MQTT, webhook) into the lifecycle. This multiplied latency before pull/recreate could even start. The function now returns synchronously after firing the emit. From the caller's perspective, notification dispatch semantics are unchanged: the same handlers run in the same order via emitOrderedHandlers. The lifecycle simply no longer waits.
- "Update started" toasts renamed to "Update queued" (commit 79487115). Dispatch is fire-and-forget, so by the time the toast renders the lifecycle hasn't started; the operation is merely queued. The text now matches what actually happened: "Update queued: {name}", "Force update queued: {name}", "Queued update(s) for N container(s)". Function names in ui/src/utils/container-update.ts are unchanged, so there is no call-site churn.
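The fire-and-forget dispatch described in the watcher entry above works by attaching a rejection handler without awaiting the promise. Below is a minimal sketch that assumes a simplified Accepted type. For self-containment it passes the runner in as a parameter. In drydock, dispatchAccepted(accepted) calls runAcceptedContainerUpdates itself, so the two-argument signature here is hypothetical.

```typescript
// Simplified stand-ins; drydock's real types are richer.
type Accepted = ReadonlyArray<{ containerId: string }>;
type Runner = (accepted: Accepted) => Promise<void>;

// Start the update lifecycle without awaiting it. The .catch() handler prevents an
// unhandled rejection. Per-operation failures are assumed to be recorded inside the
// runner itself, so nothing observable is lost by swallowing the chain's rejection.
function dispatchAccepted(run: Runner, accepted: Accepted): void {
  void run(accepted).catch(() => undefined);
}
```

The runner starts synchronously, so its first step happens before dispatchAccepted returns. The caller (here, the watcher tick) continues immediately even if the lifecycle takes minutes.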
Fixed
- #342 follow-up: Hybrid image:tag@sha256:digest refs no longer trigger a spurious "Cannot get a reliable tag" warning when Docker's RepoTags is empty. When a container is pulled by digest pin, Docker's ImageInspect.RepoTags is often empty. This happens even though the deploy ref written in compose (docker.io/valkey/valkey:9@sha256:...) carries an authoritative tag. In that case Docker.resolveImageName was diverting to resolveDigestOnlyImage, logging a misleading warning and discarding the explicit :9 tag. That cascaded into image-comparison.ts emitting No Registry Provider found, because the digest-only fallback could lose registry-domain context. The function now detects hybrid refs (a : for the tag before @sha256:) and parses them directly via parse-docker-image-name, which already returns both tag and digest correctly. Only true digest-only refs (image@sha256:... with no tag) fall through to the existing RepoTags/resolveDigestOnlyImage path. For the Immich valkey and postgres digest-pinned containers reported on #342, the misleading warnings no longer fire on every cron cycle, and the registry router resolves them to Docker Hub / GHCR cleanly.
- #342 follow-up: Registry env-var naming convention now explained in the registries index. The per-registry doc tables show DD_REGISTRY_<TYPE>_{REGISTRY_NAME}_<KEY> as a placeholder convention, but {REGISTRY_NAME} is not self-documenting. A new "Naming registry instances" callout in content/docs/current/configuration/registries/index.mdx explains that the placeholder is a user-chosen label (PUBLIC, PRIVATE, WORK, anything). That label namespaces multiple instances of the same registry type. The callout includes a concrete two-instance example (HUB_PUBLIC_* + GHCR_PRIVATE_*).
- #342 follow-up: Watcher cron callout explains the rate-limit interaction with hourly polling. A new callout in content/docs/current/configuration/watchers/index.mdx explains why the default 0 * * * * (hourly) can saturate anonymous Docker Hub limits in deployments with many containers. It points to the v1.5.0-rc.20 retry + token-bucket mitigations. For high-container-count installs that don't need near-real-time detection, it recommends 0 */6 * * * or 0 1 * * *.
- #356: Containers list Version column no longer hides the human-readable tag for floating-tag + digest-watch images.

  The rc.18 fix for #342 addressed digest-pinned containers that were rendering currentTag → newTag as two identical truncated sha256: strings, because both fields came from the same pinned digest reference. It replaced that with a real formatShortDigest(localValue) → formatShortDigest(remoteValue) pair whenever the update kind was 'digest'.

  That correctly addressed digest-pinned containers but cast too wide a net. Containers that pull a floating tag (:latest, :v8.13.2, :compose-X-version-9.0.1) with image.digest.watch enabled also surface as kind === 'digest' whenever the registry rebuilds the image. For those containers the tag is meaningful and the user expects to see it. Replacing it with two sha256:… hashes obscured every linuxserver/* and similar GHCR-hosted row on the Containers table for users like the reporter.

  A new derived isDigestPinned: boolean now gates the digest-pair render. It is added to the UI Container type and mapped from image.tag.value.startsWith('sha256:'), the same heuristic the watcher uses at app/watchers/providers/docker/image-comparison.ts:240.
  - Digest-pinned containers continue to show the sha256:abc… → sha256:def… pair the #342 fix intended.
  - Floating-tag + digest-watch containers render the tag once, with the digest pair surfaced in the cell tooltip. There is no arrow, since currentTag === newTag for digest-only updates.
  - The two container-detail panels gain a small muted "Digest:" subline showing the actual digest transition. The underlying change stays visible without dominating the version row.

  The change applies symmetrically to all five UI sites that switched in rc.18: the Containers table version cell, card body, and list-accordion image subtitle, plus the side and full-page detail panels.
- #357: Transient Trivy failures no longer wipe previously stored scan history. The scheduler used to overwrite container.security.scan unconditionally. When Trivy hit a hiccup (daemon timeout, registry blip, missing socket), mapToErrorResult returned an empty status: 'error' record. On the next cycle, that result silently replaced every prior passed/blocked entry. The scheduler now keeps the existing record when the new result is an error and there is something to preserve. Preservation is capped at a 7-day max-staleness window, so a persistently broken pipeline eventually surfaces in stored state instead of locking in a stale passed result indefinitely. The UI still sees the live error via SSE, so operators are not left in the dark either way. Error results also no longer re-spawn fresh Trivy invocations indefinitely. scanImageWithDedup now applies a 15-minute error retry floor, so under an aggressive cron and a registry outage, retries are bounded to once per 15 minutes per digest instead of once per scheduler cycle.
- #355: update-failed notifications no longer drop silently when the controller's container store races against post-failure prune. UpdateLifecycleExecutor now carries the failing container on the update-failed payload. Trigger.handleContainerUpdateFailedEvent accepts payload.container as the primary source, with the store lookup as a fallback, mirroring the existing update-applied handling. Previously, the store lookup could miss (post-failure prune timing, an agent push race, a watcher/name re-key). When it did, the trigger silently debug-logged "No container found for update-failed event => ignore", and the user got no out-of-band signal that the update had failed. Related changes:
  - The event payload types are now strictly typed (container?: Container only, with no Record<string, unknown> escape hatch).
  - The three duck-typing payload-extraction blocks across the trigger handlers collapse into a single direct payload.container || lookup(...) pattern.
  - The agent SSE relay strips container from dd:update-applied / dd:update-failed events before transmitting, mirroring the controller-side sanitizer in app/api/sse.ts. The full container blob (vulnerabilities, env entries, labels) no longer goes over the wire on every event.
- #355 / #357: Trivy scan and SBOM no longer require /var/run/docker.sock inside the drydock container. A regression introduced in rc.17 forced Trivy to use only the local Docker daemon as its image source. This affected operators running the tecnativa/docker-socket-proxy topology (documented in README.md), rootless Docker, podman, or remote watchers. They saw every gated update fail post-pull with dial unix /var/run/docker.sock: connect: no such file or directory. Previously stored scan results were also overwritten with empty error records when the scheduler fired. The forced --image-src docker flag is removed. Trivy now uses its default source order (docker, containerd, podman, remote) and falls back to a registry pull when the local daemon isn't reachable. Operators who know their topology is socket-less can skip the docker/containerd/podman probe attempts by setting DD_SECURITY_TRIVY_IMAGE_SRC=remote. Any value Trivy accepts works, including comma-separated lists like remote,docker; when unset, Trivy auto-detects. Pre-rc.17 behaviour is fully restored.
- #290: "Updated Successfully" toast no longer drops intermittently after a container update. Terminal-update toasts previously fired from three independent handlers (ContainerUpdateDialog, useContainerSsePatchPipeline, ContainersGroupedViews), each gated on different state. Any of the following could silently swallow the toast:
  - a missing operationId on the wire;
  - an unmounted view;
  - the per-batch dependency on ContainersGroupedViews being mounted.

  A new useGlobalUpdateToast composable, mounted once in App.vue, is now the single source of truth. It:
  - listens for dd:sse-update-applied / dd:sse-update-failed / dd:sse-batch-update-completed via globalThis events;
  - survives route navigation;
  - dedupes by operationId over a 5-minute window matched to the SSE replay buffer;
  - waits for the matching dd:sse-container-added/updated/removed event before firing, so the toast appears the moment the row's "Updating" badge clears rather than after a hardcoded delay.

  A 5 s safety fallback fires the toast when no row event arrives (remote agents, deleted containers). The backend no longer coerces a missing operationId/containerId to '', so the wire format is honest about what's optional. The browser EventSource cannot set custom headers on reconnect. Last-Event-ID is therefore also accepted via a query param (?last-event-id=) and validated against the canonical <bootId>:<counter> shape at the request boundary. Defensive hardening:
  - a module-level singleton guard prevents a stray child-component install from double-registering listeners;
  - a FIFO-bounded dedup map (cap 500) defends against runaway operation throughput;
  - HTML angle brackets are stripped from raw error text before i18n interpolation.
- #289: Container row state regression after recreate. This has the same root cause as #290: per-view SSE handlers dropped events when the view was unmounted or the payload omitted operationId. The row-state pipeline (useContainerSsePatchPipeline) is now decoupled from toast emission, so it can focus solely on patch application. Toast firing lives exclusively in useGlobalUpdateToast in App.vue.
- #291: Dashboard fired the "updated" toast while the "updating" toast was missed. The dashboard had its own duplicate SSE terminal-toast handler that competed with, and sometimes pre-empted, the global one. The dashboard SSE handler now does row-state hold/ghost management only. Toast emission is owned exclusively by the global handler in App.vue.
- Release security gate restored before rc.18. This patches transitive npm dependencies flagged by OSV during the post-merge main CI run. fast-uri now resolves to 3.1.2 in the app/UI lock domains. fast-xml-builder now resolves to 1.2.0 through the app/e2e XML parser override path. This clears the Qlty security gate without changing runtime behavior.
- #345: Host names with numeric suffixes no longer lose the differentiating character in the Containers table. The rc.18 table pass already replaced the old host badge with plain text. The host column now also has a wider default width and a wider readable minimum, so names like servicevault and servicevault2 remain distinguishable at desktop widths. Narrow layouts still auto-hide the host column into secondary metadata instead of shrinking it below readability.
- #340: Self-update no longer preserves stale Drydock version metadata. The self-update clone path now drops environment variables and labels inherited from the old image when the target image changed them. Replacement containers therefore inherit the new image's DD_VERSION and org.opencontainers.image.version instead of reporting the previous release after an automatic update. Operator-supplied environment variables and labels remain preserved.
- One slow notifier no longer stalls every container update (commit 761fb834). The module-level pLimit(1) introduced in v1.5 to serialise concurrent updates was the root cause behind reports of stuck queues whenever a single notifier hung. Every update on every container was waiting for the same single slot. Per-container locks remove the global bottleneck while still preventing a container from being updated twice in parallel.
- Process restart no longer wipes the queued update list (commit 00788b13). Previously every active operation was force-failed on startup. Queued and pulling-phase operations now resume. Only operations in the middle of a destructive step (renamed/new-created/old-stopped/etc.) are surfaced for operator review. See the matching entry under Added above.
- Transient notifier outages no longer drop alerts (commit b215d295). Direct dispatch failures land in the outbox and are retried with exponential backoff + jitter. Only persistently failing entries (default: 5 failed attempts) move to the dead-letter queue. A crash during dispatch is the only remaining loss window.
- dd.registry.lookup.image label no longer corrupts deploy identity (commit 594a07e8, fixes #336). The lookup label is intended to redirect tag/manifest queries to a different image. For example, a private mirror running myreg/nextcloud can look up tags from Docker Hub's library/nextcloud. However, normalizeContainer was assigning the substituted view back onto the container record. The deploy identity (image name and registry URL) was therefore silently rewritten to the lookup target, and compose-file rewrites and container recreates then deployed the wrong image. normalizeContainer no longer overwrites image.name/image.registry.url. A new getImageForRegistryQuery helper applies the substitution + provider URL normalisation only at each query boundary (getTags, getImageManifestDigest, getImagePublishedAt). Un-prefixed images (nginx:1.0) now default to docker.io for the registry URL, and Hub.getImageFullName strips the prefix for clean display.
- Password-manager autofill restored on the login form (commit 3abe2fa6, fixes #335). The username and password inputs lost their name and id attributes during the v1.5 plain-HTML rewrite. Browser-native autofill kept working via autocomplete=. Credential managers that rely on name/id heuristics (Dashlane in Chrome, among others) could no longer identify the username field. Both attributes are restored.
- security-scan-skipped audit row now fires when the gate is disabled globally (commit ae24e0a9). Previously recordSecurityAudit('security-scan-skipped', …) only executed when the per-container label dd.security.gate=off was set. With DD_SECURITY_GATE_MODE=off configured globally, scans were silently skipped with no audit trail. An operator reading the audit log had no indication that the gate was suppressed. getGateDisabledAuditDetails now selects the appropriate human-readable reason from whichever off-state is in effect, and the audit call is unconditional.
- Registry URL normalization restored on the container record after a regression in 594a07e8. The normalizeImage call in normalizeContainer was removed to fix deploy-identity corruption (issue #336). This inadvertently left image.registry.url in its raw user-config form (docker.io) instead of the API base URL form (https://registry-1.docker.io/v2). All registry HTTP callers, getImageFullName, the Prometheus image_registry_url label, and the Docker trigger's self-update helper expect the normalized form. The URL rewrite is now restored for containers where the deploy image itself matches the provider. Harbor-mirror containers, where a lookup label diverts to a different registry, correctly retain their deploy URL unchanged.
- image.name canonicalization also restored after the partial fix in 4e06329b. The prior fix only restored image.registry.url; image.name was still not rewritten through the provider's normalizeImage. Docker Hub containers with un-prefixed names (e.g. nginx) therefore kept image.name = "nginx" instead of library/nginx. This caused the Prometheus image_name label to emit the bare name, breaking e2e scenarios that assert image_name="library/nginx". The normalizeImage result now also assigns image.name in the deploy-match branch. The cross-registry mirror branch (harbor → Hub lookup) is unaffected and still preserves the deploy name.
- Stack/group view no longer collapses to ungrouped mid-update when containers are recreated. When a Docker action recreates a container, the container receives a new ID. The group-membership map was keyed only by the original ID, so the post-recreate lookup missed and every container fell into __ungrouped__. With a two-container stack, the single-member-flatten rule then removed both group buckets entirely. loadGroups() now indexes the map under id, name, AND displayName, so the existing map[container.name] fallback in the lookup actually resolves after a recreate.
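The hybrid-ref check in the first #342 follow-up above depends on where the colon sits relative to the last slash and the @. Below is a sketch of that split using a hypothetical splitImageRef helper. Drydock itself delegates parsing to parse-docker-image-name, whose API is not reproduced here.

```typescript
// Hypothetical result shape for illustration.
interface ImageRef {
  name: string; // repository, including any registry host and port
  tag?: string; // e.g. "9" in valkey:9@sha256:...
  digest?: string; // e.g. "sha256:..."
}

function splitImageRef(ref: string): ImageRef {
  // A digest, if present, follows the "@".
  const at = ref.indexOf('@');
  const beforeDigest = at === -1 ? ref : ref.slice(0, at);
  const digest = at === -1 ? undefined : ref.slice(at + 1);

  // A tag colon can only appear after the last "/", which rules out
  // registry ports such as localhost:5000/app.
  const lastSlash = beforeDigest.lastIndexOf('/');
  const colon = beforeDigest.indexOf(':', lastSlash + 1);
  if (colon === -1) {
    return { name: beforeDigest, digest };
  }
  return { name: beforeDigest.slice(0, colon), tag: beforeDigest.slice(colon + 1), digest };
}

// Hybrid refs carry both an authoritative tag and a digest pin.
function isHybridRef(ref: string): boolean {
  const parsed = splitImageRef(ref);
  return parsed.tag !== undefined && parsed.digest !== undefined;
}
```

The last-slash rule keeps a registry port such as localhost:5000 from being mistaken for a tag. A ref like localhost:5000/app@sha256:... is therefore still treated as digest-only.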
Added
- Chinese (Simplified) UI (PR #331 by TianMiao, commits 8f3286b7, b97944dc). Chinese is the first non-English locale to ship in drydock. Fourteen namespace JSON files under ui/src/locales/zh-CN/ cover the full UI surface, roughly 1,100+ strings:
  - dashboard, containers, agents, and config;
  - list views and container components;
  - app shell, auth, and logs;
  - shared components.

  A latent bootstrap bug was fixed as part of this work: the buildMessages map was initialized only for en, causing Object.assign(undefined) crashes for any second locale. The work also closed 112 translation gaps, which arose because the locale files were authored before several new UI strings landed in rc.17. The existing import.meta.glob auto-discovery picks up the new locale, so no additional wiring was needed.
- Chinese (Traditional) UI (PR #344 by TianMiao, commit 2e60f1e7). The Chinese catalog is now split into BCP-47 locale folders (zh-CN and zh-TW), so operators can choose Simplified or Traditional Chinese from Config > Appearance. The Traditional catalog ships with the same namespace coverage as Simplified Chinese, including the rc.18 appearance, outbox, table, and preference strings.
- Multi-select event-type filter in the audit log (commit 5e2d0c70, discussion #332). The audit log's event-type filter was a single-value <select>. Operators who wanted to view both update-applied and update-failed in the same session had to query them separately and mentally merge the results. The filter is now a checkbox dropdown supporting any combination of event categories simultaneously. The backend already accepted ?actions= (plural, comma-separated); this change wires up the missing UI half. Back-compat: existing ?action=foo bookmark URLs parse as a single-element selection without requiring a migration.
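The back-compat rule in the audit-log entry above can be sketched as follows: the plural ?actions= wins, and a legacy ?action=foo bookmark becomes a one-element selection. The parseActionFilter name and the plain-object query shape are assumptions for illustration, not the UI's actual code.

```typescript
// Query values as a router would hand them over; the shape is an assumption.
type Query = Readonly<Record<string, string | undefined>>;

// Prefer the plural ?actions=a,b form; fall back to a legacy ?action=foo bookmark.
function parseActionFilter(query: Query): string[] {
  const raw = query['actions'] ?? query['action'] ?? '';
  const actions = raw
    .split(',')
    .map((value) => value.trim())
    .filter((value) => value.length > 0);
  // Deduplicate while keeping the order in which actions were selected.
  return Array.from(new Set(actions));
}
```

Because the legacy single value is just a one-item comma list, both URL shapes go through the same code path and no bookmark migration is needed.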
Changed
- app/updates/locks.ts renamed to app/updates/lock-primitives.ts (commit 4c506d21). locks.ts was a misleading filename for a module that contains general-purpose synchronisation primitives (Semaphore, LockManager) not tied to the updates subsystem. Existing CHANGELOG entries above and the app/updates/update-locks.ts consumer have been updated to the new path. HookExecutor and RollbackMonitor now delegate label-to-integer parsing to the project-wide parseEnvNonNegativeInteger helper instead of using inline NaN/zero guards. The inline getErrorMessage copies in post-start-liveness and request-update are consolidated into the shared util/error import.
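For illustration, here is a strict non-negative integer parser in the spirit of the shared helper mentioned above. The name parseNonNegativeInteger and its (name, raw, fallback) signature are hypothetical. The real parseEnvNonNegativeInteger may differ, for example by falling back instead of throwing.

```typescript
// Hypothetical helper; the real parseEnvNonNegativeInteger signature may differ.
function parseNonNegativeInteger(name: string, raw: string | undefined, fallback: number): number {
  const trimmed = raw?.trim() ?? '';
  if (trimmed === '') {
    return fallback; // unset or blank: use the default
  }
  // Plain base-10 digits only, so "-1", "1.5", "1e3" and "10abc" are all rejected.
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`${name} must be a non-negative integer, got "${raw}"`);
  }
  const parsed = Number(trimmed);
  if (!Number.isSafeInteger(parsed)) {
    throw new Error(`${name} is too large: "${raw}"`);
  }
  return parsed;
}
```

A digit-only check is used instead of Number.parseInt alone, because parseInt('10abc', 10) returns 10 and would silently accept a typo.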
Security
- Credential redaction expanded to x-registry-auth, *-token, and api-key fields (commit 4417ce25). The existing scrubAuthorizationHeaderValues helper only redacted Authorization header values. Structured error payloads in update-failed SSE events could still leak registry auth tokens, API keys, and OAuth bearer strings embedded under other field names. A second regex pass now redacts x-registry-auth, any field matching *-token, and api-key/api_key values before the payload leaves the server. The update-failed SSE path was the primary exposure vector. Operator-visible diagnostic strings no longer leak registry credentials in production environments.
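Below is a minimal sketch of key-based redaction for the field names listed above. The changelog describes a regex pass over the payload. This sketch instead walks an already-parsed object and applies a key-name regex to each field, which is a deliberately simpler alternative technique. The SENSITIVE_KEY pattern, case-insensitive matching, and the [REDACTED] placeholder are assumptions.

```typescript
// Field names treated as secrets, per the changelog: x-registry-auth, any *-token
// field, and api-key / api_key. Case-insensitive matching is an assumption.
const SENSITIVE_KEY = /^(?:x-registry-auth|api[-_]key|.+-token)$/i;
const REDACTED = '[REDACTED]';

// Return a deep copy of `value` in which every sensitive field's value is replaced.
function redactSecrets(value: unknown): unknown {
  if (Array.isArray(value)) {
    return value.map((item) => redactSecrets(item));
  }
  if (value !== null && typeof value === 'object') {
    const out: Record<string, unknown> = {};
    for (const [key, child] of Object.entries(value as Record<string, unknown>)) {
      // Redact the whole subtree under a sensitive key; otherwise keep walking.
      out[key] = SENSITIVE_KEY.test(key) ? REDACTED : redactSecrets(child);
    }
    return out;
  }
  return value; // primitives pass through unchanged
}
```

Matching on keys rather than value shapes avoids guessing what a token looks like. However, it cannot catch a secret embedded inside a free-text message string, which is what a text-level regex pass is meant to cover.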
Performance
- Binary indices and drain concurrency cap for the notification outbox (commit 9393253e). findReadyForDelivery is the hot path that runs on every outbox drain cycle. It queried data.status and data.nextAttemptAt through standard (non-binary) LokiJS lookups, causing full-collection scans as the outbox grew. Registering those two fields as LokiJS binary indices gives O(log n) binary-search lookups over sorted index arrays. OutboxWorker gains a maxDrainConcurrency option (default 10) backed by a DrainSemaphore, so a burst of ready entries cannot flood the trigger pipeline with unbounded parallel deliveries. store/util normalises a binaryIndices option on initCollection, so collections receive the correct field registration at creation time.
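The drain concurrency cap above can be illustrated with a small FIFO counting semaphore. CountingSemaphore and drainWithLimit are hypothetical stand-ins for DrainSemaphore and the OutboxWorker drain loop. The deliver callback is assumed to record its own failures (for example by rescheduling the entry) rather than rely on the drain to handle them.

```typescript
// A FIFO counting semaphore: at most `permits` holders at a time.
class CountingSemaphore {
  private available: number;
  private readonly waiters: Array<() => void> = [];

  constructor(permits: number) {
    if (!Number.isInteger(permits) || permits < 1) {
      throw new Error(`permits must be a positive integer, got ${permits}`);
    }
    this.available = permits;
  }

  // Number of callers currently waiting for a permit.
  get pending(): number {
    return this.waiters.length;
  }

  acquire(): Promise<void> {
    if (this.available > 0) {
      this.available -= 1;
      return Promise.resolve();
    }
    return new Promise<void>((resolve) => {
      this.waiters.push(() => resolve());
    });
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // hand the permit directly to the oldest waiter
    } else {
      this.available += 1;
    }
  }
}

// Deliver all ready entries with at most `limit` deliveries in flight.
async function drainWithLimit<T>(
  entries: readonly T[],
  limit: number,
  deliver: (entry: T) => Promise<void>,
): Promise<void> {
  const semaphore = new CountingSemaphore(limit);
  await Promise.all(
    entries.map(async (entry) => {
      await semaphore.acquire();
      try {
        await deliver(entry);
      } finally {
        semaphore.release(); // always free the slot, even if delivery throws
      }
    }),
  );
}
```

Handing the permit straight to the next waiter (instead of incrementing and letting callers race) keeps delivery order FIFO and the in-flight count at or below the limit.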
Tests / CI
- Reconciliation terminal-hold toast assertions use the maxIdBefore pattern. Two tests in ContainersView.spec.ts were flaking on CI because vi.advanceTimersByTime(1500) expired pre-existing toasts. This lowered toasts.value.length even though no new toasts were added. The countBefore = toasts.value.length / toBe(countBefore) comparison was replaced with the maxIdBefore / filter(t.id > maxIdBefore) pattern already used elsewhere in the file.
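The difference between the two assertion styles above is easy to see with plain data. The sketch assumes each toast carries a monotonically increasing numeric id. The Toast shape and the helper names are hypothetical.

```typescript
interface Toast {
  id: number; // assumed to increase monotonically as toasts are created
  text: string;
}

// Highest toast id present right now (0 for an empty list).
function maxToastId(toasts: readonly Toast[]): number {
  return toasts.reduce((max, toast) => Math.max(max, toast.id), 0);
}

// Toasts created after the snapshot, unaffected by older toasts expiring.
function toastsCreatedSince(toasts: readonly Toast[], maxIdBefore: number): Toast[] {
  return toasts.filter((toast) => toast.id > maxIdBefore);
}
```

Expiring an old toast lowers the count but cannot create an id above the snapshot. The filtered list therefore only reflects toasts created by the code under test.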
Added (rc.20)
- Registry 429 / 503 retry with Retry-After and a per-host token bucket (commit ffd1b57b, #342).
  - Retry: a new withRetry helper wraps every registry HTTP call. On 429 or 503 it honors the upstream Retry-After header (both the seconds and HTTP-date forms). When the header is absent, it falls back to exponential backoff (1 s / 2 s / 4 s, capped at 60 s), for up to 3 retries.
  - Token bucket: a new per-host token bucket prevents the watcher from self-inflicting rate limits during a large cron cycle; callRegistry acquires a slot before each call. GHCR and Docker Hub get conservative defaults (2 req/s, burst 5). api.github.com gets 1 req/s with burst 3, and everything else gets 5 req/s with burst 10.
  - Release notes: the GHCR PAT fallback from ffd1b57b is now also wired into the GitHub release-notes provider. Operators who configured a GHCR token get release notes without setting DD_RELEASE_NOTES_GITHUB_TOKEN.
  - Follow-up eca51c4c caps hostile Retry-After values, validates the HTTP-date format, and throws on an invalid token-bucket rate. It also extends rate limiting to the bearer-token auth call, which previously bypassed the bucket.
- Release notes inline popover (commit 09475fa6). The release-notes icon on container rows now opens an inline popover. It shows the release notes for both the current and the available version side by side, with expand/collapse per panel. The external link remains as a fallback when the backend has no release notes available.
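A per-host token bucket like the one described in the registry retry entry above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the TokenBucket class, its tryTake method, and the injectable clock are hypothetical, not drydock's callRegistry internals. Only the example rate (2 req/s, burst 5) comes from the entry.

```typescript
// A token bucket with an injectable clock so it can be tested deterministically.
class TokenBucket {
  private readonly ratePerSecond: number;
  private readonly burst: number;
  private readonly now: () => number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(ratePerSecond: number, burst: number, now: () => number = Date.now) {
    // Reject zero, negative, or NaN rates up front instead of dividing by zero later.
    if (!(ratePerSecond > 0) || !(burst >= 1)) {
      throw new Error(`invalid token bucket: rate=${ratePerSecond}, burst=${burst}`);
    }
    this.ratePerSecond = ratePerSecond;
    this.burst = burst;
    this.now = now;
    this.tokens = burst; // start full so an idle host can absorb a short burst
    this.lastRefillMs = now();
  }

  // Add the tokens earned since the last refill, never exceeding the burst size.
  private refill(): void {
    const current = this.now();
    const elapsedSeconds = Math.max(0, current - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.burst, this.tokens + elapsedSeconds * this.ratePerSecond);
    this.lastRefillMs = current;
  }

  // Take one token if available; otherwise report how long until one accrues.
  tryTake(): { ok: true } | { ok: false; waitMs: number } {
    this.refill();
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return { ok: true };
    }
    return { ok: false, waitMs: Math.ceil(((1 - this.tokens) / this.ratePerSecond) * 1000) };
  }
}
```

The burst size bounds short spikes, while the refill rate bounds sustained throughput. Returning waitMs instead of sleeping leaves scheduling to the caller, which can then combine it with any Retry-After delay.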
Fixed (rc.20)
- #289 follow-up: Stable identityKey for same-name siblings (commit 36ce1ac6). The rc.20 hold-matching fix in 02433a02 added an identityKey discriminator to the UI, but the backend never emitted one. As a result, two containers with the same name on the same host (the pi-hole scenario from #289) both synthesized agent::watcher::name, and the hold still bled across them. The backend now derives a stable identityKey in validate():
  - Compose-deployed containers use agent::watcher::compose:project/service. This key is distinct between sibling services and stable across recreates.
  - Non-compose containers fall back to the legacy agent::watcher::name form.

  The UI now passes the backend-emitted key to holdOperationDisplay, so holds follow the correct container through a recreate without bleeding to its same-named sibling.
- #289 follow-up: Per-container update-operation history scoped by id (commit e4b44480). GET /api/containers/:id/update-operations was fetching history by container name only, so same-name sibling containers bled into each other's operation history. The handler now uses the id-first / name-legacy-fallback pattern already established in the list and request-update paths. It queries by containerId/newContainerId first. It then includes name-matched ops only when they carry no containerId (legacy rows from pre-rc.20 stores), deduped by id.
- #342 follow-up: Registry routing always uses the credentialed instance when one is configured (commit `069274fe`). This was the root cause of the #342 rate-limit storm. When a user configured a PAT, drydock still seeded an anonymous `<provider>.public` default instance, and non-deterministic `Object.values().find()` routing could pick it. That sent all traffic through the unauthenticated rate-limit tier despite a valid token. The router now gives explicit priority to credentialed instances: any instance with a non-empty `token`, `password`, `auth`, `clientemail`, `privatekey`, `accesskeyid`, or `secretaccesskey`. The silent anonymous fallback on 401/403 is also removed. An auth rejection now throws an actionable error that surfaces as the existing "Check failed" badge in the UI. Follow-up `112953b2` unifies the credentialed-instance check across registry routing and image comparison, and treats whitespace-only credential fields as anonymous.
- #342 follow-up: Label-driven watcher fields re-derived on Docker update events (commit `56579eb7`). After a `docker compose up -d` recreate, `updateContainerFromInspect` updated `container.labels` but did NOT recompute the fields derived from labels (`tagFamily`, `includeTags`, `excludeTags`, `transformTags`, `triggerInclude`, `triggerExclude`). The stored `tagFamily` stayed `undefined` even when `dd.tag.family=loose` was set. `getTagFamilyPolicy()` then defaulted to `'strict'` and fired misleading "Strict tag-family policy filtered out…" warnings on every subsequent cycle. A new `applyDerivedLabelFieldsToContainer` helper, which wraps the existing `resolveLabelsFromContainer` logic, is now wired into the event-path update so it stays consistent with the full watch cycle.
- #342 follow-up: Hybrid `image:tag@sha256:digest` rows show a visible digest delta (commit `b40d3db8`). This affected images like `ghcr.io/immich-app/postgres:14-vectorchord0.4.3@sha256:bcf6…`, where `updateKind === 'digest'` but `isDigestPinned === false`. The previous template rendered only the unchanged tag in the cell body and hid the actual digest transition in a tooltip, so users had no visible indicator that anything had changed. Table rows now render the tag on top with the short current→new digest pair below, using the same styling as the pure-digest branch. Card view shows "latest" plus the current/new short digests inline.
- #342 follow-up: Floating semver aliases excluded from the greater-than check (commit `0b9eaaf3`). `isGreaterCandidateTag` used `semver.gte` (≥). As a result, floating aliases like `3.3` and `3.3.0`, which both coerce to 3.3.0, each counted as "higher than" the other. For wealthfolio/3.3.0 this caused a spurious "Strict tag-family policy filtered out 1 higher semver tag" warning with no way to silence it, because setting `dd.tag.family=loose` couldn't fix a strict-greater failure. The check now requires a strictly greater semver in one direction and not-greater in the reverse, so floating aliases drop out of the candidate set entirely.
- SBOM endpoint returns `503` instead of `500` when the security scanner is disabled (commit `88c8c066`). `GET /api/v1/containers/:id/sbom` returned HTTP 500 with the internal error message when the Trivy scanner was disabled or unconfigured. A feature that is intentionally off is not a server error. The endpoint now returns 503 with a clear "Security scanner is disabled or misconfigured" message, so clients can distinguish "feature off" from "generation failed".
- Icon proxy serves fallback image on upstream CDN timeout or 5xx (commit `c98d6e38`). Sometimes an icon's upstream CDN fetch failed for a reason other than the icon not existing, such as a timeout, DNS failure, or 5xx. Browser image requests for those icons received `502`, which showed broken-image indicators on container rows. These failures now route through the existing fallback path and serve the placeholder image instead. Non-image clients still receive `502` with an error JSON body, so the actual upstream failure is surfaced to API callers.
- ECR stale auth token cache write avoided on concurrent key change (commit `7db3d4cb`). Two concurrent `fetchPrivateEcrAuthToken` calls could race with a credential rotation between them. The slower call, using the old credentials, could then overwrite the cache entry that the faster call, using the new credentials, had already populated. The cache write is now keyed on the credentials snapshot captured at request start. A resolved token is stored only if that in-flight key still matches the current credentials.
- Registry auth bearer token non-Error rejections preserved in error messages (commit `211623a4`). When the token-request axios call rejected with a non-Error value (e.g. a plain string rejection), the `e.message` property access threw `TypeError: Cannot read properties of undefined`. The `failClosedAuth` path then received a misleading secondary error instead of the real rejection. The catch block now uses the project-wide `getErrorMessage` helper. String rejections, null rejections, and structured non-Error throws all now produce readable `"token request failed (…)"` messages.
- Accepted update dispatch failures now logged (commit `674a0ed8`). `runAcceptedContainerUpdates` dispatches updates fire-and-forget. When the inner lifecycle threw synchronously (e.g. no docker trigger was found after the eligibility check), the `void …catch()` pattern silently swallowed the rejection. A warn-level log now records the container name and operation id, so operators can diagnose dispatch failures from logs without a debugger.
- Truncated release notes body now marked with a trailing ellipsis (commit `3a9bd098`). `truncateReleaseNotesBody` sliced at the character limit without appending an ellipsis, so notification bodies ended mid-sentence with no indication that content was cut. The truncated form is now `${body.slice(0, maxLength)}...`.
- Row update overlays anchored to first data cell width (commit `4bdb8d65`). The `dd-row-updating` overlay used `width: 100%`, which resolved to the cell width rather than the full table width. This left the overlay undersized on wide tables. `DataTable` now sets a `--dd-data-table-row-overlay-width` CSS variable to the measured viewport width and marks the first data cell as the overlay host. The overlay therefore spans the full visible table regardless of table-layout mode.
- Same-name container update holds isolated to the correct instance (commit `02433a02`). The UI hold-matching logic used `container.id` alone, so when a Docker recreate changed the container id, the hold was not released for the pre-recreate id. A new `identityKey` discriminator lets the hold follow the container's stable identity through id changes. The hold still stays isolated between same-named siblings on the same host, backed by the backend `identityKey` added in `36ce1ac6`.
- Security view container chooser traps keyboard focus (commit `e98603c1`). When the Security view's multi-container chooser popover opened, focus remained on the trigger button. Tab then moved through the background page instead of cycling within the modal. A standard focus trap now moves focus to the first focusable element on open. Tab and Shift+Tab cycle within the popover, and Escape returns focus to the previously focused element.
- Legacy `xlink:href` SVG attributes stripped by icon sanitizer (commit `0309bacb`). The allowlist-based SVG sanitizer was stripping modern `href` attributes, which `<linearGradient>` uses for gradient references, because `href` was not in the attribute allowlist. Some SVG icons therefore rendered without their gradients. `href` is now in the allowlist; the deprecated `xlink:href` form remains blocked.
- Command action security warning updated to canonical `DD_ACTION_COMMAND_*` prefix (commit `aa5fc98d`). The shell-execution security warning logged by the Command action trigger still referenced the deprecated `DD_TRIGGER_COMMAND_*` prefix instead of the current `DD_ACTION_COMMAND_*` form. It now matches the v1.5.0 canonical prefix, so the warning points operators to the correct environment variable.
- Docker event history pruning amortized to reduce per-event splice cost (commit `d6690cc8`). `appendBoundedHistoryEntry` trimmed the history array on every call once it exceeded the configured maximum (1,000 events). On high-event Docker hosts, every new event therefore triggered a full `splice(0, N)`. The trim threshold is now `2×maxEntries`, so one splice is amortized across many appends instead of firing on every one.
- Agent container list no longer shares mutable LokiJS references (commit `1f7d8034`). The agent `GET /containers` handler called `getContainersRaw()` and then ran `stripLokiMetadata` directly on the raw LokiJS documents, which mutated the live store objects. The handler now clones each container via `cloneContainer` before stripping metadata, so an API read never modifies the store.
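The floating-alias fix to `isGreaterCandidateTag` can be illustrated with a dependency-free comparison. This is a sketch, not the real implementation. `coerceTag` is a rough stand-in for node-semver's `coerce()` and ignores pre-release and build metadata. Both predicate names are hypothetical.

```typescript
// Dependency-free sketch of the strict-greater tag check. coerceTag roughly
// imitates node-semver's coerce(); function names are hypothetical and
// pre-release/build metadata is ignored.

type Version = [number, number, number];

function coerceTag(tag: string): Version | null {
  // Take the first "X", "X.Y", or "X.Y.Z" run, padding missing parts with 0.
  const match = /(\d+)(?:\.(\d+))?(?:\.(\d+))?/.exec(tag);
  if (!match) return null;
  return [Number(match[1]), Number(match[2] ?? 0), Number(match[3] ?? 0)];
}

function compareVersions(a: Version, b: Version): number {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] < b[i] ? -1 : 1;
  }
  return 0;
}

// Old behavior (semver.gte): "3.3" vs "3.3.0" counts as a higher candidate.
function isGreaterOrEqualTag(candidate: string, current: string): boolean {
  const c = coerceTag(candidate);
  const cur = coerceTag(current);
  return c !== null && cur !== null && compareVersions(c, cur) >= 0;
}

// New behavior: strictly greater one way and not greater the other way,
// so floating aliases that coerce to the same version drop out.
function isStrictlyNewerTag(candidate: string, current: string): boolean {
  const c = coerceTag(candidate);
  const cur = coerceTag(current);
  if (c === null || cur === null) return false;
  return compareVersions(c, cur) > 0 && !(compareVersions(cur, c) > 0);
}
```

With a plain numeric total order like this one, the reverse clause is redundant. The sketch keeps it only to mirror the two-sided check that the entry describes.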
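The amortized trimming described for `appendBoundedHistoryEntry` can be sketched like this. The signature and the `readRecentHistory` helper are assumptions. Between trims the array may hold up to twice the configured maximum, so in this sketch readers slice off the newest `maxEntries` themselves.

```typescript
// Sketch of amortized trimming for a bounded history buffer. The signature
// and the readRecentHistory helper are assumptions, not drydock's API.

function appendBoundedHistoryEntry<T>(history: T[], entry: T, maxEntries: number): void {
  history.push(entry);
  // Let the array grow to 2 × maxEntries before trimming, then cut it back
  // to the newest maxEntries. One O(n) splice now covers roughly maxEntries
  // appends instead of running on every append past the limit.
  if (history.length > 2 * maxEntries) {
    history.splice(0, history.length - maxEntries);
  }
}

// Readers see only the newest maxEntries, hiding the slack between trims.
function readRecentHistory<T>(history: readonly T[], maxEntries: number): T[] {
  return maxEntries > 0 ? history.slice(-maxEntries) : [];
}
```

With `maxEntries` set to 1,000, one splice of about 1,000 elements runs once per roughly 1,000 appends. Without the slack, a splice would run on every append once the buffer is full.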
Security (rc.20)
- Proxied SVG icons sanitized before caching (commit `54d93a3b`). SVG payloads fetched from upstream icon CDNs (Simple Icons, Walkxcode dashboard-icons, etc.) are now run through an allowlist-based sanitizer (`app/api/icons/svg.ts`) before being written to the icon cache. The sanitizer strips `<script>` tags, `on*` event attributes, `javascript:` hrefs, and all non-allowlisted attributes. Icon responses also set the `X-Content-Type-Options: nosniff` header.
- Command action trigger env values sanitized to strip shell metacharacters (commit `1113d8ca`). Container-derived values injected into the command subprocess environment (`image_tag_value`, `result_tag`, `update_kind_local_value`, `update_kind_remote_value`, and `container_json`) are now stripped of shell metacharacters (`$`, `` ` ``, `;`, `(`, `)`) before the environment map is passed to `execFile`. This closes a command-injection vector that existed when a malicious registry served a crafted tag value.
- Credential status pattern matching uses RE2 (commit `df9b914a`). `BaseRegistry.getRejectedCredentialStatus` used a native JS `RegExp` to detect 401/403 token-request failures in error messages. It now uses `RE2JS.compile(…)`, which maintains the project-wide ReDoS-immunity guarantee for all user-data-adjacent pattern matching.
- Registry instances using `insecure=true` now log a warning on every request (commit `cd14e3a9`). Previously nothing was logged when a registry was configured with `INSECURE=true` (certificate validation disabled). A warn-level log now fires on each call. Operators with insecure TLS registries get an ongoing reminder in their log output that the registry is operating without certificate validation.
- `DD_SESSION_SECRET` is now required; startup fails without it (commit `b9e8be38`). Previously, when `DD_SESSION_SECRET` was unset, drydock generated an ephemeral per-process random secret and logged an error in production. An ephemeral secret invalidates all sessions on every restart, which is unsafe in any deployment. The fallback has been removed: startup now throws immediately with a clear message if the variable is missing. Operators who relied on the ephemeral fallback must set `DD_SESSION_SECRET` to a strong, persistent value before upgrading.
- Agent connections over plain HTTP with a configured secret are now rejected at startup (commit `7c6f6c20`). Previously, configuring an agent secret over a non-TLS connection logged only a warning, and the secret could then be transmitted in cleartext. Agent clients with a non-empty `secret` and a plain-HTTP base URL now throw at initialization. This blocks the connection before any secret can be sent over the wire. Use HTTPS or a TLS-terminating proxy for agent connections that require a shared secret.
- GHCR token fallback treats whitespace-only tokens as missing (commit `711d583c`). `getGhcrTokenFallback` and the registry suppression logic tested `token.length > 0`, which treated a token value of `" "` as a valid credential. All token checks now use `.trim().length > 0`. The GitHub release-notes provider also logs a warning when `api.github.com` rejects a configured token or the GHCR fallback, instead of silently swallowing the auth failure.
- #362: Agents intermittently showing 0 running containers in the controller UI. Sometimes the Docker watcher's container enumeration on an agent (`this.getContainers()`) threw, for example after a transient socket-proxy hiccup, a Docker daemon restart, or a momentary network blip. The warning was logged, but execution still fell through to `event.emitWatcherSnapshot({ containers: [] })`. The controller treats the snapshot as authoritative, so `pruneOldContainers([], watcherName)` then wiped every container for that agent's watcher from the controller's store. The agent's own store was untouched, which is why its next cron still logged "25 containers watched". The controller UI, however, showed 0 until the next clean cron cycle landed or the controller was restarted, because the re-handshake re-fetched the agent's container list. The watcher now tracks a `containerEnumerationFailed` flag and suppresses the snapshot emission on failure. The controller therefore preserves last-known state across transient enumeration errors.
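The command-action env sanitization can be sketched as a pure function. It would be applied to the environment map before the map is handed to `child_process.execFile` through its `env` option. `sanitizeCommandEnv` is hypothetical. Treating the five names from the entry as literal env keys is also an assumption; the character set is the one listed there.

```typescript
// Sketch of stripping shell metacharacters from container-derived env values
// before they reach the subprocess (e.g. via child_process.execFile's `env`
// option). sanitizeCommandEnv is hypothetical, and using these five names as
// literal env keys is an assumption; the character set is from the changelog.

const SHELL_METACHARACTERS = /[$`;()]/g;

const CONTAINER_DERIVED_KEYS = new Set([
  "image_tag_value",
  "result_tag",
  "update_kind_local_value",
  "update_kind_remote_value",
  "container_json",
]);

function sanitizeCommandEnv(env: Record<string, string>): Record<string, string> {
  const sanitized: Record<string, string> = {};
  for (const [key, value] of Object.entries(env)) {
    // Only container-derived values are rewritten; other vars such as PATH
    // pass through unchanged.
    sanitized[key] = CONTAINER_DERIVED_KEYS.has(key)
      ? value.replace(SHELL_METACHARACTERS, "")
      : value;
  }
  return sanitized;
}
```

`execFile` does not spawn a shell by default. These characters mainly matter when the configured command passes the values through a shell (for example `sh -c`).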
i18n (rc.20)
- 14 new locales bundled (discussion #329). Italian, Spanish, German, French, Brazilian Portuguese, Dutch, Polish, Turkish, Japanese, Korean, Russian, Vietnamese, Ukrainian, and Arabic now ship with the same namespace coverage as the existing English, Simplified Chinese, and Traditional Chinese catalogs. Initial machine-translation seeds were authored against the rc.17 catalog snapshot and then run through Crowdin QA. That pass fixed 28 real bugs via the Crowdin API (Italian accent recovery, capitalization mismatches, one stray symbol in zh-TW). It also added ~700 false-positive brand and technical terms (Drydock, Trivy, Cosign, SBOM, GHCR, etc.) to the per-language Crowdin dictionaries to silence spellcheck noise. The locale picker now lists 17 locales in total.
- Dashboard stat cards now react to runtime locale changes (commit `f3df9b41`). `useDashboardComputed` read `i18n.global.t` inside `computed()` without touching `i18n.global.locale.value`, so Vue's reactivity graph never registered a dependency on the locale ref. Switching language in Config → Appearance left the dashboard `CONTAINERS` / `SECURITY ISSUES` / `REGISTRIES` / `UPDATES AVAILABLE` labels in English until a page reload. The two affected computeds (`useStatsComputed`, `getUpdateBreakdownBucketDefs`) now read `i18n.global.locale.value` at the top. The dependency is therefore captured, and the labels re-render on a locale switch.
- 23 remaining hardcoded English strings localized across container, security, dashboard, and audit surfaces (commits `5b29e134`, `c426f047`, `79b472ff`, `8576f9de`, `1fd32f9e`, `2a5183cd`, `a05e34c3`). The following surfaces were all still rendered in English even when a non-English locale was selected:
  - update toast messages
  - security delta tooltips
  - container row tooltips/aria labels/empty states
  - suggested-tag and update-dialog content
  - the dashboard updates-widget aria label
  - the "Image age" detail label
  - audit event names
  - the security-scanner-disabled status
  - empty-state copy

  Each surface now goes through `t()`. The toast helpers (`getContainerUpdateStartedMessage`, `getForceContainerUpdateStartedMessage`, `getContainerAlreadyUpToDateMessage`, etc.) now accept a `TranslateFn` parameter instead of hard-coding English. Catalog keys were added to all 17 locales.
Performance (rc.20)
- ECR private auth tokens cached per instance (commit `d716e05e`). Each `fetchPrivateEcrAuthToken` call previously issued a fresh `GetAuthorizationToken` request to the ECR control plane. ECR tokens are valid for 12 hours, so tokens are now cached per instance and refreshed once they are within 5 minutes of expiry. This eliminates redundant calls on every container watch cycle in ECR-heavy deployments.
- Watcher fan-out capped at 10 concurrent container watches (commit `765a68d6`). The Docker watcher's per-cron-cycle `watchContainer` fan-out previously ran with unbounded concurrency via `Promise.allSettled`. On inventories of 40–200 containers this produced a burst of simultaneous registry calls. Those bursts often triggered the very rate limits that the token bucket is designed to prevent. Fan-out is now capped at 10 concurrent `watchContainer` invocations via `pLimit(10)`.
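The watcher's `pLimit(10)` cap relies on the `p-limit` package. The following dependency-free limiter sketches the same idea to show the mechanics. `createLimiter` and its `activeCount`/`pendingCount` accessors are hypothetical and are not part of `p-limit`'s API.

```typescript
// Minimal stand-in for p-limit (used in the watcher as pLimit(10)).
// createLimiter and its return shape are hypothetical.

type Task<T> = () => Promise<T>;

function createLimiter(maxConcurrent: number) {
  if (!Number.isInteger(maxConcurrent) || maxConcurrent < 1) {
    throw new Error(`invalid concurrency: ${maxConcurrent}`);
  }
  let active = 0;
  const queue: Array<() => void> = [];

  // Start the next queued task if a slot is free.
  const startNext = (): void => {
    if (active >= maxConcurrent) return;
    const start = queue.shift();
    if (start) start();
  };

  function limit<T>(task: Task<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      queue.push(() => {
        active++;
        // Wrapping in a Promise turns a synchronous throw into a rejection,
        // so the slot is always released in finally().
        new Promise<T>((run) => run(task()))
          .then(resolve, reject)
          .finally(() => {
            active--;
            startNext();
          });
      });
      startNext();
    });
  }

  return {
    limit,
    activeCount: (): number => active,
    pendingCount: (): number => queue.length,
  };
}
```

In the watcher's position, wrapping each watch as `limit(() => watchContainer(c))` inside the existing `Promise.allSettled` keeps at most 10 watches in flight at once. The remaining watches wait in the queue.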
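Two ECR entries can be combined into one small cache: per-instance caching with a 5-minute refresh margin, and refusing stale cache writes after a credential rotation. In this sketch `EcrTokenCache`, `FreshToken`, and the string credentials key are hypothetical. The real cache presumably derives its key from the instance's configured credentials.

```typescript
// Sketch of a per-instance ECR token cache: reuse a token until it is within
// 5 minutes of expiry, and drop a fetched token if credentials rotated while
// the request was in flight. EcrTokenCache, FreshToken, and the string
// credentials key are hypothetical.

const REFRESH_MARGIN_MS = 5 * 60 * 1_000;

interface FreshToken {
  token: string;
  expiresAtMs: number;
}

class EcrTokenCache {
  private entry: (FreshToken & { credentialsKey: string }) | undefined;
  private readonly currentKey: () => string;
  private readonly now: () => number;

  constructor(currentKey: () => string, now: () => number = Date.now) {
    this.currentKey = currentKey;
    this.now = now;
  }

  /** Cached token, unless missing, near expiry, or minted for other credentials. */
  peek(): string | undefined {
    const e = this.entry;
    if (!e || e.credentialsKey !== this.currentKey()) return undefined;
    return e.expiresAtMs - this.now() > REFRESH_MARGIN_MS ? e.token : undefined;
  }

  /** Stores a token fetched under keyAtStart; refuses if credentials changed since. */
  store(keyAtStart: string, fresh: FreshToken): boolean {
    if (this.currentKey() !== keyAtStart) return false;
    this.entry = { ...fresh, credentialsKey: keyAtStart };
    return true;
  }

  async get(fetchToken: () => Promise<FreshToken>): Promise<string> {
    const cached = this.peek();
    if (cached !== undefined) return cached;
    const keyAtStart = this.currentKey(); // snapshot before the network call
    const fresh = await fetchToken();
    this.store(keyAtStart, fresh);
    return fresh.token;
  }
}
```

`get()` snapshots the key before awaiting the fetch. A slow request that started under old credentials therefore cannot overwrite the entry written by a faster request made after the rotation.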