Security release. Closes a cache-poisoning vulnerability in both forwarder and resolver paths (issue #469). Operators on 1.6.5 should upgrade.
CVE / advisory: the issue was reported and disclosed publicly via the issue tracker. A GHSA entry will follow.
What's Changed
Security
- Drop upstream responses with mismatched question section (#470, #471). Both the forwarder (middleware/forwarder/forwarder.go) and the resolver wire layer (middleware/resolver/client.go:Conn.Exchange) used to accept an upstream reply as long as the DNS transaction ID matched. A malicious or misbehaving upstream could answer a query for attacker.example. with a message whose question section was victim.example., and because the cache is keyed on the response's question, the unrelated answer was stored under victim.example. and served from cache to later clients.

  Both paths now require the response to contain exactly one question whose Name (case-insensitively, per DNS wire rules), Qtype, and Qclass match the outstanding request. Mismatches drop the response and fall through to the next upstream, with the existing retry path covering transient cases. New regression tests pin the contract at both layers.

  Closes #469.
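A minimal sketch of the question-section check described above. It uses a simplified stand-in Question type rather than the message types the real code works with, and the helper names (questionMatches, equalFoldASCII) are hypothetical, not taken from the SDNS source.

```go
package main

import "fmt"

// Question is a simplified stand-in for a DNS question entry (assumption:
// the real code operates on a DNS library's message type instead).
type Question struct {
	Name   string
	Qtype  uint16
	Qclass uint16
}

// lowerASCII folds only A-Z, which is the case-insensitivity DNS names use.
func lowerASCII(c byte) byte {
	if c >= 'A' && c <= 'Z' {
		return c + ('a' - 'A')
	}
	return c
}

// equalFoldASCII compares two names byte by byte, ignoring ASCII case only.
func equalFoldASCII(a, b string) bool {
	if len(a) != len(b) {
		return false
	}
	for i := 0; i < len(a); i++ {
		if lowerASCII(a[i]) != lowerASCII(b[i]) {
			return false
		}
	}
	return true
}

// questionMatches reports whether a response's question section contains
// exactly one question that matches the outstanding request.
func questionMatches(req Question, respQuestions []Question) bool {
	if len(respQuestions) != 1 {
		return false
	}
	q := respQuestions[0]
	return equalFoldASCII(q.Name, req.Name) &&
		q.Qtype == req.Qtype &&
		q.Qclass == req.Qclass
}

func main() {
	req := Question{Name: "attacker.example.", Qtype: 1, Qclass: 1}

	spoofed := []Question{{Name: "victim.example.", Qtype: 1, Qclass: 1}}
	mixedCase := []Question{{Name: "ATTACKER.example.", Qtype: 1, Qclass: 1}}

	fmt.Println(questionMatches(req, spoofed))   // false: response would be dropped
	fmt.Println(questionMatches(req, mixedCase)) // true: case differences are allowed
}
```

In this sketch a caller would treat a false result the same way as a failed exchange and move on to the next upstream, so nothing reaches the cache under the wrong name.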
Features
- Per-client static-answer middleware ("views", #360). New [[views]] config block returns different DNS answers based on the originating client's source IP. This gives split-horizon resolution where *.example.lan. can resolve to one address for LAN clients and a different one for VPN clients without disturbing recursion for everyone else. Each view declares a zone label, a list of networks (CIDR), and a list of answers (zone-file format, wildcards allowed).

  Match precedence follows RFC 4592: exact owners override a covering wildcard (§3.2); among wildcards, the longest matching suffix (closest encloser, §2.2.1) wins. Views are evaluated in declaration order; the first whose networks contain the client IP wins. A matched-but-no-answer view falls through (CoreDNS-style "fallthrough" semantics). Internal sub-pipelines skip views entirely. Position in the chain: between hostsfile and blocklist, so a view-curated answer wins over a global blocklist rule for that name. See the example block in contrib/linux/sdns.conf and the README for usage.

- Non-blocking blocklist persistence + bulk import API. Reported issue: blocklist mutations via the HTTP API caused DNS to temporarily stop responding while changes were applied. Root cause: Set/Remove held b.mu (mutually exclusive with the RLock that ServeDNS takes on every query) for the full duration of the synchronous disk write in save(). Large blocklists turned that into multi-millisecond stalls of every in-flight query.

  Fixes:
  - Mutate maps under b.mu, snapshot, release b.mu, then persist outside the lock. ServeDNS readers no longer wait on disk I/O.
  - A new saveMu serializes concurrent persists; the os.Rename of a temp file (CreateTemp + Sync + Rename) is the linearisation point, so the on-disk file always matches some in-memory state and never a half-written intermediate.
  - New SetBatch/RemoveBatch perform one map lock + one disk write for an entire batch instead of one disk write per entry.

  Two new HTTP endpoints accept {"keys":[...]} JSON bodies (8 MiB cap, unknown fields rejected), returning {requested, added/removed, skipped/missing}:
  - POST /api/v1/block/set/batch
  - POST /api/v1/block/remove/batch

  A new contract test (Test_BlockList_NoStallDuringSave) holds saveMu from a goroutine and asserts that a concurrent ServeDNS-style RLock returns within 2s, so a future regression that re-introduces disk I/O inside the map lock fails loudly.
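The blocklist locking change above can be illustrated with a rough sketch of the "mutate under the map lock, snapshot, release, then persist" pattern. The BlockList type here is a simplified stand-in; its field layout, the one-name-per-line file format, and the demo helper are assumptions for illustration and do not reflect the actual SDNS implementation or its on-disk format.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sort"
	"strings"
	"sync"
)

// BlockList is a simplified, hypothetical stand-in for the real blocklist type.
type BlockList struct {
	mu     sync.RWMutex        // guards keys; the query path takes RLock
	saveMu sync.Mutex          // serialises disk writes, never held with mu
	keys   map[string]struct{} // in-memory set of blocked names
	path   string              // destination file for persistence
}

// Blocked is the hot-path read; it only ever waits on map mutations.
func (b *BlockList) Blocked(name string) bool {
	b.mu.RLock()
	defer b.mu.RUnlock()
	_, ok := b.keys[name]
	return ok
}

// SetBatch adds all names under one map lock, then performs one disk write.
func (b *BlockList) SetBatch(names []string) (added int, err error) {
	b.mu.Lock()
	for _, n := range names {
		if _, dup := b.keys[n]; !dup {
			b.keys[n] = struct{}{}
			added++
		}
	}
	snapshot := make([]string, 0, len(b.keys))
	for k := range b.keys {
		snapshot = append(snapshot, k)
	}
	b.mu.Unlock() // released before any disk I/O

	if added == 0 {
		return 0, nil
	}
	return added, b.persist(snapshot)
}

// persist writes the snapshot to a temp file and renames it into place.
func (b *BlockList) persist(snapshot []string) error {
	b.saveMu.Lock()
	defer b.saveMu.Unlock()

	sort.Strings(snapshot)
	tmp, err := os.CreateTemp(filepath.Dir(b.path), "blocklist-*.tmp")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // cleans up on failure; fails harmlessly after rename

	if _, err := tmp.WriteString(strings.Join(snapshot, "\n") + "\n"); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), b.path) // readers of the file see old or new, never partial
}

// demo runs one batch against a throwaway directory and reports the results.
func demo() (added int, blocked bool, onDisk string, err error) {
	dir, err := os.MkdirTemp("", "blocklist-sketch")
	if err != nil {
		return 0, false, "", err
	}
	defer os.RemoveAll(dir)

	b := &BlockList{keys: map[string]struct{}{}, path: filepath.Join(dir, "blocklist.txt")}
	added, err = b.SetBatch([]string{"ads.example.", "tracker.example.", "ads.example."})
	if err != nil {
		return added, false, "", err
	}
	data, err := os.ReadFile(b.path)
	return added, b.Blocked("ads.example."), string(data), err
}

func main() {
	fmt.Println(demo())
}
```

In this sketch, two concurrent batches may persist their snapshots in either order, so the file can briefly lag the newest in-memory state. It still always corresponds to a complete snapshot, which is the property the rename-based write is meant to provide.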
Kubernetes Middleware Refactor
Collapses the dual-mode (killer/boring) implementation into one sharded registry with per-headless-service incremental state. Slice events go through ApplyEndpointSlice / RemoveEndpointSlice plus a worker-coalesced MaterialiseHeadless, so a one-pod change in a 1000-pod headless service costs O(slice size) for state work and O(delta) RR allocations.
Correctness fixes that came along with the refactor:
- SERVFAIL for cluster-domain queries when not synced — forward queries no longer leak to public DNS during initial informer warmup; reverse queries still fall through.
- UID guard rejects late EndpointSlice events from a deleted Service via tombstone tracking and ownerRef.UID matching, plus dirty-replay on AddService so the synthetic seed handover doesn't drop other slices.
- onEndpointSliceUpdate retracts the slice from the old service on a service-name relabel.
- cluster_domain is normalised (trailing dot, mixed case) at construction and at Registry.SetClusterDomain.
- Anonymous headless endpoints get distinct dashed-IP SRV targets (10-0-0-1.svc...) instead of collapsing to one record.
- buildConfig defers to clientcmd's default loading rules so multi-file KUBECONFIG entries merge correctly.
- Skip-if-equal guard in applyEndpointSlice eliminates rebuilds for resourceVersion-only update events.
- SRV port-number edits invalidate the cached *dns.SRV pointer; SRV glue refresh allocates a new answerSet rather than mutating the published one in place.
- Run waits on per-handler HasSynced (not just informer.HasSynced) and flushes pending rebuilds before publishing synced=true.
- DeleteService order is now tombstone → flush → DeleteService, preventing a worker rebuild from re-populating the registry after wipe.
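As a rough illustration of the cluster_domain normalisation item in the list above, the sketch below lower-cases the configured domain and forces a single trailing dot before comparing it against query names. Both function names are hypothetical and the exact rules the real code applies may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// normaliseClusterDomain is a hypothetical helper: it lower-cases the domain
// and guarantees exactly one trailing dot, so "Cluster.Local" and
// "cluster.local." end up identical.
func normaliseClusterDomain(domain string) string {
	d := strings.ToLower(strings.TrimSpace(domain))
	d = strings.TrimRight(d, ".")
	return d + "."
}

// inClusterDomain reports whether qname is the cluster domain itself or a
// name underneath it, comparing on whole labels rather than raw suffixes.
func inClusterDomain(qname, clusterDomain string) bool {
	q := normaliseClusterDomain(qname)
	c := normaliseClusterDomain(clusterDomain)
	return q == c || strings.HasSuffix(q, "."+c)
}

func main() {
	fmt.Println(normaliseClusterDomain("Cluster.Local"))  // cluster.local.
	fmt.Println(normaliseClusterDomain("cluster.local.")) // cluster.local.

	// A mixed-case query without a trailing dot still matches the domain.
	fmt.Println(inClusterDomain("kubernetes.default.svc.CLUSTER.local", "cluster.local.")) // true

	// A name that merely ends in the same characters is not inside the domain.
	fmt.Println(inClusterDomain("notcluster.local.", "cluster.local")) // false
}
```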
config.KubernetesConfig.killer_mode is dropped from the live API; existing configs still parse (the field is retained but ignored), but new configs should omit it.
Resolver / DNSSEC Refactor
Pure DNSSEC verify functions (RRSIG, DS, NSEC, NSEC3 denial-of-existence proofs) and the EDE-coded sentinel errors that go with them moved into a new middleware/resolver/dnssec subpackage. The generic DNS RR helpers (ExtractRRSet, FilterRRsToZone, NameInZone, DnameTarget) and the EDEError type moved into util/, where both resolver and dnssec can share them without a circular import. Resolver-side network errors keep their identities but now use *util.EDEError instead of the resolver-local ValidationError type.
(*Resolver).lookup() was split in place: the per-server query goroutine moved to a queryServer method, the adaptive RTT-based timeout became adaptiveServerTimeout, and the trailing fallback-picker became pickFallbackResponse. Behaviour is unchanged; lookup() drops from ~250 lines to ~140 and the goroutine entry no longer captures state via closure. Net diff: −2016 / +265 in middleware/resolver/, ~1100 lines under middleware/resolver/dnssec/.
Config
- Update B-root to current IANA addresses. USC/ISI, the B-root operator, renumbered B-root in late 2023 (IPv4 199.9.14.201 → 170.247.170.2; IPv6 2001:500:200::b → 2801:1b8:10::b). The old addresses still answer for transitional reasons, and priming discovers the new ones at runtime anyway, but the embedded default config, the Linux packaging config, the benchmark fixtures, and the fuzz seed corpus now match the canonical named.root list.
Dependencies
- github.com/semihalev/zlog/v2 → v2.0.8 (v2.0.7 broke the variadic-KV signature; v2.0.8 restores it, so this is a no-op upgrade).
- github.com/fsnotify/fsnotify → v1.10.0.
- goreleaser/goreleaser-action → v7.2.1.
Upgrade Notes
- Recommended for everyone on 1.6.5. The cache-poisoning fix is the headline reason for this release.
- Config compatibility: configver bumps to 1.6.6; existing configs continue to parse, but you'll see a one-line "Config file is out of version" log warning until you regenerate. The deprecated kubernetes.killer_mode key is now ignored.
- No on-disk format changes to trust-anchor.db / trust-anchor-tombstones.db / blocklist persistence; the new blocklist save path is a strict superset of the old format.
Full Changelog: v1.6.5...v1.6.6