## Highlights

Arc 26.05.1 is the Enterprise GA release. Major themes: a production-ready Helm chart, peer file replication for bare-metal/edge clusters, a dedicated compactor role with automatic failover, cluster TLS + shared-secret auth, and a top-to-bottom hardening pass on ingestion and the query path.
## Enterprise

- Production-ready Helm chart (`helm/arc-enterprise/`) — role-separated StatefulSets (writer/reader/compactor), HA bootstrap (Raft-safe), automatic failover wiring, durable-by-default WAL, fail-fast install validation, secure defaults, and quick-start preset values for shared-storage and local-storage topologies.
- Peer file replication — Parquet files replicate between nodes automatically, SHA-256-verified through a Raft-backed manifest. Transfers are resumable; non-leader writes forward to the Raft leader. Enables bare-metal/VM/edge deployments without shared object storage.
- Dedicated compactor role with automatic failover — `ARC_CLUSTER_ROLE=compactor` runs compaction on exactly one node. The Raft leader monitors the compactor and reassigns the role after ~30s of unresponsiveness; a 60s cooldown prevents cycling.
- Cluster TLS + shared-secret auth — encrypted inter-node traffic across coordinator, WAL replication, shard replication, and Raft. HMAC-SHA256-signed joins with replay protection.
- Kubernetes-ready node identity — defaults to the OS hostname (the StatefulSet pod name). A graceful `LeaveNotify` on shutdown lets peers remove the node from Raft immediately instead of waiting for the heartbeat timeout.
- Reader query freshness via WAL replication — readers apply replicated WAL entries to their local ArrowBuffer, so ingested data is queryable on readers within milliseconds of arriving at the writer.
- Manifest-vs-storage reconciliation (anti-entropy) — a periodic reconciler detects orphan manifest entries and orphan storage files. Off by default; report-only on first run; blast radius bounded via a per-run cap, a grace window, and a root-walk cap.
- Dead node removal API — `DELETE /api/v1/cluster/nodes/:id` removes dead voters from Raft and the FSM.
- Cluster-safe schedulers — retention, continuous queries, and the DELETE endpoint all gate on `IsPrimaryWriter()` per tick. Failover and demotion take effect without a restart. Manifest-before-storage ordering is preserved.
- Batched Raft commands for compaction — `CompletionWatcher` applies all RegisterFile + DeleteFile ops for a manifest in one Raft entry, cutting apply latency from ~200ms to ~5ms for typical 20-output manifests.
- RBAC cache lifecycle — `Close()` shuts down the cleanup goroutine; caches are bounded (default 10K entries each).
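The shared-secret join handshake above can be sketched in a few lines. This is a hypothetical illustration of the general HMAC-SHA256 + replay-protection pattern, not Arc's actual wire format; the message layout, nonce set, and 30s window are all assumptions.

```python
import hashlib
import hmac
import time

SHARED_SECRET = b"cluster-shared-secret"  # placeholder secret
REPLAY_WINDOW_S = 30                      # assumed staleness window
seen_nonces: set[str] = set()             # nonces already accepted

def sign_join(node_id: str, nonce: str, ts: float) -> str:
    # Sign the join request fields with the cluster shared secret.
    msg = f"{node_id}|{nonce}|{ts}".encode()
    return hmac.new(SHARED_SECRET, msg, hashlib.sha256).hexdigest()

def verify_join(node_id: str, nonce: str, ts: float, sig: str, now: float) -> bool:
    if abs(now - ts) > REPLAY_WINDOW_S:
        return False  # stale timestamp: outside the replay window
    if nonce in seen_nonces:
        return False  # replayed request: nonce already used
    expected = sign_join(node_id, nonce, ts)
    if not hmac.compare_digest(expected, sig):
        return False  # forged signature or wrong shared secret
    seen_nonces.add(nonce)
    return True
```

Note the constant-time `hmac.compare_digest` and the nonce-plus-timestamp pairing: the timestamp bounds how long nonces must be remembered, and the nonce rejects replays inside the window.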
## Deployment

- Traefik v3.6 in docker-compose examples — replaces nginx in both `deploy/docker-compose/` (shared storage) and `deploy/docker-compose-local/` (local storage). Routing is via container labels; adding a node is one compose edit.
- Deployment Patterns docs — a new page comparing shared-storage vs. local-storage cluster topologies side by side: docs.basekick.net/arc-enterprise/deployment-patterns.
## Hardening — Ingestion (26.05.1 Pre-GA)

A 4-agent staff/principal-engineer review of all four ingest paths (MessagePack columnar, MessagePack row, Line Protocol, TLE) surfaced and fixed five criticals. Sustained-load benchmarks: p99 latency 3.68ms → 3.13ms (~17% better), ~19M rec/s on MsgPack columnar, 0% errors over 60s.
- Multi-hour flush atomicity — manifest now updates only after all hour buckets have written successfully (collect-then-register).
- Graceful-shutdown panic eliminated —
ArrowBuffer.Close()no longer races writers on a closed channel. - Schema-evolution corruption under concurrent writes eliminated — bounded retry loop returns typed
ingest.ErrSchemaChurnExceeded(HTTP 503) instead of silently committing schema-mixed buffers. - WAL backpressure no longer masquerades as durability — typed
wal.ErrWALDroppedsentinel with separatetotal_wal_droppedcounter; cluster-replication receivers tolerate the error instead of diverging followers. - Performance: cached column signatures (zero-alloc hot-path schema check), pre-built parquet writer properties, single-pass sort permutation reuse, value-typed merge structs.
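The collect-then-register ordering behind the multi-hour flush fix can be sketched generically. This is an illustrative pattern only; the function and callback names are hypothetical, not Arc's API.

```python
# Phase 1 ("collect"): write every hour bucket; any failure raises before
# anything is registered. Phase 2 ("register"): update the manifest only
# after all writes succeeded, so a partial flush never becomes visible.

def flush_buckets(buckets: dict[str, bytes], write_file, register_in_manifest):
    written = []
    for hour, data in buckets.items():
        path = write_file(hour, data)   # may raise; manifest untouched so far
        written.append(path)
    for path in written:
        register_in_manifest(path)      # manifest updated last, all-or-nothing
    return written
```

The key invariant is that `register_in_manifest` is never called until every `write_file` has returned, so readers following the manifest cannot observe a half-written multi-hour flush.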
## Hardening — Query Path (26.05.1 Pre-GA)

A second 4-agent review, this time of the read path, surfaced six critical-path issues. The ClickBench-hits regression budget held: ~3% on a 99.9M-row aggregate.
- Expanded SQL denylist — gates `ATTACH`/`DETACH`/`COPY`/`EXPORT`/`IMPORT`/`PRAGMA`/`SET`/`RESET`/`LOAD`/`INSTALL`/`CALL` on the read-only query API. Comment stripping and string-literal masking close the `DROP /* */ TABLE x` and `SELECT 'DROP TABLE x'` bypass shapes.
- `x-arc-database` header validation + universal `read_parquet` path quoting — every `read_parquet('PATH', ...)` site now routes through a single-source-of-truth quoting helper.
- Direct `read_parquet()` in user SQL rejected — prevents bypass of the database/measurement RBAC pair check.
- Streaming-response error semantics — `streamTypedJSON` and `streamArrowJSON` now propagate Scan failures and `ctx.Err()`; partial streams are no longer marked `Complete`.
- Parallel-partition partial failure surfaced as a request error — any errored partition fails the whole request with HTTP 500 (previously: surviving rows were returned with `success: true`).
- Arrow IPC streaming memory bound — explicit per-batch `Release()` instead of accumulated `defer`; constant per-batch memory restored.
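The masking-before-matching idea behind the denylist can be shown in miniature. This is a rough sketch of the general technique, not Arc's implementation; the regexes and keyword set are illustrative (`DROP` is included here only to exercise the example bypass shapes).

```python
import re

DENYLIST = {"ATTACH", "DETACH", "COPY", "EXPORT", "IMPORT", "PRAGMA",
            "SET", "RESET", "LOAD", "INSTALL", "CALL", "DROP"}

def is_denied(sql: str) -> bool:
    # 1. Mask string literals so keywords inside quotes don't count.
    masked = re.sub(r"'(?:[^']|'')*'", "''", sql)
    # 2. Strip block and line comments so they can't split a keyword.
    masked = re.sub(r"/\*.*?\*/", " ", masked, flags=re.S)
    masked = re.sub(r"--[^\n]*", " ", masked)
    # 3. Tokenize what's left and check against the denylist.
    tokens = {t.upper() for t in re.findall(r"[A-Za-z_]+", masked)}
    return bool(tokens & DENYLIST)
```

Without step 2, `DROP /* */ TABLE x` could slip past a naive keyword scan that sees the comment as part of the token; without step 1, the harmless literal in `SELECT 'DROP TABLE x'` would be flagged as a real `DROP`.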
## Security

- Write endpoints now require write-tier auth — five ingest endpoints (`/api/v1/write/msgpack`, `/write`, `/api/v2/write`, `/api/v1/write/line-protocol`, `/api/v1/write/tle`) and four bulk-import endpoints lacked explicit write-tier auth. All now use `auth.RequireWrite`; imports use `auth.RequireAdmin`.
- Gzip + zstd decompression-bomb fixes — line-protocol and TLE handlers now apply the same hard 100MB cap that msgpack already enforced. The bound is enforced during decoding, so a 28KB → 256MB zstd bomb is rejected with bounded allocation.
- Symmetric `maxSize` cap on the uncompressed branch — closes the uncompressed-OOM vector left open after the gzip fix.
- Defensive body copy — LP and TLE handlers no longer hand fasthttp-owned slices to async parsers; a regression test pins the no-aliasing invariant.
- Cluster-safe DELETE — readers reject deletes with 503 before any storage scan; database/measurement inputs are validated against `..`, `/`, `\`.
- Directory permissions 0700 — auth DB, CQ definitions, retention policies, Raft state, telemetry, import output. Existing deployments retain prior permissions; operators may `chmod 700` manually.
- SQL escaping defense-in-depth — DuckDB `SET memory_limit` and compaction `ORDER BY` sort keys.
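Enforcing the size cap *during* decoding, rather than decompressing first and checking after, is what keeps the bomb's allocation bounded. A minimal sketch of that pattern using Python's stdlib gzip (Arc's handlers also cover zstd; the chunk size and helper name here are illustrative):

```python
import gzip
import io

MAX_DECOMPRESSED = 100 * 1024 * 1024  # the 100MB hard cap described above

def decompress_bounded(compressed: bytes, cap: int = MAX_DECOMPRESSED) -> bytes:
    out = bytearray()
    with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as gz:
        while True:
            chunk = gz.read(64 * 1024)  # bounded 64KB reads, never the whole stream
            if not chunk:
                return bytes(out)
            if len(out) + len(chunk) > cap:
                # Abort mid-stream: a tiny compressed bomb never forces a
                # cap-sized (or larger) allocation before being rejected.
                raise ValueError("decompressed payload exceeds cap")
            out.extend(chunk)
```

A post-hoc check (`len(gzip.decompress(body)) > cap`) would already have materialized the full 256MB before rejecting it; the streaming check caps peak memory at roughly `cap` plus one chunk.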
## Deprecations

`?p=token` query-parameter authentication (InfluxDB 1.x compat) is now deprecated. Tokens in URLs leak through reverse proxies, load balancers, and access logs. It continues to work for now; the first use logs a one-time warning. Migrate to `Authorization: Bearer <token>`.
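A minimal migration example, using Python's stdlib for illustration: the token moves into the `Authorization` header so it never appears in the URL. The host/port and token are placeholders; the line-protocol endpoint is one of the ingest endpoints listed above.

```python
import urllib.request

token = "my-arc-token"  # placeholder token

# Before (deprecated): .../api/v1/write/line-protocol?p=my-arc-token
# After: token sent as a Bearer header, invisible to proxy/LB access logs.
req = urllib.request.Request(
    "http://localhost:8000/api/v1/write/line-protocol",  # example host/port
    data=b"cpu,host=a usage=0.5",
    headers={"Authorization": f"Bearer {token}"},
    method="POST",
)
```

The same change applies to any HTTP client: delete the `p` query parameter and add the `Authorization: Bearer <token>` header.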
## Bug Fixes

- WAL filename rotation collision — second-precision filenames could collide on rapid rotation. Filenames are now nanosecond-precision (`arc-YYYYMMDD_HHMMSS.000000000.wal`).
- Query registry reported 0 rows for Arrow-path queries — Arrow streams asynchronously; the registry now receives real counts via `onComplete`/`onFail`/`onTimeout` callbacks.
- Low-volume measurements starved of age-based flushes under load — `periodicFlush` no longer extends the timer when a new buffer's deadline is later than the current one.
- Memory not released after delete/retention — `ClearHTTPCache()` + debounced `debug.FreeOSMemory()` after every delete and retention run; delete COPY queries now constrain `ROW_GROUP_SIZE`.
- Writer-only schedulers skipped all ticks without failover enabled — `Coordinator.IsPrimaryWriter()` falls back to `Role == RoleWriter` when no failover manager is configured. Fixes retention and CQs silently no-op'ing on default cluster config.
- CQ not scheduled after API creation — `handleCreate` now calls `scheduler.StartJobDirect`; previously a restart was required.
- RBAC goroutine leak — `RBACManager.Close()` added and registered with the shutdown coordinator.
- Row-format MessagePack flush hardening — regression coverage + an `arc_buffer_flush_failures_total` Prometheus counter for visibility.
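The nanosecond-precision filename scheme from the rotation fix is easy to illustrate. A Python sketch of the format (the helper name is hypothetical; Arc is not written in Python):

```python
import time

def wal_filename(now_ns=None):
    # arc-YYYYMMDD_HHMMSS.NNNNNNNNN.wal — the nine-digit fractional part
    # makes back-to-back rotations within the same second produce
    # distinct names, which second-precision stamps could not guarantee.
    ns = time.time_ns() if now_ns is None else now_ns
    secs, frac = divmod(ns, 1_000_000_000)
    stamp = time.strftime("%Y%m%d_%H%M%S", time.gmtime(secs))
    return f"arc-{stamp}.{frac:09d}.wal"
```

Two rotations one nanosecond apart now yield different filenames; with second precision they would have collided.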
## Dependencies

| Package | From | To |
|---|---|---|
| DuckDB Go binding | v2.5.5 (DuckDB 1.4.4) | v2.10501.0 (DuckDB 1.5.1) |
| aws-sdk-go-v2 core | 1.40 | 1.41.5 |
| aws-sdk-go-v2/service/s3 | 1.92 | 1.99 |
| smithy-go | 1.23 | 1.24 |

S3 DNS-timeout retries are now automatic; a non-existent AWS profile no longer fails config load.
## How to Update

Docker:

```shell
docker pull ghcr.io/basekick-labs/arc:26.05.1
```

Debian/Ubuntu:

```shell
wget https://github.com/basekick-labs/arc/releases/download/v26.05.1/arc_26.05.1_amd64.deb
sudo dpkg -i arc_26.05.1_amd64.deb
```

RHEL/Fedora:

```shell
wget https://github.com/basekick-labs/arc/releases/download/v26.05.1/arc-26.05.1-1.x86_64.rpm
sudo rpm -i arc-26.05.1-1.x86_64.rpm
```

Helm:

```shell
helm upgrade arc https://github.com/basekick-labs/arc/releases/download/v26.05.1/arc-26.05.1.tgz
```

Full release notes (long-form, with config tables and per-fix detail): RELEASE_NOTES_2026.05.1.md
## What's Changed
- fix(delete,retention): clear DuckDB cache and free OS memory after execution by @xe-nvdk in #372
- chore(deps): bump github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream from 1.7.3 to 1.7.8 in the go_modules group across 0 directory by @dependabot[bot] in #370
- fix(delete,retention): clear DuckDB cache and free OS memory after ex… by @xe-nvdk in #374
- chore(deps): upgrade DuckDB to v1.5.1 + fix age-based flush starvation by @xe-nvdk in #375
- fix(query): record real row count in registry for Arrow path (#333) by @xe-nvdk in #376
- fix(wal): use nanosecond precision in filenames to prevent rotation collision (#340) by @xe-nvdk in #377
- fix(compaction,database): escape SQL interpolations in SET memory_limit and ORDER BY (#289, #290) by @xe-nvdk in #378
- fix(security): use 0700 for directories containing sensitive data (#298) by @xe-nvdk in #379
- fix(auth): deprecate ?p= query parameter authentication with warning (#297) by @xe-nvdk in #380
- fix(auth): add Close() to RBACManager and bound permission caches (#295, #296) by @xe-nvdk in #381
- feat(cluster): add TLS encryption and shared secret authentication (#366, #367) by @xe-nvdk in #382
- feat(cluster): stable node identity and graceful leave broadcast by @xe-nvdk in #383
- feat(cluster): add dead node removal API for Kubernetes scale-down by @xe-nvdk in #384
- feat(cluster): wire IngestHandler on readers for query freshness by @xe-nvdk in #385
- feat(cluster): file manifest in Raft FSM — foundation for peer replication (Phase 1) by @xe-nvdk in #386
- feat(cluster): peer-to-peer file replication — Phase 2 (fetch protocol + background puller) by @xe-nvdk in #387
- feat(cluster): peer-replication catch-up on join — Phase 3 (multi-peer fallback + startup reconciliation) by @xe-nvdk in #388
- feat(cluster): dedicated compactor role with manifest-aware compaction — Phase 4 by @xe-nvdk in #389
- feat(cluster): automatic compactor failover by @xe-nvdk in #390
- fix(ingest): harden row-format msgpack flush visibility by @xe-nvdk in #402
- feat(cluster): batched Raft commands for compaction manifests by @xe-nvdk in #403
- feat(cluster): resumable peer-replication transfers (#398) by @xe-nvdk in #404
- feat(enterprise): Helm chart + Traefik compose examples by @xe-nvdk in #405
- chore(deps): bump github.com/jackc/pgx/v5 from 5.8.0 to 5.9.0 in the go_modules group across 1 directory by @dependabot[bot] in #406
- feat(cluster): writer-only retention with Raft manifest propagation by @xe-nvdk in #407
- feat(delete): cluster-safe DELETE endpoint — writer gate + Raft manifest by @xe-nvdk in #408
- feat(cluster): writer-only CQ execution with Raft writer gate by @xe-nvdk in #409
- fix(ingest): ingestion correctness hardening and performance — 26.05.1 pre-release by @xe-nvdk in #411
- feat(cluster): manifest-vs-storage reconciliation — Phase 5 anti-entropy by @xe-nvdk in #412
- fix(ingest): 26.05.1 critical hardening — 5 criticals + review-pass cleanup by @xe-nvdk in #413
- fix(query): six critical-path hardening fixes from query-path review by @xe-nvdk in #414
Full Changelog: v26.04.1...v26.05.1