Arc 26.05.1


Highlights

Arc 26.05.1 is the Enterprise GA release. Major themes: production-ready Helm chart, peer file replication for bare-metal/edge clusters, dedicated compactor role with automatic failover, cluster TLS + shared-secret auth, and a top-to-bottom hardening pass on ingestion and the query path.

Enterprise

  • Production-ready Helm chart (helm/arc-enterprise/) — role-separated StatefulSets (writer/reader/compactor), HA bootstrap (Raft-safe), automatic failover wiring, durable-by-default WAL, fail-fast install validation, secure defaults, and quick-start preset values for shared-storage and local-storage topologies.
  • Peer file replication — Parquet files replicate between nodes automatically, SHA-256 verified through a Raft-backed manifest. Resumable transfers; non-leader writes forward to the Raft leader. Enables bare-metal/VM/edge deployments without shared object storage.
  • Dedicated compactor role with automatic failover — ARC_CLUSTER_ROLE=compactor runs compaction on exactly one node. The Raft leader monitors it and reassigns after ~30s of unresponsiveness; a 60s cooldown prevents cycling.
  • Cluster TLS + shared-secret auth — encrypted inter-node traffic across coordinator, WAL replication, shard replication, and Raft. HMAC-SHA256-signed joins with replay protection.
  • Kubernetes-ready node identity — defaults to OS hostname (StatefulSet pod name). Graceful LeaveNotify on shutdown so peers remove the node from Raft immediately instead of waiting for heartbeat timeout.
  • Reader query freshness via WAL replication — readers apply replicated WAL entries to their local ArrowBuffer; ingested data is queryable on readers within milliseconds of arriving at the writer.
  • Manifest-vs-storage reconciliation (anti-entropy) — periodic reconciler detects orphan manifest entries and orphan storage files. Off-by-default; report-only on first run; bounded blast radius via per-run cap, grace window, and root-walk cap.
  • Dead node removal API — DELETE /api/v1/cluster/nodes/:id removes dead voters from Raft and the FSM.
  • Cluster-safe schedulers — retention, continuous queries, and DELETE endpoint all gate on IsPrimaryWriter() per tick. Failover and demotion take effect without restart. Manifest-before-storage ordering preserved.
  • Batched Raft commands for compaction — CompletionWatcher applies all RegisterFile + DeleteFile ops for a manifest in one Raft entry. Apply latency ~200ms → ~5ms for typical 20-output manifests.
  • RBAC cache lifecycle — Close() shutdown for the cleanup goroutine, bounded caches (default 10K entries each).
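The shared-secret join handshake can be illustrated with a minimal sketch. Field names, the payload layout, and the freshness window below are hypothetical, not Arc's actual wire format; the sketch only shows the shape of an HMAC-SHA256-signed join with replay protection.

```python
import hashlib
import hmac
import time

# Hypothetical sketch of a shared-secret join handshake with replay
# protection. Names and the 30s window are illustrative only.
REPLAY_WINDOW_SECS = 30

def sign_join(secret, node_id, ts):
    # Bind the signature to both the node identity and the timestamp.
    msg = f"{node_id}|{ts}".encode()
    return hmac.new(secret, msg, hashlib.sha256).hexdigest()

def verify_join(secret, node_id, ts, sig, seen, now=None):
    now = int(time.time()) if now is None else now
    # Reject requests outside the freshness window (stale or skewed).
    if abs(now - ts) > REPLAY_WINDOW_SECS:
        return False
    # Reject exact replays within the window.
    if (node_id, ts, sig) in seen:
        return False
    expected = sign_join(secret, node_id, ts)
    if not hmac.compare_digest(expected, sig):
        return False
    seen.add((node_id, ts, sig))
    return True
```

A first verification succeeds; replaying the same signature, or presenting a timestamp outside the window, is rejected.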

Deployment

  • Traefik v3.6 in docker-compose examples — replaces nginx in both deploy/docker-compose/ (shared storage) and deploy/docker-compose-local/ (local storage). Routing via container labels; adding a node is one compose edit.
  • Deployment Patterns docs — new page comparing shared-storage vs. local-storage cluster topologies side-by-side: docs.basekick.net/arc-enterprise/deployment-patterns.

Hardening — Ingestion (26.05.1 Pre-GA)

A 4-agent staff/principal-engineer review of all four ingest paths (MessagePack columnar, MessagePack row, Line Protocol, TLE) surfaced and fixed five criticals. Sustained-load benchmarks: p99 latency 3.68ms → 3.13ms (~17% better), ~19M rec/s on MsgPack columnar, 0% errors over 60s.

  • Multi-hour flush atomicity — manifest now updates only after all hour buckets have written successfully (collect-then-register).
  • Graceful-shutdown panic eliminated — ArrowBuffer.Close() no longer races writers on a closed channel.
  • Schema-evolution corruption under concurrent writes eliminated — bounded retry loop returns typed ingest.ErrSchemaChurnExceeded (HTTP 503) instead of silently committing schema-mixed buffers.
  • WAL backpressure no longer masquerades as durability — typed wal.ErrWALDropped sentinel with separate total_wal_dropped counter; cluster-replication receivers tolerate the error instead of diverging followers.
  • Performance: cached column signatures (zero-alloc hot-path schema check), pre-built parquet writer properties, single-pass sort permutation reuse, value-typed merge structs.
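The bounded-retry shape behind the schema-churn fix can be sketched as follows. All names here are illustrative (Arc's typed error is Go's ingest.ErrSchemaChurnExceeded); the point is that the commit gives up with a typed, retriable error instead of committing a schema-mixed buffer.

```python
# Illustrative sketch of the bounded-retry pattern above: if the live
# schema keeps changing under concurrent writes, give up after N
# attempts with a typed error (surfaced as HTTP 503) rather than
# committing a schema-mixed buffer. Names are hypothetical.
MAX_SCHEMA_RETRIES = 3

class SchemaChurnExceeded(Exception):
    """Maps to HTTP 503: schema changed on every retry attempt."""

def commit_buffer(read_schema, try_commit):
    # try_commit(schema) returns True on success, or False when the
    # live schema no longer matches the snapshot we read.
    for _ in range(MAX_SCHEMA_RETRIES):
        schema = read_schema()
        if try_commit(schema):
            return schema
    raise SchemaChurnExceeded()
```

A caller that sees the typed error can retry the whole write, and the server never persists a buffer whose rows disagree on schema.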

Hardening — Query Path (26.05.1 Pre-GA)

A second 4-agent review on the read path surfaced six critical-path issues. ClickBench-hits regression budget held: ~3% on a 99.9M-row aggregate.

  • Expanded SQL denylist — gates ATTACH/DETACH/COPY/EXPORT/IMPORT/PRAGMA/SET/RESET/LOAD/INSTALL/CALL on the read-only query API. Comment-strip and string-literal masking close DROP /* */ TABLE x and SELECT 'DROP TABLE x' bypass shapes.
  • x-arc-database header validation + universal read_parquet path quoting — every read_parquet('PATH', ...) site now routes through a single-source-of-truth quoting helper.
  • Direct read_parquet() in user SQL rejected — prevents bypass of the database/measurement RBAC pair-check.
  • Streaming-response error semantics — streamTypedJSON and streamArrowJSON now propagate Scan failures and ctx.Err(); partial streams are no longer marked Complete.
  • Parallel-partition partial failure surfaced as request error — any errored partition fails the whole request with HTTP 500 (was: returned surviving rows as success: true).
  • Arrow IPC streaming memory bound — explicit per-batch Release() instead of accumulated defer; constant per-batch memory restored.
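The ordering of the denylist steps can be sketched in a few lines: mask string literals first (so quoted text can neither trip nor hide a match), then strip comments (so a comment cannot split a keyword), then scan whole tokens. This is a simplified illustration, not Arc's actual check, which is more thorough.

```python
import re

# Simplified sketch of the denylist ordering described above.
# DROP is included to mirror the bypass shapes in the notes; the
# release's denylist additions are the statements listed above.
DENYLIST = ("ATTACH", "DETACH", "COPY", "EXPORT", "IMPORT", "PRAGMA",
            "SET", "RESET", "LOAD", "INSTALL", "CALL", "DROP")

def is_denied(sql: str) -> bool:
    # 1) Mask string literals so their contents never affect the scan.
    masked = re.sub(r"'(?:[^']|'')*'", "''", sql)
    # 2) Strip comments so "DROP /* */ TABLE" cannot split a keyword.
    masked = re.sub(r"/\*.*?\*/", " ", masked, flags=re.S)
    masked = re.sub(r"--[^\n]*", " ", masked)
    # 3) Whole-token scan (no substring false positives like OFFSET/SET).
    tokens = re.findall(r"[A-Za-z_]+", masked.upper())
    return any(t in DENYLIST for t in tokens)
```

With this ordering, `DROP /* */ TABLE x` is denied after comment stripping, while `SELECT 'DROP TABLE x'` passes because the literal is masked before the scan.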

Security

  • Write endpoints now require write-tier auth — five ingest endpoints (/api/v1/write/msgpack, /write, /api/v2/write, /api/v1/write/line-protocol, /api/v1/write/tle) and four bulk-import endpoints lacked explicit write-tier auth. All now use auth.RequireWrite; imports use auth.RequireAdmin.
  • Gzip + zstd decompression-bomb fixes — line-protocol and TLE handlers now apply the same hard 100MB cap that msgpack already enforced. Bound is enforced during decoding so a 28KB → 256MB zstd bomb is rejected with bounded allocation.
  • Symmetric maxSize cap on uncompressed branch — closes the uncompressed-OOM vector left open after the gzip fix.
  • Defensive body copy — LP and TLE handlers no longer hand fasthttp-owned slices to async parsers; regression test pins the no-aliasing invariant.
  • Cluster-safe DELETE — readers reject deletes with 503 before any storage scan; database/measurement inputs validated against .., /, \.
  • Directory permissions 0700 — auth DB, CQ definitions, retention policies, Raft state, telemetry, import output. Existing deployments retain prior permissions; operators may chmod 700 manually.
  • SQL escaping defense-in-depth — DuckDB SET memory_limit and compaction ORDER BY sort keys.
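The decompression-bomb fix enforces the cap during decoding, not after. A minimal sketch of that pattern, using Python's stdlib gzip for illustration (Arc's handlers are Go; the 100MB figure is from the notes, the chunk size is arbitrary):

```python
import gzip
import io

# Sketch of the bounded-decode pattern above: check the cap while
# inflating, so a tiny compressed bomb is rejected with bounded
# allocation instead of after fully decompressing.
MAX_DECOMPRESSED = 100 * 1024 * 1024  # hard 100MB cap

class BodyTooLarge(Exception):
    pass

def bounded_gunzip(compressed: bytes, cap: int = MAX_DECOMPRESSED) -> bytes:
    out = io.BytesIO()
    total = 0
    with gzip.GzipFile(fileobj=io.BytesIO(compressed)) as gz:
        while True:
            chunk = gz.read(64 * 1024)  # inflate incrementally
            if not chunk:
                break
            total += len(chunk)
            if total > cap:
                raise BodyTooLarge(f"decompressed body exceeds {cap} bytes")
            out.write(chunk)
    return out.getvalue()
```

The same shape applies to zstd and to the symmetric cap on the uncompressed branch: one bound, enforced as bytes arrive.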

Deprecations

  • ?p=token query parameter authentication (InfluxDB 1.x compat) is now deprecated. Tokens in URLs leak through reverse proxies / load balancers / access logs. Continues to work; first use logs a one-time warning. Migrate to Authorization: Bearer <token>.
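For most clients the migration is a one-line change: move the token out of the URL and into the header. A stdlib sketch (host, port, and path below are placeholders, not Arc defaults):

```python
import urllib.request

# Before (deprecated): the token rides in the URL and leaks into
# proxy and access logs:
#   http://arc.example.com:8000/api/v1/query?p=MY_TOKEN
# After: send it in the Authorization header instead.
token = "MY_TOKEN"
req = urllib.request.Request(
    "http://arc.example.com:8000/api/v1/query",
    headers={"Authorization": f"Bearer {token}"},
)
```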

Bug Fixes

  • WAL filename rotation collision — second-precision filenames could collide on rapid rotation. Now nanosecond-precision (arc-YYYYMMDD_HHMMSS.000000000.wal).
  • Query registry reports 0 row count for Arrow-path queries — Arrow streams asynchronously; registry now receives real counts via onComplete/onFail/onTimeout callbacks.
  • Low-volume measurements starved of age-based flushes under load — periodicFlush no longer extends the timer when a new buffer's deadline is later than the current one.
  • Memory not released after delete/retention — ClearHTTPCache() + debounced debug.FreeOSMemory() after every delete and retention run; delete COPY queries now constrain ROW_GROUP_SIZE.
  • Writer-only schedulers skipped all ticks without failover enabled — Coordinator.IsPrimaryWriter() falls back to Role == RoleWriter when no failover manager is configured. Fixes retention and CQs silently no-op'ing on default cluster config.
  • CQ not scheduled after API creation — handleCreate now calls scheduler.StartJobDirect; previously required a restart.
  • RBAC goroutine leak — RBACManager.Close() added; registered with the shutdown coordinator.
  • Row-format MessagePack flush hardening — regression coverage + arc_buffer_flush_failures_total Prometheus counter for visibility.
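The WAL filename fix can be illustrated with a short sketch of the new nanosecond-precision scheme (the format string matches the notes; the helper name is hypothetical):

```python
import time

# Illustration of the rotation-collision fix above: second-precision
# names can collide under rapid rotation, so the nanosecond remainder
# is folded into the name (arc-YYYYMMDD_HHMMSS.000000000.wal).
def wal_filename(ns=None):
    ns = time.time_ns() if ns is None else ns
    secs, rem = divmod(ns, 1_000_000_000)
    stamp = time.strftime("%Y%m%d_%H%M%S", time.gmtime(secs))
    return f"arc-{stamp}.{rem:09d}.wal"
```

Two rotations within the same second now produce distinct filenames.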

Dependencies

Package                     From                     To
DuckDB Go binding           v2.5.5 (DuckDB 1.4.4)    v2.10501.0 (DuckDB 1.5.1)
aws-sdk-go-v2 core          1.40                     1.41.5
aws-sdk-go-v2/service/s3    1.92                     1.99
smithy-go                   1.23                     1.24

S3 DNS-timeout retries are now automatic; a non-existent AWS profile no longer fails config load.

How to Update

Docker:

docker pull ghcr.io/basekick-labs/arc:26.05.1

Debian/Ubuntu:

wget https://github.com/basekick-labs/arc/releases/download/v26.05.1/arc_26.05.1_amd64.deb
sudo dpkg -i arc_26.05.1_amd64.deb

RHEL/Fedora:

wget https://github.com/basekick-labs/arc/releases/download/v26.05.1/arc-26.05.1-1.x86_64.rpm
sudo rpm -i arc-26.05.1-1.x86_64.rpm

Helm:

helm upgrade arc https://github.com/basekick-labs/arc/releases/download/v26.05.1/arc-26.05.1.tgz

Full release notes (long-form, with config tables and per-fix detail): RELEASE_NOTES_2026.05.1.md

What's Changed

  • fix(delete,retention): clear DuckDB cache and free OS memory after execution by @xe-nvdk in #372
  • chore(deps): bump github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream from 1.7.3 to 1.7.8 in the go_modules group across 0 directory by @dependabot[bot] in #370
  • fix(delete,retention): clear DuckDB cache and free OS memory after ex… by @xe-nvdk in #374
  • chore(deps): upgrade DuckDB to v1.5.1 + fix age-based flush starvation by @xe-nvdk in #375
  • fix(query): record real row count in registry for Arrow path (#333) by @xe-nvdk in #376
  • fix(wal): use nanosecond precision in filenames to prevent rotation collision (#340) by @xe-nvdk in #377
  • fix(compaction,database): escape SQL interpolations in SET memory_limit and ORDER BY (#289, #290) by @xe-nvdk in #378
  • fix(security): use 0700 for directories containing sensitive data (#298) by @xe-nvdk in #379
  • fix(auth): deprecate ?p= query parameter authentication with warning (#297) by @xe-nvdk in #380
  • fix(auth): add Close() to RBACManager and bound permission caches (#295, #296) by @xe-nvdk in #381
  • feat(cluster): add TLS encryption and shared secret authentication (#366, #367) by @xe-nvdk in #382
  • feat(cluster): stable node identity and graceful leave broadcast by @xe-nvdk in #383
  • feat(cluster): add dead node removal API for Kubernetes scale-down by @xe-nvdk in #384
  • feat(cluster): wire IngestHandler on readers for query freshness by @xe-nvdk in #385
  • feat(cluster): file manifest in Raft FSM — foundation for peer replication (Phase 1) by @xe-nvdk in #386
  • feat(cluster): peer-to-peer file replication — Phase 2 (fetch protocol + background puller) by @xe-nvdk in #387
  • feat(cluster): peer-replication catch-up on join — Phase 3 (multi-peer fallback + startup reconciliation) by @xe-nvdk in #388
  • feat(cluster): dedicated compactor role with manifest-aware compaction — Phase 4 by @xe-nvdk in #389
  • feat(cluster): automatic compactor failover by @xe-nvdk in #390
  • fix(ingest): harden row-format msgpack flush visibility by @xe-nvdk in #402
  • feat(cluster): batched Raft commands for compaction manifests by @xe-nvdk in #403
  • feat(cluster): resumable peer-replication transfers (#398) by @xe-nvdk in #404
  • feat(enterprise): Helm chart + Traefik compose examples by @xe-nvdk in #405
  • chore(deps): bump github.com/jackc/pgx/v5 from 5.8.0 to 5.9.0 in the go_modules group across 1 directory by @dependabot[bot] in #406
  • feat(cluster): writer-only retention with Raft manifest propagation by @xe-nvdk in #407
  • feat(delete): cluster-safe DELETE endpoint — writer gate + Raft manifest by @xe-nvdk in #408
  • feat(cluster): writer-only CQ execution with Raft writer gate by @xe-nvdk in #409
  • fix(ingest): ingestion correctness hardening and performance — 26.05.1 pre-release by @xe-nvdk in #411
  • feat(cluster): manifest-vs-storage reconciliation — Phase 5 anti-entropy by @xe-nvdk in #412
  • fix(ingest): 26.05.1 critical hardening — 5 criticals + review-pass cleanup by @xe-nvdk in #413
  • fix(query): six critical-path hardening fixes from query-path review by @xe-nvdk in #414

Full Changelog: v26.04.1...v26.05.1
