This is a recommended release focused on request-cancellation plumbing, memory-safety hardening across data fetch paths, and indexer backpressure. Key highlights include end-to-end AbortSignal threading through GraphQL resolvers, attribute fetchers, chunk fetches, and the trusted-node request queue so client disconnects no longer leave zombie work pinned on shared queues; a byte-range / Range-request hardening pass (PE-9081) that closes multiple paths through which oversized or truncated upstream responses silently pinned large Buffers in the external memory pool; and indexer backpressure via batched matched-item draining (BUNDLE_DATA_ITEM_DRAIN_BATCH) and hard caps on the DataItemIndexer / Ans104DataIndexer queues with drop-on-full recovery via bundle-repair-worker. Other notable fixes: stale ArNS resolution-failure caching that poisoned upstream nginx caches with the "unregistered" placeholder (PE-9072), a CompositeArNSResolver fast-fail fallback gap during AO/CU flaps (PE-9075), a bundles root-atom consistency bug that crashed the SQLite worker on optimistic indexing races (PE-9073), an envoy /tx prefix routing fix for txids starting with tx (PE-9079), an export-parquet metric-cardinality bloat fix (PE-9078), an Ans104OffsetSource stream-lifecycle leak (PE-9077), and a runtime dependency declaration for @ardrive/turbo-sdk (PE-9074). Adds a substantial set of new observability metrics for GraphQL request volume + cancellations, attribute-fetch source attribution, and Node.js process / event-loop saturation.
Added
- `CACHE_APEX_MAX_AGE` Configuration: New environment variable that bounds the `Cache-Control` `max-age` returned for `APEX_TX_ID` responses (default 3600s, 1 hour) and adds the `must-revalidate` directive. Operators can now rotate `APEX_TX_ID` without leaving upstream proxies serving the previous content for the data-layer cache lifetime (potentially up to `CACHE_STABLE_MAX_AGE` with `immutable`). See PE-9072.
- GraphQL Request Cancellation + Source-Attribution Metrics (PE-9087): A single `AbortSignal` composed from the Express request close event and a configurable server-side deadline is now threaded through every GraphQL resolver and downstream attribute fetcher. When a client disconnects (or the deadline elapses), in-flight attribute fetches and `arweaveClient` requests cancel immediately instead of running to completion against arweave.net while their results are discarded. New env var `GRAPHQL_RESOLVER_DEADLINE_MS` (default 12000ms; set to `0` to disable) caps server-side resolver runtime. New metrics: `graphql_requests_total` (denominator for cancellation-rate alerts), `graphql_resolver_cancellations_total` with `{reason="client_disconnect"|"deadline_exceeded"}`, and `attribute_fetch_total` and `attribute_fetch_duration_seconds` with `{kind, subject, source, outcome}` labels for end-to-end source attribution of L1 attribute fetches. Also fixes a bug where `getTransactionAttributes` always returned `owner: null` for L1 transactions even when the wallet was already cached, forcing every owner query to round-trip to arweave.net.
- Indexer Backpressure: Queue Caps + Batched Matched-Item Drain (PE-9089 + PE-9086): The unbundler's matched-item firehose now buffers in-process and drains in `setImmediate` batches sized by `BUNDLE_DATA_ITEM_DRAIN_BATCH` (default 100), guaranteeing event-loop turns for SQLite worker replies and other I/O when large bundles produce thousands of cross-thread messages per second. Buffer depth is exposed as `queue_length{queue_name="matchedItemBuffer"}`. The `DataItemIndexer` and `Ans104DataIndexer` queues now enforce hard caps (`DATA_ITEM_INDEXER_QUEUE_SIZE` and `ANS104_DATA_INDEXER_QUEUE_SIZE`, both default 500000; set `0` to disable). Non-prioritized items pushed at the cap are dropped and counted in `data_items_dropped_total{queue_name}`; the `bundle-repair-worker` recovers the dropped items on its next cycle, so dropping is a backpressure release valve, not data loss. Backpressure and depth checks are now O(1) tracked counters instead of linked-list walks.
- Node.js Process and Event-Loop Observability Metrics: The default `prom-client` collectors are now enabled, exposing `process_resident_memory_bytes`, `nodejs_heap_size_*`, `nodejs_eventloop_lag_seconds`, `nodejs_gc_duration_seconds`, and `nodejs_active_handles_total`. Adds a new `nodejs_event_loop_utilization` gauge (0..1, sampled at scrape time), the most reliable signal for detecting main-thread saturation, since lag percentiles can read near-zero while the loop is fully pegged. Adds a `bundle_data_item_count` histogram (buckets up to 5M items) so heap and queue spikes can be correlated with the bundle size that triggered them rather than with aggregate ingest throughput.
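The batched matched-item drain described in the Indexer Backpressure entry can be pictured roughly as follows. This is a minimal sketch, not the gateway's implementation: the class name `BatchedDrainBuffer`, its constructor parameters, and the injected `schedule` callback are all hypothetical. The scheduler is passed in so the sketch stays testable; in Node.js it would be `setImmediate`.

```typescript
// Hypothetical sketch of a buffer that drains in fixed-size batches,
// yielding to the event loop between batches via a caller-supplied scheduler.
type Scheduler = (fn: () => void) => unknown;

class BatchedDrainBuffer<T> {
  private items: T[] = [];
  private scheduled = false;
  private readonly handle: (item: T) => void;
  private readonly schedule: Scheduler;
  private readonly batchSize: number;

  constructor(
    handle: (item: T) => void,
    schedule: Scheduler, // e.g. setImmediate in Node.js
    batchSize = 100, // mirrors the BUNDLE_DATA_ITEM_DRAIN_BATCH default
  ) {
    this.handle = handle;
    this.schedule = schedule;
    this.batchSize = batchSize;
  }

  // Current buffer depth, the kind of value a queue_length gauge would report.
  get depth(): number {
    return this.items.length;
  }

  push(item: T): void {
    this.items.push(item);
    this.scheduleDrain();
  }

  private scheduleDrain(): void {
    // At most one drain is pending at a time.
    if (this.scheduled || this.items.length === 0) return;
    this.scheduled = true;
    this.schedule(() => this.drainBatch());
  }

  private drainBatch(): void {
    this.scheduled = false;
    const batch = this.items.splice(0, this.batchSize);
    for (const item of batch) this.handle(item);
    // Re-schedule instead of looping, so other I/O callbacks get a turn
    // before the next batch is processed.
    this.scheduleDrain();
  }
}
```

Because each batch ends by scheduling the next one rather than looping, a bundle that produces thousands of items never holds the main thread for more than `batchSize` handler calls at a time.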
Changed
- Manifest Resolution Type Surfaced: `ManifestResolution` now carries an optional `resolutionType` field (`'path' | 'index' | 'fallback'`) populated by `StreamingManifestPathResolver`. The data handler uses it to apply different `Cache-Control` policies per resolution type; see Fixed below. The field is optional so external implementations of `ManifestPathResolver` remain compatible.
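As a rough illustration of how a handler might branch on `resolutionType`, the sketch below picks a `Cache-Control` value per resolution type. The function name, the `antTtlSeconds` parameter, and the header string used for the inherited-TTL case are assumptions made for this example; only the short `must-revalidate` policy for fallback responses is taken from these notes.

```typescript
type ResolutionType = 'path' | 'index' | 'fallback';

// Only the field relevant to this sketch; the real interface has more.
interface ManifestResolutionLike {
  resolutionType?: ResolutionType;
}

// Hypothetical helper: chooses a Cache-Control header for a manifest response.
function manifestCacheControl(
  resolution: ManifestResolutionLike,
  antTtlSeconds: number, // TTL previously set by the ArNS middleware (assumed input)
  notFoundMaxAgeSeconds = 60, // mirrors the CACHE_NOT_FOUND_MAX_AGE default
): string {
  if (resolution.resolutionType === 'fallback') {
    // The URL -> data-id binding can change across manifest revisions,
    // so fallback responses get the short, revalidating policy.
    return `public, max-age=${notFoundMaxAgeSeconds}, must-revalidate`;
  }
  // 'path', 'index', and resolvers that do not set the optional field
  // keep inheriting the ANT TTL (exact header format is an assumption).
  return `public, max-age=${antTtlSeconds}`;
}
```

Treating a missing `resolutionType` like `'path'` or `'index'` keeps external resolver implementations on their previous caching behavior.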
Fixed
- Stale ArNS Resolution-Failure Caching (affects all gateways by default): `ARNS_NOT_FOUND_ARNS_NAME` defaults to `'unregistered_arns'`, so on every failed ArNS resolution the middleware set `req.dataId` to the resolved placeholder and called `dataHandler` without setting any `Cache-Control`. `setDataHeaders` then applied the data-layer ladder, most commonly `CACHE_UNSTABLE_TRUSTED_MAX_AGE` (default 12h, but some operators run 90d). Result: the "Make this domain space yours" placeholder was cached upstream (nginx honors upstream `Cache-Control`) and downstream long after a name was actually registered. The same bug class affected the `ARNS_NOT_FOUND_TX_ID` and `APEX_TX_ID` branches, and manifest fallback responses where the URL → data-id binding is mutable across manifest revisions.
  Fixes:
  - `ARNS_NOT_FOUND_TX_ID` and `ARNS_NOT_FOUND_ARNS_NAME` resolved-404 responses now emit `public, max-age=${CACHE_NOT_FOUND_MAX_AGE}, must-revalidate` (default 60s).
  - Manifest fallback responses emit the same short `Cache-Control`, overriding any longer ANT TTL set by the ArNS middleware. Path- and index-resolved manifest responses still inherit the ANT TTL.
  - `APEX_TX_ID` responses are bounded by `CACHE_APEX_MAX_AGE` with `must-revalidate` (see Added).
  - `sendNotFound` 404 responses now emit `must-revalidate` instead of `immutable`. (PE-9072)
  Operator one-time sweep: entries already poisoned in nginx caches must be evicted manually: grep cache files for the placeholder's `X-AR-IO-Data-Id` (or for the resolved id of `unregistered_arns` on default-config gateways) and remove matches.
- ArNS cached fallback on fast-fail (PE-9075): `CompositeArNSResolver` previously fell back to a cached resolution only when fresh resolution exceeded `ARNS_CACHED_RESOLUTION_FALLBACK_TIMEOUT_MS`. When fresh resolution returned `undefined` faster than the timeout (names cache miss, AO/CU dry-run error swallowed to undefined), the fallback didn't fire and the gateway dropped through to `ARNS_NOT_FOUND_ARNS_NAME`, serving the "unregistered" placeholder for names with valid cached resolutions during AO/CU flaps. It now falls back to the cached resolution whenever fresh resolution has no resolved id, matching the comment-documented intent. New metric `arns_cached_resolution_fallback_on_empty_total` counts how often this fires.
- Byte-Range and Range-Request Hardening (PE-9081): Closes a set of related defects across the byte-range fetch paths that allowed oversized or truncated upstream responses to silently pass through to consumers and pin large `Buffer`s in the external memory pool. All byte-range sources now enforce a symmetric contract: the upstream `content-length` must equal the requested region size, and responses that exceed the requested range are truncated by a bounded transform that destroys the underlying socket on close. `attribute-fetchers` rejects signature/owner attributes whose size is `0` or undefined, and `fetchDataFromParent` pre-allocates its result buffer and aborts on the first oversize chunk (closing an unbounded accumulator path). `contiguous-data-byte-range-source` and `http-byte-range-source` now pass `maxContentLength` to axios to bound client-side buffering, and `ar-io-data-source` / `gateways-data-source` apply matching `206`-when-`Range`, `content-length` guards, and per-region byte caps.
- AbortSignal Threading Through Chunk Fetches and Trusted-Node Path (PE-9076): When a client request aborted, the cancellation signal had no path into `trustedNodeRequestQueue` or the bucket-wait loops, so abandoned client requests still issued HTTP calls to arweave.net whose responses had nowhere to go. `trustedNodeRequest` now checks `signal.throwIfAborted()` at entry, after the bucket wait, and between tokens, releasing queue capacity immediately on disconnect. A new `abortablePromiseRace` helper isolates caller signals from the shared `chunkPromiseCache`, so a cancelled caller bails out without cancelling the underlying fetch for other waiters.
- ANS-104 Offsets Stream Lifecycle Repair (PE-9077): Parsing paths in `Ans104OffsetSource` consume only a bounded prefix of the stream returned by `getData` and previously dropped the reference without destroying the stream. Under axios `responseType: 'stream'` the unread tail stayed pinned in the `IncomingMessage` external buffer pool (invisible to V8) until the underlying socket was destroyed by eventual GC or unrelated cleanup. The parsing paths in `parseDataItemHeader`, `extractDataItemMeta`, and `getDataItemOffset` now explicitly destroy the stream on both success and error.
- Envoy `/tx` Path Routing Fix (PE-9079): A redundant `/tx` prefix route in `envoy.template.yaml` shadowed the more specific `/tx/` rule for transactions whose ids start with the literal string "tx". Those requests were being routed to the `trusted_arweave_nodes` cluster intended for header-prefixed paths instead of the gateway data path, producing incorrect cache headers and skipping peer diversity. The redundant route is removed.
- Export-Parquet Job-Status Metric Cardinality Normalization (PE-9078): The `/ar-io/admin/export-parquet/status/:jobId` route was falling through to the admin catch-all in the `normalizePath` helper, so every `randomUUID`-generated `jobId` became a permanent label value on `http_request_duration_seconds`. Combined with clickhouse-auto-import's polling cadence, metric cardinality grew unbounded over time and inflated scrape latency on indexer deployments. The path is now explicitly normalized alongside the singleton bundle-status endpoint.
- Bundles Root-Atom Consistency + Optimistic-Indexing Fix (PE-9073): Fixes two interacting defects that caused the indexer's SQLite worker to crash when the optimistic data-item admin endpoint (`/ar-io/admin/queue-data-item`) raced with ANS-104 unbundling. The `new_data_items` table enforces a "root atom" invariant on eleven bundling-metadata fields (`parent_id`, `root_transaction_id`, `root_parent_offset`, `data_offset`, `offset`, `size`, `signature_offset`, `signature_size`, `owner_offset`, `owner_size`, `signature_type`): they must move together or remain entirely `NULL` together. The admin endpoint unconditionally nulled three of them on every re-POST, so repeated admin POSTs after an unbundle regressed back-filled values to `NULL` and the next flush failed `NOT NULL` checks. The admin path now uses a dedicated `insertOptimisticDataItem` (`INSERT` with the root atom hardcoded `NULL`) and the unbundler uses `upsertNewDataItem` (atomic root-atom `UPDATE` with a `COALESCE`-protected safety net). GraphQL signature/owner resolvers guard against incomplete root-atom rows and return `undefined` with a warning rather than throwing. The `/ar-io/admin/debug` SQLite snapshot is now cached for `GET_DEBUG_INFO_CACHE_TTL_MS` (default 5 min, set `0` to disable), since each call runs unfiltered `COUNT(*)` scans on the SQLite worker and frequent polling was monopolizing the debug worker.
- `@ardrive/turbo-sdk` Moved to Runtime Dependencies (PE-9074): `@ardrive/turbo-sdk` is required at runtime by the HTTPSIG attestation upload path (`src/lib/httpsig-upload.ts`) but was declared in `devDependencies`. The production Dockerfile's `yarn install --production` pruned it, producing a silent `MODULE_NOT_FOUND` at upload time and a fallback to the L1 upload path, which fails when the gateway wallet has no AR balance. Now declared as a runtime dependency.
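The caller-isolation idea behind the PE-9076 entry (a cancelled caller stops waiting without cancelling the shared fetch) can be sketched as below. This is an assumption about the general shape of such a helper, not the project's `abortablePromiseRace` itself; the name `raceWithAbort` and its signature are hypothetical.

```typescript
// Hypothetical sketch: lets one caller stop waiting on a shared promise when
// its own signal aborts, without cancelling the promise for other waiters.
function raceWithAbort<T>(shared: Promise<T>, signal?: AbortSignal): Promise<T> {
  if (signal === undefined) return shared;

  const makeAbortError = (): Error => {
    const error = new Error('The operation was aborted');
    error.name = 'AbortError';
    return error;
  };

  // Already-cancelled callers never start waiting.
  if (signal.aborted) return Promise.reject(makeAbortError());

  return new Promise<T>((resolve, reject) => {
    const onAbort = () => reject(makeAbortError());
    signal.addEventListener('abort', onAbort, { once: true });

    // Whichever settles first wins; the listener is removed so a long-lived
    // signal does not keep references to finished waiters.
    shared.then(
      (value) => {
        signal.removeEventListener('abort', onAbort);
        resolve(value);
      },
      (error) => {
        signal.removeEventListener('abort', onAbort);
        reject(error);
      },
    );
  });
}
```

The shared promise is only observed, never handed the caller's signal, which is what keeps a cached in-flight fetch alive for the remaining waiters after one of them disconnects.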
Docker image SHAs
- `ENVOY_IMAGE_TAG`: `69f2d933ef88180bb5c7690dcf0ebe9c1465fd6d`
- `CORE_IMAGE_TAG`: `c5fac97b930e51963d5fef4418d7488d10e11960`
- `CLICKHOUSE_AUTO_IMPORT_IMAGE_TAG`: `8a1c0c55ed712e283b55b87f2bc8c7111bbc0482`
- `LITESTREAM_IMAGE_TAG`: `be121fc0ae24a9eb7cdb2b92d01f047039b5f5e8`
- `OBSERVER_IMAGE_TAG`: `ddd3a9c15e426c84da24c9fb7a1107620ccc27c1`