This is a recommended release due to cache safety improvements that prevent
caching incomplete data and enhance data validation reliability.
This release introduces significant robustness improvements with offset-aware
data sources, an experimental datasets HTTP endpoint for analytics workloads,
and enhanced Parquet/Iceberg tooling. It also includes important fixes for
data validation and root parent traversal.
Added
- Offset-Aware Data Sources: Added two new offset-aware data sources that
  leverage cached upstream offset attributes for improved performance:
  - `chunks-offset-aware` (renamed from `chunks-data-item` with backwards
    compatibility) - enables automatic data item resolution within ANS-104
    bundles using cached offsets
  - `trusted-gateways-offset-aware` - uses cached upstream offsets without
    expensive searching for faster data retrieval
- Cache Skip Configuration: Added `SKIP_DATA_CACHE` environment variable to
  bypass cache retrieval and always fetch from upstream sources for testing
  and debugging (see the configuration sketch after this list)
- Datasets HTTP Endpoint (Experimental): Added optional `/local/datasets`
  endpoint (disabled by default) for HTTP access to Parquet files and Iceberg
  metadata, enabling remote DuckDB queries (example query after this list).
  Note: this feature is experimental and subject to change
- Datasets Proxy Configuration: Added configurable datasets proxy via Envoy
  with `DATASETS_PROXY_HOST` and `DATASETS_PROXY_PORT` environment variables
  (see the configuration sketch after this list)
- Parquet Repartitioning Tool: Added comprehensive `parquet-repartition`
  script supporting both tag-based and owner address-based partitioning with
  height chunking and Iceberg metadata generation (see the partition-pruning
  sketch after this list)
- Minimal Iceberg Metadata Generator: Added lightweight
  `generate-minimal-iceberg-metadata` script optimized for DuckDB
  compatibility with HTTP URL support (see the `iceberg_scan` sketch after
  this list)
- Multi-Architecture Support: Added multi-arch support to the ClickHouse
  auto-import Docker image for broader platform compatibility
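The two new environment knobs above can be set together; here is a minimal
`.env` sketch, assuming the usual Docker Compose workflow that reads
configuration from a `.env` file (all values shown are illustrative):

```bash
# Bypass cache retrieval and always fetch from upstream sources
# (intended for testing and debugging only)
SKIP_DATA_CACHE=true

# Point the Envoy datasets proxy at the host/port serving the
# Parquet files and Iceberg metadata (hypothetical values)
DATASETS_PROXY_HOST=datasets-host.internal
DATASETS_PROXY_PORT=8080
```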
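For the experimental datasets endpoint, a remote DuckDB query might look like
the following; the gateway host, port, and the Parquet path under
`/local/datasets` are assumptions for illustration, and the endpoint must
first be enabled since it is off by default:

```bash
# Query a Parquet file served by the gateway over HTTP from any
# machine with the DuckDB CLI installed
duckdb -c "
INSTALL httpfs; LOAD httpfs;
SELECT COUNT(*)
FROM read_parquet('http://my-gateway.example:3000/local/datasets/transactions.parquet');
"
```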
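To show why tag- and owner-based partitioning matters, here is a sketch of
partition pruning in DuckDB against a hive-style layout; the directory scheme
below is hypothetical, since these notes do not specify the exact layout
`parquet-repartition` produces:

```bash
duckdb -c "
-- With hive_partitioning, key=value directory names become queryable
-- columns, and filters on them skip whole directories of Parquet files
SELECT COUNT(*)
FROM read_parquet('datasets/tags/*/*.parquet', hive_partitioning = true)
WHERE tag_name = 'App-Name';
"
```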
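And a sketch of consuming the generated Iceberg metadata from DuckDB via its
`iceberg` extension; the metadata URL is an assumption for illustration:

```bash
duckdb -c "
INSTALL iceberg; LOAD iceberg;
INSTALL httpfs; LOAD httpfs;
-- Scan the table described by the generated metadata file (hypothetical URL)
SELECT COUNT(*)
FROM iceberg_scan('http://my-gateway.example:3000/local/datasets/iceberg/metadata/v1.metadata.json');
"
```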
Changed
- Default Retrieval Order: Updated the default `ON_DEMAND_RETRIEVAL_ORDER` to
  use the new `chunks-offset-aware` name (backwards compatible with
  `chunks-data-item`; example after this list)
- Iceberg Metadata Implementation: Replaced the complex PyIceberg-based
  implementation with a minimal fastavro-based version for better performance
  and DuckDB compatibility
- Zero-Size Data Handling: Skip caching and indexing for zero-size data to
  prevent unnecessary storage operations
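A sketch of opting into the new source name explicitly, assuming
`ON_DEMAND_RETRIEVAL_ORDER` takes a comma-separated ordering of source names
(the neighboring entries shown are illustrative):

```bash
# .env (illustrative): the old chunks-data-item name still resolves,
# so existing configurations keep working unchanged
ON_DEMAND_RETRIEVAL_ORDER=trusted-gateways,chunks-offset-aware,tx-data
```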
Fixed
- Root Parent Traversal: Fixed RootParentDataSource to properly handle root
  transactions without cached attributes
- Data Size Validation: Added validation to prevent caching incomplete data
  and to prevent ID-to-hash mapping queueing on partial stream errors
- Parquet Export Issues: Fixed CSV column type specification to prevent
  DuckDB type inference errors
- ClickHouse Build Workflow: Updated the build workflow to include missing
  file paths
Container Images
- ar-io-core: `ghcr.io/ar-io/ar-io-core:95c324834538c984eee08dedadcc93f6f1c8a5f3`
- ar-io-envoy: `ghcr.io/ar-io/ar-io-envoy:159d6467108122a3413c5ab45150d334dc9fb78f`
- ar-io-clickhouse-auto-import: `ghcr.io/ar-io/ar-io-clickhouse-auto-import:71bbc13161b69bc28c501d09f586513073d550fe`
- ar-io-litestream: `ghcr.io/ar-io/ar-io-litestream:be121fc0ae24a9eb7cdb2b92d01f047039b5f5e8`
- ao-cu: `ghcr.io/permaweb/ao-cu:08436a88233f0247f3eb35979dd55163fd51a153`
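To pin a deployment to exactly this release, pull the images by the
commit-tagged references above, for example:

```bash
docker pull ghcr.io/ar-io/ar-io-core:95c324834538c984eee08dedadcc93f6f1c8a5f3
docker pull ghcr.io/ar-io/ar-io-envoy:159d6467108122a3413c5ab45150d334dc9fb78f
```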