This is a recommended release due to cache safety improvements that prevent
caching incomplete data and enhance data validation reliability.
This release introduces significant robustness improvements with offset-aware
data sources, an experimental datasets HTTP endpoint for analytics workloads,
and enhanced Parquet/Iceberg tooling. It also includes important fixes for
data validation and root parent traversal.
Added
- Offset-Aware Data Sources: Added two new offset-aware data sources that
  leverage cached upstream offset attributes for improved performance:
  - `chunks-offset-aware` (renamed from `chunks-data-item` with backwards
    compatibility) - enables automatic data item resolution within ANS-104
    bundles using cached offsets
  - `trusted-gateways-offset-aware` - uses cached upstream offsets without
    expensive searching for faster data retrieval
- Cache Skip Configuration: Added `SKIP_DATA_CACHE` environment variable to
  bypass cache retrieval and always fetch from upstream sources for testing
  and debugging (see the configuration sketch after this list)
- Datasets HTTP Endpoint (Experimental): Added optional `/local/datasets`
  endpoint (disabled by default) for HTTP access to Parquet files and Iceberg
  metadata, enabling remote DuckDB queries (example query after this list).
  Note: this feature is experimental and subject to change
- Datasets Proxy Configuration: Added configurable datasets proxy via Envoy
  with `DATASETS_PROXY_HOST` and `DATASETS_PROXY_PORT` environment variables
  (see the configuration sketch after this list)
- Parquet Repartitioning Tool: Added comprehensive `parquet-repartition`
  script supporting both tag-based and owner address-based partitioning with
  height chunking and Iceberg metadata generation (see the partition-pruning
  sketch after this list)
- Minimal Iceberg Metadata Generator: Added lightweight
  `generate-minimal-iceberg-metadata` script optimized for DuckDB
  compatibility with HTTP URL support (see the `iceberg_scan` sketch after
  this list)
- Multi-Architecture Support: Added multi-arch support to the ClickHouse
  auto-import Docker image for broader platform compatibility
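The two new environment knobs above can be set together; here is a minimal
`.env` sketch, assuming the usual Docker Compose workflow that reads
configuration from a `.env` file (all values shown are illustrative):

```bash
# Bypass cache retrieval and always fetch from upstream sources
# (intended for testing and debugging only)
SKIP_DATA_CACHE=true

# Point the Envoy datasets proxy at the host/port serving the
# Parquet files and Iceberg metadata (hypothetical values)
DATASETS_PROXY_HOST=datasets-host.internal
DATASETS_PROXY_PORT=8080
```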
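For the experimental datasets endpoint, a remote DuckDB query might look like
the following; the gateway host, port, and the Parquet path under
`/local/datasets` are assumptions for illustration, and the endpoint must
first be enabled since it is off by default:

```bash
# Query a Parquet file served by the gateway over HTTP from any
# machine with the DuckDB CLI installed
duckdb -c "
INSTALL httpfs; LOAD httpfs;
SELECT COUNT(*)
FROM read_parquet('http://my-gateway.example:3000/local/datasets/transactions.parquet');
"
```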
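To show why tag- and owner-based partitioning matters, here is a sketch of
partition pruning in DuckDB against a hive-style layout; the directory scheme
below is hypothetical, since these notes do not specify the exact layout
`parquet-repartition` produces:

```bash
duckdb -c "
-- With hive_partitioning, key=value directory names become queryable
-- columns, and filters on them skip whole directories of Parquet files
SELECT COUNT(*)
FROM read_parquet('datasets/tags/*/*.parquet', hive_partitioning = true)
WHERE tag_name = 'App-Name';
"
```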
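And a sketch of consuming the generated Iceberg metadata from DuckDB via its
`iceberg` extension; the metadata URL is an assumption for illustration:

```bash
duckdb -c "
INSTALL iceberg; LOAD iceberg;
INSTALL httpfs; LOAD httpfs;
-- Scan the table described by the generated metadata file (hypothetical URL)
SELECT COUNT(*)
FROM iceberg_scan('http://my-gateway.example:3000/local/datasets/iceberg/metadata/v1.metadata.json');
"
```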
Changed
- Default Retrieval Order: Updated the default `ON_DEMAND_RETRIEVAL_ORDER` to
  use the new `chunks-offset-aware` name (backwards compatible with
  `chunks-data-item`; example after this list)
- Iceberg Metadata Implementation: Replaced the complex PyIceberg-based
  implementation with a minimal fastavro-based version for better performance
  and DuckDB compatibility
- Zero-Size Data Handling: Skip caching and indexing for zero-size data to
  prevent unnecessary storage operations
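A sketch of opting into the new source name explicitly, assuming
`ON_DEMAND_RETRIEVAL_ORDER` takes a comma-separated ordering of source names
(the neighboring entries shown are illustrative):

```bash
# .env (illustrative): the old chunks-data-item name still resolves,
# so existing configurations keep working unchanged
ON_DEMAND_RETRIEVAL_ORDER=trusted-gateways,chunks-offset-aware,tx-data
```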
Fixed
- Root Parent Traversal: Fixed RootParentDataSource to properly handle root
  transactions without cached attributes
- Data Size Validation: Added validation to prevent caching incomplete data
  and to prevent ID-to-hash mapping queueing on partial stream errors
- Parquet Export Issues: Fixed CSV column type specification to prevent
  DuckDB type inference errors
- ClickHouse Build Workflow: Updated the build workflow to include missing
  file paths
Container Images
- ar-io-core: `ghcr.io/ar-io/ar-io-core:95c324834538c984eee08dedadcc93f6f1c8a5f3`
- ar-io-envoy: `ghcr.io/ar-io/ar-io-envoy:159d6467108122a3413c5ab45150d334dc9fb78f`
- ar-io-clickhouse-auto-import: `ghcr.io/ar-io/ar-io-clickhouse-auto-import:71bbc13161b69bc28c501d09f586513073d550fe`
- ar-io-litestream: `ghcr.io/ar-io/ar-io-litestream:be121fc0ae24a9eb7cdb2b92d01f047039b5f5e8`
- ao-cu: `ghcr.io/permaweb/ao-cu:08436a88233f0247f3eb35979dd55163fd51a153`
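To pin a deployment to exactly this release, pull the images by the
commit-tagged references above, for example:

```bash
docker pull ghcr.io/ar-io/ar-io-core:95c324834538c984eee08dedadcc93f6f1c8a5f3
docker pull ghcr.io/ar-io/ar-io-envoy:159d6467108122a3413c5ab45150d334dc9fb78f
```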