This is an optional release that significantly improves the performance and reliability of the ClickHouse ETL pipeline and adds Apache Iceberg metadata support. While optional for most users, this release is important for anyone experimenting with Parquet exports and ClickHouse integration.
Added
- Apache Iceberg Metadata Generation: Added a `generate-iceberg-metadata` script that creates Apache Iceberg table metadata for exported Parquet datasets, enabling compatibility with query engines such as DuckDB and Spark. It is controlled by the new `ENABLE_ICEBERG_GENERATION` environment variable (default: `false`). Note: Iceberg metadata generation is still under active development and currently incomplete.
- HyperBEAM Sidecar Support: Added an optional HyperBEAM container configuration, with a `.env.hb.example` template, for running AO processes alongside the gateway.
- ETL Configuration Documentation: Documented the existing ClickHouse auto-import environment variables in `.env.example`:
  - `CLICKHOUSE_AUTO_IMPORT_SLEEP_INTERVAL` - interval between import cycles (default: 3600 seconds)
  - `CLICKHOUSE_AUTO_IMPORT_HEIGHT_INTERVAL` - batch size in blocks (default: 10000)
  - `CLICKHOUSE_AUTO_IMPORT_MAX_ROWS_PER_FILE` - maximum number of rows per Parquet file (default: 1000000)
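To make the documented defaults concrete, here is a minimal Python sketch of how a caller might read these variables and what the numbers imply, assuming each import cycle covers at most `CLICKHOUSE_AUTO_IMPORT_HEIGHT_INTERVAL` blocks and a batch's rows are split across Parquet files capped at `CLICKHOUSE_AUTO_IMPORT_MAX_ROWS_PER_FILE` rows. The names `load_config`, `height_batches`, and `files_needed` are hypothetical, and treating only the string "true" as enabling `ENABLE_ICEBERG_GENERATION` is an assumption; the actual auto-import script may parse these values and bound its height ranges differently.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional, Tuple


@dataclass(frozen=True)
class AutoImportConfig:
    sleep_interval_s: int
    height_interval: int
    max_rows_per_file: int
    enable_iceberg: bool


def load_config(env: Optional[Mapping[str, str]] = None) -> AutoImportConfig:
    """Read the auto-import settings, falling back to the documented defaults."""
    env = os.environ if env is None else env
    return AutoImportConfig(
        sleep_interval_s=int(env.get("CLICKHOUSE_AUTO_IMPORT_SLEEP_INTERVAL", "3600")),
        height_interval=int(env.get("CLICKHOUSE_AUTO_IMPORT_HEIGHT_INTERVAL", "10000")),
        max_rows_per_file=int(env.get("CLICKHOUSE_AUTO_IMPORT_MAX_ROWS_PER_FILE", "1000000")),
        # Assumption: only "true" (any letter case) enables Iceberg generation.
        enable_iceberg=env.get("ENABLE_ICEBERG_GENERATION", "false").lower() == "true",
    )


def height_batches(start: int, end: int, interval: int) -> List[Tuple[int, int]]:
    """Split the inclusive height range [start, end] into batches of at most `interval` blocks."""
    if interval <= 0:
        raise ValueError("interval must be positive")
    batches = []
    lo = start
    while lo <= end:
        hi = min(lo + interval - 1, end)
        batches.append((lo, hi))
        lo = hi + 1
    return batches


def files_needed(row_count: int, max_rows_per_file: int) -> int:
    """Number of Parquet files required if each holds at most max_rows_per_file rows."""
    return -(-row_count // max_rows_per_file)  # ceiling division
```

For example, `height_batches(0, 24999, 10000)` yields three batches, the last covering only 5,000 blocks, and 2,500,001 rows at the default limit need three files.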
Changed
- ETL Pipeline Architecture: Refactored the ClickHouse ETL pipeline for improved reliability and modularity:
  - Implemented a staging-based workflow to prevent data corruption
  - Changed from API-based triggering to direct script execution
  - Made L1 transaction export the default behavior
  - Changed the default export location from `data/parquet` to `data/datasets/default`
- Performance: Greatly improved query performance through better index usage in the refactored pipeline
- Stability: Fixed issue where the 'core' service would occasionally crash due to long-running SQLite queries
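A staging-based workflow like the one described above typically means writing a complete batch somewhere readers do not look, then publishing it with a single rename, so a crash never leaves half-written files in the dataset directory. The Python sketch below illustrates that general pattern only; `export_with_staging`, the separate `staging_root` directory, and the partition naming are hypothetical and not taken from the gateway's scripts.

```python
import os
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def export_with_staging(
    dataset_dir: Path,
    staging_root: Path,
    partition: str,
    write_files: Callable[[Path], None],
) -> Path:
    """Write one partition into a staging directory, then publish it with one rename.

    If write_files raises, or the target partition already exists, the staging
    directory is deleted and nothing appears under dataset_dir.
    """
    dataset_dir.mkdir(parents=True, exist_ok=True)
    staging_root.mkdir(parents=True, exist_ok=True)
    # A fresh, uniquely named staging directory for this partition.
    staging = Path(tempfile.mkdtemp(prefix="partition-", dir=staging_root))
    try:
        write_files(staging)
        final = dataset_dir / partition
        if final.exists():
            raise FileExistsError(final)
        # Atomic on POSIX when both paths are on the same filesystem.
        os.rename(staging, final)
        return final
    except Exception:
        # Discard partial output so it is never published.
        shutil.rmtree(staging, ignore_errors=True)
        raise
```

Keeping `staging_root` on the same filesystem as `dataset_dir` matters because `os.rename` is only atomic within one filesystem; across filesystems it raises an `OSError` instead of copying.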
Docker Images
- ar-io-core: `ghcr.io/ar-io/ar-io-core:8c1f559a5d8cf8a0a9a4c577b56f7b989b467e62`
- ar-io-envoy: `ghcr.io/ar-io/ar-io-envoy:da2abd14cdf3248db21673878c6f2c7b752a3850`
- ar-io-clickhouse-auto-import: `ghcr.io/ar-io/ar-io-clickhouse-auto-import:5d06e824ce7d18764bed130025a3f493657cd39d`
- ar-io-observer: `ghcr.io/ar-io/ar-io-observer:6cb911e4ac9fd04a1795144f86b77ad0174ee6d9`
- ar-io-litestream: `ghcr.io/ar-io/ar-io-litestream:be121fc0ae24a9eb7cdb2b92d01f047039b5f5e8`
- ao-cu: `ghcr.io/permaweb/ao-cu:08436a88233f0247f3eb35979dd55163fd51a153`