- API
- Core
- Use V2 format by default in new tables (#8381)
- Use `zstd` compression for Parquet by default in new tables (#8593)
- Add strict metadata cleanup mode and enable it by default (#8397) (#8599)
- Avoid generating huge manifests during commits (#6335)
- Add a writer for unordered position deletes (#7692)
- Optimize `DeleteFileIndex` (#8157)
- Optimize lookup in `DeleteFileIndex` without useful bounds (#8278)
- Optimize split offsets handling (#8336)
- Optimize computing user-facing state in data tasks (#8346)
- Don't persist useless file and position bounds for deletes (#8360)
- Don't persist counts for paths and positions in position delete files (#8590)
- Support setting system-level properties via environment variables (#5659)
- Add JSON parser for `ContentFile` and `FileScanTask` (#6934)
- Add REST spec and request for commits to multiple tables (#7741)
- Add REST API for committing changes against multiple tables (#7569)
- Default to exponential retry strategy in REST client (#8366)
- Support registering tables with REST session catalog (#6512)
- Add last updated timestamp and snapshot ID to partitions metadata table (#7581)
- Add total data size to partitions metadata table (#7920)
- Extend `ResolvingFileIO` to support bulk operations (#7976)
- Key metadata in Avro format (#6450)
- Add AES GCM encryption stream (#3231)
- Fix a connection leak in streaming delete filters (#8132)
- Fix lazy snapshot loading history (#8470)
- Fix unicode handling in HTTPClient (#8046)
- Fix paths for unpartitioned specs in writers (#7685)
- Fix OOM caused by Avro decoder caching (#7791)
- Spark
- Added support for Spark 3.5
- Code for DELETE, UPDATE, and MERGE commands has moved to Spark, and all related extensions have been dropped from Iceberg.
- Support for WHEN NOT MATCHED BY SOURCE clause in MERGE.
- Column pruning in merge-on-read operations.
- Ability to request a bigger advisory partition size for the final write to produce well-sized output files without harming the job parallelism.
- Dropped support for Spark 3.1
- Deprecated support for Spark 3.2
- Support vectorized reads for merge-on-read operations in Spark 3.4 and 3.5 (#8466)
- Increase default advisory partition size for writes in Spark 3.5 (#8660)
- Support distributed planning in Spark 3.4 and 3.5 (#8123)
- Support pushing down system functions by V2 filters in Spark 3.4 and 3.5 (#7886)
- Support fanout position delta writers in Spark 3.4 and 3.5 (#7703)
- Use fanout writers for unsorted tables by default in Spark 3.5 (#8621)
- Support multiple shuffle partitions per file in compaction in Spark 3.4 and 3.5 (#7897)
- Output net changes across snapshots for carryover rows in CDC (#7326)
- Display read metrics on Spark SQL UI (#7447) (#8445)
- Adjust split size to benefit from cluster parallelism in Spark 3.4 and 3.5 (#7714)
- Add `fast_forward` procedure (#8081)
- Support filters when rewriting position deletes (#7582)
- Support setting current snapshot with ref (#8163)
- Make backup table name configurable during migration (#8227)
- Add write and SQL options to override compression config (#8313)
- Correct partition transform functions to match the spec (#8192)
- Enable extra commit properties with metadata delete (#7649)
- Flink
- Support ordering splits by file sequence number (#7661)
- Fix serialization in `TableSink` with anonymous object (#7866)
- Switch to `FileScanTaskParser` for JSON serialization of `IcebergSourceSplit` (#7978)
- Custom partitioner for bucket partitions (#7161)
- Implement data statistics coordinator to aggregate data statistics from operator subtasks (#7360)
- Support alter table column (#7628)
- Parquet
- ORC
- Handle filters with transforms by assuming the filter matches (#8244)
- Vendor Integrations
- GCP: Fix single byte read in `GCSInputStream` (#8071)
- GCP: Add properties for OAuth2 and update library (#8073)
- GCP: Add prefix and bulk operations to `GCSFileIO` (#8168)
- GCP: Add bundle jar for GCP-related dependencies (#8231)
- GCP: Add range reads to `GCSInputStream` (#8301)
- AWS: Add bundle jar for AWS-related dependencies (#8261)
- AWS: Support configuring the storage class for `S3FileIO` (#8154)
- AWS: Add `FileIO` tracker/closer to Glue catalog (#8315)
- AWS: Update S3 signer spec to allow an optional string body in `S3SignRequest` (#8361)
- Azure: Add `FileIO` that supports ADLSv2 storage (#8303)
- Azure: Make `ADLSFileIO` implement `DelegateFileIO` (#8563)
(#8563) - Nessie: Provide better commit message on table registration (#8385)
- Dependencies
- Bump Nessie to 0.71.0
- Bump ORC to 1.9.1
- Bump Arrow to 12.0.1
- Bump AWS Java SDK to 2.20.131
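
The two new Core defaults (V2 format in #8381 and `zstd` Parquet compression in #8593) apply only to newly created tables, so a table that must stay readable by older engines can opt out at creation time. The following is a minimal sketch, assuming the standard table property keys `format-version` and `write.parquet.compression-codec` and the earlier defaults of format version 1 and `gzip`. The helper name `legacy_table_properties` is hypothetical and not part of any Iceberg API; the resulting dict would be passed to whichever catalog call creates the table.

```python
# Defaults applied to new tables as of this release (per the Core notes above).
NEW_DEFAULTS = {
    "format-version": "2",
    "write.parquet.compression-codec": "zstd",
}


def legacy_table_properties(extra=None):
    """Build table properties that pin the earlier defaults.

    Hypothetical helper for illustration only. `extra` is an optional
    dict of additional properties; its entries override the pinned ones.
    """
    props = {
        "format-version": "1",                      # assumed earlier default
        "write.parquet.compression-codec": "gzip",  # assumed earlier default
    }
    props.update(extra or {})
    return props


if __name__ == "__main__":
    pinned = legacy_table_properties()
    # Show which keys differ from the new defaults.
    for key, value in pinned.items():
        print(f"{key}: {NEW_DEFAULTS[key]} -> {value}")
```

Existing tables keep whatever format version and codec they already have, so a helper like this only matters for tables created after upgrading.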