What's Changed
- Nessie: Remove compile-time Hadoop dependency by @nastra in #7054
- Core: Fix deprecation message by @nastra in #7104
- Build: Update ORC to 1.8.3 by @williamhyun in #7124
- AWS: Use Apache HTTP client as default AWS HTTP client by @singhpk234 in #7119
- AWS: Enable virtual-host-style requests for MinioContainer by @nastra in #7125
- Flink: Bump to Flink 1.15.3 by @Fokko in #7059
- Flink: Bump to Flink 1.16.1 by @Fokko in #7057
- Core: Use unknown report type for forward-compatibility by @nastra in #7145
- Aliyun: Remove AssertHelpers by @liuxiaocs7 in #7116
- dell: remove usage of AssertHelpers by @liuxiaocs7 in #7143
- Core: Minor refactoring of PartitionsTable by @ajantha-bhat in #6975
- Build: Let RevAPI compare against 1.2.0 by @nastra in #7155
- MR: Remove deprecated AssertHelpers by @liuxiaocs7 in #7159
- Core: Remove deprecated validation APIs in MergingSnapshotProducer by @amogh-jahagirdar in #7150
- data: Remove AssertHelpers Usage by @liuxiaocs7 in #7134
- Flink:fix flink streaming query problem [ Cannot get a client from a closed pool] by @xuzhiwen1255 in #6614
- Spark 3.3: Remove use of deprecated SparkFilesScan by @szehon-ho in #7106
- Docs: Add `rest` to the catalog configuration by @Fokko in #7126
- Contributing Docs: Add section for testing code by @nastra in #7131
- Core, API: View Version implementation by @amogh-jahagirdar in #6861
- Update defaults of max-concurrent-file-group-rewrites to 5 by @karuppayya in #6907
- Flink: fixed Cloneable not implemented on CatalogLoader by @xuzhiwen1255 in #7168
- Core: Refactor actions results by @ajantha-bhat in #7089
- Docs: update doc to read easier by @joonsun-baek in #7167
- API: Fix retainAll and removeAll in CharSequenceSet by @zhongyujiang in #7133
- Spark 3.3: Support metadata column in the changelog table by @flyrain in #7152
- Spark 3.2: Support metadata column in the changelog table by @flyrain in #7178
- Flink: Backport #6614 to Flink 1.15 by @xuzhiwen1255 in #7165
- Core: Remove deprecated code from 1.2.0 by @nastra in #7156
- S3 Credentials provider support in DefaultAwsClientFactory #7063 by @dpaani in #7066
- Core: Move InMemoryCatalog from test to core by @nastra in #7185
- Doc: Retypeset the Flink document by @hililiwei in #7099
- Core: Store split offset for delete files by @singhpk234 in #7011
- Flink: Backport #6614 to Flink 1.14 by @xuzhiwen1255 in #7166
- Core, Hive: Support pluggable ClientPool by @lirui-apache in #6698
- AWS: Remove deprecated AssertHelpers by @liuxiaocs7 in #7195
- Spark: Support loading function as FunctionCatalog in SparkSessionCatalog by @bowenliang123 in #7153
- Flink: Implement data statistics operator to collect traffic distribution for guiding smart shuffling by @yegangy0718 in #6382
- Build: Move RevApi breakage to correct version by @nastra in #7223
- Ability to add multiple metrics reporters to scan by @karuppayya in #6919
- Spark 3.3: Use ProcedureInput in AncestorsOfProcedure by @aokolnychyi in #7177
- Core: Parse snapshot-id as long in remove-statistics update by @nastra in #7235
- Bump Nessie to 0.54.0 by @snazy in #7146
- Optimized spark vectorized read parquet decimal by @ConeyLiu in #3249
- Core: Optimize S3 layout of Datafiles by expanding first character set of the hash by @singhpk234 in #7128
- AWS: Prevent token refresh scheduling on every sign request by @nastra in #7270
- Disable local credentials if remote signing is enabled by @danielcweeks in #7230
- Spark: Revert "Spark: Add "Iceberg" prefix to SparkTable name string for SparkUI (#5629)" by @amogh-jahagirdar in #7273
- Spark: broadcast table instead of file IO in rewrite manifests by @bryanck in #7263
- AWS: abort S3 input stream on close if not EOS by @bryanck in #7262
- Spark 3.2: Use ProcedureInput in AncestorsOfProcedure and AddFilesProcedure by @aokolnychyi in #7260
- Spark 3.3: Dataset writes for position deletes by @szehon-ho in #7029
- REST: fix previous locations for refs-only load by @bryanck in #7284
- Core: Fix flakiness in HadoopFileIOTest by @nastra in #7253
- Flink: Data statistics operator sends local data statistics to coordinator and receive aggregated data statistics from coordinator for smart shuffling by @yegangy0718 in #7269
- AWS: Make AuthSession cache static by @nastra in #7289
- Core: Require namespace when creating table using InMemoryCatalog by @nastra in #7252
- Refactor PartitionsTable planning by @dramaticlly in #7190
- Flink: Introduce Flink 1.17 by @hililiwei in #7254
- AWS: Check commit status after failed commit if AWS client performed retries by @ChristinaTech in #7198
- Core: Fix errorprone warning by @ajantha-bhat in #7286
- Bump Nessie to 0.56.0 by @snazy in #7283
- Build: Bump actions/stale from 7.0.0 to 8.0.0 by @dependabot in #7200
- Build: Bump org.apache.hadoop:hadoop-client from 3.3.4 to 3.3.5 by @dependabot in #7201
- Spark: apply rewrite manifest action fix to 3.1,3.2 by @bryanck in #7296
- Build: Spark version of `iceberg-delta-lake` to 3.3.2 by @doki23 in #7199
- Nessie: Use latest hash for catalog APIs by @ajantha-bhat in #6789
- Support vectorized reading int96 timestamps in imported data by @yabola in #6962
- Flink: Expose write-parallelism in SQL Hints by @hililiwei in #7039
- Nessie: Fix testcase failures by @ajantha-bhat in #7320
- Flink: move the classes from flink.sink.shuffle.statistics pkg to one level up as flink.sink.shuffle pkg by @stevenzwu in #7322
- Spark 3.3: Add doc for the changelog view procedure. by @flyrain in #7147
- Bump Nessie from 0.56.0 to 0.57.0 by @snazy in #7323
- Flink 1.15 1.17: Port Expose write-parallelism in SQL Hints to 1.15 & 1.17 by @hililiwei in #7327
- Update issue template for 1.2.1 release by @danielcweeks in #7331
- Core: Fix SnapshotProducer#targetBranch's exception message by @zhongyujiang in #7315
- Bump Gradle from 8.0.2 to 8.1 by @snazy in #7333
- Build: Fix flaky checkstyle issue by @ajantha-bhat in #7321
- [Infra] Update vote mail sample in source-release.sh by @gaborkaszab in #7330
- Core: Add missing metrics reporters when creating BaseTable by @nastra in #7341
- Core, Spark 3.3: Add FileRewriter API by @aokolnychyi in #7175
- Spark - Accept an `output-spec-id` that allows writing to a desired partition spec by @gustavoatt in #7120
- [ORC][Spark] - Support selected vector with ORC reader on the row and batch reader by @pavibhai in #7197
- Flink: use correct scan mode when in TABLE_SCAN_THEN_INCREMENTAL mode by @chenjunjiedada in #7338
- Throw NoSuchIcebergTableException instead of ValidationException in G… by @ericlgoodman in #7277
- Build: Bump Airlift from 0.21 to 0.24 by @Fokko in #7347
- Docs: clarify Hive on Tez configuration by @preaudc in #7282
- Spark -Simplify checks of output-spec-id in SparkWriteConf by @gustavoatt in #7348
- Fix SetDefaultPartitionSpec to use specId instead of schemaId by @dramaticlly in #7350
- Core, Spark: Make ObjectStoreLocationProvider serializable by @singhpk234 in #7353
- Core: Parameterize RewriteDataFile's CommitService by @szehon-ho in #7343
- Core: Fix flaky TestParallelIterable test by @amogh-jahagirdar in #7372
- Flink: Apply row level filtering by @Fokko in #7109
- Spark: Surface better error message during streaming planning when checkpoint snapshot not found by @amogh-jahagirdar in #6480
- Flink: backport #7338 to 1.16 and 1.15 by @chenjunjiedada in #7373
- Spark 3.4: Initial support by @aokolnychyi in #7378
- Honor spark case sensitivity in ALTER TABLE.. ORDERED BY by @karuppayya in #7324
- Spark 3.3: Surface better error message during streaming planning when checkpoint snapshot not found by @aokolnychyi in #7381
- Spark: Remove Spark 2.4 by @Fokko in #7385
- Build: Bump Hive to 2.3.9 by @Fokko in #7374
- Core: Introduce CompositeMetricsReporter by @nastra in #7337
- Flink: Use starting sequence number by default when rewriting data files by @linyanghao in #7218
- Flink: Backport #6382 and #7269 to 1.15 for shuffle operator by @yegangy0718 in #7400
- Flink: Backport row filter into 1.15 and 1.16 by @Fokko in #7397
- Spark 3.3: support rate limit in Spark Streaming by @singhpk234 in #4479
- Enumerate configs that should be respected in REST table load response by @danielcweeks in #7401
- Doc: Add a page explaining migration from other table formats to iceberg by @JonasJ-ap in #6600
- Doc: Fix typo in hive_migration.md by @JonasJ-ap in #7407
- Spark: Fix failing SS UT by @singhpk234 in #7414
- Flink: Backport #7218 to 1.15 and 1.17 by @linyanghao in #7404
- MR: Fix IndexOutOfBounds by skip filter translation if there are no leaves by @edgarRd in #7123
- Add mkdocstrings by @LuigiCerone in #7108
- AWS: Fix default warehouse path in Dynamodb catalog by @munendrasn in #7358
- Flink: sync 1.16 with 1.17 for backports missed or not ported identically by @stevenzwu in #7403
- Flink: sync 1.15 with 1.17 for backports missed or not ported identically by @stevenzwu in #7402
- Views: Clean up and clarify the view spec by @rdblue in #7416
- Docs: Separate page for Branching and Tagging by @amogh-jahagirdar in #6723
- Views: Fix SQL view representation field name by @rdblue in #7417
- Hive: Use EnvironmentContext instead of Hive Locks to provide transactional commits after HIVE-26882 by @pvary in #6570
- Spark: Backport #6480 to Spark 3.2 and Spark 3.1 by @amogh-jahagirdar in #7425
- API, Core: Move schemaID from ViewRepresentation to ViewVersion and make it required by @amogh-jahagirdar in #7421
- Spark 3.4: Relax constraints in SparkPartitioningAwareScan by @aokolnychyi in #7423
- Core: Extract REST metrics reporter into its own class by @nastra in #7339
- Spark 3.4: Add tests for SPJ when partition keys mismatch by @aokolnychyi in #7424
- Cherry pick Order case sensitivity changes to 3.4 by @karuppayya in #7380
- Build: Run Iceberg with JDK 17 by @singhpk234 in #7391
- Views: Move 'operation' check to ViewVersion by @nastra in #7428
- Updated README.md to support Java 17 by @911432 in #7434
- Hive: Clean up expired metastore clients by @frankliee in #7310
- Core: Make TableScanContext immutable by @nastra in #5985
- Core: Move table-creation-without-namespace-test to CatalogTests by @nastra in #7349
- Spark: Refactor SparkReadConf use primitive type for confs with default values by @singhpk234 in #7429
- Spark 3.4: Remove deprecated classes by @aokolnychyi in #7448
- Arrow: Convert dict encoded vectors to their expected Arrow vector types by @nastra in #3024
- Spark: Fixed Typo in Spark Read Option Vectorization Javadoc by @vishnukumarsinha in #7439
- Spark 3.4: Remove no longer needed write extensions by @aokolnychyi in #7443
- Delta Migration: Add version and timestamp tags for each Delta Lake transaction when add to Iceberg transaction by @JonasJ-ap in #7450
- Delta Migration: Correct `snapshotDataFilesCount` in SnapshotDeltaLakeTable.Result and use Immutable to implement it by @JonasJ-ap in #7454
- Spark 3.4: Support rate limit in Spark Streaming by @singhpk234 in #7422
- Spark: Fix Failing SS ratelimit UT by @singhpk234 in #7470
- Spark 3.4: Switch to built-in DELETE implementation by @aokolnychyi in #7453
- Spark: Remove Usage of deprecated AssertHelpers in spark-sql by @liuxiaocs7 in #7486
- Spark 3.3: Remove deprecated FileScanTaskSetManager by @nastra in #7489
- Hive: Support connecting to multiple Hive-Catalog by @szehon-ho in #7441
- Spark: Add read/write support for UUIDs by @nastra in #7399
- Hive: Remove deprecated AssertHelpers by @liuxiaocs7 in #7482
- Flink: Remove deprecated AssertHelpers by @liuxiaocs7 in #7481
- Spark: Remove deprecated AssertHelpers by @liuxiaocs7 in #7483
- Core: Remove compile-time dependency to ResolvingFileIO by @nastra in #7488
- Update documentation to reflect new catalog features by @dramaticlly in #7433
- Spark 3.3: Add read and write support for UUIDs by @nastra in #7496
- Spark 3.2: Add read and write support for UUIDs by @nastra in #7497
- Spark 3.1: Add read and write support for UUIDs by @nastra in #7508
- API: Remove deprecated AssertHelpers usage by @akshayakp97 in #7468
- API: Update java doc for listTables and listViews by @ajantha-bhat in #7336
- Core: Simplify Partitions table partition-coercion code by @szehon-ho in #7503
- Core, Spark: Add configuration to control case sensitivity of CachingCatalog by @wypoon in #7469
- AWS: Add finalizer to S3FileIO by @nastra in #7513
- Move all S3FileIO related properties into a separate class S3FileIOProperties by @akshayakp97 in #7505
- Spark 3.4: Handle skew in writes by @aokolnychyi in #7520
- Spec: Update View spec to reflect that schema is defined at the version level and is required by @amogh-jahagirdar in #7485
- Spark 3.4: Implement rewrite position deletes by @szehon-ho in #7389
- Spark 3.4: Tests for coalescing small writing tasks by @aokolnychyi in #7532
- Core: Support delete file stats in partitions metadata table by @ajantha-bhat in #6661
- Make SparkCatalog use a case sensitive CachingCatalog by default. by @wypoon in #7535
- Core: Remove duplicate check for ManifestEntry.dataSequenceNumber() by @gaborkaszab in #7538
- Remove usages of S3 fields from AwsProperties within s3 package by @akshayakp97 in #7534
- Flink: Fix Typo in Namespace by @liuxiaocs7 in #7527
- Arrow: Fix errorprone warnings by @ajantha-bhat in #7498
- Spark: Use UUIDUtil.convertToByteBuffer to avoid rewinding buffer by @nastra in #7525
- Build: Bump me.champeau.jmh:jmh-gradle-plugin from 0.7.0 to 0.7.1 by @dependabot in #7408
- API, Core: Make RewriteFiles flexible by @aokolnychyi in #7501
- AWS: Add missing line - assign param S3FileIOProperties inside constructor by @akshayakp97 in #7559
- Build: Update RoaringBitmap to 0.9.44 by @aokolnychyi in #7563
- Core: Refactor naming in MergingSnapshotProducer by @aokolnychyi in #7564
- Fix Typo and Polish in aws.md by @skytin1004 in #7548
- Spark 3.3: Uniqueness validation when computing updates of changelogs by @flyrain in #7388
- Core: Add finalizer to ResolvingFileIO by @nastra in #7536
- Core, AWS: Add flag to control whether initialization stack trace should be created in S3FileIO by @nastra in #7552
- Spark 3.2/3.4: Uniqueness validation when computing updates of changelogs by @flyrain in #7573
- AWS: create HttpClientProperties, move s3 related methods into S3FileIOProperties by @akshayakp97 in #7562
- Doc: Updates Writing to Partitioned Table Spark Docs by @RussellSpitzer in #7499
- Infra: Update slack invite link by @ajantha-bhat in #7583
- Nessie: Bump Nessie dependencies from 0.57.0 to 0.58.1 by @dimas-b in #7579
- Docs: Add identifier to each Markdown file under docs by @nastra in #7575
- Core: Check for all specs in partitionsTable by @ajantha-bhat in #7551
- API, Flink: StructProjection returns null projection object for null nested struct value by @stevenzwu in #7517
- Build: Upgrade Gradle to 8.1.1 by @XN137 in #7610
- Core: Remove deprecated AssertHelpers usage in catalog by @liuxiaocs7 in #7596
- Build: Bump Arrow from 11.0.0 to 12.0.0 by @ajantha-bhat in #7595
- Core: Remove deprecated AssertHelpers usage by @liuxiaocs7 in #7597
- Spark: Fix Parquet read benchmarks for Spark 3.3 + 3.4 by @nastra in #7587
- Docs: Improve readability on page Branching and Tagging by @zhangbutao in #7592
- Flink: change sink shuffle to use RowData as data type and statistics key type by @stevenzwu in #7494
- Flink: add toString, equals, hashCode overrides for RowDataProjection. by @stevenzwu in #7493
- Implement ReadableMetrics for Entries table by @dramaticlly in #7539
- Add unique JDBC application identifier and user agent header by @manisin in #7580
- Spark: Remove deprecated VectorizedSparkParquetReaders#buildReader API for 1.3.0 release by @amogh-jahagirdar in #7591
- Views: Update spec with expectations on versions, representations, and dialects by @wmoustafa in #7500
- Core: Allow deleting old partition spec columns in V1 by @Fokko in #7398
- API, Core, Spark:Add file groups failure in rewrite result by @waltczhang in #7361
- Docs: update documentation site link by @liuxiaocs7 in #7117
- AWS: Add S3FileIOAwsClientFactory with s3.client-factory-impl catalog property for S3FileIO by @akshayakp97 in #7590
- Core: Add FileIO tracker/closer to REST catalog by @nastra in #7487
- API, Core: Expose file and data sequence numbers through ContentFile by @gaborkaszab in #7555
- Spark 3.4: Avoid local sort for MERGE cardinality check by @aokolnychyi in #7558
- Spark 3.4: Fixup for RewritePositionDeleteFilesSparkAction by @szehon-ho in #7565
- Flink: Add retry limit for IcebergSource continuous split planning errors by @pvary in #7571
- Build: Bump com.fasterxml.jackson.core:jackson-annotations from 2.14.2 to 2.15.0 by @dependabot in #7601
- Disable Agg push down for incremental scan by @huaxingao in #7626
- Spark 3.4: Add RewritePositionDeleteFilesProcedure by @szehon-ho in #7572
- Remove Kyle and add bitsondatadev to collaborators .asf.yaml by @bitsondatadev in #7634
- Improve Error Handling to map Snowflake Exceptions into Iceberg Exceptions by @AnubhavSiddharth in #6952
- Flink: backport Add config for max allowed consecutive planning failures in IcebergSource before failing the job (#7571) to 1.16 and 1.15 by @pvary in #7629
- Flink: backport PR #7494. change sink shuffle to use RowData as data type and statistics key type by @stevenzwu in #7632
- Flink: backport PR #7493. add toString, equals, hashCode overrides for RowDataProjection by @stevenzwu in #7631
- Flink: Fixes flink sink failed due to updating partition spec by @ConeyLiu in #7171
- Core: Allow one data writer in BasePositionDeltaWriter by @aokolnychyi in #7648
- Spark 3.4: Cosmetic updates for SparkPositionDeltaWrite by @aokolnychyi in #7650
- Spark-3.4: Fix errorprone warning by @ajantha-bhat in #7654
- GCP, Pig: Switch tests to JUnit5 by @rakesh-das08 in #7647
- Spark 3.4: Fix NPE when create branch and tag on table without snapshot by @dramaticlly in #7652
- Spark 3.4: Split update into delete and insert for position deltas by @aokolnychyi in #7646
- Parquet: Update parquet to 1.13.1 by @singhpk234 in #7301
- Spark-3.4: Harmonize RewriteDataFilesSparkAction by @ajantha-bhat in #7630
- Spark 3.3, 3.4: use a deterministic where condition to make rewrite_data_files… by @ludlows in #6760
- Spark 3.2: backport Spark SQL extension on create/update/drop tags by @dramaticlly in #7662
- Spark: Backport fix NPE when create branch and tag on table without snapshot by @dramaticlly in #7659
- Core: Compacted position delete files should use the max data sequence number of source files by @szehon-ho in #7651
- Docs: RewritePositionDeleteFiles procedure by @szehon-ho in #7589
- OpenAPI responses should reference schemas by @snazy in #6699
- Core, Parquet: Remove Parquet dictionary encoding table property by @amogh-jahagirdar in #7665
- Build: Bump com.esotericsoftware:kryo-shaded from 4.0.2 to 4.0.3 by @dependabot in #7669
- Infra: Use the standard shadow plugin by @ajantha-bhat in #7681
- Spark 3.4: Add TimestampNTZ by @Fokko in #7553
- Spark 3.3: Backport RewritePositionDeleteFilesSparkAction (#7389) by @szehon-ho in #7684
- Spark 3.4: Distribution and ordering enhancements by @aokolnychyi in #7637
- Flink: Port #7171 to flink 1.17 by @ConeyLiu in #7680
- Flink: Port #7171 to flink 1.15 by @ConeyLiu in #7679
- Spark 3.3: Avoid local sort for MERGE cardinality check by @aokolnychyi in #7686
- Spark 3.3: Backport RewritePositionDeleteFilesProcedure (#7572) by @szehon-ho in #7687
- Nessie: Bump Nessie version from 0.58.1 to 0.59.0 by @ajantha-bhat in #7642
- Spark 3.3: Harmonize RewriteDataFilesSparkAction by @ajantha-bhat in #7676
- Core: Minor metadata table code harmonization for readable_metrics by @szehon-ho in #7613
New Contributors
- @joonsun-baek made their first contribution in #7167
- @dpaani made their first contribution in #7066
- @Polectron made their first contribution in #7163
- @bowenliang123 made their first contribution in #7153
- @yegangy0718 made their first contribution in #6382
- @deepyaman made their first contribution in #7242
- @ChristinaTech made their first contribution in #7198
- @doki23 made their first contribution in #7199
- @preaudc made their first contribution in #7282
- @linyanghao made their first contribution in #7218
- @911432 made their first contribution in #7434
- @frankliee made their first contribution in #7310
- @vishnukumarsinha made their first contribution in #7439
- @DarthData410 made their first contribution in #7462
- @akshayakp97 made their first contribution in #7468
- @skytin1004 made their first contribution in #7548
- @zhangbutao made their first contribution in #7592
- @wmoustafa made their first contribution in #7500
- @waltczhang made their first contribution in #7361
- @bitsondatadev made their first contribution in #7634
- @AnubhavSiddharth made their first contribution in #6952
- @rakesh-das08 made their first contribution in #7647
- @ludlows made their first contribution in #6760
Full Changelog: apache-iceberg-1.2.0...apache-iceberg-1.3.0