apache/iceberg apache-iceberg-1.3.0 on GitHub

What's Changed

Nessie: Remove compile-time Hadoop dependency by @nastra in #7054
Core: Fix deprecation message by @nastra in #7104
Build: Update ORC to 1.8.3 by @williamhyun in #7124
AWS: Use Apache HTTP client as default AWS HTTP client by @singhpk234 in #7119
AWS: Enable virtual-host-style requests for MinioContainer by @nastra in #7125
Flink: Bump to Flink 1.15.3 by @Fokko in #7059
Flink: Bump to Flink 1.16.1 by @Fokko in #7057
Core: Use unknown report type for forward-compatibility by @nastra in #7145
Aliyun: Remove AssertHelpers by @liuxiaocs7 in #7116
dell: remove usage of AssertHelpers by @liuxiaocs7 in #7143
Core: Minor refactoring of PartitionsTable by @ajantha-bhat in #6975
Build: Let RevAPI compare against 1.2.0 by @nastra in #7155
MR: Remove deprecated AssertHelpers by @liuxiaocs7 in #7159
Core: Remove deprecated validation APIs in MergingSnapshotProducer by @amogh-jahagirdar in #7150
data: Remove AssertHelpers Usage by @liuxiaocs7 in #7134
Flink: Fix Flink streaming query problem [Cannot get a client from a closed pool] by @xuzhiwen1255 in #6614
Spark 3.3: Remove use of deprecated SparkFilesScan by @szehon-ho in #7106
Docs: Add rest to the catalog configuration by @Fokko in #7126
Contributing Docs: Add section for testing code by @nastra in #7131
Core, API: View Version implementation by @amogh-jahagirdar in #6861
Update defaults of max-concurrent-file-group-rewrites to 5 by @karuppayya in #6907
Flink: fixed Cloneable not implemented on CatalogLoader by @xuzhiwen1255 in #7168
Core: Refactor actions results by @ajantha-bhat in #7089
Docs: update doc to read easier by @joonsun-baek in #7167
API: Fix retainAll and removeAll in CharSequenceSet by @zhongyujiang in #7133
Spark 3.3: Support metadata column in the changelog table by @flyrain in #7152
Spark 3.2: Support metadata column in the changelog table by @flyrain in #7178
Flink: Backport #6614 to Flink 1.15 by @xuzhiwen1255 in #7165
Core: Remove deprecated code from 1.2.0 by @nastra in #7156
S3 Credentials provider support in DefaultAwsClientFactory #7063 by @dpaani in #7066
Core: Move InMemoryCatalog from test to core by @nastra in #7185
Doc: Retypeset the Flink document by @hililiwei in #7099
Core: Store split offset for delete files by @singhpk234 in #7011
Flink: Backport #6614 to Flink 1.14 by @xuzhiwen1255 in #7166
Core, Hive: Support pluggable ClientPool by @lirui-apache in #6698
AWS: Remove deprecated AssertHelpers by @liuxiaocs7 in #7195
Spark: Support loading function as FunctionCatalog in SparkSessionCatalog by @bowenliang123 in #7153
Flink: Implement data statistics operator to collect traffic distribution for guiding smart shuffling by @yegangy0718 in #6382
Build: Move RevApi breakage to correct version by @nastra in #7223
Ability to add multiple metrics reporters to scan by @karuppayya in #6919
Spark 3.3: Use ProcedureInput in AncestorsOfProcedure by @aokolnychyi in #7177
Core: Parse snapshot-id as long in remove-statistics update by @nastra in #7235
Bump Nessie to 0.54.0 by @snazy in #7146
Optimized spark vectorized read parquet decimal by @ConeyLiu in #3249
Core: Optimize S3 layout of Datafiles by expanding first character set of the hash by @singhpk234 in #7128
AWS: Prevent token refresh scheduling on every sign request by @nastra in #7270
Disable local credentials if remote signing is enabled by @danielcweeks in #7230
Spark: Revert "Spark: Add "Iceberg" prefix to SparkTable name string for SparkUI (#5629)" by @amogh-jahagirdar in #7273
Spark: broadcast table instead of file IO in rewrite manifests by @bryanck in #7263
AWS: abort S3 input stream on close if not EOS by @bryanck in #7262
Spark 3.2: Use ProcedureInput in AncestorsOfProcedure and AddFilesProcedure by @aokolnychyi in #7260
Spark 3.3: Dataset writes for position deletes by @szehon-ho in #7029
REST: fix previous locations for refs-only load by @bryanck in #7284
Core: Fix flakiness in HadoopFileIOTest by @nastra in #7253
Flink: Data statistics operator sends local data statistics to coordinator and receives aggregated data statistics from coordinator for smart shuffling by @yegangy0718 in #7269
AWS: Make AuthSession cache static by @nastra in #7289
Core: Require namespace when creating table using InMemoryCatalog by @nastra in #7252
Refactor PartitionsTable planning by @dramaticlly in #7190
Flink: Introduce Flink 1.17 by @hililiwei in #7254
AWS: Check commit status after failed commit if AWS client performed retries by @ChristinaTech in #7198
Core: Fix errorprone warning by @ajantha-bhat in #7286
Bump Nessie to 0.56.0 by @snazy in #7283
Build: Bump actions/stale from 7.0.0 to 8.0.0 by @dependabot in #7200
Build: Bump org.apache.hadoop:hadoop-client from 3.3.4 to 3.3.5 by @dependabot in #7201
Spark: apply rewrite manifest action fix to 3.1,3.2 by @bryanck in #7296
Build: Spark version of iceberg-delta-lake to 3.3.2 by @doki23 in #7199
Nessie: Use latest hash for catalog APIs by @ajantha-bhat in #6789
Support vectorized reading int96 timestamps in imported data by @yabola in #6962
Flink: Expose write-parallelism in SQL Hints by @hililiwei in #7039
Nessie: Fix testcase failures by @ajantha-bhat in #7320
Flink: move the classes from flink.sink.shuffle.statistics pkg to one level up as flink.sink.shuffle pkg by @stevenzwu in #7322
Spark 3.3: Add doc for the changelog view procedure. by @flyrain in #7147
Bump Nessie from 0.56.0 to 0.57.0 by @snazy in #7323
Flink 1.15 1.17: Port Expose write-parallelism in SQL Hints to 1.15 & 1.17 by @hililiwei in #7327
Update issue template for 1.2.1 release by @danielcweeks in #7331
Core: Fix SnapshotProducer#targetBranch's exception message by @zhongyujiang in #7315
Bump Gradle from 8.0.2 to 8.1 by @snazy in #7333
Build: Fix flaky checkstyle issue by @ajantha-bhat in #7321
[Infra] Update vote mail sample in source-release.sh by @gaborkaszab in #7330
Core: Add missing metrics reporters when creating BaseTable by @nastra in #7341
Core, Spark 3.3: Add FileRewriter API by @aokolnychyi in #7175
Spark - Accept an output-spec-id that allows writing to a desired partition spec by @gustavoatt in #7120
[ORC][Spark] - Support selected vector with ORC reader on the row and batch reader by @pavibhai in #7197
Flink: use correct scan mode when in TABLE_SCAN_THEN_INCREMENTAL mode by @chenjunjiedada in #7338
Throw NoSuchIcebergTableException instead of ValidationException in G… by @ericlgoodman in #7277
Build: Bump Airlift from 0.21 to 0.24 by @Fokko in #7347
Docs: clarify Hive on Tez configuration by @preaudc in #7282
Spark: Simplify checks of output-spec-id in SparkWriteConf by @gustavoatt in #7348
Fix SetDefaultPartitionSpec to use specId instead of schemaId by @dramaticlly in #7350
Core, Spark: Make ObjectStoreLocationProvider serializable by @singhpk234 in #7353
Core: Parameterize RewriteDataFile's CommitService by @szehon-ho in #7343
Core: Fix flaky TestParallelIterable test by @amogh-jahagirdar in #7372
Flink: Apply row level filtering by @Fokko in #7109
Spark: Surface better error message during streaming planning when checkpoint snapshot not found by @amogh-jahagirdar in #6480
Flink: backport #7338 to 1.16 and 1.15 by @chenjunjiedada in #7373
Spark 3.4: Initial support by @aokolnychyi in #7378
Honor spark case sensitivity in ALTER TABLE.. ORDERED BY by @karuppayya in #7324
Spark 3.3: Surface better error message during streaming planning when checkpoint snapshot not found by @aokolnychyi in #7381
Spark: Remove Spark 2.4 by @Fokko in #7385
Build: Bump Hive to 2.3.9 by @Fokko in #7374
Core: Introduce CompositeMetricsReporter by @nastra in #7337
Flink: Use starting sequence number by default when rewriting data files by @linyanghao in #7218
Flink: Backport #6382 and #7269 to 1.15 for shuffle operator by @yegangy0718 in #7400
Flink: Backport row filter into 1.15 and 1.16 by @Fokko in #7397
Spark 3.3: support rate limit in Spark Streaming by @singhpk234 in #4479
Enumerate configs that should be respected in REST table load response by @danielcweeks in #7401
Doc: Add a page explaining migration from other table formats to iceberg by @JonasJ-ap in #6600
Doc: Fix typo in hive_migration.md by @JonasJ-ap in #7407
Spark: Fix failing SS UT by @singhpk234 in #7414
Flink: Backport #7218 to 1.15 and 1.17 by @linyanghao in #7404
MR: Fix IndexOutOfBounds by skip filter translation if there are no leaves by @edgarRd in #7123
Add mkdocstrings by @LuigiCerone in #7108
AWS: Fix default warehouse path in Dynamodb catalog by @munendrasn in #7358
Flink: sync 1.16 with 1.17 for backports missed or not ported identically by @stevenzwu in #7403
Flink: sync 1.15 with 1.17 for backports missed or not ported identically by @stevenzwu in #7402
Views: Clean up and clarify the view spec by @rdblue in #7416
Docs: Separate page for Branching and Tagging by @amogh-jahagirdar in #6723
Views: Fix SQL view representation field name by @rdblue in #7417
Hive: Use EnvironmentContext instead of Hive Locks to provide transactional commits after HIVE-26882 by @pvary in #6570
Spark: Backport #6480 to Spark 3.2 and Spark 3.1 by @amogh-jahagirdar in #7425
API, Core: Move schemaID from ViewRepresentation to ViewVersion and make it required by @amogh-jahagirdar in #7421
Spark 3.4: Relax constraints in SparkPartitioningAwareScan by @aokolnychyi in #7423
Core: Extract REST metrics reporter into its own class by @nastra in #7339
Spark 3.4: Add tests for SPJ when partition keys mismatch by @aokolnychyi in #7424
Cherry pick Order case sensitivity changes to 3.4 by @karuppayya in #7380
Build: Run Iceberg with JDK 17 by @singhpk234 in #7391
Views: Move 'operation' check to ViewVersion by @nastra in #7428
Updated README.md to support Java 17 by @911432 in #7434
Hive: Clean up expired metastore clients by @frankliee in #7310
Core: Make TableScanContext immutable by @nastra in #5985
Core: Move table-creation-without-namespace-test to CatalogTests by @nastra in #7349
Spark: Refactor SparkReadConf use primitive type for confs with default values by @singhpk234 in #7429
Spark 3.4: Remove deprecated classes by @aokolnychyi in #7448
Arrow: Convert dict encoded vectors to their expected Arrow vector types by @nastra in #3024
Spark: Fixed Typo in Spark Read Option Vectorization Javadoc by @vishnukumarsinha in #7439
Spark 3.4: Remove no longer needed write extensions by @aokolnychyi in #7443
Delta Migration: Add version and timestamp tags for each Delta Lake transaction when add to Iceberg transaction by @JonasJ-ap in #7450
Delta Migration: Correct snapshotDataFilesCount in SnapshotDeltaLakeTable.Result and use Immutable to implement it by @JonasJ-ap in #7454
Spark 3.4: Support rate limit in Spark Streaming by @singhpk234 in #7422
Spark: Fix Failing SS ratelimit UT by @singhpk234 in #7470
Spark 3.4: Switch to built-in DELETE implementation by @aokolnychyi in #7453
Spark: Remove Usage of deprecated AssertHelpers in spark-sql by @liuxiaocs7 in #7486
Spark 3.3: Remove deprecated FileScanTaskSetManager by @nastra in #7489
Hive: Support connecting to multiple Hive-Catalog by @szehon-ho in #7441
Spark: Add read/write support for UUIDs by @nastra in #7399
Hive: Remove deprecated AssertHelpers by @liuxiaocs7 in #7482
Flink: Remove deprecated AssertHelpers by @liuxiaocs7 in #7481
Spark: Remove deprecated AssertHelpers by @liuxiaocs7 in #7483
Core: Remove compile-time dependency to ResolvingFileIO by @nastra in #7488
Update documentation to reflect new catalog features by @dramaticlly in #7433
Spark 3.3: Add read and write support for UUIDs by @nastra in #7496
Spark 3.2: Add read and write support for UUIDs by @nastra in #7497
Spark 3.1: Add read and write support for UUIDs by @nastra in #7508
API: Remove deprecated AssertHelpers usage by @akshayakp97 in #7468
API: Update java doc for listTables and listViews by @ajantha-bhat in #7336
Core: Simplify Partitions table partition-coercion code by @szehon-ho in #7503
Core, Spark: Add configuration to control case sensitivity of CachingCatalog by @wypoon in #7469
AWS: Add finalizer to S3FileIO by @nastra in #7513
Move all S3FileIO related properties into a separate class S3FileIOProperties by @akshayakp97 in #7505
Spark 3.4: Handle skew in writes by @aokolnychyi in #7520
Spec: Update View spec to reflect that schema is defined at the version level and is required by @amogh-jahagirdar in #7485
Spark 3.4: Implement rewrite position deletes by @szehon-ho in #7389
Spark 3.4: Tests for coalescing small writing tasks by @aokolnychyi in #7532
Core: Support delete file stats in partitions metadata table by @ajantha-bhat in #6661
Make SparkCatalog use a case sensitive CachingCatalog by default. by @wypoon in #7535
Core: Remove duplicate check for ManifestEntry.dataSequenceNumber() by @gaborkaszab in #7538
Remove usages of S3 fields from AwsProperties within s3 package by @akshayakp97 in #7534
Flink: Fix Typo in Namespace by @liuxiaocs7 in #7527
Arrow: Fix errorprone warnings by @ajantha-bhat in #7498
Spark: Use UUIDUtil.convertToByteBuffer to avoid rewinding buffer by @nastra in #7525
Build: Bump me.champeau.jmh:jmh-gradle-plugin from 0.7.0 to 0.7.1 by @dependabot in #7408
API, Core: Make RewriteFiles flexible by @aokolnychyi in #7501
AWS: Add missing line - assign param S3FileIOProperties inside constructor by @akshayakp97 in #7559
Build: Update RoaringBitmap to 0.9.44 by @aokolnychyi in #7563
Core: Refactor naming in MergingSnapshotProducer by @aokolnychyi in #7564
Fix Typo and Polish in aws.md by @skytin1004 in #7548
Spark 3.3: Uniqueness validation when computing updates of changelogs by @flyrain in #7388
Core: Add finalizer to ResolvingFileIO by @nastra in #7536
Core, AWS: Add flag to control whether initialization stack trace should be created in S3FileIO by @nastra in #7552
Spark 3.2/3.4: Uniqueness validation when computing updates of changelogs by @flyrain in #7573
AWS: create HttpClientProperties, move s3 related methods into S3FileIOProperties by @akshayakp97 in #7562
Doc: Updates Writing to Partitioned Table Spark Docs by @RussellSpitzer in #7499
Infra: Update slack invite link by @ajantha-bhat in #7583
Nessie: Bump Nessie dependencies from 0.57.0 to 0.58.1 by @dimas-b in #7579
Docs: Add identifier to each Markdown file under docs by @nastra in #7575
Core: Check for all specs in partitionsTable by @ajantha-bhat in #7551
API, Flink: StructProjection returns null projection object for null nested struct value by @stevenzwu in #7517
Build: Upgrade Gradle to 8.1.1 by @XN137 in #7610
Core: Remove deprecated AssertHelpers usage in catalog by @liuxiaocs7 in #7596
Build: Bump Arrow from 11.0.0 to 12.0.0 by @ajantha-bhat in #7595
Core: Remove deprecated AssertHelpers usage by @liuxiaocs7 in #7597
Spark: Fix Parquet read benchmarks for Spark 3.3 + 3.4 by @nastra in #7587
Docs: Improve readability on page Branching and Tagging by @zhangbutao in #7592
Flink: change sink shuffle to use RowData as data type and statistics key type by @stevenzwu in #7494
Flink: add toString, equals, hashCode overrides for RowDataProjection. by @stevenzwu in #7493
Implement ReadableMetrics for Entries table by @dramaticlly in #7539
Add unique JDBC application identifier and user agent header by @manisin in #7580
Spark: Remove deprecated VectorizedSparkParquetReaders#buildReader API for 1.3.0 release by @amogh-jahagirdar in #7591
Views: Update spec with expectations on versions, representations, and dialects by @wmoustafa in #7500
Core: Allow deleting old partition spec columns in V1 by @Fokko in #7398
API, Core, Spark: Add file groups failure in rewrite result by @waltczhang in #7361
Docs: update documentation site link by @liuxiaocs7 in #7117
AWS: Add S3FileIOAwsClientFactory with s3.client-factory-impl catalog property for S3FileIO by @akshayakp97 in #7590
Core: Add FileIO tracker/closer to REST catalog by @nastra in #7487
API, Core: Expose file and data sequence numbers through ContentFile by @gaborkaszab in #7555
Spark 3.4: Avoid local sort for MERGE cardinality check by @aokolnychyi in #7558
Spark 3.4: Fixup for RewritePositionDeleteFilesSparkAction by @szehon-ho in #7565
Flink: Add retry limit for IcebergSource continuous split planning errors by @pvary in #7571
Build: Bump com.fasterxml.jackson.core:jackson-annotations from 2.14.2 to 2.15.0 by @dependabot in #7601
Disable Agg push down for incremental scan by @huaxingao in #7626
Spark 3.4: Add RewritePositionDeleteFilesProcedure by @szehon-ho in #7572
Remove Kyle and add bitsondatadev to collaborators .asf.yaml by @bitsondatadev in #7634
Improve Error Handling to map Snowflake Exceptions into Iceberg Exceptions by @AnubhavSiddharth in #6952
Flink: backport Add config for max allowed consecutive planning failures in IcebergSource before failing the job (#7571) to 1.16 and 1.15 by @pvary in #7629
Flink: backport PR #7494. change sink shuffle to use RowData as data type and statistics key type by @stevenzwu in #7632
Flink: backport PR #7493. add toString, equals, hashCode overrides for RowDataProjection by @stevenzwu in #7631
Flink: Fix Flink sink failure caused by updating partition spec by @ConeyLiu in #7171
Core: Allow one data writer in BasePositionDeltaWriter by @aokolnychyi in #7648
Spark 3.4: Cosmetic updates for SparkPositionDeltaWrite by @aokolnychyi in #7650
Spark-3.4: Fix errorprone warning by @ajantha-bhat in #7654
GCP, Pig: Switch tests to JUnit5 by @rakesh-das08 in #7647
Spark 3.4: Fix NPE when create branch and tag on table without snapshot by @dramaticlly in #7652
Spark 3.4: Split update into delete and insert for position deltas by @aokolnychyi in #7646
Parquet: Update parquet to 1.13.1 by @singhpk234 in #7301
Spark-3.4: Harmonize RewriteDataFilesSparkAction by @ajantha-bhat in #7630
Spark 3.3, 3.4: use a deterministic where condition to make rewrite_data_files… by @ludlows in #6760
Spark 3.2: backport Spark SQL extension on create/update/drop tags by @dramaticlly in #7662
Spark: Backport fix NPE when create branch and tag on table without snapshot by @dramaticlly in #7659
Core: Compacted position delete files should use the max data sequence number of source files by @szehon-ho in #7651
Docs: RewritePositionDeleteFiles procedure by @szehon-ho in #7589
OpenAPI responses should reference schemas by @snazy in #6699
Core, Parquet: Remove Parquet dictionary encoding table property by @amogh-jahagirdar in #7665
Build: Bump com.esotericsoftware:kryo-shaded from 4.0.2 to 4.0.3 by @dependabot in #7669
Infra: Use the standard shadow plugin by @ajantha-bhat in #7681
Spark 3.4: Add TimestampNTZ by @Fokko in #7553
Spark 3.3: Backport RewritePositionDeleteFilesSparkAction (#7389) by @szehon-ho in #7684
Spark 3.4: Distribution and ordering enhancements by @aokolnychyi in #7637
Flink: Port #7171 to flink 1.17 by @ConeyLiu in #7680
Flink: Port #7171 to flink 1.15 by @ConeyLiu in #7679
Spark 3.3: Avoid local sort for MERGE cardinality check by @aokolnychyi in #7686
Spark 3.3: Backport RewritePositionDeleteFilesProcedure (#7572) by @szehon-ho in #7687
Nessie: Bump Nessie version from 0.58.1 to 0.59.0 by @ajantha-bhat in #7642
Spark 3.3: Harmonize RewriteDataFilesSparkAction by @ajantha-bhat in #7676
Core: Minor metadata table code harmonization for readable_metrics by @szehon-ho in #7613

New Contributors

@joonsun-baek made their first contribution in #7167
@dpaani made their first contribution in #7066
@Polectron made their first contribution in #7163
@bowenliang123 made their first contribution in #7153
@yegangy0718 made their first contribution in #6382
@deepyaman made their first contribution in #7242
@ChristinaTech made their first contribution in #7198
@doki23 made their first contribution in #7199
@preaudc made their first contribution in #7282
@linyanghao made their first contribution in #7218
@911432 made their first contribution in #7434
@frankliee made their first contribution in #7310
@vishnukumarsinha made their first contribution in #7439
@DarthData410 made their first contribution in #7462
@akshayakp97 made their first contribution in #7468
@skytin1004 made their first contribution in #7548
@zhangbutao made their first contribution in #7592
@wmoustafa made their first contribution in #7500
@waltczhang made their first contribution in #7361
@bitsondatadev made their first contribution in #7634
@AnubhavSiddharth made their first contribution in #6952
@rakesh-das08 made their first contribution in #7647
@ludlows made their first contribution in #6760

Full Changelog: apache-iceberg-1.2.0...apache-iceberg-1.3.0
