github apache/iceberg apache-iceberg-1.3.0
Apache Iceberg 1.3.0

17 months ago

What's Changed

  • Nessie: Remove compile-time Hadoop dependency by @nastra in #7054
  • Core: Fix deprecation message by @nastra in #7104
  • Build: Update ORC to 1.8.3 by @williamhyun in #7124
  • AWS: Use Apache HTTP client as default AWS HTTP client by @singhpk234 in #7119
  • AWS: Enable virtual-host-style requests for MinioContainer by @nastra in #7125
  • Flink: Bump to Flink 1.15.3 by @Fokko in #7059
  • Flink: Bump to Flink 1.16.1 by @Fokko in #7057
  • Core: Use unknown report type for forward-compatibility by @nastra in #7145
  • Aliyun: Remove AssertHelpers by @liuxiaocs7 in #7116
  • dell: remove usage of AssertHelpers by @liuxiaocs7 in #7143
  • Core: Minor refactoring of PartitionsTable by @ajantha-bhat in #6975
  • Build: Let RevAPI compare against 1.2.0 by @nastra in #7155
  • MR: Remove deprecated AssertHelpers by @liuxiaocs7 in #7159
  • Core: Remove deprecated validation APIs in MergingSnapshotProducer by @amogh-jahagirdar in #7150
  • data: Remove AssertHelpers Usage by @liuxiaocs7 in #7134
  • Flink: fix flink streaming query problem [Cannot get a client from a closed pool] by @xuzhiwen1255 in #6614
  • Spark 3.3: Remove use of deprecated SparkFilesScan by @szehon-ho in #7106
  • Docs: Add rest to the catalog configuration by @Fokko in #7126
  • Contributing Docs: Add section for testing code by @nastra in #7131
  • Core, API: View Version implementation by @amogh-jahagirdar in #6861
  • Update defaults of max-concurrent-file-group-rewrites to 5 by @karuppayya in #6907
  • Flink: fixed Cloneable not implemented on CatalogLoader by @xuzhiwen1255 in #7168
  • Core: Refactor actions results by @ajantha-bhat in #7089
  • Docs: update doc to read easier by @joonsun-baek in #7167
  • API: Fix retainAll and removeAll in CharSequenceSet by @zhongyujiang in #7133
  • Spark 3.3: Support metadata column in the changelog table by @flyrain in #7152
  • Spark 3.2: Support metadata column in the changelog table by @flyrain in #7178
  • Flink: Backport #6614 to Flink 1.15 by @xuzhiwen1255 in #7165
  • Core: Remove deprecated code from 1.2.0 by @nastra in #7156
  • S3 Credentials provider support in DefaultAwsClientFactory #7063 by @dpaani in #7066
  • Core: Move InMemoryCatalog from test to core by @nastra in #7185
  • Doc: Retypeset the Flink document by @hililiwei in #7099
  • Core: Store split offset for delete files by @singhpk234 in #7011
  • Flink: Backport #6614 to Flink 1.14 by @xuzhiwen1255 in #7166
  • Core, Hive: Support pluggable ClientPool by @lirui-apache in #6698
  • AWS: Remove deprecated AssertHelpers by @liuxiaocs7 in #7195
  • Spark: Support loading function as FunctionCatalog in SparkSessionCatalog by @bowenliang123 in #7153
  • Flink: Implement data statistics operator to collect traffic distribution for guiding smart shuffling by @yegangy0718 in #6382
  • Build: Move RevApi breakage to correct version by @nastra in #7223
  • Ability to add multiple metrics reporters to scan by @karuppayya in #6919
  • Spark 3.3: Use ProcedureInput in AncestorsOfProcedure by @aokolnychyi in #7177
  • Core: Parse snapshot-id as long in remove-statistics update by @nastra in #7235
  • Bump Nessie to 0.54.0 by @snazy in #7146
  • Optimized spark vectorized read parquet decimal by @ConeyLiu in #3249
  • Core: Optimize S3 layout of Datafiles by expanding first character set of the hash by @singhpk234 in #7128
  • AWS: Prevent token refresh scheduling on every sign request by @nastra in #7270
  • Disable local credentials if remote signing is enabled by @danielcweeks in #7230
  • Spark: Revert "Spark: Add "Iceberg" prefix to SparkTable name string for SparkUI (#5629)" by @amogh-jahagirdar in #7273
  • Spark: broadcast table instead of file IO in rewrite manifests by @bryanck in #7263
  • AWS: abort S3 input stream on close if not EOS by @bryanck in #7262
  • Spark 3.2: Use ProcedureInput in AncestorsOfProcedure and AddFilesProcedure by @aokolnychyi in #7260
  • Spark 3.3: Dataset writes for position deletes by @szehon-ho in #7029
  • REST: fix previous locations for refs-only load by @bryanck in #7284
  • Core: Fix flakiness in HadoopFileIOTest by @nastra in #7253
  • Flink: Data statistics operator sends local data statistics to coordinator and receives aggregated data statistics from coordinator for smart shuffling by @yegangy0718 in #7269
  • AWS: Make AuthSession cache static by @nastra in #7289
  • Core: Require namespace when creating table using InMemoryCatalog by @nastra in #7252
  • Refactor PartitionsTable planning by @dramaticlly in #7190
  • Flink: Introduce Flink 1.17 by @hililiwei in #7254
  • AWS: Check commit status after failed commit if AWS client performed retries by @ChristinaTech in #7198
  • Core: Fix errorprone warning by @ajantha-bhat in #7286
  • Bump Nessie to 0.56.0 by @snazy in #7283
  • Build: Bump actions/stale from 7.0.0 to 8.0.0 by @dependabot in #7200
  • Build: Bump org.apache.hadoop:hadoop-client from 3.3.4 to 3.3.5 by @dependabot in #7201
  • Spark: apply rewrite manifest action fix to 3.1,3.2 by @bryanck in #7296
  • Build: Spark version of iceberg-delta-lake to 3.3.2 by @doki23 in #7199
  • Nessie: Use latest hash for catalog APIs by @ajantha-bhat in #6789
  • Support vectorized reading int96 timestamps in imported data by @yabola in #6962
  • Flink: Expose write-parallelism in SQL Hints by @hililiwei in #7039
  • Nessie: Fix testcase failures by @ajantha-bhat in #7320
  • Flink: move the classes from flink.sink.shuffle.statistics pkg to one level up as flink.sink.shuffle pkg by @stevenzwu in #7322
  • Spark 3.3: Add doc for the changelog view procedure. by @flyrain in #7147
  • Bump Nessie from 0.56.0 to 0.57.0 by @snazy in #7323
  • Flink 1.15 1.17: Port Expose write-parallelism in SQL Hints to 1.15 & 1.17 by @hililiwei in #7327
  • Update issue template for 1.2.1 release by @danielcweeks in #7331
  • Core: Fix SnapshotProducer#targetBranch's exception message by @zhongyujiang in #7315
  • Bump Gradle from 8.0.2 to 8.1 by @snazy in #7333
  • Build: Fix flaky checkstyle issue by @ajantha-bhat in #7321
  • [Infra] Update vote mail sample in source-release.sh by @gaborkaszab in #7330
  • Core: Add missing metrics reporters when creating BaseTable by @nastra in #7341
  • Core, Spark 3.3: Add FileRewriter API by @aokolnychyi in #7175
  • Spark - Accept an output-spec-id that allows writing to a desired partition spec by @gustavoatt in #7120
  • [ORC][Spark] - Support selected vector with ORC reader on the row and batch reader by @pavibhai in #7197
  • Flink: use correct scan mode when in TABLE_SCAN_THEN_INCREMENTAL mode by @chenjunjiedada in #7338
  • Throw NoSuchIcebergTableException instead of ValidationException in G… by @ericlgoodman in #7277
  • Build: Bump Airlift from 0.21 to 0.24 by @Fokko in #7347
  • Docs: clarify Hive on Tez configuration by @preaudc in #7282
  • Spark - Simplify checks of output-spec-id in SparkWriteConf by @gustavoatt in #7348
  • Fix SetDefaultPartitionSpec to use specId instead of schemaId by @dramaticlly in #7350
  • Core, Spark: Make ObjectStoreLocationProvider serializable by @singhpk234 in #7353
  • Core: Parameterize RewriteDataFile's CommitService by @szehon-ho in #7343
  • Core: Fix flaky TestParallelIterable test by @amogh-jahagirdar in #7372
  • Flink: Apply row level filtering by @Fokko in #7109
  • Spark: Surface better error message during streaming planning when checkpoint snapshot not found by @amogh-jahagirdar in #6480
  • Flink: backport #7338 to 1.16 and 1.15 by @chenjunjiedada in #7373
  • Spark 3.4: Initial support by @aokolnychyi in #7378
  • Honor spark case sensitivity in ALTER TABLE.. ORDERED BY by @karuppayya in #7324
  • Spark 3.3: Surface better error message during streaming planning when checkpoint snapshot not found by @aokolnychyi in #7381
  • Spark: Remove Spark 2.4 by @Fokko in #7385
  • Build: Bump Hive to 2.3.9 by @Fokko in #7374
  • Core: Introduce CompositeMetricsReporter by @nastra in #7337
  • Flink: Use starting sequence number by default when rewriting data files by @linyanghao in #7218
  • Flink: Backport #6382 and #7269 to 1.15 for shuffle operator by @yegangy0718 in #7400
  • Flink: Backport row filter into 1.15 and 1.16 by @Fokko in #7397
  • Spark 3.3: support rate limit in Spark Streaming by @singhpk234 in #4479
  • Enumerate configs that should be respected in REST table load response by @danielcweeks in #7401
  • Doc: Add a page explaining migration from other table formats to iceberg by @JonasJ-ap in #6600
  • Doc: Fix typo in hive_migration.md by @JonasJ-ap in #7407
  • Spark: Fix failing SS UT by @singhpk234 in #7414
  • Flink: Backport #7218 to 1.15 and 1.17 by @linyanghao in #7404
  • MR: Fix IndexOutOfBounds by skip filter translation if there are no leaves by @edgarRd in #7123
  • Add mkdocstrings by @LuigiCerone in #7108
  • AWS: Fix default warehouse path in Dynamodb catalog by @munendrasn in #7358
  • Flink: sync 1.16 with 1.17 for backports missed or not ported identically by @stevenzwu in #7403
  • Flink: sync 1.15 with 1.17 for backports missed or not ported identically by @stevenzwu in #7402
  • Views: Clean up and clarify the view spec by @rdblue in #7416
  • Docs: Separate page for Branching and Tagging by @amogh-jahagirdar in #6723
  • Views: Fix SQL view representation field name by @rdblue in #7417
  • Hive: Use EnvironmentContext instead of Hive Locks to provide transactional commits after HIVE-26882 by @pvary in #6570
  • Spark: Backport #6480 to Spark 3.2 and Spark 3.1 by @amogh-jahagirdar in #7425
  • API, Core: Move schemaID from ViewRepresentation to ViewVersion and make it required by @amogh-jahagirdar in #7421
  • Spark 3.4: Relax constraints in SparkPartitioningAwareScan by @aokolnychyi in #7423
  • Core: Extract REST metrics reporter into its own class by @nastra in #7339
  • Spark 3.4: Add tests for SPJ when partition keys mismatch by @aokolnychyi in #7424
  • Cherry pick Order case sensitivity changes to 3.4 by @karuppayya in #7380
  • Build: Run Iceberg with JDK 17 by @singhpk234 in #7391
  • Views: Move 'operation' check to ViewVersion by @nastra in #7428
  • Updated README.md to support Java 17 by @911432 in #7434
  • Hive: Clean up expired metastore clients by @frankliee in #7310
  • Core: Make TableScanContext immutable by @nastra in #5985
  • Core: Move table-creation-without-namespace-test to CatalogTests by @nastra in #7349
  • Spark: Refactor SparkReadConf use primitive type for confs with default values by @singhpk234 in #7429
  • Spark 3.4: Remove deprecated classes by @aokolnychyi in #7448
  • Arrow: Convert dict encoded vectors to their expected Arrow vector types by @nastra in #3024
  • Spark: Fixed Typo in Spark Read Option Vectorization Javadoc by @vishnukumarsinha in #7439
  • Spark 3.4: Remove no longer needed write extensions by @aokolnychyi in #7443
  • Delta Migration: Add version and timestamp tags for each Delta Lake transaction when add to Iceberg transaction by @JonasJ-ap in #7450
  • Delta Migration: Correct snapshotDataFilesCount in SnapshotDeltaLakeTable.Result and use Immutable to implement it by @JonasJ-ap in #7454
  • Spark 3.4: Support rate limit in Spark Streaming by @singhpk234 in #7422
  • Spark: Fix Failing SS ratelimit UT by @singhpk234 in #7470
  • Spark 3.4: Switch to built-in DELETE implementation by @aokolnychyi in #7453
  • Spark: Remove Usage of deprecated AssertHelpers in spark-sql by @liuxiaocs7 in #7486
  • Spark 3.3: Remove deprecated FileScanTaskSetManager by @nastra in #7489
  • Hive: Support connecting to multiple Hive-Catalog by @szehon-ho in #7441
  • Spark: Add read/write support for UUIDs by @nastra in #7399
  • Hive: Remove deprecated AssertHelpers by @liuxiaocs7 in #7482
  • Flink: Remove deprecated AssertHelpers by @liuxiaocs7 in #7481
  • Spark: Remove deprecated AssertHelpers by @liuxiaocs7 in #7483
  • Core: Remove compile-time dependency to ResolvingFileIO by @nastra in #7488
  • Update documentation to reflect new catalog features by @dramaticlly in #7433
  • Spark 3.3: Add read and write support for UUIDs by @nastra in #7496
  • Spark 3.2: Add read and write support for UUIDs by @nastra in #7497
  • Spark 3.1: Add read and write support for UUIDs by @nastra in #7508
  • API: Remove deprecated AssertHelpers usage by @akshayakp97 in #7468
  • API: Update java doc for listTables and listViews by @ajantha-bhat in #7336
  • Core: Simplify Partitions table partition-coercion code by @szehon-ho in #7503
  • Core, Spark: Add configuration to control case sensitivity of CachingCatalog by @wypoon in #7469
  • AWS: Add finalizer to S3FileIO by @nastra in #7513
  • Move all S3FileIO related properties into a separate class S3FileIOProperties by @akshayakp97 in #7505
  • Spark 3.4: Handle skew in writes by @aokolnychyi in #7520
  • Spec: Update View spec to reflect that schema is defined at the version level and is required by @amogh-jahagirdar in #7485
  • Spark 3.4: Implement rewrite position deletes by @szehon-ho in #7389
  • Spark 3.4: Tests for coalescing small writing tasks by @aokolnychyi in #7532
  • Core: Support delete file stats in partitions metadata table by @ajantha-bhat in #6661
  • Make SparkCatalog use a case sensitive CachingCatalog by default. by @wypoon in #7535
  • Core: Remove duplicate check for ManifestEntry.dataSequenceNumber() by @gaborkaszab in #7538
  • Remove usages of S3 fields from AwsProperties within s3 package by @akshayakp97 in #7534
  • Flink: Fix Typo in Namespace by @liuxiaocs7 in #7527
  • Arrow: Fix errorprone warnings by @ajantha-bhat in #7498
  • Spark: Use UUIDUtil.convertToByteBuffer to avoid rewinding buffer by @nastra in #7525
  • Build: Bump me.champeau.jmh:jmh-gradle-plugin from 0.7.0 to 0.7.1 by @dependabot in #7408
  • API, Core: Make RewriteFiles flexible by @aokolnychyi in #7501
  • AWS: Add missing line - assign param S3FileIOProperties inside constructor by @akshayakp97 in #7559
  • Build: Update RoaringBitmap to 0.9.44 by @aokolnychyi in #7563
  • Core: Refactor naming in MergingSnapshotProducer by @aokolnychyi in #7564
  • Fix Typo and Polish in aws.md by @skytin1004 in #7548
  • Spark 3.3: Uniqueness validation when computing updates of changelogs by @flyrain in #7388
  • Core: Add finalizer to ResolvingFileIO by @nastra in #7536
  • Core, AWS: Add flag to control whether initialization stack trace should be created in S3FileIO by @nastra in #7552
  • Spark 3.2/3.4: Uniqueness validation when computing updates of changelogs by @flyrain in #7573
  • AWS: create HttpClientProperties, move s3 related methods into S3FileIOProperties by @akshayakp97 in #7562
  • Doc: Updates Writing to Partitioned Table Spark Docs by @RussellSpitzer in #7499
  • Infra: Update slack invite link by @ajantha-bhat in #7583
  • Nessie: Bump Nessie dependencies from 0.57.0 to 0.58.1 by @dimas-b in #7579
  • Docs: Add identifier to each Markdown file under docs by @nastra in #7575
  • Core: Check for all specs in partitionsTable by @ajantha-bhat in #7551
  • API, Flink: StructProjection returns null projection object for null nested struct value by @stevenzwu in #7517
  • Build: Upgrade Gradle to 8.1.1 by @XN137 in #7610
  • Core: Remove deprecated AssertHelpers usage in catalog by @liuxiaocs7 in #7596
  • Build: Bump Arrow from 11.0.0 to 12.0.0 by @ajantha-bhat in #7595
  • Core: Remove deprecated AssertHelpers usage by @liuxiaocs7 in #7597
  • Spark: Fix Parquet read benchmarks for Spark 3.3 + 3.4 by @nastra in #7587
  • Docs: Improve readability on page Branching and Tagging by @zhangbutao in #7592
  • Flink: change sink shuffle to use RowData as data type and statistics key type by @stevenzwu in #7494
  • Flink: add toString, equals, hashCode overrides for RowDataProjection. by @stevenzwu in #7493
  • Implement ReadableMetrics for Entries table by @dramaticlly in #7539
  • Add unique JDBC application identifier and user agent header by @manisin in #7580
  • Spark: Remove deprecated VectorizedSparkParquetReaders#buildReader API for 1.3.0 release by @amogh-jahagirdar in #7591
  • Views: Update spec with expectations on versions, representations, and dialects by @wmoustafa in #7500
  • Core: Allow deleting old partition spec columns in V1 by @Fokko in #7398
  • API, Core, Spark: Add file groups failure in rewrite result by @waltczhang in #7361
  • Docs: update documentation site link by @liuxiaocs7 in #7117
  • AWS: Add S3FileIOAwsClientFactory with s3.client-factory-impl catalog property for S3FileIO by @akshayakp97 in #7590
  • Core: Add FileIO tracker/closer to REST catalog by @nastra in #7487
  • API, Core: Expose file and data sequence numbers through ContentFile by @gaborkaszab in #7555
  • Spark 3.4: Avoid local sort for MERGE cardinality check by @aokolnychyi in #7558
  • Spark 3.4: Fixup for RewritePositionDeleteFilesSparkAction by @szehon-ho in #7565
  • Flink: Add retry limit for IcebergSource continuous split planning errors by @pvary in #7571
  • Build: Bump com.fasterxml.jackson.core:jackson-annotations from 2.14.2 to 2.15.0 by @dependabot in #7601
  • Disable Agg push down for incremental scan by @huaxingao in #7626
  • Spark 3.4: Add RewritePositionDeleteFilesProcedure by @szehon-ho in #7572
  • Remove Kyle and add bitsondatadev to collaborators .asf.yaml by @bitsondatadev in #7634
  • Improve Error Handling to map Snowflake Exceptions into Iceberg Exceptions by @AnubhavSiddharth in #6952
  • Flink: backport Add config for max allowed consecutive planning failures in IcebergSource before failing the job (#7571) to 1.16 and 1.15 by @pvary in #7629
  • Flink: backport PR #7494. change sink shuffle to use RowData as data type and statistics key type by @stevenzwu in #7632
  • Flink: backport PR #7493. add toString, equals, hashCode overrides for RowDataProjection by @stevenzwu in #7631
  • Flink: Fixes flink sink failed due to updating partition spec by @ConeyLiu in #7171
  • Core: Allow one data writer in BasePositionDeltaWriter by @aokolnychyi in #7648
  • Spark 3.4: Cosmetic updates for SparkPositionDeltaWrite by @aokolnychyi in #7650
  • Spark-3.4: Fix errorprone warning by @ajantha-bhat in #7654
  • GCP, Pig: Switch tests to JUnit5 by @rakesh-das08 in #7647
  • Spark 3.4: Fix NPE when create branch and tag on table without snapshot by @dramaticlly in #7652
  • Spark 3.4: Split update into delete and insert for position deltas by @aokolnychyi in #7646
  • Parquet: Update parquet to 1.13.1 by @singhpk234 in #7301
  • Spark-3.4: Harmonize RewriteDataFilesSparkAction by @ajantha-bhat in #7630
  • Spark 3.3, 3.4: use a deterministic where condition to make rewrite_data_files… by @ludlows in #6760
  • Spark 3.2: backport Spark SQL extension on create/update/drop tags by @dramaticlly in #7662
  • Spark: Backport fix NPE when create branch and tag on table without snapshot by @dramaticlly in #7659
  • Core: Compacted position delete files should use the max data sequence number of source files by @szehon-ho in #7651
  • Docs: RewritePositionDeleteFiles procedure by @szehon-ho in #7589
  • OpenAPI responses should reference schemas by @snazy in #6699
  • Core, Parquet: Remove Parquet dictionary encoding table property by @amogh-jahagirdar in #7665
  • Build: Bump com.esotericsoftware:kryo-shaded from 4.0.2 to 4.0.3 by @dependabot in #7669
  • Infra: Use the standard shadow plugin by @ajantha-bhat in #7681
  • Spark 3.4: Add TimestampNTZ by @Fokko in #7553
  • Spark 3.3: Backport RewritePositionDeleteFilesSparkAction (#7389) by @szehon-ho in #7684
  • Spark 3.4: Distribution and ordering enhancements by @aokolnychyi in #7637
  • Flink: Port #7171 to flink 1.17 by @ConeyLiu in #7680
  • Flink: Port #7171 to flink 1.15 by @ConeyLiu in #7679
  • Spark 3.3: Avoid local sort for MERGE cardinality check by @aokolnychyi in #7686
  • Spark 3.3: Backport RewritePositionDeleteFilesProcedure (#7572) by @szehon-ho in #7687
  • Nessie: Bump Nessie version from 0.58.1 to 0.59.0 by @ajantha-bhat in #7642
  • Spark 3.3: Harmonize RewriteDataFilesSparkAction by @ajantha-bhat in #7676
  • Core: Minor metadata table code harmonization for readable_metrics by @szehon-ho in #7613

New Contributors

Full Changelog: apache-iceberg-1.2.0...apache-iceberg-1.3.0
