github apache/iceberg apache-iceberg-1.1.0
Apache Iceberg 1.1.0

23 months ago

What's Changed

  • log warning when filter conversion/bind expression fail by @huaxingao in #5254
  • Build: Upgrade test dependencies to latest version by @XN137 in #5210
  • Update License Header by @nastra in #5265
  • Flink: port #4943 to flink 1.13 and 1.14 by @chenjunjiedada in #5263
  • Build: unify github action versions by @XN137 in #5211
  • Build: Exclude unnecessary git properties from iceberg-build.properties by @singhpk234 in #5277
  • only log error message(not the stack trace) if filter conversion/bind expression fail by @huaxingao in #5274
  • [SPARK] Update Spark 2.4 JMH Benchmark Instructions to Updated Module Name iceberg-spark-2.4 by @kbendick in #5189
  • Dell: Fix bugs during documenting by @wang-x-xia in #5059
  • typos depracated to deprecated by @20100507 in #5285
  • Build: Fix Scala 2.13 builds in stage-binaries.sh by @rdblue in #5270
  • Build: Add iceberg-build.properties to RAT excludes by @rdblue in #5262
  • AWS: Dynamo Catalog: Pass CommitFailedException up the stack without wrapping by @waifairer in #5299
  • Build: Use Google Java Format for spotless by @nastra in #5266
  • AWS: handle s3 and glue exceptions more gracefully as user errors by @xingfanx in #5304
  • AWS: Reduce TestS3FileIO prefix test scale factors. Replace with S3FileIO integration tests. by @amogh-jahagirdar in #5289
  • Prevent usage of @Test(expected = ...) and change existing tests by @nastra in #5221
  • API: CloseableIterable.concat evaluates first item twice by @nastra in #5306
  • API: Introduce DefaultMetricsContext and Timer interface by @nastra in #5286
  • Docs: Fix Flink Connector docs with custom catalog by @nastra in #5045
  • Spark: Correct SparkCatalog javadoc for supplying custom catalog by @hrishisd in #5288
  • StreamingDelete constructor can be called by subclasses by @gustavoatt in #5271
  • #5308 - Avoid 2 reads on manifest by @palaniappa in #5309
  • Build: Upgrade slf4j to 1.7.36 by @nastra in #5320
  • Java: Catch NPE if the type isn't set by @Fokko in #5291
  • Build: Upgrade Guava to 31.1-jre by @nastra in #5322
  • Core: Add MetadataLog metadata table by @singhpk234 in #5063
  • Core: Implement BaseMetastoreCatalog.registerTable() by @Mehul2500 in #5037
  • Spec: Add sequence-number and parent-snapshot-id by @Fokko in #5196
  • Build: Let revapi compare API compatibility against apache-iceberg-0.14.0 by @nastra in #5336
  • Build: Use 'apache-iceberg-' tag prefix to figure out SNAPSHOT version by @nastra in #5341
  • Ignore case for partition transform by @southernriver in #5335
  • AWS: Fix #2796 - avoid S3 error of "Resetting to invalid mark" by re-creating input stream on retries by @jfz in #5282
  • Bump zstandard from 0.17.0 to 0.18.0 in /python by @dependabot in #5342
  • Core: Print DateTime strings always with +00:00 Zone offset by @nastra in #5337
  • API: Deprecate Counter#count() / Add Counter#value() by @nastra in #5328
  • AWS: Add LakeFormation Integration tests by @xiaoxuandev in #4423
  • More Python Expressions by @CircArgs in #5258
  • Core: Support _deleted metadata column in vectorized read by @flyrain in #4888
  • Make glue endpoint configurable #5095 by @naushadh in #5330
  • typo mulitple -> multiple by @20100507 in #5354
  • Add table spec changes for statistics information in table snapshot by @findepi in #4945
  • Core: Support building custom tasks in ManifestGroup by @aokolnychyi in #5301
  • Substitute the hard code string for a constants by @zhaomin1423 in #5347
  • Improve bloom filter test by @huaxingao in #5329
  • Spark: Adopt the new Scan Task APIs in Spark Readers by @flyrain in #5248
  • unit test should always verify mock invocation by @abmo-x in #5317
  • AWS - Fix ErrorProne warning of malformed Javadoc by @kbendick in #5359
  • ICEBERG-4346: Better handling of Orphan files by @karuppayya in #4652
  • Spark 3.2: Support different task types in readers by @flyrain in #5363
  • Build: Quicker project evaluation (memoize project version) by @snazy in #5051
  • Nessie: Do not delete default branch in tests by @snazy in #5193
  • Core: Add base implementations for changelog tasks by @aokolnychyi in #5300
  • Build: Enforce spotless & spotlessApply by @nastra in #5312
  • Update spec.md to fix broken link to parquet-format/blob/master/LogicalTypes by @skadyan in #5352
  • Flink: bridge the gap btw FlinkSource and IcebergSource (FLIP-27) and… by @stevenzwu in #5318
  • Flink: port PR #5318 to 1.14 by @stevenzwu in #5344
  • Parquet: Add option to set page row count limit by @bryanck in #5345
  • AWS: Fix S3FileIO#prefixList integration test by @amogh-jahagirdar in #5383
  • Spark 3.3: Add prefix mismatch mode for deleting orphan files by @karuppayya in #5385
  • Core: Remove TestEnvironmentUtil#testEnvironmentSubstitution() as it is bui… by @stevenzwu in #5353
  • API: Track name/unit in Counters/Timers by @nastra in #5386
  • Flink: avoid converting Iceberg MetricContext to Flink metrics in FLI… by @stevenzwu in #5393
  • Bump fastavro from 1.5.3 to 1.5.4 in /python by @dependabot in #5396
  • Github: Add issue form by @Fokko in #4867
  • Infra - Add a GH Action to Mark and Close Stale Issues by @kbendick in #4949
  • Flink: Support write options in the in-line insert SQL comments by @hililiwei in #5050
  • AWS: Call abortUpload only once when any of the completable future fails by @singhpk234 in #5366
  • Spark - Add Spark FunctionCatalog by @kbendick in #5377
  • API: Assign the right field ids when merging schema, #5394 by @karuppayya in #5395
  • [Spark] - Backport FunctionCatalog to Spark 3.2 by @kbendick in #5411
  • Nessie: Bump to 0.40.3 by @snazy in #5406
  • S3OutputStream - failure to close should persist on subsequent close calls by @abmo-x in #5311
  • [Core | Docs]: [FOLLOWUP] Add metadata_log_entries metadata table by @singhpk234 in #5367
  • AWS: S3FileIo Integration test include UUID prefix in Prefix integration tests by @amogh-jahagirdar in #5413
  • Core: Implement default value parsing and unparsing by @rzhang10 in #4871
  • Infra - Upgrade Stale GH Action to Latest 5.1.1 by @kbendick in #5420
  • API/Core: Initial Table Scan Reporting support by @nastra in #5268
  • Core: Simplify scan planning & reporting tests by @nastra in #5428
  • Hive: Fix concurrent transactions overwriting commits by adding hive lock heartbeats. by @SinghAsDev in #5036
  • Hive: Hadoop Path fails on s3 endpoint by @Fokko in #5405
  • API: changes to honour schema field name's case by @karuppayya in #5440
  • Spark: Spark changes to honour schema field name's case by @karuppayya in #5441
  • Core: Implement IncrementalChangelogScan without deletes by @aokolnychyi in #5382
  • Core, API: Add getting refs and snapshot by ref to the Table API by @amogh-jahagirdar in #4428
  • Flink: missed IcebergSourceReader group in PR #5393 for FLIP-27 source reader metrics by @stevenzwu in #5401
  • API: Avoid unnecessary wrapping of CloseableIterable.iterator() by @nastra in #5446
  • Doc: update Flink doc for using the new experimental FLIP-27 source by @stevenzwu in #5423
  • Spark 3.3: Delete file counts while deleting reachable files by @aokolnychyi in #5451
  • Bump coverage from 6.4.2 to 6.4.3 in /python by @dependabot in #5454
  • Move base.py for table to init by @samredai in #5458
  • CI: Enable dependabot for Github Actions by @Fokko in #5429
  • API: Improve/align error messaging in CloseableIterable/CloseableIterator by @nastra in #5433
  • Bump actions/setup-python from 3 to 4 by @dependabot in #5460
  • CI: Enable dependabot for gradle by @nastra in #5464
  • AWS: Cleanup warning about Lambda should be method reference by @amogh-jahagirdar in #5476
  • Bump nebula.dependency-recommender from 9.0.2 to 11.0.0 by @dependabot in #5474
  • Bump pyarrow from 8.0.0 to 9.0.0 in /python by @dependabot in #5455
  • Core: Partition filter pushdown for entries table by @szehon-ho in #5443
  • AWS: Fix/Suppress ErrorProne warnings by @nastra in #5368
  • Bump hiveVersion from 3.1.2 to 3.1.3 by @dependabot in #5470
  • Doc: Update web page of Flink unit test (#5480) by @lvyanquan in #5484
  • Spark 3.3: Use typed beans in BaseSparkAction by @aokolnychyi in #5469
  • Core: Prevent potential NPEs when retrieving JSON fields by @nastra in #5438
  • Spark 3.2: Count delete files in DeleteReachableFiles by @aokolnychyi in #5491
  • Spark 3.2: Use typed beans in BaseSparkAction by @aokolnychyi in #5494
  • AWS: Use executor service by default when performing batch deletion of files by @amogh-jahagirdar in #5379
  • Core, API: Performing operations on a snapshot branch ref by @namrathamyske in #4926
  • Spark: Support truncate in FunctionCatalog by @kbendick in #5431
  • Spark 3.1:Port #3721 to Spark 3.1 by @hililiwei in #5497
  • Spark 3.1:Port #3287 #4381 #3535 #4419 to Spark 3.1 by @hililiwei in #5498
  • Spark 3.1:Port #4198 to Spark 3.1 by @hililiwei in #5499
  • Spark 3.1:Port #3373 to Spark 3.1 by @hililiwei in #5500
  • Spark 3.1:Port #3491 to Spark 3.1 by @hililiwei in #5502
  • Spark 3.1:Port #3456 to Spark 3.1 by @hililiwei in #5501
  • Spark 3.2: Support truncate in FunctionCatalog by @kbendick in #5514
  • Build: Bump spotless-plugin-gradle from 6.8.0 to 6.9.1 by @dependabot in #5521
  • Build: Bump pydantic from 1.9.1 to 1.9.2 in /python by @dependabot in #5522
  • SetSnapshotOperation should commit empty operations too by @szlta in #5536
  • AWS: Support preload S3 client mode for S3FileIO by @xiaoxuandev in #5508
  • Build: Resolve unchecked Map type cast in TestAvroNameMapping by @JonasJ-ap in #5541
  • Fix linter and test failures by @samredai in #5542
  • Flink 1.13&1.14: Port #5050 to Flink 1.13&1.14 by @hililiwei in #5531
  • API: Deprecate generic Counter and replace with simpler Counter API by @nastra in #5505
  • API/Core: Scan reporting result wrappers and parsers by @nastra in #5427
  • Build: Bump gradle-git-version from 0.12.3 to 0.15.0 by @nastra in #5532
  • Core: Put property names at the end in JsonUtil error messages by @nastra in #5434
  • Replace deprecated Counter with new Counter API by @nastra in #5506
  • Spark 3.1:Port #3505 to Spark 3.1 by @hililiwei in #5503
  • Flink - Suppress Nanosecond Warning for TimestampTz ORC writer by @kbendick in #5552
  • Core: Add some tests for JsonUtil & reduce duplicated code by @nastra in #5526
  • Add s3.acceleration-enabled flag to AwsProperties by @price-qian in #5555
  • Spark 3.3: Reduce serialization in DeleteOrphanFilesSparkAction by @aokolnychyi in #5495
  • API: Remove counter name by @nastra in #5559
  • Spark 3.3 - Support bucket in FunctionCatalog by @kbendick in #5513
  • Spark 3.2: Support bucket in FunctionCatalog by @kbendick in #5571
  • Core: Don't clear snapshot log when intermediate snapshots are detected by @nastra in #5568
  • Flink: fix the bug where metrics are registered in split reader. Also updated reader metric group to be more consistent with Flink metrics style. by @stevenzwu in #5554
  • Spark 3.3: Align formatting in bucket and truncate functions by @aokolnychyi in #5573
  • Spark 3.2: Reduce serialization in DeleteOrphanFilesSparkAction by @aokolnychyi in #5572
  • ORC: Upgrade to 1.7.6 by @williamhyun in #5580
  • Spark 3.2: Delete deprecated action classes by @aokolnychyi in #5575
  • Build: Bump gradle-processors from 3.3.0 to 3.7.0 by @dependabot in #5582
  • Spark 3.2: Align formatting in bucket and truncate functions by @kbendick in #5574
  • Core: Make a shorthand for the rest catalog by @Fokko in #5570
  • Flink: add monitor metrics for Flink sink by @stevenzwu in #5410
  • API: add Histogram metric type by @stevenzwu in #5348
  • Flink: port PR #5410 to 1.14 for sink monitoring metrics by @stevenzwu in #5589
  • Build: Bump jackson-annotations from 2.6.5 to 2.13.3 by @dependabot in #5596
  • Build: Bump coverage from 6.4.3 to 6.4.4 in /python by @dependabot in #5599
  • Add Fokko as a collaborator by @Fokko in #5600
  • Build: enforce LambdaMethodReference check at compile-time by @XN137 in #5529
  • API: Extend FileIO in optional interfaces by @aokolnychyi in #5576
  • Flink - Fix Malformed Inline Tag in ContinuousSplitPlannerImpl JavaDoc by @kbendick in #5551
  • Core: Add expression JSON parser by @rdblue in #5602
  • Deps: Bump AWS SDK by @Fokko in #5612
  • Build: Bump tezVersion from 0.10.1 to 0.10.2 by @dependabot in #5520
  • Docs: Flink Streaming upsert write by @hililiwei in #5380
  • Fix message pattern in checkArgument invocation by @findepi in #5621
  • Core: Add snapshot references metadata table by @rajarshisarkar in #4807
  • Add table metadata changes for statistics information in table metadata by @findepi in #5450
  • Docs: Added missing doc for REPLACE PARTITION FIELD by @dotjdk in #5624
  • Core: Transform parquet bloom filter props when updating schema. by @zhongyujiang in #5426
  • AWS: Deprecate AwsClientFactories.s3Configuration() by @price-qian in #5592
  • Add SparkV2Filters by @huaxingao in #5302
  • API: Deprecate old incremental append scans by @aokolnychyi in #5577
  • Remove deprecations for Rollback and Overwrite Files by @danielcweeks in #5639
  • Core: Use Bulk Delete when dropping table data and metadata by @amogh-jahagirdar in #5459
  • Deprecations for 1.0 release: MR properties by @danielcweeks in #5657
  • Deprecations for 1.0 release: Aliyun OSS by @danielcweeks in #5654
  • Build: Bump spotless-plugin-gradle from 6.9.1 to 6.10.0 by @dependabot in #5650
  • Docs: Switch post- and pre- around by @Fokko in #5633
  • Deprecations for 1.0 release: remove dynamo lock manager and props by @danielcweeks in #5655
  • AWS: Add s3.dualstack-enabled flag to AwsProperties by @JonasJ-ap in #5644
  • Add API changes for statistics information in table metadata by @findepi in #5021
  • AWS: fix the wrong flag used for s3UseArnRegionEnabled by @JonasJ-ap in #5680
  • Spark: Add Changelog reader for copy-on-write by @flyrain in #5578
  • Spark 3.2: Add row-based changelog reader by @flyrain in #5682
  • [Python] FsspecFileIO, a FileIO that wraps any fsspec compliant filesystem by @samredai in #5332
  • Core: Fix exception handling in BaseTaskWriter by @rdblue in #5683
  • Support delete corrupted Iceberg table by @yabola in #5510
  • [Core | Spark | Integrations] : Fix kryo serialization failure for FileIO by @singhpk234 in #5437
  • Parquet: close zstd input stream early to avoid memory pressure by @bryanck in #5681
  • Spark: Fix stats in rewrite metadata action by @rdblue in #5691
  • Docs: Update docs to reflect AWS SDK version presently being used by @singhpk234 in #5661
  • Doc: Update doc to display the results of the table partitions query by @lvyanquan in #5662
  • Core: Add CommitStateUnknownException handling to REST by @rdblue in #5694
  • API: Remove source type from Transform by @rdblue in #5601
  • Spark: Add custom metric for number of deletes applied by a SparkScan by @wypoon in #4588
  • Flink: fix missing generic types for some IcebergSource$Builder methods by @stevenzwu in #5697
  • API/Core: Include Expression filter in ScanReport by @nastra in #5705
  • Bump avro from 1.9.2/1.10.2 to 1.11.1 by @nastra in #5483
  • Core: Avoid useless metadata retries. by @rdblue in #5696
  • Build: Bump pytest from 7.1.2 to 7.1.3 in /python by @dependabot in #5703
  • Build: Bump jackson-annotations from 2.13.3 to 2.13.4 by @dependabot in #5702
  • Build: Bump jmh-gradle-plugin from 0.6.6 to 0.6.7 by @dependabot in #5700
  • Update ORC to 1.8.0 by @williamhyun in #5699
  • Build: Enforce logging conventions with errorprone by @XN137 in #5528
  • Build: Upgrade to Gradle 7.5.1 by @XN137 in #5278
  • Flink: Fixed an issue where Flink batch entry was not accurate by @xuzhiwen1255 in #5642
  • Dell: Add document. by @wang-x-xia in #4993
  • Nessie: Prevent accidental deletion of files which are still referenced by other branches/tags by @ajantha-bhat in #5718
  • Flink: Fixed an issue where Flink1.14 batch entry was not accurate by @xuzhiwen1255 in #5716
  • API: Add rowsCount to ScanTask by @aokolnychyi in #5720
  • Docs: Add snapshot references metadata table by @rajarshisarkar in #5725
  • Flink: Fixed an issue where Flink1.13 batch entry was not accurate by @xuzhiwen1255 in #5731
  • Build: Bump fastavro from 1.6.0 to 1.6.1 in /python by @dependabot in #5745
  • Build: Bump pydantic from 1.10.1 to 1.10.2 in /python by @dependabot in #5744
  • API: Use hashCode instead of hash by @Fokko in #5751
  • AWS: Preload S3 client in GlueCatalog For LakeFormation enabled tables by @xiaoxuandev in #5756
  • Build - Remove unused global flink dependency from versions.props by @kbendick in #5758
  • Build - Move global Spark 2.4 dependency in version.props to Spark 2.4 subproject by @kbendick in #5759
  • CI: Fix names and jobs by @Fokko in #5749
  • JdbcCatalog don't override namespace location if set by @danielcweeks in #5737
  • Spark: Fix runtime jars packaging scala library files by @ajantha-bhat in #5754
  • Build: relocate httpclient5 dependency for runtime jars by @ajantha-bhat in #5761
  • AWS: Refactor util methods for applying AWS clients configurations by @JonasJ-ap in #5684
  • Bump actions/stale from 5.1.1 to 5.2.0 by @dependabot in #5785
  • Bump spotless-plugin-gradle from 6.10.0 to 6.11.0 by @dependabot in #5786
  • PyArrow support for S3/S3A with properties by @joshuarobinson in #5747
  • REST: implement handling of OAuth error responses by @bryanck in #5698
  • AWS: Allow users to set the assume role session name by @JonasJ-ap in #5765
  • Revert "REST: implement handling of OAuth error responses (#5698)" by @danielcweeks in #5810
  • Flink 1.14&1.15 backport: Set custom Hadoop configuration by @lvyanquan in #5775
  • API/Core: Make ScanReport and its related classes Immutable by @nastra in #5780
  • API: Remove unneeded class variable by @Fokko in #5805
  • Core: Serialize statistics files in TableMetadata by @findepi in #5799
  • Core: Reduce duplicated code in JSON Parsers by @nastra in #5802
  • API,Core: Add scan planning metrics for skipped data/delete files by @nastra in #5788
  • Github: Update issue template with latest release by @Fokko in #5818
  • Core: Use JsonUtil.generate in ErrorResponseParser by @nastra in #5816
  • Build: Fix CI paths by @Fokko in #5821
  • Build: Add the path to the Action yaml by @Fokko in #5828
  • Build: Apply spotless on integration modules as well by @nastra in #5827
  • Don't check row filter when deciding whether to copy data file with stats by @manuzhang in #5815
  • Add a BoundBooleanExpressionVisitor for visiting bound expressions by @samredai in #5303
  • REST: implement handling of OAuth error responses followup by @bryanck in #5820
  • Add REST Servlet/Server Implementations by @danielcweeks in #5781
  • AWS: update AWS Integration Test to fix false positives by @JonasJ-ap in #5784
  • API/Core: Remove deprecated methods from Snapshot API by @nastra in #5734
  • Build: Bump Rat to 0.15 by @Fokko in #5839
  • AWS: Add socket connection timeout for Apache Http Builder by @JonasJ-ap in #5787
  • Core: Add strict-mode property to JDBC Catalog by @nastra in #5830
  • core: Provide mechanism to cache manifest file content by @rizaon in #4518
  • Support setting table statistics by @findepi in #5794
  • Core: Ignore TestManifestCaching#testWeakFileIOReferenceCleanUp until it's fixed by @nastra in #5865
  • Spark: Fix MERGE INTO Query failure on tables with non-nullable columns by @singhpk234 in #5679
  • Docs: Make it clear metadata tables support time travel in Spark by @liuml07 in #4709
  • Doc: Update output of expire_snapshots procedure by @lvyanquan in #5866
  • Ensure the default value of hive.in.test to avoid overwriting by @viirya in #5844
  • API: Extended some deprecation comments in API folder by @gaborkaszab in #5726
  • Core: Deprecate functions in TableMetadata and DataWriter by @gaborkaszab in #5772
  • Core: Deprecate functions in DeleteWriters by @gaborkaszab in #5771
  • AWS: Add table and namespace S3 tags by @rajarshisarkar in #4402
  • Core: Avoid extra getFileStatus call in HadoopInputFile by @singhpk234 in #5864
  • Orc: Closes #5777 - Obtain ORC stripe offsets from writer by @pavibhai in #5778
  • Flink: add defensive check in IcebergFilesCommitter for restoring state by @stevenzwu in #5873
  • Spark 3.x: Backport snapshot references metadata table test by @rajarshisarkar in #5806
  • Build: Fix & Run spark integration tests on CI by @nastra in #5819
  • API,Core: Add scan planning metrics for scanned/skipped delete manifests by @nastra in #5792
  • Doc: Update the default value of table property read.parquet.vectorization.enabled by @Kontinuation in #5776
  • Bump Nessie to 0.43.0 by @snazy in #5807
  • Doc: Update default values of Lock catalog properties to avoid wrong way of filling. by @lvyanquan in #5708
  • Build: Bump hadoop-client from 3.1.0 to 3.3.4 by @dependabot in #5519
  • Spark: Bump Spark version for vulnerability by @deadwind4 in #5292
  • Expose table statistics in Table API by @findepi in #4741
  • [Python][Docs] Very small formatting fix by @samredai in #5868
  • Build: workflows cache gradle wrapper by @XN137 in #4165
  • API,Core: Add scan planning metrics for indexed/eq/pos delete files by @nastra in #5809
  • Build: Bump gradle-baseline-java from 4.0.0 to 4.42.0 by @nastra in #5530
  • Docs: Add table and namespace S3 tags doc by @rajarshisarkar in #5894
  • Retain table statistics during orphan files removal by @findepi in #5795
  • [Docs] Update drop table behavior in spark-ddl docs by @sumeetgajjar in #5645
  • Spark 3.3: Fix failing jmh benchmarks under org.apache.iceberg.spark.data.parquet package by @sumeetgajjar in #5635
  • Core: Only validate the current partition specs by @Fokko in #5707
  • Core: Add RESTScanReporter to send scan report to REST endpoint by @nastra in #5407
  • Build: Bump jmh-gradle-plugin from 0.6.7 to 0.6.8 by @dependabot in #5850
  • Build: Bump actions/stale from 5.2.0 to 6.0.0 by @dependabot in #5851
  • Build: Bump jinja2 from 3.0.3 to 3.1.2 in /python by @dependabot in #5849
  • Build: Bump coverage from 6.4.4 to 6.5.0 in /python by @dependabot in #5904
  • Build: Bump rich from 12.5.1 to 12.6.0 in /python by @dependabot in #5905
  • Build: Bump pytest-checkdocs from 2.7.1 to 2.8.1 in /python by @dependabot in #5903
  • Core: Rename misleading local variable in planFiles() by @gaborkaszab in #5889
  • INFRA: Avoid running engine tests on ISSUE_TEMPLATE update by @singhpk234 in #5859
  • Core, API: Support scanning from refs by @amogh-jahagirdar in #5364
  • Spark: Set the version explicitly by @Fokko in #5907
  • API: Make COUNT default unit when creating a Counter by @nastra in #5912
  • Core: Reuse PositionDelete by @nastra in #5896
  • Spark 3.3: Fix nullability in merge-on-read projections by @aokolnychyi in #5880
  • Spark 3.2: Fix nullability in merge-on-read projections by @aokolnychyi in #5917
  • Replace & Ban ExpectedException usage by @nastra in #5921
  • API: Handle negative/zero during num-digits calculation by @nastra in #5928
  • Core: Provide better error message on invalid enums by @nastra in #5910
  • Reduce 'Scanning table' log verbosity for long IN list by @findepi in #5908
  • Core: Deprecate write.manifest-lists.enabled flag by @nastra in #5773
  • Spark 3.3: Add SparkChangelogTable by @aokolnychyi in #5740
  • Core: Add dataSequenceNumber to ManifestEntry by @aokolnychyi in #5913
  • AWS: Add socket connection timeout for UrlConnectionHttpClient by @JonasJ-ap in #5900
  • AWS: Add additional configurations for ApacheHttpClientBuilder by @JonasJ-ap in #5899
  • Docs: Add doc for HTTP client configurations by @JonasJ-ap in #5902
  • Build: Bump actions/stale from 6.0.0 to 6.0.1 by @dependabot in #5940
  • Build: Bump pytest-checkdocs from 2.8.1 to 2.9.0 in /python by @dependabot in #5941
  • Core: Deflake TestManifestCaching.testWeakFileIOReferenceCleanUp by @rizaon in #5862
  • AWS: Fix NotSerializableException when using AssumeRoleAwsClientFactory in Spark by @JonasJ-ap in #5939
  • API: Provide better error message for invalid FileFormat enum by @nastra in #5918
  • Api: Optimize the code by @linfey90 in #5733
  • Docs: the table name should be the same as sql create table name by @mggger in #5962
  • Core: Make testEnvironmentSubstitution effective when USER is not set by @dimas-b in #5770
  • API: Fix estimated row count in ContentScanTask by @wypoon in #5755
  • Core: Clear queue and future task when close ParallelIterable by @Heltman in #5887
  • Core: Expire Snapshots reachability analysis by @amogh-jahagirdar in #5669
  • Spark 3.3: Split SparkScan and SparkBatch by @aokolnychyi in #5934
  • Core/Spark: Fix kryo deserialization of SerializableTable by @Kontinuation in #5975
  • Flink: revise unit test of FlinkUpsert so the table is partitioned by date by @lvyanquan in #5486
  • Spark: Improve performance of expire snapshot by not double-scanning retained Snapshots by @szehon-ho in #3457
  • Docs: Fix incorrect glue catalog class name for Hive by @singhpk234 in #5973
  • Core: Fix confusing log from RemoveSnapshots by @ajantha-bhat in #5478
  • API: Add BatchScan to Table by @aokolnychyi in #5922
  • Docs: Typo in loading table from DataFrameReader by @szehon-ho in #5978
  • Api: Fix transforms.day() returns a format document and javadoc by @xuzhiwen1255 in #5980
  • AwsProperties prints format specifier in IllegalArgumentException message by @szlta in #5995
  • Spark: Fix DATE_ADD expression in IcebergSourceFlatParquetDataWriteBenchmark by @dramaticlly in #5991
  • Support performing merge appends and delete files on branches by @amogh-jahagirdar in #5618
  • Bump Nessie from 0.43.0 to 0.44.0 by @snazy in #6008
  • Doc: Fix typos related to date transforms by @fb913bf0de288ba84fe98f7a23d35edfdb22381 in #5992
  • Spark: Remove backup table after a successful migrate action. by @sririshindra in #5622
  • Core: Fix NPE for parent snapshot does not exist by @hililiwei in #6005
  • Flink: Fix NoClassDefFound with Flink runtime jar / Add integration test by @nastra in #6001
  • Spark 3.2: Use ScanTaskGroup methods when computing stats by @aokolnychyi in #6011
  • Spark 3.2: Add SparkChangelogTable by @aokolnychyi in #6013
  • Spark 3.2: Remove redundant imports in SparkScan by @aokolnychyi in #6016
  • Core: Fix TestSnapshotUtil time random disorder by @hililiwei in #6015
  • Spark 3.2: Split SparkScan and SparkBatch by @aokolnychyi in #6014
  • Core: Parallelize the determining of reachable manifests during file cleanup by @amogh-jahagirdar in #5981
  • Orc: Support row group bloom filters by @deadwind4 in #5313
  • Core,Spark: Refactor to move "copy-on-write" and "merge-on-read" literals to constants by @gaborkaszab in #6006
  • [python_legacy] BOTO_STS_CLIENT lazy initialization by @puchengy in #5930
  • Core: Don't fail scan planning if REST metric reporting fails by @nastra in #6023
  • Nessie: no longer push whole metadata JSON to Nessie by @snazy in #5999
  • Core: Deprecate HTTPClientFactory / Allow configuring ObjectMapper for HTTPClient by @nastra in #5998
  • Closes #5988 - Allow configuration of Hive MetastoreClient using Catalog properties by @pavibhai in #5989
  • docs:Add an example of CTAS with PARTITIONED BY (rebased, fix #3854) by @samredai in #6020
  • Hive: Set the Table owner on table creation by @gaborkaszab in #5763
  • Replace Assert.fail usage with AssertJ fluent testing by @nastra in #6029
  • Replace and ban hamcrest usage by @nastra in #6030
  • API: Update expression sanitization for relative dates and times by @rdblue in #5944
  • Core: Rename TableTestBase.Assertions to not conflict with AssertJ Assertions by @nastra in #6022
  • Add section on semantic versioning and deprecations by @danielcweeks in #6032
  • Core: Increase inferred column metrics limit to 100 by @rdblue in #5916
  • Build: Bump mkdocs from 1.3.1 to 1.4.1 in /python by @dependabot in #6033
  • API,Core: Move ScanReport to core module / extract TimerResult/CounterResult/ScanMetricsResult into own classes by @nastra in #6037
  • Spark 3.3: Ensure rowStartPosInBatch in ColumnarBatchReader is set correctly by @wypoon in #6026
  • Spark 3.2: Ensure rowStartPosInBatch in ColumnarBatchReader is set correctly by @wypoon in #6041
  • add Aggregate Expressions by @huaxingao in #5961
  • Flink: Add Sink options to override the compression properties of the Table by @pvary in #6049
  • Core: Add file seq number to ManifestEntry by @aokolnychyi in #6002
  • Spark 3.1: Ensure rowStartPosInBatch in ColumnarBatchReader is set correctly by @wypoon in #6046
  • Core: Replace projected Schema with schemaId/fieldIds/fieldNames in ScanReport by @nastra in #6047
  • Spark 3.2: #6041 follow-up/cleanup by @wypoon in #6063
  • Build: Update Spark to 3.3.1 by @wangyum in #5783
  • Build: Bump pytest from 7.1.3 to 7.2.0 in /python by @dependabot in #6080
  • Build: Bump pyarrow from 9.0.0 to 10.0.0 in /python by @dependabot in #6081
  • Build: Bump zstandard from 0.18.0 to 0.19.0 in /python by @dependabot in #6082
  • PyArrow should convert timestamps to microseconds. by @joshuarobinson in #6070
  • Spark 3.3: Use separate scan during file filtering in copy-on-write operations by @aokolnychyi in #6077
  • Spark: Remove redundant check for max_concurrent_deletes in spark actions by @ajantha-bhat in #6083
  • Infra: Publish nightly build for Spark-3.3_2.13 by @ajantha-bhat in #6054
  • Infra: Update slack invite link by @ajantha-bhat in #6052
  • Docs: Fix link in the Java Custom Catalog page by @Jonathan-Rosenberg in #6068
  • Infra: Add 1.0.0 in issue template dropdown by @ajantha-bhat in #6057
  • Flink: Remove Flink 1.13 by @hililiwei in #6103
  • Core,Spark: Fix raw generics usage of ManifestWriter by @nastra in #6059
  • Spark 3.2: Use separate scan during file filtering in copy-on-write ops by @aokolnychyi in #6095
  • Spark 3.3: Relocate all Netty dependencies by @aokolnychyi in #6107
  • Spark 3.2: Relocate all Netty classes by @aokolnychyi in #6109
  • Spark: Optimize Preconditions.checkArgument in procedures by @ajantha-bhat in #6096
  • Docs: Update spotless apply command for non-default versions by @ajantha-bhat in #6101
  • Core: Improve collection handling in JsonUtil by @nastra in #6051
  • Build: Add gaborkaszab as a collaborator by @gaborkaszab in #6036
  • Flink: Add support for Flink 1.16 by @hililiwei in #6092
  • Core: Avoid reading ManifestFile when create ManifestReader by @ConeyLiu in #5632
  • Struct fields should be provided to Schema constructor by @ddrinka in #6115
  • Remove Fokko from the list of collaborators by @Fokko in #6119
  • Use Java collections in AwsProperties to fix Kryo serialization. by @jfz in #5812
  • [Docs] Update migrate behaviour with respect to drop_table in spark-procedures docs. by @sririshindra in #6025
  • [Core | Spark] Strip trailing slash from custom metadatalocation by @singhpk234 in #6121
  • Build: Bump mkdocs from 1.4.1 to 1.4.2 in /python by @dependabot in #6130
  • API: Hash floats -0.0 and 0.0 to the same bucket by @fb913bf0de288ba84fe98f7a23d35edfdb22381 in #6110
  • Spark-3.0: Remove/update spark-3.0 mention from Docs and Builds by @ajantha-bhat in #6093
  • Support 2-level list and maps type in RemoveIds. by @SinghAsDev in #6064
  • Fix TestAggregateBinding by @huaxingao in #6065
  • SparkBatchQueryScan logs too much - #6106 by @Omega359 in #6108
  • Fix typo in _ManifestEvalVisitor.visit_equal by @ddrinka in #6117
  • Flink: Optimize test code of TestSourceUtil by @lvyanquan in #6143
  • Spark-3.0: Remove spark/v3.0 folder by @ajantha-bhat in #6094
  • Fixes read metadata table failed due to illegal character by @ConeyLiu in #4577
  • Core: Pass purgeRequested flag to REST server by @nastra in #6073
  • Build: Let revapi compare API compatibility against apache-iceberg-1.0.0 by @ajantha-bhat in #6053
  • Core: Rename HMS_TABLE_OWNER to follow naming convention by @gaborkaszab in #6154
  • Docs: Update spotless apply command by @lvyanquan in #6157
  • Nessie: Use unique path for different table with same name by @ajantha-bhat in #4826
  • Spark Integration to read from Snapshot ref by @namrathamyske in #5150
  • Cache dropStats result for ManifestReader iterator by @manuzhang in #5836
  • Core: Reduce code duplication around writing JSON collections by @nastra in #6113
  • Core: Sync client/server properties in REST catalog by @rdblue in #6150
  • Flink: Port #6049 to Flink 1.14 to add Sink options of compression properties by @lvyanquan in #6166
  • Build: Bump jackson-annotations from 2.13.4 to 2.14.0 by @dependabot in #6129
  • Build: Add -DallVersions property that exposes all component versions by @nastra in #6167
  • Core,Spark: Add metadata to Scan Report by @nastra in #6058
  • Fix typo in unused python iceberg parameter by @alec-heif in #6173
  • AWS: Fix catalog names in LakeFormationTestBase by @aajisaka in #5767
  • Spark: Backport setting the EnvironmentContext for Spark by @nastra in #6183
  • Flink: Add engine name/version to EnvironmentContext by @nastra in #6184
  • Core: Add Iceberg version to EnvironmentContext by @nastra in #6185
  • Core: Add a util method to combine tasks by partition by @sunchao in #2276
  • Spark: Fix QueryFailure when running RewriteManifestProcedure on Date partitioned table by @singhpk234 in #5860
  • Build: Enable revapi on core/parquet/orc/common/data modules & fix API breaks by @nastra in #6146
  • Spark 3.3: Preserve file seq numbers while rewriting manifests by @aokolnychyi in #6176
  • Docs: fix link of Write options in Flink by @lvyanquan in #6191
  • Core: Remove unused toTaskGroupStream from TableScanUtil by @sunchao in #6189
  • Spark 3.2: Preserve file seq numbers while rewriting manifests by @aokolnychyi in #6192
  • Spark 3.1: Preserve file seq numbers while rewriting manifests by @aokolnychyi in #6193
  • Flink: Add 'cache.expiration-interval-ms' option to FlinkCatalog by @lvyanquan in #6111
  • Core: Method for building grouping key type by @aokolnychyi in #6163
  • Revert "Hive: Forward catalog-specific Hive configuration properties … by @pavibhai in #6187
  • Core: Add time zone info to LocalDate in ExpressionUtil tests by @nastra in #6200
  • REST: Assign metadata UUID on create transaction by @bryanck in #6201
  • API, Core: Move micros and days conversions to DateTimeUtil by @aokolnychyi in #6199
  • Core: Remove redundant initialization by @krvikash in #6178
  • Extract Flink package version programmatically for EnvironmentContext… by @stevenzwu in #6206
  • Flink: Add unit test for FlinkPackage util class by @stevenzwu in #6213
  • Parquet: Fixes get null values for the nested field partition column by @ConeyLiu in #4627
  • API: Make the PartitionSpec less lazy by @Fokko in #6220
  • Spark: Add missing override by @Fokko in #6227
  • API: Ignore case when comparing truncate by @Fokko in #6226
  • Release: Fix the version template by @Fokko in #6195
  • Replace ImmutableMap.Builder.build() with buildOrThrow() by @krvikash in #6212
  • Allow dropping a column used by old SortOrders but not current SortOrder by @islamismailov in #6211
  • Nessie: Refactor NessieTableOperations#doCommit by @ajantha-bhat in #6240
  • API: Restore the type of the identity transform by @Fokko in #6242

New Contributors

  • @waifairer made their first contribution in #5299
  • @hrishisd made their first contribution in #5288
  • @palaniappa made their first contribution in #5309
  • @Mehul2500 made their first contribution in #5037
  • @naushadh made their first contribution in #5330
  • @abmo-x made their first contribution in #5317
  • @skadyan made their first contribution in #5352
  • @lvyanquan made their first contribution in #5484
  • @namrathamyske made their first contribution in #4926
  • @price-qian made their first contribution in #5555
  • @dotjdk made their first contribution in #5624
  • @yabola made their first contribution in #5510
  • @xuzhiwen1255 made their first contribution in #5642
  • @joshuarobinson made their first contribution in #5717
  • @rizaon made their first contribution in #4518
  • @viirya made their first contribution in #5844
  • @gaborkaszab made their first contribution in #5726
  • @pavibhai made their first contribution in #5778
  • @Kontinuation made their first contribution in #5776
  • @linfey90 made their first contribution in #5733
  • @mggger made their first contribution in #5962
  • @Heltman made their first contribution in #5887
  • @fb913bf0de288ba84fe98f7a23d35edfdb22381 made their first contribution in #5992
  • @wangyum made their first contribution in #5783
  • @Jonathan-Rosenberg made their first contribution in #6068
  • @ddrinka made their first contribution in #6115
  • @Omega359 made their first contribution in #6108
  • @hendrikmakait made their first contribution in #6135
  • @foarsitter made their first contribution in #6158
  • @alec-heif made their first contribution in #6173
  • @aajisaka made their first contribution in #5767
  • @krvikash made their first contribution in #6178
  • @LuigiCerone made their first contribution in #6159
  • @islamismailov made their first contribution in #6211

Full Changelog: apache-iceberg-0.14.0...apache-iceberg-1.1.0
