apache/iceberg apache-iceberg-1.1.0 on GitHub

What's Changed

log warning when filter conversion/bind expression fail by @huaxingao in #5254
Build: Upgrade test dependencies to latest version by @XN137 in #5210
Update License Header by @nastra in #5265
Flink: port #4943 to flink 1.13 and 1.14 by @chenjunjiedada in #5263
Build: unify github action versions by @XN137 in #5211
Build: Exclude unnecessary git properties from iceberg-build.properties by @singhpk234 in #5277
only log error message(not the stack trace) if filter conversion/bind expression fail by @huaxingao in #5274
[SPARK] Update Spark 2.4 JMH Benchmark Instructions to Updated Module Name iceberg-spark-2.4 by @kbendick in #5189
Dell: Fix bugs during documenting by @wang-x-xia in #5059
typos depracated to deprecated by @20100507 in #5285
Build: Fix Scala 2.13 builds in stage-binaries.sh by @rdblue in #5270
Build: Add iceberg-build.properties to RAT excludes by @rdblue in #5262
AWS: Dynamo Catalog: Pass CommitFailedException up the stack without wrapping by @waifairer in #5299
Build: Use Google Java Format for spotless by @nastra in #5266
AWS: handle s3 and glue exceptions more gracefully as user errors by @xingfanx in #5304
AWS: Reduce TestS3FileIO prefix test scale factors. Replace with S3FileIO integration tests. by @amogh-jahagirdar in #5289
Prevent usage of @test(expected = ...) and change existing tests by @nastra in #5221
API: CloseableIterable.concat evaluates first item twice by @nastra in #5306
API: Introduce DefaultMetricsContext and Timer interface by @nastra in #5286
Docs: Fix Flink Connector docs with custom catalog by @nastra in #5045
Spark: Correct SparkCatalog javadoc for supplying custom catalog by @hrishisd in #5288
StreamingDelete constructor can be called by subclasses by @gustavoatt in #5271
#5308 - Avoid 2 reads on manifest by @palaniappa in #5309
Build: Upgrade slf4j to 1.7.36 by @nastra in #5320
Java: Catch NPE if the type isn't set by @Fokko in #5291
Build: Upgrade Guava to 31.1-jre by @nastra in #5322
Core: Add MetadataLog metadata table by @singhpk234 in #5063
Core: Implement BaseMetastoreCatalog.registerTable() by @Mehul2500 in #5037
Spec: Add sequence-number and parent-snapshot-id by @Fokko in #5196
Build: Let revapi compare API compatibility against apache-iceberg-0.14.0 by @nastra in #5336
Build: Use 'apache-iceberg-' tag prefix to figure out SNAPSHOT version by @nastra in #5341
Ignore case for partition transform by @southernriver in #5335
AWS: Fix #2796 - avoid S3 error of "Resetting to invalid mark" by re-creating input stream on retries by @jfz in #5282
Bump zstandard from 0.17.0 to 0.18.0 in /python by @dependabot in #5342
Core: Print DateTime strings always with +00:00 Zone offset by @nastra in #5337
API: Deprecate Counter#count() / Add Counter#value() by @nastra in #5328
AWS: Add LakeFormation Integration tests by @xiaoxuandev in #4423
More Python Expressions by @CircArgs in #5258
Core: Support _deleted metadata column in vectorized read by @flyrain in #4888
Make glue endpoint configurable #5095 by @naushadh in #5330
typo mulitple -> multiple by @20100507 in #5354
Add table spec changes for statistics information in table snapshot by @findepi in #4945
Core: Support building custom tasks in ManifestGroup by @aokolnychyi in #5301
Substitute the hard code string for a constants by @zhaomin1423 in #5347
Improve bloom filter test by @huaxingao in #5329
Spark: Adopt the new Scan Task APIs in Spark Readers by @flyrain in #5248
unit test should always verify mock invocation by @abmo-x in #5317
AWS - Fix ErrorProne warning of malformed Javadoc by @kbendick in #5359
ICEBERG-4346: Better handling of Orphan files by @karuppayya in #4652
Spark 3.2: Support different task types in readers by @flyrain in #5363
Build: Quicker project evaluation (memoize project version) by @snazy in #5051
Nessie: Do not delete default branch in tests by @snazy in #5193
Core: Add base implementations for changelog tasks by @aokolnychyi in #5300
Build: Enforce spotless & spotlessApply by @nastra in #5312
Update spec.md to fix broken link to parquet-format/blob/master/LogicalTypes by @skadyan in #5352
Flink: bridge the gap btw FlinkSource and IcebergSource (FLIP-27) and… by @stevenzwu in #5318
Flink: port PR #5318 to 1.14 by @stevenzwu in #5344
Parquet: Add option to set page row count limit by @bryanck in #5345
AWS: Fix S3FileIO#prefixList integration test by @amogh-jahagirdar in #5383
Spark 3.3: Add prefix mismatch mode for deleting orphan files by @karuppayya in #5385
Core: Remove TestEnvironmentUtil#testEnvironmentSubstitution() as it is bui… by @stevenzwu in #5353
API: Track name/unit in Counters/Timers by @nastra in #5386
Flink: avoid converting Iceberg MetricContext to Flink metrics in FLI… by @stevenzwu in #5393
Bump fastavro from 1.5.3 to 1.5.4 in /python by @dependabot in #5396
Github: Add issue form by @Fokko in #4867
Infra - Add a GH Action to Mark and Close Stale Issues by @kbendick in #4949
Flink: Support write options in the in-line insert SQL comments by @hililiwei in #5050
AWS: Call abortUpload only once when any of the completable future fails by @singhpk234 in #5366
Spark - Add Spark FunctionCatalog by @kbendick in #5377
API: Assign the right field ids when merging schema, #5394 by @karuppayya in #5395
[Spark] - Backport FunctionCatalog to Spark 3.2 by @kbendick in #5411
Nessie: Bump to 0.40.3 by @snazy in #5406
S3OutputStream - failure to close should persist on subsequent close calls by @abmo-x in #5311
[Core | Docs]: [FOLLOWUP] Add metadata_log_entries metadata table by @singhpk234 in #5367
AWS: S3FileIo Integration test include UUID prefix in Prefix integration tests by @amogh-jahagirdar in #5413
Core: Implement default value parsing and unparsing by @rzhang10 in #4871
Infra - Upgrade Stale GH Action to Latest 5.1.1 by @kbendick in #5420
API/Core: Initial Table Scan Reporting support by @nastra in #5268
Core: Simplify scan planning & reporting tests by @nastra in #5428
Hive: Fix concurrent transactions overwriting commits by adding hive lock heartbeats. by @SinghAsDev in #5036
Hive: Hadoop Path fails on s3 endpoint by @Fokko in #5405
API: changes to honour schema field name's case by @karuppayya in #5440
Spark: Spark changes to honour schema field name's case by @karuppayya in #5441
Core: Implement IncrementalChangelogScan without deletes by @aokolnychyi in #5382
Core, API: Add getting refs and snapshot by ref to the Table API by @amogh-jahagirdar in #4428
Flink: missed IcebergSourceReader group in PR #5393 for FLIP-27 source reader metrics by @stevenzwu in #5401
API: Avoid unnecessary wrapping of CloseableIterable.iterator() by @nastra in #5446
Doc: update Flink doc for using the new experimental FLIP-27 source by @stevenzwu in #5423
Spark 3.3: Delete file counts while deleting reachable files by @aokolnychyi in #5451
Bump coverage from 6.4.2 to 6.4.3 in /python by @dependabot in #5454
Move base.py for table to init by @samredai in #5458
CI: Enable dependabot for Github Actions by @Fokko in #5429
API: Improve/align error messaging in CloseableIterable/CloseableIterator by @nastra in #5433
Bump actions/setup-python from 3 to 4 by @dependabot in #5460
CI: Enable dependabot for gradle by @nastra in #5464
AWS: Cleanup warning about Lambda should be method reference by @amogh-jahagirdar in #5476
Bump nebula.dependency-recommender from 9.0.2 to 11.0.0 by @dependabot in #5474
Bump pyarrow from 8.0.0 to 9.0.0 in /python by @dependabot in #5455
Core: Partition filter pushdown for entries table by @szehon-ho in #5443
AWS: Fix/Suppress ErrorProne warnings by @nastra in #5368
Bump hiveVersion from 3.1.2 to 3.1.3 by @dependabot in #5470
Doc: Update web page of Flink unit test (#5480) by @lvyanquan in #5484
Spark 3.3: Use typed beans in BaseSparkAction by @aokolnychyi in #5469
Core: Prevent potential NPEs when retrieving JSON fields by @nastra in #5438
Spark 3.2: Count delete files in DeleteReachableFiles by @aokolnychyi in #5491
Spark 3.2: Use typed beans in BaseSparkAction by @aokolnychyi in #5494
AWS: Use executor service by default when performing batch deletion of files by @amogh-jahagirdar in #5379
Core, API: Performing operations on a snapshot branch ref by @namrathamyske in #4926
Spark: Support truncate in FunctionCatalog by @kbendick in #5431
Spark 3.1:Port #3721 to Spark 3.1 by @hililiwei in #5497
Spark 3.1:Port #3287 #4381 #3535 #4419 to Spark 3.1 by @hililiwei in #5498
Spark 3.1:Port #4198 to Spark 3.1 by @hililiwei in #5499
Spark 3.1:Port #3373 to Spark 3.1 by @hililiwei in #5500
Spark 3.1:Port #3491 to Spark 3.1 by @hililiwei in #5502
Spark 3.1:Port #3456 to Spark 3.1 by @hililiwei in #5501
Spark 3.2: Support truncate in FunctionCatalog by @kbendick in #5514
Build: Bump spotless-plugin-gradle from 6.8.0 to 6.9.1 by @dependabot in #5521
Build: Bump pydantic from 1.9.1 to 1.9.2 in /python by @dependabot in #5522
SetSnapshotOperation should commit empty operations too by @szlta in #5536
AWS: Support preload S3 client mode for S3FileIO by @xiaoxuandev in #5508
Build: Resolve unchecked Map type cast in TestAvroNameMapping by @JonasJ-ap in #5541
Fix linter and test failures by @samredai in #5542
Flink 1.13&1.14: Port #5050 to Flink 1.13&1.14 by @hililiwei in #5531
API: Deprecate generic Counter and replace with simpler Counter API by @nastra in #5505
API/Core: Scan reporting result wrappers and parsers by @nastra in #5427
Build: Bump gradle-git-version from 0.12.3 to 0.15.0 by @nastra in #5532
Core: Put property names at the end in JsonUtil error messages by @nastra in #5434
Replace deprecated Counter with new Counter API by @nastra in #5506
Spark 3.1:Port #3505 to Spark 3.1 by @hililiwei in #5503
Flink - Suppress Nanosecond Warning for TimestampTz ORC writer by @kbendick in #5552
Core: Add some tests for JsonUtil & reduce duplicated code by @nastra in #5526
Add s3.acceleration-enabled flag to AwsProperties by @price-qian in #5555
Spark 3.3: Reduce serialization in DeleteOrphanFilesSparkAction by @aokolnychyi in #5495
API: Remove counter name by @nastra in #5559
Spark 3.3 - Support bucket in FunctionCatalog by @kbendick in #5513
Spark 3.2: Support bucket in FunctionCatalog by @kbendick in #5571
Core: Don't clear snapshot log when intermediate snapshots are detected by @nastra in #5568
Flink: fix the bug where metrics are registered in split reader. Also updated reader metric group to be more consistent with Flink metrics style. by @stevenzwu in #5554
Spark 3.3: Align formatting in bucket and truncate functions by @aokolnychyi in #5573
Spark 3.2: Reduce serialization in DeleteOrphanFilesSparkAction by @aokolnychyi in #5572
ORC: Upgrade to 1.7.6 by @williamhyun in #5580
Spark 3.2: Delete deprecated action classes by @aokolnychyi in #5575
Build: Bump gradle-processors from 3.3.0 to 3.7.0 by @dependabot in #5582
Spark 3.2: Align formatting in bucket and truncate functions by @kbendick in #5574
Core: Make a shorthand for the rest catalog by @Fokko in #5570
Flink: add monitor metrics for Flink sink by @stevenzwu in #5410
API: add Histogram metric type by @stevenzwu in #5348
Flink: port PR #5410 to 1.14 for sink monitoring metrics by @stevenzwu in #5589
Build: Bump jackson-annotations from 2.6.5 to 2.13.3 by @dependabot in #5596
Build: Bump coverage from 6.4.3 to 6.4.4 in /python by @dependabot in #5599
Add Fokko as a collaborator by @Fokko in #5600
Build: enforce LambdaMethodReference check at compile-time by @XN137 in #5529
API: Extend FileIO in optional interfaces by @aokolnychyi in #5576
Flink - Fix Malformed Inline Tag in ContinuousSplitPlannerImpl JavaDoc by @kbendick in #5551
Core: Add expression JSON parser by @rdblue in #5602
Deps: Bump AWS SDK by @Fokko in #5612
Build: Bump tezVersion from 0.10.1 to 0.10.2 by @dependabot in #5520
Docs: Flink Streaming upsert write by @hililiwei in #5380
Fix message pattern in checkArgument invocation by @findepi in #5621
Core: Add snapshot references metadata table by @rajarshisarkar in #4807
Add table metadata changes for statistics information in table metadata by @findepi in #5450
Docs: Added missing doc for REPLACE PARTITION FIELD by @dotjdk in #5624
Core: Transform parquet bloom filter props when updating schema. by @zhongyujiang in #5426
AWS: Deprecate AwsClientFactories.s3Configuration() by @price-qian in #5592
Add SparkV2Filters by @huaxingao in #5302
API: Deprecate old incremental append scans by @aokolnychyi in #5577
Remove deprecations for Rollback and Overwrite Files by @danielcweeks in #5639
Core: Use Bulk Delete when dropping table data and metadata by @amogh-jahagirdar in #5459
Deprecations for 1.0 release: MR properties by @danielcweeks in #5657
Deprecations for 1.0 release: Aliyun OSS by @danielcweeks in #5654
Build: Bump spotless-plugin-gradle from 6.9.1 to 6.10.0 by @dependabot in #5650
Docs: Switch post- and pre- around by @Fokko in #5633
Deprecations for 1.0 release: remove dynamo lock manager and props by @danielcweeks in #5655
AWS: Add s3.dualstack-enabled flag to AwsProperties by @JonasJ-ap in #5644
Add API changes for statistics information in table metadata by @findepi in #5021
AWS: fix the wrong flag used for s3UseArnRegionEnabled by @JonasJ-ap in #5680
Spark: Add Changelog reader for copy-on-write by @flyrain in #5578
Spark 3.2: Add row-based changelog reader by @flyrain in #5682
[Python] FsspecFileIO, a FileIO that wraps any fsspec compliant filesystem by @samredai in #5332
Core: Fix exception handling in BaseTaskWriter by @rdblue in #5683
Support delete corrupted Iceberg table by @yabola in #5510
[Core | Spark | Integrations] : Fix kryo serialization failure for FileIO by @singhpk234 in #5437
Parquet: close zstd input stream early to avoid memory pressure by @bryanck in #5681
Spark: Fix stats in rewrite metadata action by @rdblue in #5691
Docs: Update docs to reflect AWS SDK version presently being used by @singhpk234 in #5661
Doc: Update doc to display the results of the table partitions query by @lvyanquan in #5662
Core: Add CommitStateUnknownException handling to REST by @rdblue in #5694
API: Remove source type from Transform by @rdblue in #5601
Spark: Add custom metric for number of deletes applied by a SparkScan by @wypoon in #4588
Flink: fix missing generic types for some IcebergSource$Builder methods by @stevenzwu in #5697
API/Core: Include Expression filter in ScanReport by @nastra in #5705
Bump avro from 1.9.2/1.10.2 to 1.11.1 by @nastra in #5483
Core: Avoid useless metadata retries. by @rdblue in #5696
Build: Bump pytest from 7.1.2 to 7.1.3 in /python by @dependabot in #5703
Build: Bump jackson-annotations from 2.13.3 to 2.13.4 by @dependabot in #5702
Build: Bump jmh-gradle-plugin from 0.6.6 to 0.6.7 by @dependabot in #5700
Update ORC to 1.8.0 by @williamhyun in #5699
Build: Enforce logging conventions with errorprone by @XN137 in #5528
Build: Upgrade to Gradle 7.5.1 by @XN137 in #5278
Flink: Fixed an issue where Flink batch entry was not accurate by @xuzhiwen1255 in #5642
Dell: Add document. by @wang-x-xia in #4993
Nessie: Prevent accidental deletion of files which are still referenced by other branches/tags by @ajantha-bhat in #5718
Flink: Fixed an issue where Flink1.14 batch entry was not accurate by @xuzhiwen1255 in #5716
API: Add rowsCount to ScanTask by @aokolnychyi in #5720
Docs: Add snapshot references metadata table by @rajarshisarkar in #5725
Flink: Fixed an issue where Flink1.13 batch entry was not accurate by @xuzhiwen1255 in #5731
Build: Bump fastavro from 1.6.0 to 1.6.1 in /python by @dependabot in #5745
Build: Bump pydantic from 1.10.1 to 1.10.2 in /python by @dependabot in #5744
API: Use hashCode instead of hash by @Fokko in #5751
AWS: Preload S3 client in GlueCatalog For LakeFormation enabled tables by @xiaoxuandev in #5756
Build - Remove unused global flink dependency from versions.props by @kbendick in #5758
Build - Move global Spark 2.4 dependency in version.props to Spark 2.4 subproject by @kbendick in #5759
CI: Fix names and jobs by @Fokko in #5749
JdbcCatalog don't override namespace location if set by @danielcweeks in #5737
Spark: Fix runtime jars packaging scala library files by @ajantha-bhat in #5754
Build: relocate httpclient5 dependency for runtime jars by @ajantha-bhat in #5761
AWS: Refactor util methods for applying AWS clients configurations by @JonasJ-ap in #5684
Bump actions/stale from 5.1.1 to 5.2.0 by @dependabot in #5785
Bump spotless-plugin-gradle from 6.10.0 to 6.11.0 by @dependabot in #5786
PyArrow support for S3/S3A with properties by @joshuarobinson in #5747
REST: implement handling of OAuth error responses by @bryanck in #5698
AWS: Allow users to set the assume role session name by @JonasJ-ap in #5765
Revert "REST: implement handling of OAuth error responses (#5698)" by @danielcweeks in #5810
Flink 1.14&1.15 backport: Set custom Hadoop configuration by @lvyanquan in #5775
API/Core: Make ScanReport and its related classes Immutable by @nastra in #5780
API: Remove unneeded class variable by @Fokko in #5805
Core: Serialize statistics files in TableMetadata by @findepi in #5799
Core: Reduce duplicated code in JSON Parsers by @nastra in #5802
API,Core: Add scan planning metrics for skipped data/delete files by @nastra in #5788
Github: Update issue template with latest release by @Fokko in #5818
Core: Use JsonUtil.generate in ErrorResponseParser by @nastra in #5816
Build: Fix CI paths by @Fokko in #5821
Build: Add the path to the Action yaml by @Fokko in #5828
Build: Apply spotless on integration modules as well by @nastra in #5827
Don't check row filter when deciding whether to copy data file with stats by @manuzhang in #5815
Add a BoundBooleanExpressionVisitor for visiting bound expressions by @samredai in #5303
REST: implement handling of OAuth error responses followup by @bryanck in #5820
Add REST Servlet/Server Implementations by @danielcweeks in #5781
AWS: update AWS Integration Test to fix false positives by @JonasJ-ap in #5784
API/Core: Remove deprecated methods from Snapshot API by @nastra in #5734
Build: Bump Rat to 0.15 by @Fokko in #5839
AWS: Add socket connection timeout for Apache Http Builder by @JonasJ-ap in #5787
Core: Add strict-mode property to JDBC Catalog by @nastra in #5830
core: Provide mechanism to cache manifest file content by @rizaon in #4518
Support setting table statistics by @findepi in #5794
Core: Ignore TestManifestCaching#testWeakFileIOReferenceCleanUp until it's fixed by @nastra in #5865
Spark: Fix MERGE INTO Query failure on tables with non-nullable columns by @singhpk234 in #5679
Docs: Make it clear metadata tables support time travel in Spark by @liuml07 in #4709
Doc: Update output of expire_snapshots procedure by @lvyanquan in #5866
Ensure the default value of hive.in.test to avoid overwriting by @viirya in #5844
API: Extended some deprecation comments in API folder by @gaborkaszab in #5726
Core: Deprecate functions in TableMetadata and DataWriter by @gaborkaszab in #5772
Core: Deprecate functions in DeleteWriters by @gaborkaszab in #5771
AWS: Add table and namespace S3 tags by @rajarshisarkar in #4402
Core: Avoid extra getFileStatus call in HadoopInputFile by @singhpk234 in #5864
Orc: Closes #5777 - Obtain ORC stripe offsets from writer by @pavibhai in #5778
Flink: add defensive check in IcebergFilesCommitter for restoring state by @stevenzwu in #5873
Spark 3.x: Backport snapshot references metadata table test by @rajarshisarkar in #5806
Build: Fix & Run spark integration tests on CI by @nastra in #5819
API,Core: Add scan planning metrics for scanned/skipped delete manifests by @nastra in #5792
Doc: Update the default value of table property read.parquet.vectorization.enabled by @Kontinuation in #5776
Bump Nessie to 0.43.0 by @snazy in #5807
Doc: Update default values of Lock catalog properties to avoid wrong way of filling. by @lvyanquan in #5708
Build: Bump hadoop-client from 3.1.0 to 3.3.4 by @dependabot in #5519
Spark: Bump Spark version for vulnerability by @deadwind4 in #5292
Expose table statistics in Table API by @findepi in #4741
[Python][Docs] Very small formatting fix by @samredai in #5868
Build: workflows cache gradle wrapper by @XN137 in #4165
API,Core: Add scan planning metrics for indexed/eq/pos delete files by @nastra in #5809
Build: Bump gradle-baseline-java from 4.0.0 to 4.42.0 by @nastra in #5530
Docs: Add table and namespace S3 tags doc by @rajarshisarkar in #5894
Retain table statistics during orphan files removal by @findepi in #5795
[Docs] Update drop table behavior in spark-ddl docs by @sumeetgajjar in #5645
Spark 3.3: Fix failing jmh benchmarks under org.apache.iceberg.spark.data.parquet package by @sumeetgajjar in #5635
Core: Only validate the current partition specs by @Fokko in #5707
Core: Add RESTScanReporter to send scan report to REST endpoint by @nastra in #5407
Build: Bump jmh-gradle-plugin from 0.6.7 to 0.6.8 by @dependabot in #5850
Build: Bump actions/stale from 5.2.0 to 6.0.0 by @dependabot in #5851
Build: Bump jinja2 from 3.0.3 to 3.1.2 in /python by @dependabot in #5849
Build: Bump coverage from 6.4.4 to 6.5.0 in /python by @dependabot in #5904
Build: Bump rich from 12.5.1 to 12.6.0 in /python by @dependabot in #5905
Build: Bump pytest-checkdocs from 2.7.1 to 2.8.1 in /python by @dependabot in #5903
Core: Rename misleading local variable in planFiles() by @gaborkaszab in #5889
INFRA: Avoid running engine tests on ISSUE_TEMPLATE update by @singhpk234 in #5859
Core, API: Support scanning from refs by @amogh-jahagirdar in #5364
Spark: Set the version explicitly by @Fokko in #5907
API: Make COUNT default unit when creating a Counter by @nastra in #5912
Core: Reuse PositionDelete by @nastra in #5896
Spark 3.3: Fix nullability in merge-on-read projections by @aokolnychyi in #5880
Spark 3.2: Fix nullability in merge-on-read projections by @aokolnychyi in #5917
Replace & Ban ExpectedException usage by @nastra in #5921
API: Handle negative/zero during num-digits calculation by @nastra in #5928
Core: Provide better error message on invalid enums by @nastra in #5910
Reduce 'Scanning table' log verbosity for long IN list by @findepi in #5908
Core: Deprecate write.manifest-lists.enabled flag by @nastra in #5773
Spark 3.3: Add SparkChangelogTable by @aokolnychyi in #5740
Core: Add dataSequenceNumber to ManifestEntry by @aokolnychyi in #5913
AWS: Add socket connection timeout for UrlConnectionHttpClient by @JonasJ-ap in #5900
AWS: Add additional configurations for ApacheHttpClientBuilder by @JonasJ-ap in #5899
Docs: Add doc for HTTP client configurations by @JonasJ-ap in #5902
Build: Bump actions/stale from 6.0.0 to 6.0.1 by @dependabot in #5940
Build: Bump pytest-checkdocs from 2.8.1 to 2.9.0 in /python by @dependabot in #5941
Core: Deflake TestManifestCaching.testWeakFileIOReferenceCleanUp by @rizaon in #5862
AWS: Fix NotSerializableException when using AssumeRoleAwsClientFactory in Spark by @JonasJ-ap in #5939
API: Provide better error message for invalid FileFormat enum by @nastra in #5918
Api: Optimize the code by @linfey90 in #5733
Docs: the table name should be the same as sql create table name by @mggger in #5962
Core: Make testEnvironmentSubstitution effective when USER is not set by @dimas-b in #5770
API: Fix estimated row count in ContentScanTask by @wypoon in #5755
Core: Clear queue and future task when close ParallelIterable by @Heltman in #5887
Core: Expire Snapshots reachability analysis by @amogh-jahagirdar in #5669
Spark 3.3: Split SparkScan and SparkBatch by @aokolnychyi in #5934
Core/Spark: Fix kryo deserialization of SerializableTable by @Kontinuation in #5975
Flink: revise unit test of FlinkUpsert so the table is partitioned by date by @lvyanquan in #5486
Spark: Improve performance of expire snapshot by not double-scanning retained Snapshots by @szehon-ho in #3457
Docs: Fix incorrect glue catalog class name for Hive by @singhpk234 in #5973
Core: Fix confusing log from RemoveSnapshots by @ajantha-bhat in #5478
API: Add BatchScan to Table by @aokolnychyi in #5922
Docs: Typo in loading table from DataFrameReader by @szehon-ho in #5978
Api: Fix transforms.day() returns a format document and javadoc by @xuzhiwen1255 in #5980
AwsProperties prints format specifier in IllegalArgumentException message by @szlta in #5995
Spark: Fix DATE_ADD expression in IcebergSourceFlatParquetDataWriteBenchmark by @dramaticlly in #5991
Support performing merge appends and delete files on branches by @amogh-jahagirdar in #5618
Bump Nessie from 0.43.0 to 0.44.0 by @snazy in #6008
Doc: Fix typos related to date transforms by @fb913bf0de288ba84fe98f7a23d35edfdb22381 in #5992
Spark: Remove backup table after a successful migrate action. by @sririshindra in #5622
Core: Fix NPE for parent snapshot does not exist by @hililiwei in #6005
Flink: Fix NoClassDefFound with Flink runtime jar / Add integration test by @nastra in #6001
Spark 3.2: Use ScanTaskGroup methods when computing stats by @aokolnychyi in #6011
Spark 3.2: Add SparkChangelogTable by @aokolnychyi in #6013
Spark 3.2: Remove redundant imports in SparkScan by @aokolnychyi in #6016
Core: Fix TestSnapshotUtil time random disorder by @hililiwei in #6015
Spark 3.2: Split SparkScan and SparkBatch by @aokolnychyi in #6014
Core: Parallelize the determining of reachable manifests during file cleanup by @amogh-jahagirdar in #5981
Orc: Support row group bloom filters by @deadwind4 in #5313
Core,Spark: Refactor to move "copy-on-write" and "merge-on-read" literals to constants by @gaborkaszab in #6006
[python_legacy] BOTO_STS_CLIENT lazy initialization by @puchengy in #5930
Core: Don't fail scan planning if REST metric reporting fails by @nastra in #6023
Nessie: no longer push whole metadata JSON to Nessie by @snazy in #5999
Core: Deprecate HTTPClientFactory / Allow configuring ObjectMapper for HTTPClient by @nastra in #5998
Closes #5988 - Allow configuration of Hive MetastoreClient using Catalog properties by @pavibhai in #5989
docs:Add an example of CTAS with PARTITIONED BY (rebased, fix #3854) by @samredai in #6020
Hive: Set the Table owner on table creation by @gaborkaszab in #5763
Replace Assert.fail usage with AssertJ fluent testing by @nastra in #6029
Replace and ban hamcrest usage by @nastra in #6030
API: Update expression sanitization for relative dates and times by @rdblue in #5944
Core: Rename TableTestBase.Assertions to not conflict with AssertJ Assertions by @nastra in #6022
Add section on semantic versioning and deprecations by @danielcweeks in #6032
Core: Increase inferred column metrics limit to 100 by @rdblue in #5916
Build: Bump mkdocs from 1.3.1 to 1.4.1 in /python by @dependabot in #6033
API,Core: Move ScanReport to core module / extract TimerResult/CounterResult/ScanMetricsResult into own classes by @nastra in #6037
Spark 3.3: Ensure rowStartPosInBatch in ColumnarBatchReader is set correctly by @wypoon in #6026
Spark 3.2: Ensure rowStartPosInBatch in ColumnarBatchReader is set correctly by @wypoon in #6041
add Aggregate Expressions by @huaxingao in #5961
Flink: Add Sink options to override the compression properties of the Table by @pvary in #6049
Core: Add file seq number to ManifestEntry by @aokolnychyi in #6002
Spark 3.1: Ensure rowStartPosInBatch in ColumnarBatchReader is set correctly by @wypoon in #6046
Core: Replace projected Schema with schemaId/fieldIds/fieldNames in ScanReport by @nastra in #6047
Spark 3.2: #6041 follow-up/cleanup by @wypoon in #6063
Build: Update Spark to 3.3.1 by @wangyum in #5783
Build: Bump pytest from 7.1.3 to 7.2.0 in /python by @dependabot in #6080
Build: Bump pyarrow from 9.0.0 to 10.0.0 in /python by @dependabot in #6081
Build: Bump zstandard from 0.18.0 to 0.19.0 in /python by @dependabot in #6082
PyArrow should convert timestamps to microseconds. by @joshuarobinson in #6070
Spark 3.3: Use separate scan during file filtering in copy-on-write operations by @aokolnychyi in #6077
Spark: Remove redundant check for max_concurrent_deletes in spark actions by @ajantha-bhat in #6083
Infra: Publish nightly build for Spark-3.3_2.13 by @ajantha-bhat in #6054
Infra: Update slack invite link by @ajantha-bhat in #6052
Docs: Fix link in the Java Custom Catalog page by @Jonathan-Rosenberg in #6068
Infra: Add 1.0.0 in issue template dropdown by @ajantha-bhat in #6057
Flink: Remove Flink 1.13 by @hililiwei in #6103
Core,Spark: Fix raw generics usage of ManifestWriter by @nastra in #6059
Spark 3.2: Use separate scan during file filtering in copy-on-write ops by @aokolnychyi in #6095
Spark 3.3: Relocate all Netty dependencies by @aokolnychyi in #6107
Spark 3.2: Relocate all Netty classes by @aokolnychyi in #6109
Spark: Optimize Preconditions.checkArgument in procedures by @ajantha-bhat in #6096
Docs: Update spotless apply command for non-default versions by @ajantha-bhat in #6101
Core: Improve collection handling in JsonUtil by @nastra in #6051
Build: Add gaborkaszab as a collaborator by @gaborkaszab in #6036
Flink: Add support for Flink 1.16 by @hililiwei in #6092
Core: Avoid reading ManifestFile when create ManifestReader by @ConeyLiu in #5632
Struct fields should be provided to Schema constructor by @ddrinka in #6115
Remove Fokko from the list of collaborators by @Fokko in #6119
Use Java collections in AwsProperties to fix Kryo serialization. by @jfz in #5812
[Docs] Update migrate behaviour with respect to drop_table in spark-procedures docs. by @sririshindra in #6025
[Core | Spark] Strip trailing slash from custom metadatalocation by @singhpk234 in #6121
Build: Bump mkdocs from 1.4.1 to 1.4.2 in /python by @dependabot in #6130
API: Hash floats -0.0 and 0.0 to the same bucket by @fb913bf0de288ba84fe98f7a23d35edfdb22381 in #6110
Spark-3.0: Remove/update spark-3.0 mention from Docs and Builds by @ajantha-bhat in #6093
Support 2-level list and maps type in RemoveIds. by @SinghAsDev in #6064
Fix TestAggregateBinding by @huaxingao in #6065
SparkBatchQueryScan logs too much - #6106 by @Omega359 in #6108
Fix typo in _ManifestEvalVisitor.visit_equal by @ddrinka in #6117
Flink: Optimize test code of TestSourceUtil by @lvyanquan in #6143
Spark-3.0: Remove spark/v3.0 folder by @ajantha-bhat in #6094
Fixes read metadata table failed due to illegal character by @ConeyLiu in #4577
Core: Pass purgeRequested flag to REST server by @nastra in #6073
Build: Let revapi compare API compatibility against apache-iceberg-1.0.0 by @ajantha-bhat in #6053
Core: Rename HMS_TABLE_OWNER to follow naming convention by @gaborkaszab in #6154
Docs: Update spotless apply command by @lvyanquan in #6157
Nessie: Use unique path for different table with same name by @ajantha-bhat in #4826
Spark Integration to read from Snapshot ref by @namrathamyske in #5150
Cache dropStats result for ManifestReader iterator by @manuzhang in #5836
Core: Reduce code duplication around writing JSON collections by @nastra in #6113
Core: Sync client/server properties in REST catalog by @rdblue in #6150
Flink: Port #6049 to Flink 1.14 to add Sink options of compression properties by @lvyanquan in #6166
Build: Bump jackson-annotations from 2.13.4 to 2.14.0 by @dependabot in #6129
Build: Add -DallVersions property that exposes all component versions by @nastra in #6167
Core,Spark: Add metadata to Scan Report by @nastra in #6058
Fix typo in unused python iceberg parameter by @alec-heif in #6173
AWS: Fix catalog names in LakeFormationTestBase by @aajisaka in #5767
Spark: Backport setting the EnvironmentContext for Spark by @nastra in #6183
Flink: Add engine name/version to EnvironmentContext by @nastra in #6184
Core: Add Iceberg version to EnvironmentContext by @nastra in #6185
Core: Add a util method to combine tasks by partition by @sunchao in #2276
Spark: Fix QueryFailure when running RewriteManifestProcedure on Date partitioned table by @singhpk234 in #5860
Build: Enable revapi on core/parquet/orc/common/data modules & fix API breaks by @nastra in #6146
Spark 3.3: Preserve file seq numbers while rewriting manifests by @aokolnychyi in #6176
Docs: fix link of Write options in Flink by @lvyanquan in #6191
Core: Remove unused toTaskGroupStream from TableScanUtil by @sunchao in #6189
Spark 3.2: Preserve file seq numbers while rewriting manifests by @aokolnychyi in #6192
Spark 3.1: Preserve file seq numbers while rewriting manifests by @aokolnychyi in #6193
Flink: Add 'cache.expiration-interval-ms' option to FlinkCatalog by @lvyanquan in #6111
Core: Method for building grouping key type by @aokolnychyi in #6163
Revert "Hive: Forward catalog-specific Hive configuration properties … by @pavibhai in #6187
Core: Add time zone info to LocalDate in ExpressionUtil tests by @nastra in #6200
REST: Assign metadata UUID on create transaction by @bryanck in #6201
API, Core: Move micros and days conversions to DateTimeUtil by @aokolnychyi in #6199
Core: Remove redundant initialization by @krvikash in #6178
Extract Flink package version programmatically for EnvironmentContext… by @stevenzwu in #6206
Flink: Add unit test for FlinkPackage util class by @stevenzwu in #6213
Parquet: Fixes get null values for the nested field partition column by @ConeyLiu in #4627
API: Make the PartitionSpec less lazy by @Fokko in #6220
Spark: Add missing override by @Fokko in #6227
API: Ignore case when comparing truncate by @Fokko in #6226
Release: Fix the version template by @Fokko in #6195
Replace ImmutableMap.Builder.build() with buildOrThrow() by @krvikash in #6212
Allow dropping a column used by old SortOrders but not current SortOrder by @islamismailov in #6211
Nessie: Refactor NessieTableOperations#doCommit by @ajantha-bhat in #6240
API: Restore the type of the identity transform by @Fokko in #6242

New Contributors

@waifairer made their first contribution in #5299
@hrishisd made their first contribution in #5288
@palaniappa made their first contribution in #5309
@Mehul2500 made their first contribution in #5037
@naushadh made their first contribution in #5330
@abmo-x made their first contribution in #5317
@skadyan made their first contribution in #5352
@lvyanquan made their first contribution in #5484
@namrathamyske made their first contribution in #4926
@price-qian made their first contribution in #5555
@dotjdk made their first contribution in #5624
@yabola made their first contribution in #5510
@xuzhiwen1255 made their first contribution in #5642
@joshuarobinson made their first contribution in #5717
@rizaon made their first contribution in #4518
@viirya made their first contribution in #5844
@gaborkaszab made their first contribution in #5726
@pavibhai made their first contribution in #5778
@Kontinuation made their first contribution in #5776
@linfey90 made their first contribution in #5733
@mggger made their first contribution in #5962
@Heltman made their first contribution in #5887
@fb913bf0de288ba84fe98f7a23d35edfdb22381 made their first contribution in #5992
@wangyum made their first contribution in #5783
@Jonathan-Rosenberg made their first contribution in #6068
@ddrinka made their first contribution in #6115
@Omega359 made their first contribution in #6108
@hendrikmakait made their first contribution in #6135
@foarsitter made their first contribution in #6158
@alec-heif made their first contribution in #6173
@aajisaka made their first contribution in #5767
@krvikash made their first contribution in #6178
@LuigiCerone made their first contribution in #6159
@islamismailov made their first contribution in #6211

Full Changelog: apache-iceberg-0.14.0...apache-iceberg-1.1.0
