apache/hudi release-1.2.0 on GitHub

What's Changed

chore: Moving to 1.2.0-SNAPSHOT on master branch by @yihua in #14055
[HUDI-9782] Add validation and cleanup apis for storage LP audit validation by @alexr17 in #13886
feat: add new Hudi demo app hudi-notebooks by @deepakpanda93 in #14023
chore: Reduce log volume by changing INFO to DEBUG for table loading messages by @shubhampatel28 in #14057
fix: Fix output type extracting for key selector in flink stream read by @cshuo in #14065
docs: RFC-100 - Unstructured Data Storage in Hudi (Initial strawman proposal) by @vinothchandar in #13924
chore: Exclude hudi-trino-plugin from RAT checks by @voonhous in #14067
fix: creating warehouse bucket automatically by @deepakpanda93 in #14079
fix: update utils.py and notebook by @deepakpanda93 in #14080
fix: Spark Schema Evolution Fix for nested columns by @the-other-tim-brown in #14075
fix: Fixing point lookup in MDT partitions by @nsivabalan in #14085
fix: disable embedded timeline service for flink upgrade by @cshuo in #14096
fix: flink mdt compaction should finish pending compactions first by @danny0405 in #14095
fix: exclude unused dependencies from META-INF in presto bundle by @vamsikarnika in #14102
fix: Upgrade Parquet Avro and commons lang3 versions in presto bundle by @vamsikarnika in #14099
fix: show_index command output had incorrect order of column names by @linliu-code in #14113
fix: Remove catalog access from SparkSQLWriter by @linliu-code in #14083
fix: Ignore field nullability while checking whether record should be… by @cshuo in #14094
fix: disable NBCC with default single writer for downgrade less than version 8 by @vamshikrishnakyatham in #14109
fix: Skip payload class validation when merge mode is not custom by @linliu-code in #14116
fix: MERGE INTO statement produces misleading UNRESOLVED_COLUMN error when target table doesn't exist instead of TABLE_OR_VIEW_NOT_FOUND by @vamshikrishnakyatham in #14118
fix: Handle deletes and updates properly in secondary index by @yihua in #14090
fix(core): add table level validation for decimal evolution by @jonvex in #14089
fix: Handle missing valueType column after upgrade by @linliu-code in #14105
perf: Reduce memory usage of writing HFile log block by @yihua in #14078
fix: Fix cleaning of historical internal schema files by @cshuo in #14126
fix: Upgrade parquet-avro version to 1.15.1 in trino bundle and plugin by @vamsikarnika in #14140
fix: Upgrade Java xmlbuilder version to fix CVE-2014-125087 by @vamsikarnika in #14144
fix: fix partition stats delete properly for downgrade from V9 to V8 by @vamshikrishnakyatham in #14138
fix: updating error messages thrown to end users by @vamshikrishnakyatham in #14115
fix: Fix the instant time issue for row writer bulk insert hoodie streamer by @vamsikarnika in #14153
fix: Fixed the recovering method for the older versions where checksum is not present by @Rajeev-01 in #14148
refactor: Add required setter methods for Flink-CDC by @voonhous in #14150
fix: Fixing secondary index read perf for V1 layout by @nsivabalan in #14149
fix: Give proper error message for multi-writer scenarios without lock provider set by @linliu-code in #14119
feat: Partition predicate fix for Databricks runtime support by @ad1happy2go in #14059
fix: Partition stats should be controlled using column stats config by @lokeshj1703 in #14165
fix: Fix upgrade handling for MySqlDebeziumAvroPayload with deltastreamer by @lokeshj1703 in #14159
fix: fix downgrade to not delete unintended partitions in MDT by @vamsikarnika in #14162
fix: correct indentation in utils.py and add docker compose validation by @deepakpanda93 in #14168
perf: [Flink] Introduce a Flink Clustering Plan Strategy to Eliminate Redundant Small-File Merges by @XianghuiBai in #14087
fix: ensure that InlineFS is seeked to the correct offset upon init by @voonhous in #14178
fix(ingest): Fix Timestamp Conversions, Add legacy api support by @jonvex in #14076
fix: Avoid changing table configs when creating a table with an existing base path on Spark by @yihua in #14175
fix: build notebook hive image using compatible mode for arm64 by @xushiyan in #14190
fix: Fix file pruning based on column stats for flink reader by @cshuo in #14186
refactor: remove buildx option in docker build for notebooks by @deepakpanda93 in #14199
feat: introduce pk filter push-down to base file by @TheR1sing3un in #14183
fix: Fix predicates for base file reader in Flink FileGroup reader by @cshuo in #14197
fix: Avoid deleting metadata table with MOR during upgrade / downgrade by @linliu-code in #14191
docs: update javadoc of BucketIndexUtil by @voonhous in #14195
fix: Fixing record index related configs and enums by @nsivabalan in #14180
refactor: change access level of flushRemaining for flink-cdc require… by @voonhous in #14206
fix: Fix build because of record index config renaming by @yihua in #14215
test: fix flaky test case in TestBootstrapReadBase by @TheR1sing3un in #14210
test: fix flaky test in TestSecondaryIndex by @TheR1sing3un in #14211
test: Enhance downgrade test with compaction by @yihua in #14226
perf: reduce unnecessary row group metadata loading by @TheR1sing3un in #14208
fix: Move hudi split loaders to resumable tasks architecture to prevent deadlocks by @ratuldawar11 in #14225
test: Disable failing test testFiltersInFileFormat to unblock CI by @yihua in #14236
fix: Persist RLI index bootstrap records only if estimation is required and add unpersist by @lokeshj1703 in #14069
refactor: Reuse sparkSession and sparkContext variables in HoodieSparkSqlWriter by @huangxiaopingRD in #14231
fix: Exclude unnecessary netty dependencies from hudi jars by @vamsikarnika in #14142
feat: Only when the target table to be inserted/merged is a hudi table should the meta fields be eliminated by @TheR1sing3un in #14230
feat: Support TIMELINE_SERVER_BASED markers for flink writer by @cshuo in #14202
fix: Upgrade parquet-avro version for hudi-presto-bundle and CVE-2025-30065 by @sumi-mathew in #13358
fix: Disable positional merging for spark version < 3.5 by @linliu-code in #14241
docs: Claim RFC-81: Introduce Primary Key Sorted Table by @TheR1sing3un in #14245
fix(ingest): Repair affected logical timestamp milli tables by @jonvex in #14161
fix: Update metadata table record level index config keys naming for standardization by @linliu-code in #14244
feat: introduce pk filter to log file by @TheR1sing3un in #14205
chore: Update DOAP with 1.1.0 Release by @yihua in #14294
chore: Update release candidate validation in Github action by @yihua in #14295
docs: RFC-95 - New Hudi Flink Source implementation by @HuangZhenQiu in #13381
test: Clean up all the behaviors of directly setting spark conf in spark test to avoid flaky tests by @TheR1sing3un in #14198
[MINOR] Cleanup old spark3.5 version in pom.xml by @yongkyunlee in #14304
feat(schema): New Hudi Schema Class - Initial implementation. Also, Add new APIs based on current usage of Avro schema by @bvaradar in #14265
chore: Integration Test Flakiness: free more disk space before running by @the-other-tim-brown in #14316
feat: adding support for trino in notebooks by @deepakpanda93 in #14242
feat(schema): Add types for decimal, date, timestamp, time, and uuid by @the-other-tim-brown in #14312
refactor: Migrate HoodieFileReader and HoodieFileWriter io.storage to use HoodieSchema by @rahil-c in #14313
refactor: Clean up Spurious log block handling in LogRecordReader by @PavithranRick in #14287
refactor(spark): Remove glob paths and deprecate read paths support by @jonvex in #14060
feat(schema): HoodieSchema: add helper methods, fix issues with schema subtypes not returned by @the-other-tim-brown in #14346
feat(schema): Internal Schema System Integration with HoodieSchema by @the-other-tim-brown in #14314
fix: Fix duplicate field exception in hive query with where clause by @cshuo in #14337
fix: push down pk filters to log file when spark enable parquetFilterPushDown by @TheR1sing3un in #14332
fix: Fix the mismatch between operation metrics and the actual operation in the compaction plan by @TheR1sing3un in #14362
fix: Support handling complex data types in convertRowToJsonString fo… by @cshuo in #14351
fix: fix get empty completion time in corner case by @TheR1sing3un in #14379
feat: Bump spark version to 4.0.1 by @CTTY in #14380
refactor: Create parquet filters using the spark adapter by @TheR1sing3un in #14335
test: Fix test setup and assertions in TestTableColumnTypeMismatch by @nsivabalan in #13792
fix: Bump springboot version to fix CVE-2022-1471 by @CTTY in #14383
fix: Only use index when index metadata is present by @CTTY in #14385
feat: Add storage in HoodieCatalogTable by @CTTY in #14386
fix: Include parquet-format in Hive sync bundle by @gggyd123 in #13843
fix: Exclude guava from hive-metastore by @CTTY in #14388
fix: Exclude jetty from javalin to fix CVE-2023-40167 by @CTTY in #14384
chore: add hudi-bot to collaborators by @xushiyan in #14391
feat: Use Storage from catalog table in drop table command by @CTTY in #14390
feat: Support read virtual metadata columns for Flink reader by @cshuo in #14309
feat(schema): Add helper to get HoodieSchema in TableSchemaResolver by @rahil-c in #17456
feat: [HUDI-9766] Support for show_timeline Procedure with appropriate start and end time for both active and archive timelines by @PavithranRick in #14261
refactor(spark): Rework tests that disable FileGroup Reader in Spark by @jonvex in #14061
feat: Use storage conf for alter rename command by @CTTY in #14389
feat: Change the config for record index max file group size to be a long by @prashantwason in #17461
feat(schema): phase 2 - Perform Column Statistics Schema Migration by @voonhous in #14311
feat(schema): Add support for time, fixed length byte arrays, and local timestamps to ParquetToSparkSchemaUtils by @the-other-tim-brown in #17450
fix: Fix flaky TestHoodieIndex#testCheckIfValidCommit test by @voonhous in #17484
feat(schema): Migrate SchemaProviders in Hudi-Utilities to use HoodieSchema by @the-other-tim-brown in #14364
refactor: Update conversion from StructType to Avro Schema to include docs and default values by @the-other-tim-brown in #17473
refactor: [HUDI-9335] Make RowDataKeyGens::instance the common point for keygen instantiation for Flink by @geserdugarov in #13570
feat(schema): Migrate hudi-spark writer related classes to use HoodieSchema by @rahil-c in #14374
chore: move disk space cleanup in integration-test CI module by @the-other-tim-brown in #17496
refactor: introduce lombok dependency by @vinothchandar in #17500
fix: Remove explicit casting to HoodieWriteMergeHandle in Fl… by @cshuo in #13590
feat: add Hudi Flink source split POJOs by @HuangZhenQiu in #17483
fix: Fixing streaming writes to metadata table for perf regression by @nsivabalan in #17477
fix(cli): Use long type when sorting based on file status modification time by @prashantwason in #17487
refactor: Use HoodieFileGroupReader paths for all Spark Datasource reads by @the-other-tim-brown in #17457
feat(schema): phase 12 - Perform Data Source Helpers Migration by @voonhous in #14382
chore: reduce log volume by changing per-file/per-block logs to DEBUG level by @shubhampatel28 in #14357
chore: Update deploy script for release by @yihua in #14296
fix: fix the issue where lsm writer could not write again after failure by @TheR1sing3un in #17472
refactor: Apply lombok annotations and remove boilerplate code in hudi-aws by @voonhous in #17522
feat(schema): Update HiveSyncTool and other meta sync tools to use HoodieSchema by @the-other-tim-brown in #14344
feat(schema): Port more utility code for HoodieSchema by @the-other-tim-brown in #17526
fix: Fix Flink profiles and modules for release version change by @yihua in #17528
feat(metrics): Publish log block compaction metrics by @suryaprasanna in #17518
feat(schema): phase 5 - Perform Java Client Core Migration by @voonhous in #14340
refactor: Apply lombok annotations and remove boilerplate code to hudi-cli by @voonhous in #17523
feat(schema): Migrate hudi-flink to use HoodieSchema instead of avro Schema by @rahil-c in #14355
feat: add Hudi static split enumerator for Flink source by @HuangZhenQiu in #17503
refactor: Add lombok annotations to hudi-flink-client module by @voonhous in #17534
chore: add cshuo to collaborators by @xushiyan in #17544
refactor: remove unused method in IncrementalInputSplits by @HuangZhenQiu in #17545
refactor: Remove getAvroSchemaConverters API from Spark adapter by @yihua in #17554
feat: add CLI command to show inflight instants older than specified duration by @suryaprasanna in #17511
feat(schema): Add converter for Spark StructType to HoodieSchema by @rahil-c in #17475
chore: add new collaborators by @xushiyan in #17555
chore: keep collaborators <= 10 by @xushiyan in #17559
fix: Fix failing CI caused by multiple definition of IncompatibleSchemaExc… by @voonhous in #17561
chore: rename BaseInstantTime to LogFileInstantTime in log summary by @shubhampatel28 in #17567
feat(schema): Migrate HoodieFileGroupReader and related classes to use HoodieSchema by @the-other-tim-brown in #17536
feat(schema): Update serialization for HoodieSchema by @the-other-tim-brown in #17575
fix: Lazily load the dfs properties configuration to avoid static initialization failures by @alexr17 in #17552
test: Make testReadChangelogIncremental parametrized by @kamronis in #17564
refactor: small cleanups in hudi-cli classes by @vinothchandar in #17585
fix: Avoid using HoodieHadoopStorage directly by @CTTY in #17560
feat(schema): Migrate log reader and partitioners to take HoodieSchema by @the-other-tim-brown in #17548
refactor: code sweep on hudi-io, hudi-hadoop-mr to streamline class organization by @vinothchandar in #17586
refactor: Add Lombok annotations to hudi-spark-client module by @voonhous in #17572
chore: Add SamplingLogger utility for reducing log volume while maintaining observability by @shubhampatel28 in #14354
refactor: Apply lombok annotations and remove boilerplate code to hudi-client-c… by @voonhous in #17524
test: Add validation on Spark SQL test classes and fix package structure by @yihua in #14381
feat(schema): Migrate BigQuery schema converter to use HoodieSchema by @the-other-tim-brown in #17498
refactor: Add Lombok annotations to hudi-example modules by @voonhous in #17589
refactor: Remove old code and comments after deprecating Scala 2.11 support by @yihua in #17592
fix(schema): Fix creation of HoodieSchema from avro string not delega… by @voonhous in #17597
refactor: Add Lombok annotations to hudi-java-client module by @voonhous in #17588
feat(schema): phase 17 - Remove AvroSchemaUtils usage (part 1) by @voonhous in #17535
fix: Support column stats pruning on metadata columns for flink reader by @cshuo in #17580
refactor: Add Lombok annotations to hudi-flink-x modules by @voonhous in #17590
refactor: Remove unnecessary utils in FileCreateUtils by @yihua in #17593
docs: Claim RFC-102: RLI support for Flink streaming by @danny0405 in #17609
refactor: code sweep on hudi-hadoop-common, hudi-common on class organization by @vinothchandar in #17611
refactor(spark): Keep one latestCommitCompletionTime method in DataSourceTestUtils by @CTTY in #17608
feat: Support data skipping based on record index for flink reader by @cshuo in #17490
refactor: Add Lombok annotations to hudi-gcp module by @voonhous in #17621
feat: add flink continuous split enumerator by @HuangZhenQiu in #17562
feat: Support flink 2.1 by @cshuo in #17574
feat: add record write failure log and metrics by @HuangZhenQiu in #13417
fix: incorrect CDC read from table with unfinished compaction by @kamronis in #17607
feat(schema): Migrate spark reader side related classes to use HoodieSchema directly by @rahil-c in #17573
chore: Updating doap file for 1.1.1 release by @nsivabalan in #17635
refactor: Add Lombok annotations to hudi-common module (part 1) by @voonhous in #17630
feat: Upgrade build target to Java 11 by default by @the-other-tim-brown in #17637
feat: Support MDT compaction configs of frequency seconds and trigger strategy by @kbuci in #17603
chore: Add scripts and docs for Docker image used in Azure CI by @yihua in #17602
docs: Fix javadoc comments by @VahidRamezaniDB in #17645
refactor: Add Lombok annotations to hudi-flink module by @voonhous in #17612
feat: support extract hadoop conf from Flink runtime by @HuangZhenQiu in #13259
feat(schema): Migrate StreamSync code path and its dependencies to use HoodieSchema by @the-other-tim-brown in #17600
refactor: Add Lombok annotations to hudi-hadoop-common module by @voonhous in #17662
chore(deps): bump org.apache.logging.log4j:log4j-core from 2.17.2 to 2.25.3 by @dependabot[bot] in #17653
chore(deps): bump org.apache.parquet:parquet-avro from 1.15.1 to 1.15.2 in hudi-trino-plugin by @dependabot[bot] in #17666
chore(deps): bump org.glassfish:jakarta.el from 3.0.3 to 3.0.4 in hudi-cli-bundle by @dependabot[bot] in #17651
refactor: Add Lombok annotations to hudi-integ-test module by @voonhous in #17667
perf: optimize removeCommitMetadata method in HoodieCDCLogger by @kamronis in #17669
feat(schema): Phase 24 - Restore O(1) reference equality comparison i… by @voonhous in #17672
feat: Add HoodieBaseLanceFileWriter and implementation for SparkFileWriter by @rahil-c in #17629
refactor: Remove org.jetbrains.annotations imports by @yihua in #17680
test: Fix flaky test ITTestHoodieFlinkCompactor#testHoodieFlinkCompac… by @cshuo in #17677
chore: Test Runtime Improvements: lower number of files, parallelize reads by @the-other-tim-brown in #17671
feat(schema): phase 17 - Remove AvroSchemaUtils usage (part 2) by @voonhous in #17581
test(ci): Add JVM tuning for Java 11+ test execution to reduce CI runtime by @yihua in #17712
refactor: Add Lombok annotations to hudi-kafka-connect by @voonhous in #17715
refactor: Add Lombok annotations to hudi-io module by @voonhous in #17685
test: Fix flaky test testLatestCheckpointCarryOverWithMultipleWriters by @yihua in #17722
fix: Fix ConcurrentModificationException in RocksDBDAO when accessed by Timeline Service by @yihua in #17717
feat: Add HoodieSparkLanceReader for reading lance files to internal row by @rahil-c in #17632
refactor: Add Lombok annotations to hudi-platform-service by @voonhous in #17719
chore(ci): Use non-archive repo for maven binary download by @voonhous in #17723
feat(schema): Migrate clustering operations to use HoodieSchema by @the-other-tim-brown in #17691
refactor: Add Lombok annotations to hudi-spark,hudi-spark-common by @voonhous in #17718
chore(ci): Retry spark downloads by @the-other-tim-brown in #17732
chore(ci): remove arg line overrides in azure pipelines by @the-other-tim-brown in #17741
perf: optimize rollback validation by checking lazy rollback policy before clustering validation by @suryaprasanna in #17537
perf: use shallow projection where applicable by @kamronis in #17682
chore(ci): upgrade to newer plugins and new test dependencies that required java11+ by @the-other-tim-brown in #17657
docs: Improve the annotation format of examples by @huangxiaopingRD in #17749
refactor: Replace HoodieHadoopStorage instantiation with HoodieStorageFactory by @KiteSoar in #17661
feat: Implement SparkColumnarFileReader for Datasource integration with Lance by @rahil-c in #17660
chore(ci): Add cache for checkout by @the-other-tim-brown in #17738
feat(schema): Migrate hudi spark client to use HoodieSchema by @rahil-c in #17743
feat(schema): Phase 18 - HoodieAvroUtils removal (Part 1) by @voonhous in #17599
feat: Support COW bulk-insert, insert, upsert, and delete with spark datasource and lance by @rahil-c in #17731
chore(ci): remove repeated checkout by @the-other-tim-brown in #17755
chore(ci): Hudi-utilities test improvements by @the-other-tim-brown in #17758
feat(schema): Migrate spark schema conversion utils to their HoodieSchema equivalent by @the-other-tim-brown in #17765
feat(schema): Migrate json and proto converters to use HoodieSchema by @the-other-tim-brown in #17740
refactor: Remove Builder from DynamoDbBasedLockConfig by @voonhous in #17780
perf: Reduce unnecessary timeline loading on the Flink-TM side by @TheR1sing3un in #17762
style: Correct wrong Apache license by @huangxiaopingRD in #17790
fix(metadata): propagate timeline server config from main dataset to metadata by @prashantwason in #17486
test: Fix flaky test in TestHoodieClientMultiWriter by @yihua in #17793
fix: Fix the timeline compaction blocked caused by the archived file being too large by @TheR1sing3un in #17784
fix: Handle hudi table reads when databaseName is not set during initTable by @vinishjail97 in #17695
feat(schema): Phase 18 - HoodieAvroUtils removal (Part 2) by @voonhous in #17763
feat: Support splitting tasks based on file size when reading the cow table by @TheR1sing3un in #17730
fix: Add complex types testing for lance by @rahil-c in #17769
refactor: Add Lombok annotations to hudi-timeline-service by @voonhous in #17742
fix(spark): Add clearJobStatus() calls after setJobStatus() operations by @prashantwason in #17451
feat: Use official Kafka docker images by @rangareddy in #17794
chore: Exclude data file from rat-plugin check by @huangxiaopingRD in #17789
refactor: Add Lombok Builder annotation to TimelineService.Config by @voonhous in #17807
feat: Introduce inflight record index cache for bucket assigning by @cshuo in #17802
feat(schema): Migrate HoodieRecord methods to use HoodieSchema instead of Avro.Schema by @KiteSoar in #17772
refactor: Add Lombok annotations to hudi-common module (part 2) by @voonhous in #17655
perf: improve performance for S3 meta event source by @the-other-tim-brown in #17822
fix: Wiring in clean max commits to metadata table by @nsivabalan in #17819
refactor: Add Lombok annotations to hudi-sync modules by @voonhous in #17728
docs: Update minimum Java version to JDK 11 in documentation by @yihua in #17824
feat: the basic new hudi source reader by @HuangZhenQiu in #17773
docs: Claim RFC-103 Hudi LSM Tree Layout by @zhangyue19921010 in #17826
feat: double buffer based async write for append only write by @HuangZhenQiu in #13892
feat(schema): Remove direct usage of Avro schema in Flink-client path by @the-other-tim-brown in #17739
test: correct arguments pass to TestData::assertRowsEqualsUnordered by @geserdugarov in #17840
feat: Support bucket assigning based on record level index by @cshuo in #17803
feat: align clustering and compaction retry flow for Flink and Spark by @xushiyan in #17839
fix: make insert overwrite with bulk insert more performant on unpartitioned table by @alexr17 in #17821
feat(schema): Spark Row to/from Avro conversion updates by @the-other-tim-brown in #17817
fix: Ensure that custom logical types in records are preserved during… by @voonhous in #17845
feat(schema): Phase 18 - HoodieAvroUtils removal (Part 3) by @voonhous in #17659
chore: update collaborator list by @xushiyan in #17881
[MINOR] Remove duplicate shade relocation by @majian1998 in #17841
feat(schema): Remove spark-avro schema converter by @the-other-tim-brown in #17884
feat(schema): Add fetching default values for FIXED, DECIMAL, TIME, … by @voonhous in #17892
perf: Support mini-batch access to the MDT index for bucket assign fu… by @cshuo in #17867
feat: support disruptor-queue buffer for Flink writers by @xushiyan in #17864
feat(schema): Fix null checks after migrating to HoodieSchema by @voonhous in #17909
fix: Handle all exception types when fetching table path on reads by @suryaprasanna in #17860
feat(lance): Upgrade Lance version for new writer functionality by @the-other-tim-brown in #17900
feat: Ensure MOR table works with lance base files and avro log files by @rahil-c in #17768
fix: address minor compilation issue in getAvroBytes by @rahil-c in #17926
chore: Include table name in FileSystemBackedTableMetadata stage names by @suryaprasanna in #17861
fix: handle ArrayIndexOutOfBoundsException for non-partitioned datasets during upgrade by @suryaprasanna in #17933
feat: support mini batch split reader by @HuangZhenQiu in #17872
fix: Prevent HiveSyncTool from running twice in meta sync by @suryaprasanna in #17937
feat: Support bucket assign operator fetching inflight instants from coordinator by @cshuo in #17885
perf: Allow all processed commits to be cached in the CompletionTimeQueryViewV2 by @the-other-tim-brown in #17914
feat(schema): Phase 18 - HoodieAvroUtils removal (Part 4) by @voonhous in #17801
fix: Allow both checkpoint v1 and v2 keys to be resolved by @voonhous in #17919
fix: Fix TestBucketizedBloomCheckPartitioner assertArrayEquals compar… by @voonhous in #17888
feat(lance): Remove extra buffering in Lance writer by @the-other-tim-brown in #17916
fix: Check for existing SparkContext before creating new one in CLI by @suryaprasanna in #17862
refactor: rename MergeOnReadSplitReaderFunction by @HuangZhenQiu in #17967
fix: Allow String ordering fields to work with JSON src with COW by @voonhous in #17953
fix: fix viewfs schema file creation as not atomic by @TheR1sing3un in #17965
fix: Rename HoodieDataSourceHelpers#listCompletionTimeSince references by @voonhous in #17983
fix: Propagate cfg.sourceOrderingFields in HoodieStreamer by @voonhous in #17984
fix: Prevent unnecessary rewrites for skeleton records by @voonhous in #17969
fix: Handle nested map and array columns in MDT by @vinishjail97 in #17694
feat: add basic hoodie source by @HuangZhenQiu in #17989
test: add flink mini cluster for append function integ test by @xushiyan in #17972
refactor: simplify the HoodieSplitReaderFunction by @HuangZhenQiu in #18004
fix: Fix incremental query with full scan mode on MOR tables on Databricks Runtime by @yihua in #18003
fix: Handle external file groups in ExternalFilePathUtil by @vinishjail97 in #17788
refactor: drop unused InMemoryFileSystem class and test by @vinothchandar in #17997
feat(schema): Add VARIANT support to HoodieSchema by @voonhous in #17751
feat: introduce timeline manifest retained version conf by @TheR1sing3un in #17996
fix: Update default Parquet version to 1.13.1 by @suryaprasanna in #17941
fix: unpersist cached objects in SqlQueryEqualityPreCommitValidator by @suryaprasanna in #17931
test(record-index): add coverage for tag location call for various indexes by @suryaprasanna in #17494
feat(storage): add config to allow duplicates while writing to HFiles by @suryaprasanna in #17495
fix: Do not ignore IOException when cleaning the file by @TheR1sing3un in #17987
fix: Remove default record key and ordering fields values on the Flink side, consistent with Spark by @geserdugarov in #17994
fix: too many properties passed to hive table through hoodie hive catalog by @kamronis in #18011
feat: Add a new index write function for flink writer by @cshuo in #17838
feat: add flink HoodieSourceSplitComparator by @HuangZhenQiu in #18009
feat: Add configurable cleaner policy for metadata table by @suryaprasanna in #17935
fix: Provide commit timeline during HoodieROTablePathFilter construction by @suryaprasanna in #17859
fix: Adding tests for rolling back on commits older than replacecommit and compaction commits by @suryaprasanna in #17932
perf: Reduce memory usage in getAllPartitions by storing only path and directory flag by @suryaprasanna in #17947
fix: include Hoodie metadata fields when reading Parquet files in precommit validators by @suryaprasanna in #17505
fix: Allow configurable storage level while computing expression index update by @lokeshj1703 in #17737
feat(schema): Update schema repair tools to work on HoodieSchema by @the-other-tim-brown in #17952
fix(common): Handle null actionState in LegacyArchivedMetaEntryReader by @prashantwason in #18024
feat: publish clean and archival duration metrics in finally block by @suryaprasanna in #17945
fix: enable Hive support when creating JavaSparkContext for Spark SQL queries by @suryaprasanna in #17510
feat: enable new Hoodie source in HoodieTableSource by @HuangZhenQiu in #18022
feat: Integrate the mdt compaction with existing flink compaction pipeline by @cshuo in #17991
perf: Bloom filter improvements for memory usage by @the-other-tim-brown in #18015
feat: Support slash separated date partitioning for Hudi tables by @suryaprasanna in #17787
fix: Use TableSchemaResolver in setWriteSchemaForDeletes for better schema resolution by @prashantwason in #18030
feat(metadata): Handle metadata table service failures gracefully and emit metrics by @suryaprasanna in #17930
fix: allows eager failure from abnormals for streaming write by @fhan688 in #12150
perf: Bloom filter improvements for memory usage (address feedback) by @the-other-tim-brown in #18063
fix(utilities): Use passed-in configs when propsFilePath is null or empty in HoodieStreamer by @prashantwason in #17467
fix: Add config version information to DataSourceOptions by @huangxiaopingRD in #17733
fix: Ensure Lance works when populateMetaFields is false with user defined keygen by @rahil-c in #18042
refactor: Add Lombok annotations to hudi-common module (part 4) by @voonhous in #17830
refactor: Add Lombok annotations to hudi-utilities (Part 2) by @voonhous in #17876
fix: reload table config after record index bootstrap to avoid bloom index fallback by @suryaprasanna in #17508
refactor: migrate to ScanV2Internal API and remove ENABLE_OPTIMIZED_LOG_BLOCKS_SCAN config by @suryaprasanna in #17520
fix(flink): Handle Non-Null Complex Types with Nullable Elements in ParquetSchemaConverter by @prashantwason in #18087
perf: Support lazy clean of the RLI cache during bucket assigning by @cshuo in #18018
fix: correct deleted keys computation in computeRevivedAndDeletedKeys by @vamsikarnika in #18094
fix: disable retries in s3/gcs storage lock clients for storage based LP by @alexr17 in #17869
feat(schema): Remove direct reliance on Avro for schema compatibility checks by @the-other-tim-brown in #18006
fix: exit transaction with error in storage LP when unlock failure due to lock acquired by others by @alexr17 in #17871
perf: Avoid re-fetching file status from FS for HFile readers by @the-other-tim-brown in #17709
feat(schema): Remove usage of migrated AvroSchemaUtils and HoodieAvroUtils methods (part 1) by @the-other-tim-brown in #18007
feat: support flink split distribution strategy by @HuangZhenQiu in #18082
feat: Lance schema evolution (add column, type promotion) by @rahil-c in #17904
feat(schema): Minor cleanup of Avro schema usage by @the-other-tim-brown in #18043
feat: support partition pruner in Flink hudi source v2 by @HuangZhenQiu in #18074
refactor: apply lombok for flink source v2 related classes by @HuangZhenQiu in #18122
refactor: Add Lombok annotations to hudi-common module (part 6) by @voonhous in #17880
[MINOR] Preload file listing for partitions in BloomIndex to avoid repeated listings by @prashantwason in #17462
fix: (table-services) When using multiwriter do not delete pending roll… by @kbuci in #18093
feat(spark): Add guardrail to prevent writes when Spark speculative execution is enabled by @prashantwason in #18045
fix: interrupt storage LP when heartbeat fails by @alexr17 in #17870
fix: correct unsigned int conversion in TestProtoConversionUtil by @suryaprasanna in #18120
feat: add flink stream read metrics for hudi source v2 by @HuangZhenQiu in #18130
[MINOR] Fix HoodieLockMetrics.createTimerForMetrics to not share metric timer by @lokeshj1703 in #18097
feat(schema): Consolidate null type handling by @the-other-tim-brown in #18163
[HUDI-9730] RFC-99 Hudi Type System by @bvaradar in #13743
fix: flink source v2 serializability by @HuangZhenQiu in #18165
feat: Add metadata record_index lookup command to Hudi CLI by @suryaprasanna in #17940
test: add unit test for multiple partition filters on same column by @suryaprasanna in #17934
feat: Adding Presto to Hudi Notebooks by @rangareddy in #18078
[MINOR] Publish HUDI version metrics as integers by @prashantwason in #17466
refactor: Add Lombok annotations to hudi-common module (part 5) by @voonhous in #17878
test(concurrency): add tests for write conflicts with different conflict resolution strategies by @suryaprasanna in #17501
fix: Include metadata file cache size option in the configuration for… by @cshuo in #18175
fix(spark): Fix TestSparkSchemaUtils failing with Spark 3.3 due to timestamp_ntz by @prashantwason in #17917
fix(flink): include exception stacktrace in error logs by @prashantwason in #18091
feat: Publish commits to process metrics for deltastreamer by @suryaprasanna in #17929
fix: Use local engine context for clean planning on metadata and non-partitioned tables by @suryaprasanna in #17942
perf(common): Make ThreadLocal variables in HoodieAvroDataBlock static by @prashantwason in #18023
fix(metadata-table): exclude failed deletes when updating MDT with clean metadata by @prashantwason in #18035
chore: Fix flakey test by ensuring unsigned values in Proto conversion are positive by @the-other-tim-brown in #18186
feat(blob): update approach to remove reliance on column groups, break down plan by @the-other-tim-brown in #18013
fix: Empty write should not cause spark analysis errors with pre-commit validators by @kbuci in #18128
fix: throw correct exception when reading hoodie.properties file without access by @suryaprasanna in #18176
refactor: Remove redundancy in index validation logic in HoodieIndexU… by @voonhous in #17911
fix: SimpleAvro-, NonpartitionedAvro- and ComplexAvroKeyGenerator are also valid for writing by Spark when meta-fields are disabled by @wombatu-kun in #18187
feat(flink): lookup join with retry and async capabilities by @wombatu-kun in #18193
fix: revert (feat: support mini batch split reader) by @HuangZhenQiu in #18200
fix(flink): Use blocking instant generation when CDC is enabled by @cshuo in #18206
refactor: Remove not used classes from org.apache.hudi.spark.internal by @geserdugarov in #18211
chore: Add .claude and .codex directories to .gitignore by @vinothchandar in #18213
fix(trino): Fix Docker initialization issue in the Trino plugin by @vamsikarnika in #18220
docs(spark): Update description of modules related to integration with Spark by @geserdugarov in #18219
fix: Handle case when 0 byte completed commit files present in the timeline by @suryaprasanna in #18210
feat(blob): Blob schema definition by @the-other-tim-brown in #18108
chore(ci): Add Codecov coverage report in GitHub actions by @yihua in #18230
feat: support predicate push down in hoodie flink source v2 by @HuangZhenQiu in #18212
feat(flink): Off-heap lookup join cache backed by RocksDB by @wombatu-kun in #18231
fix: Remove trailing colon from incomplete error message in HoodieTableMetadataUtil by @shangxinli in #18233
fix: Fix typos across codebase by @shangxinli in #18232
fix: Fix SHOW PARTITIONS commands functionality for slash-separated date partitioning by @suryaprasanna in #18195
fix: Fix string handling on bloom index Metadata Payload by @the-other-tim-brown in #18240
chore(ci): cleanup for print statements, showing tables/schemas by @the-other-tim-brown in #17771
fix: Use correct lastCompletedTransactionMetadata while acquiring lock for clustering by @suryaprasanna in #18198
feat(spark): add HoodieSparkSQLUtils APIs and tests by @suryaprasanna in #18202
feat(spark-datasource): support spark.hoodie.* read config overrides by @suryaprasanna in #18205
test: Add Scala test for record index rebootstrap on non-Hoodie partitions by @suryaprasanna in #18208
fix: Fail metadata bootstrap early in presence of 0 byte file by @suryaprasanna in #18209
feat(metadata-table): Add count validation for record index bootstrap by @prashantwason in #18029
refactor: move source assign package under split by @HuangZhenQiu in #18253
perf: Adding support for LatestBaseFilesPathFilter to Spark File Index by @suryaprasanna in #18136
fix: add all fields in HoodieSourceSplitSerializer by @HuangZhenQiu in #18243
fix: [HUDI-CLUSTERING] Optimize binary copy performance with lazy loading, bulk reads, and double buffering by @gudladona in #18241
fix(flink): Use timestamp based partitioning in AutoRowDataKeyGen by @prashantwason in #18090
feat(flink): collect event time in HoodieRowDataCreateHandle for min/max event time metrics by @jianchun in #18250
feat(table-services): Emit archival metrics for monitoring and debugging by @nada-attia in #18133
feat(table-services): Add config to filter partitions during full clean by @prashantwason in #17550
feat(metrics): emit metric for rollback failures by @nada-attia in #18148
feat: Notebooks to support multiple hudi versions by @rangareddy in #18255
perf: eliminate unnecessary timeline loading for Flink append only write path by @danny0405 in #18264
feat: Use PartitionValueExtractor interface in Spark reader path by @suryaprasanna in #17850
feat(vector): add VECTOR type to HoodieSchema by @rahil-c in #18146
fix: infer record merge mode for pre-v9 tables in generateRequiredSchema by @vamsikarnika in #18106
test(common): add JVM memory reporting test for environment diagnostics by @suryaprasanna in #18207
fix(table-services): When applying rollback metadata to metadata table (v6) do not rollback a metadata table deltacommit if it has been already rolled back by post-commit rollback by @kbuci in #18160
refactor: Hudi Flink source v2 with better context management by @HuangZhenQiu in #18269
feat(table-services): Allow users to not parallelize each partition with engine context during clustering planning by @kbuci in #18191
feat(client): Add pre-write validator framework by @nada-attia in #18239
feat(vector): Add further research for supporting VECTOR type to RFC-99 by @rahil-c in #18184
feat(table-services): Support clustering file groups with earlier instants times first by @kbuci in #18174
feat(spark): ZooKeeper node should hold spark app id (for helping debug when lock is held for long time) by @kbuci in #18123
fix(flink): Don't perform table service during mdt initialization if … by @cshuo in #18283
fix: Remove noisy logging when table partition is empty by @yihua in #18290
fix: Improve config docs of enabling column stats in metadata table by @yihua in #18289
feat(vector): add converters from spark to hoodieSchema for vectors by @rahil-c in #18190
fix(flink): enable integration test for Hudi Flink Source V2 by @HuangZhenQiu in #18287
fix: Databricks Spark 3.4 Runtime compatibility for reading Hudi tables by @yihua in #18292
feat(flink): Add Kafka offset tracking to Flink Hudi commits by @shangxinli in #18127
perf(table-services): Incremental clean planning (for COW) should ignore partitions from instants with only new file groups by @kbuci in #18016
feat(flink): Add helper functions to parse Kafka offset differences b… by @shangxinli in #18125
fix(spark): SparkSQL write queries should correctly infer HUDI write configs from spark.hoodie.* configs in spark conf by @kbuci in #18297
fix(table-services): When single clustering group config is disabled, clustering should not create clustering groups with same number of input/output files by @kbuci in #18172
feat: add support for touch partitions in HiveSyncTool by @nada-attia in #18064
feat(flink): Support create table DDL without primary key by @prashantwason in #18086
fix: sort partitions after filtering for clustering planning by @prashantwason in #18092
refactor: rewrite executors tests to avoid code duplication by @yaojiejia in #18005
fix(common): Handle zero byte properties file and ensure atomic writes during modification by @prashantwason in #18058
[HUDI-7503] Compaction execution should fail if another active writer is already executing the same plan by @kbuci in #18012
feat(common): Add Policy for cleanup/rollback before each write by @kbuci in #18197
fix(metadata): Allow metadata table bootstrap when pending commits are being rolled back by @prashantwason in #18033
fix(common): Filter stray files when loading partitions in AbstractTableFileSystemView by @prashantwason in #18047
fix(clustering): When inferring whether an instant is clustering, do not fail if replacecommit was rolled back already (by a concurrent writer) by @kbuci in #18288
docs: RFC-102 - Spark Vector Search in Apache Hudi by @rahil-c in #14218
feat(conflict-resolution): Allow PreferWriterConflictResolutionStrategy to abort clustering if there is an ongoing write that is in requested state. by @kbuci in #18280
feat(hudi-sync): Publish HUDI version to Hive metastore (allowing users to infer which HUDI client jar to use for a given dataset) by @kbuci in #18307
chore(ci): Add test jobs and Codecov integration in GitHub Actions by @yihua in #18225
chore(ci): Simplify test combinations on Spark in GitHub Actions by @yihua in #18336
chore(ci): Add codecov coverage from tests running on Spark 4.0 by @yihua in #18335
feat(metasync): Support HMS 4.x in JDBC sync mode via automatic Thrift fallback by @bvaradar in #18227
feat(flink): Support write buffer based on flink managed memory by @cshuo in #18319
feat(lance): Support bloom filter in Lance writer and reader by @wombatu-kun in #18304
fix: Use explicit Throwable type in AvroConversionUtils catch clause by @yihua in #18342
docs: Update the build instructions by mentioning profiles in README by @rangareddy in #18310
feat(utilities): add DELETE operation support for HudiStreamer by @prashantwason in #18088
feat(metadata-table): add config to disable automatic deletion of MDT partitions by @prashantwason in #18181
fix(concurrency): detect rollback conflicts with ongoing commit operations by @prashantwason in #18089
feat(common): add core pre-commit validation framework - Phase 1 by @shangxinli in #18068
fix: Fix flaky test TestProtoConversionUtil#allFieldsSet_wellKnownTyp… by @cshuo in #18352
fix(flink): enable batch read it for flink source v2 by @HuangZhenQiu in #18325
fix: modify the incorrect Hive configuration in hoodie hive catalog by @yangxiao0320 in #18365
feat: support read commits limit in Hudi Flink Source V2 by @HuangZhenQiu in #18369
feat(hive-sync): add Spark-catalog based metastore client implementation to avoid Hive-on-Spark classloader issues by @suryaprasanna in #18203
fix(common): fix typos commited -> committed, commiting -> committing by @shangxinli in #18363
feat: support read splits limit in Hudi Flink Source V2 by @HuangZhenQiu in #18370
feat(flink): Support bootstrap from RLI to local RocksDB for flink bu… by @cshuo in #18254
perf: Skip unnecessary clean planning for MOR metadata table file-version cleaning by @suryaprasanna in #17943
feat: add graceful handling for post-commit failures with metrics by @suryaprasanna in #18196
feat(flink): Support more efficient customized serializer for HoodieRecordGlobalLocation by @cshuo in #18326
feat(metadata): Defer RLI initialization for fresh tables to optimize file group allocation by @nsivabalan in #18353
feat(flink): add pre-commit validation framework for Flink - Phase 2 by @shangxinli in #18362
feat: add Flink source reader function for cdc splits by @HuangZhenQiu in #18361
feat(vector): Support writing VECTOR to parquet and avro formats using Spark by @rahil-c in #18328
fix: Optimizing internal schema lookup in TableSchemaResolver by @nsivabalan in #18387
[HUDI-7030] Commit-based Clustering Plan Strategy by @prashantwason in #18251
fix: Fixed the issue of incorrect opName values in Flink bulk insert writing by @empcl in #18313
fix(flink): Improve splits distribution strategy for mor table w/ bucket index by @Joy-2000 in #18103
feat: Add Unshredded Variant read & write support by @voonhous in #17833
chore: include table name in recursive listing Spark job descriptions by @suryaprasanna in #18416
refactor: modularize long test methods in TestHoodieClientOnCopyOnWriteStorage by @yaojiejia in #18377
test(lance): Add test of bloomFilter support to TestLanceDataSource by @wombatu-kun in #18388
fix: Use target schema for non-FileBased/SchemaRegistry providers in SourceFormatAdapter by @suryaprasanna in #17946
perf: Improve Serialization Performance of BufferedRecord by @cshuo in #18418
feat(utilities): add option to make all schema columns nullable for backwards compatibility by @prashantwason in #17777
feat(blob): Create blobs in Spark SQL by @the-other-tim-brown in #18347
refactor: remove HoodieWriteConfig.getOrcCompressionCodec() function by @skywalker0618 in #18422
fix: [HUDI-3055] Fix hardcoded GZIP compression codec in HFileUtils by @ZZZxDong in #18263
feat(lance): Implement canWrite() in HoodieSparkLanceWriter with configurable max file size for Lance by @wombatu-kun in #18341
refactor: Clean up imports by @voonhous in #18428
feat: support limit push down in Hudi Flink Source V2 by @HuangZhenQiu in #18406
fix(spark): validate and normalize incremental start/end instants by @yaojiejia in #18426
feat(vector): Add guard for user creating nested VECTOR by @rahil-c in #18431
fix(spark): Ignore duplicate fields when merging schema in IncrementalRelation by @prashantwason in #17776
feat(spark): implement column pruning for incremental queries by @suryaprasanna in #17514
perf(table-services): Only attempt scheduling log compaction if number of deltacommits is at least LogCompactionBlocksThreshold by @kbuci in #18306
fix(common): close parquet reader iterator on EOF by @suryaprasanna in #18407
feat(metrics): Add table-specific metrics registry support for multi-tenant scenarios by @prashantwason in #18179
feat(table-services): Support hoodie.clustering.enable.expirations to allow cleanup of failed clustering plans (intended for PreferWriterConflictResolutionStrategy) by @kbuci in #18302
feat: improve bucket assignment for MOR with bucket index by @HuangZhenQiu in #18444
fix(flink): Trigger a failover after pending instants recommitted to b… by @cshuo in #18434
refactor: consolidate common utility classes for Flink CDC read by @HuangZhenQiu in #18436
feat(common): Add API to fetch log files created on or before given instant time by @nada-attia in #18142
fix: do not shut down disruptor thread in snapshotState in flink connector by @skywalker0618 in #18446
perf(metadata): avoid recursive calls for partition listing using catalog by @suryaprasanna in #18265
feat(vector): Add Spark VECTOR Search TVF with initial KNN algorithm by @rahil-c in #18432
feat(schema): Add support to write shredded variants for HoodieRecordType.SPARK by @voonhous in #18036
fix(flink): Reject deferred RLI initialization for flink writer by @cshuo in #18399
feat: use ScanOperation for Spark 3.3 and 3.4 partition pruning by @suryaprasanna in #17936
fix: remove unused code by @ychris78 in https://github.com/apache/hudi/pull/18473
feat(lance): throwing exception/guard for users trying to read Lance from non-spark engines by @wombatu-kun in https://github.com/apache/hudi/pull/18481
fix(hfile): use Hadoop WritableUtils VarInt encoding in HFile block index writer by @officialasishkumar in https://github.com/apache/hudi/pull/18465
fix: avoid duplicate archived timeline instants from leftover merge files by @suryaprasanna in https://github.com/apache/hudi/pull/18408
Fix BufferedReader resource leak in InputStreamConsumer by @mailtoboggavarapu-coder in https://github.com/apache/hudi/pull/18469
refactor(flink): Refactor Flink compaction/clean pipeline with compos… by @cshuo in https://github.com/apache/hudi/pull/18477
perf(core): optimize rollback listing calls on metadata table by @nbalajee in https://github.com/apache/hudi/pull/18279
fix(flink): Handle bootstrap write metadata correctly after job resca… by @cshuo in https://github.com/apache/hudi/pull/18485
chore(ci): Clean up env variable leak in TestSqlConf by @geserdugarov in https://github.com/apache/hudi/pull/18486
feat(sync): Map VECTOR type to binary for metastore sync support by @voonhous in https://github.com/apache/hudi/pull/18480
chore(deps): bump org.apache.logging.log4j:log4j-core from 2.25.3 to 2.25.4 by @dependabot[bot] in https://github.com/apache/hudi/pull/18490
feat(common): add log reader scan metrics and logging for log block processing by @suryaprasanna in https://github.com/apache/hudi/pull/18412
feat(flink): Add metrics for RocksDB index backend in bucket assigner by @cshuo in https://github.com/apache/hudi/pull/18484
feat(sync): Map BLOB type to struct in Hive and BigQuery sync by @voonhous in https://github.com/apache/hudi/pull/18482
chore: Allow versions to be specified in build_docker_images.sh by @voonhous in https://github.com/apache/hudi/pull/17948
feat(sync): Map VARIANT type to struct in Hive, Spark, and BigQuery sync by @voonhous in https://github.com/apache/hudi/pull/18483
fix(payload): support sentinel no-op updates in DefaultHoodieRecordPayload by @suryaprasanna in https://github.com/apache/hudi/pull/18413
feat: Support to cap max commits to clean in one round of clean commit by @nsivabalan in https://github.com/apache/hudi/pull/18322
fix(common): FutureUtils:allOf should always throw root cause exception by @kbuci in https://github.com/apache/hudi/pull/18456
feat: Adding rolling extra metadata support by @nsivabalan in https://github.com/apache/hudi/pull/18421
fix: Scanner resource leak in SqlFileBasedSource.fetchNextBatch by @mailtoboggavarapu-coder in https://github.com/apache/hudi/pull/18467
fix: fix Scanner file handle leak in HiveIncrementalPuller.executeIncrementalSQL by @mailtoboggavarapu-coder in https://github.com/apache/hudi/pull/18457
feat: Add ReverseOrderHoodieRecordPayload and configurable ordering behavior by @suryaprasanna in https://github.com/apache/hudi/pull/17928
feat(spark): refresh parquet tools clustering strategy for current master by @suryaprasanna in https://github.com/apache/hudi/pull/18409
fix: Fix BufferedReader resource leak in FileIOUtils.readAsUTFStringLines by @mailtoboggavarapu-coder in https://github.com/apache/hudi/pull/18470
[HUDI-14922] Fix Windows path separator in DFSPropertiesConfiguration.getConfPathFromEnv by @mailtoboggavarapu-coder in https://github.com/apache/hudi/pull/18454
chore: cleanup docker-compose files by @voonhous in https://github.com/apache/hudi/pull/17950
fix(docker): fix docker image build with Java 11 and Hive 2.3.10 by @yihua in https://github.com/apache/hudi/pull/18519
chore: Add Java 17 Hadoop base image and Spark 4.0.1 docker compose s… by @voonhous in https://github.com/apache/hudi/pull/18520
feat(metadata): Allow users to safely execute compaction plans on metadata table concurrently through a table service platform (rather than only inline during write) by @kbuci in https://github.com/apache/hudi/pull/18295
feat(docker): add --multi-arch flag for cross-platform image builds by @yihua in https://github.com/apache/hudi/pull/18522
chore: add timing logs for file index partition and file listing by @suryaprasanna in https://github.com/apache/hudi/pull/18417
chore(docker): bump integ-test docker-compose to Hive 2.3.10 by @voonhous in https://github.com/apache/hudi/pull/18525
feat: Add Azure-based storage lock by @chrevanthreddy in https://github.com/apache/hudi/pull/17951
refactor: introduce static helper method to remove clones by @aaaZayne in https://github.com/apache/hudi/pull/18533
fix: whitelist Flink _2.12 artifacts in scala-2.13 enforcer rule by @voonhous in https://github.com/apache/hudi/pull/18508
chore(docker): fix Hadoop entrypoint.sh property bugs in all base modules by @voonhous in https://github.com/apache/hudi/pull/18527
chore(common): Consolidate MapUtils into CollectionUtils by @voonhous in https://github.com/apache/hudi/pull/18529
perf(common): avoid stream allocation in CollectionUtils.createImmuta… by @voonhous in https://github.com/apache/hudi/pull/18530
feat(flink): Implement continuous sorting feature for append write by @prashantwason in https://github.com/apache/hudi/pull/18083
feat(utilities): add external HudiHiveSyncJob for on-demand Hive sync by @suryaprasanna in https://github.com/apache/hudi/pull/18204
feat(blob): Read Blobs in Spark SQL by @the-other-tim-brown in https://github.com/apache/hudi/pull/18098
fix: HoodieStorage resource leak in FileSystemBasedLockProvider.close() by @mailtoboggavarapu-coder in https://github.com/apache/hudi/pull/18461
perf(common): Avoid double-iterating log files in file-system-view fi… by @voonhous in https://github.com/apache/hudi/pull/18531
feat(vector): Add Spark SQL DDL CREATE TABLE support for VECTOR type by @voonhous in https://github.com/apache/hudi/pull/18488
[MINOR] Bump lance to 4.0.0 and lance-spark to 0.4.0 by @rahil-c in https://github.com/apache/hudi/pull/18498
feat: Adding support to block archival on last known ECTR for v6 tables by @nsivabalan in https://github.com/apache/hudi/pull/18380
fix: prevent parseTypeDescriptor crash for VARIANT by @voonhous in https://github.com/apache/hudi/pull/18510
fix: VARIANT Hive sync error when performing CREATE table DDL by @voonhous in https://github.com/apache/hudi/pull/18511
feat: Add support for exclusive rollbacks with multi writer by @lokeshj1703 in https://github.com/apache/hudi/pull/18448
feat(blob): followup fixes for blob reader by @rahil-c in https://github.com/apache/hudi/pull/18538
chore(docker): add Hadoop 3.4.0 / Hive 2.3.10 / Spark 4.0.2 compose s… by @voonhous in https://github.com/apache/hudi/pull/18550
fix: Parquet small-precision decimals decode ClassCastException by @skywalker0618 in https://github.com/apache/hudi/pull/18552
fix: JDBC connection leak in HiveIncrementalPuller.saveDelta() by @mailtoboggavarapu-coder in https://github.com/apache/hudi/pull/18460
chore(spark): bump spark4.version to 4.0.2 by @voonhous in https://github.com/apache/hudi/pull/18549
fix(lance): Add Hive InputFormat stubs and fix Spark SQL for Lance file format by @rahil-c in https://github.com/apache/hudi/pull/18162
feat(flink): Introduces dictionary encoding of payload partition path for RocksDBIndexBackend by @cshuo in https://github.com/apache/hudi/pull/18560
feat(lance): round-trip Hudi VECTOR columns as native Lance fixed-size lists by @rahil-c in https://github.com/apache/hudi/pull/18497
fix(vector): Register VECTOR HMS column as BINARY on Spark CREATE by @voonhous in https://github.com/apache/hudi/pull/18545
fix(variant): allow VariantType writes through Hudi's V1 DataSource on Spark 4 by @voonhous in https://github.com/apache/hudi/pull/18564
fix: ProtoConversionUtil$AvroSupport static init under Avro 1.12 by @tiennguyen-onehouse in https://github.com/apache/hudi/pull/18571
fix: FileGroupReader drops mandatory partition columns from dataSchema by @tiennguyen-onehouse in https://github.com/apache/hudi/pull/18570
feat: Adding support to inject custom configs to parquet writer by @nsivabalan in https://github.com/apache/hudi/pull/18379
feat(clean): Adding empty clean support to hudi by @nsivabalan in https://github.com/apache/hudi/pull/18337
fix(vector): Pass plain FIXED through to VECTOR projection on Hive read by @voonhous in https://github.com/apache/hudi/pull/18582
fix(clean): address review comments on empty clean support (#18337) by @yihua in https://github.com/apache/hudi/pull/18587
fix(ci): bump surefire test heap from 3g to 4g by @yihua in https://github.com/apache/hudi/pull/18589
feat(lance): support simplified path for lance blob inline reading by @rahil-c in https://github.com/apache/hudi/pull/18575
feat(blob): add support for lance blob inline descriptor reading by @rahil-c in https://github.com/apache/hudi/pull/18586
feat(ci): enable auto-merge and required status checks on master by @yihua in https://github.com/apache/hudi/pull/18594
feat(flink): Vendor Flink 2.1 Dremel nested-reader support classes by @skywalker0618 in https://github.com/apache/hudi/pull/18567
fix(vector): Preserve VECTOR/BLOB metadata on SQL INSERT path by @voonhous in https://github.com/apache/hudi/pull/18540
feat(spark): add Spark 4.1 support by @yihua in https://github.com/apache/hudi/pull/17674
fix: Curator class conflict in ZookeeperBasedLockProvider by @ychris78 in https://github.com/apache/hudi/pull/18593
fix(schema): Allow nested projection on BLOB and VARIANT columns in p… by @voonhous in https://github.com/apache/hudi/pull/18566
feat: Create JsonKinesisSource by @linliu-code in https://github.com/apache/hudi/pull/18224
feat(lance): fix lance writer/reader regarding arrow memory limit issue by @rahil-c in https://github.com/apache/hudi/pull/18613
feat(common): When inferring checkpoint/schema from timeline, check non-ingestion write commits (in case they have metadata rolled-over) by @kbuci in https://github.com/apache/hudi/pull/18576
feat: Add variant support description to RFC-99 by @voonhous in https://github.com/apache/hudi/pull/18274
feat(flink): Cherry-pick Flink dynamic bucket streaming and global RLI fixes into release-1.2.0 by @cshuo in https://github.com/apache/hudi/pull/18788
fix(flink): Trigger a failover after pending instants recommitted for… by @cshuo in https://github.com/apache/hudi/pull/18789

New Contributors

@XianghuiBai made their first contribution in #14087
@ratuldawar11 made their first contribution in #14225
@sumi-mathew made their first contribution in #13358
@gggyd123 made their first contribution in #13843
@VahidRamezaniDB made their first contribution in #17645
@gudladona made their first contribution in #18241
@jianchun made their first contribution in #18250
@nada-attia made their first contribution in #18133
@ZZZxDong made their first contribution in #18263
@mailtoboggavarapu-coder made their first contribution in https://github.com/apache/hudi/pull/18469
@chrevanthreddy made their first contribution in https://github.com/apache/hudi/pull/17951
@aaaZayne made their first contribution in https://github.com/apache/hudi/pull/18533

Full Changelog: release-1.1.0...release-1.2.0
