What's Changed
- chore: Moving to 1.2.0-SNAPSHOT on master branch by @yihua in #14055
- [HUDI-9782] Add validation and cleanup apis for storage LP audit validation by @alexr17 in #13886
- feat: add new Hudi demo app `hudi-notebooks` by @deepakpanda93 in #14023
- chore: Reduce log volume by changing INFO to DEBUG for table loading messages by @shubhampatel28 in #14057
- fix: Fix output type extracting for key selector in flink stream read by @cshuo in #14065
- docs: RFC-100 - Unstructured Data Storage in Hudi (Initial strawman proposal) by @vinothchandar in #13924
- chore: Exclude hudi-trino-plugin from RAT checks by @voonhous in #14067
- fix: creating warehouse bucket automatically by @deepakpanda93 in #14079
- fix: update utils.py and notebook by @deepakpanda93 in #14080
- fix: Spark Schema Evolution Fix for nested columns by @the-other-tim-brown in #14075
- fix: Fixing point lookup in MDT partitions by @nsivabalan in #14085
- fix: disable embedded timeline service for flink upgrade by @cshuo in #14096
- fix: flink mdt compaction should finish pending compactions first by @danny0405 in #14095
- fix: exclude unused dependencies from META INF in presto bundle by @vamsikarnika in #14102
- fix: Upgrade Parquet Avro and commons lang3 versions in presto bundle by @vamsikarnika in #14099
- fix: show_index command output had incorrect order of column names by @linliu-code in #14113
- fix: Remove catalog access from SparkSQLWriter by @linliu-code in #14083
- fix: Ignore field nullability while checking whether record should be… by @cshuo in #14094
- fix: disable NBCC with default single writer for downgrade less than version 8 by @vamshikrishnakyatham in #14109
- fix: Skip payload class validation when merge mode is not custom by @linliu-code in #14116
- fix: MERGE INTO statement produces misleading UNRESOLVED_COLUMN error when target table doesn't exist instead of TABLE_OR_VIEW_NOT_FOUND by @vamshikrishnakyatham in #14118
- fix: Handle deletes and updates properly in secondary index by @yihua in #14090
- fix(core): add table level validation for decimal evolution by @jonvex in #14089
- fix: Handle missing valueType column after upgrade by @linliu-code in #14105
- perf: Reduce memory usage of writing HFile log block by @yihua in #14078
- fix: Fix cleaning of historical internal schema files by @cshuo in #14126
- fix: Upgrade parquet-avro version to 1.15.1 in trino bundle and plugin by @vamsikarnika in #14140
- fix: Upgrade Java xmlbuilder version to fix CVE-2014-125087 by @vamsikarnika in #14144
- fix: fix partition stats delete properly for downgrade from V9 to V8 by @vamshikrishnakyatham in #14138
- fix: updating error messages thrown to end users by @vamshikrishnakyatham in #14115
- fix: Fix the instant time issue for row writer bulk insert hoodie streamer by @vamsikarnika in #14153
- fix: Fixed the recovering method for the older versions where checksum is not present by @Rajeev-01 in #14148
- refactor: Add required setter methods for Flink-CDC by @voonhous in #14150
- fix: Fixing secondary index read perf for V1 layout by @nsivabalan in #14149
- fix: Give proper error message for multi-writer scenarios without lock provider set by @linliu-code in #14119
- feat: Partition predicate fix for Databricks runtime support by @ad1happy2go in #14059
- fix: Partition stats should be controlled using column stats config by @lokeshj1703 in #14165
- fix: Fix upgrade handling for MySqlDebeziumAvroPayload with deltastreamer by @lokeshj1703 in #14159
- fix: fix downgrade to not delete unintended partitions in MDT by @vamsikarnika in #14162
- fix: correct indentation in utils.py and add docker compose validation by @deepakpanda93 in #14168
- perf: [Flink] Introduce a Flink Clustering Plan Strategy to Eliminate Redundant Small-File Merges by @XianghuiBai in #14087
- fix: ensure that InlineFS is seeked to the correct offset upon init by @voonhous in #14178
- fix(ingest): Fix Timestamp Conversions, Add legacy api support by @jonvex in #14076
- fix: Avoid changing table configs when creating a table with an existing base path on Spark by @yihua in #14175
- fix: build notebook hive image using compatible mode for arm64 by @xushiyan in #14190
- fix: Fix file pruning based on column stats for flink reader by @cshuo in #14186
- refactor: remove buildx option in docker build for notebooks by @deepakpanda93 in #14199
- feat: introduce pk filter push-down to base file by @TheR1sing3un in #14183
- fix: Fix predicates for base file reader in Flink FileGroup reader by @cshuo in #14197
- fix: Avoid deleting metadata table with MOR during upgrade / downgrade by @linliu-code in #14191
- docs: update javadoc of BucketIndexUtil by @voonhous in #14195
- fix: Fixing record index related configs and enums by @nsivabalan in #14180
- refactor: change access level of flushRemaining for flink-cdc require… by @voonhous in #14206
- fix: Fix build because of record index config renaming by @yihua in #14215
- test: fix flaky test case in TestBootstrapReadBase by @TheR1sing3un in #14210
- test: fix flaky test in TestSecondaryIndex by @TheR1sing3un in #14211
- test: Enhance downgrade test with compaction by @yihua in #14226
- perf: reduce unnecessary row group metadata loading by @TheR1sing3un in #14208
- fix: Move hudi split loaders to resumable tasks architecture to prevent deadlocks by @ratuldawar11 in #14225
- test: Disable failing test testFiltersInFileFormat to unblock CI by @yihua in #14236
- fix: Persist RLI index bootstrap records only if estimation is required and add unpersist by @lokeshj1703 in #14069
- refactor: Reuse sparkSession and sparkContext variables in HoodieSparkSqlWriter by @huangxiaopingRD in #14231
- fix: Exclude unnecessary netty dependencies from hudi jars by @vamsikarnika in #14142
- feat: Only when the target table to be inserted/merged is a hudi table should the meta fields be eliminated by @TheR1sing3un in #14230
- feat: Support TIMELINE_SERVER_BASED markers for flink writer by @cshuo in #14202
- fix: Upgrade parquet-avro version for hudi-presto-bundle and CVE-2025-30065 by @sumi-mathew in #13358
- fix: Disable positional merging for spark version < 3.5 by @linliu-code in #14241
- docs: Claim RFC-81: Introduce Primary Key Sorted Table by @TheR1sing3un in #14245
- fix(ingest): Repair affected logical timestamp milli tables by @jonvex in #14161
- fix: Update metadata table record level index config keys naming for standardization by @linliu-code in #14244
- feat: introduce pk filter to log file by @TheR1sing3un in #14205
- chore: Update DOAP with 1.1.0 Release by @yihua in #14294
- chore: Update release candidate validation in Github action by @yihua in #14295
- docs: RFC-95 - New Hudi Flink Source implementation by @HuangZhenQiu in #13381
- test: Clean up all the behaviors of directly setting spark conf in spark test to avoid flaky tests by @TheR1sing3un in #14198
- [MINOR] Cleanup old spark3.5 version in pom.xml by @yongkyunlee in #14304
- feat(schema): New Hudi Schema Class - Initial implementation. Also, Add new APIs based on current usage of Avro schema by @bvaradar in #14265
- chore: Integration Test Flakiness: free more disk space before running by @the-other-tim-brown in #14316
- feat: adding support for trino in notebooks by @deepakpanda93 in #14242
- feat(schema): Add types for decimal, date, timestamp, time, and uuid by @the-other-tim-brown in #14312
- refactor: Migrate HoodieFileReader and HoodieFileWriter io.storage to use HoodieSchema by @rahil-c in #14313
- refactor: Clean up Spurious log block handling in LogRecordReader by @PavithranRick in #14287
- refactor(spark): Remove glob paths and deprecate read paths support by @jonvex in #14060
- feat(schema): HoodieSchema: add helper methods, fix issues with schema subtypes not returned by @the-other-tim-brown in #14346
- feat(schema): Internal Schema System Integration with HoodieSchema by @the-other-tim-brown in #14314
- fix: Fix duplicate field exception in hive query with where clause by @cshuo in #14337
- fix: push down pk filters to log file when spark enable parquetFilterPushDown by @TheR1sing3un in #14332
- fix: Fix the mismatch between operation metrics and the actual operation in the compaction plan by @TheR1sing3un in #14362
- fix: Support handling complex data types in convertRowToJsonString fo… by @cshuo in #14351
- fix: fix get empty completion time in corner case by @TheR1sing3un in #14379
- feat: Bump spark version to 4.0.1 by @CTTY in #14380
- refactor: Create parquet filters using the spark adapter by @TheR1sing3un in #14335
- test: Fix test setup and assertions in TestTableColumnTypeMismatch by @nsivabalan in #13792
- fix: Bump springboot version to fix CVE-2022-1471 by @CTTY in #14383
- fix: Only use index when index metadata is present by @CTTY in #14385
- feat: Add storage in HoodieCatalogTable by @CTTY in #14386
- fix: Include parquet-format in Hive sync bundle by @gggyd123 in #13843
- fix: Exclude guava from hive-metastore by @CTTY in #14388
- fix: Exclude jetty from javalin to fix CVE-2023-40167 by @CTTY in #14384
- chore: add `hudi-bot` to collaborators by @xushiyan in #14391
- feat: Use Storage from catalog table in drop table command by @CTTY in #14390
- feat: Support read virtual metadata columns for Flink reader by @cshuo in #14309
- feat(schema): Add helper to get HoodieSchema in TableSchemaResolver by @rahil-c in #17456
- feat: [HUDI-9766] Support for show_timeline Procedure with appropriate start and end time for both active and archive timelines by @PavithranRick in #14261
- refactor(spark): Rework tests that disable FileGroup Reader in Spark by @jonvex in #14061
- feat: Use storage conf for alter rename command by @CTTY in #14389
- feat: Change the config for record index max file group size to be a long by @prashantwason in #17461
- feat(schema): phase 2 - Perform Column Statistics Schema Migration by @voonhous in #14311
- feat(schema): Add support for time, fixed length byte arrays, and local timestamps to ParquetToSparkSchemaUtils by @the-other-tim-brown in #17450
- fix: Fix flaky TestHoodieIndex#testCheckIfValidCommit test by @voonhous in #17484
- feat(schema): Migrate SchemaProviders in Hudi-Utilities to use HoodieSchema by @the-other-tim-brown in #14364
- refactor: Update conversion from StructType to Avro Schema to include docs and default values by @the-other-tim-brown in #17473
- refactor: [HUDI-9335] Make RowDataKeyGens::instance the common point for keygen instantiation for Flink by @geserdugarov in #13570
- feat(schema): Migrate hudi-spark writer related classes to use HoodieSchema by @rahil-c in #14374
- chore: move disk space cleanup in integration-test CI module by @the-other-tim-brown in #17496
- refactor: introduce lombok dependency by @vinothchandar in #17500
- fix: Remove explicit casting to HoodieWriteMergeHandle in Fl… by @cshuo in #13590
- feat: add Hudi Flink source split POJOs by @HuangZhenQiu in #17483
- fix: Fixing streaming writes to metadata table for perf regression by @nsivabalan in #17477
- fix(cli): Use long type when sorting based on file status modification time by @prashantwason in #17487
- refactor: Use HoodieFileGroupReader paths for all Spark Datasource reads by @the-other-tim-brown in #17457
- feat(schema): phase 12 - Perform Data Source Helpers Migration by @voonhous in #14382
- chore: reduce log volume by changing per-file/per-block logs to DEBUG level by @shubhampatel28 in #14357
- chore: Update deploy script for release by @yihua in #14296
- fix: fix the issue where lsm writer could not write again after failure by @TheR1sing3un in #17472
- refactor: Apply lombok annotations remove boilerplate code to hudi-aws by @voonhous in #17522
- feat(schema): Update HiveSyncTool and other meta sync tools to use HoodieSchema by @the-other-tim-brown in #14344
- feat(schema): Port more utility code for HoodieSchema by @the-other-tim-brown in #17526
- fix: Fix Flink profiles and modules for release version change by @yihua in #17528
- feat(metrics): Publish log block compaction metrics by @suryaprasanna in #17518
- feat(schema): phase 5 - Perform Java Client Core Migration by @voonhous in #14340
- refactor: Apply lombok annotations and remove boilerplate code to hudi-cli by @voonhous in #17523
- feat(schema): Migrate hudi-flink to use HoodieSchema instead of avro Schema by @rahil-c in #14355
- feat: add Hudi static split enumerator for Flink source by @HuangZhenQiu in #17503
- refactor: Add lombok annotations to hudi-flink-client module by @voonhous in #17534
- chore: add `cshuo` to collaborators by @xushiyan in #17544
- refactor: remove unused method in IncrementalInputSplits by @HuangZhenQiu in #17545
- refactor: Remove getAvroSchemaConverters API from Spark adapter by @yihua in #17554
- feat: add CLI command to show inflight instants older than specified duration by @suryaprasanna in #17511
- feat(schema): Add converter for Spark StructType to HoodieSchema by @rahil-c in #17475
- chore: add new collaborators by @xushiyan in #17555
- chore: keep collaborators <= 10 by @xushiyan in #17559
- fix: Fix failing CI caused by multiple definition of IncompatibleSchemaExc… by @voonhous in #17561
- chore: rename BaseInstantTime to LogFileInstantTime in log summary by @shubhampatel28 in #17567
- feat(schema): Migrate HoodieFileGroupReader and related classes to use HoodieSchema by @the-other-tim-brown in #17536
- feat(schema): Update serialization for HoodieSchema by @the-other-tim-brown in #17575
- fix: Lazily load the dfs properties configuration to avoid static initialization failures by @alexr17 in #17552
- test: Make testReadChangelogIncremental parametrized by @kamronis in #17564
- refactor: small cleanups in hudi-cli classes by @vinothchandar in #17585
- fix: Avoid using HoodieHadoopStorage directly by @CTTY in #17560
- feat(schema): Migrate log reader and partitioners to take HoodieSchema by @the-other-tim-brown in #17548
- refactor: code sweep on hudi-io, hudi-hadoop-mr to streamline class organization by @vinothchandar in #17586
- refactor: Add Lombok annotations to hudi-spark-client module by @voonhous in #17572
- chore: Add SamplingLogger utility for reducing log volume while maintaining observability by @shubhampatel28 in #14354
- refactor: Apply lombok annotations and remove boilerplate code to hudi-client-c… by @voonhous in #17524
- test: Add validation on Spark SQL test classes and fix package structure by @yihua in #14381
- feat(schema): Migrate BigQuery schema converter to use HoodieSchema by @the-other-tim-brown in #17498
- refactor: Add Lombok annotations to hudi-example modules by @voonhous in #17589
- refactor: Remove old code and comments after deprecating Scala 2.11 support by @yihua in #17592
- fix(schema): Fix creation of HoodieSchema from avro string not delega… by @voonhous in #17597
- refactor: Add Lombok annotations to hudi-java-client module by @voonhous in #17588
- feat(schema): phase 17 - Remove AvroSchemaUtils usage (part 1) by @voonhous in #17535
- fix: Support column stats pruning on metadata columns for flink reader by @cshuo in #17580
- refactor: Add Lombok annotations to hudi-flink-x modules by @voonhous in #17590
- refactor: Remove unnecessary utils in FileCreateUtils by @yihua in #17593
- docs: Claim RFC-102: RLI support for Flink streaming by @danny0405 in #17609
- refactor: code sweep on hudi-hadoop-common, hudi-common on class organization by @vinothchandar in #17611
- refactor(spark): Keep one latestCommitCompletionTime method in DataSourceTestUtils by @CTTY in #17608
- feat: Support data skipping based on record index for flink reader by @cshuo in #17490
- refactor: Add Lombok annotations to hudi-gcp module by @voonhous in #17621
- feat: add flink continuous split enumerator by @HuangZhenQiu in #17562
- feat: Support flink 2.1 by @cshuo in #17574
- feat: add record write failure log and metrics by @HuangZhenQiu in #13417
- fix: incorrect CDC read from table with unfinished compaction by @kamronis in #17607
- feat(schema): Migrate spark reader side related classes to use HoodieSchema directly by @rahil-c in #17573
- chore: Updating doap file for 1.1.1 release by @nsivabalan in #17635
- refactor: Add Lombok annotations to hudi-common module (part 1) by @voonhous in #17630
- feat: Upgrade build target to Java 11 by default by @the-other-tim-brown in #17637
- feat: Support MDT compaction configs of frequency seconds and trigger strategy by @kbuci in #17603
- chore: Add scripts and docs for Docker image used in Azure CI by @yihua in #17602
- docs: Fix javadoc comments by @VahidRamezaniDB in #17645
- refactor: Add Lombok annotations to hudi-flink module by @voonhous in #17612
- feat: support extract hadoop conf from Flink runtime by @HuangZhenQiu in #13259
- feat(schema): Migrate StreamSync code path and its dependencies to use HoodieSchema by @the-other-tim-brown in #17600
- refactor: Add Lombok annotations to hudi-hadoop-common module by @voonhous in #17662
- chore(deps): bump org.apache.logging.log4j:log4j-core from 2.17.2 to 2.25.3 by @dependabot[bot] in #17653
- chore(deps): bump org.apache.parquet:parquet-avro from 1.15.1 to 1.15.2 in hudi-trino-plugin by @dependabot[bot] in #17666
- chore(deps): bump org.glassfish:jakarta.el from 3.0.3 to 3.0.4 in hudi-cli-bundle by @dependabot[bot] in #17651
- refactor: Add Lombok annotations to hudi-integ-test module by @voonhous in #17667
- perf: optimize removeCommitMetadata method in HoodieCDCLogger by @kamronis in #17669
- feat(schema): Phase 24 - Restore O(1) reference equality comparison i… by @voonhous in #17672
- feat: Add HoodieBaseLanceFileWriter and implementation for SparkFileWriter by @rahil-c in #17629
- refactor: Remove org.jetbrains.annotations imports by @yihua in #17680
- test: Fix flaky test ITTestHoodieFlinkCompactor#testHoodieFlinkCompac… by @cshuo in #17677
- chore: Test Runtime Improvements: lower number of files, parallelize reads by @the-other-tim-brown in #17671
- feat(schema): phase 17 - Remove AvroSchemaUtils usage (part 2) by @voonhous in #17581
- test(ci): Add JVM tuning for Java 11+ test execution to reduce CI runtime by @yihua in #17712
- refactor: Add Lombok annotations to hudi-kafka-connect by @voonhous in #17715
- refactor: Add Lombok annotations to hudi-io module by @voonhous in #17685
- test: Fix flaky test testLatestCheckpointCarryOverWithMultipleWriters by @yihua in #17722
- fix: Fix ConcurrentModificationException in RocksDBDAO when accessed by Timeline Service by @yihua in #17717
- feat: Add HoodieSparkLanceReader for reading lance files to internal row by @rahil-c in #17632
- refactor: Add Lombok annotations to hudi-platform-service by @voonhous in #17719
- chore(ci): Use non-archive repo for maven binary download by @voonhous in #17723
- feat(schema): Migrate clustering operations to use HoodieSchema by @the-other-tim-brown in #17691
- refactor: Add Lombok annotations to hudi-spark,hudi-spark-common by @voonhous in #17718
- chore(ci): Retry spark downloads by @the-other-tim-brown in #17732
- chore(ci): remove arg line overrides in azure pipelines by @the-other-tim-brown in #17741
- perf: optimize rollback validation by checking lazy rollback policy before clustering validation by @suryaprasanna in #17537
- perf: use shallow projection where applicable by @kamronis in #17682
- chore(ci): upgrade to newer plugins and new test dependencies that required java11+ by @the-other-tim-brown in #17657
- docs: Improve the annotation format of examples by @huangxiaopingRD in #17749
- refactor: Replace HoodieHadoopStorage instantiation with HoodieStorageFactory by @KiteSoar in #17661
- feat: Implement SparkColumnarFileReader for Datasource integration with Lance by @rahil-c in #17660
- chore(ci): Add cache for checkout by @the-other-tim-brown in #17738
- feat(schema): Migrate hudi spark client to use HoodieSchema by @rahil-c in #17743
- feat(schema): Phase 18 - HoodieAvroUtils removal (Part 1) by @voonhous in #17599
- feat: Support COW bulk-insert, insert, upsert, delete works with spark datasource and lance by @rahil-c in #17731
- chore(ci): remove repeated checkout by @the-other-tim-brown in #17755
- chore(ci): Hudi-utilities test improvements by @the-other-tim-brown in #17758
- feat(schema): Migrate spark schema conversion utils to their HoodieSchema equivalent by @the-other-tim-brown in #17765
- feat(schema): Migrate json and proto converters to use HoodieSchema by @the-other-tim-brown in #17740
- refactor: Remove Builder from DynamoDbBasedLockConfig by @voonhous in #17780
- perf: Reduce unnecessary timeline loading on the Flink-TM side by @TheR1sing3un in #17762
- style: Correct wrong Apache license by @huangxiaopingRD in #17790
- fix(metadata): propagate timeline server config from main dataset to metadata by @prashantwason in #17486
- test: Fix flaky test in TestHoodieClientMultiWriter by @yihua in #17793
- fix: Fix the timeline compaction blocked caused by the archived file being too large by @TheR1sing3un in #17784
- fix: Handle hudi table reads when databaseName is not set during initTable by @vinishjail97 in #17695
- feat(schema): Phase 18 - HoodieAvroUtils removal (Part 2) by @voonhous in #17763
- feat: Support splitting tasks based on file size when reading the cow table by @TheR1sing3un in #17730
- fix: Add complex types testing for lance by @rahil-c in #17769
- refactor: Add Lombok annotations to hudi-timeline-service by @voonhous in #17742
- fix(spark): Add clearJobStatus() calls after setJobStatus() operations by @prashantwason in #17451
- feat: Use official Kafka docker images by @rangareddy in #17794
- chore: Exclude data file from rat-plugin check by @huangxiaopingRD in #17789
- refactor: Add Lombok Builder annotation to TimelineService.Config by @voonhous in #17807
- feat: Introduce inflight record index cache for bucket assigning by @cshuo in #17802
- feat(schema): Migrate HoodieRecord methods to use HoodieSchema instead of Avro.Schema by @KiteSoar in #17772
- refactor: Add Lombok annotations to hudi-common module (part 2) by @voonhous in #17655
- perf: improve performance for S3 meta event source by @the-other-tim-brown in #17822
- fix: Wiring in clean max commits to metadata table by @nsivabalan in #17819
- refactor: Add Lombok annotations to hudi-sync modules by @voonhous in #17728
- docs: Update minimum Java version to JDK 11 in documentation by @yihua in #17824
- feat: the basic new hudi source reader by @HuangZhenQiu in #17773
- docs: Claim RFC-103 Hudi LSM Tree Layout by @zhangyue19921010 in #17826
- feat: double buffer based async write for append only write by @HuangZhenQiu in #13892
- feat(schema): Remove direct usage of Avro schema in Flink-client path by @the-other-tim-brown in #17739
- test: correct arguments pass to TestData::assertRowsEqualsUnordered by @geserdugarov in #17840
- feat: Support bucket assigning based on record level index by @cshuo in #17803
- feat: align clustering and compaction retry flow for Flink and Spark by @xushiyan in #17839
- fix: make insert overwrite with bulk insert more performant on unpartitioned table by @alexr17 in #17821
- feat(schema): Spark Row to/from Avro conversion updates by @the-other-tim-brown in #17817
- fix: Ensure that custom logical types in records are preserved during… by @voonhous in #17845
- feat(schema): Phase 18 - HoodieAvroUtils removal (Part 3) by @voonhous in #17659
- chore: update collaborator list by @xushiyan in #17881
- [MINOR] Remove duplicate shade relocation by @majian1998 in #17841
- feat(schema): Remove spark-avro schema converter by @the-other-tim-brown in #17884
- feat(schema): Add fetching default values for FIXED, DECIMAL, TIME, … by @voonhous in #17892
- perf: Support mini-batch access to the MDT index for bucket assign fu… by @cshuo in #17867
- feat: support disruptor-queue buffer for Flink writers by @xushiyan in #17864
- feat(schema): Fix null checks after migrating to HoodieSchema by @voonhous in #17909
- fix: Handle all exception types when fetching table path on reads by @suryaprasanna in #17860
- feat(lance): Upgrade Lance version for new writer functionality by @the-other-tim-brown in #17900
- feat: Ensure MOR table works, with lance base files and avro logs file by @rahil-c in #17768
- fix: address minor compilation issue in getAvroBytes by @rahil-c in #17926
- chore: Include table name in FileSystemBackedTableMetadata stage names by @suryaprasanna in #17861
- fix: handle ArrayIndexOutOfBoundsException for non-partitioned datasets during upgrade by @suryaprasanna in #17933
- feat: support mini batch split reader by @HuangZhenQiu in #17872
- fix: Prevent HiveSyncTool from running twice in meta sync by @suryaprasanna in #17937
- feat: Support bucket assign operator fetching inflight instants from coordinator by @cshuo in #17885
- perf: Allow all processed commits to be cached in the CompletionTimeQueryViewV2 by @the-other-tim-brown in #17914
- feat(schema): Phase 18 - HoodieAvroUtils removal (Part 4) by @voonhous in #17801
- fix: Allow both checkpoint v1 and v2 keys to be resolved by @voonhous in #17919
- fix: Fix TestBucketizedBloomCheckPartitioner assertArrayEquals compar… by @voonhous in #17888
- feat(lance): Remove extra buffering in Lance writer by @the-other-tim-brown in #17916
- fix: Check for existing SparkContext before creating new one in CLI by @suryaprasanna in #17862
- refactor: rename MergeOnReadSplitReaderFunction by @HuangZhenQiu in #17967
- fix: Allow String ordering fields to work with JSON src with COW by @voonhous in #17953
- fix: fix viewfs schema file creation as not atomic by @TheR1sing3un in #17965
- fix: Rename HoodieDataSourceHelpers#listCompletionTimeSince references by @voonhous in #17983
- fix: Propagate cfg.sourceOrderingFields in HoodieStreamer by @voonhous in #17984
- fix: Prevent unnecessary rewrites for skeleton records by @voonhous in #17969
- fix: Handle nested map and array columns in MDT by @vinishjail97 in #17694
- feat: add basic hoodie source by @HuangZhenQiu in #17989
- test: add flink mini cluster for append function integ test by @xushiyan in #17972
- refactor: simplify the HoodieSplitReaderFunction by @HuangZhenQiu in #18004
- fix: Fix incremental query with full scan mode on MOR tables on Databricks Runtime by @yihua in #18003
- fix: Handle external file groups in ExternalFilePathUtil by @vinishjail97 in #17788
- refactor: drop unused InMemoryFileSystem class and test by @vinothchandar in #17997
- feat(schema): Add VARIANT support to HoodieSchema by @voonhous in #17751
- feat: introduce timeline manifest retained version conf by @TheR1sing3un in #17996
- fix: Update default Parquet version to 1.13.1 by @suryaprasanna in #17941
- fix: unpersist cached objects in SqlQueryEqualityPreCommitValidator by @suryaprasanna in #17931
- test(record-index): add coverage for tag location call for various indexes by @suryaprasanna in #17494
- feat(storage): add config to allow duplicates while writing to HFiles by @suryaprasanna in #17495
- fix: Do not ignore IOException when cleaning the file by @TheR1sing3un in #17987
- fix: Remove default record key and ordering fields values on the Flink side, consistent with Spark by @geserdugarov in #17994
- fix: too many properties passed to hive table through hoodie hive catalog by @kamronis in #18011
- feat: Add a new index write function for flink writer by @cshuo in #17838
- feat: add flink HoodieSourceSplitComparator by @HuangZhenQiu in #18009
- feat: Add configurable cleaner policy for metadata table by @suryaprasanna in #17935
- fix: Provide commit timeline during HoodieROTablePathFilter construction by @suryaprasanna in #17859
- fix: Adding tests for rolling back on commits older than replacecommit and compaction commits by @suryaprasanna in #17932
- perf: Reduce memory usage in getAllPartitions by storing only path and directory flag by @suryaprasanna in #17947
- fix: include Hoodie metadata fields when reading Parquet files in precommit validators by @suryaprasanna in #17505
- fix: Allow configurable storage level while computing expression index update by @lokeshj1703 in #17737
- feat(schema): Update schema repair tools to work on HoodieSchema by @the-other-tim-brown in #17952
- fix(common): Handle null actionState in LegacyArchivedMetaEntryReader by @prashantwason in #18024
- feat: publish clean and archival duration metrics in finally block by @suryaprasanna in #17945
- fix: enable Hive support when creating JavaSparkContext for Spark SQL queries by @suryaprasanna in #17510
- feat: enable new Hoodie source in HoodieTableSource by @HuangZhenQiu in #18022
- feat: Integrate the mdt compaction with existing flink compaction pipeline by @cshuo in #17991
- perf: Bloom filter improvements for memory usage by @the-other-tim-brown in #18015
- feat: Support slash separated date partitioning for Hudi tables by @suryaprasanna in #17787
- fix: Use TableSchemaResolver in setWriteSchemaForDeletes for better schema resolution by @prashantwason in #18030
- feat(metadata): Handle metadata table service failures gracefully and emit metrics by @suryaprasanna in #17930
- fix: allows eager failure from abnormals for streaming write by @fhan688 in #12150
- perf: Bloom filter improvements for memory usage (address feedback) by @the-other-tim-brown in #18063
- fix(utilities): Use passed-in configs when propsFilePath is null or empty in HoodieStreamer by @prashantwason in #17467
- fix: Add config version information to DataSourceOptions by @huangxiaopingRD in #17733
- fix: Ensure Lance works when populateMetaFields is false with user defined keygen by @rahil-c in #18042
- refactor: Add Lombok annotations to hudi-common module (part 4) by @voonhous in #17830
- refactor: Add Lombok annotations to hudi-utilities (Part 2) by @voonhous in #17876
- fix: reload table config after record index bootstrap to avoid bloom index fallback by @suryaprasanna in #17508
- refactor: migrate to ScanV2Internal API and remove ENABLE_OPTIMIZED_LOG_BLOCKS_SCAN config by @suryaprasanna in #17520
- fix(flink): Handle Non-Null Complex Types with Nullable Elements in ParquetSchemaConverter by @prashantwason in #18087
- perf: Support lazy clean of the RLI cache during bucket assigning by @cshuo in #18018
- fix: correct deleted keys computation in computeRevivedAndDeletedKeys by @vamsikarnika in #18094
- fix: disable retries in s3/gcs storage lock clients for storage based LP by @alexr17 in #17869
- feat(schema): Remove direct reliance on Avro for schema compatibility checks by @the-other-tim-brown in #18006
- fix: exit transaction with error in storage LP when unlock failure due to lock acquired by others by @alexr17 in #17871
- perf: Avoid re-fetching file status from FS for HFile readers by @the-other-tim-brown in #17709
- feat(schema): Remove usage of migrated AvroSchemaUtils and HoodieAvroUtils methods (part 1) by @the-other-tim-brown in #18007
- feat: support flink split distribution strategy by @HuangZhenQiu in #18082
- feat: Lance schema evolution (add column, type promotion) by @rahil-c in #17904
- feat(schema): Minor cleanup of Avro schema usage by @the-other-tim-brown in #18043
- feat: support partition pruner in Flink hudi source v2 by @HuangZhenQiu in #18074
- refactor: apply lombok for flink source v2 related classes by @HuangZhenQiu in #18122
- refactor: Add Lombok annotations to hudi-common module (part 6) by @voonhous in #17880
- [MINOR] Preload file listing for partitions in BloomIndex to avoid repeated listings by @prashantwason in #17462
- fix: (table-services) When using multiwriter do not delete pending roll… by @kbuci in #18093
- feat(spark): Add guardrail to prevent writes when Spark speculative execution is enabled by @prashantwason in #18045
- fix: interrupt storage LP when heartbeat fails by @alexr17 in #17870
- fix: correct unsigned int conversion in TestProtoConversionUtil by @suryaprasanna in #18120
- feat: add flink stream read metrics for hudi source v2 by @HuangZhenQiu in #18130
- [MINOR] Fix HoodieLockMetrics.createTimerForMetrics to not share metric timer by @lokeshj1703 in #18097
- feat(schema): Consolidate null type handling by @the-other-tim-brown in #18163
- [HUDI-9730] RFC-99 Hudi Type System by @bvaradar in #13743
- fix: flink source v2 serializability by @HuangZhenQiu in #18165
- feat: Add metadata record_index lookup command to Hudi CLI by @suryaprasanna in #17940
- test: add unit test for multiple partition filters on same column by @suryaprasanna in #17934
- feat: Adding Presto to Hudi Notebooks by @rangareddy in #18078
- [MINOR] Publish HUDI version metrics as integers by @prashantwason in #17466
- refactor: Add Lombok annotations to hudi-common module (part 5) by @voonhous in #17878
- test(concurrency): add tests for write conflicts with different conflict resolution strategies by @suryaprasanna in #17501
- fix: Include metadata file cache size option in the configuration for… by @cshuo in #18175
- fix(spark): Fix TestSparkSchemaUtils failing with Spark 3.3 due to timestamp_ntz by @prashantwason in #17917
- fix(flink): include exception stacktrace in error logs by @prashantwason in #18091
- feat: Publish commits to process metrics for deltastreamer by @suryaprasanna in #17929
- fix: Use local engine context for clean planning on metadata and non-partitioned tables by @suryaprasanna in #17942
- perf(common): Make ThreadLocal variables in HoodieAvroDataBlock static by @prashantwason in #18023
- fix(metadata-table): exclude failed deletes when updating MDT with clean metadata by @prashantwason in #18035
- chore: Fix flaky test by ensuring unsigned values in Proto conversion are positive by @the-other-tim-brown in #18186
- feat(blob): update approach to remove reliance on column groups, break down plan by @the-other-tim-brown in #18013
- fix: Empty write should not cause spark analysis errors with pre-commit validators by @kbuci in #18128
- fix: throw correct exception when reading hoodie.properties file without access by @suryaprasanna in #18176
- refactor: Remove redundancy in index validation logic in HoodieIndexU… by @voonhous in #17911
- fix: SimpleAvro-, NonpartitionedAvro- and ComplexAvroKeyGenerator are also valid for writing by Spark when meta-fields are disabled by @wombatu-kun in #18187
- feat(flink): lookup join with retry and async capabilities by @wombatu-kun in #18193
- fix: revert (feat: support mini batch split reader) by @HuangZhenQiu in #18200
- fix(flink): Use blocking instant generation when CDC is enabled by @cshuo in #18206
- refactor: Remove not used classes from org.apache.hudi.spark.internal by @geserdugarov in #18211
- chore: Add .claude and .codex directories to .gitignore by @vinothchandar in #18213
- fix(trino): Fix Docker initialization issue in the Trino plugin by @vamsikarnika in #18220
- docs(spark): Update description of modules related to integration with Spark by @geserdugarov in #18219
- fix: Handle case when 0 byte completed commit files present in the timeline by @suryaprasanna in #18210
- feat(blob): Blob schema definition by @the-other-tim-brown in #18108
- chore(ci): Add Codecov coverage report in GitHub actions by @yihua in #18230
- feat: support predicate push down in hoodie flink source v2 by @HuangZhenQiu in #18212
- feat(flink): Off-heap lookup join cache backed by RocksDB by @wombatu-kun in #18231
- fix: Remove trailing colon from incomplete error message in HoodieTableMetadataUtil by @shangxinli in #18233
- fix: Fix typos across codebase by @shangxinli in #18232
- fix: Fix SHOW PARTITIONS commands functionality for slash-separated date partitioning by @suryaprasanna in #18195
- fix: Fix string handling on bloom index Metadata Payload by @the-other-tim-brown in #18240
- chore(ci): cleanup for print statements, showing tables/schemas by @the-other-tim-brown in #17771
- fix: Use correct lastCompletedTransactionMetadata while acquiring lock for clustering by @suryaprasanna in #18198
- feat(spark): add HoodieSparkSQLUtils APIs and tests by @suryaprasanna in #18202
- feat(spark-datasource): support spark.hoodie.* read config overrides by @suryaprasanna in #18205
- test: Add Scala test for record index rebootstrap on non-Hoodie partitions by @suryaprasanna in #18208
- fix: Fail metadata bootstrap early in presence of 0 byte file by @suryaprasanna in #18209
- feat(metadata-table): Add count validation for record index bootstrap by @prashantwason in #18029
- refactor: move source assign package under split by @HuangZhenQiu in #18253
- perf: Adding support for LatestBaseFilesPathFilter to Spark File Index by @suryaprasanna in #18136
- fix: add all fields in HoodieSourceSplitSerializer by @HuangZhenQiu in #18243
- fix: [HUDI-CLUSTERING] Optimize binary copy performance with lazy loading, bulk reads, and double buffering by @gudladona in #18241
- fix(flink): Use timestamp based partitioning in AutoRowDataKeyGen by @prashantwason in #18090
- feat(flink): collect event time in HoodieRowDataCreateHandle for min/max event time metrics by @jianchun in #18250
- feat(table-services): Emit archival metrics for monitoring and debugging by @nada-attia in #18133
- feat(table-services): Add config to filter partitions during full clean by @prashantwason in #17550
- feat(metrics): emit metric for rollback failures by @nada-attia in #18148
- feat: Notebooks to support multiple hudi versions by @rangareddy in #18255
- perf: eliminate unnecessary timeline loading for Flink append only write path by @danny0405 in #18264
- feat: Use PartitionValueExtractor interface in Spark reader path by @suryaprasanna in #17850
- feat(vector): add VECTOR type to HoodieSchema by @rahil-c in #18146
- fix: infer record merge mode for pre-v9 tables in generateRequiredSchema by @vamsikarnika in #18106
- test(common): add JVM memory reporting test for environment diagnostics by @suryaprasanna in #18207
- fix(table-services): When applying rollback metadata to metadata table (v6) do not rollback a metadata table deltacommit if it has been already rolled back by post-commit rollback by @kbuci in #18160
- refactor: Hudi Flink source v2 with better context management by @HuangZhenQiu in #18269
- feat(table-services): Allow users to not parallelize each partition with engine context during clustering planning by @kbuci in #18191
- feat(client): Add pre-write validator framework by @nada-attia in #18239
- feat(vector): Add further research for supporting VECTOR type to RFC-99 by @rahil-c in #18184
- feat(table-services): Support clustering file groups with earlier instants times first by @kbuci in #18174
- feat(spark): ZooKeeper node should hold spark app id (for helping debug when lock is held for long time) by @kbuci in #18123
- fix(flink): Don't perform table service during mdt initialization if … by @cshuo in #18283
- fix: Remove noisy logging when table partition is empty by @yihua in #18290
- fix: Improve config docs of enabling column stats in metadata table by @yihua in #18289
- feat(vector): add converters from spark to hoodieSchema for vectors by @rahil-c in #18190
- fix(flink): enable integration test for Hudi Flink Source V2 by @HuangZhenQiu in #18287
- fix: Databricks Spark 3.4 Runtime compatibility for reading Hudi tables by @yihua in #18292
- feat(flink): Add Kafka offset tracking to Flink Hudi commits by @shangxinli in #18127
- perf(table-services): Incremental clean planning (for COW) should ignore partitions from instants with only new file groups by @kbuci in #18016
- feat(flink): Add helper functions to parse Kafka offset differences b… by @shangxinli in #18125
- fix(spark): SparkSQL write queries should correctly infer HUDI write configs from spark.hoodie.* configs in spark conf by @kbuci in #18297
- fix(table-services): When single clustering group config is disabled, clustering should not create clustering groups with same number of input/output files by @kbuci in #18172
- feat: add support for touch partitions in HiveSyncTool by @nada-attia in #18064
- feat(flink): Support create table DDL without primary key by @prashantwason in #18086
- fix: sort partitions after filtering for clustering planning by @prashantwason in #18092
- refactor: rewrite executors tests to avoid code duplication by @yaojiejia in #18005
- fix(common): Handle zero byte properties file and ensure atomic writes during modification by @prashantwason in #18058
- [HUDI-7503] Compaction execution should fail if another active writer is already executing the same plan by @kbuci in #18012
- feat(common): Add Policy for cleanup/rollback before each write by @kbuci in #18197
- fix(metadata): Allow metadata table bootstrap when pending commits are being rolled back by @prashantwason in #18033
- fix(common): Filter stray files when loading partitions in AbstractTableFileSystemView by @prashantwason in #18047
- fix(clustering): When inferring whether an instant is clustering, do not fail if replacecommit was rolled back already (by a concurrent writer) by @kbuci in #18288
- docs: RFC-102 - Spark Vector Search in Apache Hudi by @rahil-c in #14218
- feat(conflict-resolution): Allow PreferWriterConflictResolutionStrategy to abort clustering if there is an ongoing write that is in requested state. by @kbuci in #18280
- feat(hudi-sync): Publish HUDI version to Hive metastore (allowing users to infer which HUDI client jar to use for a given dataset) by @kbuci in #18307
- chore(ci): Add test jobs and Codecov integration in GitHub Actions by @yihua in #18225
- chore(ci): Simplify test combinations on Spark in Github actions by @yihua in #18336
- chore(ci): Add codecov coverage from tests running on Spark 4.0 by @yihua in #18335
- feat(metasync): Support HMS 4.x in JDBC sync mode via automatic Thrift fallback by @bvaradar in #18227
- feat(flink): Support write buffer based on flink managed memory by @cshuo in #18319
- feat(lance): Support bloom filter in Lance writer and reader by @wombatu-kun in #18304
- fix: Use explicit Throwable type in AvroConversionUtils catch clause by @yihua in #18342
- docs: Update the build instructions by mentioning profiles in README by @rangareddy in #18310
- feat(utilities): add DELETE operation support for HudiStreamer by @prashantwason in #18088
- feat(metadata-table): add config to disable automatic deletion of MDT partitions by @prashantwason in #18181
- fix(concurrency): detect rollback conflicts with ongoing commit operations by @prashantwason in #18089
- feat(common): add core pre-commit validation framework - Phase 1 by @shangxinli in #18068
- fix: Fix flaky test TestProtoConversionUtil#allFieldsSet_wellKnownTyp… by @cshuo in #18352
- fix(flink): enable batch read IT for flink source v2 by @HuangZhenQiu in #18325
- fix: modify the incorrect Hive configuration in hoodie hive catalog by @yangxiao0320 in #18365
- feat: support read commits limit in Hudi Flink Source V2 by @HuangZhenQiu in #18369
- feat(hive-sync): add Spark-catalog based metastore client implementation to avoid Hive-on-Spark classloader issues by @suryaprasanna in #18203
- fix(common): fix typos commited -> committed, commiting -> committing by @shangxinli in #18363
- feat: support read splits limit in Hudi Flink Source V2 by @HuangZhenQiu in #18370
- feat(flink): Support bootstrap from RLI to local RocksDB for flink bu… by @cshuo in #18254
- perf: Skip unnecessary clean planning for MOR metadata table file-version cleaning by @suryaprasanna in #17943
- feat: add graceful handling for post-commit failures with metrics by @suryaprasanna in #18196
- feat(flink): Support more efficient customized serializer for HoodieRecordGlobalLocation by @cshuo in #18326
- feat(metadata): Defer RLI initialization for fresh tables to optimize file group allocation by @nsivabalan in #18353
- feat(flink): add pre-commit validation framework for Flink - Phase 2 by @shangxinli in #18362
- feat: add Flink source reader function for cdc splits by @HuangZhenQiu in #18361
- feat(vector): Support writing VECTOR to parquet and avro formats using Spark by @rahil-c in #18328
- fix: Optimizing internal schema lookup in TableSchemaResolver by @nsivabalan in #18387
- [HUDI-7030] Commit-based Clustering Plan Strategy by @prashantwason in #18251
- fix: Fixed the issue of incorrect opName values in Flink bulk insert writing by @empcl in #18313
- fix(flink): Improve splits distribution strategy for mor table w/ bucket index by @Joy-2000 in #18103
- feat: Add Unshredded Variant read & write support by @voonhous in #17833
- chore: include table name in recursive listing Spark job descriptions by @suryaprasanna in #18416
- refactor: modularize long test methods in TestHoodieClientOnCopyOnWriteStorage by @yaojiejia in #18377
- test(lance): Add test of bloomFilter support to TestLanceDataSource by @wombatu-kun in #18388
- fix: Use target schema for non-FileBased/SchemaRegistry providers in SourceFormatAdapter by @suryaprasanna in #17946
- perf: Improve Serialization Performance of BufferedRecord by @cshuo in #18418
- feat(utilities): add option to make all schema columns nullable for backwards compatibility by @prashantwason in #17777
- feat(blob): Create blobs in Spark SQL by @the-other-tim-brown in #18347
- refactor: remove HoodieWriteConfig.getOrcCompressionCodec() function by @skywalker0618 in #18422
- fix: [HUDI-3055] Fix hardcoded GZIP compression codec in HFileUtils by @ZZZxDong in #18263
- feat(lance): Implement canWrite() in HoodieSparkLanceWriter with configurable max file size for Lance by @wombatu-kun in #18341
- refactor: Clean up imports by @voonhous in #18428
- feat: support limit push down in Hudi Flink Source V2 by @HuangZhenQiu in #18406
- fix(spark): validate and normalize incremental start/end instants by @yaojiejia in #18426
- feat(vector): Add guard for user creating nested VECTOR by @rahil-c in #18431
- fix(spark): Ignore duplicate fields when merging schema in IncrementalRelation by @prashantwason in #17776
- feat(spark): implement column pruning for incremental queries by @suryaprasanna in #17514
- perf(table-services): Only attempt scheduling log compaction if number of deltacommits is at least LogCompactionBlocksThreshold by @kbuci in #18306
- fix(common): close parquet reader iterator on EOF by @suryaprasanna in #18407
- feat(metrics): Add table-specific metrics registry support for multi-tenant scenarios by @prashantwason in #18179
- feat(table-services): Support hoodie.clustering.enable.expirations to allow cleanup of failed clustering plans (intended for PreferWriterConflictResolutionStrategy) by @kbuci in #18302
- feat: improve bucket assignment for MOR with bucket index by @HuangZhenQiu in #18444
- fix(flink): Trigger a failover after pending instants recommitted to b… by @cshuo in #18434
- refactor: consolidate common utility classes for Flink CDC read by @HuangZhenQiu in #18436
- feat(common): Add API to fetch log files created on or before given instant time by @nada-attia in #18142
- fix: do not shut down disruptor thread in snapshotState in flink connector by @skywalker0618 in #18446
- perf(metadata): avoid recursive calls for partition listing using catalog by @suryaprasanna in #18265
- feat(vector): Add Spark VECTOR Search TVF with initial KNN algorithm by @rahil-c in #18432
- feat(schema): Add support to write shredded variants for HoodieRecordType.SPARK by @voonhous in #18036
- fix(flink): Reject deferred RLI initialization for flink writer by @cshuo in #18399
- feat: use ScanOperation for Spark 3.3 and 3.4 partition pruning by @suryaprasanna in #17936
- fix: remove unused code by @ychris78 in https://github.com/apache/hudi/pull/18473
- feat(lance): throwing exception/guard for users trying to read Lance from non-spark engines by @wombatu-kun in https://github.com/apache/hudi/pull/18481
- fix(hfile): use Hadoop WritableUtils VarInt encoding in HFile block index writer by @officialasishkumar in https://github.com/apache/hudi/pull/18465
- fix: avoid duplicate archived timeline instants from leftover merge files by @suryaprasanna in https://github.com/apache/hudi/pull/18408
- Fix BufferedReader resource leak in InputStreamConsumer by @mailtoboggavarapu-coder in https://github.com/apache/hudi/pull/18469
- refactor(flink): Refactor Flink compaction/clean pipeline with compos… by @cshuo in https://github.com/apache/hudi/pull/18477
- perf(core): optimize rollback listing calls on metadata table by @nbalajee in https://github.com/apache/hudi/pull/18279
- fix(flink): Handle bootstrap write metadata correctly after job resca… by @cshuo in https://github.com/apache/hudi/pull/18485
- chore(ci): Clean up env variable leak in TestSqlConf by @geserdugarov in https://github.com/apache/hudi/pull/18486
- feat(sync): Map VECTOR type to binary for metastore sync support by @voonhous in https://github.com/apache/hudi/pull/18480
- chore(deps): bump org.apache.logging.log4j:log4j-core from 2.25.3 to 2.25.4 by @dependabot[bot] in https://github.com/apache/hudi/pull/18490
- feat(common): add log reader scan metrics and logging for log block processing by @suryaprasanna in https://github.com/apache/hudi/pull/18412
- feat(flink): Add metrics for RocksDB index backend in bucket assigner by @cshuo in https://github.com/apache/hudi/pull/18484
- feat(sync): Map BLOB type to struct in Hive and BigQuery sync by @voonhous in https://github.com/apache/hudi/pull/18482
- chore: Allow versions to be specified in build_docker_images.sh by @voonhous in https://github.com/apache/hudi/pull/17948
- feat(sync): Map VARIANT type to struct in Hive, Spark, and BigQuery sync by @voonhous in https://github.com/apache/hudi/pull/18483
- fix(payload): support sentinel no-op updates in DefaultHoodieRecordPayload by @suryaprasanna in https://github.com/apache/hudi/pull/18413
- feat: Support to cap max commits to clean in one round of clean commit by @nsivabalan in https://github.com/apache/hudi/pull/18322
- fix(common): FutureUtils:allOf should always throw root cause exception by @kbuci in https://github.com/apache/hudi/pull/18456
- feat: Adding rolling extra metadata support by @nsivabalan in https://github.com/apache/hudi/pull/18421
- fix: Scanner resource leak in SqlFileBasedSource.fetchNextBatch by @mailtoboggavarapu-coder in https://github.com/apache/hudi/pull/18467
- fix: fix Scanner file handle leak in HiveIncrementalPuller.executeIncrementalSQL by @mailtoboggavarapu-coder in https://github.com/apache/hudi/pull/18457
- feat: Add ReverseOrderHoodieRecordPayload and configurable ordering behavior by @suryaprasanna in https://github.com/apache/hudi/pull/17928
- feat(spark): refresh parquet tools clustering strategy for current master by @suryaprasanna in https://github.com/apache/hudi/pull/18409
- fix: Fix BufferedReader resource leak in FileIOUtils.readAsUTFStringLines by @mailtoboggavarapu-coder in https://github.com/apache/hudi/pull/18470
- [HUDI-14922] Fix Windows path separator in DFSPropertiesConfiguration.getConfPathFromEnv by @mailtoboggavarapu-coder in https://github.com/apache/hudi/pull/18454
- chore: cleanup docker-compose files by @voonhous in https://github.com/apache/hudi/pull/17950
- fix(docker): fix docker image build with Java 11 and Hive 2.3.10 by @yihua in https://github.com/apache/hudi/pull/18519
- chore: Add Java 17 Hadoop base image and Spark 4.0.1 docker compose s… by @voonhous in https://github.com/apache/hudi/pull/18520
- feat(metadata): Allow users to safely execute compaction plans on metadata table concurrently through a table service platform (rather than only inline during write) by @kbuci in https://github.com/apache/hudi/pull/18295
- feat(docker): add --multi-arch flag for cross-platform image builds by @yihua in https://github.com/apache/hudi/pull/18522
- chore: add timing logs for file index partition and file listing by @suryaprasanna in https://github.com/apache/hudi/pull/18417
- chore(docker): bump integ-test docker-compose to Hive 2.3.10 by @voonhous in https://github.com/apache/hudi/pull/18525
- feat: Add Azure-based storage lock by @chrevanthreddy in https://github.com/apache/hudi/pull/17951
- refactor: introduce static helper method to remove clones by @aaaZayne in https://github.com/apache/hudi/pull/18533
- fix: whitelist Flink _2.12 artifacts in scala-2.13 enforcer rule by @voonhous in https://github.com/apache/hudi/pull/18508
- chore(docker): fix Hadoop entrypoint.sh property bugs in all base modules by @voonhous in https://github.com/apache/hudi/pull/18527
- chore(common): Consolidate MapUtils into CollectionUtils by @voonhous in https://github.com/apache/hudi/pull/18529
- perf(common): avoid stream allocation in CollectionUtils.createImmuta… by @voonhous in https://github.com/apache/hudi/pull/18530
- feat(flink): Implement continuous sorting feature for append write by @prashantwason in https://github.com/apache/hudi/pull/18083
- feat(utilities): add external HudiHiveSyncJob for on-demand Hive sync by @suryaprasanna in https://github.com/apache/hudi/pull/18204
- feat(blob): Read Blobs in Spark SQL by @the-other-tim-brown in https://github.com/apache/hudi/pull/18098
- fix: HoodieStorage resource leak in FileSystemBasedLockProvider.close() by @mailtoboggavarapu-coder in https://github.com/apache/hudi/pull/18461
- perf(common): Avoid double-iterating log files in file-system-view fi… by @voonhous in https://github.com/apache/hudi/pull/18531
- feat(vector): Add Spark SQL DDL CREATE TABLE support for VECTOR type by @voonhous in https://github.com/apache/hudi/pull/18488
- [MINOR] Bump lance to 4.0.0 and lance-spark to 0.4.0 by @rahil-c in https://github.com/apache/hudi/pull/18498
- feat: Adding support to block archival on last known ECTR for v6 tables by @nsivabalan in https://github.com/apache/hudi/pull/18380
- fix: prevent parseTypeDescriptor crash for VARIANT by @voonhous in https://github.com/apache/hudi/pull/18510
- fix: VARIANT Hive sync error when performing CREATE table DDL by @voonhous in https://github.com/apache/hudi/pull/18511
- feat: Add support for exclusive rollbacks with multi writer by @lokeshj1703 in https://github.com/apache/hudi/pull/18448
- feat(blob): followup fixes for blob reader by @rahil-c in https://github.com/apache/hudi/pull/18538
- chore(docker): add Hadoop 3.4.0 / Hive 2.3.10 / Spark 4.0.2 compose s… by @voonhous in https://github.com/apache/hudi/pull/18550
- fix: Parquet small-precision decimals decode ClassCastException by @skywalker0618 in https://github.com/apache/hudi/pull/18552
- fix: JDBC connection leak in HiveIncrementalPuller.saveDelta() by @mailtoboggavarapu-coder in https://github.com/apache/hudi/pull/18460
- chore(spark): bump spark4.version to 4.0.2 by @voonhous in https://github.com/apache/hudi/pull/18549
- fix(lance): Add Hive InputFormat stubs and fix Spark SQL for Lance file format by @rahil-c in https://github.com/apache/hudi/pull/18162
- feat(flink): Introduces dictionary encoding of payload partition path for RocksDBIndexBackend by @cshuo in https://github.com/apache/hudi/pull/18560
- feat(lance): round-trip Hudi VECTOR columns as native Lance fixed-size lists by @rahil-c in https://github.com/apache/hudi/pull/18497
- fix(vector): Register VECTOR HMS column as BINARY on Spark CREATE by @voonhous in https://github.com/apache/hudi/pull/18545
- fix(variant): allow VariantType writes through Hudi's V1 DataSource on Spark 4 by @voonhous in https://github.com/apache/hudi/pull/18564
- fix: ProtoConversionUtil$AvroSupport static init under Avro 1.12 by @tiennguyen-onehouse in https://github.com/apache/hudi/pull/18571
- fix: FileGroupReader drops mandatory partition columns from dataSchema by @tiennguyen-onehouse in https://github.com/apache/hudi/pull/18570
- feat: Adding support to inject custom configs to parquet writer by @nsivabalan in https://github.com/apache/hudi/pull/18379
- feat(clean): Adding empty clean support to hudi by @nsivabalan in https://github.com/apache/hudi/pull/18337
- fix(vector): Pass plain FIXED through to VECTOR projection on Hive read by @voonhous in https://github.com/apache/hudi/pull/18582
- fix(clean): address review comments on empty clean support (#18337) by @yihua in https://github.com/apache/hudi/pull/18587
- fix(ci): bump surefire test heap from 3g to 4g by @yihua in https://github.com/apache/hudi/pull/18589
- feat(lance): support simplified path for lance blob inline reading by @rahil-c in https://github.com/apache/hudi/pull/18575
- feat(blob): add support for lance blob inline descriptor reading by @rahil-c in https://github.com/apache/hudi/pull/18586
- feat(ci): enable auto-merge and required status checks on master by @yihua in https://github.com/apache/hudi/pull/18594
- feat(flink): Vendor Flink 2.1 Dremel nested-reader support classes by @skywalker0618 in https://github.com/apache/hudi/pull/18567
- fix(vector): Preserve VECTOR/BLOB metadata on SQL INSERT path by @voonhous in https://github.com/apache/hudi/pull/18540
- feat(spark): add Spark 4.1 support by @yihua in https://github.com/apache/hudi/pull/17674
- fix: Curator class conflict in ZookeeperBasedLockProvider by @ychris78 in https://github.com/apache/hudi/pull/18593
- fix(schema): Allow nested projection on BLOB and VARIANT columns in p… by @voonhous in https://github.com/apache/hudi/pull/18566
- feat: Create JsonKinesisSource by @linliu-code in https://github.com/apache/hudi/pull/18224
- feat(lance): fix lance writer/reader regarding arrow memory limit issue by @rahil-c in https://github.com/apache/hudi/pull/18613
- feat(common): When inferring checkpoint/schema from timeline, check non-ingestion write commits (in case they have metadata rolled-over) by @kbuci in https://github.com/apache/hudi/pull/18576
- feat: Add variant support description to RFC-99 by @voonhous in https://github.com/apache/hudi/pull/18274
- feat(flink): Cherry-pick Flink dynamic bucket streaming and global RLI fixes into release-1.2.0 by @cshuo in https://github.com/apache/hudi/pull/18788
- fix(flink): Trigger a failover after pending instants recommitted for… by @cshuo in https://github.com/apache/hudi/pull/18789
New Contributors
- @XianghuiBai made their first contribution in #14087
- @ratuldawar11 made their first contribution in #14225
- @sumi-mathew made their first contribution in #13358
- @gggyd123 made their first contribution in #13843
- @VahidRamezaniDB made their first contribution in #17645
- @gudladona made their first contribution in #18241
- @jianchun made their first contribution in #18250
- @nada-attia made their first contribution in #18133
- @ZZZxDong made their first contribution in #18263
- @mailtoboggavarapu-coder made their first contribution in https://github.com/apache/hudi/pull/18469
- @chrevanthreddy made their first contribution in https://github.com/apache/hudi/pull/17951
- @aaaZayne made their first contribution in https://github.com/apache/hudi/pull/18533
Full Changelog: release-1.1.0...release-1.2.0