github apache/hudi hoodie-0.4.5
Release 0.4.5

4 years ago

Highlights

  • Dockerized demo with support for different Hive versions
  • Smoother handling of append log on cloud stores
  • Introducing a global bloom index that enforces a uniqueness constraint across partitions
  • CLI commands to analyze workloads, manage compactions
  • Migration guide for folks wanting to move datasets to Hudi
  • Added Spark Structured Streaming support, with a Hudi sink
  • Built-in support for filtering duplicates in DeltaStreamer
  • Support for plugging in custom transformations in DeltaStreamer
  • Better support for non-partitioned Hive tables
  • Support hard deletes for Merge on Read storage
  • New Slack URL and site URLs
  • Added Presto bundle for easier integration
  • Tons of bug fixes, reliability improvements
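
As a rough illustration of the global index item above: a partition-scoped index only finds a record key inside the partition being written to, while a global index looks the key up across all partitions. The sketch below uses plain dictionaries rather than bloom filters, and every name in it (`PartitionIndex`, `GlobalIndex`, `locate`) is hypothetical and not part of the Hudi API.

```python
from typing import Dict, Optional, Tuple


class PartitionIndex:
    """Hypothetical index scoped per partition: the same key may appear in two partitions."""

    def __init__(self) -> None:
        self._keys: Dict[Tuple[str, str], str] = {}  # (partition, key) -> file id

    def add(self, partition: str, key: str, file_id: str) -> None:
        self._keys[(partition, key)] = file_id

    def locate(self, partition: str, key: str) -> Optional[str]:
        # Only the given partition is consulted.
        return self._keys.get((partition, key))


class GlobalIndex:
    """Hypothetical index over all partitions: each key maps to exactly one location."""

    def __init__(self) -> None:
        self._keys: Dict[str, Tuple[str, str]] = {}  # key -> (partition, file id)

    def add(self, partition: str, key: str, file_id: str) -> None:
        # Keep the first location seen so the key stays unique across partitions.
        self._keys.setdefault(key, (partition, file_id))

    def locate(self, key: str) -> Optional[Tuple[str, str]]:
        return self._keys.get(key)


scoped = PartitionIndex()
scoped.add("2019/01/01", "uuid-1", "file-a")
# The key exists, but not in the partition being checked, so it is not found.
scoped_hit = scoped.locate("2019/01/02", "uuid-1")

global_idx = GlobalIndex()
global_idx.add("2019/01/01", "uuid-1", "file-a")
# The key is found regardless of which partition the incoming record targets.
global_hit = global_idx.locate("uuid-1")
```

The trade-off in this sketch is that the global lookup has to consider every partition, which is why a uniqueness guarantee across partitions would generally cost more than a partition-scoped check.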

Full PR List

  • @bhasudha - Create hoodie-presto bundle jar. fixes #567 #571
  • @bhasudha - Close FSDataInputStream for meta file open in HoodiePartitionMetadata. Fixes issue #573 #574
  • @yaooqinn - handle no such element exception in HoodieSparkSqlWriter #576
  • @vinothchandar - Update site url in README
  • @yaooqinn - typo: bundle jar with unrecognized variables #570
  • @bvaradar - Table rollback for inflight compactions MUST not delete instant files at any time to avoid race conditions #565
  • @bvaradar - Fix Hoodie Record Reader to work with non-partitioned dataset (ISSUE-561) #569
  • @bvaradar - Hoodie Delta Streamer Features : Transformation and Hoodie Incremental Source with Hive integration #485
  • @vinothchandar - Updating new slack signup link #566
  • @yaooqinn - Using immutable map instead of mutables to generate parameters #559
  • @n3nash - Fixing behavior of buffering in Create/Merge handles for invalid/wrong schema records #558
  • @n3nash - cleaner should now use commit timeline and not include deltacommits #539
  • @n3nash - Adding compaction to HoodieClient example #551
  • @n3nash - Filtering partition paths before performing a list status on all partitions #541
  • @n3nash - Passing a path filter to avoid including folders under .hoodie directory as partition paths #548
  • @n3nash - Enabling hard deletes for MergeOnRead table type #538
  • @msridhar - Add .m2 directory to Travis cache #534
  • @artem0 - General enhancements #520
  • @bvaradar - Ensure Hoodie works for non-partitioned Hive table #515
  • @xubo245 - Fix some spelling errors in Hudi #530
  • @leletan - feat(SparkDataSource): add structured streaming sink #486
  • @n3nash - Serializing the complete payload object instead of serializing just the GenericRecord in HoodieRecordConverter #495
  • @n3nash - Returning empty Statuses for an empty Spark partition caused by incorrect bin packing #510
  • @bvaradar - Avoid WriteStatus collect() call when committing batch to prevent Driver side OOM errors #512
  • @vinothchandar - Explicitly handle lack of append() support during LogWriting #511
  • @n3nash - Fixing number of insert buckets to be generated by rounding off to the closest greater integer #500
  • @vinothchandar - Enabling auto tuning of insert splits by default #496
  • @bvaradar - Useful Hudi CLI commands to debug/analyze production workloads #477
  • @bvaradar - Compaction validate, unschedule and repair #481
  • @shangxinli - Fix addMetadataFields() to carry over 'props' #484
  • @n3nash - Adding documentation for migration guide and COW vs MOR tradeoffs #470
  • @leletan - Add additional feature to drop later arriving dups #468
  • @bvaradar - Fix regression bug which broke HoodieInputFormat handling of non-hoodie datasets #482
  • @vinothchandar - Add --filter-dupes to DeltaStreamer #478
  • @bvaradar - A quickstart demo to showcase Hudi functionalities using docker along with support for integration-tests #455
  • @bvaradar - Ensure Hoodie metadata folder and files are filtered out when constructing Parquet Data Source #473
  • @leletan - Adds HoodieGlobalBloomIndex #438
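
The insert-bucket rounding item (#500) comes down to ceiling division: when the insert count is not an exact multiple of the bucket capacity, truncating drops the bucket needed for the remainder. The sketch below shows only that arithmetic. The function and parameter names are hypothetical, and the actual change in #500 is not reproduced in these notes.

```python
def num_insert_buckets(total_inserts: int, records_per_bucket: int) -> int:
    """Return how many buckets are needed, rounding up to the next integer.

    Both parameter names are illustrative assumptions, not Hudi configuration keys.
    """
    if records_per_bucket <= 0:
        raise ValueError("records_per_bucket must be positive")
    # Integer ceiling division: adds one bucket whenever there is a remainder.
    return (total_inserts + records_per_bucket - 1) // records_per_bucket


truncated = 1001 // 500                   # plain floor division loses the remainder
rounded_up = num_insert_buckets(1001, 500)
```

With 1001 inserts and a capacity of 500 per bucket, floor division yields 2 buckets and leaves one record without a bucket, while rounding up yields 3.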
