Delta Lake 4.2.0


We are excited to announce the release of Delta Lake 4.2.0! This release includes significant new features, improved safety and compatibility, and important bug fixes.

Highlights

  • [Spark] Unity Catalog Managed Table enhancements: REPLACE TABLE / RTAS and Dynamic Partition Overwrite support, automatic table schema/properties sync to catalog on table creation.
  • [Spark] Delta Spark V2 connector - streaming read (experimental): enhances streaming read capabilities for catalog-managed tables by supporting critical options such as startingTimestamp and skipChangeCommits.
  • [Flink] New Kernel-based Flink connector (experimental): a brand-new Kernel-based delta-flink connector that enables Apache Flink to read, write, and interact with catalog-managed Delta tables.
  • [Kernel] Geospatial, Variant GA, and Collations table features: Delta Kernel can now read and write tables using geometry/geography types with bounding-box data skipping, generally available Variant columns, and collated string types.
  • [Security] The Delta project has undergone a substantial hardening effort across multiple surface areas, including stronger validation and dependency security scanning to proactively reduce supply-chain risk.

Delta Spark

Delta Spark 4.2.0 is built on Apache Spark 4.1.0 and Apache Spark 4.0.1. Similar to Apache Spark, we have released Maven artifacts for Scala 2.13.

The key features of this release are:

  • Delta Spark V2 Streaming read: Enhances streaming read capabilities for catalog-managed tables. The Delta Spark V2 connector now supports key options including startingVersion, startingTimestamp, maxBytesPerTrigger, maxFilesPerTrigger, excludeRegex, skipChangeCommits, ignoreDeletes, ignoreChanges, and ignoreFileDeletion.
  • REPLACE TABLE/RTAS/DPO support: REPLACE TABLE, RTAS (REPLACE TABLE AS SELECT), and Dynamic Partition Overwrite (DPO) were previously unsupported for catalog-managed Delta tables. Each of these operations now executes as a single, atomic action, significantly improving the safety and reliability of users' catalog-managed tables.
  • Server-Side Planning: OAuth Support (preview): The server-side planning client now supports OAuth-based authentication when delegating scan planning to an external catalog server.
  • INSERT BY NAME with Schema Evolution: SQL INSERT ... BY NAME statements now support automatic schema evolution, adding missing columns to the target table when delta.schemaAutoMerge.enabled is set. This brings INSERT BY NAME behavior in line with INSERT SELECT with schema evolution.
  • Force Statistics Collection: A new table property delta.stats.skipping.forceOptimizeStatsCollection enables forcing file statistics collection during query optimization. This ensures accurate data skipping for tables where statistics may be absent or stale, without requiring a manual OPTIMIZE run.
  • Allow CDF Writes for Non-Data-Changing Operations: When Change Data Feed is enabled, write operations that produce no data changes — such as add-only or remove-only commits — are now permitted. This reduces unnecessary write failures in CDC pipelines that perform metadata-only or compaction operations.
  • Variant Type in Schema Conversion: Delta Spark now correctly handles Spark's VariantType during Delta schema conversion, enabling seamless schema operations on tables with Variant columns.
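As a sketch of the SQL-facing features above — the table, schema, and column names are invented for illustration, and whether the schema auto-merge flag is set as a session configuration or a table property may vary by environment:

```sql
-- RTAS: atomically replace a catalog-managed table's contents and schema
REPLACE TABLE main.sales.events AS
SELECT * FROM main.staging.events_raw;

-- INSERT ... BY NAME with automatic schema evolution: columns present in
-- the source but missing from the target are added when auto-merge is enabled
SET delta.schemaAutoMerge.enabled = true;
INSERT INTO main.sales.events BY NAME
SELECT id, event_time, country FROM main.staging.events_raw;

-- Force statistics collection at query optimization time so data skipping
-- stays accurate without a manual OPTIMIZE run
ALTER TABLE main.sales.events
SET TBLPROPERTIES ('delta.stats.skipping.forceOptimizeStatsCollection' = 'true');
```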


Delta Kernel

The Delta Kernel project is a set of Java and Rust libraries for building Delta connectors that can read and write to Delta tables without the need to understand the Delta protocol details.

The key features of this release are:

  • Geospatial table feature: Delta Kernel now supports reading and writing tables with geometry and geography columns, including bounding-box data skipping via the StGeometryBoxesIntersect predicate.
  • Variant GA table feature: The Variant data type is now generally available in Delta Kernel.
  • Lazy schema parsing: Tables containing column types that Kernel does not support (e.g. VOID) can now be loaded without error. The schema is parsed only when explicitly accessed, so connectors that don't need the schema can still read metadata, configuration, and other table properties from these tables.
  • Improved CommitInfo compatibility: Tables whose commits were written by external engines that omit engineInfo, operation, or txnId fields can now be read without errors.
  • Add vacuumProtocolCheck to UC-managed table creation: Tables created via Kernel with Unity Catalog now automatically include the vacuumProtocolCheck table feature to ensure proper vacuum behavior.
  • Collations table feature: Protocol-level support for collated string types.

Delta Flink

This release introduces a brand-new Kernel-based delta-flink connector, an experimental feature that enables Apache Flink to read, write, and interact with catalog-managed Delta tables.


The Delta Flink connector continues to evolve with Kernel-based implementations. The key features of this release are:

  • Catalog-Managed Table Support: The new Flink connector supports reading from and writing to catalog-managed Delta tables.
  • KernelTable Implementation: Delta tables are exposed to Flink as DynamicTableSource / DynamicTableSink instances backed by Delta Kernel, providing a clean integration with Flink's Table API and SQL.
  • Sink Writer and Committer: A fully functional Flink Sink with Writer and Committer ensures reliable, transactionally consistent writes to the Delta table.
  • SQL API Support: Delta tables can now be created and written to via Flink SQL DDL and DML statements, enabling no-code integrations for SQL-based Flink pipelines.
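A rough sketch of the Flink SQL surface described above. The catalog type and option keys below are assumptions made for illustration — the connector is experimental, so consult its documentation for the exact names:

```sql
-- Register a Delta catalog (the 'type' value is illustrative)
CREATE CATALOG delta_cat WITH ('type' = 'delta-catalog');
USE CATALOG delta_cat;

-- DDL: create a catalog-managed Delta table from Flink
CREATE TABLE events (
  id BIGINT,
  ts TIMESTAMP(3),
  payload STRING
);

-- DML: transactionally consistent write through the Kernel-based sink
INSERT INTO events VALUES (1, TIMESTAMP '2024-01-01 00:00:00', 'hello');

-- Read back via Flink SQL, in batch or streaming execution mode
SELECT id, payload FROM events;
```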

Delta UniForm

Delta UniForm's delta-iceberg and delta-hudi modules automatically keep Apache Iceberg and Apache Hudi metadata in sync with Delta commits, enabling Iceberg and Hudi readers to query Delta tables without data duplication.

Delta UniForm continues to be supported only for Spark 4.0 in this release. For now, both Hudi and Iceberg remain incompatible with Spark 4.1, as support depends on upcoming releases from those projects providing Spark 4.1-compatible integration bundles. This is unchanged from Delta 4.1.0.


Delta Sharing

Delta Sharing is a Spark DataSource that lets clients run batch, streaming, CDF, and time-travel reads on tables shared via the Delta Sharing protocol. It depends on Spark and the delta-sharing-client library; the 2.13 suffix on its Maven artifacts indicates Scala 2.13.


Compatibility

  • OPTIMIZE and REORG must be executed through the catalog for catalog-managed tables. They are now blocked when run directly against catalog-managed tables via filesystem paths, consistent with the existing VACUUM restriction introduced in Delta 4.1.
  • Spark 4.0 clients are now blocked from writing to variant tables to prevent data correctness issues.
  • Delta Kernel now supports three new table features: Geospatial, Variant (GA), and Collations. Tables using these features require connectors that understand them.
  • DeltaLog.tableId has been deprecated in favor of DeltaLog.unsafeVolatileTableId. The old accessor still works but will be removed in a future release. Connector and plugin authors that reference DeltaLog.tableId should update.
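For example, the OPTIMIZE restriction above distinguishes catalog identifiers from filesystem paths (table names and paths below are illustrative):

```sql
-- Allowed: OPTIMIZE routed through the catalog identifier
OPTIMIZE main.sales.events;

-- Blocked for catalog-managed tables: addressing the table by path would
-- bypass the catalog (mirroring the VACUUM restriction from Delta 4.1)
OPTIMIZE delta.`/data/warehouse/events`;
```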

Credits

Alex Moschos, Andrei Tserakhau, Anoop Johnson, Bilal Akhtar, Brooks Walls, ChengJi, Chirag Singh, Cuong Nguyen, Dhruv Arya, Drake Lin, Eames Trinh, Fokko Driesprong, Gengliang Wang, Hao Jiang, Harsh Motwani, Johan Lasperas, Juliusz Sompolski, Kaiqi Jin, Lars Kroll, Leon Windheuser, Leonid Lygin, Liang-Chi Hsieh, Marko Ilić, Milan Stefanovic, Min Yang, Murali Ramanujam, Omar Elhadidy, Prakhar Jain, Rahul Potharaju, Scott Sandre, Sebastien Biollo, Shlok Jhawar, Tathagata Das, Thang Long Vu, Timothy Wang, Vitalii Li, Wei Luo, Xin Huang, Yi Li, You Zhou, Zhen Li, Zheng Hu, Zhipeng Mao, Zihao Xu, Zikang Han, Ziya Mukhtarov, anniedde, emkornfield, giovanni-sorice, littlegrasscao, openinx, richardc-db, seewishnew, songhang, yyanyy
