Delta Lake 4.2.0


We are excited to announce the release of Delta Lake 4.2.0! This release includes significant new features, improved safety and compatibility, and important bug fixes.

Highlights

  • [Spark] Unity Catalog Managed Table enhancements: REPLACE TABLE / RTAS and Dynamic Partition Overwrite support, automatic table schema/properties sync to catalog on table creation.
  • [Spark] Delta Spark V2 connector - streaming read (experimental): enhances streaming read capabilities for catalog-managed tables by supporting critical options such as startingTimestamp and skipChangeCommits.
  • [Flink] New Kernel-based Flink connector (experimental): a brand-new Kernel-based delta-flink connector that enables Apache Flink to read, write, and interact with catalog-managed Delta tables.
  • [Kernel] Geospatial, Variant GA, and Collations table features: Delta Kernel can now read and write tables using geometry/geography types with bounding-box data skipping, generally available Variant columns, and collated string types.
  • [Security] The Delta project has undergone a substantial hardening effort across multiple surface areas, including stronger validation and dependency security scanning to proactively reduce supply-chain risk.

Delta Spark

Delta Spark 4.2.0 is built on Apache Spark 4.1.0 and Apache Spark 4.0.1. Similar to Apache Spark, we have released Maven artifacts for Scala 2.13.

The key features of this release are:

  • Delta Spark V2 Streaming read: Enhances streaming read capabilities for catalog-managed tables. The Delta Spark V2 connector now supports key options including startingVersion, startingTimestamp, maxBytesPerTrigger, maxFilesPerTrigger, excludeRegex, skipChangeCommits, ignoreDeletes, ignoreChanges, and ignoreFileDeletion.
  • REPLACE TABLE/RTAS/DPO support: REPLACE TABLE, RTAS (REPLACE TABLE AS SELECT), and Dynamic Partition Overwrite (DPO) were previously unsupported for catalog-managed Delta tables. Each of these operations now executes as a single, atomic action, significantly improving the safety and reliability of users' catalog-managed tables.
  • Server-Side Planning: OAuth Support (preview): The server-side planning client now supports OAuth-based authentication when delegating scan planning to an external catalog server.
  • INSERT BY NAME with Schema Evolution: SQL INSERT ... BY NAME statements now support automatic schema evolution, adding missing columns to the target table when delta.schemaAutoMerge.enabled is set. This brings INSERT BY NAME behavior in line with INSERT SELECT with schema evolution.
  • Force Statistics Collection: A new table property delta.stats.skipping.forceOptimizeStatsCollection enables forcing file statistics collection during query optimization. This ensures accurate data skipping for tables where statistics may be absent or stale, without requiring a manual OPTIMIZE run.
  • Allow CDF Writes for Non-Data-Changing Operations: When Change Data Feed is enabled, write operations that produce no data changes — such as add-only or remove-only commits — are now permitted. This reduces unnecessary write failures in CDC pipelines that perform metadata-only or compaction operations.
  • Variant Type in Schema Conversion: Delta Spark now correctly handles Spark's VariantType during Delta schema conversion, enabling seamless schema operations on tables with Variant columns.
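As a sketch of the SQL-facing features above — the table, schema, and column names are invented for illustration, and whether the schema auto-merge flag is set as a session configuration or a table property may vary by environment:

```sql
-- RTAS: atomically replace a catalog-managed table's contents and schema
REPLACE TABLE main.sales.events AS
SELECT * FROM main.staging.events_raw;

-- INSERT ... BY NAME with automatic schema evolution: columns present in
-- the source but missing from the target are added when auto-merge is enabled
SET delta.schemaAutoMerge.enabled = true;
INSERT INTO main.sales.events BY NAME
SELECT id, event_time, country FROM main.staging.events_raw;

-- Force statistics collection at query optimization time so data skipping
-- stays accurate without a manual OPTIMIZE run
ALTER TABLE main.sales.events
SET TBLPROPERTIES ('delta.stats.skipping.forceOptimizeStatsCollection' = 'true');
```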


Delta Kernel

The Delta Kernel project is a set of Java and Rust libraries for building Delta connectors that can read and write to Delta tables without the need to understand the Delta protocol details.

The key features of this release are:

  • Geospatial table feature: Delta Kernel now supports reading and writing tables with geometry and geography columns, including bounding-box data skipping via the StGeometryBoxesIntersect predicate.
  • Variant GA table feature: The Variant data type is now generally available in Delta Kernel.
  • Lazy schema parsing: Tables containing column types that Kernel does not support (e.g. VOID) can now be loaded without error. The schema is parsed only when explicitly accessed, so connectors that don't need the schema can still read metadata, configuration, and other table properties from these tables.
  • Improved CommitInfo compatibility: Tables whose commits were written by external engines that omit engineInfo, operation, or txnId fields can now be read without errors.
  • Add vacuumProtocolCheck to UC-managed table creation: Tables created via Kernel with Unity Catalog now automatically include the vacuumProtocolCheck table feature to ensure proper vacuum behavior.
  • Collations table feature: Protocol-level support for collated string types.

Delta Flink

This release introduces a brand-new Kernel-based delta-flink connector, an experimental feature that enables Apache Flink to read, write, and interact with catalog-managed Delta tables.


The Delta Flink connector continues to evolve with Kernel-based implementations. The key features of this release are:

  • Catalog-Managed Table Support: The new Flink connector supports reading from and writing to catalog-managed Delta tables.
  • KernelTable Implementation: Delta tables are exposed to Flink as DynamicTableSource / DynamicTableSink instances backed by Delta Kernel, providing a clean integration with Flink's Table API and SQL.
  • Sink Writer and Committer: A fully functional Flink Sink with Writer and Committer ensures reliable, transactionally consistent writes to the Delta table.
  • SQL API Support: Delta tables can now be created and written to via Flink SQL DDL and DML statements, enabling no-code integrations for SQL-based Flink pipelines.
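A rough sketch of the Flink SQL surface described above. The catalog type and option keys below are assumptions made for illustration — the connector is experimental, so consult its documentation for the exact names:

```sql
-- Register a Delta catalog (the 'type' value is illustrative)
CREATE CATALOG delta_cat WITH ('type' = 'delta-catalog');
USE CATALOG delta_cat;

-- DDL: create a catalog-managed Delta table from Flink
CREATE TABLE events (
  id BIGINT,
  ts TIMESTAMP(3),
  payload STRING
);

-- DML: transactionally consistent write through the Kernel-based sink
INSERT INTO events VALUES (1, TIMESTAMP '2024-01-01 00:00:00', 'hello');

-- Read back via Flink SQL, in batch or streaming execution mode
SELECT id, payload FROM events;
```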

Delta UniForm

Delta UniForm's delta-iceberg and delta-hudi modules automatically keep Apache Iceberg and Apache Hudi metadata in sync with Delta commits, enabling Iceberg and Hudi readers to query Delta tables without data duplication.

Delta UniForm continues to be supported only for Spark 4.0 in this release. For now, both Hudi and Iceberg remain incompatible with Spark 4.1, as support depends on upcoming releases from those projects providing Spark 4.1-compatible integration bundles. This is unchanged from Delta 4.1.0.


Delta Sharing

Delta Sharing is a Spark DataSource that lets clients run batch, streaming, CDF, and time-travel reads on tables shared via the Delta Sharing protocol. It depends on Spark and the delta-sharing-client library; the 2.13 suffix on its Maven artifacts indicates Scala 2.13.


Compatibility

  • OPTIMIZE and REORG must be executed through the catalog for catalog-managed tables. They are now blocked when run directly against catalog-managed tables via filesystem paths, consistent with the existing VACUUM restriction introduced in Delta 4.1.
  • Spark 4.0 clients are now blocked from writing to variant tables to prevent data correctness issues.
  • Delta Kernel now supports three new table features: Geospatial, Variant (GA), and Collations. Tables using these features require connectors that understand them.
  • DeltaLog.tableId has been deprecated in favor of DeltaLog.unsafeVolatileTableId. The old accessor still works but will be removed in a future release. Connector and plugin authors that reference DeltaLog.tableId should update.
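For example, the OPTIMIZE restriction above distinguishes catalog identifiers from filesystem paths (table names and paths below are illustrative):

```sql
-- Allowed: OPTIMIZE routed through the catalog identifier
OPTIMIZE main.sales.events;

-- Blocked for catalog-managed tables: addressing the table by path would
-- bypass the catalog (mirroring the VACUUM restriction from Delta 4.1)
OPTIMIZE delta.`/data/warehouse/events`;
```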

Credits

Alex Moschos, Andrei Tserakhau, Anoop Johnson, Bilal Akhtar, Brooks Walls, ChengJi, Chirag Singh, Cuong Nguyen, Dhruv Arya, Drake Lin, Eames Trinh, Fokko Driesprong, Gengliang Wang, Hao Jiang, Harsh Motwani, Johan Lasperas, Juliusz Sompolski, Kaiqi Jin, Lars Kroll, Leon Windheuser, Leonid Lygin, Liang-Chi Hsieh, Marko Ilić, Milan Stefanovic, Min Yang, Murali Ramanujam, Omar Elhadidy, Prakhar Jain, Rahul Potharaju, Scott Sandre, Sebastien Biollo, Shlok Jhawar, Tathagata Das, Thang Long Vu, Timothy Wang, Vitalii Li, Wei Luo, Xin Huang, Yi Li, You Zhou, Zhen Li, Zheng Hu, Zhipeng Mao, Zihao Xu, Zikang Han, Ziya Mukhtarov, anniedde, emkornfield, giovanni-sorice, littlegrasscao, openinx, richardc-db, seewishnew, songhang, yyanyy
