We are excited to announce the release of Delta Lake 4.3.0, which delivers new features, performance improvements, and protocol updates across Delta Spark, Kernel, UniForm, Sharing, and Flink. See the highlights below for the marquee changes.
Highlights
- [Spark] Unity Catalog Delta REST API integration: Spark now supports the UC Delta API, using Unity Catalog as the source of truth for managed Delta tables. With server-side commit validation, server-advertised table features, and intent-based metadata updates, this integration provides consistent, safe access for Spark today and sets the foundation for future support across Flink, Trino, and other engines.
- [Spark] Selective data replacement with replaceOn and replaceUsing DataFrame APIs: Spark now supports selectively replacing table data with the result of a DataFrame. Use replaceUsing to replace rows that match on specified columns, or replaceOn to replace rows that satisfy a user-defined condition.
- [UniForm] Atomic + incremental Iceberg conversion and Spark 4.1 support: UniForm now writes Iceberg metadata atomically with the Delta commit, incrementally converts only the changed log range, and supports Spark 4.1.
- [Sharing] Streaming and CDF Support: Delta Sharing in 4.3.0 improves Spark Structured Streaming and batch CDF support with automatic Delta response resolution, Parquet-to-Delta streaming conversion, streaming CDF support for shared Delta-format tables, and Trigger.AvailableNow support for shared tables.
Delta Spark
Delta Spark 4.3.0 is built on Apache Spark 4.1.0 and Apache Spark 4.0.1. As with Apache Spark, we publish Maven artifacts for Scala 2.13.
- Maven artifacts for Spark 4.1.0:
- Maven artifacts for Spark 4.0.1:
- Backward-compatibility artifacts (no Spark version in the name; default to Spark 4.1.0):
- Python artifact:
The key features of this release are:
- Unity Catalog Delta REST API integration: Delta Spark now uses the new UC Delta REST API by default for UC-managed Delta tables. Managed Delta operations, including table loads, CREATE / CTAS, REPLACE, and all other metadata-changing writes such as DML, schema evolution, auto-merge, and supported ALTER TABLE updates, are routed through the new API. Non-Delta tables and external tables, including name-based and path-based access, continue to use the legacy delegate.
- Delta DSv2 Connector with Delta Kernel (Experimental): adds support for batch writes to catalog-managed tables, streaming source support including all read options, and catalog-driven batch CDC (SELECT … CHANGES FROM VERSION/TIMESTAMP, DV-aware), the latter gated behind spark.databricks.delta.changelogV2.enabled.
- V2 Checkpoint performance hardening for large tables: V2 checkpoints now default to 50,000 actions per sidecar, so sidecar files are automatically split into multiple parts and checkpoint writes parallelize better out of the box.
- Selectively replace data with replaceOn and replaceUsing DataFrame APIs: Use these options to replace part of the table with the result of a DataFrame. replaceOn replaces rows that match a user-defined condition. replaceUsing replaces rows where the specified columns are equal. See the Delta Lake API doc.
- Implicit casting for DataFrame by-name writes: DataFrame writes that match by column name, except save() and saveAsTable() with mode("overwrite"), now apply Spark's implicit casts to align source values with the target schema, matching SQL INSERT BY NAME behavior.
- Variant column statistics on write: Delta Spark now collects min/max statistics for Variant columns at write time, enabling data skipping on Variant-shredded tables.
- REPLACE TABLE / RTAS / DPO production hardening: concurrency, source-materialization, and operational-metric coverage now spans the full set of REPLACE-style DataFrame writes introduced in Delta 4.2.
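The row-level effect of replaceUsing and replaceOn can be pictured with a small pure-Python model. This is only an illustrative sketch, not the Delta Spark API: the helpers replace_using and replace_on are hypothetical, tables are modeled as lists of dicts, and it assumes that a target row is replaced when it matches at least one incoming row (on the listed columns for replaceUsing, or under the condition for replaceOn). The Delta Lake API doc remains the reference for the actual option syntax and exact matching semantics.

```python
from typing import Callable, Dict, List, Sequence

Row = Dict[str, object]


def replace_using(target: List[Row], source: List[Row], cols: Sequence[str]) -> List[Row]:
    """Hypothetical model of replaceUsing: drop target rows whose values in
    `cols` equal those of any source row, then append all source rows."""
    source_keys = {tuple(row[c] for c in cols) for row in source}
    kept = [row for row in target if tuple(row[c] for c in cols) not in source_keys]
    return kept + list(source)


def replace_on(
    target: List[Row], source: List[Row], condition: Callable[[Row, Row], bool]
) -> List[Row]:
    """Hypothetical model of replaceOn: drop target rows for which
    condition(target_row, source_row) holds for at least one source row,
    then append all source rows. The two-argument predicate is an assumption."""
    kept = [t for t in target if not any(condition(t, s) for s in source)]
    return kept + list(source)


if __name__ == "__main__":
    target = [
        {"region": "eu", "day": 1, "sales": 10},
        {"region": "eu", "day": 2, "sales": 20},
        {"region": "us", "day": 1, "sales": 30},
    ]
    source = [{"region": "eu", "day": 2, "sales": 99}]

    # Only the (eu, 2) row is swapped out.
    print(replace_using(target, source, ["region", "day"]))
    # Every "eu" row is swapped out.
    print(replace_on(target, source, lambda t, s: t["region"] == s["region"]))
```

In this model, replaceUsing is the special case of replaceOn where the condition is equality on the listed columns.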
Other notable features and bug-fixes include:
- Delta DSv2 streaming reads:
  - DV-enabled / catalog-managed correctness: streaming over NULLs and complex types (ARRAY / MAP / STRUCT / VARIANT), and DELETE on tables with VARIANT / INTERVAL / ARRAY columns and deletion vectors, are now handled correctly.
  - Schema ordering & timestamp resolution: partition columns declared mid-schema (e.g. (id, part, col3)) and stream restarts on those tables are handled correctly. startingTimestamp on mid-history ICT (in-commit timestamp) tables now resolves to the correct commit.
- MERGE / DELETE (clearer errors, consistent metrics): MERGE INTO an empty-schema target with mergeSchema=false now errors out. Case-variant duplicate columns in MERGE … INSERT on generated/identity tables raise AnalysisException instead of an internal assertion. A DELETE that matches no rows reports 0 instead of None, and replaceWhere / replaceOn / replaceUsing reject subquery predicates.
- Catalog-managed & commit-protocol hardening: REPLACE-style enablement of catalogManaged on a non-catalog-managed table throws DELTA_REPLACE_TABLE_WITH_CATALOG_MANAGED_NOT_SUPPORTED (SQLSTATE 0A000); V2 Checkpoint enablement is conflict-free with concurrent transactions; optional duplicate-action sanity check via DELTA_DUPLICATE_ACTION_CHECK_ENABLED.
- Other correctness fixes: Dynamic Partition Overwrite on mixed-format timestamp partitions no longer loses data (optimistic transactions now use the session timezone consistently); byte / short → decimal widening uses precision ≥ 10; OPTIMIZE on timestamp partitions normalizes values before binning.
Delta Kernel
The Delta Kernel project is a set of Java libraries for building Delta connectors that read and write Delta tables without needing to understand the Delta protocol directly.
- Maven artifacts:
The key features of this release are:
- Incremental version-checksum construction: tableSizeBytes and numFiles are updated incrementally instead of via full log replay, making table-health diagnostics fast on large tables.
- Open tables with a missing _last_checkpoint: Kernel falls back to a log-scan replay when the checkpoint pointer is absent or stale, so tables whose _last_checkpoint was lost or never written can still be loaded.
- Typed clustering descriptors via Snapshot.getClusteringColumnInfos(): connectors get typed clustering info directly from the snapshot — no more parsing domainMetadata blobs.
- Write-context for column-mapping-enabled tables: getWriteContext now works on tables with column mapping, unblocking writes against renamed / dropped-column tables.
- delta.parquet.compression.codec at table creation: the table property is honored on CREATE, so connectors can pick a Parquet codec without engine-specific glue.
- Geospatial + IcebergCompatV3 / WriterCompatV3: Kernel can write tables that combine the Geospatial table feature with the new IcebergCompatV3.
- Staged-commit validation API for UC integrations: a new parseCommitFile utility lets UC-integrated connectors validate staged commit files before publishing them.
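The incremental checksum construction mentioned above can be sketched as follows: instead of replaying the whole log, the next version's numFiles and tableSizeBytes are derived from the previous checksum plus the add and remove actions of a single commit. This is a simplified model rather than Kernel's implementation; the Checksum class and apply_commit helper are hypothetical, and the sketch assumes every action carries the size of the file it adds or removes.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Checksum:
    """Hypothetical stand-in for the fields tracked in a version checksum."""
    version: int
    num_files: int
    table_size_bytes: int


def apply_commit(prev: Checksum, actions: Iterable[Tuple[str, int]]) -> Checksum:
    """Derive the next checksum from the previous one and one commit's
    actions, given as ("add" | "remove", size_in_bytes) pairs."""
    num_files = prev.num_files
    size = prev.table_size_bytes
    for kind, nbytes in actions:
        if kind == "add":
            num_files += 1
            size += nbytes
        elif kind == "remove":
            num_files -= 1
            size -= nbytes
        else:
            raise ValueError(f"unexpected action kind: {kind!r}")
    return Checksum(prev.version + 1, num_files, size)


if __name__ == "__main__":
    v0 = Checksum(version=0, num_files=2, table_size_bytes=300)
    # One file of 100 bytes is added and one file of 200 bytes is removed.
    v1 = apply_commit(v0, [("add", 100), ("remove", 200)])
    print(v1)
```

The cost of each update is proportional to the size of the commit, not to the length of the log.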
Other notable changes include:
- Log-replay & commit correctness: the last-checkpoint search across windows no longer abandons the first window when it encounters .crc files, which previously caused valid checkpoints to be missed; DeletionVectorDescriptor.getUniqueId() emits the protocol-correct u<path>@4 instead of u<path>@Optional[4], fixing DV-deduplication mismatches between Java Kernel and Spark/Scala writers.
- Engine integration: Kernel's EngineException is no longer double-wrapped, producing cleaner stack traces; thread interrupts in ActionsIterator now throw UncheckedIOException(InterruptedIOException) so engines like Spark recognize them as clean shutdown instead of a spurious StreamingQueryException.
- Geospatial / Variant alignment: GeographyType edge-interpolation algorithm parsing is now case-insensitive, matching the Geospatial spec; variantType is treated as a required dependency of variantShredding, preventing inconsistent table protocols; delta.enableVariantShredding=true auto-enables variantShredding-preview, keeping Java Kernel aligned with Delta Spark and the Rust delta-kernel.
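The getUniqueId() fix listed above is easiest to see by comparing the two string forms side by side. The sketch below models it in Python, assuming the ID is the storage type followed by the path (or inline data), with @offset appended only when an offset is present. The function names are hypothetical, and buggy_unique_id merely imitates what string-concatenating a Java Optional wrapper produces.

```python
from typing import Optional


def unique_id(storage_type: str, path_or_inline_dv: str, offset: Optional[int]) -> str:
    """Assumed protocol-style form: <storageType><path>[@<offset>]."""
    base = f"{storage_type}{path_or_inline_dv}"
    return base if offset is None else f"{base}@{offset}"


def buggy_unique_id(storage_type: str, path_or_inline_dv: str, offset: int) -> str:
    """Imitates the old behavior: the Optional wrapper, not its value,
    ends up in the string."""
    return f"{storage_type}{path_or_inline_dv}@Optional[{offset}]"


if __name__ == "__main__":
    good = unique_id("u", "<path>", 4)
    bad = buggy_unique_id("u", "<path>", 4)
    print(good)  # u<path>@4
    print(bad)   # u<path>@Optional[4]
    # The same deletion vector yields two different keys, so deduplication
    # that compares IDs across writers treats them as distinct.
    print(len({good, bad}))  # 2
```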
Delta UniForm
Delta UniForm's delta-iceberg and delta-hudi modules automatically keep Apache Iceberg and Apache Hudi metadata in sync with Delta commits, so Iceberg and Hudi readers can query Delta tables without data duplication.
- Maven artifacts:
UniForm now supports both Spark 4.0 and Spark 4.1 with Iceberg-spark 1.11.0.
The key features of this release are:
- Atomic + incremental Iceberg conversion: large commits are atomically converted into Iceberg metadata as part of the Delta transaction, closing the consistency gap on bulk-commit paths; end-to-end incremental conversion regenerates only the changed Delta-log range instead of the full snapshot on every commit.
- Spark 4.1 support: UniForm is built against Iceberg-spark 1.11.0 for both Spark 4.0 and Spark 4.1.
- Per-column Iceberg manifest stats for non-column-mapping tables: manifest stats are emitted per column for Delta tables without column mapping, improving query planning for Iceberg readers.
Other notable changes include:
- Iceberg metadata correctness: DataFile.recordCount reports the physical (pre-DV) row count instead of the logical count, restoring Iceberg's position-oriented invariant and re-aligning row-lineage with Delta's baseRowId (see Compatibility); nested field IDs are reused on overwriteSchema=true writes with unchanged schema, avoiding spurious Iceberg current-schema-id / last-column-id bumps for downstream tooling.
- Iceberg to Delta conversion correctness: duplicate Delta column-mapping IDs are no longer produced when converting Iceberg identity partitions whose spec field name differs from the source column; duplicate-column detection in IcebergTable is now correct.
Delta Sharing
Delta Sharing is a Spark DataSource that lets clients run batch, streaming, CDF, and time-travel reads on tables shared via the Delta Sharing protocol. It is built on Spark and the delta-sharing-client library; the 2.13 suffix in the artifact names indicates Scala 2.13.
- Maven artifacts for Spark 4.1.0:
- Maven artifacts for Spark 4.0.1:
- Backward-compatibility artifacts (no Spark version in the name; default to Spark 4.1.0):
The key features of this release are:
- CDF streaming + Trigger.AvailableNow for Delta Format Sharing: streaming queries on shared Delta-format tables can now read Change Data Feed (offset management aligned with the underlying Delta source) and run backfill-then-stop pipelines via Trigger.AvailableNow.
- delta-sharing-client upgraded to 1.4.0: picks up the latest fixes and protocol updates from the Delta Sharing client.
Delta Flink
The Kernel-based delta-flink connector (experimental) continues to evolve.
- Maven artifact:
The key features of this release are:
- Writer and Committer metrics: the Flink Sink reports throughput and latency metrics, making catalog-managed-table writes observable in Flink dashboards.
Protocol
- delta.parquet.format.version table property: a new optional table property that selects the Parquet data-page format. Valid values are 1.0.0 (DataPageV1) and any 2.x.x (DataPageV2); writers default to 1.0.0 when absent. Readers do not need to consult this property — Parquet pages are self-describing — but for cross-engine interoperability we recommend 1.0.0. The property covers both data and checkpoint files.
- Variant Shredding RFC accepted, with a clarification that Variant stats paths do not need to be present in both minValues and maxValues.
- Materialize Partition Columns RFC updated with a dedicated table property and clarified stats semantics.
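The rules for delta.parquet.format.version described above can be captured in a small resolver. This is a sketch under stated assumptions, not code from any Delta writer: the function name is hypothetical, "2.x.x" is interpreted as three numeric components with a major version of 2, and rejecting any other value with a ValueError is an assumption about how a writer might handle invalid input.

```python
import re
from typing import Mapping

PROPERTY = "delta.parquet.format.version"
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")


def resolve_data_page_version(table_properties: Mapping[str, str]) -> str:
    """Map the table property to a Parquet data-page format name."""
    # Writers default to 1.0.0 when the property is absent.
    value = table_properties.get(PROPERTY, "1.0.0")
    match = _VERSION_RE.match(value)
    if match is None:
        raise ValueError(f"malformed {PROPERTY}: {value!r}")
    if value == "1.0.0":
        return "DataPageV1"
    if match.group(1) == "2":
        return "DataPageV2"
    # Assumption: anything else (e.g. 1.1.0 or 3.0.0) is rejected.
    raise ValueError(f"unsupported {PROPERTY}: {value!r}")


if __name__ == "__main__":
    print(resolve_data_page_version({}))                   # DataPageV1
    print(resolve_data_page_version({PROPERTY: "2.6.0"}))  # DataPageV2
```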
Compatibility
Note: Review this section carefully before upgrading to Delta 4.3.0.
- MERGE INTO an empty-schema target now errors out with mergeSchema=false: previously this could silently rewrite the target with the source schema. To restore the old behavior, set delta.schemaAutoMerge.enabled=true or align the schemas before MERGE.
- Iceberg DataFile.recordCount is now physical (pre-DV) on UniForm + DV tables: it is no longer the logical (post-DV) count. Tooling that read it as a logical count must apply the deletion vector to recover that number. This change re-aligns Iceberg row-lineage (first_row_id / nextRowId) with Delta's baseRowId.
- Kernel: enableVariantShredding enables variantShredding-preview: keeps Java Kernel aligned with Delta Spark and the Rust delta-kernel. To opt into the Production-Ready feature, enable it explicitly via delta.feature.variantShredding=supported.
- variantType will become required for variantShredding after RFC ratification: enforcement is gated behind a future Production-Ready flip; until then, variantShredding-preview tables are unaffected. New writers should already include both features.
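For tooling affected by the DataFile.recordCount change above, the logical row count can be recovered by subtracting the number of rows that the deletion vector marks as deleted. A minimal sketch, assuming the caller already knows the deletion vector's cardinality for the file (the function and parameter names are hypothetical):

```python
def logical_row_count(physical_record_count: int, dv_cardinality: int = 0) -> int:
    """Logical (post-DV) rows = physical (pre-DV) rows minus deleted rows."""
    if not 0 <= dv_cardinality <= physical_record_count:
        raise ValueError("DV cardinality must be between 0 and the physical count")
    return physical_record_count - dv_cardinality


if __name__ == "__main__":
    # A file with 1,000 physical rows, 25 of which are marked deleted.
    print(logical_row_count(1000, 25))  # 975
    # A file without a deletion vector: both counts coincide.
    print(logical_row_count(1000))      # 1000
```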
Credits
Alden Lau, Alex Moschos, Amogh Jahagirdar, Anshul Baliga, Bilal Akhtar, Brooks Walls, ChengJi, Chirag Singh, Cuong Nguyen, Dhruv Arya, Divjot Arora, Eames Trinh, Eduard Tudenhoefner, Felipe Pessoto, GH-JamesD, Hao Jiang, Harsh Motwani, Hua Shi, Johan Lasperas, Kaiqi Jin, Leon Windheuser, Leonid Lygin, Liang-Chi Hsieh, Marco Kroll, Matthis Gördel, Murali Ramanujam, Omar Elhadidy, Pratham Manja, Sandro Sp, Sanuj Basu, Scott Sandre, Shivam Tiwari, Stevo Mitric, Thang Long Vu, Timothy Wang, Tom Zhu, Uros Bojanic, Wei Luo, Xin Huang, Yi Li, You Zhou, Yousof Hosny, Zihao Xu, Zikang Han, anniedde, littlegrasscao, Zheng Hu, Rakesh Veeramacheneni, Vishnu Chandrashekhar, songhang, yyanyy