We are excited to announce the preview release of Delta Lake 2.4.0 on Apache Spark 3.4. Similar to Apache Spark™, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.
- Documentation: https://docs.delta.io/2.4.0rc1/
- Maven artifacts: https://oss.sonatype.org/content/repositories/iodelta-1080
- Python artifacts: https://test.pypi.org/project/delta-spark/2.4.0rc1/
The key features in this release are as follows:
- Support for Apache Spark 3.4.
- Support writing Deletion Vectors for the `DELETE` command. Previously, when deleting rows from a Delta table, any file with at least one matching row would be rewritten. With Deletion Vectors these expensive rewrites can be avoided. See "What are deletion vectors?" for more details.
- Support for all write operations on tables with Deletion Vectors enabled.
- Support `PURGE` to remove Deletion Vectors from the current version of a Delta table by rewriting any data files with deletion vectors. See the documentation for more details.
- Support reading Change Data Feed for tables with Deletion Vectors enabled.
- Support `REPLACE WHERE` expressions in SQL to selectively overwrite data. Previously, `replaceWhere` options were only supported in the DataFrameWriter APIs.
- Support `WHEN NOT MATCHED BY SOURCE` clauses in SQL for the Merge command.
- Support omitting generated columns from the column list for SQL `INSERT INTO` queries. Delta will automatically generate the values for any unspecified generated columns.
- Support the `TimestampNTZ` data type added in Spark 3.3. Using `TimestampNTZ` requires a Delta protocol upgrade; see the documentation for more information.
- Other notable changes
- Increased resiliency for S3 multi-cluster reads and writes.
- Allow changing the column type of a `varchar` column to a compatible type in the `ALTER TABLE` command. The new behavior is the same as in Apache Spark and allows upcasting from `varchar` to a compatible type.
- Block using `overwriteSchema` with dynamic partition overwrite. This combination can corrupt the table, because not all of the data may be removed and the schema of the newly written partitions may not match the schema of the unchanged partitions.
- Return an empty `DataFrame` for Change Data Feed reads when there are no commits within the timestamp range provided. Previously an error would be thrown.
- Fix a bug in Change Data Feed reads for records created during the ambiguous hour of a daylight saving time change.
- Fix a bug where querying an external Delta table at the root of an S3 bucket would throw an error.
- Remove leaked internal Spark metadata from the Delta log to make any affected tables readable again.
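Several of the SQL-facing features in the list above can be sketched as a short sequence of statements. The snippet below is a minimal illustration rather than an official example: the table names (`events`, `updates`), the columns (`id`, `event_date`) and the predicates are hypothetical, and the `delta.enableDeletionVectors` table property is assumed from the Delta documentation rather than from this announcement. The statements are built as plain strings so they can be inspected without a running Spark session.

```python
from typing import List


def preview_feature_statements(table: str = "events", source: str = "updates") -> List[str]:
    """Build SQL statements that exercise features named in this release.

    Table names, column names and predicates are hypothetical placeholders.
    """
    return [
        # Opt the table in to deletion vectors (assumed property name; this
        # upgrades the table protocol, so older readers may no longer work).
        f"ALTER TABLE {table} SET TBLPROPERTIES ('delta.enableDeletionVectors' = true)",
        # DELETE can now mark rows as removed instead of rewriting whole files.
        f"DELETE FROM {table} WHERE event_date < '2023-01-01'",
        # PURGE rewrites the data files that still carry deletion vectors.
        f"REORG TABLE {table} APPLY (PURGE)",
        # Selectively overwrite only the rows matching the predicate.
        f"INSERT INTO TABLE {table} REPLACE WHERE event_date >= '2023-05-01' "
        f"SELECT * FROM {source}",
        # Remove target rows that no longer exist in the source.
        f"MERGE INTO {table} AS t USING {source} AS s ON t.id = s.id "
        "WHEN MATCHED THEN UPDATE SET * "
        "WHEN NOT MATCHED THEN INSERT * "
        "WHEN NOT MATCHED BY SOURCE THEN DELETE",
    ]


def run_statements(spark, statements: List[str]) -> None:
    """Execute each statement in order on an existing SparkSession."""
    for statement in statements:
        spark.sql(statement)
```

With a Delta-enabled `SparkSession` named `spark`, calling `run_statements(spark, preview_feature_statements())` would run the statements in order. Enabling deletion vectors is worth trying on a scratch table first, since the protocol upgrade cannot be assumed to be reversible.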
Note: the Delta Lake 2.4.0 release does not include the Iceberg to Delta converter because `iceberg-spark-runtime` does not support Spark 3.4 yet. The Iceberg to Delta converter is still supported when using Delta 2.3 with Spark 3.3.
How to use the preview release
For this preview we have published the artifacts to a staging repository. Here’s how you can use them:
- spark-submit: Add `--repositories https://oss.sonatype.org/content/repositories/iodelta-1080/` to the command line arguments. For example:
    spark-submit --packages io.delta:delta-core_2.12:2.4.0rc1 --repositories https://oss.sonatype.org/content/repositories/iodelta-1080/ examples/examples.py
- Currently Spark shells (PySpark and Scala) do not accept the external repositories option. However, once the artifacts have been downloaded to the local cache, the shells can be run with Delta 2.4.0rc1 by just providing the
- Maven project:
    <repositories>
      <repository>
        <id>staging-repo</id>
        <url>https://oss.sonatype.org/content/repositories/iodelta-1080/</url>
      </repository>
    </repositories>
    <dependency>
      <groupId>io.delta</groupId>
      <artifactId>delta-core_2.12</artifactId>
      <version>2.4.0rc1</version>
    </dependency>
- SBT project:
    libraryDependencies += "io.delta" %% "delta-core" % "2.4.0rc1"
    resolvers += "Delta" at "https://oss.sonatype.org/content/repositories/iodelta-1080/"
- Python project:
    pip install -i https://test.pypi.org/simple/ delta-spark==2.4.0rc1
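When the preview is driven from a script or CI job, the spark-submit invocation shown above can be assembled programmatically. The following is a rough sketch, not part of the release: the helper name `spark_submit_args` is hypothetical, and the two `--conf` settings (the Delta SQL extension and catalog) are assumptions taken from the general Delta Lake setup instructions rather than from this announcement.

```python
import shlex
from typing import List

# Values taken from the instructions above.
STAGING_REPO = "https://oss.sonatype.org/content/repositories/iodelta-1080/"
DELTA_VERSION = "2.4.0rc1"


def spark_submit_args(script: str, scala_version: str = "2.12") -> List[str]:
    """Return the spark-submit argument list for the preview artifacts.

    `scala_version` selects the artifact suffix ("2.12" or "2.13").
    """
    package = f"io.delta:delta-core_{scala_version}:{DELTA_VERSION}"
    return [
        "spark-submit",
        "--packages", package,
        "--repositories", STAGING_REPO,
        # Assumed settings: enable Delta's SQL extension and catalog.
        "--conf", "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension",
        "--conf", "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog",
        script,
    ]


if __name__ == "__main__":
    # Print a shell-ready command line; pass the list to subprocess.run to execute it.
    print(shlex.join(spark_submit_args("examples/examples.py")))
```

Keeping the arguments as a list avoids shell-quoting problems and makes it easy to switch between the Scala 2.12 and 2.13 artifacts.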
Alkis Evlogimenos, Allison Portis, Andreas Chatzistergiou,
Anton Okolnychyi, Bart Samwel, Bo Gao, Carl Fu, Chaoqin Li, Christos Stavrakakis, David Lewis, Desmond Cheong, Dhruv Shah, Eric Maynard, Fred Liu, Fredrik Klauss, Haejoon Lee, Hussein Nagree, Jackie Zhang, Jintian Liang, Johan Lasperas, Lars Kroll, Lukas Rupprecht, Matthew Powers, Ming DAI, Ming Dai, Naga Raju Bhanoori, Paddy Xu, Prakhar Jain, Rahul Shivu Mahadev, Rui Wang, Ryan Johnson, Sabir Akhadov, Satya Valluri, Scott Sandre, Shixiong Zhu, Tom van Bussel, Venki Korukanti, Vitalii Li, Wenchen Fan, Xi Liang, Yaohua Zhao, Yuming Wang