We are excited to announce the release of Delta Lake 0.8.0, which introduces the following key features.
- **Unlimited `MATCHED` and `NOT MATCHED` clauses for merge operations in Scala, Java, and Python** - Merge operations now support any number of `whenMatched` and `whenNotMatched` clauses. In addition, merge queries that unconditionally delete matched rows no longer throw errors on multiple matches. See the documentation for details (example below).
- **MERGE operation now supports schema evolution of nested columns** - Schema evolution of nested columns now has the same semantics as that of top-level columns. For example, new nested columns can be automatically added to a `StructType` column. See Automatic schema evolution in Merge for details (example below).
- **MERGE INTO and UPDATE operations now resolve nested struct columns by name** - The `UPDATE` and `MERGE INTO` commands now resolve nested struct columns by name. That is, when comparing or assigning columns of type `StructType`, the order of the nested columns does not matter (exactly in the same way as the order of top-level columns). To revert to resolving by position, set the Spark configuration `spark.databricks.delta.resolveMergeUpdateStructsByName.enabled` to `false` (example below).
- **Check constraints on Delta tables** - Delta now supports `CHECK` constraints. When supplied, Delta automatically verifies that data added to a table satisfies the specified constraint expression. To add `CHECK` constraints, use the `ALTER TABLE ADD CONSTRAINT` command. See the documentation for details (example below).
- **Start streaming a table from a specific version (#474)** - When using Delta as a streaming source, you can use the options `startingTimestamp` or `startingVersion` to start processing the table from a given version onwards. You can also set `startingVersion` to `latest` to skip existing data in the table and stream from the new incoming data. See the documentation for details (example below).
- **Ability to perform parallel deletes with VACUUM (#395)** - When using `VACUUM`, you can set the session configuration `spark.databricks.delta.vacuum.parallelDelete.enabled` to `true` in order to use Spark to perform the deletion of files in parallel (based on the number of shuffle partitions). See the documentation for details (example below).
- **Use Scala implicits to simplify read and write APIs** - You can import `io.delta.implicits._` to use the `delta` method with Spark read and write APIs such as `spark.read.delta("/my/table/path")`. See the documentation for details (example below).
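The sketches below illustrate each feature in Scala; every path, table name, and DataFrame is hypothetical, and each assumes an existing Delta-enabled `SparkSession` named `spark`. First, a merge with more than two `whenMatched` clauses, which previous releases rejected:

```scala
import io.delta.tables.DeltaTable

// Hypothetical target table and source DataFrame.
val target = DeltaTable.forPath(spark, "/tmp/events")
val updates = spark.read.format("delta").load("/tmp/events_updates")

target.as("t")
  .merge(updates.as("s"), "t.id = s.id")
  .whenMatched("s.op = 'delete'").delete()   // first matched clause
  .whenMatched("s.op = 'fix'").updateAll()   // second matched clause
  .whenMatched().updateAll()                 // a third clause is now allowed
  .whenNotMatched().insertAll()
  .execute()
```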
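Nested schema evolution in merge uses the same opt-in as top-level evolution; in this sketch the source's struct column is assumed to carry a nested field missing from the target:

```scala
import io.delta.tables.DeltaTable

// Opt in to automatic schema evolution for merge (session configuration).
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

// `changes` is a hypothetical DataFrame whose `address` struct has one extra
// nested field; the merge adds that field to the target's StructType column.
val changes = spark.read.format("delta").load("/tmp/customer_changes")

DeltaTable.forPath(spark, "/tmp/customers").as("t")
  .merge(changes.as("s"), "t.id = s.id")
  .whenMatched().updateAll()
  .whenNotMatched().insertAll()
  .execute()
```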
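Reverting `UPDATE` and `MERGE INTO` to the old positional resolution of nested structs is a single session setting:

```scala
// Resolve nested struct columns by position instead of by name (legacy behavior).
spark.conf.set("spark.databricks.delta.resolveMergeUpdateStructsByName.enabled", "false")
```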
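Adding a `CHECK` constraint through SQL; the table, column, and constraint names here are hypothetical:

```scala
// Adding the constraint validates existing rows and every future write.
spark.sql("ALTER TABLE events ADD CONSTRAINT dateWithinRange CHECK (date > '1900-01-01')")

// A constraint can be dropped again by name when it is no longer needed.
spark.sql("ALTER TABLE events DROP CONSTRAINT dateWithinRange")
```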
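Starting a stream from a chosen point in the table's history, or only from newly arriving data:

```scala
// Process the table from version 5 onwards (hypothetical path).
val fromVersion = spark.readStream
  .format("delta")
  .option("startingVersion", "5")
  // .option("startingTimestamp", "2021-02-09")  // alternative starting point
  .load("/tmp/events")

// Skip all existing data and stream only new incoming changes.
val onlyNew = spark.readStream
  .format("delta")
  .option("startingVersion", "latest")
  .load("/tmp/events")
```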
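Running `VACUUM` with parallel deletes enabled:

```scala
import io.delta.tables.DeltaTable

// Spread the file deletions across the cluster (sized by shuffle partitions).
spark.conf.set("spark.databricks.delta.vacuum.parallelDelete.enabled", "true")

// Remove files no longer referenced by table versions within the retention period.
DeltaTable.forPath(spark, "/tmp/events").vacuum()
```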
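Finally, with the implicits in scope, `format("delta")` plus `load`/`save` collapses to a single `delta` call:

```scala
import io.delta.implicits._

// Equivalent to spark.read.format("delta").load("/my/table/path").
val df = spark.read.delta("/my/table/path")

// Equivalent to df.write.format("delta").save("/my/table/copy").
df.write.delta("/my/table/copy")

// The same shorthand works for streaming reads.
val stream = spark.readStream.delta("/my/table/path")
```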
Credits
Adam Binford, Alan Jin, Alex liu, Ali Afroozeh, Andrew Fogarty, Burak Yavuz, David Lewis, Gengliang Wang, HyukjinKwon, Jacek Laskowski, Jose Torres, Kian Ghodoussi, Linhong Liu, Liwen Sun, Mahmoud Mahdi, Maryann Xue, Michael Armbrust, Mike Dias, Pranav Anand, Rahul Mahadev, Scott Sandre, Shixiong Zhu, Stephanie Bodoff, Tathagata Das, Wenchen Fan, Wesley Hoffman, Xiao Li, Yijia Cui, Yuanjian Li, Zach Schuermann, contrun, ekoifman, Yi Wu