We are excited to announce the release of Delta Lake 2.0.1 on Apache Spark 3.2. This release contains important bug fixes on top of 2.0.0, and it is recommended that users update to 2.0.1. Similar to Apache Spark™, we have released Maven artifacts for both Scala 2.12 and Scala 2.13.
- Documentation: https://docs.delta.io/2.0.1/index.html
- Maven artifacts: delta-core_2.12, delta-core_2.13, delta-contribs_2.12, delta-contribs_2.13, delta-storage, delta-storage-s3-dynamodb
- Python artifacts: https://pypi.org/project/delta-spark/2.0.1/
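For example, a PySpark session can be wired to these artifacts as follows. This is a minimal sketch assuming `pip install delta-spark==2.0.1` on Apache Spark 3.2; the application name is arbitrary:

```python
# Minimal sketch: create a SparkSession configured for Delta Lake 2.0.1.
# Assumes delta-spark 2.0.1 is installed (pip install delta-spark==2.0.1).
import pyspark
from delta import configure_spark_with_delta_pip

builder = (
    pyspark.sql.SparkSession.builder.appName("delta-2.0.1-demo")  # arbitrary name
    # Enable Delta's SQL extensions (commands such as RESTORE, VACUUM, etc.).
    .config("spark.sql.extensions",
            "io.delta.sql.DeltaSparkSessionExtension")
    # Use Delta's catalog implementation for the session catalog.
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)

# configure_spark_with_delta_pip adds the delta-core artifact matching the
# pip-installed version to spark.jars.packages.
spark = configure_spark_with_delta_pip(builder).getOrCreate()
```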
This release includes the following bug fixes and improvements:
- Fix for a bug in the DynamoDB-based S3 multi-cluster mode configuration; a configuration sketch follows this list. The previous version wrote an incorrect timestamp, which DynamoDB's TTL feature uses to clean up expired items. The timestamp value has been fixed, and the table attribute has been renamed from `commitTime` to `expireTime`. If you already have TTL enabled, please follow the migration steps here.
- Fix an issue where MERGE operations could produce duplicate rows in the Change Data Feed (CDF) in some cases; see the CDF example after this list.
- Fix for accidental protocol downgrades with the RESTORE command. Previously, RESTORE TABLE could downgrade the protocol version of the table, which could result in inconsistent reads with time travel. With this fix, the protocol version is never downgraded from the current one; a short example follows this list.
- Improve performance of the DELETE command by triggering column pruning in the step that searches for files touched by the delete.
- Fix for `NotSerializableException` when running the RESTORE command in Spark SQL with Hadoop 2.
- Fix an incorrect stats collection issue in the data skipping stats tracker.
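For the multi-cluster fix above, the DynamoDB-backed LogStore from the delta-storage-s3-dynamodb artifact is configured through Spark properties. The sketch below shows a typical setup; the DynamoDB table name and region are placeholder assumptions:

```python
# Sketch: configure the DynamoDB-backed LogStore for S3 multi-cluster writes.
# The DynamoDB table name and region below are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("s3-multi-cluster")  # arbitrary name
    # Route s3:// paths through the multi-cluster LogStore implementation.
    .config("spark.delta.logStore.s3.impl",
            "io.delta.storage.S3DynamoDBLogStore")
    # DynamoDB table that coordinates commits across clusters.
    .config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.tableName",
            "delta_log")  # placeholder
    .config("spark.io.delta.storage.S3DynamoDBLogStore.ddb.region",
            "us-west-2")  # placeholder
    .getOrCreate()
)
```

With 2.0.1, entries in this DynamoDB table carry the renamed `expireTime` attribute, which is the attribute TTL should be configured on.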
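The CDF fix applies when reading the change feed produced by MERGE. As a sketch, with an illustrative table name and version range:

```python
# Sketch: enable the Change Data Feed on a table and read a version range.
# The table name "events" and the version numbers are illustrative.
spark.sql("""
    ALTER TABLE events
    SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
""")

# ... MERGE / UPDATE / DELETE operations happen here ...

changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")   # read change rows, not table data
    .option("startingVersion", 1)       # illustrative version bounds
    .option("endingVersion", 5)
    .table("events")
)
# Each change row carries _change_type, _commit_version, and _commit_timestamp.
changes.show()
```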
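And for the RESTORE fix, a short sketch of restoring a table to an earlier version and then reading it with time travel; the path and version numbers are illustrative:

```python
# Sketch: restore a Delta table to an earlier version. With 2.0.1 the
# table's protocol version is preserved rather than downgraded.
# The path and versions are illustrative.
spark.sql("RESTORE TABLE delta.`/tmp/delta/events` TO VERSION AS OF 3")

# Time travel reads remain consistent after the restore.
df = (
    spark.read.format("delta")
    .option("versionAsOf", 3)
    .load("/tmp/delta/events")
)
df.show()
```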