github vmware/versatile-data-kit v1.2
Versatile Data Kit 1.2


Major features include:

New GitHub Landing Page

This is the new landing page of our open-source project. It aims to make it much easier for users to see and understand what VDK is and what they can do with it, by showing them.

Check it out at https://github.com/vmware/versatile-data-kit

Control Service improvements

Operators can set the builder image per Python version

For each supported Python version, operators can easily control:
  • the image of the operator-managed VDK (system) library,
  • the base image used to build the user data job,
  • and now also the builder image with which the user data job is built.
deploymentSupportedPythonVersions:
  3.9:
    baseImage: "registry.hub.docker.com/versatiledatakit/data-job-base-python-3.9:latest"
    vdkImage: "registry.hub.docker.com/versatiledatakit/quickstart-vdk:release"
    builderImage: "registry.hub.docker.com/versatiledatakit/job-builder:latest"
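Conceptually, deploymentSupportedPythonVersions maps each supported Python version to a set of three images. The Python sketch below models that lookup for illustration only; it is not the Control Service's implementation, and the `JobImages` class, the `resolve_images` function and the `registry.example.com` image names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobImages:
    """Images configured for one Python version (hypothetical model of the Helm values)."""
    base_image: str     # image the user data job is built on top of
    vdk_image: str      # operator-managed VDK (system) library image
    builder_image: str  # image that performs the data job build


# Hypothetical in-memory copy of deploymentSupportedPythonVersions, keyed by version string.
SUPPORTED_VERSIONS = {
    "3.9": JobImages(
        base_image="registry.example.com/data-job-base-python-3.9:latest",
        vdk_image="registry.example.com/quickstart-vdk:release",
        builder_image="registry.example.com/job-builder:latest",
    ),
}


def resolve_images(python_version: str) -> JobImages:
    """Return the images for a job's Python version, or fail if it is not supported."""
    try:
        return SUPPORTED_VERSIONS[python_version]
    except KeyError:
        supported = ", ".join(sorted(SUPPORTED_VERSIONS))
        raise ValueError(
            f"Python {python_version} is not supported; supported versions: {supported}"
        ) from None
```

Keeping the version keys as strings matters: in YAML an unquoted `3.10` is parsed as the number 3.1, so quoting version keys in a values file avoids that ambiguity.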

More information can be found in the Control Service Helm Chart documentation.

Operators can configure files to be ignored automatically on deploy

When users deploy a job, operators can control which files are actually accepted, and either return an error or simply ignore (delete) the rest.
This allows much better security while also giving operators the flexibility to change the rules without directly impacting users:

# Comma-separated list of file types that are allowed to be uploaded.
# For example, to allow only SQL and INI text files, specify "text/x-sql,text/x-ini".
# The full list of file types is documented at https://tika.apache.org
# If set to empty, then all file types are allowed.
uploadValidationFileTypesAllowList: ""

# List of file extensions that are allowed to be uploaded, as a comma-separated list, e.g. "py,csv,sql".
# Only files whose extensions are present in this list will be allowed to be uploaded.
# If the list is empty, all extensions are allowed.
uploadValidationFileExtensionsAllowList: ""

# Works like uploadValidationFileTypesAllowList above, except that matching files are deleted
# instead of failing the job upload. It runs before the allow list, so files removed here
# can no longer cause the job upload to fail, even if their type is also checked by the allow list.
uploadValidationFileTypesFilterList: ""

# List of file extensions that are automatically deleted from data job source code before upload.
# Comma-separated list, e.g. "pyc,exe,sh". If the list is empty, no files will be deleted.
# Files are deleted before the allow list performs its checks.
uploadValidationFileExtensionsFilterList: ""
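The order of operations for the two extension settings (filter list first, then allow list) can be sketched in Python as follows. This is a minimal illustration assuming simple suffix matching on comma-separated values; it is not the Control Service's code, the function names are hypothetical, and the file-type lists are not modelled.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def parse_list(value: str) -> set[str]:
    """Split a comma-separated setting such as "py,csv,sql" into lowercase entries."""
    return {item.strip().lower() for item in value.split(",") if item.strip()}


def extension(path: str) -> str:
    """Return the extension without its leading dot, e.g. "pyc" for "a/b.pyc"."""
    return PurePosixPath(path).suffix.lstrip(".").lower()


def validate_upload(files: list[str], filter_list: str, allow_list: str) -> list[str]:
    """Apply the extension filter list first, then the allow list (hypothetical helper)."""
    filtered = parse_list(filter_list)
    allowed = parse_list(allow_list)
    # Step 1: silently drop files whose extension is on the filter list.
    kept = [f for f in files if extension(f) not in filtered]
    # Step 2: an empty allow list accepts every remaining file;
    # otherwise any remaining file with a non-listed extension fails the upload.
    if allowed:
        rejected = [f for f in kept if extension(f) not in allowed]
        if rejected:
            raise ValueError(f"Upload rejected, disallowed files: {rejected}")
    return kept
```

For example, `validate_upload(["10_step.py", "20_step.sql", "cache.pyc"], "pyc", "py,sql")` returns `["10_step.py", "20_step.sql"]`: `cache.pyc` is deleted by the filter list before the allow list is checked, so it does not fail the upload even though `pyc` is not on the allow list.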

More information can be found in the Control Service Helm Chart documentation.

New initiative: VDK Run Logs: Simplified And Readable

Take a look at the VEP (VDK Enhancement Proposal), which aims to simplify troubleshooting and development with VDK.

We are focusing on these goals:

  • Data job run logs provide progress-tracking information
  • User logs stand out
  • Long-running operations (like DAGs) are traceable in the logs
  • The root cause is immediately visible from the logs
  • Clean error handling
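As an illustration of the "user logs stand out" and progress-tracking goals only, the sketch below uses Python's standard logging module to prefix lines coming from a user logger. It is not the design proposed in the VEP; the `user` and `vdk` logger names, the `>>>` marker and `configure_logging` are hypothetical.

```python
import logging
import sys


class UserLogHighlighter(logging.Formatter):
    """Prefix records from the (hypothetical) "user" logger tree so they stand out."""

    def format(self, record: logging.LogRecord) -> str:
        message = super().format(record)
        if record.name == "user" or record.name.startswith("user."):
            return f">>> {message}"
        return message


def configure_logging(stream=sys.stderr) -> None:
    """Route all records through one handler that applies the highlighter."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(UserLogHighlighter("%(levelname)s [%(name)s] %(message)s"))
    root = logging.getLogger()
    root.handlers[:] = [handler]  # replace any previously installed handlers
    root.setLevel(logging.INFO)


if __name__ == "__main__":
    configure_logging()
    logging.getLogger("vdk.run").info("Step 1/3 started")  # framework progress line
    logging.getLogger("user.job").info("Loaded 10 rows")   # user line, gets the ">>> " prefix
```

Run directly, this prints `INFO [vdk.run] Step 1/3 started` followed by `>>> INFO [user.job] Loaded 10 rows` to stderr.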

Versatile Data Kit Architecture.md

The architecture of Versatile Data Kit, outlining all main interfaces and how they work, is described in architecture.md.

Notebook UI improvements

Add UI element indicating a VDK operation is running

Provides visual feedback to the user when a VDK operation is in progress.

(Screenshot: the status button on hover.)

Add icons to VDK operation result dialogs

Enhances the user experience by adding icons to result dialog boxes.


VDK Login UI: Semi-automated authentication workflow in the Jupyter Notebook

New proof-of-concept (POC) database plugin: vdk-duckdb

Check out more at vdk-duckdb.

What's Changed

New Contributors

Full Changelog: v1.0.1...v1.2
