github vmware/versatile-data-kit v0.9
Versatile Data Kit 0.9

17 months ago

Summary

Major features include:

vdk-meta-jobs - new plugin

Using this plugin, you can specify dependencies between data jobs as a directed acyclic graph (DAG).

For example:

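# Import path as given in the plugin README.
from vdk.plugin.meta_jobs.meta_job_runner import MetaJobInput


def run(job_input):
    # Each entry names one data job and the jobs that must finish before it starts.
    jobs = [
        {
            "job_name": "name-of-job",
            "team_name": "team-of-job",
            "fail_meta_job_on_error": True,  # or False
            "depends_on": ["name-of-job1", "name-of-job2"],
        },
        # ... more jobs
    ]
    MetaJobInput().run_meta_job(jobs)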

See more details in the plugin home page
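
To illustrate what declaring dependencies as a DAG implies, here is a minimal sketch that derives a valid run order from a job list shaped like the one above. It uses only Python's standard-library graphlib (Python 3.9+) and is not the plugin's implementation; the job names and the run_order helper are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter


def run_order(jobs):
    """Return job names in an order that respects every "depends_on" entry."""
    # Map each job name to the set of jobs that must finish before it.
    graph = {job["job_name"]: set(job.get("depends_on", [])) for job in jobs}
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as error:
        # A cycle means the dependencies do not form a DAG.
        raise ValueError(f"job dependencies contain a cycle: {error.args[1]}") from error


# Hypothetical jobs: "report" needs both ingest jobs to finish first.
example_jobs = [
    {"job_name": "ingest-a", "team_name": "my-team", "depends_on": []},
    {"job_name": "ingest-b", "team_name": "my-team", "depends_on": []},
    {"job_name": "report", "team_name": "my-team", "depends_on": ["ingest-a", "ingest-b"]},
]
order = run_order(example_jobs)
print(order)
```

Any order that places "report" last is valid here, since the two ingest jobs do not depend on each other. A job list with circular dependencies is rejected with a ValueError.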

Control Service security hardening

  • Option for jobs to run with a read-only file system
  • Credentials configuration so that the Control Service can use private images
  • A separate file system for storing temporary user-supplied files in the Control Service
  • Enhanced job upload validation against zip exploits and disallowed files
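
As an illustration of the kind of checks that validation against zip exploits usually involves, here is a minimal sketch using Python's standard-library zipfile module. It is not the Control Service's implementation; the find_zip_problems helper and both limits are assumptions chosen for the example.

```python
import io
import zipfile
from pathlib import PurePosixPath

MAX_UNCOMPRESSED_BYTES = 50 * 1024 * 1024  # assumed limit, not a VDK default
MAX_COMPRESSION_RATIO = 100  # assumed limit, not a VDK default


def find_zip_problems(zip_bytes):
    """Return a list of problems found in the archive metadata, without extracting it."""
    problems = []
    total_size = 0
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            path = PurePosixPath(info.filename)
            # Entries that are absolute or climb out of the target directory ("zip slip").
            if path.is_absolute() or ".." in path.parts:
                problems.append(f"unsafe path: {info.filename}")
            # Entries whose declared size dwarfs their compressed size ("zip bomb").
            if info.compress_size and info.file_size / info.compress_size > MAX_COMPRESSION_RATIO:
                problems.append(f"suspicious compression ratio: {info.filename}")
            total_size += info.file_size
    if total_size > MAX_UNCOMPRESSED_BYTES:
        problems.append("archive expands beyond the size limit")
    return problems
```

The sketch inspects only the archive's metadata, so a suspicious upload can be rejected before anything is written to disk.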

Data Job Upload validation allow list

During installation of the Control Service, administrators can limit which types of files can be uploaded as part of a data job.
A new configuration option, uploadValidationFileTypesAllowList, is added for this purpose.
Its value is a comma-separated list of file types.

For example, setting

uploadValidationFileTypesAllowList=image/png,text/plain

allows only PNG images and plain text files to be uploaded. Upload requests that contain any other file type will fail.
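
The allow-list check can be pictured with the following minimal sketch. It is not the Control Service's code: the helper names, the sample upload, and the rule that an empty allow list means no restriction are all assumptions made for illustration.

```python
def parse_allow_list(raw_value):
    """Split a comma-separated allow list into a set of normalized file types."""
    return {item.strip().lower() for item in raw_value.split(",") if item.strip()}


def rejected_files(detected_types, raw_allow_list):
    """Return the names of files whose detected type is not in the allow list."""
    allowed = parse_allow_list(raw_allow_list)
    # Assumption for this sketch: an empty allow list means no restriction.
    if not allowed:
        return []
    return sorted(
        name
        for name, file_type in detected_types.items()
        if file_type.lower() not in allowed
    )


# Hypothetical upload: file name -> type detected from the file content.
upload = {
    "logo.png": "image/png",
    "notes.txt": "text/plain",
    "run.sh": "application/x-sh",
}
rejected = rejected_files(upload, "image/png,text/plain")
print(rejected)  # ['run.sh']
```

With this setting, the upload above would be refused because of run.sh, while an upload containing only the PNG and the text file would pass.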

See more details in the helm chart documentation

vdk-logging-format - new plugin

This plugin allows for the configuration of the format of VDK logs.

Previously there was a separate plugin for each format; those plugins are now deprecated in favour of this one.

The plugin introduces a new configuration option, LOGGING_FORMAT, with possible values 'json', 'ltsv' and 'text'. For example:

export LOGGING_FORMAT=json
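
To show how a single option can select among several log formats, here is a minimal sketch built on Python's standard logging module. It is not the plugin's implementation: the formatter classes, the fields they emit, the case-insensitive lookup, and the 'text' default are assumptions for illustration.

```python
import json
import logging
import os


class JsonFormatter(logging.Formatter):
    def format(self, record):
        # One JSON object per log line.
        return json.dumps(
            {"level": record.levelname, "logger": record.name, "message": record.getMessage()}
        )


class LtsvFormatter(logging.Formatter):
    def format(self, record):
        # LTSV: tab-separated "label:value" pairs.
        fields = {"level": record.levelname, "logger": record.name, "message": record.getMessage()}
        return "\t".join(f"{label}:{value}" for label, value in fields.items())


FORMATTERS = {
    "json": JsonFormatter,
    "ltsv": LtsvFormatter,
    "text": lambda: logging.Formatter("%(levelname)s %(name)s %(message)s"),
}


def formatter_from_env(environ=os.environ):
    """Build a formatter from LOGGING_FORMAT (defaulting to 'text' in this sketch)."""
    name = environ.get("LOGGING_FORMAT", "text").lower()
    if name not in FORMATTERS:
        raise ValueError(f"unsupported LOGGING_FORMAT: {name!r}")
    return FORMATTERS[name]()


# Attach the selected formatter to a handler, as an application would on startup.
handler = logging.StreamHandler()
handler.setFormatter(formatter_from_env({"LOGGING_FORMAT": "json"}))
```

Keeping the format choice in one lookup table is what makes a single plugin sufficient where separate per-format plugins were needed before.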

Control Service helm chart support for Postgres

The Bitnami PostgreSQL chart is added as an option for the embedded database that stores control-service metadata.

Users can now install it with (where <chart> is a placeholder for the Control Service chart reference):

helm install vdk-control-service <chart> --set postgresql.enabled=true --set cockroachdb.enabled=false

Package versions

See installation instructions here.
The versions of VDK components released under VDK 0.9 are:

Main components

control-service 1.5.707959356
vdk-core==0.3.692414765

Plugins

vdk-logging-json==0.1.693641831
vdk-meta-jobs==0.1.684477187
vdk-postgres==0.0.692283840
vdk-trino==0.4.703555598

What's Changed

  • control-service: Container read-only file system by @gageorgiev in #1291
  • control-service: Expose LOGGING_FORMAT through helm chart by @gageorgiev in #1329
  • control-service: a directory can be manually set as a location to store data jobs when processing them to git. by @murphp15 in #1290
  • control-service: add empty dir storage by @murphp15 in #1293
  • control-service: add support for allowlist in helm chart. by @murphp15 in #1283
  • control-service: add tests for some zip exploits by @tozka in #1266
  • control-service: builder base image in helm by @murphp15 in #1359
  • control-service: builder images load secrets from k8s by @murphp15 in #1358
  • control-service: create the secret in the correct namespace. by @murphp15 in #1318
  • control-service: deprecated jobsList endpoint cleanup by @ivakoleva in #1296
  • control-service: fix helm template by @murphp15 in #1295
  • control-service: fix ingress template by @murphp15 in #1277
  • control-service: helm chart for private builder by @murphp15 in #1336
  • control-service: namespace can be null by @murphp15 in #1349
  • control-service: postgresql embedded by @ivakoleva in #1273
  • control-service: refactor db query to mitigate race condition by @mrMoZ1 in #1269
  • control-service: release newer version of job builder by @murphp15 in #1362
  • control-service: set registry name correctly. by @murphp15 in #1323
  • control-service: test cleanup with the goal of making tests easier to run locally by @murphp15 in #1343
  • control-service: upload validation by @tozka in #1268
  • vdk-jupyter: Expand details on extensions design by @duyguHsnHsn in #1304
  • quickstart-vdk: Include vdk-logging-format by @gageorgiev in #1313
  • vdk-audit: set python requires >= 3.8 by @tozka in #1289
  • vdk-control-api-auth: Fix error message formatting by @gageorgiev in #1303
  • vdk-control-cli: fix cicd by @mrMoZ1 in #1327
  • vdk-control-cli: update doc for deployment of multiple jobs w/single command by @mrMoZ1 in #1325
  • vdk-core: Allow for modification of dynamic params by @doks5 in #1267
  • vdk-core: resolve library error classification on startup by @mrMoZ1 in #1241
  • vdk-events: add presentation slides of DSC event by @tozka in #1335
  • vdk-jupyter: introduce JupyterLab extension by @duyguHsnHsn in #1338
  • vdk-logging-format: Fix path to readme in setup.py by @gageorgiev in #1322
  • vdk-logging-format: Join JSON and LTSV logging plugins into one by @gageorgiev in #1312
  • vdk-logging-json, vdk-logging-ltsv: Delete deprecated plugins by @gageorgiev in #1319
  • vdk-meta-jobs: Initial implementation by @tozka in #1249
  • vdk-postgres: add ingest plugin by @tozka in #1314
  • vdk-trino: Fix typo in the documentation by @tozka in #1340

Full Changelog: v0.8...v0.9
