github vmware/versatile-data-kit v0.3
Versatile Data Kit 0.3

2 years ago

Summary

Major features include:

Support for Kerberos Authentication provider in the Control Service

Alongside support for OAuth2, organizations can now integrate with their Kerberos infrastructure.
Users can specify Kerberos as the authentication provider for accessing the VDK Control Service.

For more information on how to configure Kerberos, see the VDK Helm documentation.

A new plugin: vdk-lineage (alpha)

The VDK Lineage plugin collects lineage information (input data -> job -> output data) for any SQL query executed using VDK, regardless of the database, and sends it to a pre-configured destination using the OpenLineage standard.
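To make the input -> job -> output shape concrete, here is a minimal sketch of an OpenLineage-style run event built as a plain Python dictionary. It uses only a subset of the fields in the OpenLineage run event layout; real events carry more (for example a schema URL and optional facets). The helper `build_run_event`, the namespace, the job and table names, and the producer URI are all hypothetical and are not taken from the vdk-lineage plugin.

```python
import json
import uuid
from datetime import datetime, timezone


def build_run_event(job_name, inputs, outputs, namespace="example-namespace"):
    """Build a minimal OpenLineage-style COMPLETE event (illustrative only)."""
    return {
        "eventType": "COMPLETE",
        "eventTime": datetime.now(timezone.utc).isoformat(),
        # Each run gets its own unique identifier.
        "run": {"runId": str(uuid.uuid4())},
        "job": {"namespace": namespace, "name": job_name},
        # Datasets the query read from and wrote to.
        "inputs": [{"namespace": namespace, "name": n} for n in inputs],
        "outputs": [{"namespace": namespace, "name": n} for n in outputs],
        # Hypothetical producer URI; a real producer identifies the emitting tool.
        "producer": "https://example.com/hypothetical-producer",
    }


event = build_run_event("example-job", ["db.source_table"], ["db.target_table"])
print(json.dumps(event, indent=2))
```

An `INSERT INTO db.target_table SELECT ... FROM db.source_table` query would map to one input dataset and one output dataset, as in the call above.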

We have also introduced a utility command, vdk marquez-server --start, which starts the Marquez UI locally so that lineage can be visualized.

For more information, check out the vdk-lineage plugin documentation.

Support for Kubernetes 1.23

The VDK Control Service now works with the newest versions of Kubernetes and makes use of their features:

  • The VDK Control Service can now work with CronJob controller V2 (alongside V1).
  • With the TTL Controller, any jobs launched by the VDK Control Service can be cleaned up after a preconfigured time.
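As a rough illustration of the two Kubernetes features above, the sketch below builds a `batch/v1` CronJob manifest whose job template sets `ttlSecondsAfterFinished`, the Job field that the TTL controller acts on. This is not the manifest the Control Service generates; the helper `build_cronjob_manifest` and all names, images, and values are hypothetical.

```python
import json


def build_cronjob_manifest(name, schedule, image, ttl_seconds):
    """Return a minimal batch/v1 CronJob manifest as a dict (illustrative only)."""
    return {
        "apiVersion": "batch/v1",  # the V1 CronJob API
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,
            "jobTemplate": {
                "spec": {
                    # Finished Jobs become eligible for cleanup after this many seconds.
                    "ttlSecondsAfterFinished": ttl_seconds,
                    "template": {
                        "spec": {
                            "restartPolicy": "Never",
                            "containers": [{"name": name, "image": image}],
                        }
                    },
                }
            },
        },
    }


manifest = build_cronjob_manifest(
    "example-data-job", "0 * * * *", "example.com/example-image:latest", 600
)
print(json.dumps(manifest, indent=2))
```

The TTL is set on the job template rather than on the CronJob itself, so it applies to each Job the schedule creates.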

Users can override the VDK version of a deployed data job

Users can now specify the VDK version, using either the API or the CLI, when deploying a Data Job.
For example, with the CLI it's as simple as vdk deploy --update --vdk-version old-vdk-version

This enables canary or rolling deployments of VDK.
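A canary rollout along these lines could be scripted by pinning the version on a small set of jobs first. The sketch below only assembles the command lines and does not run them. The `--update` and `--vdk-version` flags come from the example above; the `-n` (job name) and `-t` (team) options are assumptions, so check `vdk deploy --help` before relying on them. The helper `build_deploy_update_command` and the job and team names are hypothetical.

```python
import shlex


def build_deploy_update_command(job_name, team, vdk_version):
    """Assemble a 'vdk deploy --update' command line as an argument list."""
    return [
        "vdk", "deploy", "--update",
        "-n", job_name,   # assumed option for the job name
        "-t", team,       # assumed option for the owning team
        "--vdk-version", vdk_version,
    ]


# Hypothetical canary group: pin these jobs first, then widen the rollout.
canary_jobs = ["example-job-a", "example-job-b"]
commands = [
    build_deploy_update_command(job, "example-team", "old-vdk-version")
    for job in canary_jobs
]
for command in commands:
    print(shlex.join(command))
```

Building an argument list instead of a single string keeps the command safe to pass to `subprocess.run` without shell quoting issues.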

Introducing VEP (VDK Enhancement Proposal) process and first VEP

Versatile Data Kit has a process in place for proposing and adding large changes in an efficient and consistent manner.

For more information check the process here.

We have also used the process for our first major feature change: Apache Airflow Integration.

Package versions

See installation instructions here.
The versions of VDK components released under VDK 0.3 are:

Main components

control-service 1.5.520417292

vdk-control-cli==1.3.520417292
vdk-core==0.2.520417292
vdk-heartbeat==0.6.520417292

Plugins

vdk-trino==0.3.520417292
vdk-lineage==0.2.520417292
vdk-kerberos-auth==0.3.520417292
vdk-impala==0.3.520417292

What's Changed

  • VEP-554: Apache Airflow Integration by @mivanov1988 in #748 and @doks5 in #786
  • airflow-provider-vdk: Initial Airflow provider structure by @gageorgiev in #772
  • airflow-provider-vdk: Job execution status and logs method by @gageorgiev in #796
  • airflow-provider-vdk: Start and cancel job execution methods by @gageorgiev in #778
  • airflow-provider-vdk: VDKSensor initial structure by @gageorgiev in #800
  • control-service: Adopt kubernetes-client 14.0.1 by @gageorgiev in #761
  • control-service: add kerberos auth properties to helm chart by @mrMoZ1 in #764
  • control-service: Adopt use of the V1CronJob API by @gageorgiev in #767
  • control-service: Bump pipelines-control-service version by @doks5 in #762
  • control-service: Set TTLAfterFinished period for K8s CronJobs by @gageorgiev in #776
  • control-service: Update CHANGELOG.md by @doks5 in #760
  • control-service: add OAuth2 enable/disable flag by @mrMoZ1 in #765
  • control-service: add kerberos auth provider by @mrMoZ1 in #755
  • control-service: builder job configurable security context by @mivanov1988 in #708
  • control-service: configurable builder job service account by @mivanov1988 in #791
  • control-service: fix builder security context by @mivanov1988 in #784
  • control-service: fix concatAddresses NPE by @mivanov1988 in #782
  • control-service: fix job builder unit tests by @mivanov1988 in #792
  • control-service: fix log link to set endTime always by @tozka in #735
  • vdk-control-cli: Adopt click version 8 by @ivakoleva in #770
  • vdk-control-cli: set vdk version and enabled when deploying new job by @tozka in #752
  • vdk-core: JobInput get_name and get_job_directory implementation by @ivakoleva in #745
  • vdk-core: Verify payload after pre-processing it by @YanaZhivkova in #777
  • vdk-core: clarify run descriptions on --arguments option by @tozka in #731
  • vdk-core: ensure sql args are substituted in correct priority by @tozka in #749
  • vdk-core: lowercase env variables are inferred as configuration by @tozka in #751
  • vdk-core: minor refactoring in managed_cursor to reduce long method by @tozka in #803
  • vdk-core: print query duration by @mrMoZ1 in #804
  • vdk-core: refactor test to use job_path method by @tozka in #747
  • vdk-core: update plugin hook diagrams by @tozka in #775
  • vdk-core: Adopt click version 8.0 by @doks5 in #769
  • vdk-heartbeat: Fix initial job executions with specific vdk version by @YanaZhivkova in #758
  • vdk-heartbeat: Handle execution end_time not string by @doks5 in #750
  • vdk-impala: unify names of templates between trino and impala by @tozka in #787
  • vdk-kerberos-auth: support kerberos auth for all CLI commands by @tozka in #774
  • vdk-kerberos-auth: upgrade minikerberos and requests-kerberos to latest by @ivakoleva in #742
  • vdk-lineage: introducing POC (pre-alpha) implementation by @tozka in #783
  • vdk-plugins: Introduce vdk-control-api-auth plugin by @doks5 in #801
  • vdk-snowflake: Enable support for Python 3.10 by @gageorgiev in #746
  • vdk-trino: add link to template examples by @tozka in #788
  • vdk-trino: collect lineage for select/insert and rename table only by @philip-alexiev in #756
  • vdk-trino: fix ingesting value with bool type failing by @tozka in #753
  • vdk: add VDK enhancement proposal (VEP) spec template by @tozka in #727
  • versatile-data-kit: Update CONTRIBUTING.md with links to coding standard by @tozka in #794

New Contributors

Full Changelog: 0.2...v0.3
