github vmware/versatile-data-kit v0.7


Summary

Major features include:

VDK Template running state detection capability

Since template executions are autonomous data job runs, we need to be able to determine if a template is running at any time.
This makes it possible, for example, to distinguish root data job finalization from template finalization.

For example, if we want to send telemetry somewhere:

    @hookimpl
    def finalize_job(self, context: JobContext) -> None:
        # `telemetry` is a placeholder for whatever client the plugin uses
        template = context.core_context.state.get(ExecutionStateStoreKeys.TEMPLATE_NAME)
        if template:
            telemetry.send(phase="finalize_template", template_name=template)
        else:
            telemetry.send(phase="finalize_job", job_name=context.name)

New Logging configuration LOG_LEVEL_MODULE

Enables users to temporarily override the log level per module (e.g. to increase the verbosity of certain modules while debugging or prototyping).

For example, assuming the default log level is INFO, we can enable verbose logs for two modules, "vdk.api" and "custom.module":

export LOG_LEVEL_MODULE="vdk.api=DEBUG;custom.module=DEBUG" 
vdk run job-name 

Or in a specific job's config.ini:

[vdk]
log_level_module=vdk.api=DEBUG;custom.module=DEBUG
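
To illustrate the "module=LEVEL;module=LEVEL" format, here is a minimal sketch of how such a value could be parsed and applied with Python's standard logging module. It is not VDK's actual implementation; the function names parse_log_level_module and apply_log_levels are hypothetical.

```python
import logging
from typing import Dict


def parse_log_level_module(value: str) -> Dict[str, int]:
    """Parse 'a.b=DEBUG;c.d=INFO' into a {module name: numeric level} mapping."""
    levels: Dict[str, int] = {}
    for entry in value.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate trailing or doubled semicolons
        module, sep, level_name = entry.partition("=")
        if not sep:
            raise ValueError(f"Expected 'module=LEVEL', got: {entry!r}")
        # getLevelName returns an int for known level names, a string otherwise
        level = logging.getLevelName(level_name.strip().upper())
        if not isinstance(level, int):
            raise ValueError(f"Unknown log level: {level_name!r}")
        levels[module.strip()] = level
    return levels


def apply_log_levels(levels: Dict[str, int]) -> None:
    """Set the level of each named logger; other loggers keep their defaults."""
    for module, level in levels.items():
        logging.getLogger(module).setLevel(level)


apply_log_levels(parse_log_level_module("vdk.api=DEBUG;custom.module=DEBUG"))
```

Only the listed loggers (and their children, through normal logger inheritance) become more verbose; everything else stays at the default level.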

New plugin backend for Properties: from local file system

A simple plugin that allows a developer or presenter to quickly store properties on the local file system.

It can be used to store secrets or configuration for a dev/demo session without requiring the entire Control Service to be installed and running.
It can also be used to test a job run locally without updating the state of the deployed job.

Example:

export PROPERTIES_DEFAULT_TYPE="fs-properties-client"

or in a specific job's config.ini:

[vdk]
properties_default_type=fs-properties-client

Now properties are stored in a local file. The file location can be further configured using FS_PROPERTIES_FILENAME and FS_PROPERTIES_DIRECTORY.
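
With this backend selected, a job step reads and writes properties through job_input as usual. A minimal sketch, assuming the standard get_all_properties and set_all_properties methods of job_input; the run_count key is hypothetical and only serves as an illustration:

```python
def run(job_input):
    # job_input is the IJobInput instance that VDK passes to each Python step
    props = job_input.get_all_properties()

    # "run_count" is a made-up property key used only for this example
    props["run_count"] = int(props.get("run_count", 0)) + 1

    # Persist the updated properties through the configured backend
    job_input.set_all_properties(props)
```

Running the job locally several times should then increment the counter in the local properties file, without touching the properties of the deployed job.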

Cookiecutter for new plugins

Creating a new plugin skeleton is now very easy. Run

cookiecutter https://github.com/tozka/cookiecutter-vdk-plugin.git

and follow the instructions.

Add the ability to cancel remaining job steps

Now a job (or a template) can be canceled from any step and all remaining steps in the job (or template) will be skipped.
This is useful, for example, when a data job processes data from a source that has reported no new entries since the last run: the job can then skip its remaining steps.

Example:

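from vdk.api.job_input import IJobInput


def run(job_input: IJobInput):
    data = get_last_delta()  # placeholder for the job's own delta lookup
    if not data:
        # no new entries: skip all remaining steps of this job
        job_input.skip_remaining_steps()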

Package versions

See installation instructions here.
The versions of VDK components released under VDK 0.7 are:

Main components

control-service 1.5.622899758

vdk-control-cli==1.3.626767210
vdk-core==0.3.652866366

Plugins

vdk-properties-fs==0.0.651770458
vdk-kerberos-auth==0.3.631374202
vdk-impala==0.4.651849986

What's Changed

  • vdk-control-cli: Drop requirement pluggy to be 0.* by @gageorgiev in #1116
  • vdk-core: Add log before query result fetch by @doks5 in #1195
  • vdk-core: Fix issue with serializing Decimal values during payload check by @gageorgiev in #946
  • vdk-core: add ability to cancel remaining job steps by @mrMoZ1 in #1188
  • vdk-core: add new configuration log_level_module by @tozka in #1167
  • vdk-core: added default values to write termination message method by @mivanov1988 in #1185
  • vdk-core: avoid circular references in print results by @tozka in #1176
  • vdk-core: extend classification error test by @tozka in #1180
  • vdk-core: fix error classification of vdk code by @tozka in #1173
  • vdk-core: fix flakey test in test checking logs output by @murphp15 in #1194
  • vdk-core: template running state detection capability by @ivakoleva in #941
  • vdk-csv: Updates on vdk-csv README by @duyguHsnHsn in #952
  • vdk-impala: Add validation for queries that doesn't provide lineage info by @kostoww in #1175
  • vdk-impala: fix error classification in impala by @tozka in #1178
  • vdk-impala: fix impala template empty source view usr err by @mrMoZ1 in #1189
  • vdk-impala: fixed platform error missclasified when running template by @mrMoZ1 in #944
  • vdk-impala: improve vdk-impala documentation by @tozka in #948
  • vdk-kerberos-auth: Pinned minikerberos in vdk-kerberos-auth plugin by @mivanov1988 in #1168
  • vdk-kerberos-auth: add KerberosClient for authenticating API calls by @tozka in #879
  • vdk-plugins: improve plugin project creation with cookiecutter by @tozka in #942
  • vdk-properties-fs: new plugin for local FS properties storage by @ivakoleva in #1190
  • vep: Jupyter Notebook Integration Goals and Requirements by @duyguHsnHsn in #1165
  • vep: Jupyter Notebook Integration by @duyguHsnHsn in #1113
  • versatile-data-kit: Without and with VDK image by @zverulacis in #1184
  • versatile-data-kit: set automatic java formatter by @tozka in #757
  • versatile-data-kit: simplify release process by @tozka in #951
  • versatile-data-kit: update contact instructions by @tozka in #1172

Full Changelog: v0.6...v0.7
