github vmware/versatile-data-kit v0.8
Versatile Data Kit 0.8

23 months ago

Summary

Major features include:

New plugin: VDK Audit

This plugin provides the ability to audit and potentially limit user operations. These operations can be deep within the Python runtime or standard libraries, such as dynamic code compilation, module imports, or OS command invocations.

To forbid specific os.* operations, set the following environment variables before running the job:

export AUDIT_HOOK_ENABLED=true
export AUDIT_HOOK_FORBIDDEN_EVENTS_LIST='os.removexattr;os.rename;os.rmdir;os.scandir'
export AUDIT_HOOK_EXIT_ON_FORBIDDEN_EVENT=true

vdk run <job-name>
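
The event names in the list above (os.rename, os.rmdir and so on) match the audit events that CPython 3.8+ raises for hooks registered with sys.addaudithook. The following is a minimal sketch of that mechanism, assuming the plugin relies on it; it is not the plugin's actual implementation. The names ForbiddenOperation, parse_forbidden_events, make_audit_hook and install_audit_hook are hypothetical, and raising an exception stands in for whatever AUDIT_HOOK_EXIT_ON_FORBIDDEN_EVENT does in the real plugin.

```python
import sys


class ForbiddenOperation(RuntimeError):
    """Hypothetical error raised by this sketch when a forbidden event fires."""


def parse_forbidden_events(raw):
    """Split a semicolon-separated list such as 'os.rename;os.rmdir'."""
    return frozenset(name.strip() for name in raw.split(";") if name.strip())


def make_audit_hook(forbidden):
    """Build a callable with the (event, args) signature audit hooks receive."""
    def hook(event, args):
        # Every audited operation in the interpreter passes through here.
        if event in forbidden:
            raise ForbiddenOperation(f"forbidden audit event: {event}")
    return hook


def install_audit_hook(raw_events):
    """Register the hook for the running interpreter.

    Audit hooks cannot be removed once added, so this affects the
    process until it exits.
    """
    hook = make_audit_hook(parse_forbidden_events(raw_events))
    sys.addaudithook(hook)
    return hook
```

With a hook like this installed for 'os.rmdir', a later call to os.rmdir would raise ForbiddenOperation before the directory is touched, because an exception raised inside an audit hook propagates to the caller of the audited operation.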

See more details in the vdk-audit plugin page

Any version of Python in VDK Control Service

Jobs deployed by the Control Service can now automatically use any version of Python, not just 3.7.
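
The change list for this release mentions dynamic detection of the Python site-packages directory. One way to find that directory without hard-coding the Python version is sketched below. This is only an illustration using the standard library; it does not claim to be how the Control Service does it, and site_packages_dir is a hypothetical helper name.

```python
import sysconfig


def site_packages_dir():
    """Return the pure-Python package directory of the running interpreter."""
    # 'purelib' is the install location for pure-Python packages;
    # its path depends on the interpreter version and platform.
    return sysconfig.get_paths()["purelib"]


if __name__ == "__main__":
    print(site_packages_dir())
```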

Insert-only Impala load template

This template loads raw data from the Data Lake into a target table in the Data Warehouse. In summary, it appends all records from the source table to the target table. As with all other SQL modeling templates, the schema is validated, and the table is refreshed and statistics are computed when necessary.

Example:

def run(job_input):
    # . . .
    template_args = {
        'source_schema': 'source',
        'source_view': 'view_source',
        'target_schema': 'target',
        'target_table': 'destination_table'
    }
    job_input.execute_template('insert', template_args)
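
To make the append semantics concrete, here is a small sketch that uses an in-memory SQLite database as a stand-in for Impala. It only illustrates the "append every source record to the target" behaviour described above. It is not the template's SQL, it skips schema validation, refresh and statistics, and insert_only_load is a hypothetical helper name.

```python
import sqlite3


def insert_only_load(conn, source_view, target_table):
    """Append all rows of source_view to target_table (no deduplication)."""
    # Identifiers cannot be bound as SQL parameters, so validate them first.
    for name in (source_view, target_table):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    with conn:  # commit on success, roll back on error
        conn.execute(f"INSERT INTO {target_table} SELECT * FROM {source_view}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE view_source (id INTEGER, name TEXT)")
    conn.execute("CREATE TABLE destination_table (id INTEGER, name TEXT)")
    conn.executemany(
        "INSERT INTO view_source VALUES (?, ?)", [(1, "a"), (2, "b")]
    )
    insert_only_load(conn, "view_source", "destination_table")
    print(conn.execute("SELECT COUNT(*) FROM destination_table").fetchone()[0])
```

Because the load only appends, running it twice against an unchanged source duplicates the rows in the target. Whether that is acceptable depends on how the source is populated between runs.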

See more details in the template documentation page

Package versions

See installation instructions here.
The versions of VDK components released under VDK 0.8 are:

Main components

control-service 1.5.671965442
vdk-core==0.3.662978536

Plugins

vdk-ingest-http==0.2.670842377
vdk-impala==0.4.672320306

What's Changed

  • control-service: CVE fix - upgrade commons-text by @tozka in #1255
  • control-service: Dynamic python site-packages directory detection by @mivanov1988 in #1247
  • control-service: fix cicd deployment by @tozka in #1226
  • control-service: fix integration tests by @tozka in #1211
  • control-service: fix race condition in test by @murphp15 in #1227
  • control-service: refactor job cancellation method due to 404 errors by @mrMoZ1 in #1114
  • control-service: remove executables from secure job builder by @mivanov1988 in #1202
  • control-plane: better error logging for transient error in tests by @murphp15 in #1222
  • control-service: improve docs and local runability of integration tests by @murphp15 in #1217
  • control-service: upgrade java client k8s version by @murphp15 in #1216
  • vdk-core: errors occurred and the state (handled or not) context missing by @ivakoleva in #1182
  • vdk-core: errors occurred and the state (handled or not) context missing by @tozka in #1212
  • vdk-core: platform error no longer logged when skipping execution steps by @mrMoZ1 in #1223
  • vdk-impala: Fix parsing while analysing profile for lineage information by @kostoww in #1206
  • vdk-impala: Refactor query classifier for data lineage by @kostoww in #1239
  • vdk-impala: improve explanation in readme by @tozka in #1248
  • vdk-impala: stop using errors.get_exception_message by @tozka in #1224
  • vdk-impala: update documentation with link by @tozka in #1237
  • vdk-ingest-http: Adopt simplejson in place of json by @doks5 in #1229
  • vdk-ingest-http: Move data conversion above size calc by @doks5 in #1245
  • vdk-ingest-http: fix default value for backoff factor, add retry test by @dakodakov in #1218
  • vdk-plugins: fix broken link by @tozka in #1204
  • vdk-plugins: introduced vdk-audit plugin by @mivanov1988 in #1221
  • vdk-plugins: run tests on release of vdk-core by @tozka in #1210
  • vdk-plugins: set dind template job for default build of plugins by @tozka in #1225
  • versatile-data-kit: required approving reviewers update by @ivakoleva in #1220
  • versatile-data-kit: update contributing.md by @tozka in #1214

New Contributors

Full Changelog: v0.7...v0.8
