github vmware/versatile-data-kit v1.0
Versatile Data Kit 1.0

14 months ago

Major features include:

VDK Operations UI

VDK Operations UI is a browser application that allows users to manage and monitor data jobs. It ships as part of quickstart-vdk and is available to users who run quickstart-vdk locally.

Users can now:

  • View the overall health of their data jobs
  • Enable, disable, and re-run data jobs
  • List their data jobs and view each job's deployment status, latest execution status, success rate, etc.
  • Access individual data job details, such as description, schedule, notifications, and data job source code
  • View execution details for a data job, e.g. the number of executions, the job version used by each execution, execution duration, etc.

For more information about the architecture, check out VEP-1507.

Control Service Secrets API

With the release of the Secrets API, users can now securely store sensitive data such as passwords, credentials, and tokens. This supports compliance with industry standards and reduces the risk of unauthorized access and data breaches.

The new Secrets API allows users to configure a Vault instance in the Control Service, which stores and serves secrets for data jobs. Data jobs can now set and retrieve secrets at runtime, which improves security and simplifies integration with third-party systems.

To store and retrieve secrets, we have introduced new API methods under the path:

/data-jobs/for-team/{team_name}/jobs/{job_name}/deployments/{deployment_id}/secrets

Users can make GET requests to retrieve secrets and PUT requests to update secrets for a specific data job deployment.
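As a rough illustration of the GET and PUT calls described above, here is a minimal sketch that builds the requests with the Python standard library. Only the URL path comes from these release notes. The base URL, the bearer-token header, the JSON key/value body, and the helper names `secrets_url` and `build_secrets_request` are assumptions made for the example, so check the API documentation for the actual request format.

```python
import json
import urllib.request
from urllib.parse import quote

# Assumption: base URL of a Control Service instance; replace with your own.
BASE_URL = "http://localhost:8092"


def secrets_url(team_name, job_name, deployment_id, base_url=BASE_URL):
    """Build the secrets endpoint URL for one data job deployment."""
    path = "/data-jobs/for-team/{}/jobs/{}/deployments/{}/secrets".format(
        quote(team_name, safe=""),
        quote(job_name, safe=""),
        quote(deployment_id, safe=""),
    )
    return base_url.rstrip("/") + path


def build_secrets_request(team_name, job_name, deployment_id,
                          secrets=None, token=None):
    """Return a GET request when secrets is None, otherwise a PUT request.

    Assumption: secrets are sent as a flat JSON object of key/value pairs
    and authentication uses a bearer token.
    """
    headers = {"Accept": "application/json"}
    data = None
    method = "GET"
    if secrets is not None:
        data = json.dumps(secrets).encode("utf-8")
        headers["Content-Type"] = "application/json"
        method = "PUT"
    if token:
        headers["Authorization"] = "Bearer " + token
    return urllib.request.Request(
        secrets_url(team_name, job_name, deployment_id),
        data=data,
        headers=headers,
        method=method,
    )
```

To actually send a request, pass the returned object to `urllib.request.urlopen`. The sketch keeps request construction separate from sending so that it can be inspected without a running Control Service.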

For more details on API usage and examples, please refer to our documentation.

vdk-impala: Introduce checks for snapshot and insert template

With the introduction of snapshot and insert template checks, we can now ensure the quality and correctness of the data before it is inserted into the target table.

Previously, the processing step checks were unable to validate the semantics of the data, potentially allowing erroneous data to be inserted. With the new checks in place, we have better control over the data integrity and can prevent unwanted behavior.

Here's an example of how to use the checks:

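    def sample_check(tmp_table_name):
        # Receives the name of the temporary table; returns True if the check passes.
        return "bad" not in tmp_table_name

    # template_args is assumed to already hold the template's other arguments.
    template_args["check"] = sample_check
    job_input.execute_template(
        template_name="snapshot",
        template_args=template_args,
    )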
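A check will usually inspect the data itself and not just the table name. The following is a minimal sketch of such a check. It assumes that the check function is called with the name of the temporary table holding the data to be inserted, and that returning False rejects the data. `make_non_empty_check` and `FakeJobInput` are hypothetical names used only for this example, and the exact result shape of `execute_query` may differ in your setup.

```python
from typing import Callable


def make_non_empty_check(job_input) -> Callable[[str], bool]:
    """Build a check that passes only if the temporary table has rows."""

    def check(tmp_table_name: str) -> bool:
        # Assumption: execute_query returns a list of rows, each indexable by column.
        rows = job_input.execute_query(f"SELECT COUNT(*) FROM {tmp_table_name}")
        return rows[0][0] > 0

    return check


class FakeJobInput:
    """Hypothetical stand-in for job_input, for trying the check without Impala."""

    def __init__(self, row_count: int):
        self.row_count = row_count
        self.queries = []

    def execute_query(self, sql: str):
        # Record the query and return a single row holding the configured count.
        self.queries.append(sql)
        return [[self.row_count]]
```

Inside a data job step, the check would be registered in the same way as in the example above, e.g. `template_args["check"] = make_non_empty_check(job_input)`.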

What's Changed

  • control-service: better error logging allowing to understand failing test by @murphp15 in #2184
  • control-service: Python image based on Photon OS by @mivanov1988 in #2243
  • control-service: ability to send authenticated email notifications by @mrMoZ1 in #2294
  • control-service: add secrets API by @dakodakov in #2171
  • control-service: add tmp dir path to image deployer's env variables by @mrMoZ1 in #2244
  • control-service: data jobs points to correct namespace by @murphp15 in #2268
  • control-service: fix failing pipelines by @murphp15 in #2296
  • control-service: infer correct namespace if not set by @tozka in #2277
  • control-service: install kubectl by @murphp15 in #2290
  • control-service: introduce latest and stable tags for docker images by @DeltaMichael in #2138
  • control-service: make kubernetes service easy to test. by @murphp15 in #2249
  • control-service: move cron jobs methods to the data jobs class by @murphp15 in #2291
  • control-service: move cron jobs methods to the data jobs class by @murphp15 in #2293
  • control-service: multiple namespaces in testing by @murphp15 in #2269
  • control-service: produce secure base job images for python 3.8-3.11 by @mivanov1988 in #2208
  • control-service: remove spammy logs by @tozka in #2278
  • control-service: remove unneeded methods by @murphp15 in #2260
  • control-service: remove unused properties by @murphp15 in #2262
  • control-service: secrets service implementation by @dakodakov in #2241
  • control-service: secrets service integration test by @dakodakov in #2289
  • control-service: secrets service unit tests by @dakodakov in #2276
  • control-service: use real class when testing instead of mock by @murphp15 in #2261
  • examples: Add Supported Python Versions Example by @doks5 in #2288
  • frontend: add null checks for optional configs by @DeltaMichael in #2193
  • frontend: disable stable tagging for ui docker images by @DeltaMichael in #2240
  • frontend: ping frontend on docker image release by @DeltaMichael in #2101
  • specs: VEP-2272 Complete Data Job Configuration Persistence Part 2 by @mivanov1988 in #2302
  • specs: VEP-2272 Complete Data Job Configuration Persistence by @mivanov1988 in #2287
  • vdk-control-cli: Allow extensions to specify a sample job by @gageorgiev in #2177
  • vdk-control-cli: Test only on 3.7 and 3.11 by @gageorgiev in #2230
  • vdk-core: Accept string as job_path in JobConfig by @doks5 in #2251
  • vdk-core: Add python version disparity warning by @doks5 in #2242
  • vdk-core: Add python_version configuration to config-help by @doks5 in #2271
  • vdk-core: Improve log message for python version disparity by @doks5 in #2250
  • vdk-core: Update JobConfig to match vdk-control-cli JobConfig by @doks5 in #2226
  • vdk-core: adapt to recent pluggy changes by @dakodakov in #2317
  • vdk-core: add configurable write directory value by @mrMoZ1 in #2206
  • vdk-core: add vdk sdk secrets api - part I by @dakodakov in #2318
  • vdk-core: add vdk sdk secrets api - part III by @dakodakov in #2325
  • vdk-impala: Introduce checks for insert template by @sbuldeev in #2198
  • vdk-impala: Introduce checks for snapshot template by @sbuldeev in #2040
  • vdk-jupyter: Allow for creating a job with a notebook step by @gageorgiev in #2172
  • vdk-jupyter: Fix job creation by @gageorgiev in #2245
  • vdk-jupyter: fix build by pinning every package to a specific version by @duyguHsnHsn in #2186
  • vdk-jupyter: installation and build by @duyguHsnHsn in #2319
  • vdk-jupyter: pin jupyterlab to 3.6.3 in pyproject.toml by @duyguHsnHsn in #2292
  • vdk-jupyter: pin tsc to specific version by @duyguHsnHsn in #2220
  • vdk-jupyter: small fixtures on the ui by @duyguHsnHsn in #2161
  • vdk-notebook: handle job with mixed .ipynb, .py, .sql files use-case by @duyguHsnHsn in #2279
  • vdk-plugin-control-cli: better error logging by @murphp15 in #2185
  • vdk-test-utils: add vdk sdk secrets api - part 2 by @dakodakov in #2320
  • versatile-data-kit: Update .gitlint by @tozka in #2266
  • versatile-data-kit: add pr title checker by @tozka in #2270
  • versatile-data-kit: ignore patch updates in dependabot by @tozka in #2328

Full Changelog: v0.14...v1.0
