Major features include:
Introduce data quality checks (pre-alpha) (for scd1 template)
Quality checks can now run before the data is inserted into the target table.
Until now, the checks in the processing step did not verify that the data is semantically correct, so bad data could end up in the target table.
Example:
def sample_check(tmp_table_name):
    # Fails (returns False) when the table name contains "bad".
    return "bad" not in tmp_table_name

template_args["check"] = sample_check
job_input.execute_template(
    template_name="load/dimension/scd1",
    template_args=template_args,
)
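As a minimal sketch of how several checks might be combined into one, assuming (as in the example above) that a check is a callable that receives the temporary table name and returns a boolean. The helper `all_checks` and the two sample checks are hypothetical names used for illustration only; they are not part of VDK.

```python
from typing import Callable

# Assumption: a check takes the temporary table name and returns True on success.
Check = Callable[[str], bool]


def all_checks(*checks: Check) -> Check:
    """Combine several checks into one that passes only if every check passes."""

    def combined(tmp_table_name: str) -> bool:
        return all(check(tmp_table_name) for check in checks)

    return combined


def name_is_not_empty(tmp_table_name: str) -> bool:
    # Hypothetical check: reject blank table names.
    return bool(tmp_table_name.strip())


def name_has_no_bad_marker(tmp_table_name: str) -> bool:
    # Hypothetical check: same rule as the sample check above.
    return "bad" not in tmp_table_name


check = all_checks(name_is_not_empty, name_has_no_bad_marker)
print(check("staging_orders"))      # True
print(check("staging_bad_orders"))  # False
```

The combined callable could then be assigned to `template_args["check"]` in the same way as a single check.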
Jobs Query API (GraphQL) wildcard matching filter for team and job names
Users of the Jobs Query API can now use wildcard matching, for example *search*, in the GraphQL filters for job name and team name.
Exact matching of search strings continues to work as before.
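The difference between wildcard and exact matching can be illustrated with Python's standard `fnmatch` module. This is only a sketch of the matching semantics, not the Control Service implementation; the job names are made up, and whether the service matches case-sensitively is not stated here (`fnmatchcase` is case-sensitive).

```python
from fnmatch import fnmatchcase


def filter_names(names, pattern):
    """Return the names that match the pattern; '*' matches any run of characters."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical job names, for illustration only.
jobs = ["daily-search-index", "search", "ingest-orders"]

print(filter_names(jobs, "*search*"))  # ['daily-search-index', 'search']
print(filter_names(jobs, "search"))    # ['search']
```

A pattern without `*` behaves like an exact match, which mirrors the behavior that existed before this release.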
Provide User Agent when using VDK CLI
Users want to determine where requests originate when analyzing and browsing the telemetry data about VDK Control Service usage. The user agent can be set with an environment variable:
export VDK_CONTROL_SERVICE_USER_AGENT=foo
or in config.ini:
[vdk]
vdk_control_service_user_agent=foo
If not set, it defaults to "vdk-control-cli/{version} ({os.name}; {sys.platform})" followed by the Python version.
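The fallback logic can be sketched roughly as follows. This is an illustration under assumptions, not the CLI's actual code: the function names are hypothetical, the exact format of the Python version suffix is a guess, and only the environment variable is consulted (config.ini handling is omitted).

```python
import os
import platform
import sys


def default_user_agent(cli_version: str) -> str:
    # Follows the documented pattern; the "Python/<version>" suffix format is an assumption.
    return (
        f"vdk-control-cli/{cli_version} "
        f"({os.name}; {sys.platform}) "
        f"Python/{platform.python_version()}"
    )


def resolve_user_agent(env, cli_version: str) -> str:
    # An explicitly configured value wins; otherwise fall back to the default.
    return env.get("VDK_CONTROL_SERVICE_USER_AGENT") or default_user_agent(cli_version)


# "0.0.0" is a placeholder version used only for this example.
print(resolve_user_agent(os.environ, "0.0.0"))
```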
New plugin: vdk-notebook
A new VDK plugin that supports running data jobs that consist of .ipynb files. See the VDK Notebook plugin page for more information.
vdk-ipython
This extension introduces a magic command for Jupyter. The command lets users load job_input for their current data job and use it freely while working in Jupyter.
See the VDK ipython plugin page for more information.
Installation
Check the installation page
What's Changed
- control service: remove deprecated dependency by @murphp15 in #1589
- control-service: Remove dependency on old docker image which is not needed by @murphp15 in #1548
- control-service: Fixed data job status in case of OOM by @mivanov1988 in #1586
- control-service: base-job-image: automatic image cleanup by @mivanov1988 in #1636
- control-service: Cronjob API backwards compatibility by @doks5 in #1580
- control-service: Fix release step in pipeline by @murphp15 in #1550
- control-service: Remove duplicated CICD job runs by @mivanov1988 in #1596
- control-service: cleanup tests to ease testing on control service (v2) by @murphp15 in #1607
- control-service: cleanup tests to ease testing on control service by @murphp15 in #1604
- control-service: configurable job initContainer resources by @mivanov1988 in #1599
- control-service: graphql revert part of the wildcard filter matching by @mrMoZ1 in #1615
- control-service: handle init container OOM by @mivanov1988 in #1658
- control-service: integration tests refactoring by @mivanov1988 in #1562
- control-service: java 17 by @murphp15 in #1439
- control-service: only delete file in test path by @murphp15 in #1545
- control-service: remove deprecated classes in codebase. by @murphp15 in #1611
- control-service: remove old kerberous test dependency by @murphp15 in #1539
- control-service: upgrade gradle. by @murphp15 in #1543
- control-service: use latest docker image by @murphp15 in #1538
- frontend: data-pipelines (#1) bundle root by @ivakoleva in #1626
- frontend: data-pipelines (#2) lib bundle root by @ivakoleva in #1629
- frontend: data-pipelines (#3) lib bundle sources by @ivakoleva in #1630
- frontend: data-pipelines (#4) ui bundle root by @ivakoleva in #1631
- frontend: data-pipelines (#5) ui bundle sources by @ivakoleva in #1632
- frontend: open source shared components package by @DeltaMichael in #1618
- job-base-image-secure: remove unused parameter from publication script by @mivanov1988 in #1650
- job-builder: address docker image vulnerabilities by @mivanov1988 in #1523
- job-builder: fix ci/cd steps by @mivanov1988 in #1555
- job-builder: introduced secure base-job-image by @mivanov1988 in #1546
- job-builder: remove toybox from the base job image by @mivanov1988 in #1552
- vdk-cicd: cleanup cicd rules by @murphp15 in #1554
- vdk-cicd: during a scheduled run publish_artifacts and release shouldn't run by @murphp15 in #1551
- vdk-control-service: fix null dereferences by @dakodakov in #1512
- vdk-control-service: fix possible NPE by @dakodakov in #1522
- vdk-control-service: potential resource leak fixes by @dakodakov in #1513
- vdk-core: track configuration sensitivity by @DeltaMichael in #1579
- vdk-frontend: docker image for running end to end tests in gitlab by @murphp15 in #1563
- vdk-frontend: include readmes for the data-pipelines folders by @murphp15 in #1598
- vdk-frontend: open source readmes by @murphp15 in #1537
- vdk-impala: Introduce checks for scd1 template by @sbuldeev in #1472
- vdk-jobs-troubleshooting: Run troubleshooting server as deamon thread by @dakodakov in #1499
- vdk-jupyter: add create job command to jupyter by @duyguHsnHsn in #1581
- vdk-jupyter: add download job command to jupyter by @duyguHsnHsn in #1492
- vdk-jupyter: create iPython extension by @duyguHsnHsn in #1483
- vdk-jupyter: fixes on tsconfig and bad file naming by @duyguHsnHsn in #1594
- vdk-jupyter: improve error handling on the UI by @duyguHsnHsn in #1528
- vdk-jupyter: make VEP more accessible and informative by @duyguHsnHsn in #1635
- vdk-jupyter: modify the way we read notebooks in notebook plugin by @duyguHsnHsn in #1520
- vdk-jupyter: modify the way we work with notebooks in notebook plugin by @duyguHsnHsn in #1564
- vdk-jupyter: ui end-to-end testing by @duyguHsnHsn in #1617
- vdk-jupyter: vdk-notebook README improvements by @duyguHsnHsn in #1642
- vdk-meta-jobs: Better error message for misspelled job name by @gageorgiev in #1592
- vdk-snowflake: upgrade to Python 3.11 by @tozka in #1609
- vdk-spec: cleanup template by @murphp15 in #1518
- vdk-spec: describe package publishing by @murphp15 in #1536
- vdk-spec: folder structure by @murphp15 in #1525
- vdk-spec: remove api section because the frontend will have no impact on the api by @murphp15 in #1524
- vdk-spec: summary, glossary, motivation by @murphp15 in #1521
- vdk-test-utils: add cli_assert_output_contains by @tozka in #1540
- versatile-data-kit: update changelog instructions by @tozka in #1541
- versatile-data-kit: Meta Job example by @gageorgiev in #1640
- versatile-data-kit: copyright notice year update by @ivakoleva in #1634
- versatile-data-kit: git pre-commit hooks config by @ivakoleva in #1625
- versatile-data-kit: remove unnecessary changelog files by @tozka in #1549
- versatile-data-kit: update readme with mail list by @tozka in #1561
New Contributors
- @sbuldeev made their first contribution in #1472
- @gary-tai made their first contribution in #1535
- @DeltaMichael made their first contribution in #1579
Full Changelog: v0.10...v0.11