Major features include:
New GitHub Landing Page
Our open-source project has a new landing page. It aims to make it much easier for users to see what VDK is and what they can do with it, by showing them directly.
Check it out at https://github.com/vmware/versatile-data-kit
Control Service improvements
Operators can set builder image per Python version
Operators can easily control the image used for each of the following:
- The operator-managed VDK (system) library
- The base image used to build the user data job
- And now the builder image with which the user data job is built
deploymentSupportedPythonVersions:
  3.9:
    baseImage: "registry.hub.docker.com/versatiledatakit/data-job-base-python-3.7:latest"
    vdkImage: "registry.hub.docker.com/versatiledatakit/quickstart-vdk:release"
    builderImage: "registry.hub.docker.com/versatiledatakit/job-builder:latest"
More information can be found in the Control Service Helm Chart documentation
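To illustrate how such a per-version mapping might be consumed, here is a minimal Python sketch. It is not the Control Service implementation: the `resolve_images` helper and the error raised for an unconfigured version are assumptions made for illustration, and the image values are copied from the example above.

```python
# Illustrative only: mirrors the shape of deploymentSupportedPythonVersions.
SUPPORTED_PYTHON_VERSIONS = {
    "3.9": {
        "baseImage": "registry.hub.docker.com/versatiledatakit/data-job-base-python-3.7:latest",
        "vdkImage": "registry.hub.docker.com/versatiledatakit/quickstart-vdk:release",
        "builderImage": "registry.hub.docker.com/versatiledatakit/job-builder:latest",
    },
}


def resolve_images(python_version, supported=SUPPORTED_PYTHON_VERSIONS):
    """Return the image set configured for a Python version.

    Hypothetical helper; rejecting unconfigured versions is an assumption.
    """
    try:
        return supported[python_version]
    except KeyError:
        raise ValueError(
            f"Python {python_version} is not configured; "
            f"supported versions: {sorted(supported)}"
        ) from None


if __name__ == "__main__":
    images = resolve_images("3.9")
    print(images["builderImage"])
```

Keeping all three images under one version key means an operator can change the builder for a single Python version without touching the others.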
Operators can configure files to be automatically ignored on deploy
When users deploy a job, operators can control which files are actually accepted, and whether unaccepted files cause an error or are simply ignored.
This improves security while giving operators the flexibility to change the rules without impacting users directly:
# For example, to allow only sql and ini text files, specify "text/x-sql,text/x-ini"
# The full list of file types is documented at https://tika.apache.org
# If set to empty, then all file types are allowed.
uploadValidationFileTypesAllowList: ""
# List of file extensions that are allowed to be uploaded. Comma separated list e.g: "py,csv,sql"
# only files with extensions that are present in this list will be allowed to be uploaded.
# if the list is empty all extensions are allowed.
uploadValidationFileExtensionsAllowList: ""
# Works as the uploadValidationFileTypesAllowList above, only it deletes the files instead of failing
# the job upload. Runs before the allow list, therefore if only files of the same types are present in
# both lists, job upload will succeed.
uploadValidationFileTypesFilterList: ""
# List of file extensions that are automatically deleted from data job source code before upload.
# Comma separated list e.g: "pyc,exe,sh". If the list is empty no files will be deleted.
# Files are first deleted before the allow list performs its checks.
uploadValidationFileExtensionsFilterList: ""
More information can be found in the Control Service Helm Chart documentation
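The ordering described in the comments above (filter list first, then allow list) can be sketched in Python for the extension-based settings. This is a simplified illustration, not the Control Service code: the `validate_upload` helper, its parameters, and the treatment of files without an extension are all assumptions.

```python
from pathlib import PurePosixPath


def _parse_list(value):
    """Split a comma-separated setting such as "py,csv,sql" into a set."""
    return {item.strip().lower() for item in value.split(",") if item.strip()}


def _extension(path):
    """Return the lower-cased file extension without the leading dot."""
    return PurePosixPath(path).suffix.lstrip(".").lower()


def validate_upload(paths, allow_list="", filter_list=""):
    """Return the files kept for upload, or raise if one is not allowed.

    Hypothetical helper that mimics the documented order:
    1. files matching the filter list are dropped without failing the upload;
    2. the remaining files are checked against the allow list
       (an empty allow list accepts every extension).
    """
    allowed = _parse_list(allow_list)
    filtered = _parse_list(filter_list)

    # Step 1: the filter list removes files before any validation happens.
    kept = [p for p in paths if _extension(p) not in filtered]

    # Step 2: the allow list rejects the upload if a remaining file is not listed.
    if allowed:
        for path in kept:
            if _extension(path) not in allowed:
                raise ValueError(f"file type not allowed: {path}")
    return kept


if __name__ == "__main__":
    files = ["10_step.py", "cache.pyc", "20_query.sql"]
    print(validate_upload(files, allow_list="py,sql", filter_list="pyc"))
```

Because filtering runs first, a stray `cache.pyc` is dropped rather than failing an upload whose allow list contains only `py` and `sql`.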
New initiative: VDK Run Logs: Simplified And Readable
Take a look at the VEP, which aims to simplify troubleshooting and development with VDK.
We are focused on these goals:
- Data job run logs provide progress-tracking information
- User logs stand out
- Long-running operations (like DAGs) are traceable in the logs
- The root cause is immediately visible from the logs
- Clean error handling
Versatile Data Kit Architecture.md
The design architecture of Versatile Data Kit, outlining all main interfaces and how they work, is described in architecture.md
Notebook UI improvements
Add UI element indicating a VDK operation is running
Provides visual feedback to the user when a VDK operation is in progress.
Add icons to VDK operation result dialogs
Enhances the user experience by adding icons to result dialog boxes.
VDK Login UI: Semi-automated authentication workflow in the Jupyter Notebook
New database POC plugin vdk-duckdb
Check out more at vdk-duckdb
What's Changed
- control-service: Ability to disable VDK UI in Helm by @antoniivanov in #2545
- control-service: ability to set builder image per python version by @mivanov1988 in #2490
- control-service: add file filter before job upload by @mrMoZ1 in #2540
- control-service: add libffi to secure base job image by @mivanov1988 in #2515
- control-service: allow delete-all-secrets command by @dakodakov in #2523
- control-service: enable WebHook authentication by @mivanov1988 in #2551
- control-service: exclude pyc files from dj validation by @mivanov1988 in #2492
- control-service: fine-tune the job-builder-secure by @mivanov1988 in #2497
- control-service: fix secure base image by @mivanov1988 in #2517
- control-service: fix webhooks authentication helm chart by @mivanov1988 in #2560
- control-service: introduce data job deployment entity by @mivanov1988 in #2613
- control-service: release job builder secure 1.3.0 by @mivanov1988 in #2496
- control-service: release secure job builder by @mivanov1988 in #2534
- control-service: remove hardcoded image pull policy from job deployer by @mrMoZ1 in #2557
- control-service: support for pyodbc by @mivanov1988 in #2524
- control-service: update supported python version example by @mivanov1988 in #2494
- frontend: fix pushing images by @antoniivanov in #2491
- frontend: Fix bug in Cypress plugin by @gorankokin in #2486
- frontend: Fix for e2e test by @gorankokin in #2559
- frontend: bump cicd-base-gui image version by @DeltaMichael in #2366
- frontend: set favicon for vdk by @antoniivanov in #2433
- specs: VEP-2420: Getting started with your Data by @murphp15 in #2519
- specs: VEP-2448: VDK Run Logs: Simplified And Readable by @DeltaMichael in #2456
- specs: add architecture.md by @antoniivanov in #2265
- specs: try to make it clear what deliverables should be by @antoniivanov in #2495
- specs: update Notebook integration with Oauth2 authentication by @antoniivanov in #2533
- specs: update VEPs metadata by @antoniivanov in #2532
- specs: vep-2448 detailed design section by @DeltaMichael in #2558
- specs: vep-2448 high-level design by @DeltaMichael in #2520
- vdk-audit: [bug fix] Fix incorrectly detected event by @doks5 in #2548
- vdk-control-api-auth: add better error message for refresh token failure by @antoniivanov in #2607
- vdk-control-api-auth: add get_authenticated_username by @antoniivanov in #2518
- vdk-control-api-auth: vdk credentials cache refactoring by @antoniivanov in #2606
- vdk-control-cli: Add python_version to sample config.ini by @doks5 in #2555
- vdk-control-cli: add --set-prompt option for secrets by @dakodakov in #2514
- vdk-core: Add flag to JobConfig in case config file is required by @doks5 in #2521
- vdk-core: add vdk sql-query command by @antoniivanov in #2512
- vdk-core: adopt pluggy 1.3 by @antoniivanov in #2614
- vdk-core: make sure standalone data job doesn't run steps by @antoniivanov in #2609
- vdk-core: pass vdk run arguments in standalone job mode as well by @antoniivanov in #2601
- vdk-core: revert "vdk-core: add vdk sql-query command (#2512)" by @DeltaMichael in #2549
- vdk-duckdb: Introducing a new database plugin by @Maximiliaan72 in #2561
- vdk-greenplum: adopt testcontainers by @antoniivanov in #2498
- vdk-impala, vdk-postgres: adopt testcontainers by @antoniivanov in #2503
- vdk-impala: Handle errors on refresh/invalidate metadata by @doks5 in #2511
- vdk-impala: Introduce COMPUTE STATS statements by @sbuldeev in #2584
- vdk-impala: Make error retries and backoff configurable by @doks5 in #2509
- vdk-ipython: infer correctly job name and add more tests by @antoniivanov in #2602
- vdk-ipython: use package structure for file by @murphp15 in #2499
- vdk-jobs-troubleshooting: document plugin by @dakodakov in #2530
- vdk-jupyter: add UI element indicating a VDK operation is running by @yonitoo in #2505
- vdk-jupyter: add icons to vdk operation result dialogs by @yonitoo in #2538
- vdk-jupyter: add oauth2 authentication implementation by @antoniivanov in #2590
- vdk-jupyter: add run result to the deployment result dialog by @duyguHsnHsn in #2605
- vdk-jupyter: add testing to CI/CD by @yonitoo in #2541
- vdk-jupyter: automate create options by @duyguHsnHsn in #2506
- vdk-jupyter: impelement manual oauth2 login workflow by @antoniivanov in #2591
- vdk-jupyter: improve init message by @antoniivanov in #2618
- vdk-jupyter: introduce cicd/build.sh by @antoniivanov in #2501
- vdk-jupyter: make run before deploy optional by @duyguHsnHsn in #2525
- vdk-jupyter: optimize ConvertToJobNotebook notes by @antoniivanov in #2603
- vdk-jupyter: remove cell outputs before deployment by @duyguHsnHsn in #2579
- vdk-jupyter: remove the status button border by @yonitoo in #2536
- vdk-jupyter: remove unnecessary imports that cause errors by @antoniivanov in #2612
- vdk-jupyter: vdk jupyter deploy job fix by @duyguHsnHsn in #2474
- vdk-kerberos-auth: enable auth to work inside a running asyncio event loop by @antoniivanov in #2600
- vdk-sqlite: support whitespace in column names by @DeltaMichael in #2510
- versatile-data-kit: Replace/update README.md file by @zverulacis in #2608
- versatile-data-kit: add poster image to all the videos by @antoniivanov in #2623
- versatile-data-kit: main README by @antoniivanov in #2502
- versatile-data-kit: remove global image from Gitlab CI by @antoniivanov in #2466
New Contributors
- @Maximiliaan72 made their first contribution in #2561
Full Changelog: v1.0.1...v1.2