github datahub-project/datahub v0.9.6
DataHub v0.9.6

Release Highlights

User Experience

  • We now support embedding Dashboards, Charts, and Datasets. For example, Looker / Tableau / Mode / Redash Looks, Dashboards, and Explores can be embedded directly into the Dataset pages themselves.

  • [Experimental] You can now customize the number of queries displayed on the Query tab of a Dataset entity via an environment variable

  • Improved error messaging for bulk editing via the UI

Metadata Ingestion

  • Data profiling now allows a configurable number of sample values to be returned
  • Postgres ingestion now supports emitting lineage edges for Views - shoutout to @LucasRoesler for the contribution!
  • Snowflake ingestion now supports extracting tags - shoutout to @frsann for the contribution!
  • Vertica ingestion now supports projections and lineage - thanks for the contribution, @vishalkSimplify!
  • Glue ingestion now emits an s3 lineage edge when data was written with an s3a/s3n client - thanks for the contribution, @danielli-ziprecruiter!
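
As a rough illustration of the Glue item above: s3a:// and s3n:// are Hadoop-style schemes that address the same objects as s3://, so a lineage edge to an S3 dataset needs the location rewritten to the canonical scheme first. The sketch below is not DataHub's implementation; the function name `normalize_s3_uri` and the constant `S3_ALIASES` are hypothetical and exist only for this example.

```python
from urllib.parse import urlsplit, urlunsplit

# Hadoop-style schemes that address the same objects as s3://.
# (Hypothetical constant for this sketch, not a DataHub name.)
S3_ALIASES = {"s3a", "s3n"}


def normalize_s3_uri(uri: str) -> str:
    """Rewrite s3a:// and s3n:// URIs to s3://; leave other URIs unchanged."""
    parts = urlsplit(uri)
    if parts.scheme.lower() in S3_ALIASES:
        parts = parts._replace(scheme="s3")
    return urlunsplit(parts)


if __name__ == "__main__":
    for location in ("s3a://my-bucket/warehouse/events", "s3://my-bucket/raw"):
        print(location, "->", normalize_s3_uri(location))
```

With the location normalized, a table written through an s3a or s3n client and one written through a plain s3 client resolve to the same storage path.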

Developer Experience

  • Fixes quickstart/docker compose issues for M1 machines
  • Improvements in reliability and performance of the Restli Service endpoints for ingestion:
    • Scale Restli Service thread pool based on CPU
    • Add retry (exp backoff) to Restli Entity Client
    • MCE no longer relies on GMS for Restli service
    • Converted Restli Service from standalone servlet to Spring injectable
    • Docker build externalized (significantly faster on M1, with build times under 7 minutes, based on this)
    • Frontend asset generation refactored (it had been causing tests to fail intermittently)
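
For readers unfamiliar with the "retry (exp backoff)" item above, here is a minimal, generic sketch of the technique in Python. It is not the Restli Entity Client code (which lives in DataHub's Java backend); the helper name `call_with_backoff` and all of its parameters are assumptions made for this example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on any exception with exponentially growing delays.

    The wait before retry k (0-based) is min(max_delay, base_delay * 2**k).
    The last failure is re-raised once max_attempts calls have been made.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(min(max_delay, base_delay * (2 ** attempt)))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the delay schedule can be checked without actually waiting. A production client would usually also add random jitter and retry only on errors known to be transient.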

What's Changed

  • feat(ingest): add pydantic helper for removed fields by @hsheth2 in #6853
  • chore(0.9.5): Bump defaults for release v0.9.5 by @jjoyce0510 in #6856
  • Revert "fix(ci): remove warnings due to deprecated action" by @anshbansal in #6857
  • refactor(restli-mce-consumer) by @david-leifker in #6744
  • fix(ci): reduce smoke test run time by @anshbansal in #6841
  • fix(security): require signed/encrypted jwt tokens by @david-leifker in #6565
  • feat(ingest): update profiling to fetch configurable number of sample values by @mayurinehate in #6859
  • feat(ingest/airflow): support raw dataset urns in airflow lineage by @hsheth2 in #6854
  • refactor(graphql): make graphqlengine easier to use by @anshbansal in #6865
  • fix(kafka): datahub-upgrade job by @david-leifker in #6864
  • feat(ingest): pass timeout config in kafka admin client api calls by @mayurinehate in #6863
  • chore(ingest): loosen requirements file by @hsheth2 in #6867
  • feat(ingest): upgrade pydantic version by @cccs-eric in #6858
  • fix(elasticsearch): fixes out of order runId writes by @david-leifker in #6845
  • chore(ingest): loosen additional requirements by @hsheth2 in #6868
  • feat(ingest): bigquery/snowflake - Store last profile date in state by @treff7es in #6832
  • docs(google-analytics): Correct grammatical error in README.md by @jx2lee in #6870
  • feat(CI): add venv caching by @szalai1 in #6843
  • feat(ingest/snowflake): handle failures gracefully and raise permission failures by @mayurinehate in #6748
  • fix(runid): always update runid, except when queued by @david-leifker in #6876
  • fix(ingest): conditionally include env in assertion guid by @hsheth2 in #6811
  • chore(ci): update dependencies docs-website by @anshbansal in #6871
  • feat(ui) - Add a custom error message for bulk edit to add clarity by @mkamalas in #6775
  • docs(adding users): Refreshing the docs for adding new DataHub Users by @jjoyce0510 in #6879
  • test(mce-consumer): mockbeans by @david-leifker in #6878
  • feat(ingest): avoid embedding serialized json in metadata files by @hsheth2 in #6742
  • refactor(gradle): move the local docker registry to common location by @david-leifker in #6881
  • refactor(smoke): use env variables by @anshbansal in #6866
  • fix(lint): pin pydantic version by @anshbansal in #6886
  • refactor(docs): Correctly spell elasticsearch in docs by @jjoyce0510 in #6880
  • fix(ingest): okta undefined variable error by @anshbansal in #6882
  • fix(ci): reduce flakiness in add_users, siblings smoke test by @anshbansal in #6883
  • fix(ingest): fall back to default table comment method for all Trino query errors by @marvin-roesch in #6873
  • test(misc): misc test updates by @david-leifker in #6890
  • deprecate(ingest): bigquery - Removing bigquery-legacy source by @treff7es in #6851
  • chore(ingest): remove inferred args to MCPW, part 1 by @hsheth2 in #6819
  • test(ingest/kafka-connect): make docker setup more reliable by @hsheth2 in #6902
  • fix(ingest): profiling (bigquery) - Address bigquery profiling query error due to timestamp vs data mismatch by @treff7es in #6874
  • fix(cli): Make datahub quickstart work with latest docker compose in M1 by @pedro93 in #6891
  • fix(cli): fix delete urn cli bug + stricter type annotations by @hsheth2 in #6903
  • fix(ingest/airflow): reorder imports to avoid cyclical dependencies by @stijndehaes in #6719
  • feat: remove jq requirement + tweak modeldocgen args by @hsheth2 in #6904
  • chore(ingest): loosen pyspark and pydeequ deps by @hsheth2 in #6908
  • docs(ingest/looker): fix typos + update lookml github action example by @hsheth2 in #6910
  • fix(ingest/metabase): use card_id in dashboard to chart lineage by @ccpypy in #6583
  • fix(es-setup): create data stream on non-aws by @szalai1 in #6926
  • Adding missing Platform logos by @maggiehays in #6892
  • feat(ingestion): PowerBI# Improve PowerBI source ingestion by @mohdsiddique in #6549
  • Fix compose context for kafka-setup by @szalai1 in #6923
  • feat(backend): Supporting Embeddable Previews for Dashboards, Charts, Datasets by @jjoyce0510 in #6875
  • chore(deps): bump json5 from 2.2.1 to 2.2.3 in /docs-website by @dependabot in #6930
  • chore(deps): bump json5 from 1.0.1 to 1.0.2 in /datahub-web-react by @dependabot in #6931
  • fix(ci): managed ingestion test fix by @anshbansal in #6946
  • feat(ingest): add include_table_location_lineage flag for SQL common by @hsheth2 in #6934
  • feat(ingest): allow extracting snowflake tags by @frsann in #6500
  • chore(ingest): unpin pydantic dep by @hsheth2 in #6909
  • chore(ingest): partially revert pyspark dep from #6908 by @hsheth2 in #6954
  • fix(ingest): use branch info when cloning git repos by @hsheth2 in #6937
  • chore(ingest): remove inferred args to MCPW, part 2 by @hsheth2 in #6905
  • fix(ingest/unity): simplify MCP generation and reporting by @hsheth2 in #6911
  • chore(ci): parallelise build and test workflow to reduce time by @anshbansal in #6949
  • fix(frontend): sasl.client.callback.handler.class by @szalai1 in #6962
  • chore(react): remove outdated cypress tests and dependency by @anshbansal in #6948
  • fix(ci): restrict GE to fix build issues by @anshbansal in #6967
  • feat(queries): [Experimental] Allow customization of # of queries in Query tab via env var by @gabe-lyons in #6964
  • feat(ingest/postgres): emit lineage for postgres views by @LucasRoesler in #6953
  • feat(ingest/vertica): support projections and lineage in vertica by @vishalkSimplify in #6785
  • fix(ingest): add missing dep for powerbi by @hsheth2 in #6969
  • Docs fixes week of 12 22 by @laulpogan in #6963
  • fix(ingest): unfreeze bigquery/snowflake column dataclass by @mayurinehate in #6921
  • chore(frontend) Remove unused dependencies from package.json by @chriscollins3456 in #6974
  • chore: misc fixes by @anshbansal in #6966
  • feat(ingest/glue): emit s3 lineage for s3a and s3n schemes by @danielli-ziprecruiter in #6788
  • fix(kafka-setup): Make kafka-setup run with multiple threads by @pedro93 in #6970
  • feat(ingest): mark database_alias and env as deprecated by @hsheth2 in #6901
  • fix(docs): Updating Tag, Glossary Term docs to point to correct GraphQL methods by @jjoyce0510 in #6965
  • chore(deps): bump certifi from 2020.12.5 to 2022.12.7 in /metadata-ingestion/src/datahub/ingestion/source/feast_image by @dependabot in #6979
  • fix(ingest): profiling - Fixing issue with the wrong timestamp stored in check by @treff7es in #6978
  • config(quickstart): enable auto-reindex for quickstart by @david-leifker in #6983
  • feat(privileges) - Create a privilege to manage glossary children recursively by @mkamalas in #6731
  • chore(ingest): finish removing feast-legacy by @hsheth2 in #6985
  • feat(ingest): add import descriptions of two or more nested messages by @wngus606 in #6959
  • feat(docs) Add feature guide for Manual Lineage by @chriscollins3456 in #6933
  • docs(rfc): Serialising GMS Updates with Preconditions by @mattmatravers in #5818
  • fix(ingest/kafka-connect) support newer version of debezium by @jaegwonseo in #6943
  • fix(docs): build and broken snowflake docs fix by @anshbansal in #6997
  • fix(ingest): bigquery - views in case more than 1 datasets with views by @anshbansal in #6995
  • fix(docs): Renaming Business Glossary Doc by @jjoyce0510 in #7001
  • fix(ingest/snowflake): fix type annotations + refactor get_connect_args by @hsheth2 in #7004
  • fix(docs): Changing the platform event topic name in kafka custom topic docs by @blankon123 in #7007
  • fix(docs): fix name of privilege referenced in posts doc by @aditya-radhakrishnan in #7002
  • fix(SSO): Correctly redirect to originally requested URL in SSO by @jjoyce0510 in #7011
  • fix(ingest): remove dead code from tests by @hsheth2 in #7005
  • feat(ingestion): Tableau # Embed links by @mohdsiddique in #6994
  • feat(auth) Update auth cookies to have same-site none for chrome extension by @chriscollins3456 in #6976
  • docs(website): DPG WIP by @maggiehays in #6998
  • docs: resize datahub logo by @hsheth2 in #7014
  • fix(kafka-setup): Remove reference to non-existing topic by @pedro93 in #7019
  • fix(ingest): powerbi # use display name field as title for powerbi report page by @looppi in #7017
  • feat(auth) Allow session ttl to be configurable by env variable by @chriscollins3456 in #7022
  • fix(ui): URL Encode all Entity Profile URLs by @jjoyce0510 in #7023
  • fix(ui ingest): Fix test connection when stateful ingest is enabled by @jjoyce0510 in #7013
  • docs(sso) move root user warning to earlier in SSO guides by @maggiehays in #7028
  • fix(ingest/looker): add clarity in chart input parsing logs by @hsheth2 in #7003
  • chore(ingest): remove duplicate data_platform.json file by @hsheth2 in #7026
  • feat(ingestion): PowerBI # Remove corpUserInfo aspect ingestion by @mohdsiddique in #7034
  • fix(metadata-models): remove unnecessary bin folder by @jjoyce0510 in #7035
  • fixing typos by @maggiehays in #7030
  • feat(ingest): Ingest Previews for Looker Charts, Dashboards, and Explores by @jjoyce0510 in #6941
  • fix(graphql):fix issue: autorender aspect could not be displayed on t… by @yangjiandan in #6993
  • fix(config): adding quotes by @david-leifker in #7038
  • fix(config): adding quotes by @david-leifker in #7040
  • fix(ingest/bigquery): Turning some usage warning message to debug log as it caused confusion by @treff7es in #7024
  • feat(ingest/vertica): Adding Vertica as source in Datahub UI by @Rajasekhar-Vuppala in #7010
  • Removed a double set for two fields by @bda618 in #7037

New Contributors

Full Changelog: v0.9.5...v0.9.6
