Release Highlights
User Experience
- We now support embedding Dashboards, Charts, and Datasets. This makes it possible to embed Looks, Dashboards, and Explores from tools such as Looker, Tableau, Mode, and Redash directly into the Dataset pages themselves.
- [Experimental] You can now customize the number of queries displayed on the Query tab of a Dataset entity via an environment variable
- Improved error messaging for bulk editing via the UI
Metadata Ingestion
- Data profiling now supports a configurable number of sample values to return
- Postgres ingestion now supports emitting lineage edges for Views - shoutout to @LucasRoesler for the contribution!
- Snowflake ingestion now supports extracting tags - shoutout to @frsann for the contribution!
- Vertica ingestion now supports projections and lineage - thanks for the contribution, @vishalkSimplify!
- Glue ingestion now emits an s3 lineage edge when data was written with an s3a/s3n client - thanks for the contribution, @danielli-ziprecruiter!
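To illustrate the idea behind the Glue change: `s3a://` and `s3n://` are Hadoop client schemes that address the same bucket and key as a plain `s3://` URI, so a lineage edge can point at the canonical `s3://` location. The following is only a minimal sketch of that normalization. The helper name `normalize_s3_uri` is hypothetical and this is not the actual code from the Glue source.

```python
from urllib.parse import urlsplit, urlunsplit

# Hadoop client schemes that refer to the same objects as s3://
S3_ALIAS_SCHEMES = {"s3a", "s3n"}


def normalize_s3_uri(uri: str) -> str:
    """Rewrite s3a:// and s3n:// URIs to s3://; leave other URIs unchanged.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(uri)
    if parts.scheme.lower() in S3_ALIAS_SCHEMES:
        # Only the scheme changes; bucket (netloc) and key (path) are kept
        parts = parts._replace(scheme="s3")
    return urlunsplit(parts)


if __name__ == "__main__":
    print(normalize_s3_uri("s3a://my-bucket/warehouse/events/part-0.parquet"))
```

With a normalization like this, a table written through an `s3a` or `s3n` client and the same location referenced as `s3://` resolve to one upstream dataset instead of several.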
Developer Experience
- Fixes quickstart/docker compose issues for M1 machines
- Improvements in reliability and performance of the Restli Service endpoints for ingestion:
  - Scale Restli Service thread pool based on CPU
  - Add retry (exponential backoff) to Restli Entity Client
  - MCE no longer relies on GMS for Restli service
  - Converted Restli Service from standalone servlet to Spring injectable
- Docker build externalized (significantly faster on M1, with build times under 7 minutes)
- Frontend asset generation refactored (it was causing tests to fail intermittently)
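For readers unfamiliar with the retry behavior mentioned for the Restli Entity Client, here is a minimal sketch of retry with exponential backoff. It illustrates the general technique only. The real client is part of the Java backend, and the function name `retry_with_backoff` and its parameters and defaults are assumptions made for this example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on failure with exponentially growing waits.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            # Out of attempts: surface the last error to the caller
            if attempt == max_attempts - 1:
                raise
            # Wait base_delay, 2x, 4x, ... capped at max_delay
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so the backoff schedule can be checked in tests without actually waiting. A production version would usually add random jitter and retry only on errors known to be transient.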
What's Changed
- feat(ingest): add pydantic helper for removed fields by @hsheth2 in #6853
- chore(0.9.5): Bump defaults for release v0.9.5 by @jjoyce0510 in #6856
- Revert "fix(ci): remove warnings due to deprecated action" by @anshbansal in #6857
- refactor(restli-mce-consumer) by @david-leifker in #6744
- fix(ci): reduce smoke test run time by @anshbansal in #6841
- fix(security): require signed/encrypted jwt tokens by @david-leifker in #6565
- feat(ingest): update profiling to fetch configurable number of sample values by @mayurinehate in #6859
- feat(ingest/airflow): support raw dataset urns in airflow lineage by @hsheth2 in #6854
- refactor(graphql): make graphqlengine easier to use by @anshbansal in #6865
- fix(kafka): datahub-upgrade job by @david-leifker in #6864
- feat(ingest): pass timeout config in kafka admin client api calls by @mayurinehate in #6863
- chore(ingest): loosen requirements file by @hsheth2 in #6867
- feat(ingest): upgrade pydantic version by @cccs-eric in #6858
- fix(elasticsearch): fixes out of order runId writes by @david-leifker in #6845
- chore(ingest): loosen additional requirements by @hsheth2 in #6868
- feat(ingest): bigquery/snowflake - Store last profile date in state by @treff7es in #6832
- docs(google-analytics): Correct grammatical error in README.md by @jx2lee in #6870
- feat(CI): add venv caching by @szalai1 in #6843
- feat(ingest/snowflake): handle failures gracefully and raise permission failures by @mayurinehate in #6748
- fix(runid): always update runid, except when queued by @david-leifker in #6876
- fix(ingest): conditionally include env in assertion guid by @hsheth2 in #6811
- chore(ci): update dependencies docs-website by @anshbansal in #6871
- feat(ui) - Add a custom error message for bulk edit to add clarity by @mkamalas in #6775
- docs(adding users): Refreshing the docs for adding new DataHub Users by @jjoyce0510 in #6879
- test(mce-consumer): mockbeans by @david-leifker in #6878
- feat(ingest): avoid embedding serialized json in metadata files by @hsheth2 in #6742
- refactor(gradle): move the local docker registry to common location by @david-leifker in #6881
- refactor(smoke): use env variables by @anshbansal in #6866
- fix(lint): pin pydantic version by @anshbansal in #6886
- refactor(docs): Correctly spell elasticsearch in docs by @jjoyce0510 in #6880
- fix(ingest): okta undefined variable error by @anshbansal in #6882
- fix(ci): reduce flakiness in add_users, siblings smoke test by @anshbansal in #6883
- fix(ingest): fall back to default table comment method for all Trino query errors by @marvin-roesch in #6873
- test(misc): misc test updates by @david-leifker in #6890
- deprecate(ingest): bigquery - Removing bigquery-legacy source by @treff7es in #6851
- chore(ingest): remove inferred args to MCPW, part 1 by @hsheth2 in #6819
- test(ingest/kafka-connect): make docker setup more reliable by @hsheth2 in #6902
- fix(ingest): profiling (bigquery) - Address biquery profiling query error due to timestamp vs data mismatch by @treff7es in #6874
- fix(cli): Make datahub quickstart work with latest docker compose in M1 by @pedro93 in #6891
- fix(cli): fix delete urn cli bug + stricter type annotations by @hsheth2 in #6903
- fix(ingest/airflow): reorder imports to avoid cyclical dependencies by @stijndehaes in #6719
- feat: remove jq requirement + tweak modeldocgen args by @hsheth2 in #6904
- chore(ingest): loosen pyspark and pydeequ deps by @hsheth2 in #6908
- docs(ingest/looker): fix typos + update lookml github action example by @hsheth2 in #6910
- fix(ingest/metabase): use card_id in dashboard to chart lineage by @ccpypy in #6583
- fix(es-setup): create data stream on non-aws by @szalai1 in #6926
- Adding missing Platform logos by @maggiehays in #6892
- feat(ingestion): PowerBI# Improve PowerBI source ingestion by @mohdsiddique in #6549
- Fix compose context for kafka-setup by @szalai1 in #6923
- feat(backend): Supporting Embeddable Previews for Dashboards, Charts, Datasets by @jjoyce0510 in #6875
- chore(deps): bump json5 from 2.2.1 to 2.2.3 in /docs-website by @dependabot in #6930
- chore(deps): bump json5 from 1.0.1 to 1.0.2 in /datahub-web-react by @dependabot in #6931
- fix(ci): managed ingestion test fix by @anshbansal in #6946
- feat(ingest): add include_table_location_lineage flag for SQL common by @hsheth2 in #6934
- feat(ingest): allow extracting snowflake tags by @frsann in #6500
- chore(ingest): unpin pydantic dep by @hsheth2 in #6909
- chore(ingest): partially revert pyspark dep from #6908 by @hsheth2 in #6954
- fix(ingest): use branch info when cloning git repos by @hsheth2 in #6937
- chore(ingest): remove inferred args to MCPW, part 2 by @hsheth2 in #6905
- fix(ingest/unity): simplify MCP generation and reporting by @hsheth2 in #6911
- chore(ci): parallelise build and test workflow to reduce time by @anshbansal in #6949
- fix(frontend): sasl.client.callback.handler.class by @szalai1 in #6962
- chore(react): remove outdated cypress tests and dependency by @anshbansal in #6948
- fix(ci): restrict GE to fix build issues by @anshbansal in #6967
- feat(queries): [Experimental] Allow customization of # of queries in Query tab via env var by @gabe-lyons in #6964
- feat(ingest/postgres): emit lineage for postgres views by @LucasRoesler in #6953
- feat(ingest/vertica): support projections and lineage in vertica by @vishalkSimplify in #6785
- fix(ingest): add missing dep for powerbi by @hsheth2 in #6969
- Docs fixes week of 12 22 by @laulpogan in #6963
- fix(ingest): unfreeze bigquery/snowflake column dataclass by @mayurinehate in #6921
- chore(frontend) Remove unused dependencies from package.json by @chriscollins3456 in #6974
- chore: misc fixes by @anshbansal in #6966
- feat(ingest/glue): emit s3 lineage for s3a and s3n schemes by @danielli-ziprecruiter in #6788
- fix(kafka-setup): Make kafka-setup run with multiple threads by @pedro93 in #6970
- feat(ingest): mark database_alias and env as deprecated by @hsheth2 in #6901
- fix(docs): Updating Tag, Glossary Term docs to point to correct GraphQL methods by @jjoyce0510 in #6965
- chore(deps): bump certifi from 2020.12.5 to 2022.12.7 in /metadata-ingestion/src/datahub/ingestion/source/feast_image by @dependabot in #6979
- fix(ingest): profiling - Fixing issue with the wrong timestamp stored in check by @treff7es in #6978
- config(quickstart): enable auto-reindex for quickstart by @david-leifker in #6983
- feat(privileges) - Create a privilege to manage glossary children recursively by @mkamalas in #6731
- chore(ingest): finish removing feast-legacy by @hsheth2 in #6985
- feat(ingest): add import descriptions of two or more nested messages by @wngus606 in #6959
- feat(docs) Add feature guide for Manual Lineage by @chriscollins3456 in #6933
- docs(rfc): Serialising GMS Updates with Preconditions by @mattmatravers in #5818
- fix(ingest/kafka-connect) support newer version of debezium by @jaegwonseo in #6943
- fix(docs): build and broken snowflake docs fix by @anshbansal in #6997
- fix(ingest): bigquery - views in case more than 1 datasets with views by @anshbansal in #6995
- fix(docs): Renaming Business Glossary Doc by @jjoyce0510 in #7001
- fix(ingest/snowflake): fix type annotations + refactor get_connect_args by @hsheth2 in #7004
- fix(docs): Changing the platform event topic name in kafka custom topic docs by @blankon123 in #7007
- fix(docs): fix name of privilege referenced in posts doc by @aditya-radhakrishnan in #7002
- fix(SSO): Correctly redirect to originally requested URL in SSO by @jjoyce0510 in #7011
- fix(ingest): remove dead code from tests by @hsheth2 in #7005
- feat(ingestion): Tableau # Embed links by @mohdsiddique in #6994
- feat(auth) Update auth cookies to have same-site none for chrome extension by @chriscollins3456 in #6976
- docs(website): DPG WIP by @maggiehays in #6998
- docs: resize datahub logo by @hsheth2 in #7014
- fix(kafka-setup): Remove reference to non-existing topic by @pedro93 in #7019
- fix(ingest): powerbi # use display name field as title for powerbi report page by @looppi in #7017
- feat(auth) Allow session ttl to be configurable by env variable by @chriscollins3456 in #7022
- fix(ui): URL Encode all Entity Profile URLs by @jjoyce0510 in #7023
- fix(ui ingest): Fix test connection when stateful ingest is enabled by @jjoyce0510 in #7013
- docs(sso) move root user warning to earlier in SSO guides by @maggiehays in #7028
- fix(ingest/looker): add clarity in chart input parsing logs by @hsheth2 in #7003
- chore(ingest): remove duplicate data_platform.json file by @hsheth2 in #7026
- feat(ingestion): PowerBI # Remove corpUserInfo aspect ingestion by @mohdsiddique in #7034
- fix(metadata-models): remove unnecessary bin folder by @jjoyce0510 in #7035
- fixing typos by @maggiehays in #7030
- feat(ingest): Ingest Previews for Looker Charts, Dashboards, and Explores by @jjoyce0510 in #6941
- fix(graphql):fix issue: autorender aspect could not be displayed on t… by @yangjiandan in #6993
- fix(config): adding quotes by @david-leifker in #7038
- fix(config): adding quotes by @david-leifker in #7040
- fix(ingest/bigquery): Turning some usage warning message to debug log as it caused confusion by @treff7es in #7024
- feat(ingest/vertica): Adding Vertica as source in Datahub UI by @Rajasekhar-Vuppala in #7010
- Removed a double set for two fields by @bda618 in #7037
New Contributors
- @marvin-roesch made their first contribution in #6873
- @stijndehaes made their first contribution in #6719
- @ccpypy made their first contribution in #6583
- @LucasRoesler made their first contribution in #6953
- @vishalkSimplify made their first contribution in #6785
- @wngus606 made their first contribution in #6959
- @jaegwonseo made their first contribution in #6943
- @blankon123 made their first contribution in #7007
- @yangjiandan made their first contribution in #6993
- @Rajasekhar-Vuppala made their first contribution in #7010
Full Changelog: v0.9.5...v0.9.6