github datahub-project/datahub v0.10.0
DataHub v0.10.0

13 months ago

Release Highlights

Potential Downtime

This release introduces substantial improvements to search functionality which require reindexing indices.

During the reindexing:

  • a system-update job will set indices to read-only and create a backup/clone of each index
  • new components will be prevented from starting up until the reindex completes
  • Helm deployments will go into read-only mode and new ingestion runs will fail

This process can take anywhere from 5 minutes to multiple hours; as a rough estimate, please expect it to take 1 hour for every 2.3 million entities. After the reindex is complete, please check your ingestion runs and re-run any that did not complete.
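
As a back-of-the-envelope aid, the rough estimate above can be turned into a small calculation. This is a minimal sketch that assumes the duration scales linearly with entity count at the quoted rate of 1 hour per 2.3 million entities; the function name and the use of 5 minutes as a floor are illustrative choices, not part of DataHub.

```python
# Rough rate and lower bound quoted in the release notes.
ENTITIES_PER_HOUR = 2_300_000
MIN_HOURS = 5 / 60  # 5 minutes


def estimate_reindex_hours(entity_count: int) -> float:
    """Return a rough reindex duration in hours, assuming linear scaling."""
    if entity_count < 0:
        raise ValueError("entity_count must be non-negative")
    return max(MIN_HOURS, entity_count / ENTITIES_PER_HOUR)


if __name__ == "__main__":
    # Hypothetical deployment sizes, for illustration only.
    for count in (100_000, 2_300_000, 10_000_000):
        print(f"{count:>12,} entities: ~{estimate_reindex_hours(count):.1f} h")
```

Actual duration depends on cluster size and index contents, so treat the result as a planning figure for the maintenance window rather than a guarantee.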

User Experience

We have some really exciting improvements to the DataHub user experience in this release!

Improved documentation editor, contributed by @ngamanda and the Grab Team.
This work provides a much more intuitive documentation editing experience within the UI, providing “what you see is what you get” formatting & removing the need for markdown expertise.

Additionally, you can easily:

  • Add links to other entities/users within DataHub
  • Embed and resize tables & images
  • Toggle between font sizes and formats
  • Embed syntax-highlighted code blocks

Filter lineage graphs based on time windows
You can now easily see the full lineage graph of an entity at a specific point in time. This makes it much easier to understand how interdependencies have evolved over time and to troubleshoot data issues in the past.

Improvements in Search
As noted above, we have rolled out substantial improvements to Search functionality, making it easier than ever for end users to find the entities that matter most. This release includes:

  • Stemming & Synonyms
  • Search by full or partial URN
  • Autocomplete improvements
  • Quoted search analyzer for exact & prefix match

Metadata Ingestion

Here are some of the most notable ingestion-related improvements:

  • Redshift: You can now extract lineage information from unload queries – thanks for the contrib, @mmmeeedddsss
  • PowerBI: Ingestion now maps Workspaces to DataHub Containers – thanks for the contrib, @looppi
  • BigQuery: You can now extract lineage metadata from the Catalog API – thanks for the contrib, @PatrickfBraz
  • Glue: Ingestion now uses table name as the human-readable name – thanks for the contrib, @danielcmessias

Developer Experience

  • This release introduces DataHub Lite, a new experimental lightweight implementation of DataHub. It is intended to enable local developer tooling use cases such as simple access to metadata for scripts and other tools. DataHub Lite is compatible with the DataHub metadata format and all the ingestion connectors that DataHub supports. Check out the docs here.

Breaking Changes

#7103 This should only impact users who have configured explicit non-default names for DataHub's Kafka topics. The environment variables used to configure Kafka topics in the kafka-setup Docker image have been updated to be in line with other DataHub components; for more info, see our docs on Configuring Kafka in DataHub. These variables were previously suffixed with _TOPIC; the correct suffix is now _TOPIC_NAME. This change should not affect any user who is using default Kafka topic names.
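
For deployments that do override topic names, the rename can be sketched as a simple key rewrite. This is a minimal illustration that assumes the overrides are available as a plain dictionary of environment variables; the variable name EXAMPLE_EVENT_TOPIC is hypothetical, so consult the Configuring Kafka docs for the real variable names. Note that the helper renames every key ending in _TOPIC, so it should only be given the kafka-setup topic variables.

```python
from typing import Dict

OLD_SUFFIX = "_TOPIC"
NEW_SUFFIX = "_TOPIC_NAME"


def rename_topic_vars(env: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of env with *_TOPIC keys renamed to *_TOPIC_NAME."""
    renamed: Dict[str, str] = {}
    for key, value in env.items():
        if key.endswith(OLD_SUFFIX):
            new_key = key[: -len(OLD_SUFFIX)] + NEW_SUFFIX
            if new_key in env:
                # An explicit new-style variable takes precedence.
                continue
            key = new_key
        renamed[key] = value
    return renamed


if __name__ == "__main__":
    # Hypothetical override, for illustration only.
    old_env = {"EXAMPLE_EVENT_TOPIC": "my_custom_topic", "KAFKA_BROKER": "broker:9092"}
    print(rename_topic_vars(old_env))
```

Keys that already end in _TOPIC_NAME are left untouched, so the helper is safe to run more than once.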

What's Changed

  • fix(ci): only scan on master branch by @anshbansal in #7047
  • fix(ci): use trivy offline scanning by @anshbansal in #7050
  • docs(get-started) Simplify copy on Get Started landing page by @maggiehays in #7043
  • fix(ingest/kafka): fix ResourceType import error for confluent_kafka<1.9.0 by @mayurinehate in #7046
  • docs(dbt): fix indentation in dbt meta mapping docs by @jx2lee in #7045
  • fix(ingest): temporarily disable vertica tests by @hsheth2 in #7059
  • feat(editor): improve documentation editor using Remirror by @ngamanda in #6631
  • fix(bootstrap): add EDIT_LINEAGE privilege to some default policies by @aditya-radhakrishnan in #7060
  • feat(ingest): add entity registry in codegen by @hsheth2 in #6984
  • feat(ingest): extract powerbi endorsements to tags by @looppi in #6638
  • feat(ingestion): pull metabase database, schema names from raw query and api by @remisalmon in #7039
  • fix(ingest): support multiple entity_registry sections by @hsheth2 in #7066
  • ci(ingest): add flag to skip tests but run codegen during release by @hsheth2 in #7067
  • fix(ingest): preserve dbt column name casing by @hsheth2 in #7063
  • fix(ingest/tableau): fix node limit exceeded error for workbooks query by @mayurinehate in #7068
  • fix(build/airflow): Fixing gradlew path by @treff7es in #7069
  • feat(ingest): support snapshots in dbt and dbt-cloud by @hsheth2 in #7062
  • fix(ui) Fix duplicate schema field rendering with siblings by @chriscollins3456 in #7057
  • refactor(ingest/athena): Replace s3_staging_dir parameter in Athena source with query_result_location by @bossenti in #7044
  • feat(ingest): fix handling of unions with aliases in post restli conversion by @hsheth2 in #7058
  • fix(ui) Make checkboxes in ingestion forms easier to see by @chriscollins3456 in #7061
  • fix(ingest): support git clone of non-github repos by @hsheth2 in #7065
  • feat(ingest): reporting revamp, part 1 by @hsheth2 in #7031
  • fix(secret-service): fix default encrypt key by @david-leifker in #7074
  • feat(datahub-lite): introduces a new experimental lightweight impleme… by @shirshanka in #7052
  • feat(datahub-lite): adding tab completion, small serialization fixes by @shirshanka in #7079
  • docs: add docs for managed DataHub v0.1.72 by @anshbansal in #7070
  • docs(readme): add inovex as adopter by @DSchmidtDev in #7077
  • docs: add warning about clearing cookies for login by @anshbansal in #7084
  • feat(cache): add hazelcast distributed cache option by @RyanHolstien in #6645
  • docs(datahub-lite): small improvement for zsh tab completion by @shirshanka in #7085
  • fix(ingest/bigquery): clear stateful ingestion correctly by @hsheth2 in #7075
  • fix(graphql): Return with appropriate status code instead of stacktrace by @szalai1 in #7086
  • fix(sso): Clear cookies on SSO redirect error by @aditya-radhakrishnan in #7088
  • fix(docs): add missing mutation literal by @ruedigerblock in #7082
  • fix(ui): display the correct access token expiry in AccessTokenModal by @ngamanda in #7078
  • fix(cli/lite): fix datahub lite serve command by @hsheth2 in #7089
  • fix(profiling): Fix syntax for APPROX_COUNT_DISTINCT on bigquery and snowflake by @feljen in #7087
  • fix(ingest): fix logic error of google protobuf wrapper type. by @wngus606 in #7076
  • feat(ui): Documentation Editor Improvements by @jjoyce0510 in #7072
  • fix(uri): marks uri field as deprecated, removes problem code, and adds coercer for usages of URI typeref by @RyanHolstien in #7093
  • fix(build): postgres docker secret by @david-leifker in #7092
  • fix(ingest/snowflake): handle corrupted snowflake OCSP cache file by @hsheth2 in #7095
  • refactor(ingest): Refactoring container creation to common place by @treff7es in #6877
  • feat(ingest): move datahub-lite to optional dep and add shim when missing by @hsheth2 in #7097
  • fix(docker): support non amd64 dockerize in setup containers by @tonycsoka in #7091
  • test(ingest): fix kafka admin client mocking by @hsheth2 in #7098
  • fix(build): Fix postgres setup gha by @david-leifker in #7104
  • fix(ingest/profile): properly quoting approx_count_distinct by @treff7es in #7101
  • style(models): Replaces non-ASCII charactes in pdl files with ASCII c… by @nmbryant in #7105
  • feat(ingest): hide cartesian product warnings in GE profiler by @hsheth2 in #7096
  • feat(ingest): add removing partition pattern in spark lineage by @ssilb4 in #6605
  • feat(redshift): Fetch lineage from unload queries by @mmmeeedddsss in #7041
  • fix(ci): do not confirm on force for deletion by @anshbansal in #7106
  • fix(analytics): add missing usage events causing warning in logs by @anshbansal in #7109
  • feat(quickstart): Remove kafka-setup as a hard deployment requirement by @pedro93 in #7073
  • fix(tests): Fixing add_users smoke test by @jjoyce0510 in #7116
  • chore(deps): bump ua-parser-js from 0.7.32 to 0.7.33 in /docs-website by @dependabot in #7122
  • docs(gms): clarify behavior of soft deletion in UI by @aditya-radhakrishnan in #7117
  • fix(kafka-setup): Make topic name consistent with other images by @pedro93 in #7103
  • chore(deps): bump ua-parser-js from 0.7.32 to 0.7.33 in /datahub-web-react by @dependabot in #7123
  • feat(ingest): powerbi # add powerbi workspaces to containers by @looppi in #6532
  • fix(diffMode): prevent misconfiguration of diff mode by @RyanHolstien in #7127
  • fix(ui) Display glossary term name in analytics page properly by @chriscollins3456 in #7128
  • fix(ui): only use visible and enabled tabs for selected tab and routing in entity profiles by @Masterchen09 in #6629
  • fix(htrace): remove htrace jar by @szalai1 in #7126
  • feat(datahub-lite): simplify get response by @shirshanka in #7131
  • fix(doc/biquery): Updating bigquery capability doc by @treff7es in #7136
  • fix(ci): do not fail fast for matrix runs by @anshbansal in #7132
  • refactor(ui): refactor capitalization of platform name and sub types by @Masterchen09 in #7099
  • refactor(cli): extract method, change wording by @anshbansal in #7134
  • docs(lineage): Updating Lineage feature guide by @maggiehays in #6257
  • removing WIP by @laulpogan in #7140
  • docs(oidc): Updating + improving docs around OIDC configuration by @jjoyce0510 in #7141
  • fix(ingest): add message proto check by @tinolyu in #7130
  • fix(ingest): use snowflake median function in profiling by @hsheth2 in #6987
  • feat(ui): allow removing parentNodes of Glossary Nodes and Glossary Terms by @ngamanda in #7135
  • feat(ui) Add new embedded profile to be displayed in extension by @chriscollins3456 in #7113
  • feat(ingest): add --log-file option and show CLI logs in UI report by @hsheth2 in #7118
  • fix(misc): NPE and GraphQL case fixes by @david-leifker in #7149
  • fix(ingest/snowflake): fix regression in approx count distinct by @hsheth2 in #7146
  • [docs] fix typo / add missing line for docker compose / attach overwriting system action config for confluent. by @kdongho in #7142
  • reordering sidebar and adding homepage to apis by @laulpogan in #7139
  • fix(ingestion): powerbi # Not all arguments converted to string by @mohdsiddique in #7157
  • fix(ui): Sort top users by their query count in datasets stats tab by @jaykadambi in #7148
  • refactor(ui): Updates to Manual Lineage search by @jjoyce0510 in #7151
  • feat(ui) Build entity doesn't exist page for entity profiles by @chriscollins3456 in #7150
  • ci(ingest): fix broken CI workflow for metadata-ingestion by @hsheth2 in #7161
  • fix(ingest): azuread group mapping do not stop ingestion by @anshbansal in #7169
  • fix(docs): Fixes links to docs templates by @viniciusdsmello in #7171
  • refactor(ui ingest): Allow enabling / disabling ingestion schedule easily by @jjoyce0510 in #7162
  • fix(ingest): switch various sources to auto_stale_entity_removal helper by @hsheth2 in #7158
  • docs(townhall) Update Townhall History doc by @maggiehays in #7180
  • test(ingest/delta-lake): fix spurious directory creation by @hsheth2 in #7179
  • feat: add a linter for github actions workflows by @hsheth2 in #7178
  • fix(quickstart): adding back kafka-setup by @szalai1 in #7181
  • fix(docs) Fix broken links in ingestion docs by @chriscollins3456 in #7183
  • fix(ingest/GX): fix snowflake urn generated from connection string by @mayurinehate in #7173
  • feat(ingest): switch dbt to use auto_stale_entity_removal by @hsheth2 in #7160
  • fix(ingest): fix issue in glue tests by @hsheth2 in #7185
  • fix(log): logging timestamp in ISO8601 format instead of time by @anshbansal in #7188
  • feat(ingest): bigquery - extracts lineage metadata from catalog api by @PatrickfBraz in #7137
  • fix(ingest/tableau): show warning about token expiry for PATs by @hsheth2 in #7187
  • fix(ingest/vertica): Fixing missing container properties by @treff7es in #7197
  • chore(deps): bump Netty from 4.1.85.Final to 4.1.86.Final by @janhicken in #7191
  • docs(ingestion): powerbi # Add permission for DAX and mashup expressions by @mohdsiddique in #7195
  • feat(elasticsearch): Elasticsearch improvements by @david-leifker in #6894
  • fix(test): spark-lineage # build task as dependency of integrationTest by @mohdsiddique in #7189
  • chore(sample): add status removed aspect for sample data by @anshbansal in #7203
  • docs(managed datahub): release notes for v0.1.73 by @anshbansal in #7194
  • fix(bootstrapdata): update timestamp to be in the last 1 year by @szalai1 in #7206
  • fix(ingest/bigquery): quoting for APPROX_COUNT_DISTINCT in BigQuery by @mryorik in #7207
  • fix(versioning): Ensure that CLI version is always dot-delimited even in minor release versions by @jjoyce0510 in #7200
  • fix(test): missing variables in test causing error in logs by @anshbansal in #7210
  • feat(mlModel): mark downstream jobs as ml model downstreams lineage by @mayurinehate in #7205
  • ci(): fix datahub-upgrade quickstart regression by @hsheth2 in #7217
  • feat(ingest): Add custom properties to the ldap ingestion by @bda618 in #7125
  • fix(ingest): upgrade feast to avoid build issues by @hsheth2 in #7218
  • fix(ui) Increase the number of assertions that we query for in tab by @chriscollins3456 in #7215
  • fix(ci): trivy code scanning fix by @anshbansal in #7232
  • feat(glue): Use table name as human-readable name for Glue ingestion by @danielcmessias in #7213
  • feat(ui): Supporting display of columns and storage count in previews by @jjoyce0510 in #7198
  • fix(gms): Fixes delete references for single relationship aspects by @pedro93 in #7211
  • docs(ingest/lineage): clarify name field in entity config for file based lineage by @mayurinehate in #7225
  • fix(ui): typo 'Documenataion' by @vojtechneradatos in #7227
  • fix(cli/delete): skip references prompt if deleting an aspect by @hsheth2 in #7220
  • fix(ingest/tableau): implement workbook_page_size parameter by @hsheth2 in #7216
  • fix(gms): Corrects MCP generation in async mode by @pedro93 in #7214
  • fix(ingest): redshift # build late binding view lineage when sql written in upper case by @looppi in #7223
  • fix(siblings) Fix editing of schema fields for siblings with unequal schemas by @chriscollins3456 in #7199
  • fix(ingest-idp): emit empty GroupMembership when there are no groups by @aditya-radhakrishnan in #7196
  • feat(lineage): add time filtering for lineage edges by @aditya-radhakrishnan in #7159
  • chore(deps): bump http-cache-semantics from 4.1.0 to 4.1.1 in /docs-website by @dependabot in #7230
  • refactor(docs): Minor language updates for kafka source doc header by @jjoyce0510 in #7237
  • docs(website): fix feature availability dark mode styles by @jeffmerrick in #7233
  • chore(log/docs): improve error log, docs by @anshbansal in #7239
  • fix(dev.sh): Add context to kafka-setup build by @szalai1 in #7234
  • feat(cli): improve docker quickstart by @hsheth2 in #7184
  • fix(elasticsearch): fix orphan index clean up pattern, consistent top… by @david-leifker in #7242
  • chore(deps): bump http-cache-semantics from 4.1.0 to 4.1.1 in /datahub-web-react by @dependabot in #7231
  • Update data_platforms.json by @RainerGa in #7244
  • fix(autocomplete): Use normal properties name instead of urn name in autocomplete by @jjoyce0510 in #7236
  • fix(frontend logs): Silencing harmless log messages (and adding path for future) by @jjoyce0510 in #7254
  • fix(docker): fix ability to use non-default reg by @david-leifker in #7250
  • logging(elasticsearch): improve messaging in orphan index detection by @david-leifker in #7246
  • chore(ci): update base image dependencies by @anshbansal in #7248
  • docs(graphql): remove reference of non-existent gms.graphql by @mayurinehate in #7240
  • Add graphql error and call metrics at startuptime by @szalai1 in #7226
  • docs(ingest): update kafka connect doc, simplify starter recipe by @mayurinehate in #7243
  • fix(cli): update message when pulling docker images by @mayurinehate in #7241
  • fix(ingest/tableau): handle missing query in tableau views by @hsheth2 in #7186
  • feat(ingest/s3): use latest file to infer schema metadata by @mayurinehate in #7202
  • fix(schema-blame): check if list of ChangeTransactions is empty before processing by @aditya-radhakrishnan in #7263
  • fix(change-events): guard against NPE's by @aditya-radhakrishnan in #7264
  • fix(docker): add env variable to control mysql setup image, sort dock… by @shirshanka in #7266
  • chore(logs): clean logs scanning location by @anshbansal in #7261
  • fix(profile): use department name if available by @anshbansal in #7257
  • fix(async ingest): Fix async ingest path by @pedro93 in #7269
  • fix(compose): fix override file missing container by @david-leifker in #7270
  • fix(ui): fix spacing on share buttons by @aditya-radhakrishnan in #7272

Full Changelog: v0.9.6...v0.10.0
