github datahub-project/datahub v0.10.5

15 months ago

Release Highlights

NEW: Unified Search and Browse Experience

It’s here, it’s here! We are incredibly excited to roll out our redesigned, streamlined Search and Browse experience. End users now have a one-stop shop to search for specific data entities and browse across systems, making it easier than ever to find the most relevant and meaningful resources within DataHub.

Check out the screenshot below and get a full walk-through in this video!

User Experience

  • Column-Level Lineage (CLL) visualization update: you can now visualize CLL relationships through DataJobs (e.g. Airflow DAGs)
  • Unique Glossary Terms: We now prevent creating duplicate Glossary Term names within a Term Group
  • Domains: You can now configure the Documentation tab to be the default landing page within a Domain
  • Formatting updates to Row Count to make large numbers more human-readable (e.g. 3283337 is shown as 3.2M)
  • Stats Tab: Y-axis scale now dynamically set to reflect the minimum & maximum values, improving readability
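
The Row Count formatting above can be illustrated with a minimal sketch. This is not the DataHub UI code; the function name `abbreviate_count` and the K/M/B suffixes are assumptions for illustration. Truncation to one decimal place is assumed only because it reproduces the 3283337 to 3.2M example in the bullet; the actual UI may round differently.

```python
def abbreviate_count(n: int) -> str:
    """Abbreviate a non-negative count, truncating to one decimal place.

    Hypothetical helper for illustration only, not DataHub's implementation.
    """
    units = [(1_000_000_000, "B"), (1_000_000, "M"), (1_000, "K")]
    for threshold, suffix in units:
        if n >= threshold:
            # Work in tenths of the unit with integer division,
            # so the value is truncated rather than rounded.
            tenths = n * 10 // threshold
            whole, frac = divmod(tenths, 10)
            return f"{whole}.{frac}{suffix}" if frac else f"{whole}{suffix}"
    # Values below 1,000 are shown unchanged.
    return str(n)


print(abbreviate_count(3283337))
```

Integer arithmetic is used instead of floats so that large counts are not affected by floating-point rounding.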

Metadata ingestion

Ingestion Enhancements:

  • BigQuery: Set platform_instance using project_id
  • PowerBI: Ingest datasets not used in visualizations (tiles/pages)
  • Kafka Connect: Ability to set platform_instance
  • Nifi: Support for basic auth
  • Presto on Hive: Extract all table properties from Hive Metastore
  • Elasticsearch: Support for basic profiling
  • Add advanced configuration for LDAP manager ingestion

Lineage Improvements:

  • Schema-aware SQL parsing to derive column-level lineage
  • Column-level lineage support for BigQuery, Tableau, and Snowflake View definitions
  • Snowflake: Extract Snowpipe S3 lineage
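
To show why schema awareness matters for column-level lineage, here is a toy sketch. It is not DataHub's SQL parser; the function `resolve_column_sources`, the table names, and the schemas are all hypothetical. The idea is that an unqualified column in a multi-table query can only be attributed to its upstream table if the table schemas are known.

```python
from typing import Dict, List, Tuple


def resolve_column_sources(
    selected: List[str],
    tables: List[str],
    schemas: Dict[str, List[str]],
) -> Dict[str, Tuple[str, str]]:
    """Map each unqualified selected column to its (table, column) source.

    Illustrative only: a real parser also handles aliases, expressions,
    subqueries, and qualified names.
    """
    lineage: Dict[str, Tuple[str, str]] = {}
    for col in selected:
        # Find every referenced table whose schema contains this column.
        owners = [t for t in tables if col in schemas.get(t, [])]
        if len(owners) != 1:
            # Zero owners: unknown column. Several owners: ambiguous column.
            raise ValueError(f"cannot resolve column {col!r}: candidates={owners}")
        lineage[col] = (owners[0], col)
    return lineage


# Hypothetical schemas for two upstream tables.
schemas = {
    "orders": ["order_id", "customer_id", "amount"],
    "customers": ["customer_id", "name"],
}

# Corresponds to: SELECT order_id, name FROM orders JOIN customers USING (customer_id)
lineage = resolve_column_sources(["order_id", "name"], ["orders", "customers"], schemas)
print(lineage)
```

Without the `schemas` mapping, the query text alone does not say whether `name` comes from `orders` or `customers`, which is the gap schema-aware parsing is meant to close.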

Developer Experience

  • Fine-grained ownership policies
  • PATCH support for DataJob Inputs/Outputs
  • New endpoints to extract size of time-series indices and truncate/cleanup time-series indices in Elasticsearch; support for bulk-deletes
  • Initial support for exception reporting via Sentry
  • New OpenAPI endpoint to get Task Status
  • SDK: Easily generate container URNs
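
As a rough illustration of what generating a container URN involves, the sketch below derives a stable identifier from a container's key fields. It does not call the DataHub SDK, and the helper names `container_guid` and `container_urn` are hypothetical. It assumes the URN has the form `urn:li:container:<guid>` and that the guid is an MD5 hash of the key serialized as sorted JSON; the key fields shown are examples, so check the SDK for the actual helper and hashing scheme.

```python
import hashlib
import json
from typing import Dict


def container_guid(key: Dict[str, str]) -> str:
    """Hash the container key into a deterministic hex guid (assumed scheme)."""
    # Sorting keys makes the guid independent of dict insertion order.
    serialized = json.dumps(key, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(serialized.encode("utf-8")).hexdigest()


def container_urn(key: Dict[str, str]) -> str:
    """Build a container URN string from the key's guid."""
    return f"urn:li:container:{container_guid(key)}"


# Hypothetical key fields identifying a database-level container.
key = {"platform": "snowflake", "instance": "prod", "database": "analytics"}
print(container_urn(key))
```

Because the guid depends only on the key fields, the same container yields the same URN on every ingestion run.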

Docs

  • Improvements to our File-Based Lineage doc, specifically focused on Fine-Grained Lineage config components
  • Code examples of how to manage Posts within DataHub
  • Guide to generating custom browse paths for the new search experience

What's Changed

  • refractor(classification): datahub classifier init by @mayurinehate in #8193
  • fix(glue): fix typo in reported warning, report with flow_urn by @mayurinehate in #8138
  • fix(ingest/delta-lake): fix CI issues due to delta lake version bump by @mayurinehate in #8215
  • Upgrade kafka and its dependencies to 3.4 in docker compose by @jinlintt in #8161
  • chore(release): update default cli for managed ingestion by @pedro93 in #8226
  • fix(ownership): Corrects graphQL resolver for entity operations by @pedro93 in #8219
  • fix(cli/quickstart): handle docker hangs gracefully by @hsheth2 in #8211
  • fix(cli): make quickstart robust to docker race conditions by @hsheth2 in #8233
  • fix(search): tag/term should filter for both entity and field level by @anshbansal in #7881
  • docs(tests): document test eval endpoint by @anshbansal in #8227
  • feat(ingest/bigquery_v2): enable platform instance using project id by @asikowitz in #8216
  • feat(stats): make rowcount more human readable by @joshuaeilers in #8232
  • docs(es): Update aws deploy docs to correct ElasticSearch version by @iprentic in #8240
  • feat(sdk): support patches as MCPs in file source by @hsheth2 in #8220
  • fix(apiAuth): add resources where applicable and update docs by @RyanHolstien in #8234
  • feat(patch): support datajob input output by @RyanHolstien in #8190
  • feat(ingest/unity): Set external url for containers and datasets by @asikowitz in #8238
  • docs(airflow): add docs on custom operators by @matthew-coudert-cko in #7913
  • chore(release): update datahub upgrade docs by @pedro93 in #8228
  • fix(ingestion/tableau): Remove unused field documentViewId by @mohdsiddique in #8225
  • feat(ui): create fast path for immediate processing of ui sourced changes by @RyanHolstien in #8200
  • fix(ingest/druid) Handling gracefully if no table returned in a schema by @treff7es in #8203
  • fix(kafka-setup): bump kafka version by @david-leifker in #8245
  • feat(ingestion/powerbi): Ingest datasets not used in PowerBI visualization(tiles/pages) by @mohdsiddique in #8212
  • fix(sdk/dataflow): deprecate cluster and use env and platform_instance instead by @shubhamjagtap639 in #8201
  • fix(ingest): pass platform correctly to browse path v2 helper by @asikowitz in #8244
  • feat(search): Supporting Aggregations for hasX fields by @jjoyce0510 in #8241
  • fix(ingest): Call validator on the base urn as well as aspect components when ingesting by @iprentic in #8250
  • docs(website): adjust markprompt z-index so it's not covered by nav by @jeffmerrick in #8255
  • fix(patch): Fix exception when using default patch for patching missing aspects by @jjoyce0510 in #8221
  • fix(custom-search): revert underscore as quoted by @david-leifker in #8163
  • chore(ci): add back optional static sleep for tests by @anshbansal in #8258
  • chore(checkbox): darken all checkboxes by @joshuaeilers in #8248
  • chore(assertions): catch any exception on assertion delete by @joshuaeilers in #8247
  • feat(opensearch): Rollover usage events at a file size rather than time-based manner by @iprentic in #8182
  • fix(ingest/okta): Set default of okta_profile_to_username_attr to email by @asikowitz in #8263
  • feat(ui) Update Search & Browse to be a unified experience by @chriscollins3456 in #8235
  • fix(ingest/tableau): split table columns query from datasources query by @mayurinehate in #8217
  • fix(ingest/okta): Set default of okta connector to match OIDC defaults by @anshbansal in #8272
  • feat(elasticsearch): Add endpoint for getting the size of timeseries indices by @iprentic in #8265
  • feat(ingest/delete-cli): Add configurable batch size; update docs by @asikowitz in #8274
  • fix aggregation sorting in browsev2 sidebar by @joshuaeilers in #8276
  • Support de-selecting browse paths by @joshuaeilers in #8242
  • feat(cli): Initial support for sending exceptions to Sentry by @treff7es in #7172
  • fix(ingestion/powerbi): use admin api resolver to fetch modified workspaces by @mohdsiddique in #8273
  • fix: dbt-athena types mapping for complex types by @svdimchenko in #8264
  • feat(graphql) Prevent duplicate glossary term names within a group by @chriscollins3456 in #8187
  • Add retries to JavaEntityClient:deleteReferencesTo by @joshuaeilers in #8268
  • feat(ingest): Create zero usage aspects by @asikowitz in #8205
  • fix(docs) Update Chrome extension docs to reflect current reality by @chriscollins3456 in #8284
  • refactor(validations): Add URL-based Routing to Dataset Validations Tab by @jjoyce0510 in #8254
  • fix(metadata-io): retry transactions on serialization errors when using a PostgreSQL database by @Masterchen09 in #8278
  • docs(ingest/lineage): Update fine grained file lineage docs by @eboneil in #8283
  • docs(posts): add examples by @abiwill in #7688
  • chore(deprecate): remove legacy sql table by @david-leifker in #8253
  • fix(ingest/csv-enricher): Adding extra check in csv enricher to ignore non-urn urns by @treff7es in #8169
  • tests(urn): Add tests for more cases of invalid urns by @iprentic in #8285
  • feat(search): add search annotations for profile aspect by @anshbansal in #8282
  • fix(ingest/snowflake): snowflake profiling geometry type by @mayurinehate in #8279
  • refactor(unity): Remove databricks_cli and cleanup by @asikowitz in #8249
  • Sidebar local storage setting + toggle tooltip by @joshuaeilers in #8288
  • fix(ui) Fix UI issues with self-referencing column level lineage by @chriscollins3456 in #8296
  • feat(ui) Add ability to view CLL through DataJobs in lineage visualization by @chriscollins3456 in #8281
  • docs(business glossary) Update business glossary docs by @eboneil in #8287
  • docs(graphql): add developer guide for adding a new graphql endpoint by @iprentic in #8297
  • fix(test): consolidate mae-consumer test entity registry by @david-leifker in #8309
  • fix(ingestion) Fixes producing MAE events with browsePathsV2 aspect by @chriscollins3456 in #8304
  • fix(embed): set embed url to false for tableau config by @gabe-lyons in #8308
  • fix(embed): hide chart & dashboard previews if not for looker by @gabe-lyons in #8307
  • fix(ingest/unity): Pin databricks-sdk and update docs by @asikowitz in #8293
  • fix(ui) Only show search and browse V2 onboarding steps if flag is on by @chriscollins3456 in #8315
  • fix(ingest/looker): Fix typo on ViewField creation for measures by @asikowitz in #8318
  • docs(managed datahub): docs for v0.2.9 by @anshbansal in #8323
  • feat(ingest/snowflake): snowpipe s3 lineage by @mayurinehate in #8262
  • fix(ingest/postgres): fix profiling errors, skip json type column by @mayurinehate in #8291
  • tests(elasticsearch): Add fixture test for basic scroll functionality by @iprentic in #8321
  • feat(tableau): add config knobs for excluding external links from tableau by @gabe-lyons in #8314
  • fix(documentation): remove links from associatedUrn by @joshuaeilers in #8319
  • fix(browsev2): improved error handling by @joshuaeilers in #8326
  • fix(search) Add facets list to our cache key to avoid cache collisions by @chriscollins3456 in #8327
  • feat(elasticsearch): Add rest.li endpoint that does truncation cleanup of a timeseries index by @iprentic in #8277
  • Container link in browse v2 sidebar by @joshuaeilers in #8305
  • fix(browse): try to prevent overlapping pagination calls by @joshuaeilers in #8329
  • feat(usage): add max width to users tooltip by @gabe-lyons in #8335
  • feat(usagestats): Optimize elasticsearch query for usage stats aggregations by @iprentic in #8333
  • feat(ingest): add YamlFileUpdater utility by @hsheth2 in #8266
  • feat(ui) Show Acryl information with button and banner behind flag by @chriscollins3456 in #8330
  • test(ingest/trino): xfail test to unblock CI by @asikowitz in #8340
  • fix(restli): Add docs for get task status, and fix hostname regex by @iprentic in #8341
  • docs(lineage): add read lineage example by @eboneil in #8322
  • fix(async): submit additional default aspects only when not in async mode by @RyanHolstien in #8320
  • feat(auth): Fine grained ownership policies by @skrydal in #7499
  • fix(ingest/s3): Fix for flaky s3 test - uploading s3 files in consistent order by @treff7es in #8367
  • fix(ingest/airflow): Remove info log on import by @fjmacagno in #8246
  • fix(ui) Update copy of the demo site acryl banner by @chriscollins3456 in #8370
  • test(ingest/mysql): Configure sql_server tests for arm64 by @asikowitz in #8360
  • fix(browse): filter entities by whether they might exist in the instance by @joshuaeilers in #8355
  • ci(docs): add missing deps for lxml package for vercel by @hsheth2 in #8372
  • feat(browsepathv2): enable incremental update browsepath by @david-leifker in #8354
  • chore(smoke-test): use a more recent ingestion cli version in tests by @david-leifker in #8374
  • feat(stats): show size in bytes and scale at y=min by @joshuaeilers in #8375
  • fix(schema-registry): fix internal schema reg with custom duhe topic … by @david-leifker in #8371
  • fix(java) Add try catch block when backfilling browse v2 by @chriscollins3456 in #8377
  • feat(ingest): Add advanced configuration for LDAP manager ingestion by @bda618 in #7784
  • fix(ingest): update pydantic helpers to address unique name issue by @mayurinehate in #8324
  • fix(cli): local variable reference before assignment by @segun-s in #8222
  • feat(ingest): Turn on browse path v2 creation by @asikowitz in #8342
  • chore(ingest/delta-lake): cleanup import error handling by @hsheth2 in #8230
  • test(ingest/nifi): Configure nifi tests for arm64 by @asikowitz in #8363
  • build(ingest): Pin pydeequ to unblock CI by @asikowitz in #8381
  • fix(ingest/sql-common): Fix profile_table_level_only by @asikowitz in #8331
  • feat(ingest): schema-aware SQL parsing for column-level lineage by @hsheth2 in #8334
  • fix(config) Set search and browse flags default off by @chriscollins3456 in #8378
  • test(ingest/kafka): Configure kafka connect tests for arm64 by @asikowitz in #8362
  • fix(ui): fix a too much recursion error when column lineage is highlighted by @Masterchen09 in #8207
  • fix(ingest/s3): Deequ import rearragement by @treff7es in #8389
  • feat(ingest): Add disable flag for TopicRecordNameStrategy by @segun-s in #8224
  • refactor(graphql): make graphql engine extensible by @shirshanka in #8394
  • feat(ui) Allow a configurable default tab for domain entity profile page by @chriscollins3456 in #8316
  • test(ingest): Aspect level golden file comparison by @asikowitz in #8310
  • test(ingest/airflow): Fix test for airflow 2.6.3 by @asikowitz in #8393
  • feat(ingest/bigquery): support column-level lineage by @hsheth2 in #8382
  • build(ingest): Inline import testing utils for check cli by @asikowitz in #8400
  • refactor(ui): uniform ordering of items on the entities sidebar section by @sudhakarast in #8365
  • test(ingest/testing-utils): Add back delta info ignore path by @asikowitz in #8402
  • fix(ingest/bigquery): skip self-references when generating lineage by @hsheth2 in #8403
  • feat(ingest): datamodel to ingest organisation role metadata for a dataset by @sheeru in #8267
  • test(ingest/kafka-connect): Attempt to fix flaky test by @asikowitz in #8404
  • feat(ingest/dbt-cloud): reduce graphql query complexity by @hsheth2 in #8390
  • fix(ingest/snowflake): fix azure cloud region ids in external url by @mayurinehate in #8376
  • feat(elasticsearch): Implement optimization to use reindexing instead… by @iprentic in #8352
  • feat(ingest/presto-on-hive): Extracting all the table properties from Hive Metastore by @treff7es in #8348
  • feat(openapi): Add openapi endpoint for getting task status by @iprentic in #8391
  • feat(ingest/airflow): able to set platform_instance in Dataset by @dungdm93 in #8313
  • test(ingest/minio): Configure delta lake minio tests for arm64 by @asikowitz in #8364
  • docs(ingest): Add warning for Python 3.7 deprecation by @asikowitz in #8411
  • fix(ingest/tableau): graceful handling of get all datasources failure… by @mayurinehate in #8406
  • fix(owner): Corrects ownership aspect generation during update operations by @pedro93 in #8399
  • chore(stats): change default stats lookback by @anshbansal in #8408
  • feat(ingest/kafka-connect): allow setting platform_instance for kafka… by @mayurinehate in #8299
  • fix(ingestion/powerbi): increment msal version by @mohdsiddique in #8385
  • docs(perf-test) Update README by @eboneil in #8410
  • fix(ingest/s3): fix test flakiness by @treff7es in #8416
  • fix(ingest): tweak ingestion exit codes by @hsheth2 in #8418
  • build(ingest/boto3): Update boto3-stubs to fix CI by @asikowitz in #8425
  • feat(ingest/snowflake): View CLL from sql parsing of view definition by @asikowitz in #8419
  • fix(ingest/snowflake): Add sqlglot as snowflake dependency by @asikowitz in #8427
  • fix(schema-reg): allow other response codes from schema registry check by @david-leifker in #8302
  • fix: add docs on update description via graphQL by @yoonhyejin in #8289
  • docs(databricks/spark-lineage): Fix incorrect statement by @asikowitz in #8423
  • feat(browsev2): styling updates and select platform by @joshuaeilers in #8428
  • fix(ui ingestion): fixing issue where stale fields could stick around when changing recipes by @gabe-lyons in #8421
  • ci: workarounds for pyyaml installation by @hsheth2 in #8435
  • build(ingest/boto3): Update boto3-stubs to fix CI by @asikowitz in #8452
  • fix(ingestion-redshift): Fix Redshift ingestion logs by @arunvasudevan in #8454
  • fix(ingest/bigquery): make sql parsing more robust by @hsheth2 in #8450
  • fix(GreatExpections): AssertionRunEventClass does not match the examp… by @JifeiMei in #8243
  • chore(ingest): hide ignore old/new state options by @hsheth2 in #8438
  • docs(env): add env vars authentication by @david-leifker in #8436
  • feat(graphql-plugins): add ability for plugins to call back to core e… by @shirshanka in #8449
  • feat(io): refactor metadata-io module by @RyanHolstien in #8306
  • feat(ingest/mysql): Add estimate row count for mysql by @eboneil in #8420
  • ingest(elasticsearch): add basic profiling by @anshbansal in #8351
  • feat(ingest/lookml): fail when nothing was produced by @hsheth2 in #8464
  • chore(ingest): drop bigquery-beta and snowflake-beta aliases by @hsheth2 in #8451
  • feat(ingest/nifi): add support for basic auth in nifi by @mayurinehate in #8457
  • Fix query_tab test that was failing on CI run by @kkorchak in #8463
  • ingest(mysql): add storage bytes information by @anshbansal in #8294
  • fix(cache) Fix caching bug with new search filters by @chriscollins3456 in #8434
  • fix(browseV2) Escape forward slashes in browse v2 query by @chriscollins3456 in #8446
  • fix(ingestion/powerbi-report-srever): handle requests.exceptions.JSONDecodeError by @mohdsiddique in #8442
  • feat(sdk): easily generate container urns by @hsheth2 in #8198
  • Update presto-on-hive URN in data_platforms.json by @gabe-lyons in #8484
  • fix(mysql): getting table name correctly by @anshbansal in #8476
  • feat(ingest/elastic): reduce number of calls made by @anshbansal in #8477
  • refactor(search): Support searching multiple entities in search() as in scroll() by @iprentic in #8461
  • fix(ingest): Generate browse paths v2 for more sources; properly pass platform_instance by @asikowitz in #8501
  • chore(ingest): add example of training metric/hyper parameters by @anshbansal in #8491
  • feat(ingest): enable pipeline reporting by default by @hsheth2 in #8472
  • feat(docs) Add guide for generating browsePathsV2 aspects by @chriscollins3456 in #8448
  • fix(browsepathv2): default browse path with empty space by @anshbansal in #8503
  • docs: add docs on sqlglot lineage by @hsheth2 in #8482
  • feat(search ui): Adding support for pluggable filter rendering by @jjoyce0510 in #8455
  • fix(ingest): hint at --update-golden-files option when tests fail by @hsheth2 in #8507
  • ci: fix commandLine usage in build.gradle by @hsheth2 in #8510
  • fix(ui) Fix broken dataPlatformInstance references in browseV2 by @chriscollins3456 in #8485
  • fix(dataProduct) Show entity count excluding soft deleted entities by @chriscollins3456 in #8444
  • feat(ui): Adding support for rendering assertion health status in Dataset Search Card, Search Preview, etc. by @jjoyce0510 in #8460
  • docs(ingest/bigquery): add permissions to profile google drive backed… by @mayurinehate in #8490
  • chore(ingest/tableau): miscellaneous cleanup refractor by @mayurinehate in #8417
  • docs(ingest/lookml): clarify connection map config by @hsheth2 in #8508
  • config(ebean): add ebean retry configuration by @david-leifker in #8500
  • fix(ingest): respect max_threads for ingestion reporter by @hsheth2 in #8521
  • chore(ingest): bump sqllineage and sqlparse by @hsheth2 in #8481
  • fix(search): fix lightning cache enable logic by @david-leifker in #8522
  • docs(docker): document docker container dependency tree by @david-leifker in #8496
  • feat(lineage): Apply search flags to scroll query in LineageSearchService by @iprentic in #8518
  • feat(search): Throw exception instead of returning an empty response from scroll in an error case by @iprentic in #8517
  • fix(gms): GMS hang when upgrade image #8270 by @yangjiandan in #8271
  • fix(ui): Allows deselection of members in add members modal for a group by @Sukeerthi31 in #8349
  • fix(ui) Remove initial redirect logic from frontend by @chriscollins3456 in #8401
  • fix(sso) - Add redirect_uri to authenticate route on 401 error by @mkamalas in #8346
  • fix(auth): ignore case when comparing http headers by @lix-mms in #8356
  • fix(ui): use locale lowercase when filtering columns of an entity in the lineage by @Masterchen09 in #8213
  • feat(elasticsearch): allow bulk delete by @david-leifker in #8424
  • feat(metrics): add metrics for aspect write and bytes by @david-leifker in #8526
  • fix(ingest/build): Fix sagemaker mypy and flake8 issues by @treff7es in #8530
  • feat(siblings): hiding non-existant siblings in FE by @gabe-lyons in #8528
  • fix(ingest): pin boto3-stubs in CI by @hsheth2 in #8527
  • docs: small update to homepage by @shirshanka in #8483
  • fix(ingest): remove duplication of tags by @anshbansal in #8532
  • ci: reduce git fetch depth by @hsheth2 in #8473
  • feat(ingest/vertica): performance improvement and bug fixes by @vishalkSimplify in #8328
  • test(ingest): test case statements with sql parser by @hsheth2 in #8437
  • feat(ingestion/tableau): support column level lineage for custom sql by @mohdsiddique in #8466
  • fix(ingest/json-schema): convert non-string enums to strings by @benjamin-awd in #8479
  • feat(browseV2): add browseV2 logic to system update by @RyanHolstien in #8506
  • feat(cli): Adds ability to upload recipes to DataHub's UI by @pedro93 in #8317
  • feat(presto-on-hive): allow v1 fieldpaths in the presto-on-hive source by @gabe-lyons in #8474
  • fix(ui) Make multiple small updates to new search and browse by @chriscollins3456 in #8524
  • feat(search): Allow aggregating on facets that are not explicitly part of default filter set by @jjoyce0510 in #8540
  • fix(test): increase siblings.js test stability by @david-leifker in #8542


Full Changelog: v0.10.4...v0.10.5
