github datahub-project/datahub v0.9.3
DataHub v0.9.3

17 months ago

Release Highlights

User Experience

  • Column Level Lineage Impact Analysis is live! Read more about it here
  • You can now sort Dataset field names alphabetically - this is super handy for finding columns within wide datasets that may not have an easy-to-follow order by default

  • New - an “Explore All” button on the home page, making it easier to jump into the search experience

  • Plus! We now have a “Share” button on entity pages, making it easier for you to share DataHub links with others

  • [Community Contribution] You can now assign the same user as different owner types - thanks for the contrib, @rtekal!

  • [Community Contribution] You can now see recommendations for Recently Edited entities on the homepage - thanks for the contrib, @CorentinDuhamel!

Metadata Ingestion

  • Snowflake Automated PII Classification is here! We’re eager for feedback on the utility of this feature - check out this guide, take it for a spin, and let us know what you think!
  • NEW! dbt Cloud ingestion is ready for ya - check out the module details here
  • We’ve simplified the configs required to add stateful ingestion to an ingestion source - check out the updated docs here
  • Speaking of stateful ingestion, it’s now available with:
    • Looker & LookML ingestion sources
    • [Community Contribution] Container-level ingestion – thanks for the contrib, @wangsaisai!
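
For a rough idea of what the simplified stateful ingestion setup can look like, here is a minimal sketch that builds a recipe as a plain Python dict. It assumes that a top-level `pipeline_name` plus `stateful_ingestion.enabled` under the source config is enough to switch the feature on; check the updated docs for the authoritative field names. The helper `build_recipe`, the pipeline name, and the Looker `base_url` value are hypothetical placeholders.

```python
import json


def build_recipe(pipeline_name: str, source_type: str, source_config: dict) -> dict:
    """Return a recipe dict with stateful ingestion switched on.

    Assumption: the simplified config only needs a stable pipeline_name
    and stateful_ingestion.enabled on the source.
    """
    config = dict(source_config)  # copy so the caller's dict is not mutated
    config["stateful_ingestion"] = {"enabled": True}
    return {
        "pipeline_name": pipeline_name,
        "source": {"type": source_type, "config": config},
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},  # placeholder URL
        },
    }


recipe = build_recipe(
    "looker_prod",  # hypothetical pipeline name
    "looker",
    {"base_url": "https://looker.example.com"},  # placeholder source config
)
print(json.dumps(recipe, indent=2))
```

The printed JSON can be saved as a recipe file or adapted to YAML; the pipeline name should stay the same across runs so that state from earlier runs can be matched up.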

Developer Experience

  • [Community Contribution] For those of you deploying DataHub with Neo4j, we now support Lineage Impact Analysis via Neo4j multihop functionality. Thanks for the contrib, @djordje-mijatovic!
  • We’ve loosened our SQLAlchemy dependencies to support Airflow 2.3+
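
To illustrate what multihop impact analysis means conceptually, the sketch below walks a small in-memory lineage graph breadth-first and records how many hops away each downstream entity is. This is not DataHub's or Neo4j's implementation; the function `downstream_impact`, the `max_hops` parameter, and the dataset names are all hypothetical.

```python
from collections import deque


def downstream_impact(edges, start, max_hops=None):
    """Breadth-first walk returning {node: hop_count} for everything downstream of start."""
    hops = {}
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        # Stop expanding once the hop limit is reached.
        if max_hops is not None and depth >= max_hops:
            continue
        for child in edges.get(node, []):
            # Skip nodes already seen so cycles and diamonds are visited once.
            if child not in hops and child != start:
                hops[child] = depth + 1
                queue.append((child, depth + 1))
    return hops


# Hypothetical lineage: each key feeds the entities listed in its value.
lineage = {
    "raw.orders": ["staging.orders"],
    "staging.orders": ["mart.revenue", "mart.customers"],
    "mart.revenue": ["dashboard.sales"],
}

impact = downstream_impact(lineage, "raw.orders")
one_hop = downstream_impact(lineage, "raw.orders", max_hops=1)
print(impact)
print(one_hop)
```

A single-hop query only sees direct dependents, while the unbounded walk also reaches entities several steps downstream, which is the difference multihop support makes for impact analysis.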

What's Changed

  • fix(spark-lineage): Smoke test fix + smoke test m1 support by @treff7es in #6372
  • feat(ingest): supports MCEs in domain transformer by @hsheth2 in #6364
  • feat(ingest): enable container stateful ingestion by @wangsaisai in #6343
  • build(ingest): pin mypy version by @hsheth2 in #6391
  • build: use acryl's gradle-avro-plugin by @hsheth2 in #6390
  • fix(ingest): unity - add missing date type by @ms32035 in #6385
  • fix(ingest): unity-catalog - Removing unneeded sqlalchemy dependency to fix install by @treff7es in #6379
  • feat(ingest/tableau): re-authenticate if the token expires by @hsheth2 in #6380
  • fix(ingest): use profiler config settings correctly by @hsheth2 in #6354
  • fix(ingest): handle error when query returns no columns in snowflake lineage by @mayurinehate in #6404
  • fix(ingest): fix missing snowflake lineage when table_pattern is set by @mayurinehate in #6410
  • feat(ingest): loosen sqlalchemy dep & support airflow 2.3+ by @hsheth2 in #6204
  • fix(ingest/s3): add status aspect for detected s3 datasets by @mayurinehate in #6402
  • fix(ingest/snowflake): loosen snowflake connector version requirement by @hsheth2 in #6418
  • fix(mysql): fix native data type for mysql set type by @mayurinehate in #6407
  • perf(ui): virtualized schema table rows by @stanbaker in #6287
  • fix(ui) Improve HoverEntityTooltip and truncate parent glossary nodes by @chriscollins3456 in #6417
  • feat(ingest): support incremental lineage to dbt node from external platform by @mayurinehate in #6392
  • fix(ingest): init dataset props if missing in transformer by @hsheth2 in #6429
  • fix(change-event): remove unnecessary dependencies on EntityChangeEventGeneratorRegistryFactory by @aditya-radhakrishnan in #6431
  • build(deps): bump moment-timezone from 0.5.34 to 0.5.35 in /datahub-web-react by @dependabot in #5783
  • feat(frontend): Adding support to show externalUrl and institutionalMemoryFields for MLModels by @lurecas in #6053
  • feat(model): adds properties, ownership, deprecated, institutional memory and tags as aspects for data platform instance entity by @sgomezvillamor in #5728
  • docs(ingest/airflow): clarify docs around 1.x compat by @hsheth2 in #6436
  • feat(recommendations): add last edited entities by @CorentinDuhamel in #6329
  • fix(ingest): correctly compute entity change percentage by @hsheth2 in #6438
  • docs(townhall) Updating Townhall History by @maggiehays in #6336
  • Neo4j multihop support by @djordje-mijatovic in #6104
  • fix(mae-consumer): Set proper variable expansion for JMX_OPTS and JAVA_OPTS in MAE docker by @skrydal in #6378
  • docs(ingest): move prerequisite section before the ingestion recipe example by @mayurinehate in #6341
  • fix(dataset): improve glossary term load performance for datasets by @Reilman79 in #6396
  • feat(lineage) Implement CLL impact analysis for inputFields by @chriscollins3456 in #6426
  • feat(ui) Add upgrade step to enable CLL impact analysis for existing data by @chriscollins3456 in #6427
  • Added functionality to copy fieldpath and urn of each column by @Ankit-Keshari-Vituity in #6398
  • fix(ingestion): add output converters for ODBC unsuported datatype in… by @LavinaVRovine in #6134
  • fix(ui) Fix parentNodes overfetching everywhere it's used by @chriscollins3456 in #6446
  • fix(ingest): snowflake - Fixing top query trimming in snowflake by @treff7es in #6447
  • feat(elasticsearch): Updates to elasticsearch configuration, dao, tests by @david-leifker in #6269
  • chore(ingest): fix mssql lint by @hsheth2 in #6453
  • fix(ingest): add cli info to ingestion reporter by @hsheth2 in #6451
  • fix(ui) Fix glossary side browser width fluctuating by @chriscollins3456 in #6457
  • fix(python): Fix python dependencies for doc generation by @david-leifker in #6460
  • docs(website): add homepage links by @jeffmerrick in #6458
  • build(ingest): loosen jinja2 dependency for superset by @KulykDmytro in #6433
  • fix(ingest): lowercase db name in mssql ingestion by @hsheth2 in #6448
  • fix(ingest): handle missing schema in transformer by @hsheth2 in #6445
  • feat(ingest): allow specific profiler config fields to override profile_table_level_only by @hsheth2 in #6366
  • docs(enrichment) updating enrichment landing page by @maggiehays in #6286
  • fix(home-page): remove redundant getAuthenticatedUser query by @aditya-radhakrishnan in #6464
  • feat(ingest): detect old or missing docker compose by @hsheth2 in #6466
  • feat(ingestion): powerbi # Power BI report support by @mohdsiddique in #6339
  • fix(ingest/dbt): disable incremental lineage by default by @hsheth2 in #6467
  • fix(loggin): print logging timestamp in ISO8601 format instead of jus… by @szalai1 in #6474
  • docs(ingest/trino): add example of http connection by @hsheth2 in #6461
  • refactor(ui): Simplify base glossary page toolbar by @jjoyce0510 in #6469
  • revert: mssql - lowercase db name in mssql ingestion by @hsheth2 in #6481
  • build: remove Jinja2 dependency from superset by @KulykDmytro in #6476
  • fix(roles): allows role service to unassign roles by @aditya-radhakrishnan in #6434
  • fix(docs): update the Okta and Azure AD docs to clarify the point of ingesting users by @aditya-radhakrishnan in #6465
  • Highlighted the description text on search by @Ankit-Keshari-Vituity in #6400
  • Ownership type is deprecated by @jakobhanna in #6477
  • feat(ui): Adding Explore all button on home page search by @jjoyce0510 in #6468
  • fix(ingest): fix athena and GE lint errors by @hsheth2 in #6482
  • refactor(ingest): simplify stateful ingestion config by @hsheth2 in #6454
  • docs(ingest/tableau): required permissions + doc formatting by @hsheth2 in #6484
  • feat(ingest): presto - Adding presto source by @treff7es in #6459
  • fix(ui) Fix lineage graph rendering with duplicate nodes by @chriscollins3456 in #6480
  • docs(cypress): adding local cypress running instructions by @gabe-lyons in #6492
  • fix(managed ingestion): updating snowflake schema pattern placeholder text by @gabe-lyons in #6493
  • feat(ui): Adding External URLs to search preview for Dataset, Container, DataFlow, DataJob by @jjoyce0510 in #6496
  • fix(ingest/tableau): check tableName existence on datasource response by @lustefaniak in #6478
  • fix(build): do not use neo4j for dev by @anshbansal in #6501
  • docs(gms): update search example, do not use deprecated clause by @mayurinehate in #6340
  • feat(ingest): add stateful ingestion support to looker and lookml source by @mayurinehate in #6443
  • feat(ingest): dbt cloud integration by @hsheth2 in #6323
  • fix(tableau): extra defensive error-handling by @hsheth2 in #6503
  • fix(ingest): remove redundant types by @hsheth2 in #6486
  • fix(ingest/snowflake): fix lineage allow/deny pattern typo by @hsheth2 in #6506
  • fix(docs): add missing docs for 0.9.1 by @anshbansal in #6515
  • feat(ui): Introducing Share Button on Entity Pages by @jjoyce0510 in #6450
  • Added I AM auth for Opensearch by @syedzoherer in #6370
  • fix(ingest): correctly handle transformer patch semantics by @hsheth2 in #6505
  • feat(ingest/csv-enrich): handle BOM character by @hsheth2 in #6509
  • feat(airflow): support kafka hook in the airflow plugin by @hsheth2 in #6508
  • fix(patch): cover case where patch is used to create an entity by @RyanHolstien in #6504
  • build(deps): bump loader-utils from 2.0.0 to 2.0.4 in /docs-website by @dependabot in #6452
  • fix(ingest): add alias for bigquery-beta by @hsheth2 in #6521
  • feat(ingest): add config for ingesting delta table without files by @mayurinehate in #6403
  • fix(ingest): fix typo in unique count profiling by @mayurinehate in #6517
  • fix(ui) Fix roles not always displaying on page load by @chriscollins3456 in #6524
  • feat(datahub-upgrade): Added msk IAM auth as a build dependency. by @pghazanfari in #6439
  • feat(kafka-setup): Added support for MSK IAM authentication. by @pghazanfari in #6435
  • Added sorting method to fieldpath column of schema tab by @Ankit-Keshari-Vituity in #6510
  • fix(ingest): make kafka emit callback optional by @hsheth2 in #6525
  • feat(ingest): automated term classification for snowflake by @mayurinehate in #6376
  • fix(ingest): fix typo in urn utilities by @bskim45 in #6520
  • fix(ingest): fix trino properties and tests by @mayurinehate in #6518
  • fix(build): remove warnings in github actions by @anshbansal in #6512
  • fix(security): Bump ranger plugin commons dependency by @pedro93 in #6535
  • fix(ingest): kafka - properly picking doc from union type by @treff7es in #6472
  • feat(ingest): disable stateful_ingestion fail-safe by default by @hsheth2 in #6537
  • fix(ingest/airflow): respect enabled flag in airflow plugin by @hsheth2 in #6528
  • refactor(ui): Adding apollo caching to manage domains page. by @jjoyce0510 in #6494
  • refactor(recommendations): Filtering for specific entity types in recommendations by @jjoyce0510 in #6538
  • fix(ingest): handle groupby custom label case by @phongvu99 in #6456
  • build(ingest): support flake8 6.0.0 by @hsheth2 in #6540
  • fix(ui) Wrap schema field descriptions to allow read more/less always by @chriscollins3456 in #6541
  • fix(ui) Display duplicate nodes in lineage viz by @chriscollins3456 in #6526
  • style(ingest): fix lint checks for superset by @mayurinehate in #6548
  • fix(envs): remove DATASET_ENABLE_SCSI stale env var by @szalai1 in #6546
  • feat(upgrade): Make restore from backup logic generic by @pedro93 in #6536
  • feat(ingest): refractor classification mixin, support new infotypes by @mayurinehate in #6545
  • fix(ingest): bigquery - missing sqlalchemy dep and row count fix by @treff7es in #6553
  • fix(ingest): bigquery - Fixing querying non-date partition columns in profiling by @treff7es in #6554
  • feat(ingest): powerbi # scan all accessible workspaces by @looppi in #6441
  • fix(ingest): bigquery - Setting partition id for profiling data and project_id fix by @treff7es in #6558
  • fix(gms): fix java.lang.NoClassDefFoundError: com/sun/syndication/io/FeedException for apache-ranger authorizer by @mohdsiddique in #6560
  • feat(ui): Add Test Connection Support for BigQuery ingestion source by @jjoyce0510 in #6543
  • fix(contrib): Update base python image for es7-upgrade by @david-leifker in #6562
  • fix(ingest): handle docker-compose version v prefix by @hsheth2 in #6561
  • docs(ingest/kafka): add field descriptions of kafka-related configs to pydantic by @mmmeeedddsss in #6559
  • feat(platform): Support @searchable + @relationship Annotations for Timeseries Aspects by @jjoyce0510 in #6455
  • feat(models): Adding 'created', 'lastModified' timestamp to Dataset, Container, Dashboard, Chart by @jjoyce0510 in #6527
  • fix(ingest): set DataProcessInstance created ts to start time by @hsheth2 in #6566
  • feat(docs-site): fast reload command for markdown edits by @hsheth2 in #6539
  • fix(ingest): graceful error handling in snowflake classification by @mayurinehate in #6568
  • ci(label): add smoke test label by @anshbansal in #6571
  • fix(ingest): fix types changes in clickhouse sqlalchemy 0.2.3 by @mayurinehate in #6572
  • fix(tests): Misc updates for tests, auth log level, and quickstart by @david-leifker in #6491
  • feat(ui) Add owner to dataset - allow same owner with a different type by @rtekal in #6463
  • fix(verions): Update opentelemetry and updates from pr-5239 by @david-leifker in #6563
  • refactor(airflow): remove verbose log from airflow plugin by @bskim45 in #6516
  • feat(cli): remove inconsistency check command by @anshbansal in #6569
  • fix(ingest): restrict snowflake's sqlalchemy dep by @hsheth2 in #6579
  • docs(notes): add release notes for v0.1.69 managed DataHub by @anshbansal in #6573
  • fix(test): fix delete smoke test by @david-leifker in #6585

New Contributors

Full Changelog: v0.9.2...v0.9.3
