github datahub-project/datahub v0.11.0

14 months ago

Release Highlights

Potential Downtime

This release introduces substantial improvements to search ranking, which require the search indices to be reindexed.

During the reindexing:

  • a system-update job will set indices to read-only and create a backup/clone of each index
  • new components will be prevented from starting up until the reindex completes
  • Helm deployments will go into read-only mode and new ingestion runs will fail

This process can take anywhere from 5 minutes to multiple hours; as a rough estimate, expect it to take 1 hour for every 2.3 million entities. After the reindex is complete, check your ingestion runs and re-run any that did not complete.
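The rough estimate above can be turned into a quick back-of-the-envelope calculation. The following is a minimal sketch, assuming the rate scales linearly and that the 5-minute figure acts as a floor; the function and constant names are hypothetical and not part of DataHub.

```python
# Rough reindex-duration estimate based on the figures quoted in these notes.
# Assumption: duration scales linearly with entity count; real timings depend
# on cluster size, index settings, and hardware.

ENTITIES_PER_HOUR = 2_300_000  # "1 hour for every 2.3 million entities"
MIN_MINUTES = 5.0              # lower bound quoted in the notes


def estimate_reindex_minutes(entity_count: int) -> float:
    """Return an approximate reindex duration in minutes."""
    if entity_count < 0:
        raise ValueError("entity_count must be non-negative")
    minutes = entity_count / ENTITIES_PER_HOUR * 60
    # Never report less than the quoted 5-minute minimum.
    return max(MIN_MINUTES, minutes)


if __name__ == "__main__":
    for count in (100_000, 2_300_000, 10_000_000):
        print(f"{count:>12,} entities: ~{estimate_reindex_minutes(count):.0f} min")
```

Treat the result as a planning aid for scheduling the upgrade window, not as a guarantee.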

User Experience

New Search and Browse Experience

We have some really exciting improvements to the DataHub user experience in this release! The new search and browse experience, which was first made available in the previous release behind a feature flag, is now on by default. Check out our release notes for v0.10.5 to get more information and documentation on this new Browse experience.

Learn all about the new Search and Browse experience!

Improvements to Search

In addition to the ranking changes mentioned above, this release highlights the matched fields in search results so you can understand why each result matches your query. You can also sort your results alphabetically or by last updated time, in addition to relevance. Search now also suggests a correction if your query contains a typo.

See the Search improvements in action!

Manage Home Page Posts

You can now create and delete pinned announcements on your DataHub home page! If you have the “Manage Home Page Posts” platform privilege, you’ll see a new section in settings called “Home Page Posts” where you can create and delete text posts and link posts that your users see on the home page.

OpenAPI Endpoints Expanded

The OpenAPI entity and aspect endpoints have been expanded to improve the developer experience when using this API. Additional aspects will be added in the near future.

Metadata Ingestion

Added support for the Confluent S3 Sink Connector, for extracting stored procedures and jobs from MSSQL, and for Snowflake shares. Additionally, the SQL parsing source now converts query logs into column-level lineage (CLL) and usage.

Developer Experience

The CLI now supports recursive deletes.

Versioned Documentation

Starting from this release, we support versioned documentation on the DataHub docs site! Select the version you’re on and browse the docs for that specific version.

Performance Improvements

  • Batching of default aspects on initial ingestion (SQL)
  • Improvements to multi-threading. Ingestion recipes that were previously reduced to 1 thread can be restored to the 15-thread default.
  • Gradle 7 upgrade moderately improves build speed
  • DataHub Ingestion slim images reduced in size by 2GB+
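To illustrate the multi-threading note above, here is a minimal sketch that removes a single-thread override from a recipe so the default applies again. It assumes the recipe is loaded as a plain dictionary and that the thread count lives under the sink's `max_threads` option; check that key name against your own recipe. The helper `restore_default_threads` is hypothetical.

```python
import copy


def restore_default_threads(recipe: dict) -> dict:
    """Return a copy of the recipe without a 1-thread sink override.

    Assumption: the override is stored at recipe["sink"]["config"]["max_threads"].
    Any other explicit value is left untouched.
    """
    updated = copy.deepcopy(recipe)
    sink_config = updated.get("sink", {}).get("config", {})
    if sink_config.get("max_threads") == 1:
        # Dropping the key lets the sink fall back to its default thread count.
        del sink_config["max_threads"]
    return updated


if __name__ == "__main__":
    recipe = {
        "source": {"type": "snowflake", "config": {}},
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080", "max_threads": 1},
        },
    }
    print(restore_default_threads(recipe)["sink"]["config"])
```

The input recipe is not modified, so the original can be kept for comparison or rollback.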

Important Bug Fixes

  • Glue Schema Registry fixed

Deprecation Notice

  • MAE Events are no longer produced. MAE events have been deprecated for over a year.

What's Changed

  • feat(ingest/presto-on-hive): enable partition key for presto-on-hive by @zheyu001 in #8380
  • feat(classification): allow parallelisation to reduce time by @mayurinehate in #8368
  • feat(ingest): Add metabase database id to platform instance mapping by @k-popov in #8359
  • feat(ingest): add ability to read other method types than GET for OAS ingest recipes by @jsmilkstein in #8303
  • fix(ingest): fix data platform urn in dataset_urn_to_key and dataset_key_to_urn by @Masterchen09 in #8209
  • fix(ingest/s3): wrong sorting in case of multi-partition key by @anshbansal in #8536
  • fix(ingest/presto): fix presto on hive test failures by @hsheth2 in #8548
  • Cypress test for managing groups by @kkorchak in #8520
  • feat(ingest/kafka-connect): add support for Confluent S3 Sink Connector by @tusharm in #8298
  • Variable rename - Allows deselection of members in add members modal for a group by @Sukeerthi31 in #8529
  • fix(ingest/s3): catch no such bucket exception instead of failing by @anshbansal in #8549
  • fix(ingest): add tableau sqlglot dep by @hsheth2 in #8552
  • fix(ingetion/mssql): convert dataset urns to lowercase by @siddiquebagwan in #8551
  • Fix flaky add_user smoke test by @kkorchak in #8471
  • feat(ci): use docker registry cache by @hsheth2 in #8544
  • fix(glue): restore glue configurations by @RyanHolstien in #8533
  • build(release): Update files for 0.10.5 release by @iprentic in #8556
  • docs(release): Update updating-datahub.md for 0.10.5 release by @iprentic in #8557
  • feat(ingestion/snowflake): use user email-id in urn generation for top users stat by @siddiquebagwan in #8513
  • docs(development.md): Minor grammatical error by @PauloGoncalvesLima in #8558
  • fix(usage): Update index lifecycle policy to not delete old datahub usage events by @iprentic in #8565
  • fix(ui): Simplify background color for Entity Health Status popover by @jjoyce0510 in #8559
  • fix: add --write args on pre-commit prettier by @yoonhyejin in #8560
  • docs(observe): Add feature doc for Freshness Assertions by @jjoyce0510 in #8547
  • docs(updating): add details on Unified Search & Browse experience by @maggiehays in #8568
  • fix: fix features section by @yoonhyejin in #8571
  • feat(ingest): allow lower freq profiling based on date of month/day of week by @anshbansal in #8489
  • fix(stats): default to 3 months by @anshbansal in #8566
  • fix(aspect): count query only for relevant aspect index by @iprentic in #8569
  • feat(quickstart): bump quickstart start periods more by @hsheth2 in #8573
  • Origin/cypress test for managing policies by @kkorchak in #8554
  • feat(ui) Show source documentation when editing entity documentation by @chriscollins3456 in #8516
  • fix(ingest): handle redaction of configs with int keys by @hsheth2 in #8545
  • fix(ingest/snowflake): maintain qualified name casing, do not lowercase by @mayurinehate in #8574
  • feat(docs): add github repo links to readme and docs by @yoonhyejin in #8422
  • feat(ebean): Add metric in ebean aspect DAO for failed tries, as well as failed operation… by @iprentic in #8576
  • refactor(search) Use search across multiple-entities API, deprecate Aggregator classes by @iprentic in #8498
  • feat(siblings): dont show multiple platform icons if the siblings are ghost nodes by @gabe-lyons in #8543
  • docs(lineage): Add description to make_lineage_mce by @eboneil in #8596
  • doc(ingest/log): failure log at pipeline level document by @anshbansal in #8591
  • Dataset ownership test by @kkorchak in #8583
  • doc(release): release notes for 0.2.10 by @anshbansal in #8599
  • docs(release): fix typo by @anshbansal in #8600
  • feat(ui): apply views to: domains, containers, terms by @eboneil in #8572
  • feat(search): embedded view dropdown by @joshuaeilers in #8598
  • fix(ingest/file): remove entity_type_counts and aspect_counts by @hsheth2 in #8586
  • fix(ingest): use hive pure_sasl variant by @hsheth2 in #8570
  • Feat(ingest/ldap)fix list index out of range error by @alplatonov in #8525
  • harden autocomplete test by @joshuaeilers in #8603
  • feat(ui/graphql) Add ability to sort search results from search results page by @chriscollins3456 in #8595
  • fix(ingest): Add client_certificate_path for rest client cert instead of ca_certif… by @mkamalas in #8581
  • refactor(graphql): extract code into metadata-io part 1 by @anshbansal in #8607
  • docs(ingest): update s3 and gcs doc with concept mapping by @mayurinehate in #8575
  • Fix(ingestion/clickhouse) move to two tier sqlalchemy by @alplatonov in #8300
  • fix(cypress): attempt to fix autocomplete test by @joshuaeilers in #8619
  • fix(cleanup): cleanup of 2 sub-modules by @anshbansal in #8616
  • docs(ingsetion/csv-enricher): fix sample csv mentioned in Docstrings by @siddiquebagwan in #8432
  • feat(ingest): allow relative start time config by @mayurinehate in #8562
  • fix(ingest/airflow): make inlets work again by @hsheth2 in #8631
  • feat(ingest/s3): Adding option to pass in any spark config property to s3 source by @treff7es in #8621
  • feat(impact analysis): allow deep linking of url params in impact analysis by @gabe-lyons in #8617
  • feat(ui) Display combined sibling results in search + 2 minor updates by @chriscollins3456 in #8602
  • feat(ui) Display consistent search results in embedded searches by @chriscollins3456 in #8597
  • feat(ingest): Add DataHub source by @asikowitz in #8561
  • fix(ingest/okta): fix event_loop RuntimeError with nested asyncio by @skrydal in #8637
  • fix(ingest/kafka): use SchemaReference properties instead of dict access by @Deepankarkr in #8615
  • feat(ingestion/ldap): flag to ingest ldap users with email instead of username by @Deepankarkr in #8606
  • Combine siblings in autocomplete by @joshuaeilers in #8610
  • fix(ingest): avoid mutable defaults in powerbi dataclass by @hsheth2 in #8609
  • chore(spring): upgrade minor versions of spring components by @david-leifker in #8627
  • docs(quickstart): quickstart documentation, clarification on production by @david-leifker in #8628
  • feat(datahub-ingestion): refactor datahub ingestion slim images by @david-leifker in #8515
  • bug(8584): emit data_platform_instance aspect if the config has platform_instance by @jinlintt in #8585
  • chore(snappy): fix snappy version constraint by @david-leifker in #8629
  • chore(hazelcast): update hazelcast version by @david-leifker in #8633
  • feat(graphql) Support exists operator in GraphQL Search API by @jjoyce0510 in #8652
  • [fix] [health ui] Removing ghost 0 for health signals on search cards by @jjoyce0510 in #8587
  • fix(data products): removing data products filter in search as its not indexed on entity documents by @gabe-lyons in #8650
  • feat(ingest/bigquery): add tag to BigQuery clustering columns by @ANich in #8495
  • fix(ingest/snowflake): fix usage enum bug by @hsheth2 in #8649
  • feat(ingest/dbt-cloud): use job-based graphql queries by @hsheth2 in #8647
  • Add and remove documentation and link for dataset by @kkorchak in #8604
  • Lineage column level test by @kkorchak in #8641
  • tests(search): search golden tests by @eboneil in #8605
  • Add test case for dataset deprecation test by @kkorchak in #8646
  • docs(ingest/kafka-connect): add details on platform instance mapping by @mayurinehate in #8654
  • docs(ingest/airflow): add capture_executions to docs by @hsheth2 in #8662
  • Fix a few view select issues by @joshuaeilers in #8670
  • feat(search): Add word gram analyzer for name fields by @iprentic in #8611
  • fix(docker): misc docker fixes by @david-leifker in #8677
  • tests(search): more golden tests by @eboneil in #8683
  • test(ingest/vertica): Skip integration test failing CI; support arm Macs by @asikowitz in #8694
  • ci: add needs_artifact_download output for ingestion image by @hsheth2 in #8695
  • logs(ingestion/unity): Hide stack trace on sql parse failure logs by @asikowitz in #8657
  • feat(ingestion/powerbi): support multiple tables as upstream in native SQL parsing by @siddiquebagwan-gslab in #8592
  • build(ingest): Bump pydantic pin by @asikowitz in #8660
  • remove(ingest/snowflake): Remove legacy snowflake lineage by @asikowitz in #8653
  • fix(ingest/ldap): Handle case when 'objectClass' not in attrs by @asikowitz in #8658
  • fix(ui) Remove new Role entity from searchable entity types by @chriscollins3456 in #8655
  • fix(java) Use alias for name search sorting and fix missing mappings by @chriscollins3456 in #8648
  • feat(ui) Create page for managing home page posts by @chriscollins3456 in #8707
  • fix(ingest/powerbi): add sqlglot python dep by @hsheth2 in #8704
  • ci(ingest): make ingestion caching rules correct by @hsheth2 in #8685
  • fix(cleanup): cleanup of 1 sub-module by @anshbansal in #8678
  • fix(policies): fix concurrent modification exception by @RyanHolstien in #8681
  • fix(ingest/bigquery): Add config option to create DataPlatformInstance, default off by @asikowitz in #8659
  • feat(ingest/looker): Record observed lineage timestamps for Looker and LookML sources by @ANich in #7735
  • feat(ingest/mssql): load jobs and stored procedures by @RChygir in #5363
  • fix(ingestion/kafka-connect): update retrieval of database name in Debezium SQL Server by @Starkie in #8608
  • feat(ingest/snowflake): tables from snowflake shares as siblings by @mayurinehate in #8531
  • feat(ingest/sql-queries): Add sql queries source, SqlParsingBuilder, sqlglot_lineage performance optimizations by @asikowitz in #8494
  • highlight matched fields in search results by @joshuaeilers in #8651
  • Add links to glossary term cards without counts by @joshuaeilers in #8705
  • fix non sibling document links by @joshuaeilers in #8724
  • refactor(policies): Rename edit all privilege to edit entity by @jjoyce0510 in #8722
  • feat(java/ui) Add search suggestions to our search experience by @chriscollins3456 in #8710
  • fix(cypress) Fix login.js cypress test by @chriscollins3456 in #8719
  • Fixes for faling login.js and managing_groups.js Cypress tests by @kkorchak in #8725
  • fix(kafka-setup): remove dependency confluent docker utils by @lix-mms in #8715
  • docs(docs): add native versioning by @yoonhyejin in #8714
  • config(ingest/rest): Update rest sink defaults to retry more often by @asikowitz in #8729
  • chore(jackson): update to released version of jackson by @david-leifker in #8674
  • fix(examples): fix typo in business glossary bootstrap yml by @mayurinehate in #8703
  • fix(schemaRegistry): change api servlet check to only apply to internal to fix glue support by @RyanHolstien in #8693
  • fix(ingest): stateful redundant run skip handler by @mayurinehate in #8467
  • fix(superset): get alternate platform value if sqlalchemy_uri param is missing by @akhil7philip in #8667
  • feat(ingest): support writing configs to files by @hsheth2 in #8696
  • feat(search): De-duplicate scale factors across entities by @iprentic in #8718
  • test(lineage): Add test for scroll across lineage by @iprentic in #8728
  • feat(ingest/metabase): detect source table for cards sourced from other cards by @k-popov in #8577
  • (ingestion) bug fix: emit platform instance aspect for dataset in Databricks ingestion by @jinlintt in #8671
  • feat(config): Turn on new search & browse experience by default by @iprentic in #8737
  • chore(ingest/s3) Bump Deequ and Pyspark version by @treff7es in #8638
  • docs(ingest/openapi): Downgrade status from CERTIFIED to INCUBATING by @asikowitz in #8736
  • feat(health): Adding Entity Health Status to the Lineage Graph View by @jjoyce0510 in #8739
  • build(ingest): Pin mypy-boto3-sagemaker directly by @asikowitz in #8746
  • feat(ingest/datahub): Improvements, bug fixes, and docs by @asikowitz in #8735
  • docs(obseve): Adding Volume Assertion Guide by @jjoyce0510 in #8706
  • fix(ingest/okta): Removed code closing okta's event_loop by @skrydal in #8675
  • fix(highlight): disable full name highlight by @joshuaeilers in #8750
  • fix(ui): hide pages from web crawlers by @hsheth2 in #8738
  • docs: add index pages for feature/deployment guides by @hsheth2 in #8723
  • feat(docs): move versioned_sidebars to static-assets by @yoonhyejin in #8743
  • docs(observe): DataHub Operation freshness assertion guide by @zmcnellis in #8749
  • feat(cli): support recursive deletes by @hsheth2 in #8709
  • fix(ingest/bigquery): Handle null view_definition; remove view definition hash ids by @asikowitz in #8747
  • feat(ingest/usage): Make cumulative query character limit configurable by @asikowitz in #8751
  • fix(ingest/athena): Fixing db container id by @treff7es in #8689
  • feat(systemMetadata): add pipeline names to system metadata by @hsheth2 in #8684
  • ci: separate airflow build and test by @mayurinehate in #8688
  • fix(ingest/athena): fix container linting by @hsheth2 in #8761
  • fix(datahub-frontend) Give permission for start.sh so it can run by @rtekal in #8594
  • feat(sql-parser): schema-aware output column casing by @hsheth2 in #8760
  • fix(ingest/bigquery): Filter out fine grained lineage with no upstreams by @asikowitz in #8758
  • feat(iceberg): Upgrade Iceberg ingestion source to pyiceberg 0.4.0 by @cccs-eric in #8357
  • Allow frontend to use http proxy by @githendrik in #8691
  • docs(observe): Dataset Profile volume assertion guide by @zmcnellis in #8764
  • docs:fix broken img links under managed-datahub by @yoonhyejin in #8769
  • fix:small typo on graphql tutorial by @yoonhyejin in #8741
  • refactor(build): upgrade to gradle 7 & guava update by @david-leifker in #8745
  • fix(siblings): space icons out by @joshuaeilers in #8767
  • chore(build): upgrade gradle wrapper by @hsheth2 in #8776
  • feat(EntityService): batched transactions and ebean updates by @david-leifker in #8456
  • fix(frontend): Fix"Logout with OIDC not working" by @FirKys in #8773
  • docs:upgrade docusaurus version by @yoonhyejin in #8770
  • fix:change global graph url to static-assets by @yoonhyejin in #8742
  • doc(tests): fix endpoint param to push results by @anshbansal in #8783
  • fix(elastic): improve error handling for profiling by @anshbansal in #8785
  • chore(analytics): bump version by @joshuaeilers in #8786
  • docs(session): add documentation for session token duration and fix default by @RyanHolstien in #8791
  • fix(ingest/datahub): Support postgres; build(postgres): Modernize postgres docker setup by @asikowitz in #8762
  • feat(airflow-plugin): add package type information by @mayurinehate in #8795
  • feat(systemMetadata): Adding a lastRunId field system metadata by @jjoyce0510 in #8672
  • added support for group-owners in dataflow entities by @dnks23 in #8154
  • fix(ingest/tableau): fix tableau native CLL for snowflake, add type annotations by @mayurinehate in #8779
  • fix(ingest/bigquery): fix partition and median queries for profiling by @mayurinehate in #8778
  • docs: add datahub source to integrations page by @hsheth2 in #8787
  • chore(ingest): upgrade sqlglot fork by @hsheth2 in #8775
  • docs: minor fix on versioning navbar and dropdown by @jeffmerrick in #8790
  • feat(ingest): drop sql_metadata parser by @hsheth2 in #8765
  • fix(ingest): drop wrap_aspect_as_workunit method by @hsheth2 in #8766
  • feat(search): Also de-duplicate the field queries based on field names by @iprentic in #8788
  • feat(openapi): entity endpoints & analytics raw by @david-leifker in #8537
  • docs(db-retention): update with default setting by @david-leifker in #8797
  • fix(custom-search): fix custom search to be able to use unquoted query by @david-leifker in #8805
  • feat: add feedback widget by @yoonhyejin in #8732
  • fix(gms): Fixed Recently Viewed section for users with '@' in the URN. by @skrydal in #8754
  • fix(spark-test): upgrade gradle and fix spark smoke test by @david-leifker in #8777

New Contributors

Full Changelog: v0.10.5...v0.11.0
