github datahub-project/datahub v1.5.0

latest release: v1.5.0.1
8 days ago

DataHub v1.5.0 Release Notes

Helm Chart Requirement: 0.9.2
Full technical release notes: Updating DataHub

Product Features

  • V1 UI officially sunset. All development targets V2 UI going forward. Ensure THEME_V2_ENABLED=true and THEME_V2_DEFAULT=true.
  • Multiple data products per asset (backend and UI).
  • Policy targeting by Glossary Terms and Groups.
  • Domain-scoped policies now include child domain assets.
  • datahub search CLI with semantic search, query projection, and agent-context integration.
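
With the V1 UI sunset, a quick pre-upgrade check of the two theme flags may help. The following is a minimal sketch only: the helper name unset_theme_v2_flags is hypothetical, and it assumes the flags are ordinary environment variables whose expected value is the string "true". Where they are set depends on your deployment.

```python
import os
from typing import List, Mapping, Optional

# Flag names taken from the release notes; the expected value "true" is as stated there.
REQUIRED_THEME_FLAGS = ("THEME_V2_ENABLED", "THEME_V2_DEFAULT")


def unset_theme_v2_flags(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the theme flags that are missing or not set to "true".

    Hypothetical helper: reads os.environ unless a mapping is passed in.
    """
    env = os.environ if env is None else env
    return [
        name
        for name in REQUIRED_THEME_FLAGS
        if env.get(name, "").strip().lower() != "true"
    ]
```

An empty list means both flags are set as the notes recommend.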

Platform

  • Java 17 runtime required. Spark upgraded to 3.3.4, Hadoop to 3.3.6. Spark lineage users must be on Spark 3.3.0+.
  • Default token signing key & salt removed. Operators must explicitly set DATAHUB_TOKEN_SERVICE_SIGNING_KEY and DATAHUB_TOKEN_SERVICE_SALT. Helm users are unaffected.
  • No version history without the retention service: only the current version (v0) is retained when the retention service is not enabled.
  • TLS 1.0/1.1 disabled on frontend custom truststores.
  • Elasticsearch reindex/index-creation retries for improved upgrade resilience.
  • Kubernetes: optional scale-down during system-update for blocking upgrades such as reindexing. Disabled by default.
  • SDK: emit_mcps() now returns List[TraceData] instead of int. Trace IDs exposed for SYNC_PRIMARY and ASYNC modes.
  • Reproducible ingestion Docker builds via pinned transitive dependencies (uv.lock, constraints.txt).
  • Python deps migrating from setup.py to pyproject.toml (PEP 621); setup.py still the editing source for now.
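
Callers that previously treated the return value of emit_mcps() as an int may need a small compatibility shim while they support both SDK versions. The sketch below is an assumption-laden illustration: normalize_emit_result is a hypothetical helper, it does not import the DataHub SDK, and it treats the new return value simply as a list of opaque trace objects without assuming any TraceData attributes.

```python
from typing import Any, List, Union


def normalize_emit_result(result: Union[int, List[Any]]) -> List[Any]:
    """Normalize the value returned by emit_mcps() across SDK versions.

    Hypothetical helper: older SDKs returned an int (no trace data available),
    newer ones return a list of trace objects per the release notes.
    """
    if isinstance(result, int):
        # Legacy return type: there are no traces to report.
        return []
    # New return type: copy into a plain list so callers can iterate safely.
    return list(result)
```

Downstream code can then iterate over the returned list regardless of which SDK version produced the result.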

Ingestion

New Connectors

  • RDF, Snowplow, Apache Doris

Breaking Changes (see migration guide for details)

  • Power BI: M-Query lineage rewritten using Microsoft's official parser. The behavior of native_query_parsing: false has changed.
  • SQL view query IDs now use SHA-256 hashes, so query entities created with the old IDs become orphaned. Use stateful ingestion to clean them up.
  • Oracle multitenant URNs now use PDB name instead of CDB name when connecting via service_name.
  • Fabric OneLake workspace containers moved to fabric platform (from fabric-onelake).
  • Vertex AI pipeline URNs restructured for stable DataFlow entities; ML Metadata extraction enabled by default (requires additional GCP permissions).
  • DataHub source now uses URN pattern filtering to exclude secrets, ingestion sources, and execution requests by default.
  • Kafka Connect Debezium SQL Server platform changed from sqlserver to mssql.
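
To see why switching the ID scheme orphans old query entities, consider a hash-derived identifier: any change to the hashing function yields a different ID for the same SQL text. The sketch below is illustrative only. sketch_query_id is a hypothetical function, and the whitespace normalization is an assumption; it is not DataHub's actual fingerprinting logic.

```python
import hashlib


def sketch_query_id(sql: str) -> str:
    """Derive a stable hex ID from SQL text using SHA-256.

    Illustrative only: collapses runs of whitespace before hashing, which is
    an assumed normalization and not the connector's real algorithm.
    """
    normalized = " ".join(sql.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

The same view definition always maps to the same 64-character ID under this scheme, but that ID shares nothing with one produced by a different hash, which is why the previously ingested entities need a cleanup pass.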

Enhancements

  • dbt: Semantic model and exposures ingestion; convert_urns_to_lowercase option for case-insensitive platforms.
  • Snowflake: Metadata pattern pushdown, table type filtering, external DMF assertion ingestion.
  • Power BI: Column-level lineage enabled by default.
  • Kafka Connect: Debezium and Confluent JDBC sink connector support; bundled JVM removes system Java requirement.
  • SQL parsing: Major CTE/subquery join resolution performance improvements across all SQL-based connectors.
  • Mode: Concurrent API fetching, response caching, SQL parsing optimizations.
  • Trino: Column-level lineage on upstream datasets.
  • Iceberg: Ingestion-time domain assignment.
  • Azure Data Factory: Column lineage for Copy activity.
  • Airflow plugin: Multi-statement SQL parsing for lineage.
  • Sigma: Workbook filtering.
  • BigQuery: convert_column_urns_to_lowercase option.
  • Kafka source: Option to disable Avro schema name validation.
  • Great Expectations & SQLAlchemy profilers brought to feature parity.
  • Browse paths: DataFlow/DataJob entities get browsePathsV2 with platform instance when configured.
  • Vertex AI: Cross-platform lineage, hierarchical UI folders, stateful ingestion for large projects.
  • Oracle: Fixed container naming with service_name.
  • Configurable report sample sizes and richer failure logging.

Deprecations

  • Vertex AI: region → regions, project_id → project_ids. Old fields still work.
  • Vertex AI: normalize_external_dataset_paths will default to true in the next major version.
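
Although the old Vertex AI fields still work, recipes can be moved to the new names ahead of their removal. The following is a rough sketch, assuming the source config is available as a plain dict and that the plural fields accept a list of values (an assumption inferred from the names, not confirmed by these notes). migrate_vertexai_config is a hypothetical helper.

```python
from typing import Any, Dict

# Old field name -> new field name, as listed under Deprecations.
RENAMED_FIELDS = (("region", "regions"), ("project_id", "project_ids"))


def migrate_vertexai_config(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the config with deprecated fields renamed.

    Assumption: each plural field takes a list, so a single old value is
    wrapped in a one-element list. An existing new-style field wins over
    the deprecated one.
    """
    migrated = dict(config)  # shallow copy; the input is left untouched
    for old, new in RENAMED_FIELDS:
        if old in migrated:
            value = migrated.pop(old)
            migrated.setdefault(new, [value])
    return migrated
```

For example, a config with region set to "us-central1" would come back with regions set to ["us-central1"] and no region key.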

Full Changelog: v1.4.0.3...v1.5.0
