DataHub v1.5.0 Release Notes

Helm Chart Requirement: 0.9.2
Full technical release notes: Updating DataHub

Product Features

The V1 UI is officially sunset; all development now targets the V2 UI. Ensure THEME_V2_ENABLED=true and THEME_V2_DEFAULT=true are set.
Multiple data products per asset (backend and UI).
Policy targeting by Glossary Terms and Groups.
Domain-scoped policies now include child domain assets.
New datahub search CLI command with semantic search, query projection, and agent-context integration.
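Domain-scoped policies now extend to assets in child domains. One way to picture the scope check is a walk up the domain hierarchy: a policy on a domain applies to an asset if the asset's domain is that domain or any of its descendants. The sketch below is illustrative only; `domain_in_scope`, the parent map, and the domain names are hypothetical and are not DataHub's policy engine.

```python
from typing import Dict, Optional


def domain_in_scope(
    asset_domain: str,
    policy_domain: str,
    parent_of: Dict[str, Optional[str]],
) -> bool:
    """Return True if asset_domain is policy_domain or one of its descendants.

    parent_of is a hypothetical map from each domain to its parent (None for roots).
    """
    seen = set()
    current: Optional[str] = asset_domain
    # Walk from the asset's domain up toward the root of the hierarchy.
    while current is not None and current not in seen:
        if current == policy_domain:
            return True
        seen.add(current)  # guard against accidental cycles in the map
        current = parent_of.get(current)
    return False


# Made-up hierarchy: finance.payroll is a child of finance.
parents = {"finance": None, "finance.payroll": "finance", "marketing": None}
print(domain_in_scope("finance.payroll", "finance", parents))  # True
print(domain_in_scope("marketing", "finance", parents))  # False
```

Note the direction: a policy on a child domain does not reach assets in its parent, because the walk only moves upward from the asset.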

Platform

Java 17 runtime required. Spark upgraded to 3.3.4, Hadoop to 3.3.6. Spark lineage users must be on Spark 3.3.0+.
Default token signing key & salt removed. Operators must explicitly set DATAHUB_TOKEN_SERVICE_SIGNING_KEY and DATAHUB_TOKEN_SERVICE_SALT. Helm users are unaffected.
Version history requires the retention service: when it is not enabled, only the current version (v0) is retained.
TLS 1.0 and 1.1 disabled for the frontend when custom truststores are used.
Elasticsearch reindex and index-creation operations now retry, improving upgrade resilience.
Optional Kubernetes scale-down during system-update for blocking upgrades such as reindexing (disabled by default).
SDK: emit_mcps() now returns List[TraceData] instead of int. Trace IDs exposed for SYNC_PRIMARY and ASYNC modes.
Reproducible ingestion Docker builds via pinned transitive dependencies (uv.lock, constraints.txt).
Python dependencies are migrating from setup.py to pyproject.toml (PEP 621); setup.py remains the file to edit for now.
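Operators not using Helm can add a pre-flight check so a deployment fails fast when the token signing key or salt is missing. This is a minimal sketch: `missing_token_vars` and `generate_secret` are hypothetical helpers, and the sketch assumes any high-entropy string is an acceptable value; check the DataHub deployment documentation for format or length requirements.

```python
import os
import secrets
from typing import List, Mapping, Optional

REQUIRED_TOKEN_VARS = (
    "DATAHUB_TOKEN_SERVICE_SIGNING_KEY",
    "DATAHUB_TOKEN_SERVICE_SALT",
)


def missing_token_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required token variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_TOKEN_VARS if not env.get(name, "").strip()]


def generate_secret(num_bytes: int = 32) -> str:
    """Return a random URL-safe string (assumed acceptable as a key or salt)."""
    return secrets.token_urlsafe(num_bytes)


if __name__ == "__main__":
    missing = missing_token_vars()
    for name in missing:
        # Suggested values only; store them in a secret manager, not shell history.
        print(f"export {name}={generate_secret()}")
    if not missing:
        print("Token signing key and salt are set.")
```

Generated values should be stored once and reused across restarts; rotating the signing key would likely invalidate tokens signed with the previous key.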
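Code that treated the old integer return value of emit_mcps() as a count (for example, `total += emitter.emit_mcps(mcps)`) raises a TypeError once the method returns a list. The helper below is a hypothetical compatibility shim that accepts either return shape. `TraceData` here is a local stand-in with an assumed `trace_id` field, not an import of the SDK class, whose actual fields may differ.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class TraceData:
    """Local stand-in for the SDK's TraceData; only trace_id is assumed here."""

    trace_id: str
    data: Dict[str, List[str]] = field(default_factory=dict)


EmitResult = Union[int, List[TraceData]]


def trace_ids_from_result(result: EmitResult) -> List[str]:
    """Extract trace IDs from an emit_mcps() return value.

    Pre-1.5 SDKs returned an int, which carries no trace IDs, so an empty
    list is returned in that case.
    """
    if isinstance(result, int):
        return []
    return [item.trace_id for item in result]


# A value shaped like the new return type (the IDs are made up).
new_style = [TraceData("trace-1"), TraceData("trace-2")]
print(trace_ids_from_result(new_style))  # ['trace-1', 'trace-2']
print(trace_ids_from_result(5))  # []
```

Because trace IDs are only mentioned for the SYNC_PRIMARY and ASYNC modes, code running in other emit modes should not assume the returned entries carry usable trace IDs.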

Ingestion

Power BI: M-Query lineage rewritten to use Microsoft's official parser; the behavior of native_query_parsing: false has changed.
SQL view query IDs are now SHA-256 hashes, so previously ingested query entities become orphaned; use stateful ingestion to clean them up.
Oracle: In multitenant setups, URNs now use the PDB name instead of the CDB name when connecting via service_name.
Fabric: OneLake workspace containers moved from the fabric-onelake platform to the fabric platform.
Vertex AI: Pipeline URNs restructured to produce stable DataFlow entities; ML Metadata extraction is now enabled by default and requires additional GCP permissions.
DataHub source: Secrets, ingestion sources, and execution requests are now excluded by default via URN pattern filtering.
Kafka Connect: The Debezium SQL Server platform changed from sqlserver to mssql.
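Query entities are orphaned because their IDs are derived differently: the same view now maps to a new query URN, and the entity under the old URN is never re-emitted. The sketch below illustrates this with Python's hashlib. The exact input DataHub hashes and the `legacy_query_id` scheme are hypothetical, chosen only to show that changing the derivation changes the ID while SHA-256 itself stays deterministic.

```python
import hashlib


def legacy_query_id(view_urn: str) -> str:
    # Hypothetical pre-1.5 scheme: an ID built directly from the view URN.
    return f"view_{view_urn}"


def sha256_query_id(view_urn: str) -> str:
    # Hypothetical post-1.5 scheme: a SHA-256 hex digest of the same input.
    return hashlib.sha256(view_urn.encode("utf-8")).hexdigest()


view = "urn:li:dataset:(urn:li:dataPlatform:snowflake,db.schema.my_view,PROD)"
old_urn = f"urn:li:query:{legacy_query_id(view)}"
new_urn = f"urn:li:query:{sha256_query_id(view)}"

# The same view yields a different query URN, so the old entity is orphaned
# unless stateful ingestion soft-deletes entities it no longer emits.
print(old_urn != new_urn)  # True
print(len(sha256_query_id(view)))  # 64
```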
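The DataHub source's new default exclusions work by matching URNs against patterns. A minimal sketch of regex-based deny filtering follows; the patterns are assumptions keyed on the entity type names dataHubSecret, dataHubIngestionSource, and dataHubExecutionRequest, and `filter_urns` is hypothetical, not the connector's actual config field or defaults.

```python
import re
from typing import Iterable, List

# Assumed deny patterns keyed on the entity type segment of the URN; the
# connector's real defaults and config field names may differ.
DEFAULT_DENY = (
    r"^urn:li:dataHubSecret:",
    r"^urn:li:dataHubIngestionSource:",
    r"^urn:li:dataHubExecutionRequest:",
)


def filter_urns(urns: Iterable[str], deny: Iterable[str] = DEFAULT_DENY) -> List[str]:
    """Keep only URNs that match none of the deny patterns."""
    compiled = [re.compile(pattern) for pattern in deny]
    return [urn for urn in urns if not any(p.search(urn) for p in compiled)]


sample = [
    "urn:li:dataset:(urn:li:dataPlatform:hive,db.orders,PROD)",
    "urn:li:dataHubSecret:snowflake_password",
    "urn:li:dataHubExecutionRequest:abc123",
]
print(filter_urns(sample))
# ['urn:li:dataset:(urn:li:dataPlatform:hive,db.orders,PROD)']
```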

dbt: Ingestion of semantic models and exposures; new convert_urns_to_lowercase option for case-insensitive platforms.
Snowflake: Metadata pattern pushdown, table type filtering, external DMF assertion ingestion.
Power BI: Column-level lineage enabled by default.
Kafka Connect: Debezium and Confluent JDBC sink connector support; a bundled JVM removes the need for a system Java installation.
SQL parsing: Major performance improvements to CTE and subquery join resolution across all SQL-based connectors.
Mode: Concurrent API fetching, response caching, SQL parsing optimizations.
Trino: Column-level lineage on upstream datasets.
Iceberg: Ingestion-time domain assignment.
Azure Data Factory: Column lineage for Copy activity.
Airflow plugin: Multi-statement SQL parsing for lineage.
Sigma: Workbook filtering.
BigQuery: convert_column_urns_to_lowercase option.
Kafka source: Option to disable Avro schema name validation.
Great Expectations & SQLAlchemy profilers brought to feature parity.
Browse paths: DataFlow/DataJob entities get browsePathsV2 with platform instance when configured.
Vertex AI: Cross-platform lineage, hierarchical UI folders, stateful ingestion for large projects.
Oracle: Fixed container naming when connecting via service_name.
Configurable report sample sizes and richer failure logging.
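The lowercase options (dbt's convert_urns_to_lowercase, BigQuery's convert_column_urns_to_lowercase) matter because URNs are compared as exact strings: if a case-insensitive platform reports one table with different casing in two places, DataHub sees two entities and lineage does not connect. A minimal sketch, assuming lowercasing is applied to the dataset name inside the URN; `dataset_urn` is a hypothetical helper and the table names are made up. The same idea applies to column (schema field) URNs.

```python
def dataset_urn(platform: str, name: str, env: str = "PROD", lowercase: bool = False) -> str:
    """Build a dataset URN string, optionally lowercasing the dataset name."""
    if lowercase:
        name = name.lower()
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# dbt might report a relation with different casing than the warehouse does.
from_dbt = dataset_urn("snowflake", "Analytics.Public.Orders", lowercase=True)
from_warehouse = dataset_urn("snowflake", "analytics.public.orders")
print(from_dbt == from_warehouse)  # True

# Without lowercasing, the two URNs differ and lineage would not join up.
print(dataset_urn("snowflake", "Analytics.Public.Orders") == from_warehouse)  # False
```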

Vertex AI: region is replaced by regions and project_id by project_ids; the old fields still work.
Vertex AI: normalize_external_dataset_paths will default to true in the next major version.
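The Vertex AI field rename keeps older configs working. A common way to implement that kind of backward compatibility is to fold the deprecated singular field into the new list field. The sketch below uses a plain dict and a hypothetical `normalize_vertexai_config` helper; it is not the connector's actual config class, and letting an explicit list win when both fields are set is an assumption.

```python
import warnings
from typing import Any, Dict


def normalize_vertexai_config(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Fold deprecated region/project_id into regions/project_ids (hypothetical)."""
    cfg = dict(raw)  # avoid mutating the caller's dict
    for old, new in (("region", "regions"), ("project_id", "project_ids")):
        if old in cfg:
            value = cfg.pop(old)
            warnings.warn(f"'{old}' is deprecated; use '{new}'", DeprecationWarning)
            # Assumption: an explicit list in the new field takes precedence.
            cfg.setdefault(new, [value])
    return cfg


print(normalize_vertexai_config({"project_id": "my-project", "region": "us-central1"}))
# {'regions': ['us-central1'], 'project_ids': ['my-project']}
```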

Full Changelog: v1.4.0.3...v1.5.0