DataHub v1.5.0 Release Notes
Helm Chart Requirement: 0.9.2
Full technical release notes: Updating DataHub
Product Features
- V1 UI officially sunset. All development targets V2 UI going forward. Ensure THEME_V2_ENABLED=true and THEME_V2_DEFAULT=true.
- Multiple data products per asset (backend and UI).
- Policy targeting by Glossary Terms and Groups.
- Domain-scoped policies now include child domain assets.
- datahub search CLI with semantic search, query projection, and agent-context integration.
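With the V1 UI sunset, a quick pre-upgrade check of the two theme flags can catch a misconfigured deployment. The following is a minimal sketch, not part of DataHub: the helper name missing_theme_flags is hypothetical, and it assumes the flags are visible as plain environment variables to the process running the check.

```python
import os

# Flag names and expected values are taken from the release note above.
REQUIRED_THEME_FLAGS = {"THEME_V2_ENABLED": "true", "THEME_V2_DEFAULT": "true"}


def missing_theme_flags(env):
    """Return the sorted names of flags that are unset or not equal to 'true'.

    `env` is any mapping of variable names to string values, e.g. os.environ.
    The comparison ignores case and surrounding whitespace (an assumption
    about how lenient the check should be).
    """
    return sorted(
        name
        for name, expected in REQUIRED_THEME_FLAGS.items()
        if env.get(name, "").strip().lower() != expected
    )


problems = missing_theme_flags(os.environ)
if problems:
    print("Theme flags not set to true:", ", ".join(problems))
```

Passing a plain dict instead of os.environ makes the check easy to run against values parsed from a deployment file.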
Platform
- Java 17 runtime required. Spark upgraded to 3.3.4, Hadoop to 3.3.6. Spark lineage users must be on Spark 3.3.0+.
- Default token signing key & salt removed. Operators must explicitly set DATAHUB_TOKEN_SERVICE_SIGNING_KEY and DATAHUB_TOKEN_SERVICE_SALT. Helm users are unaffected.
- Retention service disabled = no version history. Only the current version (v0) is retained when the retention service is not enabled.
- TLS 1.0/1.1 disabled on frontend custom truststores.
- Elasticsearch reindex/index-creation retries for improved upgrade resilience.
- Kubernetes optional scale-down during system-update for blocking upgrades like reindexing. Disabled by default.
- SDK: emit_mcps() now returns List[TraceData] instead of int. Trace IDs exposed for SYNC_PRIMARY and ASYNC modes.
- Reproducible ingestion Docker builds via pinned transitive dependencies (uv.lock, constraints.txt).
- Python deps migrating from setup.py to pyproject.toml (PEP 621); setup.py remains the editing source for now.
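Callers that stored or compared the old int result of emit_mcps() will need to handle the new list return. Below is a minimal sketch of a shim that accepts either shape; the helper name normalize_emit_result is hypothetical, and the sketch makes no assumption about what the old integer or the new list length counts.

```python
from typing import Any, List, Union


def normalize_emit_result(result: Union[int, List[Any]]) -> List[Any]:
    """Return the trace entries from an emit_mcps() result.

    Older SDK versions returned an int, which carries no trace data, so an
    empty list is returned in that case. Newer versions return a list of
    TraceData objects, which is passed through as a new list.
    """
    if isinstance(result, int):
        return []
    return list(result)
```

Code written against this helper keeps working during a staged SDK upgrade, since it only reads trace entries when they exist.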
Ingestion
New Connectors
- RDF, Snowplow, Apache Doris
Breaking Changes (see migration guide for details)
- Power BI M-Query lineage rewritten using Microsoft's official parser. The behavior of native_query_parsing: false has changed.
- SQL view query IDs now use SHA-256 hashes, so old query entities become orphaned. Use stateful ingestion to clean up.
- Oracle multitenant URNs now use PDB name instead of CDB name when connecting via service_name.
- Fabric OneLake workspace containers moved to fabric platform (from fabric-onelake).
- Vertex AI pipeline URNs restructured for stable DataFlow entities; ML Metadata extraction enabled by default (requires additional GCP permissions).
- DataHub source now uses URN pattern filtering to exclude secrets, ingestion sources, and execution requests by default.
- Kafka Connect Debezium SQL Server platform changed from sqlserver to mssql.
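For the Kafka Connect platform rename, it can help to map previously emitted dataset URNs to their new form when auditing lineage or locating stale entities. The sketch below assumes the standard dataset URN layout urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>); the helper remap_platform and the sample table name are hypothetical, and which URNs are actually affected in a given deployment is not stated in these notes.

```python
import re

# Matches urn:li:dataset:(urn:li:dataPlatform:<platform>,<name>,<env>).
_DATASET_URN = re.compile(
    r"^urn:li:dataset:\(urn:li:dataPlatform:([^,]+),(.+),([^,]+)\)$"
)


def remap_platform(urn, old="sqlserver", new="mssql"):
    """Return `urn` with its data platform changed from `old` to `new`.

    URNs that are not dataset URNs, or that use a different platform,
    are returned unchanged.
    """
    match = _DATASET_URN.match(urn)
    if match is None or match.group(1) != old:
        return urn
    name, env = match.group(2), match.group(3)
    return f"urn:li:dataset:(urn:li:dataPlatform:{new},{name},{env})"


# Hypothetical example URN for illustration only.
old_urn = "urn:li:dataset:(urn:li:dataPlatform:sqlserver,sales.dbo.orders,PROD)"
print(remap_platform(old_urn))
```

The function only computes the new URN string; removing the old entities is still a job for stateful ingestion or a separate cleanup step.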
Enhancements
- dbt: Semantic model and exposures ingestion; convert_urns_to_lowercase option for case-insensitive platforms.
- Snowflake: Metadata pattern pushdown, table type filtering, external DMF assertion ingestion.
- Power BI: Column-level lineage enabled by default.
- Kafka Connect: Debezium and Confluent JDBC sink connector support; bundled JVM removes system Java requirement.
- SQL parsing: Major CTE/subquery join resolution performance improvements across all SQL-based connectors.
- Mode: Concurrent API fetching, response caching, SQL parsing optimizations.
- Trino: Column-level lineage on upstream datasets.
- Iceberg: Ingestion-time domain assignment.
- Azure Data Factory: Column lineage for Copy activity.
- Airflow plugin: Multi-statement SQL parsing for lineage.
- Sigma: Workbook filtering.
- BigQuery: convert_column_urns_to_lowercase option.
- Kafka source: Option to disable Avro schema name validation.
- Great Expectations & SQLAlchemy profilers brought to feature parity.
- Browse paths: DataFlow/DataJob entities get browsePathsV2 with platform instance when configured.
- Vertex AI: Cross-platform lineage, hierarchical UI folders, stateful ingestion for large projects.
- Oracle: Fixed container naming with service_name.
- Configurable report sample sizes and richer failure logging.
Deprecations
- Vertex AI: region → regions, project_id → project_ids. Old fields still work.
- Vertex AI: normalize_external_dataset_paths will default to true in the next major version.
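The Vertex AI field renames can be applied to an existing recipe config ahead of time. This is a minimal sketch under the assumption that the plural fields take lists of strings (suggested by their names but not confirmed in these notes); migrate_vertexai_config is a hypothetical helper and the sample values are made up.

```python
def migrate_vertexai_config(config):
    """Return a copy of a Vertex AI source config using the new field names.

    Each deprecated singular field is wrapped in a one-element list under
    its plural name. If the plural field is already present, it wins and
    the deprecated field is dropped. The input dict is not modified.
    """
    migrated = dict(config)
    for old, new in (("region", "regions"), ("project_id", "project_ids")):
        if old in migrated:
            value = migrated.pop(old)
            migrated.setdefault(new, [value])
    return migrated


# Hypothetical config values for illustration only.
print(migrate_vertexai_config({"project_id": "my-project", "region": "us-central1"}))
```

Since the old fields still work, this migration is optional for v1.5.0 and can be scheduled with other recipe maintenance.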
Full Changelog: v1.4.0.3...v1.5.0