Features
MCP Services
MCP (Model Context Protocol) is now a first-class service category with service entities, server entities, execution logs, test-connection support, REST resources, and UI pages.
- Usage analytics expose summary, history, tool breakdown, user breakdown, and current-user usage
- MCP OAuth now supports SAML SSO authentication
- Client secrets are not issued to public clients
- get_entity_details now surfaces custom properties in responses
Knowledge Graph and RDF
Requires Apache Jena. First-time users should run the RDF Knowledge Graph Index App after upgrading.
- Distributed RDF indexing with state tables for jobs, partitions, locks, and server stats
- Glossary membership scoping, relation cleanup, distributed mode, and compaction
- Revamped graph with custom nodes, relation details, and distributed indexing status
Search Index Performance and Live Indexing
- Tunable settings: refresh interval, replica count, translog durability, sync interval, and per-entity overrides
- Per-stage reindex timing metrics for reader, process, sink, and vector stages
- Live indexing retries on failure with a dead-letter queue for failed items
- Search results can be exported to CSV from the Explore page under Tools
- Search Index App schedule has moved to weekly — review before upgrading
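The retry-then-dead-letter behavior described for live indexing can be pictured with a small sketch. This illustrates the general pattern only and is not the server's implementation: `index_with_retry`, `send`, `max_attempts`, and the in-memory dead-letter list are all hypothetical names.

```python
from typing import Callable, List, Tuple


def index_with_retry(
    items: List[dict],
    send: Callable[[dict], None],
    max_attempts: int = 3,
) -> Tuple[List[dict], List[dict]]:
    """Try to index each item; park items that keep failing in a dead-letter list."""
    indexed: List[dict] = []
    dead_letters: List[dict] = []
    for item in items:
        for attempt in range(1, max_attempts + 1):
            try:
                send(item)  # hypothetical call that pushes one document to the index
                indexed.append(item)
                break
            except Exception as exc:  # catch-all used only for illustration
                if attempt == max_attempts:
                    # Retries exhausted: record the item instead of dropping it.
                    dead_letters.append(
                        {"item": item, "error": str(exc), "attempts": attempt}
                    )
    return indexed, dead_letters
```

A real deployment would typically add a delay between attempts and persist the dead-letter entries so they can be replayed later.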
Ontology Explorer
New first-class governance page at /governance/ontology with graph filters, layout controls, side-panel entity details, and export controls.
Typed Glossary Term Relations
New relation types: relatedTo, synonym, antonym, broader, narrower, partOf, hasPart, calculatedFrom, usedToCalculate, seeAlso
- New governance settings page to manage relation types
- Relation badges, filters, and graph views throughout Glossary UI
- Concept mappings for external IRIs and SKOS-style relation types
- APIs for relation usage counts, asset counts, batch fetch, add/remove, and relation graph
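For client code that works with the relation types listed above, an enum keeps the names in one place. The sketch below is an assumption-laden illustration: the inverse pairs (broader/narrower, partOf/hasPart, calculatedFrom/usedToCalculate) are inferred from the names, and mapping every other type to itself is a simplification. Neither is confirmed by these notes.

```python
from enum import Enum


class RelationType(str, Enum):
    RELATED_TO = "relatedTo"
    SYNONYM = "synonym"
    ANTONYM = "antonym"
    BROADER = "broader"
    NARROWER = "narrower"
    PART_OF = "partOf"
    HAS_PART = "hasPart"
    CALCULATED_FROM = "calculatedFrom"
    USED_TO_CALCULATE = "usedToCalculate"
    SEE_ALSO = "seeAlso"


# Assumption: these pairs behave as inverses of each other.
_INVERSE_PAIRS = [
    (RelationType.BROADER, RelationType.NARROWER),
    (RelationType.PART_OF, RelationType.HAS_PART),
    (RelationType.CALCULATED_FROM, RelationType.USED_TO_CALCULATE),
]

INVERSES = {}
for left, right in _INVERSE_PAIRS:
    INVERSES[left] = right
    INVERSES[right] = left


def inverse_of(relation: RelationType) -> RelationType:
    """Return the assumed inverse; types without a listed pair map to themselves."""
    return INVERSES.get(relation, relation)
```

For example, `inverse_of(RelationType("partOf"))` yields `RelationType.HAS_PART` under these assumptions.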
Data Marketplace
- New sidebar and routes at /data-marketplace, /data-marketplace/domains, /data-marketplace/data-products
- Customizable landing page with widgets for domains, data products, announcements, and search
AI and Hybrid Search
- Google Gemini embedding provider with configurable dimensions and endpoint override
- OpenAI NLQ: modelId, request timeouts, max tokens, and temperature now configurable
- Hybrid search tuning: keyword/semantic weights, RRF settings, semantic score threshold, highlight fragment size
- textToLLMContext and vector body-text extension hooks
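To make the hybrid search tuning knobs more concrete, here is a minimal sketch of weighted Reciprocal Rank Fusion (RRF), the general technique in which each result list contributes weight / (k + rank) to a document's fused score. The parameter names `keyword_weight`, `semantic_weight`, `rank_constant`, and `min_semantic_score` are hypothetical and do not correspond to the product's actual setting keys; how the server applies the threshold is likewise an assumption.

```python
from typing import Dict, List, Tuple


def weighted_rrf(
    keyword_hits: List[str],
    semantic_hits: List[Tuple[str, float]],
    keyword_weight: float = 1.0,
    semantic_weight: float = 1.0,
    rank_constant: int = 60,
    min_semantic_score: float = 0.0,
) -> List[Tuple[str, float]]:
    """Fuse a keyword ranking and a scored semantic ranking with weighted RRF."""
    scores: Dict[str, float] = {}

    # Keyword list: ranks start at 1, best hit first.
    for rank, doc_id in enumerate(keyword_hits, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + keyword_weight / (rank_constant + rank)

    # Semantic list: drop hits below the threshold before ranking (assumed behavior).
    kept = [doc_id for doc_id, score in semantic_hits if score >= min_semantic_score]
    for rank, doc_id in enumerate(kept, start=1):
        scores[doc_id] = scores.get(doc_id, 0.0) + semantic_weight / (rank_constant + rank)

    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

Raising `keyword_weight` relative to `semantic_weight` favors exact term matches, while a larger `rank_constant` flattens the difference between adjacent ranks.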
Data Quality and Profiler
- Dynamic and static sampling via profileSampleConfig
- Explicit metrics selection per profiler run
- Top-dimension controls for dimensional test cases
- Bulk add and select-all for logical and bundle test suites
- Dashboard widgets and filters: data products, certification, incident status, tiers, entity health
- Storage auto-classification for containers with language-aware recognizer selection
- Deterministic MySQL median behavior
Governance and Workflows
- Data-contract references across data assets and service entities
- Workflow triggers extended: data product, data contract, glossary terms, input ports, output ports
- Approval tasks show proposed changes with clickable entity links, domain stamped on creation
- Self-approval prevention for workflow change requests
- New Archived entity status
New Connectors
- Microsoft Fabric — database and pipeline with lineage, usage, and profiler
- Google Drive — ingestion connector and example workflow
- Pub/Sub — messaging connector with test-connection support
- QuestDB — database connector
- IOMETE — database connector
- SAP SuccessFactors — database connector
- SAP S/4HANA — dashboard connector
- Matillion Data Cloud — pipeline connector
- Airflow 3.x — API-based connector; constraints upgraded to 3.2.1
Connector Improvements
- Snowflake — opt-in ACCESS_HISTORY lineage path; queries chunked by day to avoid timeouts
- Unity Catalog — incremental metadata extraction, only fetching changed entities since last run
- SSRS — report-to-dataset lineage
- Metabase — chart-level lineage extraction
- OpenLineage — Glue, Kusto, Cosmos DB naming; symlinks facet for Iceberg; pipeline node for single-sided lineage
- Storage — compressed archive ingestion (ZIP, tar, gzip) in S3, ADLS, GCS; Redis caching for container ancestors
- MySQL — queryHistoryTable option; GCP Cloud SQL IAM support
- Athena — catalogId for S3 Tables and cross-account Glue
- Oracle — preserveIdentifierCase and useDBATable options
- S3, ADLS, GCS — profiling capability flags; REST connector S3 and SSL config
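The Snowflake item above mentions chunking queries by day to avoid timeouts. A minimal sketch of that idea, splitting a time range into windows of at most one day, could look like the following. `daily_windows` is a hypothetical helper; the connector's actual chunking logic may differ.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def daily_windows(start: datetime, end: datetime) -> Iterator[Tuple[datetime, datetime]]:
    """Yield half-open [window_start, window_end) ranges of at most one day."""
    if end < start:
        raise ValueError("end must not be before start")
    cursor = start
    while cursor < end:
        # Clamp the last window so it never runs past the requested end.
        window_end = min(cursor + timedelta(days=1), end)
        yield cursor, window_end
        cursor = window_end
```

Each window can then be used as the time filter of a separate, smaller query instead of one long-running query over the whole range.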
Platform, Cache, and Operability
- Read-bundle prefetch and cache warmup for tags, certifications, relationships, containers, ancestors
- Redis: cache metrics, distributed warmup, per-command timeout defaulting to 300ms
- Deadlock retry handling and reduced write deadlocks
- JSON log format via LOG_FORMAT=json, streamable logs, non-blocking handlers
- QoS request admission enabled by default via QOS_* settings
- CSP nonce handling and web security headers: COEP, CORP, COOP
- Regenerate-bot-tokens for JWT key rotation
- db-tune ops subcommand and production RDS runbook
- Diagnostics v2 framework — legacy ExecutionTimeTracker removed
Columns as Independent Entities
Columns are now indexed as independent entities. They appear in asset counts and are the default entity shown in Explore when a database service is selected, where tables were shown previously. This is a behavioral change.
Breaking Changes
Connector and Ingestion Changes
- Iceberg connector removed — services migrated to CustomDatabase, pipelines hard-deleted. Update any YAML or automation referencing serviceType Iceberg
- Databricks/Unity Catalog scheme changed from databricks+connector to databricks. Stored configs migrated; external YAMLs must be updated
- Profiler sampling changed to profileSampleConfig. Old fields profileSample, profileSampleType, samplingMethodType, and computeMetrics are removed
- randomizedSample now defaults explicitly to false in migrated configs
- Python ingestion targets 3.10, 3.11, 3.12. Key deps: SQLAlchemy 2.x, pandas 2.1.x, pyodbc 5.3.x, Airflow 3.2.1, Databricks SQLAlchemy 2.x
- Storage manifest partitionColumns uses a smaller partition-column shape
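Since external YAMLs and automation are not migrated automatically, it can help to scan them for the legacy settings named above. The sketch below walks an already-parsed config (nested dicts and lists) and reports the removed profiler fields, the old Databricks scheme, and the Iceberg service type. The key names `scheme` and `serviceType` as lookup locations are assumptions about where these values sit, and `find_legacy_settings` is a hypothetical helper, not part of the ingestion package.

```python
from typing import Any, Iterator, Tuple

# Profiler fields listed as removed in these notes.
REMOVED_KEYS = {"profileSample", "profileSampleType", "samplingMethodType", "computeMetrics"}

# Assumed key/value locations for the other legacy settings.
LEGACY_VALUES = {
    ("scheme", "databricks+connector"): "scheme is now 'databricks'",
    ("serviceType", "Iceberg"): "Iceberg connector removed; services migrate to CustomDatabase",
}


def find_legacy_settings(node: Any, path: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (path, hint) pairs for legacy settings found anywhere in the config."""
    if isinstance(node, dict):
        for key, value in node.items():
            child = f"{path}.{key}" if path else str(key)
            if key in REMOVED_KEYS:
                yield child, "field removed; see profileSampleConfig"
            if isinstance(value, str) and (key, value) in LEGACY_VALUES:
                yield child, LEGACY_VALUES[(key, value)]
            yield from find_legacy_settings(value, child)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_legacy_settings(item, f"{path}[{index}]")
```

Feeding it the result of a YAML parser for each workflow file gives a quick list of places to update before the upgrade.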
API and Schema Changes
- Feed APIs no longer accept from in createThread or createPost — remove it from client payloads
- Search payloads removed the semanticSearch boolean
- Application schemas renamed preview to enabled with inverted meaning — custom app manifests must use enabled
- Webhook moved from secretKey to authType object (no auth / bearer / OAuth2)
- Custom property names must start with an alphanumeric character and cannot contain / or ~
- Glossary relatedTerms changed to typed TermRelation objects — existing data migrated to relatedTo
- entity_relationship primary key now includes relationType
- Logical-suite add endpoint deprecated — use PUT /api/v1/dataQuality/testCases/logicalTestCases/bulk
- Bulk Assets dryRun now enforced for tag, glossary, dataProduct, and team removes
- New Archived entity status — update any hard-coded status enums
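A few of the payload changes above are mechanical enough to check or apply on the client side. The helpers below are a sketch under stated assumptions: "alphanumeric" is taken to mean ASCII letters and digits, the server may enforce further rules not listed here, and the flat placement of `from` and `preview` at the top level of the payload is assumed. All function names are hypothetical.

```python
import re
from typing import Any, Dict

# Assumption: first character is an ASCII letter or digit; "/" and "~" never appear.
_CUSTOM_PROPERTY_NAME = re.compile(r"[A-Za-z0-9][^/~]*")


def is_valid_custom_property_name(name: str) -> bool:
    """Check the two naming rules stated in these notes."""
    return _CUSTOM_PROPERTY_NAME.fullmatch(name) is not None


def drop_from_field(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a createThread/createPost payload without the 'from' key."""
    return {key: value for key, value in payload.items() if key != "from"}


def preview_to_enabled(manifest: Dict[str, Any]) -> Dict[str, Any]:
    """Replace a legacy 'preview' flag with 'enabled', inverting its value."""
    updated = dict(manifest)
    if "preview" in updated:
        updated["enabled"] = not updated.pop("preview")
    return updated
```

For instance, a manifest with `preview` set to true would come out with `enabled` set to false, matching the inverted meaning described above.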
Operational Notes
- Quartz tables cleared during migration — stop all instances before upgrading
- Postgres fqnHash text_pattern_ops indexes added or replaced; a runbook is included in the migration file in case the index build is interrupted
- New tables for MCP services, servers, executions, RDF indexing jobs, partitions, locks, and server stats
- SERVER_CHANGE_LOG historical gaps backfilled — missing entries caused data-insights timeline holes
- Profiler pipeline cleanup force-executed on upgrade to clear stuck pre-1.13 state
- LOG_FORMAT=json now supported — review any custom Dropwizard logging config
- QoS admission enabled by default; check the QOS_* settings if adjustment is needed
- Redis per-command timeout defaults to 300ms — tune for slow Redis deployments
Changelog
Search and Reindexing
- Fixed nested children causing Elasticsearch/OpenSearch mapping-depth failures
- Fixed stale file-extension aggregation on v1.13.0 upgrade causing 500 errors on file search
- Fixed stale flattened-children highlight field on v1.13.0 upgrade causing 500 errors on container search
- Fixed search_after silently dropping entities when sort value contains a comma
- Fixed query, worksheet, and file reindexing missing relationship fields
- Fixed search-index alias resolution for entity-specific and OpenSearch cluster prefixes
- Fixed batch-prefetch of upstream lineage leaking Hikari connections during bulk reindex
- Fixed soft-delete propagation to time-series child aliases
- Fixed clean reindex jobs incorrectly marked failed when only warnings existed
- Fixed text-field sorting and aggregation .keyword resolution
- Fixed user index searches on nested owners queries
- Fixed advanced-search Contains and Not Contains operators for description field
Glossary, Tags, and Governance
- Fixed glossary relation rendering for multiple relation types between the same term pair
- Fixed related-term tooltip sanitization and relation badge colors and icons
- Fixed tag rename and relationship cache invalidation
- Fixed TagLabel server fields lost when saving tags
- Fixed certification tags leaking into regular tags and missing appliedBy audit trail
- Fixed soft-deleted users appearing in experts and reviewers selectors
- Fixed hyperlink workflow rules and Tags/Tier field ambiguity
Data Quality and Profiler
- Fixed test-case suite search membership preservation
- Fixed tier and certification filter queries in Data Quality dashboard
- Fixed incident manager status and severity chip behavior
- Fixed TableColumnCountToBeBetween API responses
- Fixed column profile percentages showing 0% for zero proportions
- Fixed tableCustomSQLQuery ignoring computePassedFailedRowCount flag
- Fixed orphan test cases breaking search indexing
- Fixed sample randomization at 100% sample
Ingestion and Connectors
- Fixed single bad table aborting entire schema ingestion run
- Fixed Snowflake and OpenMetadata socket waits causing silent hangs
- Fixed Power BI lineage buffer flushing, TSQL Sql.Database parsing, and workspace cache scope
- Fixed Databricks nested column descriptions and SQLAlchemy 2.x compatibility
- Fixed Databricks and Unity Catalog valueless tags being silently dropped
- Fixed Datalake JSON columns typed as string for empty dict values
- Fixed MySQL profiler median query quoting and deterministic behavior
- Fixed Redshift interval, numeric, and timestamp precision parsing, view definition, IAM auth, and LISTAGG errors
- Fixed Oracle, MSSQL, Athena, and Redshift profiler under SQLAlchemy 2.0
- Fixed dbt column tags, snapshot model patching, compiled-only test results, and test entity links
- Fixed SQL Server temporal-table period columns classified as PII
- Fixed SQLAlchemy engine resource leak on multi-database source iteration
- Fixed ADLS object counts so they are scoped to the configured sub-path
- Fixed PII recognizer selection based on configured language
- Fixed runtime spaCy model loading for non-root containers
UI and UX
- Fixed unknown service categories returning 404
- Fixed Explore page column icon display, search term warnings, and text overflow
- Fixed lineage edge misalignment, edge hover, temporary lineage table nodes, and service nodes
- Fixed table constraints UI and cluster-key constraint display and editing
- Fixed dotted custom-property names display
- Fixed custom relation badge color handling and overlapping badges
- Fixed activity feed, task notification refresh, and approval task rendering
- Fixed MSAL and SAML token renewal and Safari SSO session loss
- Fixed copy-to-clipboard in non-secure contexts
- Fixed charts not deleted when parent dashboard or service is deleted
- Fixed column.extension values silently dropped on entity creation
Security and Dependencies
- AWS SDK pinned to 2.41.30 — clears CloudFront CVE
- Airflow upgraded to 3.2.1 — clears 7 CVEs
- gnutls, libcap, openssh, and rsync CVEs closed in ingestion Docker images
- Test-connection workflow triggers now require authorization
- Python ingestion: explicit jsonify at route level to break XSS taint chain
- Axios, dompurify, follow-redirects, and related UI CVE fixes
- Jetty and pac4j upgraded for Java-side CVEs