DataHub v1.0.0
Release Highlights
DataHub v1.0.0 is packed with exciting updates, including:
- A completely redesigned user experience focused on simplified navigation and a visually stunning interface.
- Unified support for Data & AI, including AI Model Group Versions, AI Model Lineage, Model Stats, and Experiment/Run ingestion.
- DataHub Iceberg Catalog, allowing users to manage Iceberg tables directly from DataHub.
Read the blog post here!
Changelog
New User Interface: Putting Usability First
With a completely redesigned user interface, DataHub v1.0 represents a fundamental rethinking of how users interact with their metadata and data assets. The new experience includes:
- Intuitive Platform-Based Navigation - Hierarchically browse data by database and schema in Snowflake, BigQuery, Redshift, Databricks, and more. Combine hierarchical navigation with filtering by data owners, domain, tags, and glossary terms to find the right data fast.
- Seamless Lineage Exploration - Our reimagined lineage view features multi-level expansion, name-based search, and column-level visibility, making it easier than ever to understand data relationships and impact.
- Integrated Data Quality - Make confident decisions with deeply integrated quality signals throughout the platform, helping you quickly identify and trust reliable data assets.
DataHub Admins can enable the new UI for all users by setting the THEME_V2_DEFAULT environment variable to true; until then, users can opt into the new experience by navigating to Settings > Appearance > Try New User Experience.
Comprehensive AI Asset Support: Unifying Data and AI
DataHub v1.0 treats AI assets as first-class citizens within the data ecosystem, allowing users to track their entire data-to-AI pipeline in one place.
- Unified Search and Discovery: Seamlessly search across models, model groups, and traditional data assets in one unified interface.
- Advanced Versioning System: Track multiple versions of datasets and ML models with detailed performance metrics and clear linkages between versions.
- Rich Model Statistics: Monitor key metrics across versions, understand performance trends, and make data-driven decisions about model deployment.
- End-to-End Lineage: Trace data flows from raw inputs through models to final outputs, with complete versioning support.
DataHub Iceberg REST Catalog Beta: Simplifying Data Lake Management
This release introduces an integration with Apache Iceberg, allowing users to manage Iceberg tables directly through DataHub, including:
- Create and manage Iceberg tables through DataHub
- Maintain consistent metadata across DataHub and Iceberg
- Facilitate data discovery by exposing Iceberg table metadata in DataHub
- Enable secure access to Iceberg tables through DataHub's permissions model
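As a rough illustration of what working against the catalog might look like from a client, here is a minimal sketch using PyIceberg's generic REST catalog support. The `/iceberg/` path on the DataHub server, the warehouse name, and the token are assumptions made for illustration and are not taken from these release notes; the helper names `datahub_catalog_properties` and `list_tables` are hypothetical. Check the DataHub Iceberg Catalog documentation for the actual connection settings.

```python
from typing import Dict


def datahub_catalog_properties(gms_url: str, warehouse: str, token: str) -> Dict[str, str]:
    """Build PyIceberg REST catalog properties for a DataHub-hosted catalog.

    Assumption: the catalog is served under "<gms_url>/iceberg/".
    """
    return {
        "type": "rest",
        "uri": gms_url.rstrip("/") + "/iceberg/",  # assumed endpoint path
        "warehouse": warehouse,                    # hypothetical warehouse name
        "token": token,                            # DataHub access token
    }


def list_tables(properties: Dict[str, str]) -> dict:
    """Connect to the catalog and return the tables found in each namespace."""
    # Imported lazily so the helper above works without PyIceberg installed.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog("datahub", **properties)
    return {ns: catalog.list_tables(ns) for ns in catalog.list_namespaces()}
```

With a running DataHub instance, calling `list_tables(datahub_catalog_properties("http://localhost:8080", "my_warehouse", "<token>"))` would return the tables visible to that token, subject to DataHub's permissions model.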
Metadata Ingestion
We’re continuously improving our integrations to add new capabilities and squash bugs.
- MLflow: Significantly revamped our MLflow connector, adding support for tracking Model Group Versions and Model Stats, tracking Model lineage to underlying datasets, and capturing Experiments and Runs.
- Redshift: Added support for data shares and external schemas, including automatic lineage resolution across Redshift namespaces.
- Superset (community contribution!): Added support for Superset virtual datasets, column-level lineage, and ownership information.
- Snowflake: Added support for Snowflake Streams and Hybrid Tables, and fixed a bug with lineage resolution across table renames.
- Oracle: Improved the accuracy of column-level lineage resolution.
- Iceberg: Alongside our new Iceberg Catalog API, we’ve made various improvements to our Iceberg integration.
Additionally, we’re working on a new integration with Vertex AI. Please reach out if you’re interested in joining the beta.
Of course, this only scratches the surface of changes. This release contains 100+ improvements across 25 different integrations.
Thank You to our Contributors!
First-Time Contributors
@Bhadhri03 @brock-acryl @cccs-cat001 @davidebriscese @Deepalijain13 @dougbot01 @Haebuk @haon85 @josges @mihai103 @rajatgl17 @Rasnar @rharisi @samanthafigueredo5 @ttekampe
Repeat Contributors
@bda618 @deepgarg-visa @eagle-25 @jayasimhankv @ksrinath @llance @Masterchen09 @mayurinehate @mkamalas @PeteMango @pinakipb2 @remisalmon @sagar-salvi-apptware @svdimchenko @v-tarasevich-blitz-brain
Project Maintainers
@anshbansal @asikowitz @chakru-r @chriscollins3456 @david-leifker @gabe-lyons @hsheth2 @jayacryl @jjoyce0510 @kevinkarchacryl @pedro93 @RyanHolstien @ryota-cloud @sakethvarma397 @sgomezvillamor @shirshanka @skrydal @treff7es @yoonhyejin
View the full changelog: v0.15.0.1...v1.0.0