github datahub-project/datahub v0.8.33
DataHub v0.8.33

2 years ago

Release Highlights

User Experience

  • Refreshed the ML Entity page to match the look and feel of other entity types, and improved ML lineage functionality

Ingestion Improvements

  • Airflow improvements, as demoed in the March Town Hall
    • Add support for capturing Airflow execution runs from the lineage backend
    • Introduce a new high-level API for generating DataFlow, DataJob, and DataProcessInstance entities
  • MS SQL ingestion now captures table & column descriptions
  • Trino platform support for Great Expectations
  • New Presto-on-Hive ingestion source
  • BigQuery ingestion now supports extraction of usage info from audit logs
  • Fix to Looker ingestion to extract Explore Views from join names
  • Fix to Tableau ingestion to avoid duplicating schema in URNs for upstream tables
  • Simplify & annotate Redshift Usage source
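
To illustrate the kind of problem the Tableau fix above addresses, here is a minimal sketch of building a schema-qualified table name without repeating the schema when the table name already carries it. This is an illustration only, not the connector's actual code: the function name `qualify_table_name` and its parameters are hypothetical, and the case-insensitive prefix check is an assumption.

```python
def qualify_table_name(schema: str, table_name: str) -> str:
    """Return a schema-qualified table name without repeating the schema.

    Hypothetical helper for illustration; not part of the DataHub codebase.
    """
    if not schema:
        # No schema is known, so there is nothing to prepend.
        return table_name
    prefix = f"{schema}."
    # Assumption: compare case-insensitively, since the schema casing
    # reported by an API may differ from the casing in the table name.
    if table_name.lower().startswith(prefix.lower()):
        return table_name
    return prefix + table_name


if __name__ == "__main__":
    print(qualify_table_name("public", "orders"))
    print(qualify_table_name("public", "public.orders"))
```

Both calls should produce the same qualified name, so a URN derived from it would contain the schema only once.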

Full Commit Log

  • feat(gms): Expose kafka listener concurrency as a GMS setting by @jjoyce0510 in #4536
  • feat(ingest): add option for external Spark cluster by @kevinhu in #4571
  • fix(upgrade): Renaming kafka producer since it clashes with spring-internal by @dexter-mh-lee in #4573
  • feat(GraphQL): Add data platform query to GraphQL API by @jjoyce0510 in #4574
  • build(ui): Fix Windows UI lint by @mattmatravers in #4556
  • doc: make note prominent on quickstart by @anshbansal in #4558
  • fix(protobuf) minor bugfixes for protobuf by @leifker in #4553
  • feat(docs) Improves docs around developing datahub, removes deprecated docs on building metadata service by @pedro93 in #4552
  • chore: cleanup extra file by @anshbansal in #4541
  • feat(snowflake): reduce permissions provisioned by default by @anshbansal in #4543
  • fix(ingestion): Redshift usage refactoring - simplify, annotate, fix bugs by @rslanka in #4572
  • fix(graphql): Adding PRE FabricType to GraphQL by @jjoyce0510 in #4582
  • feat(search) - add DATETIME FieldType by @aditya-radhakrishnan in #4407
  • fix(tableau): fix for incorrect schema returned by tableau api for sn… by @mayurinehate in #4577
  • chore: update default cli for managed ingestion by @anshbansal in #4581
  • feat(okta) - add support for filtering/searching when ingesting Okta groups and users by @aditya-radhakrishnan in #4586
  • doc(snowflake): add example of table pattern by @anshbansal in #4580
  • fix(doc): try to fix broken link by @daha in #4593
  • fix(bigquery): incorrect lineage when views are present by @anshbansal in #4568
  • feat(metadata-service): Supporting a configurable Authorizer Chain by @jjoyce0510 in #4584
  • fix(search): Make sure home page and search pages are consistent by @dexter-mh-lee in #4588
  • fix(browse): Reduce browse aggregation size by @dexter-mh-lee in #4601
  • doc: add page for handling deprecations, breaking changes etc. by @anshbansal in #4590
  • docs(GraphQL): fix typo by @Falci in #4605
  • feat(search): Add SearchScore annotation to use fields for search ranking by @dexter-mh-lee in #4596
  • feat(ingestion): Redshift Usage Source - simplify OperationalStats workunit generation. by @rslanka in #4585
  • feat(tableau): add some logic to normalize table names in tableau by @gabe-lyons in #4609
  • fix: urlencode slash in urns too by @daha in #4527
  • fix(bigquery): fix lineage bug, improve docs, add dataset filter config by @anshbansal in #4607
  • fix(protobuf) fix test instability by @leifker in #4612
  • fix(ui): Fix dashboard tags display by @jjoyce0510 in #4611
  • feat(ui): Adding GraphQL queries to fetch entity deprecation status by @jjoyce0510 in #4614
  • feat(ingest): enable connection string for all sqlalchemy datasources by @ms32035 in #4508
  • fix(docs): add grant statements for redshift-ingestion by @Abhiram98 in #4559
  • chore: fix lint and remove incorrect integration mark from unit tests by @anshbansal in #4621
  • feat: adding gradle, pip cache via gh cache, docker cache via dockerhub by @anshbansal in #4387
  • doc(scheduling): make it easier to find ui ingestion by @anshbansal in #4610
  • feat(glue): add CatalogId parameter for cross-account access by @BoyuanZhangDE in #4608
  • doc(cli): add env variables and options for ingest command by @anshbansal in #4598
  • fix(ingest): Restricting pytest docker version to <0.12 by @treff7es in #4639
  • fix(cypress) - add waits for cypress search test to remove flakiness by @aditya-radhakrishnan in #4640
  • Revert "feat: adding gradle, pip cache via gh cache, docker cache via dockerhub" by @dexter-mh-lee in #4637
  • feat(search): Only reindex if the mappings for an existing field changed by @dexter-mh-lee in #4629
  • feat: add presto-on-hive metadata ingestion source by @jchen0824 in #4625
  • feat(ingest): add trino platform for great expectations by @ms32035 in #4594
  • fix(kafka): Stop overriding kafka registry props with empty values by @jsotelo in #4604
  • [model]: Dataprocess instance entity to model datajob/jobflow runs by @treff7es in #4459
  • feat(ingest): add Urn python library for DataJob, DataFlow, Domain and Tag by @tc350981 in #4618
  • fix(ingestion): ensure source/sink reports are always logged by @anshbansal in #4592
  • fix(ingestion): extract explore views from join name in Looker by @dyanarose in #4627
  • feat(ingestion): Enable lower-casing of the name part of dataset urn if env variable is set. by @rslanka in #4649
  • feat: Enable the ingestion of bigquery audit logs to parse usage info… by @tha23rd in #4441
  • fix(ingest): Fix snowflake KEY_PAIR auth by @mkamalas in #4638
  • fix(home): Fix issue where some browse cards are missing by @dexter-mh-lee in #4652
  • fix(tableau): avoid duplicate schema in URNs for upstream tables by @maaaikoool in #4645
  • feat(ingest): capture MSSQL table+column descriptions by @kevinhu in #4579
  • feat(ml): bringing ml screens up to date w/ the modern ui layout & improving ml lineage by @gabe-lyons in #4651
  • (feat:airflow) Add support to capture airflow executions + high level dataflow/jobs api by @treff7es in #4615
  • fix(ingestion): add missing workunit ids by @anshbansal in #4657
  • fix(ingestion): Adding missing init.py by @anshbansal in #4659
  • fix(bigquery-usage): missing dependency by @anshbansal in #4661
  • feat(cypress) - add cypress dashboard view to CI by @aditya-radhakrishnan in #4654
  • feat(autocomplete): show fully qualified name in autocomplete by @gabe-lyons in #4663
  • feat(ingestion) dbt: Fixing issue with strip_user_ids_from_email and adding owner_naming_pattern by @arunvasudevan in #4587
  • fix(sqlparser): fix sqlparser breaking due to # sign by @anshbansal in #4662
  • fix(ingestion): validate datasource in Tableau connector, before creating its upstream by @nandacamargo in #4613
  • Added Relative Routing on the Users & Groups screen by @Ankit-Keshari-Vituity in #4664
  • fix(airflow): Not importing emitters directly to eliminate unneeded dependency by @treff7es in #4668
  • docs: remove ingestion source summary table by @maggiehays in #4670
  • feat(ml): some machine learning followups by @gabe-lyons in #4669
  • fix(search): Fix urn component settings by @dexter-mh-lee in #4672
  • fix(ingestion): update example recipes by @anshbansal in #4660
  • feat(theming): set custom logo without rebuilding by @gabe-lyons in #4674
  • feat(data-platform): Add platform entities for the connectors we support by @dexter-mh-lee in #4676
  • refactor(authorization): Add authorizedActor function to Authorizer interface by @dexter-mh-lee in #4678
  • docs(tags) - add tags usage guide by @aditya-radhakrishnan in #4677
  • fix(cli): Suppress printing variables to logs during ingestion failure by @atulsaurav in #4566
  • fix(docs): Improving Add Users Doc by @jjoyce0510 in #4679
  • Fix/modal validations by @ShubhamThakre in #4673
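
Several entries in the log above concern how URNs are formed and encoded, for example "urlencode slash in urns too" (#4527). As a rough sketch of that idea, using only the Python standard library and not DataHub's own code, the snippet below percent-encodes a URN so that slashes are escaped as well. The function name `encode_urn` and the sample URN are assumptions made for illustration.

```python
from urllib.parse import quote, unquote


def encode_urn(urn: str) -> str:
    """Percent-encode a URN for use as a single URL path segment.

    Hypothetical helper for illustration. Passing safe="" overrides the
    default safe="/" of urllib.parse.quote, so "/" is encoded as well.
    """
    return quote(urn, safe="")


if __name__ == "__main__":
    # Sample URN whose dataset name contains slashes (illustrative only).
    urn = "urn:li:dataset:(urn:li:dataPlatform:s3,bucket/path/file,PROD)"
    encoded = encode_urn(urn)
    print(encoded)
    # Decoding restores the original URN unchanged.
    print(unquote(encoded) == urn)
```

With the default `safe="/"`, the slashes would pass through unencoded and could be read as path separators by a web server or router, which is why the sketch sets `safe=""` explicitly.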

New Contributors

Full Changelog: v0.8.32...v0.8.33
