github datahub-project/datahub v0.14.0.2

DataHub v0.14.0.2 Release Notes

User Experience

  • Renamed: Validation --> Quality: The Validation tab has been renamed to Quality to make it clearer to end users that it contains the outcomes of data quality checks. [#10935]

  • Data Contract UI: A new Data Contract UI is now available under the Quality Tab, allowing users to handle various data assertion types and add/remove contracts more easily. [#10625]

  • Customized Search Ranking: Admins can now configure custom weights for freshness and popularity on assets to ensure that the most relevant results are always at the top of the list. [#10775, #10774]

  • Custom Dataset Names: Business users can now maintain an editable dataset name separate from default properties, providing more control over dataset identification. [#10608]

  • Documentation Propagation Setting Page: A new settings page has been added to the UI for managing Documentation Propagation, giving users more control over how documentation is shared across the platform. [#11038]

Developer Experience

  • NEW: DataHub Open Assertions Specification:

    • Announcing a universal assertions specification for declaring Data Quality checks and compiling them into artifacts for use by 3rd party Data Quality tools like Great Expectations, dbt tests, and Snowflake via Data Quality DMFs. [#10609]
    • Added ability to define data quality rules using a YAML specification file, enabling users to set assertions like volume metrics and conditions, with the ability to compile and schedule them to run on Snowflake as the assertion backend. [#10602]
  • API and SDK Enhancements:

    • New GraphQL APIs added for managing forms, structured properties, and data contracts. [#10826, #10825, #10632]
    • Updates to Java and Python SDKs to support creating and updating structured properties on assets. [#10823, #10824]
    • Support for conditional write semantics including If-Modified-Since, If-Unmodified-Since, and If-Version-Match in MetadataChangeProposals (MCP) and OpenAPI. [#10868]
  • CLI Improvements:

    • A new check server-config command has been added to test server credentials and retrieve diagnostic information. [#10990]
    • The get command now includes a --details/--no-details flag for more detailed output, facilitating easier issue debugging. [#10815]
    • The CLI can now optionally display server configuration settings. [#10676]
    • The forms YAML API now supports assigning actors (users or groups) to forms. [#10683]
  • Improved Logging and Monitoring:

    • Unified request logging implemented across GraphQL, OpenAPI, and Restli requests, including additional information like actor, IP address, and API type. [#10802]
  • Performance Optimizations:

    • Implemented throttling for the mce-consumer based on mae-consumer lag. [#10626]
    • Added an ASYNC_BATCH mode to the rest sink for improved performance. [#10733]
    • Improved Neo4j read query performance by specifying labels, and combined the multiple Neo4j statements in the addEdge function into a single statement. [#10593, #10598]
  • Security Enhancements:

    • Updated encryption and decryption methods to use a stronger cryptographic algorithm. [#11059]
    • Optimized regular expressions to prevent potential ReDoS vulnerabilities. [#10315]
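The conditional write semantics listed under API and SDK Enhancements [#10868] follow the familiar precondition pattern: a write is applied only if the stored record's version or last-modified time satisfies the caller's condition. The sketch below illustrates that idea with an in-memory store. Every name in it (`Record`, `ConditionalStore`, `PreconditionFailed`, the `if_*` parameters) is hypothetical, and the exact comparison rules are assumptions for illustration. It is not DataHub's API or implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


class PreconditionFailed(Exception):
    """Raised when a write's precondition does not hold."""


@dataclass
class Record:
    value: dict
    version: int
    modified_ms: int


class ConditionalStore:
    """Hypothetical in-memory store illustrating conditional writes."""

    def __init__(self) -> None:
        self._records: Dict[str, Record] = {}

    def write(
        self,
        key: str,
        value: dict,
        now_ms: int,
        if_version_match: Optional[int] = None,
        if_unmodified_since_ms: Optional[int] = None,
        if_modified_since_ms: Optional[int] = None,
    ) -> Record:
        current = self._records.get(key)
        # Assumption: a record that does not exist yet has version 0.
        current_version = current.version if current else 0

        if if_version_match is not None and current_version != if_version_match:
            raise PreconditionFailed(
                f"expected version {if_version_match}, found {current_version}"
            )
        if current is not None:
            # Reject if the record changed after the caller's timestamp.
            if (
                if_unmodified_since_ms is not None
                and current.modified_ms > if_unmodified_since_ms
            ):
                raise PreconditionFailed("record was modified after the given time")
            # Reject if the record has NOT changed after the caller's timestamp.
            if (
                if_modified_since_ms is not None
                and current.modified_ms <= if_modified_since_ms
            ):
                raise PreconditionFailed("record was not modified since the given time")

        record = Record(value=value, version=current_version + 1, modified_ms=now_ms)
        self._records[key] = record
        return record
```

A caller that read version 1 and passes `if_version_match=1` succeeds only if nobody else has written in between; a second writer holding the same stale version gets `PreconditionFailed` instead of silently overwriting.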

Metadata Ingestion

  • New Ingestion Sources:

    • Azure Blob Storage: Added as a new ingestion source with support for Path Specs. [#10813]
    • Grafana: New connector to ingest dashboards, making dashboard documentation available in DataHub for on-call DevOps team members. [#10891]
    • IBM DB2: Added support for this platform. [#10601]
  • Snowflake Improvements:

    • Enhanced view lineage parsing without query-based lineage/usage. [#10905]
    • Added support for more than 10k views in a Snowflake database. [#10718]
    • Implemented parallel schema extraction for improved performance. [#10653]
    • Added snowflake-queries source for lineage, usage, queries, and operational metadata to improve performance and configurability. [#10835]
  • BigQuery Enhancements:

    • Refactored and parallelized dataset metadata extraction for better performance. [#10884]
    • Added support for new data types including BIGNUMERIC, NUMERIC, DECIMAL, BIGDECIMAL, FLOAT64, and RANGE. [#10950]
    • Added support for ingesting view labels. [#10648]
  • Looker Updates:

    • Explore tags are now ingested into DataHub. [#10547]
    • Fixed issues related to CLL generation when the view definition language is SQL. [#10542]
    • Added support for including platform instance details in URNs for dashboards and charts. [#10771]
  • Other Improvements:

    • dbt: Enhanced flexibility in lineage generation with the new experimental prefer_sql_parser_lineage flag. [#11039]
    • Airflow: Task ownership info can now be set as a group rather than an individual user. [#10742]
    • Athena: Enhanced profiling capabilities to support column quantiles and medians. [#10723]
    • Fivetran: Improved connector performance for faster ingestion. [#10556]
    • SageMaker: Added stateful ingestion capability to remove deleted assets during ingestion runs. [#10573]
    • Tableau: Support added for ingesting multiple Tableau sites in a single configuration, with sites appearing as containers in DataHub. [#10498]
    • Added support for ingesting schemas from schema registry in the Kafka module. [#10612]
    • Introduced a TagsToTermMapper transformer for mapping specific tags to glossary terms. [#10758]
    • Enhanced the SQL lineage parser with an optional default_dialect parameter for customized dialect selection. [#10830]
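The TagsToTermMapper transformer [#10758] maps specific tags to glossary terms. The core idea can be sketched without the DataHub SDK: take the tag URNs on an asset, keep those whose names appear in a configured set, and emit glossary term URNs for them. The function name `map_tags_to_terms` and the rule "term name equals tag name" are assumptions for illustration; the real transformer's configuration and matching rules may differ.

```python
from typing import Iterable, List, Set

TAG_PREFIX = "urn:li:tag:"
TERM_PREFIX = "urn:li:glossaryTerm:"


def map_tags_to_terms(tag_urns: Iterable[str], tags_to_map: Set[str]) -> List[str]:
    """Return glossary term URNs for the tags whose names are in tags_to_map.

    Hypothetical helper: assumes the glossary term reuses the tag's name.
    """
    terms: List[str] = []
    for urn in tag_urns:
        if not urn.startswith(TAG_PREFIX):
            continue  # ignore anything that is not a tag URN
        name = urn[len(TAG_PREFIX):]
        term = TERM_PREFIX + name
        # Keep only configured tags, and avoid emitting the same term twice.
        if name in tags_to_map and term not in terms:
            terms.append(term)
    return terms
```

For example, with `tags_to_map={"PII"}`, an asset tagged `urn:li:tag:PII` and `urn:li:tag:Legacy` yields only `urn:li:glossaryTerm:PII`.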

Other Improvements and Fixes

  • Fixed high-severity vulnerabilities related to sensitive information logging. [#11088]
  • Improved error handling and logging across various modules.
  • Enhanced test coverage for new features and existing functionality.
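For readers unfamiliar with the ReDoS class of issue addressed in this release [#10315], the sketch below shows the general pattern: a nested quantifier can make a backtracking regex engine explore exponentially many ways to split the input when a match almost succeeds, while an equivalent pattern with a single quantifier does not. The patterns are generic illustrations, not the expressions changed in the PR.

```python
import re

# Nested quantifier: on a near-miss such as "aaaa...ab", a backtracking engine
# may try exponentially many ways to divide the run of "a"s between the groups.
VULNERABLE = re.compile(r"^(a+)+$")

# Matches exactly the same strings with a single quantifier, so there is only
# one way to consume the run of "a"s.
SAFE = re.compile(r"^a+$")


def is_all_a(text: str) -> bool:
    """Return True if text consists of one or more 'a' characters."""
    return SAFE.match(text) is not None
```

Both patterns accept and reject the same inputs; only the amount of backtracking on failing inputs differs, which is why this kind of rewrite is a common ReDoS fix.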

Breaking Changes

  • Protobuf CLI will no longer create binary encoded protoc custom properties by default.
  • Changes to Data flow info and data job info aspects may require a server upgrade.
  • OpenAPI V3 - Creation of aspects now requires wrapping within a value key.
  • Profiling configuration for Glue source has been updated.

For full details on breaking changes, please refer to the updating guide.
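As a rough sketch of what "wrapping within a value key" means for an OpenAPI V3 request body, the helper below nests each aspect payload under `value`. The helper name `wrap_aspects`, the top-level `urn` field, and the example aspect name are assumptions for illustration; the authoritative request shape is the one described in the updating guide and the OpenAPI V3 specification.

```python
from typing import Any, Dict


def wrap_aspects(urn: str, aspects: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Build an entity body in which each aspect payload sits under a 'value' key.

    Hypothetical helper; the surrounding body layout is an assumption.
    """
    body: Dict[str, Any] = {"urn": urn}
    for aspect_name, payload in aspects.items():
        body[aspect_name] = {"value": payload}
    return body
```

Under this assumption, an aspect that was previously sent as `{"datasetProperties": {"description": "..."}}` would instead be sent as `{"datasetProperties": {"value": {"description": "..."}}}`.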

Contributors

Massive shoutout to all of the contributors who made this release possible:

First-Time Contributors

@aabharti-visa, @acrylJonny, @amit-apptware, @AndreasHegerNuritas, @aviv-julienjehannet, @brbrown25, @chardaway, @dragontail, @ipolding-cais, @joelmataKPN, @john-claro-cko, @jordanjeremy, @lima-renan, @nadavgross, @nephtyws, @obaltian, @PeamThom, @pie1nthesky, @pulsar256, @samblackk, @shtephlee, @simaov, @steffengr, @tkdrahn, @TristanHeisler, @wornjs, @xkollar

Repeat Contributors

@ajoymajumdar, @bossenti, @cburroughs, @cccs-eric, @deepgarg-visa, @dushayntAW, @fjmacagno, @githendrik, @haeniya, @jayasimhankv, @k7ragav, @kevin1chun, @ksrinath, @Kunal-kankriya, @looppi, @Masterchen09, @mayurinehate, @ngamanda, @nmbryant, @noggi, @pankajmahato-visa, @PatrickfBraz, @pinakipb2, @Rajasekhar-Vuppala, @rtekal, @sagar-salvi-apptware, @shubhamjagtap639, @siladitya2, @ssilb4, @Sukeerthi31, @sumitappt, @TonyOuyangGit, @walter9388

DataHub Maintainers

@anshbansal, @asikowitz, @chriscollins3456, @darnaut, @david-leifker, @eboneil, @ethan-cartwright, @gabe-lyons, @hsheth2, @jayacryl, @jjoyce0510, @maggiehays, @pedro93, @RyanHolstien, @shirshanka, @sid-acryl, @skrydal, @treff7es, @yoonhyejin

What's Changed

  • fix(ingest/unity-catalog) upstream lineage for hive_metastore external table with s3 location by @dushayntAW in #10546
  • feat(ingestion/looker): ingest explore tags into the DataHub by @sid-acryl in #10547
  • fix(instropection): fix configuration application order by @david-leifker in #10579
  • fix(ingest/slack): pull real names by @hsheth2 in #10565
  • fix(ingest): Remove env deprecation message by @treff7es in #10581
  • test(ingest/sql): refactor CLL generator + add tests by @hsheth2 in #10580
  • docs(remote-ingestion): update description and deployment instructions by @darnaut in #10574
  • fix(ingest): DataProcessInstance.emit_process_end() ignored start_timestamp_millis by @obaltian in #10539
  • fix(ingest/metabase): Fix for query template expressions and invalid URNs for Text Cards by @pulsar256 in #10381
  • feat(graphql): Support tagging incidents and assertions via GraphQL API by @jjoyce0510 in #10575
  • docs(update): updating-datahub by @david-leifker in #10585
  • docs: reorder semantics guide to the bottom by @yoonhyejin in #10541
  • feat(auth): add viewTests platform privilege by @ksrinath in #10413
  • feat(ingestion/SageMaker): Remove deprecated apis and add stateful ingestion capability by @TonyOuyangGit in #10573
  • fix(search): fix autocomplete filter by @david-leifker in #10599
  • fix(ingest/snowflake): handle column level lineage for dbt temporary tables by @john-claro-cko in #10258
  • fix(mae-consumer): fix UpdateIndicesHook ignoring events with forceIndexing property set to true by @Masterchen09 in #10586
  • feat(fieldpaths): prevent duplicate field paths by @david-leifker in #10590
  • docs: update Town Hall page by @maggiehays in #10588
  • fix(search): implement queryByDefault annotation for SearchableRef by @david-leifker in #10603
  • fix(ingest/sagemaker): remove unsupported config by @hsheth2 in #10606
  • feat(neo4j): combine neo4j statements in addEdge into one statement by @deepgarg-visa in #10598
  • feat(neo4j): improve neo4j read query performance by specifying labels by @deepgarg-visa in #10593
  • feat(ingest): fetch connections from the backend by @hsheth2 in #10511
  • feat(graphql): custom complexity calculator and separate configurable thread pool for graphQL by @RyanHolstien in #10562
  • feat(ingest): enable stateful ingestion safety threshold by @hsheth2 in #10516
  • fix(ingest/spark): Bumping OpenLineage version to 0.14.0 by @treff7es in #10559
  • fix(ingest/dbt): only generate one subtype by @hsheth2 in #10615
  • fix(ingest/snowflake): make test connection logs less noisy by @hsheth2 in #10587
  • fix(ingest): move status aspect fixer logic by @hsheth2 in #10591
  • feat(data quality): update models, add assertions cli with snowflake integration by @mayurinehate in #10602
  • fix(gms/autosuggestion): autosuggestion query not returning the result if the query text has a prefix or suffix '-' on the search field by @siladitya2 in #10512
  • feat(consumers): mce-consumer throttling based on mae-consumer lag by @david-leifker in #10626
  • Add support for runAssertion, runAssertions, and runAssertionsForAsset APIs by @noggi in #10605
  • feat(graphql) data contract resolvers for graphql by @jayacryl in #10618
  • Revert "feat(graphql) data contract resolvers for graphql" by @jayacryl in #10631
  • fix(views): Add relationship annotation to GlobalViewsSettings urn by @pedro93 in #10597
  • feat(cli) Delete form references when using delete CLI by @chriscollins3456 in #10629
  • feat(ingest/looker): add ownership info to independent looks by @k7ragav in #10624
  • log(custom-plugins): add additional logging for spring plugins by @david-leifker in #10627
  • refactor(ui/glossary): Clean up term deletion by @asikowitz in #10589
  • fix(views): handle unknown view when resolving a view to a filter by @darnaut in #10640
  • feat(lineage): change query structure for explored hop limit by @RyanHolstien in #10607
  • feat(ingest): measure sink bottlenecking by @hsheth2 in #10628
  • fix(ingest/iceberg): update iceberg source to support newer versions of pyiceberg at runtime by @cccs-eric in #10614
  • feat(ingest/redshift): Adding way to filter s3 paths in Redshift Source by @treff7es in #10622
  • feat(businessAttribute): parallelize-business-attribute-propagation by @deepgarg-visa in #10638
  • docs(ingest): remove trailing comma on athena permission by @nephtyws in #10634
  • doc(roles): update privileges by @ksrinath in #10528
  • docs(subscriptions): adding docs for assertion level subscriptions on managed DH by @jayacryl in #10495
  • feat(ingest): add fast query fingerprinting by @hsheth2 in #10619
  • fix(ingestion/airflow-plugin): updated the document for developers by @dushayntAW in #10633
  • fix(ingest/trino): variable reference before define by @anshbansal in #10646
  • feat(entity-client): restli batchGetV2 batchSize fix and concurrency by @david-leifker in #10630
  • docs(): Adding API docs for incidents, operations, and assertions by @jjoyce0510 in #10522
  • feat(ci): fix conditionals and consolidate change detection by @david-leifker in #10649
  • fix(ingest/snowflake): avoid overfetching schemas from datahub by @hsheth2 in #10527
  • docs: add note for subResourceType being a fieldPath by @anshbansal in #10660
  • fix(ingest/qlik): improve logging for debug by @anshbansal in #10659
  • fix(doc): Fix doc typo in transformer by @sid-acryl in #10658
  • feat(graphql) data contract resolvers by @jayacryl in #10632
  • fix(openapiv3): v3 scroll response fix by @david-leifker in #10654
  • Use type: string for enum schemas by @kevin1chun in #10663
  • fix(ingestion/airflow-plugin): airflow remove old tasks by @dushayntAW in #10485
  • feat(platform): added db2 platform by @pankajmahato-visa in #10601
  • feat(ingestion/kafka)-Add support for ingesting schemas from schema registry by @aabharti-visa in #10612
  • fix(azure_ad): print request URL on error by @darnaut in #10677
  • docs(ingest): Rename csv / s3 / file source and sink by @asikowitz in #10675
  • feat(ingest/glue): database parameters extraction by @skrydal in #10665
  • fix(azure_ad): fix infinite loop on request error by @darnaut in #10679
  • perf(ingestion/fivetran): Connector performance optimization by @shubhamjagtap639 in #10556
  • feat(ingest): make query formatting more robust by @hsheth2 in #10678
  • feat(cli) Add actors to forms yaml API by @chriscollins3456 in #10683
  • doc(glossary): add note for github action for glossary by @anshbansal in #10687
  • feat(cli/data product): add support for institutional memory by @anshbansal in #10686
  • docs(cli/dataset): add dataset CLI get and upsert examples by @anshbansal in #10688
  • feat(ingest/airflow): fix materialize_iolets bug by @hsheth2 in #10613
  • feat(ingest/dbt): include package_name in dbt custom props by @hsheth2 in #10652
  • feat(ingest): add snowflake-summary source by @hsheth2 in #10642
  • feat(ui): Display 'View in Gitlab' if externalUrl is a link to Gitlab by @k7ragav in #10668
  • feat(ingest/cli): optionally show server config by @anshbansal in #10676
  • fix(docs): structured properties openapi guide by @david-leifker in #10671
  • docs(): Announcing DataHub Open Assertions Specification by @jjoyce0510 in #10609
  • fix(metadata-models) bridge gaps between graphql and pegasus models by @jayacryl in #10692
  • Aspect refs inside entity schema are nullable by @kevin1chun in #10695
  • feat(properties) Support custom properties on all entities with profile page by @chriscollins3456 in #10680
  • fix: APPT-43 | Lineage Edit: Modal Autocomplete by @sumitappt in #10569
  • chore(ui/ingest): improve description of executor ID by @darnaut in #10698
  • fix(ingest/fivetran): fix fivetran bigquery support by @hsheth2 in #10693
  • fix(ingest): fix redshift query urns + reduce memory usage by @hsheth2 in #10691
  • fix(operations): fix authorizer on operations controller by @david-leifker in #10701
  • fix(graphql): fix plugin collection by @david-leifker in #10696
  • fix(ingest/bigquery): Map BigQuery policy tags to datahub column-level tags by @sagar-salvi-apptware in #10669
  • fix(ingest/kafka-connect): Add lineage extraction for BigQuery Sink Connector in Kafka Connect source by @sagar-salvi-apptware in #10647
  • fix(search): fixes issue where exact match exclusive flag broke quoted structured search by @nmbryant in #10690
  • feat(openapi): openapi v3 updates by @david-leifker in #10710
  • fix(ingestion/sigma): Fix multiple requests http errors by @shubhamjagtap639 in #10616
  • docs(ingest): Add Oracle prerequisites by @darnaut in #10712
  • feat(gms): add ingestProposalBatch endpoint by @hsheth2 in #10706
  • feat(ingest/snowflake): refactor + parallel schema extraction by @hsheth2 in #10653
  • Expose get_entities_v2 endpoint in python client by @noggi in #10694
  • fix(docs): formatting of transformers code blocks by @walter9388 in #10670
  • feat(ingest/vertica): use 3 part naming by @Rajasekhar-Vuppala in #10636
  • feat(ingest): log http request retries by @hsheth2 in #10715
  • fix(ingestion/bigquery): user exceeded quota for concurrent project.lists requests by @shubhamjagtap639 in #10578
  • fix(ingest): fix dagster plugin release process by @hsheth2 in #10713
  • docs: add customer stories page by @yoonhyejin in #10600
  • feat(ingest/bigquery): Support for View Labels by @ethan-cartwright in #10648
  • feat(observe) expose assertion runId and lastObservedMillis to graphql by @jayacryl in #10726
  • fix(ingest): pin numpy<2 for classification by @hsheth2 in #10725
  • feat(ingest/bigquery): support using table read permission without profiling by @hsheth2 in #10699
  • fix(ingest/looker): fix looker browse paths v2 by @hsheth2 in #10700
  • feat(strucutred-properties): structured properties delete and schema change support by @david-leifker in #10711
  • feat(ingest/snowflake): support more than 10k views in a db by @hsheth2 in #10718
  • feat(cli): Make ingest deploy create recipe with urn if not exists by @pedro93 in #10724
  • fix(ingestion/airflow-plugin): fixed the failing pipeline by @dushayntAW in #10737
  • chore(security): updates for security vulnerabilities by @david-leifker in #10740
  • fix(ingest/dbt): support emitting only model performance by @hsheth2 in #10714
  • config(header): increase header size to 32k by @david-leifker in #10743
  • chore(security): bump jetty version by @darnaut in #10744
  • feat(ingest/snowflake): log queries at info level by @hsheth2 in #10745
  • feat: allow task ownership as group by @fjmacagno in #10742
  • fix(ingest/logging): fix excessive ingestion logging by @pie1nthesky in #10735
  • docs(notifications): add personal notifications docs by @eboneil in #10730
  • feat(ci): update base requirements file by @anshbansal in #10747
  • feat(ui/data-contract): Data contract UI under Validation Tab by @amit-apptware in #10625
  • fix(ingestion/airflow-plugin): emit browsePathV2 by @dushayntAW in #10738
  • fix(ingest/tableau): warn with better error message by @anshbansal in #10749
  • fix(mae-consumer-job): add PE processor to component scan by @pankajmahato-visa in #10751
  • chore(alpine): update alpine base image by @david-leifker in #10754
  • fix(bigquery): use get() instead of hassattr for view labels by @ethan-cartwright in #10756
  • ci(ui): Add prettier to CI by @asikowitz in #10741
  • feat(spark): Adding OpenLineage symlink support to Spark lineage by @treff7es in #10637
  • fix(ingest/iceberg): add support for nested dictionaries when configuring pyiceberg by @cccs-eric in #10762
  • fix(media-type): fix proxy media-type and openapi patch endpoint by @david-leifker in #10763
  • test(mae-consumer): test for injection of pe-consumer by @david-leifker in #10755
  • fix(spark):Add option to disable symlink resolution by @treff7es in #10767
  • feat(openapi): restore Timeline OpenAPIv1 and deprecations by @david-leifker in #10768
  • feat(web-react): adds the possibility to track events through GA4 by @PatrickfBraz in #8231
  • docs: merge cli guide by @yoonhyejin in #10464
  • feat(data quality): custom assertions models, graphql, sdk by @mayurinehate in #10761
  • chore(lint): spotless apply by @david-leifker in #10779
  • feat(protobuf): disable binary protoc custom properties by @david-leifker in #10778
  • feat(docs): Adding docs for custom assertion reporting APIs (WIP) by @jjoyce0510 in #10656
  • feat(schema-registry): enable config endpoint internal schema registry by @david-leifker in #10776
  • fix(ingest): use more aggressive errors with sqlglot by @hsheth2 in #10769
  • feat(ingest/snowflake): performance improvements by @hsheth2 in #10746
  • feat(ingest): add async batch mode to the rest sink by @hsheth2 in #10733
  • feat(search): adjust search config by @david-leifker in #10774
  • fix(ui/entityProfile/dataset): Show view definition tab if viewProperties.logic defined by @asikowitz in #10777
  • fix(ingest/snowflake): fix column batcher by @hsheth2 in #10781
  • docs(acryl cloud): release 0.3.3 by @anshbansal in #10772
  • build(deps-dev): bump vite from 4.5.2 to 4.5.3 in /datahub-web-react by @dependabot in #10199
  • docs(champions): update champions entry by @bossenti in #10721
  • feat(ingest): bump sqlglot by @hsheth2 in #10770
  • fix(ui/data-contract): fix freshness & schema assertion is not working by @amit-apptware in #10795
  • refactor(tags): Use TagUrn class when generating urn by @eboneil in #10786
  • fix(ingest/looker): prevent bad input fields by @hsheth2 in #10785
  • fix(ingest): add status aspect to dataProcessInstance by @hsheth2 in #10757
  • fix(ingest/pipeline): catch pipeline exceptions by @pie1nthesky in #10753
  • chore(gradle): remove httpclient 4 references by @david-leifker in #10787
  • feat(structuredproperties): aggregration fix & docs by @david-leifker in #10780
  • feat(ingest): set pipeline name in system metadata by @hsheth2 in #10190
  • fix(ingest/snowflake): add limits on tables/columns/queries in lineage by @hsheth2 in #10804
  • fix(airflow): fix airflow snowflake tests by @hsheth2 in #10803
  • feat(custom-plugins): improve plugin factory merge by @david-leifker in #10796
  • fix(ingest/snowflake): fix error case in column lineage by @hsheth2 in #10808
  • build(deps): bump braces from 3.0.2 to 3.0.3 in /docs-website by @dependabot in #10681
  • feat(ui): display chart query if it exists by @ngamanda in #10672
  • docs: update api overview by @yoonhyejin in #10543
  • refactor(web-react): add encoder to support non-ASCII characters csv download by @PeamThom in #10496
  • fix(docs) adding dataset column tags docs by @eboneil in #10479
  • build(deps): bump ejs from 3.1.9 to 3.1.10 in /datahub-web-react by @dependabot in #10417
  • fix(metadata-service): consider missing entities in form assignment hook by @Masterchen09 in #10392
  • feat(ingest/powerbi): powerbi dataset profiling by @looppi in #9355
  • fix(ui): show external url also in entity profile of containers by @Masterchen09 in #10390
  • fix(ERModelRelationship) UUID should mimic datahub_guid.py by @rtekal in #10355
  • chore(vulnerability): Inefficient Regular Expression - Potential high time complexity leading to ReDoS by @Sukeerthi31 in #10315
  • chore(vulnerability): Bumped up reactour version to address high vulnerability by @pinakipb2 in #10218
  • build(deps): bump express from 4.18.2 to 4.19.2 in /docs-website by @dependabot in #10128
  • feat(backend): Add new PDL entities + models for persona capture by @jjoyce0510 in #9637
  • feat(logging): unified request logging (graphql, openapi, restli) by @david-leifker in #10802
  • docs: hivePlatformAlias is different by @ssilb4 in #10765
  • fix(ingestion): ingest emails as empty if no ldap attribute by @tkdrahn in #9433
  • fix(patch): consider escaped characters when applying JSON patches by @ipolding-cais in #10717
  • fix(plugin): include ancestors when loading Spring custom plugin by @david-leifker in #10809
  • feat(docker/quickstart): Adding in support for overriding the conflue… by @brbrown25 in #10533
  • feat(ui): Add support for structured reporting of warnings and failures in the UI ingestion flow (ingest uplift 2/2) by @jjoyce0510 in #10790
  • docs(classification): correct the casing for full name infotype by @mayurinehate in #10782
  • feat(ingest): add and use file system abstraction in file source by @simaov in #8415
  • feat(ingest/lookml): ingest field tags by @sid-acryl in #10792
  • fix(assertions): minor changes in custom assertion api by @mayurinehate in #10794
  • build(jar): add datahub-custom-plugin-lib to jar workflow by @david-leifker in #10812
  • feat(SDK) Add FormPatchBuilder in python sdk and provide sample CRUD files by @chriscollins3456 in #10821
  • docs(ingest): add business glossary examples by @eboneil in #9851
  • feat(forms) Handle deleting forms references when hard deleting forms by @chriscollins3456 in #10820
  • refactor(ui): Misc improvements to the setup ingestion flow (ingest uplift 1/2) by @jjoyce0510 in #10764
  • fix(ingestion/airflow-plugin): pipeline tasks discoverable in search by @dushayntAW in #10819
  • feat(ingest/transformer): tags to terms transformer by @sagar-salvi-apptware in #10758
  • fix(ingestion/unity-catalog): fixed issue with profiling with GE turned on by @dushayntAW in #10752
  • feat(forms) Add java SDK for form entity PATCH + CRUD examples by @chriscollins3456 in #10822
  • feat(SDK) Add java SDK for structuredProperty entity PATCH + CRUD examples by @chriscollins3456 in #10823
  • feat(SDK) Add StructuredPropertyPatchBuilder in python sdk and provide sample CRUD files by @chriscollins3456 in #10824
  • feat(forms) Add CRUD endpoints to GraphQL for Form entities by @chriscollins3456 in #10825
  • add flag for includeSoftDeleted in scroll entities API by @kevin1chun in #10831
  • feat(deprecation) Return actor entity with deprecation aspect by @chriscollins3456 in #10832
  • feat(structuredProperties) Add CRUD graphql APIs for structured property entities by @chriscollins3456 in #10826
  • add scroll parameters to openapi v3 spec by @kevin1chun in #10833
  • fix(ingest): correct profile_day_of_week implementation by @jordanjeremy in #10818
  • feat(ingest/glue): allow ingestion of empty databases from Glue by @skrydal in #10666
  • feat(cli): add more details to get cli by @anshbansal in #10815
  • fix(ingestion/glue): ensure date formatting works on all platforms for aws glue by @sagar-salvi-apptware in #10836
  • fix(ingestion): fix datajob patcher by @david-leifker in #10827
  • fix(smoke-test): add suffix in temp file creation by @sid-acryl in #10841
  • feat(ingest/glue): add helper method to permit user or group ownership by @aviv-julienjehannet in #10784
  • Show data platform instances in policy modal if they are set on the policy by @githendrik in #10645
  • docs(patch): add patch documentation for how implementation works by @RyanHolstien in #10010
  • fix(jar): add missing custom-plugin-jar task by @david-leifker in #10847
  • fix: also check exceptions/stack trace when filtering log messages by @Masterchen09 in #10391
  • Update posts.md by @chardaway in #9893
  • chore(ingest): update acryl-datahub-classify version by @cburroughs in #10844
  • refactor(ingest): Refactor structured logging to support infos, warnings, and failures structured reporting to UI by @jjoyce0510 in #10828
  • fix(restli): log aspect-not-found as a warning rather than as an error by @ksrinath in #10834
  • fix(ingest/nifi): remove duplicate upstream jobs by @mayurinehate in #10849
  • fix(smoke-test): test access to create/revoke personal access tokens by @ksrinath in #10848
  • fix(smoke-test): missing test for move domain by @Kunal-kankriya in #10837
  • ci: update usernames to not considered for community by @anshbansal in #10851
  • env: change defaults for data contract visibility by @shirshanka in #10854
  • fix(ingest/tableau): quote special characters in external URL by @ipolding-cais in #10842
  • fix/ added click to perticular result and delay to element visibility by @Kunal-kankriya in #10861
  • ci(ingest): pin dask dependency for feast by @mayurinehate in #10865
  • fix(ingestion/lookml): liquid template resolution and view-to-view cll by @sid-acryl in #10542
  • feat(ingest/audit): add client id and version in system metadata props by @anshbansal in #10829
  • chore(ingest): Mypy 1.10.1 pin by @treff7es in #10867
  • docs: use acryl-datahub-actions as expected python package to install by @aviv-julienjehannet in #10852
  • docs: add new js snippet by @hsheth2 in #10846
  • refactor(ingestion): remove company domain for security reason by @shubhamjagtap639 in #10839
  • fix(ingestion/spark): Platform instance and column level lineage fix by @treff7es in #10843
  • feat(ingestion/tableau): optionally ingest multiple sites and create site containers by @haeniya in #10498
  • fix(ingestion/looker): Add sqlglot dependency and remove unused sqlparser by @sid-acryl in #10874
  • fix(manage-tokens): fix manage access token policy by @david-leifker in #10853
  • Batch get entity endpoints by @kevin1chun in #10880
  • feat(system): support conditional write semantics by @david-leifker in #10868
  • fix(build): upgrade vercel builds to Node 20.x by @hsheth2 in #10890
  • feat(ingest/lookml): shallow clone repos by @hsheth2 in #10888
  • fix(ingest/looker): add missing dependency by @hsheth2 in #10876
  • fix(ingest): only populate audit stamps where accurate by @hsheth2 in #10604
  • fix(ingest/dbt): always encode tag urns by @hsheth2 in #10799
  • fix(ingest/redshift): handle multiline alter table commands by @hsheth2 in #10727
  • fix(ingestion/looker): column name missing in explore by @sid-acryl in #10892
  • fix(lineage) Fix lineage source/dest filtering with explored per hop limit by @chriscollins3456 in #10879
  • feat(conditional-writes): misc updates and fixes by @david-leifker in #10901
  • feat(ci): update outdated action by @anshbansal in #10899
  • feat(rest-emitter): adding async flag to rest emitter by @gabe-lyons in #10902
  • feat(ingest): add snowflake-queries source by @hsheth2 in #10835
  • fix(ingest): improve auto_materialize_referenced_tags_terms error handling by @hsheth2 in #10906
  • docs: add new company to adoption list by @shtephlee in #10909
  • refactor(redshift): Improve redshift error handling with new structured reporting system by @jjoyce0510 in #10870
  • feat(ui) Finalize support for all entity types on forms by @chriscollins3456 in #10915
  • Index ExecutionRequestResults status field by @noggi in #10811
  • feat(ingest): grafana connector by @anshbansal in #10891
  • fix(gms) Add Form entity type to EntityTypeMapper by @chriscollins3456 in #10916
  • feat(dataset): add support for external url in Dataset by @dragontail in #10877
  • docs(saas-overview) added missing features to observe section by @jayacryl in #10913
  • fix(ingest/spark): Fixing Micrometer warning by @treff7es in #10882
  • fix(structured properties): allow application of structured properties without schema file by @gabe-lyons in #10918
  • fix(data-contracts-web) handle other schedule types by @jayacryl in #10919
  • fix(ingestion/tableau): human-readable message for PERMISSIONS_MODE_SWITCHED error by @sid-acryl in #10866
  • feat(ingest/snowflake): Add feature flag for view defintions by @ethan-cartwright in #10914
  • feat(ingest/BigQuery): refactor+parallelize dataset metadata extraction by @mayurinehate in #10884
  • fix(airflow): add error handling around render_template() by @hsheth2 in #10907
  • feat(ingestion/sqlglot): add optional default_dialect parameter to sqlglot lineage by @nadavgross in #10830
  • feat(mcp-mutator): new mcp mutator plugin by @david-leifker in #10904
  • fix(ingest/bigquery): changes helper function to decode unicode scape sequences by @PatrickfBraz in #10845
  • feat(ingest/postgres): fetch table sizes for profile by @pie1nthesky in #10864
  • feat(ingest/abs): Adding azure blob storage ingestion source by @joelmataKPN in #10813
  • fix(ingest/redshift): reduce severity of SQL parsing issues by @hsheth2 in #10924
  • fix(build): fix lint fix web react by @anshbansal in #10896
  • fix(ingest/bigquery): handle quota exceeded for project.list requests by @sagar-salvi-apptware in #10912
  • feat(ingest): report extractor failures more loudly by @hsheth2 in #10908
  • feat(ingest/snowflake): integrate snowflake-queries into main source by @hsheth2 in #10905
  • fix(ingest): fix docs build by @hsheth2 in #10926
  • fix(ingest/snowflake): fix test connection by @hsheth2 in #10927
  • fix(ingest/lookml): add view load failures to cache by @hsheth2 in #10923
  • docs(slack) overhauled setup instructions and screenshots by @jayacryl in #10922
  • fix(airflow): Add comma parsing of owners to DataJobs by @eboneil in #10903
  • fix(entityservice): fix merging sideeffects by @david-leifker in #10937
  • feat(ingest): Support System Ingestion Sources, Show and hide system ingestion sources with Command-S by @jjoyce0510 in #10938
  • chore() Set a default lineage filtering end time on backend when a start time is present by @jjoyce0510 in #10925
  • Added relationships APIs to V3. Added these generic APIs to V3 swagger doc. by @ajoymajumdar in #10939
  • docs: add learning center to docs by @yoonhyejin in #10921
  • doc: Update hubspot form id by @yoonhyejin in #10943
  • chore(airflow): add python 3.11 w/ Airflow 2.9 to CI by @hsheth2 in #10941
  • fix(Ingestor/Glue): Implement column upstream lineage between S3 and Glue by @sagar-salvi-apptware in #10895
  • fix(ingest/abs): Splitting abs utils into multiple files to not include abs specific includes which broke path_spec includes by @treff7es in #10945
  • fix(ingestion/looker): fix doc for sql parsing documentation by @sid-acryl in #10883
  • fix(ingest/bigquery): Adding missing BigQuery types by @treff7es in #10950
  • fix(ingest/setup): Fix for feast and abs source setup by @treff7es in #10951
  • fix(connections) Harden adding /gms to connections in backend by @chriscollins3456 in #10942
  • feat(siblings) Add flag to prevent combining siblings in the UI by @chriscollins3456 in #10952
  • fix(docs): make graphql doc gen more automated by @hsheth2 in #10953
  • feat(ingest/athena): Add option for Athena partitioned profiling by @treff7es in #10723
  • fix(spark-lineage): default timeout for future responses by @deepgarg-visa in #10947
  • feat(datajob/flow): add environment filter using info aspects by @anshbansal in #10814
  • fix(ui/ingest): ingest tab should show with manage ingestion privilege by @anshbansal in #10483
  • feat(ingest/looker): include dashboard urns in browse v2 by @hsheth2 in #10955
  • add a structured type to batchGet in OpenAPI V3 spec by @kevin1chun in #10956
  • fix(ui) Fix scroll on the domain sidebar to show all domains by @chriscollins3456 in #10966
  • fix: resolve incorrect variable assignment for SageMaker API call by @TristanHeisler in #10965
  • fix(airflow/build): Pinning mypy by @treff7es in #10972
  • Fixed a bug where the OpenAPI V3 spec was incorrect. The bug was introduced in #10939. by @ajoymajumdar in #10974
  • fix(ingest/test): Fix for mssql integration tests by @treff7es in #10978
  • fix(entity-service) exist check correctly extracts status by @jayacryl in #10973
  • fix(structuredProps) Fix casing bug in StructuredPropertiesValidator by @chriscollins3456 in #10982
  • bugfix: use anyOf instead of allOf when creating references in openapi v3 spec by @kevin1chun in #10986
  • fix(ui): Remove ant less imports by @asikowitz in #10988
  • feat(ingest/graph): Add get_results_by_filter to DataHubGraph by @asikowitz in #10987
  • feat(ingest/cli): init does not actually support environment variables by @darnaut in #10989
  • fix(ingest/graph): Update get_results_by_filter graphql query by @asikowitz in #10991
  • feat(ingest/spark): Promote beta plugin by @treff7es in #10881
  • feat(ingest): support domains in meta -> "datahub" section by @hsheth2 in #10967
  • feat(ingest): add check server-config command by @hsheth2 in #10990
  • feat(cli): Make consistent use of DataHubGraphClientConfig by @pedro93 in #10466
  • fix(ingest/s3): Fixing container creation when there is no folder in path by @treff7es in #10993
  • fix(ingest/looker): support platform instance for dashboards & charts by @sid-acryl in #10771
  • feat(ingest/bigquery): improve handling of information schema in sql parser by @hsheth2 in #10985
  • feat(ingest): improve ingest deploy command by @hsheth2 in #10944
  • fix(backend): allow excluding soft-deleted entities in relationship-queries; exclude soft-deleted members of groups by @ksrinath in #10920
  • fix(ingest/looker): downgrade missing chart type log level by @hsheth2 in #10996
  • doc(acryl-cloud): release docs for 0.3.4.x by @anshbansal in #10984
  • fix(protobuf/build): Fix protobuf check jar script by @treff7es in #11006
  • fix(ui/ingest): Support invalid cron jobs by @asikowitz in #10998
  • fix(ingest): fix graph config loading by @hsheth2 in #11002
  • feat(docs): Document _DATAHUB_TO_FILE directive by @pedro93 in #10968
  • fix(graphql/upsertIngestionSource): Validate cron schedule; parse error in CLI by @asikowitz in #11011
  • feat(ece): support custom ownership type urns in ECE generation by @hsheth2 in #10999
  • feat(assertion-v2): changed Validation tab to Quality and created new Governance tab by @amit-apptware in #10935
  • fix(ingestion/glue): Add support for missing config options for profiling in Glue by @sagar-salvi-apptware in #10858
  • feat(propagation): Add models for schema field docs, tags, terms (#2959) by @shirshanka in #11016
  • docs: standardize terminology to DataHub Cloud by @yoonhyejin in #11003
  • fix(ingestion/transformer): transformer to replace the externalUrl in container properties by @sagar-salvi-apptware in #11013
  • docs(slack) troubleshoot docs by @jayacryl in #11014
  • feat(propagation): Add graphql API by @shirshanka in #11030
  • feat(propagation): Add models for Action feature settings by @samblackk in #11029
  • docs(custom properties): Remove duplicate from sidebar by @eboneil in #11033
  • feat(models): Introducing Dataset Partitions Aspect by @jjoyce0510 in #10997
  • feat(propagation): Add Documentation Propagation Settings by @samblackk in #11038
  • fix(models): chart schema fields mapping, add dataHubAction entity, t… by @shirshanka in #11040
  • fix(ci): smoke test lint failures by @shirshanka in #11044
  • docs: fix learning center color scheme & typo by @yoonhyejin in #11043
  • feat: add cloud main page by @yoonhyejin in #11017
  • feat(restore-indices): add additional step to also clear system metadata service by @Masterchen09 in #10662
  • docs: fix typo by @yoonhyejin in #11046
  • fix(lint): apply spotless by @anshbansal in #11050
  • docs(airflow) Example query to get datajobs for a dataflow by @eboneil in #11034
  • feat(cli): Add run-id option to put sub-command by @pedro93 in #11023
  • fix(ingest): improve sql error reporting calls by @hsheth2 in #11025
  • fix(airflow): fix CI setup by @hsheth2 in #11031
  • feat(ingest/dbt): add experimental prefer_sql_parser_lineage flag by @hsheth2 in #11039
  • fix(ingestion/lookml): enable stack-trace in lookml logs by @sid-acryl in #10971
  • (chore): Linting fix by @rtekal in #11015
  • chore(ci): update deprecated github actions by @anshbansal in #10977
  • Fix ALB configuration example by @steffengr in #10981
  • chore(ingestion-base): bump base image packages by @david-leifker in #11053
  • feat(cli): Trim report of dataHubExecutionRequestResult to max GMS size by @pedro93 in #11051
  • fix(ingestion/lookml): emit dummy sql condition for lookml custom condition tag by @sid-acryl in #11008
  • fix(ingestion/powerbi): fix issue with broken report lineage by @sid-acryl in #10910
  • feat(ingest/tableau): add retry on timeout by @hsheth2 in #10995
  • change generate kafka connect properties from env by @wornjs in #10545
  • fix(ingest): fix oracle cronjob ingestion by @lima-renan in #11001
  • chore(ci): revert update deprecated github actions (#10977) by @david-leifker in #11062
  • feat(ingest/dbt-cloud): update metadata_endpoint inference by @hsheth2 in #11041
  • build: Reduce size of datahub-frontend-react image by 50-ish% by @xkollar in #10878
  • fix(ci): Fix lint issue in datahub_ingestion_run_summary_provider.py by @pedro93 in #11063
  • docs(ingest): update developing-a-transformer.md by @acrylJonny in #11019
  • feat(search-test): update search tests from #10408 by @david-leifker in #11056
  • feat(cli): add aspects parameter to DataHubGraph.get_entity_semityped by @Masterchen09 in #11009
  • docs(airflow): update min version for plugin v2 by @hsheth2 in #11065
  • doc(ingestion/tableau): doc update for derived permission by @sid-acryl in #11054
  • fix(py): remove dep on types-pkg_resources by @hsheth2 in #11076
  • feat(ingest/mode): add option to exclude restricted by @anshbansal in #11081
  • fix(ingest): set lastObserved in sdk when unset by @hsheth2 in #11071
  • doc(ingest): Update capabilities by @treff7es in #11072
  • chore(vulnerability): Log Injection by @pinakipb2 in #11090
  • chore(vulnerability): Information exposure through a stack trace by @pinakipb2 in #11091
  • chore(vulnerability): Comparison of narrow type with wide type in loop condition by @pinakipb2 in #11089
  • chore(vulnerability): Insertion of sensitive information into log files by @pinakipb2 in #11088
  • chore(vulnerability): Risky Cryptographic Algorithm by @pinakipb2 in #11059
  • chore(vulnerability): Overly permissive regex range by @pinakipb2 in #11061
  • fix: update customer data by @yoonhyejin in #11075
  • fix(models): fixing the datasetPartition models by @jjoyce0510 in #11085
  • fix(ui): Adding view, forms GraphQL query, remove showing a fallback error message on unhandled GraphQL error by @jjoyce0510 in #11084
  • feat(docs-site) hiding learn more from cloud page by @jayacryl in #11097
  • fix(docs): Add correct usage of orFilters in search API docs by @gabe-lyons in #11082
  • fix(ingest/mode): Regexp in mode name matcher didn't allow underscore by @treff7es in #11098
  • docs: Refactor customer stories section by @yoonhyejin in #10869
  • fix(release): fix full/slim suffix on tag by @david-leifker in #11087
  • feat(config): support alternate hashing algorithm for doc id by @pinakipb2 in #10423
  • fix(emitter): fix typo in get method of java kafka emitter by @rtekal in #11007
  • fix(ingest): use correct native data type in all SQLAlchemy sources by compiling data type using dialect by @Masterchen09 in #10898
  • chore: Update contributors list in PR labeler by @skrydal in #11105
  • feat(ingest): tweak stale entity removal messaging by @hsheth2 in #11064
  • fix(ingestion): enforce lastObserved timestamps in SystemMetadata by @david-leifker in #11104
  • fix(ingest/powerbi): fix broken lineage between chart and dataset by @sid-acryl in #11080
  • feat(ingest/lookml): CLL support for sql set in sql_table_name attribute of lookml view by @sid-acryl in #11069
  • docs: update graphql docs on forms & structured properties by @yoonhyejin in #11100
  • test(openAPI v3): search openAPI test has been added by @Kunal-kankriya in #11049
  • fix(ingest/tableau): prevent empty site content urls by @hsheth2 in #11057
  • feat(entity-client): implement client batch interface by @david-leifker in #11106
  • fix(snowflake): avoid reporting warnings/info for sys tables by @hsheth2 in #11114
  • fix(ingest): downgrade column type mapping warning to info by @hsheth2 in #11115
  • feat(api): add AuditStamp to the V3 API entity/aspect response by @ajoymajumdar in #11118
  • fix(ingest/redshift): replace r'\n' with '\n' to avoid token error redshift serverless… by @AndreasHegerNuritas in #11111
  • fix(entity-client): handle null entityUrn case for restli by @david-leifker in #11122
  • fix(sql-parser): prevent bad urns from alter table lineage by @hsheth2 in #11092
  • fix(ingest/bigquery): use small batch size if use_tables_list_query_v2 is set by @mayurinehate in #11121
  • fix(graphql): add missing entities to EntityTypeMapper and EntityTypeUrnMapper by @Masterchen09 in #10366
  • Changes to allow editable dataset name by @jayasimhankv in #10608
  • fix: remove saxo by @yoonhyejin in #11127
  • feat(mcl-processor): Update mcl processor hooks by @david-leifker in #11134
  • docs(policies): updates to policies documentation by @david-leifker in #11073
  • fix(openapi): fix openapi v2 and v3 docs update by @david-leifker in #11139
  • feat(auth): grant type and acr values custom oidc parameters support by @RyanHolstien in #11116
  • fix(mutator): mutator hook fixes by @RyanHolstien in #11140
  • feat(search): support sorting on multiple fields by @RyanHolstien in #10775
  • feat(ingest): various logging improvements by @hsheth2 in #11126
  • fix(ingestion/lookml): fix for sql parsing error by @sid-acryl in #11079
  • feat(docs-site) cloud page spacing and content polishes by @jayacryl in #11141
  • feat(ui) Enable editing structured props on fields by @chriscollins3456 in #11042
  • feat(tests): add md5 and last computed to testResult model by @RyanHolstien in #11117
  • test(openapi): openapi regression smoke tests by @david-leifker in #11143
  • fix(airflow): fix tox tests + update docs by @hsheth2 in #11125
  • docs: add chime to adoption stories by @yoonhyejin in #11142
  • fix(ingest/databricks): Updating code to work with Databricks sdk 0.30 by @treff7es in #11158

Full Changelog: v0.13.3...v0.14.0.2
