github OpenLineage/OpenLineage 1.38.0
OpenLineage 1.38.0

15 hours ago

Added

  • Spec: Add subset dataset facets to spec #4008 @pawel-big-lebowski
    Add subset dataset facets to OpenLineage specification for representing dataset relationships.
  • Spec: Add DatasetQualityMetricsDatasetFacet #3978 @heron--
    Allow attaching dataset quality information outside of InputDatasetFacet.
  • Spark: Add support for microbatch source write #4018 @tnazarew
    Add support for Spark structured streaming microbatch source write operations.
  • Spark: Add catalog properties to catalog facet #4016 @ddebowczyk92
    Add catalog properties support to Spark integration for better catalog metadata tracking.
  • Spark: Add GCP project ID and location to BigQuery Metastore catalog properties #4039 @ddebowczyk92
    Enhance BigQuery integration with GCP project ID and location in catalog properties.
  • Spark: Add support for COALESCE transformation #3972 @kchledowski
    Add support for tracking COALESCE transformations in Spark jobs.
  • Spark: Add catalog facet when using vanilla Hive tables #3982 @ddebowczyk92
    Add catalog facet support for vanilla Hive table operations.
  • Spark: Make output statistics available within complete event #4013 @pawel-big-lebowski
    Make output statistics available in complete events for better observability.
  • Spark: Add output stats for RDD jobs #3977 @pawel-big-lebowski
    Add output statistics tracking for Spark RDD-based jobs.
  • Java: Add equals and hashcode methods into generated classes #4050 @pawel-big-lebowski
    Improve generated model classes with proper equals and hashCode implementations.
  • dbt: Capture dbt tags #4022 @mobuchowski
    Add support for capturing dbt tags in OpenLineage events.
  • dbt: Add dbt Cloud account ID to DbtRunRunFacet #4017 @mobuchowski
    Add dbt Cloud account ID tracking to dbt run facets.
  • dbt: Update DbtRunRunFacet to add more useful information #3987 @mobuchowski
    Enhance DbtRunRunFacet with additional metadata for better observability.
  • Python: Add GCP Lineage transport #4006 @ddebowczyk92
    Add native Google Cloud Platform Lineage transport for Python client.
  • Python: Add fsspec support for FileTransport #3983 @JDarDagran
    Add fsspec filesystem support to FileTransport for broader filesystem compatibility.
  • Python: Add default tags with OL client version #3980 @kacpermuda
    Automatically add OpenLineage client version as default tag in events.
  • Airflow: Add GCP Composer facets #3986 @gabrysiaolsz
    Add GCP Cloud Composer environment metadata facets to Airflow integration.

Changed

  • dbt: Use alias when naming datasets #4055 @mobuchowski
    Use dbt model aliases when generating dataset names for more accurate lineage.
  • Spark: Serialize event to JSON for logging #4029 @EugeneYushin
    Serialize OpenLineage events to JSON format for improved debug logging.
  • Spark: Respect overridden appName in EventEmitter #4030 @EugeneYushin
    Properly respect user-overridden application names in event emission.
  • Spark: Refactor CLL ExpressionDependencyCollector #4003 @kchledowski
    Refactor column-level lineage expression dependency collector for better maintainability.
  • Spark: Improve logging in IcebergInputStatisticsInputDatasetFacetBuilder #3994 @JDarDagran
    Enhance logging for Iceberg input statistics collection.
  • Spark: Limit external getFileStatus calls when dealing with lots of S3 objects #3985 @pawel-big-lebowski
    Optimize S3 operations by limiting external getFileStatus calls for large object sets.
  • Java/Spark/Hive: Move TransformationInfo to Java client to reuse across integrations #3964 @kchledowski
    Refactor TransformationInfo into shared Java client for cross-integration reuse.
  • Python: Improve logging in AsyncHttpTransport #4026 @dolfinus
    Enhance logging capabilities in asynchronous HTTP transport.
  • Python: Allow type aliases #4000 @JDarDagran
    Support Python type aliases in client code generation.
  • Python: Fix classes generation for almost identical classes #3997 @JDarDagran
    Improve code generation to properly handle nearly identical class definitions.
  • Python: Raise errors if custom token provider cannot be loaded #4014 @dolfinus
    Fail fast with clear errors when custom token providers fail to load.
  • Python: Don't silence import errors in DefaultTransportFactory #4015 @dolfinus
    Improve error visibility by not silencing import errors in transport factory.
  • Python: Import from facet_v2 and event_v2 instead of generated modules #3968 @kacpermuda
    Update import paths to use versioned facet and event modules.
  • Java: Refactor ExecutorService management in OpenLineageClientUtils #4012 @JDarDagran
    Improve thread pool management in Java client utilities.
  • CI: Replace pre-commit with prek across CI and documentation #3965 @JDarDagran
    Migrate from pre-commit to prek for pre-commit hook management.
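
The two Python loader changes above (#4014 and #4015) share one idea: when a user-configured class cannot be imported, surface the underlying error instead of silently falling back. The snippet below is a minimal sketch of that fail-fast pattern using only the standard library. The helper name `load_class` and its error messages are hypothetical and are not taken from the OpenLineage client.

```python
import importlib


def load_class(dotted_path: str) -> type:
    """Resolve 'package.module.ClassName' to a class, failing loudly.

    Hypothetical helper for illustration only; not the OpenLineage API.
    """
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")

    # Deliberately no try/except here: an ImportError raised while importing
    # the module (for example a missing dependency) reaches the caller.
    module = importlib.import_module(module_name)

    try:
        return getattr(module, class_name)
    except AttributeError as exc:
        # Re-raise as ImportError and chain the original cause so the
        # traceback still shows what went wrong.
        raise ImportError(
            f"module {module_name!r} has no attribute {class_name!r}"
        ) from exc
```

Letting the exception propagate means a typo in a class path or a missing optional dependency is reported at startup rather than showing up later as an unexpected default.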

Fixed

  • Spark: Fix false Hive Glue detection #4053 @jsjasonseba
    Fix incorrect Glue catalog detection due to always attempting ARN resolution.
  • Spark: Fix CLL on hiveless runtimes #4052 @kchledowski
    Fix column-level lineage failures on Spark runtimes without spark-hive package.
  • Spark: Fix missing inputs and CLL on some table creation commands #4031 @kchledowski
    Fix missing input datasets and column-level lineage for CreateDataSourceTableAsSelect and CreateHiveTableAsSelect commands.
  • Spark: Rely on BQ bucket info inside BigQueryIntermediateJobFilter #4044 @EugeneYushin
    Fix BigQuery intermediate job filtering by using bucket configuration.
  • Spark: Fix for TypeNotPresentException/RefreshTableCommand errors in Spark 3.0.2 #4002 @MaciejGajewski
    Add exception handling for TypeNotPresentException in Spark 3.0.2.
  • Python: Fix license field in pyproject.toml when using build module #4034 @JDarDagran
    Correct license field specification in Python package metadata.
  • Python: Accept both apikey and api_key in token provider #4045 @kacpermuda
    Support both naming conventions for API key configuration parameter.
  • Java: Fix empty sources jar generation #4037 @EugeneYushin
    Fix build issue causing empty sources JAR files to be generated.
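
As an illustration of the apikey/api_key fix (#4045), accepting two spellings of one option usually comes down to checking each accepted key in a fixed order. The sketch below is hypothetical: the function name `resolve_api_key`, the plain-dict config shape, and the choice to prefer `api_key` when both are present are assumptions, not the client's actual behavior.

```python
def resolve_api_key(config: dict) -> str:
    """Return the API key from a config dict that may use either spelling.

    Hypothetical helper; the precedence order is an assumption.
    """
    # Check the accepted spellings in order; the first non-empty value wins.
    for name in ("api_key", "apikey"):
        value = config.get(name)
        if value:
            return value
    raise KeyError("config must define 'api_key' or 'apikey'")
```

For example, `resolve_api_key({"apikey": "secret"})` and `resolve_api_key({"api_key": "secret"})` both return `"secret"`.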
