github dlt-hub/dlt 1.24.0

10 hours ago

dlt 1.24.0 Release Notes

Breaking Changes

  1. Custom resource metrics now stored as tables (#3718 @rudolfix) — Incremental metrics in the trace are now represented in table format. This changes the location and structure of incremental metrics in the trace object.

Highlights

  • Insert-only merge strategy (#3741 @rudolfix, based on #3372 by @OnAzart) — New insert-only merge strategy that performs idempotent, key-based appending: inserts records whose primary key doesn't exist in the destination while silently skipping duplicates. No updates or deletes. Supported across all SQL destinations, Delta Lake, and Iceberg.
  • Parallelize all sources in Airflow (#3652 @JustinSobayo) — In parallel and parallel-isolated decompose modes, all source components now fan out concurrently from a shared start node. Previously the first source had to complete before others could begin, adding unnecessary wall-clock time. This release also adds basic Airflow 3 support with smoke tests.
  • ClickHouse ReplacingMergeTree support (#3366 @prevostc) — New replacing_merge_tree table engine type for ClickHouse that enables native deduplication and soft deletes via dedup_sort and hard_delete column hints.
  • Custom resource metrics as tables (#3718 @rudolfix) — Resources can now emit custom metrics that are stored as tables in the trace, enabling richer observability for pipelines.
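
The insert-only semantics described above can be illustrated with a small in-memory sketch. This is not dlt's implementation: `insert_only_merge`, the dict-backed `destination`, and the `id` key are hypothetical names chosen for illustration. It assumes a single-column primary key and that the first occurrence wins when a batch repeats a key.

```python
from typing import Any, Dict, Hashable, Iterable, List

Row = Dict[str, Any]


def insert_only_merge(
    destination: Dict[Hashable, Row],
    batch: Iterable[Row],
    primary_key: str,
) -> List[Row]:
    """Insert rows whose key is absent from `destination`; skip the rest.

    Existing rows are never updated or deleted. Returns the rows that were
    actually inserted.
    """
    inserted: List[Row] = []
    for row in batch:
        key = row[primary_key]
        if key in destination:
            # Key already present (from an earlier load or earlier in this
            # batch): skip silently, leaving the stored row untouched.
            continue
        destination[key] = row
        inserted.append(row)
    return inserted


# Hypothetical destination state and incoming batch.
existing = {1: {"id": 1, "value": "old"}}
batch = [
    {"id": 1, "value": "new"},  # key exists: skipped, not updated
    {"id": 2, "value": "b"},    # new key: inserted
    {"id": 2, "value": "c"},    # repeated key within the batch: skipped
]

first_run = insert_only_merge(existing, batch, primary_key="id")
second_run = insert_only_merge(existing, batch, primary_key="id")  # replay
```

Replaying the same batch inserts nothing, which is the idempotence the release note refers to. In dlt itself the strategy is selected through the resource's merge write disposition; these notes do not spell out the exact strategy identifier, so consult the merge documentation for it.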

Core Library

  • Insert-only merge strategy (#3741 @rudolfix, based on #3372 by @OnAzart) — See Highlights.
  • ClickHouse ReplacingMergeTree support (#3366 @prevostc) — See Highlights.
  • Parallelize all sources in Airflow (#3652 @JustinSobayo) — See Highlights.
  • Custom resource metrics as tables (#3718 @rudolfix) — See Highlights.
  • Configurable Arrow table concatenation promote_options (#3701 @AyushPatel101) — arrow_concat_promote_options can now be set to "default" or "permissive" instead of the hardcoded "none", enabling automatic type promotion when yielding multiple Arrow tables with slightly different inferred types.
  • Fix: CLI info/show fails on custom destinations (#3676 @anuunchin) — dlt pipeline info/show no longer crashes with UnknownDestinationModule on pipelines using @dlt.destination.
  • Fix: Primary key assignment for incremental resources (#3679 @shnhdan) — Passing primary_key=() to Incremental to disable deduplication is no longer silently overwritten by the resource's own primary key.
  • Fix: MotherDuck missing catalog validation (#3723 @YuF-9468) — Connection strings that omit the catalog/database name (e.g. bare md:) now raise a clear configuration error instead of a confusing connection failure.
  • Fix: BigQuery infinite loop on internal error (#3732 @aditypan) — BigQuery jobs that encounter an internal error no longer cause an infinite retry loop.
  • Fix: SCD2 column order mismatch in SQLAlchemy destinations (#3733 @anuunchin) — SCD2 validity column insert jobs now match the column order of existing tables in SQLAlchemy destinations.
  • Fix: Timezone mapping in SQL timestamp datatype (#3735 @aditypan) — Timezone is now correctly set for timestamp/datetime column datatypes.
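
The primary key fix above concerns the difference between "no primary key given" and "an explicitly empty primary key". The notes do not say what caused the bug in dlt, so the following is only a sketch of the general pitfall and one way to avoid it. Both function names are hypothetical and neither is taken from dlt's code.

```python
from typing import Optional, Tuple

Key = Tuple[str, ...]


def resolve_primary_key_falsy(incremental_pk: Optional[Key], resource_pk: Key) -> Key:
    """Pitfall: `or` treats the empty tuple like None and falls back."""
    return incremental_pk or resource_pk


def resolve_primary_key(incremental_pk: Optional[Key], resource_pk: Key) -> Key:
    """Fall back to the resource key only when nothing was passed (None).

    An empty tuple is an explicit request to disable deduplication and is
    returned unchanged.
    """
    if incremental_pk is None:
        return resource_pk
    return incremental_pk


resource_key: Key = ("id",)

overwritten = resolve_primary_key_falsy((), resource_key)  # opt-out is lost
respected = resolve_primary_key((), resource_key)          # opt-out is kept
inherited = resolve_primary_key(None, resource_key)        # default fallback
```

The design point is to compare against `None` rather than rely on truthiness, so that an empty tuple stays distinguishable from an omitted argument.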

Docs

  • Realistic closure-based data masking example (#3617 @veeceey) — Replaced the hardcoded example with a reusable mask_columns() function supporting all sql_database backends.
  • Redirects for removed pages (#3688 @djudjuu)
  • AI workbench license info (#3729 @lis365b)
  • Minor doc fixes (#3734 @anuunchin)
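
For readers unfamiliar with the closure-based masking pattern mentioned in the first Docs item, here is a minimal sketch. It is not the `mask_columns()` from the dlt docs: the signature, the `salt` parameter, and the SHA-256 hashing are assumptions, and this version handles only dict rows, whereas the documented function is said to support all sql_database backends.

```python
import hashlib
from typing import Any, Callable, Dict, Iterable

Row = Dict[str, Any]


def mask_columns(columns: Iterable[str], salt: str = "") -> Callable[[Row], Row]:
    """Return a row transformer that hashes the values of `columns`.

    The returned function closes over `targets` and `salt`, so a single
    configured masker can be reused for every row of a resource.
    """
    targets = frozenset(columns)

    def _mask(row: Row) -> Row:
        masked = dict(row)  # copy so the caller's row is not mutated
        for name in targets:
            value = masked.get(name)
            if value is not None:  # leave NULLs and missing columns as they are
                digest = hashlib.sha256((salt + str(value)).encode("utf-8"))
                masked[name] = digest.hexdigest()
        return masked

    return _mask


# Hypothetical row; only "email" is configured for masking.
mask = mask_columns(["email"], salt="pepper")
row = {"id": 1, "email": "jane@example.com", "phone": None}
masked_row = mask(row)
```

A transformer of this shape can be attached to a dlt resource with `add_map`, for example `my_table.add_map(mask_columns(["email"]))`, where `my_table` stands for a resource you have already defined.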
