github dlt-hub/dlt 1.28.0

6 hours ago

dlt 1.28.0 Release Notes

Breaking Changes

  1. refresh="drop_data" on Delta and persistent-catalog Iceberg no longer frees storage (#4051 @rudolfix) — Truncation is now a transactional delete that keeps the table, schema, version history, and data files (retained for time travel until vacuum). Previously the files were deleted. This corrects prior erroneous behavior, but pipelines relying on drop_data to reclaim disk space will no longer see storage freed without an explicit vacuum.
  2. replace now fully truncates empty and orphaned tables (#4010 @rudolfix) — Tables belonging to a replace resource that receive no data in a run (including nested tables, dynamic-name and variant tables) are now consistently truncated. Previously these tables could be left orphaned with stale rows surviving the reload, leaving the dataset in an inconsistent state. Pipelines that implicitly relied on that leftover data will now see those tables emptied.
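
To make the second change concrete, here is a toy model in plain Python of how a replace load treats tables that receive no rows. It is a sketch of the behavior described above, not dlt's implementation: `apply_replace_load`, the `truncate_empty` flag, and the table names are hypothetical and exist only for illustration.

```python
from typing import Dict, List

Row = dict
Dataset = Dict[str, List[Row]]


def apply_replace_load(
    dataset: Dataset,
    resource_tables: List[str],
    loaded: Dataset,
    truncate_empty: bool = True,
) -> Dataset:
    """Return a copy of the dataset after a replace load.

    resource_tables: every table owned by the replace resource
    (root, nested, variant tables).
    loaded: rows that actually arrived in this run.
    truncate_empty=False mimics the old behavior, where tables
    without new data kept their stale rows.
    """
    result = {name: list(rows) for name, rows in dataset.items()}
    for table in resource_tables:
        if table in loaded:
            # Table received data: old rows are replaced by the new ones.
            result[table] = list(loaded[table])
        elif truncate_empty:
            # Table received nothing: it is emptied anyway.
            result[table] = []
        # Otherwise the stale rows survive the reload.
    return result


before = {
    "orders": [{"id": 1}],
    "orders__items": [{"order_id": 1, "sku": "a"}],
}
# This run delivers one order and no nested items.
run = {"orders": [{"id": 2}]}
tables = ["orders", "orders__items"]

old = apply_replace_load(before, tables, run, truncate_empty=False)
new = apply_replace_load(before, tables, run)
```

In the old model, `old["orders__items"]` still holds the row for order 1, which no longer exists in `orders`. In the new model, `new["orders__items"]` is empty, so a pipeline that read those leftover rows will now see none.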

Highlights

  • Lance destination write optimizations (#4051 @rudolfix) — Namespace/session pooling shares one LanceNamespace + lance.Session across job clients; atomic single-commit-per-table writes uncommitted fragments in parallel and commits them in one version (Append/Overwrite/upsert). replace is now a single Overwrite commit so readers never see a partially-replaced table, and the namespace pool rebuilds handles on credential rotation to avoid ExpiredToken on long loads. Also fixes #3800 (Iceberg 409 Table already exists after drop_sources).
  • Reliable replace / refresh truncation (#4010 @rudolfix) — replace resources now consistently truncate all participating tables even when a load carries no data (nested tables, dynamic names, variants included), and drop_data refresh truncates correctly on append and survives non-existent tables. refresh is now the recommended way to do a full refresh — the replace switch is deprecated. Also fixes #3998 and #4017.
  • Configurable CSV encoding (#4045 @AstrakhantsevaAA) — New write_encoding option lets you choose the encoding of CSV files dlt writes (default utf-8), e.g. utf-8-sig for Excel BOM or latin-1/cp1252 for legacy importers. Set via [normalize.data_writer] write_encoding="latin-1".
  • Refreshable cloud credentials for long-running loads (#4056 @tderk) — Default credential-chain credentials are now passed to external consumers (fsspec, rust crates, fileio) consistently and as refreshable where supported, instead of being frozen once. Fixes ExpiredToken failures on long-held connections (#4003).
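
The encodings named in the CSV item differ at the byte level. The sketch below uses only Python's standard library and does not call dlt; `write_csv_bytes` is a hypothetical helper that shows what each encoding choice produces for the same rows.

```python
import csv
import io


def write_csv_bytes(rows, encoding):
    """Serialize rows to CSV and return the encoded bytes."""
    buffer = io.BytesIO()
    # newline="" lets the csv module control line endings itself.
    text = io.TextIOWrapper(buffer, encoding=encoding, newline="")
    csv.writer(text).writerows(rows)
    text.flush()
    data = buffer.getvalue()
    text.detach()  # keep the wrapper from closing the buffer on cleanup
    return data


rows = [["name"], ["café"]]

utf8 = write_csv_bytes(rows, "utf-8")       # "é" becomes two bytes
bom = write_csv_bytes(rows, "utf-8-sig")    # same bytes, prefixed with a BOM
latin = write_csv_bytes(rows, "latin-1")    # "é" becomes a single byte
```

The `utf-8-sig` output is the `utf-8` output with the three-byte marker `EF BB BF` in front, which is what Excel uses to detect UTF-8. In plain Python, a character outside the target code page raises `UnicodeEncodeError` when encoding to `latin-1` or `cp1252`; how dlt reports that case is not stated in these notes.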

Core Library

Features

  • Prune duplicate deps from launcher groups (#4044 @rudolfix) — Deps already present in user requirements are eliminated; numpy/pandas removed from rows→arrow conversion and dashboard deps. Row conversion uses a pure-arrow fast path (up to ~2x faster) or a Python zip path as good as pandas.
  • CDN marimo launcher (#4049 @tetelio) — Serve marimo frontend assets from jsDelivr CDN via configurable --asset-url. Launcher path resolution uses find_spec instead of import_module so notebooks aren't executed before marimo run / streamlit run (~1–1.5s faster port readiness).
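
The difference between find_spec and import_module mentioned in the launcher item can be seen with the standard library alone. This is a minimal sketch, not the launcher's code: the module name `demo_notebook` and the environment variable `DEMO_NOTEBOOK_RAN` are made up for the demonstration.

```python
import importlib
import importlib.util
import os
import sys
import tempfile
from pathlib import Path

# Create a throwaway module whose body has a visible side effect.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "demo_notebook.py").write_text(
    "import os\n"
    "os.environ['DEMO_NOTEBOOK_RAN'] = '1'\n"
)
sys.path.insert(0, tmp_dir)
importlib.invalidate_caches()
os.environ.pop("DEMO_NOTEBOOK_RAN", None)

# find_spec only locates the module file; the module body does not run.
spec = importlib.util.find_spec("demo_notebook")
located_path = spec.origin
ran_after_find_spec = "DEMO_NOTEBOOK_RAN" in os.environ

# import_module executes the module body as part of importing it.
importlib.import_module("demo_notebook")
ran_after_import = "DEMO_NOTEBOOK_RAN" in os.environ
```

After this runs, `located_path` points at the file while `ran_after_find_spec` is still false; only the import_module call flips the flag. One caveat from the importlib documentation: for a dotted name, find_spec imports the parent package, so only the leaf module is guaranteed not to execute.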

Fixes

  • Fix: duckdb refreshes credential_chain secrets to survive temp-token expiry (#4021 @0ywfe) — Adds REFRESH auto so long-held sql_client connections no longer die with ExpiredToken once temporary AWS tokens rotate. Fixes #3987.
  • Bump duckdb to 1.5.3, ducklake to 1.0 (#4055 @rudolfix)
  • Fix: Retry-After: 0 no longer triggers an immediate retry loop (#4043 @AstrakhantsevaAA) — Values ≤ 0 are treated as no actionable hint, letting tenacity's exponential backoff take over. Fixes #4036.
  • Fix: clickhouse insert file quoting (#4018 @rudolfix) — Also bumps the driver version. Fixes #4014.
  • Fix: incremental merge truncates destination on no-data runs (#4000 @burnash) — Port of the 1.27.2 hotfix (#3998) to devel with a regression test.
  • Fix: databricks emits foreign key only when create_indexes is enabled (#4011 @burnash) — Avoids Unity Catalog UC_REFERENTIAL_CONSTRAINT_DOES_NOT_EXIST failures when the matching primary/unique key isn't created.
  • Fix: DuckLake DuckDB-backed catalog attach incorrectly applied META_TYPE 'sqlite' (#3871 @Analect) — Splits the duckdb/sqlite branch so a duckdb:///catalog.duckdb catalog URI attaches cleanly instead of failing on PRAGMA journal_mode=WAL.
  • Fix: mssql ingests parquet row-groups individually to bound ADBC driver memory (#3947 @wtfashwin) — Prevents OOM on parquet files larger than available memory. Closes #3915.
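
The Retry-After rule from the fix above can be sketched in a few lines. This is an illustration of the stated rule (values ≤ 0 carry no actionable hint), not dlt's or tenacity's code: `parse_retry_after`, `next_wait`, and the `base` and `cap` parameters are hypothetical, and the backoff formula is a generic exponential one.

```python
from typing import Optional


def parse_retry_after(header_value: Optional[str]) -> Optional[float]:
    """Return a usable delay in seconds, or None if there is no actionable hint."""
    if header_value is None:
        return None
    try:
        seconds = float(header_value.strip())
    except ValueError:
        # The HTTP-date form of Retry-After is not handled in this sketch.
        return None
    # Zero or negative values would mean "retry immediately", so ignore them.
    return seconds if seconds > 0 else None


def next_wait(
    attempt: int,
    header_value: Optional[str],
    base: float = 1.0,
    cap: float = 60.0,
) -> float:
    """Pick the wait before retry number `attempt` (1-based)."""
    hint = parse_retry_after(header_value)
    if hint is not None:
        return hint
    # No usable hint: fall back to capped exponential backoff.
    return min(cap, base * 2 ** (attempt - 1))
```

With these definitions, `next_wait(3, "0")` falls through to the backoff branch and returns 4.0 seconds, whereas `next_wait(3, "7")` honors the server's hint and returns 7.0.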

Chores

  • Retry flaky databricks and motherduck remote tests (#4046 @burnash) — Adds pytest-rerunfailures, scoped to just those two transient destinations via --only-rerun.
  • Fix Arrow string-width assumptions in pandas 3 CI (#4025 @Travior) — Updates deltalake/filesystem reader tests for pandas 3's Arrow-backed string columns. Fixes #4024.
  • Increase marimo cell re-render timeout in dashboard tests (#3991 @burnash) — Bumps 15s → 30s to stop test_multi_schema_selection flaking on slow CI.
  • master → devel merge after hotfixes (#4019 @rudolfix)
