dlt 1.28.0 Release Notes
Breaking Changes
- `refresh="drop_data"` on Delta and persistent-catalog Iceberg no longer frees storage (#4051 @rudolfix) — Truncation is now a transactional delete that keeps the table, schema, version history, and data files (retained for time travel until `vacuum`). Previously the files were deleted. This corrects prior erroneous behavior, but pipelines relying on `drop_data` to reclaim disk space will no longer see storage freed without an explicit `vacuum`.
- `replace` now fully truncates empty and orphaned tables (#4010 @rudolfix) — Tables belonging to a `replace` resource that receive no data in a run (including nested tables, dynamic-name and variant tables) are now consistently truncated. Previously these tables could be left orphaned with stale rows surviving the reload, leaving the dataset in an inconsistent state. Pipelines that implicitly relied on that leftover data will now see those tables emptied.
Highlights
- Lance destination write optimizations (#4051 @rudolfix) — Namespace/session pooling shares one `LanceNamespace` + `lance.Session` across job clients; atomic single-commit-per-table writes uncommitted fragments in parallel and commits them in one version (Append/Overwrite/upsert). `replace` is now a single `Overwrite` commit so readers never see a partially-replaced table, and the namespace pool rebuilds handles on credential rotation to avoid `ExpiredToken` on long loads. Also fixes #3800 (Iceberg `409 Table already exists` after `drop_sources`).
- Reliable `replace`/`refresh` truncation (#4010 @rudolfix) — `replace` resources now consistently truncate all participating tables even when a load carries no data (nested tables, dynamic names, variants included), and `drop_data` refresh truncates correctly on `append` and survives non-existent tables. `refresh` is now the recommended way to do a full refresh; the `replace` switch is deprecated. Also fixes #3998 and #4017.
- Configurable CSV encoding (#4045 @AstrakhantsevaAA) — New `write_encoding` option lets you choose the encoding of CSV files dlt writes (default `utf-8`), e.g. `utf-8-sig` for Excel BOM or `latin-1`/`cp1252` for legacy importers. Set via `[normalize.data_writer] write_encoding="latin-1"`.
- Refreshable cloud credentials for long-running loads (#4056 @tderk) — Default credential-chain credentials are now passed to external consumers (fsspec, rust crates, fileio) consistently and as refreshable where supported, instead of being frozen once. Fixes `ExpiredToken` failures on long-held connections (#4003).
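To make the CSV encoding choices above concrete, here is a minimal standard-library sketch of how the same row ends up on disk under each codec. It does not call dlt, and the helper name `encode_csv_row` is made up for illustration.

```python
import csv
import io


def encode_csv_row(row, encoding):
    """Serialize one CSV row and return the bytes a file in `encoding` would contain."""
    text = io.StringIO()
    csv.writer(text, lineterminator="\n").writerow(row)
    return text.getvalue().encode(encoding)


row = ["id", "café"]

utf8 = encode_csv_row(row, "utf-8")
utf8_sig = encode_csv_row(row, "utf-8-sig")
latin1 = encode_csv_row(row, "latin-1")

# utf-8-sig prepends the three-byte BOM that Excel uses to detect UTF-8.
print(utf8_sig[:3] == b"\xef\xbb\xbf")
# latin-1 stores "é" as a single byte, while utf-8 needs two.
print(len(utf8) - len(latin1))
```

Note that `latin-1` and `cp1252` cannot represent every Unicode character, so data outside their range would raise an encoding error in a sketch like this one.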
Core Library
Features
- Prune duplicate deps from launcher groups (#4044 @rudolfix) — Deps already present in user requirements are eliminated; numpy/pandas removed from rows→arrow conversion and dashboard deps. Row conversion uses a pure-arrow fast path (up to ~2x faster) or a Python `zip` path as good as pandas.
- CDN marimo launcher (#4049 @tetelio) — Serve marimo frontend assets from jsDelivr CDN via configurable `--asset-url`. Launcher path resolution uses `find_spec` instead of `import_module` so notebooks aren't executed before `marimo run`/`streamlit run` (~1–1.5s faster port readiness).
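The `find_spec` change in the launcher item above relies on a general property of Python's import system: a top-level module's file can be located without running its code. The following is a minimal sketch of that property, not dlt's launcher code; `locate_module_file` and the `demo_notebook` module are hypothetical names used only for the demonstration.

```python
import importlib.util
import sys
import tempfile
from pathlib import Path


def locate_module_file(module_name):
    """Return the source file of a top-level module without importing it."""
    spec = importlib.util.find_spec(module_name)
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError(module_name)
    return spec.origin


# Hypothetical notebook module whose top-level code would fail loudly if executed.
workdir = tempfile.mkdtemp()
Path(workdir, "demo_notebook.py").write_text(
    "raise RuntimeError('notebook was executed')\n"
)
sys.path.insert(0, workdir)
importlib.invalidate_caches()

notebook_path = locate_module_file("demo_notebook")
was_imported = "demo_notebook" in sys.modules
print(notebook_path, was_imported)
```

By contrast, `importlib.import_module("demo_notebook")` would run the module body and raise the `RuntimeError`. For dotted names, `find_spec` does import the parent packages, so the no-execution guarantee in this sketch applies to top-level modules only.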
Fixes
- Fix: duckdb refreshes `credential_chain` secrets to survive temp-token expiry (#4021 @0ywfe) — Adds `REFRESH auto` so long-held `sql_client` connections no longer die with `ExpiredToken` once temporary AWS tokens rotate. Fixes #3987.
- Bump duckdb to 1.5.3, ducklake to 1.0 (#4055 @rudolfix)
- Fix: `Retry-After: 0` no longer triggers an immediate retry loop (#4043 @AstrakhantsevaAA) — Values ≤ 0 are treated as no actionable hint, letting tenacity's exponential backoff take over. Fixes #4036.
- Fix: clickhouse insert file quoting (#4018 @rudolfix) — Also bumps the driver version. Fixes #4014.
- Fix: incremental merge truncates destination on no-data runs (#4000 @burnash) — Port of the 1.27.2 hotfix (#3998) to devel with a regression test.
- Fix: databricks emits foreign key only when `create_indexes` is enabled (#4011 @burnash) — Avoids Unity Catalog `UC_REFERENTIAL_CONSTRAINT_DOES_NOT_EXIST` failures when the matching primary/unique key isn't created.
- Fix: DuckLake DuckDB-backed catalog attach incorrectly applied `META_TYPE 'sqlite'` (#3871 @Analect) — Splits the duckdb/sqlite branch so a `duckdb:///catalog.duckdb` catalog URI attaches cleanly instead of failing on `PRAGMA journal_mode=WAL`.
- Fix: mssql ingests parquet row-groups individually to bound ADBC driver memory (#3947 @wtfashwin) — Prevents OOM on parquet files larger than available memory. Closes #3915.
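The `Retry-After` rule from the fixes above can be sketched in a few lines. This is an illustration of the described behavior under stated assumptions, not dlt's implementation and not tenacity's API: `pick_wait_seconds` and `exponential_backoff` are hypothetical helpers, and only the delay-in-seconds form of the header is handled.

```python
def exponential_backoff(attempt, base=1.0, cap=60.0):
    """Fallback delay: base * 2**(attempt - 1), capped at `cap` seconds."""
    return min(cap, base * 2 ** (attempt - 1))


def pick_wait_seconds(retry_after, attempt):
    """Use a positive Retry-After value as the delay; otherwise fall back to backoff.

    A missing header, an unparseable value (such as an HTTP-date), or a
    value <= 0 is treated as no actionable hint.
    """
    try:
        hint = float(retry_after)
    except (TypeError, ValueError):
        hint = 0.0
    if hint > 0:
        return hint
    return exponential_backoff(attempt)


print(pick_wait_seconds("5", attempt=1))   # positive hint is honored
print(pick_wait_seconds("0", attempt=3))   # zero falls back to backoff
print(pick_wait_seconds(None, attempt=2))  # missing header falls back to backoff
```

Treating `0` as "no hint" rather than "retry now" is what prevents the tight retry loop: the delay then grows with the attempt number instead of staying at zero.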
Docs
- Streamlit cookbook entry + Cookbook sidebar restructure into dlt / dlthub subcategories (#4050 @ShreyasGS)
- Langfuse cookbook (#4038 @zilto)
- Apply review feedback on dltHub docs (datasets, data-quality API) (#3975 @ShreyasGS)
- Remove memory warning from delta-rs docs now that the upstream issue is resolved (#4060 @tetelio)
- Restore orphaned content after docs restructuring + add orphan-detection lint step (#4028 @rudolfix)
- Restore orphaned docs on master (#4029 @rudolfix)
- High-contrast dltHub docs button (#4040 @zilto)
- Fix outdated dashboard description (commands + broken link) (#4053 @AstrakhantsevaAA)
- Escape `__` in generated CLI docs to prevent markdown bold (#3993 @burnash)
- Escape characters to correct `__deployment__.py` file name in CLI docs (#3988 @nuetu)
- Fix typo in egress IPs list (#3992 @tetelio)
- Remove the Star Wars gif (#3994 @AstrakhantsevaAA)
Chores
- Retry flaky databricks and motherduck remote tests (#4046 @burnash) — Adds `pytest-rerunfailures`, scoped to just those two transient destinations via `--only-rerun`.
- Fix Arrow string-width assumptions in pandas 3 CI (#4025 @Travior) — Updates deltalake/filesystem reader tests for pandas 3's Arrow-backed string columns. Fixes #4024.
- Increase marimo cell re-render timeout in dashboard tests (#3991 @burnash) — Bumps 15s → 30s to stop `test_multi_schema_selection` flaking on slow CI.
- master → devel merge after hotfixes (#4019 @rudolfix)