github dlt-hub/dlt 1.23.0

16 hours ago

Breaking Changes

  1. Streamlit dashboard removed (#3674 @rudolfix) — The legacy Streamlit-based pipeline dashboard (dlt pipeline show) has been removed. It had been dead code for a long time.

  2. New sources.<name>.<key> configuration lookup path (#3626 @rudolfix) — Source configuration now supports a compact layout. When a source's section name differs from its resource/source name, dlt now also looks up sources.<name>.<key> in addition to the full sources.<section>.<name>.<key> path. For example, for a source registered under section chess_com with name chess:

    # Before (still works): fully qualified path
    [sources.chess_com.chess]
    api_key = "secret"
    
    # New (also works now): compact path using just the source name
    [sources.chess]
    api_key = "secret"
    
    # Credentials follow the same pattern:
    # Full:    sources.chess_com.chess.credentials.api_key
    # Compact: sources.chess.credentials.api_key

    This is breaking if you previously had values at sources.<name> that were unrelated to this source — they will now be resolved where they were previously ignored.
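
The lookup described above can be pictured with a small, self-contained sketch. It is not dlt's implementation: the helper names (candidate_paths, lookup, resolve) are hypothetical, and trying the full path before the compact one is an assumption made for illustration.

```python
from collections.abc import Mapping
from typing import Any, List, Optional


def candidate_paths(section: str, name: str, key: str) -> List[str]:
    """Return the dotted config paths to try for one source key.

    Hypothetical helper: the order (full path first) is an assumption.
    """
    paths = [f"sources.{section}.{name}.{key}"]
    if section != name:
        # Compact path is only distinct when section and name differ.
        paths.append(f"sources.{name}.{key}")
    return paths


def lookup(config: Mapping, dotted: str) -> Optional[Any]:
    """Walk a nested mapping along a dotted path; None if any part is missing."""
    node: Any = config
    for part in dotted.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return None
        node = node[part]
    return node


def resolve(config: Mapping, section: str, name: str, key: str) -> Optional[Any]:
    """Return the first value found among the candidate paths."""
    for path in candidate_paths(section, name, key):
        value = lookup(config, path)
        if value is not None:
            return value
    return None


# Only the compact table [sources.chess] is present in this config.
config = {"sources": {"chess": {"api_key": "secret"}}}
resolved = resolve(config, "chess_com", "chess", "api_key")
```

In this sketch the value under sources.chess is picked up even though no sources.chess_com.chess table exists. That is also the breaking case: an unrelated value that already lived at sources.chess would now be resolved for this source.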

Highlights

  • AI Workbench (#3674 @rudolfix) — New dlt ai CLI command group that turns dlt workspaces into AI-assisted development environments. Includes toolkit system for installing curated skill/rule bundles, pluggable MCP server architecture with composable features (pipeline, workspace, toolkit, secrets), and multi-agent support (Claude Code, Cursor, Codex).

  • Relational normalizer optimization (#3626 @rudolfix) — Major performance improvements to JSON data normalization and schema evolution: 5x faster on flat data, ~2x on nested REST API data, ~1.8x on wide nested data. ISO timestamp parsing improved 2-3x by removing timezone conversions.

  • Iceberg table properties (#3699 @rudolfix) — Adds support for setting Iceberg table and namespace properties via the adapter and configuration.

Core Library

  • Fetch Databricks compute credentials (#3667 @aditypan) — Automatically fetches credentials from Databricks shared/job compute when running dlt in a notebook, fixing the issue of defaulting to SQL warehouse connections.
  • Add override_data_path option to DuckLake ATTACH (#3709 @udus122) — New override_data_path configuration option that appends OVERRIDE_DATA_PATH true to the ATTACH statement, allowing the current DATA_PATH to override the path stored in catalog metadata.
  • Add missing parameters in Paginator Configs (#3658 @aditypan) — Adds missing parameters to PageNumberPaginatorConfig, OffsetPaginatorConfig, and JSONResponseCursorPaginatorConfig.
  • Fix: path traversal in FileStorage (CWE-22) (#3678 @rudolfix) — Replaced os.path.commonprefix() with os.path.commonpath() in FileStorage.is_path_in_storage() to correctly validate path containment using path segments instead of characters.
  • Fix: monotonic wall clock (#3695 @rudolfix) — Improves elapsed time calculation across several places, ensuring load IDs are always monotonic even on systems with clock jitter.
  • Fix: threading issues causing potential locking (#3698 @rudolfix) — Fixes async pool shutdown in extract (now closed with timeout) and corrects synchronization sections in various tests.
  • Fix: dev mode survives attach and reset (#3662 @rudolfix) — Saves dev_mode flag in pipeline local state so it persists across dlt.attach() calls. Detects dev→non-dev transitions and resets working folder cleanly.
  • Fix: respect custom Hugging Face endpoint for dataset card operations (#3696 @jorritsandbrink) — Fixes custom endpoint support, which was broken by the subset/dataset card feature, by temporarily setting the HF_ENDPOINT env var for card operations.
  • Fix: explicit dataset name should be authoritative (#3700 @anuunchin) — Makes the dataset argument passed to the pipeline authoritative, always setting pipeline dataset when restoring state.
  • Fix: start_out_of_range flag with range_start="open" (#3708 @AyushPatel101) — Correctly sets start_out_of_range=True when a row's cursor value equals start_value with range_start="open", fixing delayed can_close() in descending-order pipelines.
  • Fix: LanceDB SQL view creation with dataset_name=None (#3710 @Travior) — Handles the case where dataset_name is None in LanceDBSqlClient.create_view, preventing None prefix in view names.
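
To see why the FileStorage fix (#3678) matters, the sketch below compares the two standard-library functions on a sibling directory whose name merely starts with the storage path. The function names and the example paths are hypothetical, and posixpath is used instead of os.path only so the result is the same on every platform. This is an illustration of the technique, not the FileStorage code itself.

```python
import posixpath


def inside_by_prefix(storage: str, path: str) -> bool:
    """Flawed check: commonprefix compares character by character."""
    return posixpath.commonprefix([storage, path]) == storage


def inside_by_commonpath(storage: str, path: str) -> bool:
    """Safer check: commonpath compares whole path segments."""
    return posixpath.commonpath([storage, path]) == storage


storage = "/data/storage"
inside = "/data/storage/file.txt"
sibling = "/data/storage_evil/secret.txt"  # outside storage, shares a string prefix

prefix_accepts_sibling = inside_by_prefix(storage, sibling)
commonpath_accepts_sibling = inside_by_commonpath(storage, sibling)
commonpath_accepts_inside = inside_by_commonpath(storage, inside)
```

The character-based check accepts the sibling path because "/data/storage" is a string prefix of "/data/storage_evil", while the segment-based check rejects it. A real containment check would normally also resolve both paths (for example with realpath) before comparing them.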
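
The boundary rule behind the start_out_of_range fix (#3708) can be sketched as a single comparison. The helper below is hypothetical and assumes an ascending cursor, where larger values are newer; it is not the code from dlt's incremental implementation.

```python
def start_out_of_range(value, start_value, range_start: str = "closed") -> bool:
    """Return True when a row falls before the start of the incremental range.

    Hypothetical helper, assuming larger cursor values are newer.
    """
    if range_start == "open":
        # Open start: the boundary itself is excluded, so a row equal to
        # start_value is already outside the range.
        return value <= start_value
    # Closed start: the boundary is included, only smaller values are outside.
    return value < start_value


equal_open = start_out_of_range(5, 5, range_start="open")
equal_closed = start_out_of_range(5, 5, range_start="closed")
```

With rows arriving in descending order, the first row flagged this way is the signal that nothing further can be in range. Flagging the row that equals start_value under range_start="open" is what lets can_close() report completion without waiting for a strictly smaller value.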

Docs

  • Fix docstring typo in BigQuery factory (#3705 @dnskr)

New Contributors
