PyIceberg 0.9.0

3 days ago

Full Changelog: pyiceberg-0.8.0...pyiceberg-0.9.0

There have been 243 new commits since the last minor release, 0.8.0, including 148 commits from various contributors and 95 from Dependabot. This release features contributions from 63 unique contributors, including 33 first-time contributors.

What's Changed

New Features

  • Introduced the capability to perform UPSERT operations on a table directly within PyIceberg.
  • Added support for dynamic overwrites as an optimization when an entire partition is replaced.
  • Implemented namespace_exists functionality for the REST catalog.
  • Extended the table updates to include the new remove-snapshot-ref and remove-snapshot actions.
  • Added view_exists method to the REST catalog as a part of the effort to add view support to the REST Catalog.
  • Implemented support for Alibaba OSS protocol in PyArrowFileIO
  • Introduced read support for the Iceberg V3 spec.
  • Added support for table Location Providers, including the ObjectStoreLocationProvider, and enabled custom write paths for both data and metadata.
  • Extended S3FileIO operations to allow for cross region read support.
  • Introduced support for converting an Iceberg table scan to a Polars DataFrame or LazyFrame.
  • Added support for the all_manifests metadata table.
  • Implemented support for writes to bucket partitioned tables.
  • Added automatic metadata cleanup to Iceberg tables via write.metadata.delete-after-commit.enabled.
  • Introduced syntactic sugar for "and" and "or" operations in filters.
  • Implemented configurable S3 request timeout settings for better performance tuning.
  • Added support for using the apache/iceberg-rest-fixture image for integration tests.
  • Introduced support for updating table statistics.
  • Added support for Bucket and Truncate transforms utilizing pyiceberg_core (iceberg-rust).
  • Added support for column projections from partition metadata.
  • Added support for ResidualEvaluator.
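
To make the UPSERT semantics concrete, here is a minimal plain-Python model of an upsert keyed on join columns: a row whose key already exists and whose values differ is updated, a row with an unseen key is inserted, and an identical row is left alone. This sketch does not call PyIceberg. The function name upsert_rows and the dict-based row layout are illustrative assumptions, and the real operation works on Arrow data and commits Iceberg snapshots.

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def upsert_rows(
    target: List[Row], source: List[Row], join_cols: Sequence[str]
) -> Tuple[List[Row], int, int]:
    """Return (merged rows, rows updated, rows inserted).

    Hypothetical helper: it models upsert semantics only, not PyIceberg internals.
    """

    def key_of(row: Row) -> tuple:
        return tuple(row[c] for c in join_cols)

    # Index the existing rows by join key; dicts preserve insertion order.
    merged: Dict[tuple, Row] = {key_of(r): dict(r) for r in target}
    updated = inserted = 0
    for row in source:
        k = key_of(row)
        if k not in merged:
            merged[k] = dict(row)
            inserted += 1
        elif merged[k] != row:
            merged[k] = dict(row)
            updated += 1
        # Rows identical to the existing one are neither rewritten nor counted.
    return list(merged.values()), updated, inserted


existing = [{"id": 1, "city": "Amsterdam"}, {"id": 2, "city": "Paris"}]
incoming = [
    {"id": 2, "city": "Lyon"},       # key exists, value differs: update
    {"id": 3, "city": "Berlin"},     # new key: insert
    {"id": 1, "city": "Amsterdam"},  # identical: no change
]
rows, n_updated, n_inserted = upsert_rows(existing, incoming, join_cols=["id"])
print(rows, n_updated, n_inserted)
```

For the actual API and its options, see the UPSERT documentation listed under Documentation Updates.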

Deprecations

Catalog & Table Identifiers

  • Parsing catalog-level identifiers in Catalog references is deprecated
    • Please refer to tables using only their namespace and table name
  • Table.identifier property is deprecated
    • Use Table.name() instead
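
As a rough migration aid for the identifier deprecation, the sketch below drops a leading catalog name from a dotted identifier so that only the namespace and table name remain. The helper name strip_catalog_prefix and the example names are hypothetical. It is plain string handling, not part of PyIceberg.

```python
from typing import Tuple


def strip_catalog_prefix(identifier: str, catalog_name: str) -> Tuple[str, ...]:
    """Split a dotted identifier and drop a leading catalog name, if present.

    Hypothetical helper for cleaning up call sites; not a PyIceberg function.
    """
    parts = tuple(identifier.split("."))
    # Only strip when a namespace and a table name remain afterwards.
    if len(parts) > 2 and parts[0] == catalog_name:
        parts = parts[1:]
    return parts


print(strip_catalog_prefix("prod.sales.orders", "prod"))
print(strip_catalog_prefix("sales.orders", "prod"))
```

Note that the helper cannot tell a catalog name apart from a top-level namespace with the same name, so any identifiers it rewrites should be reviewed by hand.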

Expression Parsing

  • Parsing expressions with table names is deprecated
    • Only provide field names in row_filter

Configuration Properties

  • rest.authorization-url property is deprecated
    • Use oauth2-server-uri instead
  • gcs.endpoint property is deprecated
    • Use gcs.service.host instead
  • Properties starting with adlfs. are deprecated
    • Use properties that start with adls.
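
The property renames above can be applied mechanically to an existing configuration dictionary. The following is a minimal sketch, assuming the properties are held in a plain dict. The helper name migrate_properties and the example values are hypothetical. Only the key names come from the deprecation list.

```python
from typing import Dict, List, Tuple

# Renames taken from the deprecation list above.
EXACT_RENAMES = {
    "rest.authorization-url": "oauth2-server-uri",
    "gcs.endpoint": "gcs.service.host",
}
PREFIX_RENAMES = {"adlfs.": "adls."}


def migrate_properties(props: Dict[str, str]) -> Tuple[Dict[str, str], List[str]]:
    """Return (properties with new-style keys, list of renames that applied)."""
    migrated: Dict[str, str] = {}
    renamed: List[str] = []
    for key, value in props.items():
        new_key = EXACT_RENAMES.get(key, key)
        for old_prefix, new_prefix in PREFIX_RENAMES.items():
            if new_key.startswith(old_prefix):
                new_key = new_prefix + new_key[len(old_prefix):]
        if new_key != key:
            renamed.append(f"{key} -> {new_key}")
        # A new-style key that is already set explicitly wins over a migrated one.
        if new_key == key or new_key not in props:
            migrated[new_key] = value
    return migrated, renamed


# Placeholder values for illustration only.
old_props = {
    "uri": "http://localhost:8181",
    "rest.authorization-url": "https://auth.example.com/token",
    "adlfs.account-name": "myaccount",
}
new_props, renamed = migrate_properties(old_props)
print(new_props)
print(renamed)
```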

Table API Changes

  • project_table is deprecated
    • Use ArrowScan.to_table() instead
    • Use ArrowScan.to_record_batches() instead

Name Mapping

  • NameMapping.find is deprecated
    • Use apply_name_mapping instead

Table Update Field Removal

  • The initial_change field has been removed from table updates, affecting:
    • AddSchemaUpdate
    • AddPartitionSpecUpdate
    • AddSortOrderUpdate

Table Class Refactoring

Several table classes have been moved and made private:

  • pyiceberg.table.Move → pyiceberg.table.update.schema._Move
  • pyiceberg.table.MoveOperation → pyiceberg.table.update.schema._MoveOperation
  • pyiceberg.table.DeleteFiles → pyiceberg.table.update.snapshot._DeleteFiles
  • pyiceberg.table.FastAppendFiles → pyiceberg.table.update.snapshot._FastAppendFiles
  • pyiceberg.table.MergeAppendFiles → pyiceberg.table.update.snapshot._MergeAppendFiles
  • pyiceberg.table.OverwriteFiles → pyiceberg.table.update.snapshot._OverwriteFiles
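
Code that has to run against both the old and the new module layout could look up whichever path exists at import time. The sketch below is one possible approach. The helper resolve_first is hypothetical, and only the two dotted paths in CANDIDATES come from the list above. Because the new names are underscore-prefixed, they should be treated as private and liable to change without notice.

```python
import importlib
from typing import Any, Sequence


def resolve_first(dotted_paths: Sequence[str]) -> Any:
    """Return the first importable attribute among 'package.module.attr' paths."""
    for path in dotted_paths:
        module_name, _, attr = path.rpartition(".")
        try:
            return getattr(importlib.import_module(module_name), attr)
        except (ImportError, AttributeError, ValueError):
            # Module or attribute missing in this install: try the next path.
            continue
    raise ImportError(f"none of {list(dotted_paths)} could be imported")


# New location first, old location as a fallback for pre-0.9.0 installs.
CANDIDATES = [
    "pyiceberg.table.update.snapshot._FastAppendFiles",
    "pyiceberg.table.FastAppendFiles",
]

# Demonstration with the standard library, so it runs without PyIceberg installed.
decoder_cls = resolve_first(["json.NoSuchName", "json.JSONDecoder"])
print(decoder_cls.__name__)
```

With PyIceberg installed, resolve_first(CANDIDATES) would return the class from whichever location the installed version provides.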

Table Properties Refactoring

Several constants have been moved to TableProperties:

  • DEFAULT_MAX_SNAPSHOT_AGE_MS → TableProperties.MAX_SNAPSHOT_AGE_MS_DEFAULT
  • DEFAULT_MIN_SNAPSHOTS_TO_KEEP → TableProperties.MIN_SNAPSHOTS_TO_KEEP_DEFAULT

Documentation Updates

  • Added documentation for the new UPSERT operation support.
  • Added documentation of the new LocationProvider feature.
  • Improve the "How to Release" documentation.
  • Add documentation linking to community contributing guidelines
  • Add documentation on nightly build

Bug Fixes

  • Fixed KeyError in add_files for Parquet files missing column stats.
  • Fixed Table.scan case sensitivity handling.
  • Resolved TypeError in create_match_filter for composite keys.
  • Allowed leading underscore in column name used in row filter.
  • Ensured correct statistics updates by removing redundant snapshot_id in SetStatisticsUpdate.
  • Fixed namespace existence check for multi-level namespaces in SqlCatalog.
  • Improved handling of S3 request timeouts.
  • Fixed TypeError in composite key joins.

Dependencies

  • Remove Python 3.13 upper bound restriction
  • Remove fsspec upper bound restriction
  • Bump PyArrow to 19.0.0

Infra

  • Improve and automate the release process using a GitHub workflow
  • Add support for TestPyPI nightly builds
  • Add codespell to pre-commit
  • Replace pycln with ruff
