github snowflakedb/snowpark-python v1.22.1
Release

1.22.1 (2024-09-11)

This is a re-release of 1.22.0. Please refer to the 1.22.0 release notes for detailed release content.

1.22.0 (2024-09-10)

Snowpark Python API Updates

New Features

  • Added the following new functions in snowflake.snowpark.functions:
    • array_remove
    • ln

Improvements

  • Improved documentation for Session.write_pandas by making use_logical_type option more explicit.
  • Added support for specifying the following to DataFrameWriter.save_as_table:
    • enable_schema_evolution
    • data_retention_time
    • max_data_extension_time
    • change_tracking
    • copy_grants
    • iceberg_config: a dictionary that can hold the following Iceberg configuration options:
      • external_volume
      • catalog
      • base_location
      • catalog_sync
      • storage_serialization_policy
  • Added support for specifying the following to DataFrameWriter.copy_into_table:
    • iceberg_config: a dictionary that can hold the following Iceberg configuration options:
      • external_volume
      • catalog
      • base_location
      • catalog_sync
      • storage_serialization_policy
  • Added support for specifying the following parameters to DataFrame.create_or_replace_dynamic_table:
    • mode
    • refresh_mode
    • initialize
    • clustering_keys
    • is_transient
    • data_retention_time
    • max_data_extension_time
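
As a rough illustration of the options listed above, the sketch below passes them as keyword arguments. The parameter names come from these notes; every value (table names, volume and catalog names, lag, retention days, the refresh_mode and initialize strings) is a placeholder assumption, and the helper function names are hypothetical. The helpers only need an object that behaves like a Snowpark DataFrame, so nothing here opens a connection.

```python
# Keys are the ones listed in the release notes; all values are placeholders.
ICEBERG_CONFIG = {
    "external_volume": "my_external_volume",    # hypothetical volume name
    "catalog": "SNOWFLAKE",                     # assumed catalog value
    "base_location": "my/iceberg/location",     # hypothetical path
    "catalog_sync": "my_catalog_integration",   # hypothetical integration name
    "storage_serialization_policy": "OPTIMIZED",  # assumed policy value
}


def save_table_with_options(df, table_name):
    """Hypothetical helper: write `df` with the new table-level options."""
    df.write.save_as_table(
        table_name,
        mode="overwrite",
        enable_schema_evolution=True,
        data_retention_time=1,        # days (placeholder)
        max_data_extension_time=14,   # days (placeholder)
        change_tracking=True,
        copy_grants=True,
    )


def save_iceberg_table(df, table_name):
    """Hypothetical helper: write `df` as an Iceberg table via iceberg_config."""
    df.write.save_as_table(table_name, iceberg_config=ICEBERG_CONFIG)


def create_dynamic_table(df, name, warehouse):
    """Hypothetical helper: create a dynamic table with the new parameters."""
    df.create_or_replace_dynamic_table(
        name,
        warehouse=warehouse,
        lag="1 minute",               # placeholder target lag
        mode="overwrite",
        refresh_mode="AUTO",          # assumed value
        initialize="ON_CREATE",       # assumed value
        clustering_keys=["id"],       # hypothetical column name
        is_transient=False,
        data_retention_time=1,
        max_data_extension_time=14,
    )
```

With a real session, df would be a Snowpark DataFrame such as the result of session.table(...). Check the accepted values for each option against the Snowpark API reference before relying on them.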

Bug Fixes

  • Fixed a bug in session.read.csv that caused an error when setting PARSE_HEADER = True in an externally defined file format.
  • Fixed a bug in query generation from set operations that allowed generation of duplicate queries when children have common subqueries.
  • Fixed a bug in session.get_session_stage that referenced a non-existing stage after switching database or schema.
  • Fixed a bug where calling DataFrame.to_snowpark_pandas without explicitly initializing the Snowpark pandas plugin caused an error.
  • Fixed a bug where using the explode function in dynamic table creation caused a SQL compilation error due to improper boolean type casting on the outer parameter.

Snowpark Local Testing Updates

New Features

  • Added support for type coercion when passing columns as input to UDF calls.
  • Added support for Index.identical.

Bug Fixes

  • Fixed a bug where the truncate mode in DataFrameWriter.save_as_table incorrectly handled DataFrames containing only a subset of columns from the existing table.
  • Fixed a bug where the to_timestamp function did not set the default timezone of the column datatype.

Snowpark pandas API Updates

New Features

  • Added limited support for the Timedelta type, including the following features. Snowpark pandas will raise NotImplementedError for unsupported Timedelta use cases.
    • support for tracking the Timedelta type through copy, cache_result, shift, sort_index, assign, bfill, ffill, fillna, compare, diff, drop, dropna, duplicated, empty, equals, insert, isin, isna, items, iterrows, join, len, mask, melt, merge, nlargest, nsmallest, to_pandas.
    • support for converting non-timedelta data to timedelta via astype.
    • NotImplementedError will be raised for the rest of methods that do not support Timedelta.
    • support for subtracting two timestamps to get a Timedelta.
    • support indexing with Timedelta data columns.
    • support for adding or subtracting timestamps and Timedelta.
    • support for binary arithmetic between two Timedelta values.
    • support for binary arithmetic and comparisons between Timedelta values and numeric values.
    • support for lazy TimedeltaIndex.
    • support for pd.to_timedelta.
    • support for GroupBy aggregations min, max, mean, idxmax, idxmin, std, sum, median, count, any, all, size, nunique, head, tail, aggregate.
    • support for GroupBy filtrations first and last.
    • support for TimedeltaIndex attributes: days, seconds, microseconds and nanoseconds.
    • support for diff with timestamp columns on axis=0 and axis=1.
    • support for TimedeltaIndex methods: ceil, floor and round.
    • support for TimedeltaIndex.total_seconds method.
  • Added support for index's arithmetic and comparison operators.
  • Added support for Series.dt.round.
  • Added documentation pages for DatetimeIndex.
  • Added support for Index.name, Index.names, Index.rename, and Index.set_names.
  • Added support for Index.__repr__.
  • Added support for DatetimeIndex.month_name and DatetimeIndex.day_name.
  • Added support for Series.dt.weekday, Series.dt.time, and DatetimeIndex.time.
  • Added support for Index.min and Index.max.
  • Added support for pd.merge_asof.
  • Added support for Series.dt.normalize and DatetimeIndex.normalize.
  • Added support for Index.is_boolean, Index.is_integer, Index.is_floating, Index.is_numeric, and Index.is_object.
  • Added support for DatetimeIndex.round, DatetimeIndex.floor and DatetimeIndex.ceil.
  • Added support for Series.dt.days_in_month and Series.dt.daysinmonth.
  • Added support for DataFrameGroupBy.value_counts and SeriesGroupBy.value_counts.
  • Added support for Series.is_monotonic_increasing and Series.is_monotonic_decreasing.
  • Added support for Index.is_monotonic_increasing and Index.is_monotonic_decreasing.
  • Added support for pd.crosstab.
  • Added support for pd.bdate_range and included business frequency support (B, BME, BMS, BQE, BQS, BYE, BYS) for both pd.date_range and pd.bdate_range.
  • Added support for lazy Index objects as labels in DataFrame.reindex and Series.reindex.
  • Added support for Series.dt.days, Series.dt.seconds, Series.dt.microseconds, and Series.dt.nanoseconds.
  • Added support for creating a DatetimeIndex from an Index of numeric or string type.
  • Added support for string indexing with Timedelta objects.
  • Added support for Series.dt.total_seconds method.

Improvements

  • Improved concat and join performance when the operations are performed on Series coming from the same DataFrame, by avoiding unnecessary joins.
  • Refactored quoted_identifier_to_snowflake_type to avoid making metadata queries if the types have been cached locally.
  • Improved pd.to_datetime to handle all local input cases.
  • Enabled creating a lazy index from another lazy index without pulling data to the client.
  • Raised NotImplementedError for Index bitwise operators.
  • Displayed a clearer error message when Index.names is set to a non-list-like object.
  • Raised a warning whenever MultiIndex values are pulled in locally.
  • Improved the warning message for pd.read_snowflake to include the creation reason when temp table creation is triggered.
  • Improved performance for DataFrame.set_index, and for setting DataFrame.index or Series.index, by avoiding checks that require eager evaluation. As a consequence, a ValueError is no longer raised when the new index does not match the length of the current Series/DataFrame object. Instead, when the Series/DataFrame object is longer than the provided index, the new index is filled with NaN values for the "extra" elements. Otherwise, the extra values in the provided index are ignored.
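
The length-mismatch behavior described in the last item can be modeled in plain Python. This is only a sketch of the outcome as stated in these notes, not Snowpark pandas code; the function name resulting_index_labels is hypothetical.

```python
import math


def resulting_index_labels(n_rows, new_index):
    """Model the index labels produced when `new_index` is assigned to an
    object with `n_rows` rows, following the rule stated in the notes."""
    labels = list(new_index)
    if n_rows > len(labels):
        # Object is longer than the index: pad the "extra" rows with NaN.
        return labels + [math.nan] * (n_rows - len(labels))
    # Index is longer than (or equal to) the object: extra labels are ignored.
    return labels[:n_rows]


# Index longer than the data: the trailing label "d" is dropped.
print(resulting_index_labels(3, ["a", "b", "c", "d"]))  # ['a', 'b', 'c']

# Data longer than the index: the last two rows get NaN labels.
print(resulting_index_labels(4, ["a", "b"]))  # ['a', 'b', nan, nan]
```

Code that relied on the earlier ValueError to catch length mismatches may need an explicit length check of its own.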

Bug Fixes

  • Stopped ignoring nanoseconds in pd.Timedelta scalars.
  • Fixed AssertionError in tree of binary operations.
  • Fixed a bug in Series.dt.isocalendar when using a named Series.
  • Fixed inplace argument for Series objects derived from DataFrame columns.
  • Fixed a bug where Series.reindex and DataFrame.reindex did not update the result index's name correctly.
  • Fixed a bug where Series.take did not error when axis=1 was specified.
