snowflakedb/snowpark-python v1.39.0

1.39.0 (2025-09-17)

Snowpark Python API Updates

New Features

  • Added support for unstructured data engineering in Snowpark, powered by Snowflake AISQL and Cortex functions (a usage sketch follows after this feature list):
    • DataFrame.ai.complete: Generate per-row LLM completions from prompts built over columns and files.
    • DataFrame.ai.filter: Keep rows where an AI classifier returns TRUE for the given predicate.
    • DataFrame.ai.agg: Reduce a text column into one result using a natural-language task description.
    • RelationalGroupedDataFrame.ai_agg: Perform the same natural-language aggregation per group.
    • DataFrame.ai.classify: Assign single or multiple labels from given categories to text or images.
    • DataFrame.ai.similarity: Compute cosine-based similarity scores between two columns via embeddings.
    • DataFrame.ai.sentiment: Extract overall and aspect-level sentiment from text into JSON.
    • DataFrame.ai.embed: Generate VECTOR embeddings for text or images using configurable models.
    • DataFrame.ai.summarize_agg: Aggregate and produce a single comprehensive summary over many rows.
    • DataFrame.ai.transcribe: Transcribe audio files to text with optional timestamps and speaker labels.
    • DataFrame.ai.parse_document: OCR/layout-parse documents or images into structured JSON.
    • DataFrame.ai.extract: Pull structured fields from text or files using a response schema.
    • DataFrame.ai.count_tokens: Estimate token usage for a given model and input text per row.
    • DataFrame.ai.split_text_markdown_header: Split Markdown into hierarchical header-aware chunks.
    • DataFrame.ai.split_text_recursive_character: Split text into size-bounded chunks using recursive separators.
    • DataFrameReader.file: Create a DataFrame containing all files from a stage as FILE data type for downstream unstructured data processing.
  • Added a new datatype YearMonthIntervalType that allows users to create intervals for datetime operations.
  • Added a new function interval_year_month_from_parts that allows users to easily create YearMonthIntervalType without using SQL.
  • Added a new datatype DayTimeIntervalType that allows users to create intervals for datetime operations.
  • Added a new function interval_day_time_from_parts that allows users to easily create DayTimeIntervalType without using SQL (see the interval example after this list).
  • Added support for FileOperation.list to list files in a stage with metadata.
  • Added support for FileOperation.remove to remove files from a stage (see the example after this list).
  • Added an option to specify copy_grants for the following DataFrame APIs (see the example after this list):
    • create_or_replace_view
    • create_or_replace_temp_view
    • create_or_replace_dynamic_table
  • Added a new function snowflake.snowpark.functions.vectorized that allows users to mark a function as a vectorized UDF.
  • Added support for the parameter use_vectorized_scanner in Session.write_pandas() (see the example after this list).
  • Added support for the following scalar functions in functions.py (see the example after this list):
    • getdate
    • getvariable
    • invoker_role
    • invoker_share
    • is_application_role_in_session
    • is_database_role_in_session
    • is_granted_to_invoker_role
    • is_role_in_session
    • localtime
    • systimestamp
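
A minimal sketch of the new DataFrame.ai entry points and DataFrameReader.file. The calling pattern below (a format-string prompt plus explicit input columns, and the prompt/input_columns/model parameter names and stage-path argument) is an illustrative assumption, not the confirmed signature; consult the API reference for exact parameters.

    from snowflake.snowpark import Session

    session = Session.builder.create()  # connection settings come from your local configuration

    reviews = session.create_dataframe(
        [["The battery died after two days"], ["Absolutely love this product"]],
        schema=["review_text"],
    )

    # Per-row LLM completion; the prompt templating and parameter names are assumptions.
    summaries = reviews.ai.complete(
        prompt="Summarize this review in five words: {0}",
        input_columns=["review_text"],
        model="llama3.1-8b",
    )

    # Keep only rows the AI classifier judges TRUE for the predicate (same caveat on parameters).
    complaints = reviews.ai.filter(
        "This review describes a product defect: {0}", input_columns=["review_text"]
    )

    # Load every file on a stage as FILE-typed rows for downstream parsing or transcription;
    # the stage-path argument shown here is assumed.
    files_df = session.read.file("@my_stage/support_tickets/")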
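
A sketch of the new interval helpers, assumed to live in snowflake.snowpark.functions like the other new functions, with the *_from_parts arguments assumed to be ordered (years, months) and (days, hours, ...); verify the signatures before relying on the order.

    from snowflake.snowpark import Session
    from snowflake.snowpark.functions import (
        col,
        to_date,
        to_timestamp,
        interval_year_month_from_parts,
        interval_day_time_from_parts,
    )

    session = Session.builder.create()
    df = session.create_dataframe([["2025-01-15"]], schema=["start"])

    shifted = df.select(
        # Add 1 year and 2 months to a DATE (argument order is assumed).
        (to_date(col("start")) + interval_year_month_from_parts(1, 2)).alias("plus_14_months"),
        # Add 3 days and 4 hours to a TIMESTAMP (argument order is assumed).
        (to_timestamp(col("start")) + interval_day_time_from_parts(3, 4)).alias("plus_3d_4h"),
    )
    shifted.show()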
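
The FileOperation helpers are reached through session.file, the same accessor used by put and get. The stage-path arguments below follow that existing convention; the exact return shape of list is an assumption.

    from snowflake.snowpark import Session

    session = Session.builder.create()

    # List files on a stage together with their metadata (size, checksum, last modified, ...).
    for file_info in session.file.list("@my_stage/data/"):
        print(file_info)

    # Remove a file from the stage.
    session.file.remove("@my_stage/data/obsolete.csv")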
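
A sketch of the copy_grants option, assumed here to be a keyword argument that maps to SQL's COPY GRANTS clause so that privileges granted on the replaced object survive re-creation.

    from snowflake.snowpark import Session

    session = Session.builder.create()

    df = session.table("sales").filter("amount > 0")

    # Recreate the view while preserving grants made on the previous version
    # (copy_grants as a keyword argument is an assumption).
    df.create_or_replace_view("positive_sales_v", copy_grants=True)

    # The same option on a dynamic table.
    df.create_or_replace_dynamic_table(
        "positive_sales_dt",
        warehouse="my_wh",
        lag="1 hour",
        copy_grants=True,
    )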
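
A sketch of the vectorized marker and the new write_pandas flag. The @vectorized(input=pandas.DataFrame) form mirrors the existing server-side vectorized decorator and is an assumption here.

    import pandas as pd
    from snowflake.snowpark import Session
    from snowflake.snowpark.functions import vectorized

    session = Session.builder.create()

    # Mark a handler as a vectorized UDF so it receives pandas batches instead of single rows
    # (the decorator arguments are assumed to mirror the server-side form).
    @vectorized(input=pd.DataFrame)
    def add_one(batch: pd.DataFrame) -> pd.Series:
        return batch[0] + 1

    # Opt in to the vectorized scanner when uploading a pandas DataFrame.
    pdf = pd.DataFrame({"ID": [1, 2, 3], "AMOUNT": [10.0, 20.5, 31.2]})
    session.write_pandas(
        pdf,
        "STAGED_AMOUNTS",
        auto_create_table=True,
        use_vectorized_scanner=True,  # new parameter in this release
    )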
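
The new scalar wrappers map onto the SQL functions of the same name. Whether functions such as is_role_in_session accept a plain string or need lit() is assumed below.

    from snowflake.snowpark import Session
    from snowflake.snowpark.functions import getdate, localtime, systimestamp, is_role_in_session, lit

    session = Session.builder.create()

    session.create_dataframe([[1]], schema=["id"]).select(
        getdate().alias("current_ts"),
        localtime().alias("local_time"),
        systimestamp().alias("system_ts"),
        is_role_in_session(lit("ANALYST")).alias("is_analyst"),  # lit() usage is an assumption
    ).show()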

Bug Fixes

Deprecations

Dependency Updates

Improvements

  • Unsupported types in DataFrameReader.dbapi (PuPr) are now ingested as StringType.
  • Improved the error message to list the available columns when a DataFrame cannot resolve the given column name.
  • Added a new option cacheResult to DataFrameReader.xml that caches the XML reader's result to a temporary table after xml is called. This improves performance when subsequent operations are performed on the same DataFrame (see the example below).
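
A sketch of the new cacheResult option, assuming it is passed through the existing DataFrameReader.option mechanism like rowTag; the stage path and row tag are placeholders.

    from snowflake.snowpark import Session

    session = Session.builder.create()

    df = (
        session.read
        .option("rowTag", "record")     # existing XML reader option
        .option("cacheResult", True)    # new: materialize the parsed XML to a temporary table
        .xml("@my_stage/data/records.xml")
    )

    # Subsequent operations reuse the cached temporary table instead of re-parsing the XML.
    print(df.count())
    print(df.count())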

Snowpark pandas API Updates

New Features

Improvements

  • Downgraded to level logging.DEBUG - 1 the log message stating that the Snowpark DataFrame reference of an internal DataFrameReference object has changed.
  • Eliminated duplicate parameter check queries for casing status when retrieving the session.
  • Retrieved DataFrame row counts through object metadata instead of a COUNT(*) query, improving performance.
  • Added support for applying the Snowflake Cortex function Complete.
  • Introduced faster pandas: improved performance by deferring row position computation (see the sketch after this list).
    • The following operations are currently supported and can benefit from the optimization: read_snowflake, repr, loc, reset_index, merge, and binary operations.
    • If a lazy object (e.g., DataFrame or Series) depends on a mix of supported and unsupported operations, the optimization will not be used.
  • Updated the error message shown when Snowpark pandas is referenced within apply.
  • Added a session parameter dummy_row_pos_optimization_enabled to enable/disable dummy row position optimization in faster pandas.
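
The operations listed above are standard Snowpark pandas calls; a small sketch of a workload they cover follows (table names are placeholders). The notes do not spell out how the dummy_row_pos_optimization_enabled parameter is toggled, so that is not shown here.

    import modin.pandas as pd
    import snowflake.snowpark.modin.plugin  # registers the Snowpark backend for modin
    from snowflake.snowpark import Session

    session = Session.builder.create()

    # read_snowflake, merge, loc, reset_index, and repr are among the operations that can
    # defer row position computation under the faster pandas improvements.
    orders = pd.read_snowflake("ORDERS")
    customers = pd.read_snowflake("CUSTOMERS")
    joined = orders.merge(customers, on="CUSTOMER_ID")
    large = joined.loc[joined["AMOUNT"] > 100].reset_index(drop=True)
    print(repr(large))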

Dependency Updates

  • Updated the supported modin versions to >=0.35.0 and <0.37.0 (previously >=0.34.0 and <0.36.0).

Bug Fixes

  • Fixed an issue with drop_duplicates where the same data source could be read multiple times in the same query but in a different order each time, resulting in missing rows in the final result. The fix ensures that the data source is read only once.
  • Fixed a bug with hybrid execution mode where an AssertionError was unexpectedly raised by certain indexing operations.

Snowpark Local Testing Updates

New Features

  • Added support for patching functions.ai_complete (see the example below).
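
A sketch of patching ai_complete in local testing mode, using the existing snowflake.snowpark.mock.patch mechanism; the (model, prompt) parameter list of the mock and the shape of the ai_complete call are assumptions about its signature.

    from snowflake.snowpark import Session
    from snowflake.snowpark.functions import ai_complete, col
    from snowflake.snowpark.mock import ColumnEmulator, ColumnType, patch
    from snowflake.snowpark.types import StringType

    # Return a canned completion so local tests never reach a real model
    # (the (model, prompt) parameters are an assumed signature).
    @patch(ai_complete)
    def mock_ai_complete(model, prompt):
        return ColumnEmulator(
            data=["mocked completion"] * len(prompt),
            sf_type=ColumnType(StringType(), True),
        )

    session = Session.builder.config("local_testing", True).create()
    df = session.create_dataframe([["What is Snowpark?"]], schema=["question"])
    df.select(ai_complete("llama3.1-8b", col("question"))).show()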
