1.39.0 (2025-09-17)
Snowpark Python API Updates
New Features
- Added support for unstructured data engineering in Snowpark, powered by Snowflake AISQL and Cortex functions:
  - DataFrame.ai.complete: Generate per-row LLM completions from prompts built over columns and files.
  - DataFrame.ai.filter: Keep rows where an AI classifier returns TRUE for the given predicate.
  - DataFrame.ai.agg: Reduce a text column into one result using a natural-language task description.
  - RelationalGroupedDataFrame.ai_agg: Perform the same natural-language aggregation per group.
  - DataFrame.ai.classify: Assign single or multiple labels from given categories to text or images.
  - DataFrame.ai.similarity: Compute cosine-based similarity scores between two columns via embeddings.
  - DataFrame.ai.sentiment: Extract overall and aspect-level sentiment from text into JSON.
  - DataFrame.ai.embed: Generate VECTOR embeddings for text or images using configurable models.
  - DataFrame.ai.summarize_agg: Aggregate and produce a single comprehensive summary over many rows.
  - DataFrame.ai.transcribe: Transcribe audio files to text with optional timestamps and speaker labels.
  - DataFrame.ai.parse_document: OCR/layout-parse documents or images into structured JSON.
  - DataFrame.ai.extract: Pull structured fields from text or files using a response schema.
  - DataFrame.ai.count_tokens: Estimate token usage for a given model and input text per row.
  - DataFrame.ai.split_text_markdown_header: Split Markdown into hierarchical header-aware chunks.
  - DataFrame.ai.split_text_recursive_character: Split text into size-bounded chunks using recursive separators.
  - DataFrameReader.file: Create a DataFrame containing all files from a stage as FILE data type for downstream unstructured data processing.
- Added a new datatype YearMonthIntervalType that allows users to create intervals for datetime operations.
- Added a new function interval_year_month_from_parts that allows users to easily create YearMonthIntervalType without using SQL.
- Added a new datatype DayTimeIntervalType that allows users to create intervals for datetime operations.
- Added a new function interval_day_time_from_parts that allows users to easily create DayTimeIntervalType without using SQL.
- Added support for FileOperation.list to list files in a stage with metadata.
- Added support for FileOperation.remove to remove files in a stage.
- Added an option to specify copy_grants for the following DataFrame APIs:
  - create_or_replace_view
  - create_or_replace_temp_view
  - create_or_replace_dynamic_table
- Added a new function snowflake.snowpark.functions.vectorized that allows users to mark a function as vectorized UDF.
- Added support for parameter use_vectorized_scanner in function Session.write_pandas().
- Added support for the following scalar functions in functions.py:
  - getdate
  - getvariable
  - invoker_role
  - invoker_share
  - is_application_role_in_session
  - is_database_role_in_session
  - is_granted_to_invoker_role
  - is_role_in_session
  - localtime
  - systimestamp
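The use_vectorized_scanner parameter of Session.write_pandas() listed above might be passed as in the sketch below. This is an illustration under assumptions: the helper name load_with_vectorized_scanner is hypothetical, the parameter is assumed to take a boolean, and the session argument is expected to be an already-connected Snowpark Session (creating one is out of scope here).

```python
from typing import Any


def load_with_vectorized_scanner(session: Any, pandas_df: Any, table_name: str) -> Any:
    """Write a pandas DataFrame to a table via Session.write_pandas.

    Hypothetical helper. Assumption: use_vectorized_scanner accepts a
    boolean keyword argument, as the release note suggests.
    """
    return session.write_pandas(
        pandas_df,
        table_name,
        auto_create_table=True,       # existing write_pandas option
        use_vectorized_scanner=True,  # parameter added in 1.39.0 (boolean assumed)
    )
```

With a real session, the call would look like load_with_vectorized_scanner(session, pdf, "MY_TABLE"), where the table name is a placeholder.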
Bug Fixes
Deprecations
- Deprecation warnings are now triggered when using snowpark-python with Python 3.9. For more details, please refer to https://docs.snowflake.com/en/developer-guide/python-runtime-support-policy.
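To find out ahead of time whether an environment will hit this warning, a check along the following lines could be used. It is a minimal sketch using only the standard library; the function names are hypothetical and the warning text is illustrative, not the message snowpark-python actually emits.

```python
import sys
import warnings


def is_deprecated_python(version_info=sys.version_info) -> bool:
    """Return True when the interpreter is Python 3.9, the version flagged in 1.39.0."""
    return tuple(version_info[:2]) == (3, 9)


def warn_if_deprecated(version_info=sys.version_info) -> None:
    """Emit a DeprecationWarning on Python 3.9 (illustrative message only)."""
    if is_deprecated_python(version_info):
        warnings.warn(
            "Python 3.9 is deprecated for snowpark-python; consider upgrading.",
            DeprecationWarning,
            stacklevel=2,
        )
```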
Dependency Updates
Improvements
- Unsupported types in DataFrameReader.dbapi (PuPr) are ingested as StringType now.
- Improved error message to list available columns when dataframe cannot resolve given column name.
- Added a new option cacheResult to DataFrameReader.xml that allows users to cache the result of the XML reader to a temporary table after calling xml. It helps improve performance when subsequent operations are performed on the same DataFrame.
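The cacheResult option described above would presumably be set through the reader's option() call, as sketched here. The helper name read_xml_cached, the stage path, and the row tag are hypothetical, and passing a boolean for cacheResult is an assumption; rowTag is the existing option that selects the repeating XML element.

```python
from typing import Any


def read_xml_cached(session: Any, stage_path: str, row_tag: str) -> Any:
    """Read XML from a stage and ask the reader to cache its result.

    Hypothetical helper. `session` is expected to be a connected Snowpark Session.
    """
    return (
        session.read
        .option("rowTag", row_tag)    # existing XML reader option
        .option("cacheResult", True)  # option added in 1.39.0 (boolean assumed)
        .xml(stage_path)
    )
```

A call such as read_xml_cached(session, "@my_stage/books.xml", "book") would then return a DataFrame whose parsed rows are, per the note above, cached in a temporary table for later operations.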
Snowpark pandas API Updates
New Features
Improvements
- Downgraded to level logging.DEBUG - 1 the log message saying that the Snowpark DataFrame reference of an internal DataFrameReference object has changed.
- Eliminate duplicate parameter check queries for casing status when retrieving the session.
- Retrieve dataframe row counts through object metadata to avoid a COUNT(*) query (performance)
- Added support for applying Snowflake Cortex function Complete.
- Introduce faster pandas: Improved performance by deferring row position computation.
  - The following operations are currently supported and can benefit from the optimization: read_snowflake, repr, loc, reset_index, merge, and binary operations.
  - If a lazy object (e.g., DataFrame or Series) depends on a mix of supported and unsupported operations, the optimization will not be used.
- Updated the error message for when Snowpark pandas is referenced within apply.
- Added a session parameter dummy_row_pos_optimization_enabled to enable/disable dummy row position optimization in faster pandas.
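The effect of logging a message at logging.DEBUG - 1, as the first improvement above does, can be seen with the standard library alone. In this sketch the logger name and message texts are hypothetical. A logger set to DEBUG drops the level-9 message, while a logger set to DEBUG - 1 keeps it.

```python
import logging

# One step below DEBUG (10 - 1 = 9), the level named in the release note.
BELOW_DEBUG = logging.DEBUG - 1

logger = logging.getLogger("example.snowpark_pandas")  # hypothetical logger name
logger.propagate = False  # keep the demo isolated from the root logger


class ListHandler(logging.Handler):
    """Collect formatted messages in a list so they can be inspected."""

    def __init__(self):
        super().__init__(level=logging.NOTSET)
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logger.addHandler(handler)


def messages_seen_at(threshold: int) -> list:
    """Log one below-DEBUG and one DEBUG message; return those that passed."""
    handler.messages.clear()
    logger.setLevel(threshold)
    logger.log(BELOW_DEBUG, "reference changed")
    logger.debug("regular debug message")
    return list(handler.messages)
```

Calling messages_seen_at(logging.DEBUG) yields only the regular debug message, so users who enable DEBUG logging no longer see the reference-change message unless they lower the threshold further.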
Dependency Updates
- Updated the supported modin versions to >=0.35.0 and <0.37.0 (was previously >=0.34.0 and <0.36.0).
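A quick way to check a modin version string against the new range is sketched below using only the standard library. The function names are hypothetical, and the parser is deliberately simplified: it reads the leading digits of the first three components and ignores pre-release or local suffixes.

```python
def parse_version(version: str) -> tuple:
    """Parse the leading numeric 'X.Y.Z' portion of a version string."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break  # stop at suffixes such as 'rc1'
        parts.append(int(digits) if digits else 0)
    while len(parts) < 3:
        parts.append(0)  # pad short versions like '0.35'
    return tuple(parts)


def is_supported_modin(version: str) -> bool:
    """True when the version falls within >=0.35.0 and <0.37.0."""
    return (0, 35, 0) <= parse_version(version) < (0, 37, 0)
```

The installed version could be obtained with importlib.metadata.version("modin") and passed to is_supported_modin.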
Bug Fixes
- Fixed an issue with drop_duplicates where the same data source could be read multiple times in the same query but in a different order each time, resulting in missing rows in the final result. The fix ensures that the data source is read only once.
- Fixed a bug with hybrid execution mode where an AssertionError was unexpectedly raised by certain indexing operations.
Snowpark Local Testing Updates
New Features
- Added support to allow patching functions.ai_complete.