0.2.0 (2021-12-02)
New Features
- Updated the Session.createDataFrame() method for creating a DataFrame from a Pandas DataFrame.
- Added the Session.write_pandas() method for writing a Pandas DataFrame to a table in Snowflake and getting a Snowpark DataFrame object back.
- Added new classes and methods for calling window functions.
- Added the new functions cume_dist(), to find the cumulative distribution of a value with regard to other values within a window partition, and row_number(), which returns a unique row number for each row within a window partition.
- Added functions for computing statistics for DataFrames in the DataFrameStatFunctions class.
- Added functions for handling missing values in a DataFrame in the DataFrameNaFunctions class.
- Added new methods rollup(), cube(), and pivot() to the DataFrame class.
- Added the GroupingSets class, which you can use with the DataFrame groupByGroupingSets method to perform a SQL GROUP BY GROUPING SETS.
- Added the new FileOperation(session) class that you can use to upload and download files to and from a stage.
- Added the DataFrame.copy_into_table() method for loading data from files in a stage into a table.
- In CASE expressions, the functions when() and otherwise() now accept Python types in addition to Column objects.
- When you register a UDF, you can now optionally set the replace parameter to True to overwrite an existing UDF with the same name.
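
To illustrate what cume_dist() and row_number() compute within a window partition, here is a minimal plain-Python sketch of the standard SQL semantics. It does not call the Snowpark API; the helper name window_stats and the sample rows are hypothetical and exist only for this illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def window_stats(rows: List[Tuple[str, int]]) -> List[Tuple[str, int, int, float]]:
    """Return (partition, value, row_number, cume_dist) for each input row.

    Rows are partitioned by the first field and ordered by the second,
    mirroring PARTITION BY ... ORDER BY ... in a SQL window definition.
    """
    partitions: Dict[str, List[int]] = defaultdict(list)
    for key, value in rows:
        partitions[key].append(value)

    result = []
    for key in sorted(partitions):
        values = sorted(partitions[key])
        total = len(values)
        # row_number: 1-based position in the ordered partition (ties still get distinct numbers).
        for position, value in enumerate(values, start=1):
            # cume_dist: fraction of rows in the partition whose value is <= this one.
            at_or_below = sum(1 for v in values if v <= value)
            result.append((key, value, position, at_or_below / total))
    return result


# Hypothetical sample data: partition "a" has a tie on the value 20.
rows = [("a", 10), ("a", 20), ("a", 20), ("b", 5)]
stats = window_stats(rows)
for row in stats:
    print(row)
```

Note that the two tied rows in partition "a" share the same cumulative distribution but still receive distinct row numbers, which is the difference between the two functions.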
Improvements
- UDFs are now compressed before they are uploaded to the server. This makes them about 10 times smaller, which can help when you are using large ML model files.
- When the size of a UDF is less than 8196 bytes, it is uploaded as in-line code instead of being uploaded to a stage.
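
The size-based choice between in-line code and a stage upload can be pictured roughly as follows. This is an illustrative sketch only, not Snowpark's implementation: the function name choose_upload_mode, the use of zlib, and the sample payload are assumptions, and it is also an assumption which size (raw or compressed) the library actually compares. Only the 8196-byte threshold comes from the notes above.

```python
import zlib

# Threshold quoted in the release notes; everything else here is hypothetical.
INLINE_LIMIT_BYTES = 8196


def choose_upload_mode(payload: bytes) -> str:
    """Return "inline" for payloads under the limit and "stage" otherwise."""
    return "inline" if len(payload) < INLINE_LIMIT_BYTES else "stage"


# Highly repetitive stand-in for serialized UDF code (not real UDF bytes).
raw = b"def f(x): return x + 1\n" * 1000
compressed = zlib.compress(raw)

print(len(raw), choose_upload_mode(raw))
print(len(compressed), choose_upload_mode(compressed))
```

Real UDF payloads compress far less dramatically than this repetitive sample, so the ratio printed here says nothing about the "about 10 times smaller" figure.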
Bug Fixes
- Fixed an issue where the statement df.select(when(col("a") == 1, 4).otherwise(col("a"))), [Row(4), Row(2), Row(3)] raised an exception.
- Fixed an issue where df.toPandas() raised an exception when a DataFrame was created from large local data.