🚀 Performance improvements
- Improve binview extend/ifthenelse (#18164)
- Start on better Parquet delta decoding (#18049)
- Rechunk group-by __iter__ (#18162)
- Tune jemalloc to not create muzzy pages (#18148)
- Reduce default async thread count (#18142)
- Make expensive selector expansion lazy (#18118)
- Use single threaded algorithms if only 1 core given (#18101)
- Use `Arc<Vec<_>>` instead of `Arc<[_]>` for paths and hive partitions (#18066)
- SIMD View from `FixedSizeBinary` (#18059)
- Use bitmask to filter Parquet predicate-pushdown items (#17993)
- Zero-copy buffers for `FixedSizeBinary` to `BinaryView` cast (#18043)
✨ Enhancements
- Create literals for datetime/date expressions (#18184)
- Create literals in 'datetime' expression (#18182)
- Expose top-level "has_header" param for `read_excel` and `read_ods` (#18078)
- Raise on invalid 'is_between' and improve error message quality (#18147)
🐞 Bug fixes
- Fix struct shift and list builder (#18189)
- Don't load Parquet nested metadata (#18183)
- Throw bigidx error for Parquet row-count (#18154)
- Fix unpivot on empty df (#18179)
- Don't vertically parallelize CSE contexts (#18177)
- Ensure default values are included when saving/restoring the current `Config` state (#18151)
- Properly handle empty Parquet row groups with no dictionary (#18161)
- Struct outer nullability (#18156)
- Fix pyarrow predicate pushdown regression (#18145)
- Prevent unwanted supertype cast in 'search_sorted' (#18143)
- Parquet with `filter=None` (#18139)
- Don't raise when converting from pandas if the index contains duplicate names when `include_index=False` (the default) (#18133)
- Fix cast Float to String where Float is not turned to Integer before turning to String (#18123)
- Don't remove leading whitespace in `read_csv` (#18131)
- Py-polars compilation with no features (#18129)
- String transform `to_titlecase` was too narrowly defined (#18122)
- Reading Parquet with Null dictionary page (#18112)
- When setting `write_excel` column totals, don't forget to include any row-total cols (#18042)
- Incorrect lazy CSV `select(len())` for compressed files (#18067)
- Fix `sink_ipc_cloud` panicking with runtime error (#18091)
- Properly write Parquet for sliced lists (#18073)
- Panic reading multiple CSV files from cloud (#18056)
- Fix `CloudWriter` to use buffer before making requests (#18027)
- Fix typos and remove trailing whitespace (#18024)
- Handle `cfg(feature)` for `shrink_dtype` (#18038)
📖 Documentation
- Fix references to old methods in `lazy` docstring (#18178)
- Include PyCapsule Interface in DataFrame and Series API docs (#18174)
- Corrected example result in group_by docs (#18169)
- Mention 'Array' in data types overview (#18060)
- Correct concat rechunk in user guide (#18080)
- Fix typo in title of Hugging Face docs page (#18097)
- Update pivot docstring for clarity (#18000)
🛠️ Other improvements
- Remove unneeded growable (#18165)
- Update Cargo.lock to fix build error on Linux (#18153)
- Remove Nth, Wildcard from ExprIR and make conversion fallible (#18115)
Thank you to all our contributors for making this release possible!
@EricTulowetzke, @KDruzhkin, @MarcoGorelli, @Vincenthays, @alexander-beedie, @coastalwhite, @davanstrien, @deanm0000, @ember91, @kylebarron, @mcrumiller, @nameexhaustion, @orlp, @philss, @ritchie46 and @rosstitmarsh