🏆 Highlights
- support 'hive partitioning' aware readers (#11284)
- natively support reading parquet for aws, gcp and azure (#11210)
- Add support for Iceberg (#10375)
- The great expressification by @reswqa (#11320, #11344, #11313, #11257, #11288, #11275, #11197, #11167, #11155)
⚠️ Deprecations
- Add disable_string_cache (#11020)
🚀 Performance improvements
- improve dynamic_groupby_iter (#11341)
- improve and fix rolling windows by linear scanning (#11326)
- faster init from pydantic models that have a small number of fields, and support direct init from SQLModel data (often used with FastAPI) (#11263)
- improve outer join materialization (#11241)
- use ryu and itoa for primitive serialization (#11193)
- use try-binary-elementwise instead of try-binary-elementwise-values in dt_truncate (#11189)
- Using cache for str.contains regex compilation (#11183)
✨ Enhancements
- introduce 'label' instead of 'truncate' in group_by_dynamic, which can take label='right' (#11337)
- Expressify list.shift (#11320)
- top_k and bottom_k support passing an expr (#11344)
- add "pyxlsb" engine support to read_excel (for excel binary workbook files) (#11248)
- support 'hive partitioning' aware readers (#11284)
- str.strip_chars supports taking an expr argument (#11313)
- sample n can take an expr (#11257)
- Add disable_string_cache (#11020)
- clip supports expr arguments and physical numeric dtype (#11288)
- Introduce list.drop_nulls (#11272)
- str.splitn and split_exact can take an expr argument (#11275)
- introduce ambiguous option for dt.round (#11269)
- Adds NULLIF and COALESCE SQL functions (#11124)
- better tree-formatting representation (#11176)
- natively support reading parquet for aws, gcp and azure (#11210)
- Expressify str.strip_prefix & suffix (#11197)
- Add support for Iceberg (#10375)
- list.join's separator can be an expression (#11167)
- argument every of datetime.truncate can be an expression (#11155)
🐞 Bug fixes
- Fix Series.__contains__ for None values and implement is_in for null Series (#11345)
- don't panic on multi-nodes in streaming conversion (#11343)
- ensure trailing quote is written for temporal data when CSV quote_style is non-numeric (#11328)
- clarify has_validity docstring and fix several cases where the presence of a bitmask was used to incorrectly infer the existence of null values (#11319)
- fix empty Series construction edge-case with Struct dtype (#11301)
- DataFrame init from collections.namedtuple values (#11314)
- Exclude functools wrapper frames in find_stacklevel (#11292)
- set partitions independent of thread pool (#11304)
- address VSCode issue with autocomplete on selector expressions in editor/console (#11235)
- consume duplicates in rolling_by window (#11261)
- handle url encoded paths in objectpath creation (#11240)
- use POOL when writing csv (#11222)
- don't conflate saved Config JSON string with file path (#11098)
- is_in for bool evaluate has_false incorrectly (#11217)
- improve handling of database drivers that can return arrow data (#11201)
- fix nullable filter mask in group_by (#11207)
- replace n-th in filter (#11206)
- fix translation of Series-nested datetime/date values for scan_pyarrow predicates (#11195)
- address unexpected expression name from use of unary - or + operators (#11158)
- impl hash for more function expr (#11182)
- list.join's separator can be an expression (#11167)
- Add some missing expr type hint for series (#11171)
- consistently use negative every as the default for offset in group_by_dynamic (#11164)
- Make pl.struct serializable (#11169)
- only raise on actual parameter collision when "dtypes" specified in read_excel "read_csv_options" (#11162)
- propagate null value for str/binary starts/ends_with and contains (#11141)
🛠️ Other improvements
- simplify/clarify group_by_dynamic examples (#11335)
- tighten assert_frame_equal for LazyFrames (don't collect until after the schema has been checked) (#11331)
- unify display for namespaced function expr (#11342)
- add lazy pivot example (#11325)
- Use GITHUB_TOKEN to get contributor information for docs (#11321)
- Enable version warning banner (#11322)
- cross-reference null_count from has_validity (clarifies the correct way to check for nulls) (#11323)
- Pin pydantic in dev requirements <2.4.0 (#11312)
- remove default auto-explode for map_many_private (#11270)
- Add type alias IntoExprColumn (#11296)
- update a few dependencies (#11283)
- Properly skip ADBC test (#11282)
- Fix some minor Makefile issues (#11276)
- update sponsors (#11271)
- parametric tests for group_by_rolling (#11262)
- Make some list function expr non-anonymous (#11230)
- Mention the performant feature only once (#11223)
- remove unneeded indirection (#11233)
- remove unneeded mutex around object-store (#11224)
- clarify every/period/offset in group_by_dynamic (#11175)
- Fix read_database batch_size docstring (#11132)
Thank you to all our contributors for making this release possible!
@ByteNybbler, @Cheukting, @Fokko, @Hofer-Julian, @MarcoGorelli, @SeanTroyUWO, @alexander-beedie, @billylanchantin, @jonashaag, @mcrumiller, @orlp, @ptiza, @reswqa, @ritchie46, @stinodego and @universalmind303