What's Changed
- respect ipc column ordering by @ritchie46 in #3591
- zfill expression by @ritchie46 in #3593
- Patch release by @ritchie46 in #3595
- Fix TOML typos by @ryanrussell in #3598
- Anonymous scan lazyframe by @universalmind303 in #3561
- ljust and rjust expressions by @ritchie46 in #3603
- cast string to categorical in 'is_in' by @ritchie46 in #3606
- python data type units by @ritchie46 in #3609
- unset sorted metadata on append by @ritchie46 in #3610
- feat(nodejs): scan json by @universalmind303 in #3611
- Expand regex function input by @ritchie46 in #3613
- node 0.5.3 release by @universalmind303 in #3612
- improve when then otherwise for lists by @ritchie46 in #3614
- python polars 0.13.44 by @ritchie46 in #3615
- Fix mode for multiple modes by @GregoryBL in #3566
- fix empty list edge case by @ritchie46 in #3621
- fix invalid concat dtype by @ritchie46 in #3622
- respect n_rows by @ritchie46 in #3624
- Python: scan_ipc/parquet can scan from fsspec sources (e.g. s3) by @ritchie46 in #3626
- Fix Series init (as pl.Object dtype) from mixed-type input and extend test coverage by @alexander-beedie in #3627
- restrict parallel branches in lazy Union by @ritchie46 in #3628
- native exp expression by @ritchie46 in #3629
- python dict parallel dataframe creation by @ritchie46 in #3630
- Enhanced column typedef/inference support for DataFrame init by @alexander-beedie in #3633
- fix row count file projection pushdown by @ritchie46 in #3635
- fix list concat by @ritchie46 in #3636
- rust publish makefile by @ritchie46 in #3637
- improve explode of empty lists by @ritchie46 in #3638
- Improve numpy ufunc support. fixes: #3228 by @ghuls in #3583
- Update various python build requirements. by @ghuls in #3641
- is_in for struct dtype by @ritchie46 in #3639
- Update black and change some code so it sees it as a call chain. by @ghuls in #3645
- concat list determine supertype by @ritchie46 in #3649
- update arrow by @ritchie46 in #3650
- Parallel csv writer by @ritchie46 in #3652
- fix groups state in complex aggregation by @ritchie46 in #3656
- Rust Comment Readability Fixes by @ryanrussell in #3662
- Add Expr.reverse() Python API example by @cnpryer in #3660
- Added StringCache Python API example by @cnpryer in #3659
- improve dtype selection by @ritchie46 in #3664
- accept regex in filter by @ritchie46 in #3666
- python: improve html render by @ritchie46 in #3667
- Python: infer_schema_len arg to from_dicts by @ritchie46 in #3669
- Add LICENSE link to py-polars by @gyscos in #3674
- python: fix and test globbing by @ritchie46 in #3675
- python polars 0.13.45 by @ritchie46 in #3676
- Add useful example for pl.StringCache(). by @ghuls in #3677
- Fix StringCache docstring typo by @cnpryer in #3678
- Fix polars.Expr.apply() Python API docs text by @cnpryer in #3661
- Anonymous scan enhancements & cleanup by @universalmind303 in #3657
- add pyarrow install to quickstart setup by @ritchie46 in #3682
- fix oob in sorted groupby by @ritchie46 in #3681
- fix branch supertypes by @ritchie46 in #3683
- fix cargo.toml for docs.rs by @ritchie46 in #3684
- python polars 0.13.46 by @ritchie46 in #3686
- ndjson reader complex types support by @universalmind303 in #3665
- fix groupby aggregation on empty df by @ritchie46 in #3688
- Nodejs groupbyrolling by @universalmind303 in #3670
- Add pl.Expr.hash Python example by @cnpryer in #3679
- Adding 'line-height' at 95% to df _html.py print by @LVG77 in #3691
- unique counts for logical types by @ritchie46 in #3694
- Update arrow and prepare for mutable arithmetics by @ritchie46 in #3695
- Improve lit agg by @ritchie46 in #3702
- panic on invalid groupby rolling input by @ritchie46 in #3703
- docs: Readability improvements in py-polars by @ryanrussell in #3700
- docs: polars-lazy readability improvements by @ryanrussell in #3701
- Python: parallel concat df by @gunjunlee in #3671
- fix ipc column order by @ritchie46 in #3706
- nodejs release by @universalmind303 in #3698
- add coc by @ritchie46 in #3712
- inplace arithmetic by @ritchie46 in #3709
- format empty df by @ritchie46 in #3719
- Add typing overloads for DataFrame.hstack() by @adamgreg in #3697
- Add Series to DataFrame.with_columns() argument annotation by @adamgreg in #3696
- fix rolling groupby ordering with 'by' argument by @ritchie46 in #3720
- allow literal as aggregation by @ritchie46 in #3722
- Improve performance of categorical casting by @ritchie46 in #3724
- Add flag to allow str.contains to search for string literals (#3711) by @alexander-beedie in #3718
- fix join negative keys by @ritchie46 in #3730
- fix arr.get() offsets by @ritchie46 in #3731
- update arrow by @ritchie46 in #3732
- fix from_pandas object null array by @ritchie46 in #3733
- python polars 0.13.47 by @ritchie46 in #3734
- Replace OOB slice indexing with spare_capacity_mut by @saethlin in #3737
- pow fast paths by @ritchie46 in #3738
- Simplify contains check that opts-in to contains_literal fast-path by @alexander-beedie in #3736
- fix arithmetic bug introduced in #3709 by @ritchie46 in #3741
- check nan in sort by single column by @ritchie46 in #3742
- python fix concat by @ritchie46 in #3743
- patch python polars 0.13.48 by @ritchie46 in #3744
- ternary literal predicates by @ritchie46 in #3747
- python polars 0.13.49 by @ritchie46 in #3748
- unset sorted on take by @ritchie46 in #3756
- reexport polars for extension libraries by @universalmind303 in #3760
- add global pl by @universalmind303 in #3763
- arg_where expression by @ritchie46 in #3757
- update arrow by @ritchie46 in #3762
- python lhs power and broadcast by @ritchie46 in #3768
- allow regex expansion in binary/ternary expressions by @ritchie46 in #3769
- str.ends_with/ str.starts_with by @ritchie46 in #3770
- fix bug in agg projections and init tpch schema tests by @ritchie46 in #3771
- always include offset in groupby_dynamic by @ritchie46 in #3779
- Cache file reads (tpch 2/7) ~5% faster by @ritchie46 in #3774
- python fix arr.contains type by @ritchie46 in #3782
- improve predicate combination and schema state by @ritchie46 in #3788
- fix duration computation by @ritchie46 in #3790
- Update arrow2 to support IPC Stream Reading with projections by @joshuataylor in #3793
- Some API alignment (missing funcs) between DataFrame, LazyFrame, and Series by @alexander-beedie in #3791
- Docs: sort entries within subsections by @alexander-beedie in #3794
- csv don't skip delimiter in whitespace trimming by @ritchie46 in #3796
- don't copy the sorted flag on many operations by @ritchie46 in #3795
- csv don't skip trailing delimiters when inferring schema. by @ghuls in #3799
- Allow date_range to produce date ranges as well as datetime by @alexander-beedie in #3798
- quarter expression by @ritchie46 in #3797
- Update rustc to 2022-06-22 by @ritchie46 in #3801
- Fix Node installation instructions by @Smittyvb in #3804
- python polars 0.13.50 by @ritchie46 in #3802
- rolling groupby fix index column output order by @ritchie46 in #3806
- Add support for IPC Streaming Read/Write by @joshuataylor in #3783
- chore: chunked_array readability improvements by @ryanrussell in #3810
- Add serde feature to field to fix serde feature by @joshuataylor in #3808
- fix join asof on floats by @ritchie46 in #3812
- chore: /polars/polars-core/src/frame/ readability by @ryanrussell in #3813
- Fixing small typos in docs by @thatlittleboy in #3811
- fix join asof tolerance by @ritchie46 in #3816
- docs: use quotes in pip install instruction by @thatlittleboy in #3820
- Improve parquet reading performance ~35-40% by @ritchie46 in #3821
- from anyvalue for small integers by @ritchie46 in #3826
- add date offset by @ritchie46 in #3827
- fix sorted unique by @ritchie46 in #3837
- fix ternary groupby agg_list/not_aggregated combination by @ritchie46 in #3835
- don't parallelize upsample by @ritchie46 in #3836
- python fix time divide by zero by @ritchie46 in #3838
- Improve map/apply docstrings by @braaannigan in #3750
- don't cache in-expression window functions by @ritchie46 in #3840
- Hypothesis testing framework integrations for Polars by @alexander-beedie in #3842
- docs: Improve expr.string documentation by @thatlittleboy in #3841
- make hypothesis optional and don't fail if not installed by @ritchie46 in #3849
- update arrow by @ritchie46 in #3848
- python: fix time conversion by @ritchie46 in #3851
- Make frame/series asserts more resilient against integer overflow by @alexander-beedie in #3850
- parquet: allow writing smaller row groups by @ritchie46 in #3852
- python polars 0.13.51 by @ritchie46 in #3854
- allow branching null with struct dtype by @ritchie46 in #3856
- Address distinction between DataType and DataType() by @alexander-beedie in #3857
- Deprecate df/ldf argument to .join by @thomasaarholt in #3855
- null_probability functionality for dataframes/series test strategies by @alexander-beedie in #3860
- Modern style type hints by @stinodego in #3863
- Concise empty class syntax by @stinodego in #3864
- fix groups after take expression by @ritchie46 in #3881
- fix predicate pushdown in union + count expression by @ritchie46 in #3882
- add join/union branch in window cache keys by @ritchie46 in #3884
- Fast/cheap empty clone ops by @alexander-beedie in #3883
- parquet read: fix remaining_rows counter by @ritchie46 in #3887
- Parquet writing: reduce heap allocs by @ritchie46 in #3879
- Negative-indexing support for additional functions, and frame-level take_every by @alexander-beedie in #3888
- Make numpy an optional requirement by @stinodego in #3861
- Address deprecation warnings while running pytest by @stinodego in #3889
- Fix reading of gzipped CSV files. Fixes: #3895 by @ghuls in #3896
- Relocate hypothesis unit tests to parallel tests_parametric dir by @alexander-beedie in #3899
- Assign dtypes to expected columns when dtypes is a list and column se… by @ghuls in #3901
- docs: fix link to series method in DataFrame by @duskmoon314 in #3897
- docs: Improve py-polars docs by @thatlittleboy in #3873
- Complete pythonic slice support (inc. negative indexing/stride) for DataFrame and Series by @alexander-beedie in #3904
- Update docstring outputs by @ghuls in #3912
- Make embedded CSV test strings easier to read. by @ghuls in #3907
- Quiet an unnecessary warning (tests), and minor optimisation for slices with negative stride by @alexander-beedie in #3913
- fix dataframe explode with empty lists by @ritchie46 in #3916
- Implement pow/rpow for Series by @stinodego in #3908
- Fix Series __setitem__ and take by @stinodego in #3910
- fix negative offset in groupby_rolling by @ritchie46 in #3918
- make string formatting configurable by @ritchie46 in #3919
- Expr docstrings by @braaannigan in #3871
- parquet: parallelize over row groups ~3x by @ritchie46 in #3924
- Don't unwrap IPC Stream, instead use ? to not panic by @joshuataylor in #3927
- Corrected .select type hint to Sequence[str, Expr] by @thomasaarholt in #3931
- add impl from anyvalue for literal by @savente93 in #3921
- update arrow: ipc limit and reduce categorical-> dictionary bound checks by @ritchie46 in #3926
- fix window expression case by @ritchie46 in #3937
- fix oob panic on expand_at_index and series from pyarrow chunkedarray by @ritchie46 in #3938
- block equality/ordering based predicates on null producing joins by @ritchie46 in #3939
- Extended with_columns to allow **kwargs style named expressions by @alexander-beedie in #3917
- upcast float16 to float32 by @ritchie46 in #3940
- python: fix already mutable borrowed append by @ritchie46 in #3943
- Fixed assert_frame_equal and assert_series_equal for NaN values by @alexander-beedie in #3941
- Add from_numpy constructor by @stinodego in #3944
- Fix Pandas date_range warnings in tests by @zundertj in #3945
- fix ipc ordering by @ritchie46 in #3947
- Remove "import polars as pl" from docstrings by @zundertj in #3948
- [docs] improve python polars documentation by @thatlittleboy in #3954
- Modern style type hints for the test suite by @stinodego in #3949
- Fixed most See Also docstring formatting, quietened the last warnings coming from doctests by @alexander-beedie in #3932
- python: loosen truncate sorted restriction in docstring by @ritchie46 in #3956
- groupby apply: use inner type to infer dtype by @ritchie46 in #3955
- python polars 0.13.52 by @ritchie46 in #3957
- Fix pytest warning by @stinodego in #3962
- Update README.md by @cxtruong70 in #3959
- implicit datelike string comparison warning by @ritchie46 in #3967
- fix count union predicate by @ritchie46 in #3969
- docs: conventions, mwe and docstring fixes by @thatlittleboy in #3973
- Pythonic slice support for LazyFrame (efficient computation paths only) by @alexander-beedie in #3970
- add from_numpy to docs by @thatlittleboy in #3976
- use bitflags crate by @ritchie46 in #3978
- fix accidentally slow cross join by @ritchie46 in #3980
- ensure main lazyframe gets file cache opt state by @ritchie46 in #3981
- chore(tests): small readability fixes by @ryanrussell in #3989
- Remove unnecessary imports by @zundertj in #3988
- Add support for loading a collection of parquet files by @andrei-ionescu in #3894
- improve from dictionary -> categorical by @ritchie46 in #3996
- fix col aggregation schema and ternary on empty series by @ritchie46 in #3995
- release memory on 0% selectivity by @ritchie46 in #4000
- col(dtypes).exclude() by @ritchie46 in #4001
- fix explode offsets for empty lists by @ritchie46 in #4005
- reduce peak memory of reading parquet by row groups ~-22% by @ritchie46 in #4006
- fix rolling groupby with negative windows by @ritchie46 in #4010
- fix: Lazyframe::from(lp) #3877 by @universalmind303 in #4012
- Date encode types by @ritchie46 in #4013
- csv: allow multiple null values by @ritchie46 in #4016
- python polars 0.13.53 by @ritchie46 in #4017
- Improve lazy state struct by @ritchie46 in #4008
- python: fix pyarrow imports by @ritchie46 in #4025
- fix lazy schema by @ritchie46 in #4027
- Align the exclude docstrings and annotation by @thatlittleboy in #4020
- docs: add mwe and internal links by @thatlittleboy in #4019
- impl explode for nested lists by @ritchie46 in #4028
- allow joining on expressions by @ritchie46 in #4029
- allow nulls last in sort by expressions by @ritchie46 in #4030
- python polars 0.13.54 by @ritchie46 in #4031
- feat: implement contains for DataFrame and LazyFrame by @thatlittleboy in #4035
- Remove py-polars legacy package by @stinodego in #4037
- Native trigonometry functions by @stinodego in #4034
- parquet: stop reading when slice is reached by @ritchie46 in #4046
- fix cross join by @ritchie46 in #4045
- More trigonometry by @stinodego in #4047
- Update flake8 settings by @stinodego in #4038
- pivot: fix categorical logicaltype by @ritchie46 in #4048
- Update mypy settings by @stinodego in #4049
- fix: reproducible Expr.hash by @thatlittleboy in #4033
- Fix constructor orient type hint by @stinodego in #3961
- Improve coverage report settings by @stinodego in #4039
- Added literal param to string-replace functions, optimized replace performance in small-string regime (30-80% faster) by @alexander-beedie in #4057
- parquet: low memory arg by @ritchie46 in #4050
- Upgrade Windows 10 tests, benchmark and doc jobs to Python3.10 by @zundertj in #4059
- Revert "Upgrade Windows 10 tests, benchmark and doc jobs to Python3.10" by @ritchie46 in #4062
- fill_null expr: ensure minimal supertype by @ritchie46 in #4061
- Fix connector-x integration for PostgreSQL by @valxv in #4063
- node updates by @universalmind303 in #3984
- python polars 0.13.55 by @ritchie46 in #4064
- Handle wrong input for orient argument by @stinodego in #4065
- Turn on doctests; fix wrong examples by @zundertj in #4060
- Mypy warn redundant casts by @zundertj in #4055
- Add mypy optional error codes by @stinodego in #4054
- recursively convert arrow logical types in to_arrow by @ritchie46 in #4067
- improve unique performance by @ritchie46 in #4070
- Small formatting fixes by @stinodego in #4071
- [mypy] Add error codes by @stinodego in #4072
- reduce contention of global string cache: >4x performance improvement by @ritchie46 in #4078
- Add lazy() method to LazyFrame by @zundertj in #4077
- [flake8] Enable flake8-bugbear extension by @stinodego in #4073
- csv: allow reading with different eol character by @ritchie46 in #4080
- docs: rework some MWE and minor formatting fixes by @thatlittleboy in #4082
- Upgrade maturin to 0.13.0 by @messense in #4086
- dataframe display: use POLARS_FMT_STR_LEN by @ritchie46 in #4088
- don't allow comparing local categoricals by @ritchie46 in #4087
- implement list hash for simply nested lists by @ritchie46 in #4090
- improve error on missing column access by @ritchie46 in #4095
- value_counts add sorted argument by @ritchie46 in #4094
- from_rows improve schema correctness by @ritchie46 in #4097
- Cache length of ChunkedArray by @ritchie46 in #4105
- fix explode with empty lists by @ritchie46 in #4113
- fix so rank by @ritchie46 in #4114
- fix explode for sliced arrays by @ritchie46 in #4115
- python: to_numpy use first type as supertype by @ritchie46 in #4116
- python: remove css line for vscode by @ritchie46 in #4117
- Remove read_excel hacks by @cnpryer in #4081
- python allow set by string by @ritchie46 in #4118
- fill_nan preserve name by @ritchie46 in #4119
- Fix prefix/suffix docstrings. by @ghuls in #4122
- allow summing of duration in selection context by @ritchie46 in #4124
- python: improve setitem by @ritchie46 in #4121
- python polars 0.13.56 by @ritchie46 in #4127
- Assert deprecation warning on DataFrame.setitem in tests by @zundertj in #4126
- Run PR workflows on definition changes by @zundertj in #4125
- fix 'fatal: unsafe repository' in python build by @ritchie46 in #4129
- Nested dict by @ritchie46 in #4131
- improve performance of building global string cache from arrow dictio… by @ritchie46 in #4132
- csv writer quote if string contains new line char by @ritchie46 in #4134
- fix explode edge cases by @ritchie46 in #4133
- add pl.cut utility by @ritchie46 in #4137
- python polars 0.13.57 by @ritchie46 in #4141
- Mypy disallow untyped calls by @ritchie46 in #4140
- Improve re-raises of Exceptions by @zundertj in #4142
- pivot fix categorical index by @ritchie46 in #4149
- Fix typo by @stinodego in #4146
- Wrap long strings by @stinodego in #4144
- Fix Python line lengths to 88 characters by @stinodego in #4152
- add is_in for categoricals by @ritchie46 in #4153
- python 0.13.58 by @ritchie46 in #4154
- Docstring lints & improvements by @stinodego in #4155
- pivot: fix logical type of multiple indexes by @ritchie46 in #4159
- more tests by @ritchie46 in #4163
- Use latest arrow2 to support latest nightly rust by @gyscos in #4162
- Fix invalid inputs for trigonometric functions by @stinodego in #4164
- update schema in udfs by @ritchie46 in #4165
- python: expose idx type by @ritchie46 in #4167
- Improve getitem for Dataframe/Series. by @ghuls in #4160
- Dataframe equality by @stinodego in #4076
- Docstring improvements & enable lints by @stinodego in #4161
- Native implementation of the sign function by @stinodego in #4147
- Minor docs updates by @stinodego in #4173
- Validation for groupby arguments by @stinodego in #4176
- update arrow by @ritchie46 in #4177
- throw error on schema failure by @ritchie46 in #4178
- with_columns update on duplicates by @ritchie46 in #4179
- fold regex expand by @ritchie46 in #4181
- python: prefer pyarrow when we can memory map the file by @ritchie46 in #4182
- window functions: sort cached groups if needed by @ritchie46 in #4184
- reduce supertype match by calling twice/ allow Some(tz)/None supertype by @ritchie46 in #4186
- Added const empty initializer to DataFrame by @TheDan64 in #4187
- fix utf8 explode for nulls and empty strings by @ritchie46 in #4189
- type-coercion: ignore unknown until replaced by @ritchie46 in #4192
- python: always use stdlib http reader and improve memmap ipc reader a… by @ritchie46 in #4193
- slice pushdown for cross joins by @ritchie46 in #4194
- csv: ignore quoted lines in skip lines by @ritchie46 in #4191
- Small fixes in type formatting by @stinodego in #4195
- use native ndjson reader by @ritchie46 in #4196
- python polars: 0.13.59 by @ritchie46 in #4198
- Miscellaneous improvements by @matteosantama in #4203
- Add flake8 extension: comprehensions by @stinodego in #4200
- Add flake8 extension: simplify by @stinodego in #4201
- don't use pyarrow read if we have categoricals in the schema by @ritchie46 in #4205
- python: don't lock gil in arr.contains by @ritchie46 in #4210
- fix nested struct append by @ritchie46 in #4217
- use default context for col upstream col expression type by @ritchie46 in #4219
- ensure weekday starts at 0 by @ritchie46 in #4220
- python datetime consistency by @ritchie46 in #4221
- python: improve error by @ritchie46 in #4223
- Upgrade black, blackdoc, mypy, flake8 by @matteosantama in #4209
- python: ensure utf8 encoding when writing dot file by @ritchie46 in #4225
- convert arrow map to list by @ritchie46 in #4226
- fast path for sorted min/max by @ritchie46 in #4228
- Set no_implicit_reexport = true in pyproject.toml by @matteosantama in #4211
- fix and improve rolling_skew by @ritchie46 in #4232
- ternary expr: validate predicate in groupby context by @ritchie46 in #4237
- Overload pl.from_arrow type hints by @matteosantama in #4236
- python: allow horizontal expanding sum by @ritchie46 in #4242
- improve strictness/consistency of when then otherwise by @ritchie46 in #4241
- reinstate old ternary behavior as experimental by @ritchie46 in #4244
- correct dtype for power by @ritchie46 in #4246
- csv: improve date/datetime/bool overwrite by @ritchie46 in #4247
- Release rust 0.23.0 by @ritchie46 in #4248
New Contributors
- @GregoryBL made their first contribution in #3566
- @gyscos made their first contribution in #3674
- @LVG77 made their first contribution in #3691
- @gunjunlee made their first contribution in #3671
- @saethlin made their first contribution in #3737
- @joshuataylor made their first contribution in #3793
- @Smittyvb made their first contribution in #3804
- @thatlittleboy made their first contribution in #3811
- @braaannigan made their first contribution in #3750
- @thomasaarholt made their first contribution in #3855
- @duskmoon314 made their first contribution in #3897
- @savente93 made their first contribution in #3921
- @cxtruong70 made their first contribution in #3959
- @andrei-ionescu made their first contribution in #3894
- @valxv made their first contribution in #4063
- @matteosantama made their first contribution in #4203
Full Changelog: rust-polars-v0.22.1...rust-polars-v0.23.0