Modin 0.7.0 release notes
Modin 0.7 comes with the largest expansion of the API since the first release. Modin now supports over 83% of the pandas API, up from 71% last release. A number of long-awaited features have been implemented, including Dask I/O support for parquet and other column stores, and groupby with a list of column names.
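As a minimal sketch of what "groupby with a list of column names" looks like in practice, the snippet below groups by two columns and applies a reduction. The data and column names are invented for illustration, and the example imports plain pandas so it runs anywhere; with Modin installed, swapping the import for `import modin.pandas as pd` is the intended drop-in replacement.

```python
import pandas as pd  # with Modin installed: import modin.pandas as pd

# Hypothetical sales data, invented purely for illustration.
df = pd.DataFrame(
    {
        "region": ["east", "east", "west", "west", "west"],
        "product": ["a", "b", "a", "a", "b"],
        "units": [1, 2, 3, 4, 5],
    }
)

# Group by a list of column names, then reduce each group with sum().
totals = df.groupby(["region", "product"]).sum()
print(totals)
```

The result is indexed by the (region, product) pairs, so a single group total can be read with `totals.loc[("west", "a"), "units"]`.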
Bugfixes + Pandas Concordance (🐛 + 🐼)
- Allow merging of named Series (#879)
- Correctly merge `CategoricalDtype` dtypes (#889)
- Fix issue where certain arguments were not defaulting to pandas (#890)
- Send full path to workers on `read_csv` (#899)
- Remove `__array_prepare__` from Series API (#900)
- Add Series.str.title to API (#901)
- Fix `df.squeeze` when `axis=0` on a 1x1 dataframe (#902)
- Fix `skiprows` logic for `read_csv` (#918)
- read_sql() will default to pandas when chunksize is given (#920)
- Fix DeprecationWarning: invalid escape sequence \d (#950)
- Fix `apply` when `args` is set (#953)
- Fix inplace updates on partitions (#962)
- Fix bug where certain encodings were throwing an error (#980)
- Fix inplace operations without inplace keyword on empty dataframes (#983)
- Support console with repr like pandas (#984)
- Fix `count` when `numeric_only=False` (#1002)
- Fix bug in `loc` where slice on columns only threw Exception (#1024)
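For reference, a few of the pandas behaviours these fixes target can be sketched as follows. This is one reading of the entries for #902, #1002, and #1024, not a reproduction of the original bug reports; the frames are invented for illustration, and the example uses plain pandas (swap the import for `import modin.pandas as pd` to exercise Modin).

```python
import pandas as pd  # with Modin installed: import modin.pandas as pd

# 1x1 frame: squeezing only axis 0 drops the row axis and leaves a Series.
one = pd.DataFrame({"a": [42]})
squeezed = one.squeeze(axis=0)

# Mixed dtypes: count(numeric_only=False) counts non-null values in every column.
mixed = pd.DataFrame({"n": [1.0, None, 3.0], "s": ["x", "y", None]})
counts = mixed.count(numeric_only=False)

# loc with a slice on the columns only: all rows, columns "b" through "c".
wide = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
sliced = wide.loc[:, "b":"c"]

print(type(squeezed).__name__, dict(counts), list(sliced.columns))
```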
New Functionality ✨
- Support for duplicated() and drop_duplicates() (#892)
- Create SeriesGroupBy wrapper to default to pandas and return to Modin (#908)
- Bring I/O support to Dask for everything supported (#955) ⭐️
- Add support for grouping by multiple columns when doing a reduction (#987) ⭐️
- Implement `DataFrame.at_time` and `Series.at_time` (#991)
- Add implementation for `between_time` for `Series` and `DataFram… (#992)
- Implement `combine` for Series and DataFrame (#995)
- Add implementation for `combine_first` for `DataFrame` and `Seri… (#996)
- Add implementation for `droplevel` for `Series` and `DataFrame` (#1000)
- Implement `assign` for `DataFrame` (#998)
- Add implementation for `first` for `Series` and `DataFrame` (#1006)
- Add implementation for `last` for `DataFrame` and `Series` (#1007)
- Add implementation for `swapaxes` for DataFrame and Series (#1010)
- Add implementation for `tz_convert` for `Series` and `DataFrame` (#1013)
- Implement `tz_localize` for `Series` and `DataFrame` (#1014)
- Add implementation for `tshift` (#1016)
- Add implementation: `swaplevel` for `Series` and `DataFrame` (#1018)
- Add implementation: `reorder_levels` for DataFrame and Series (#1022)
- Add implementation: `take` for Series and DataFrame (#1020)
- Add implementation: `truncate` for Series and DataFrame (#1026)
- Fix bug where Parsing error was thrown when text spanned multip… (#1027)
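Several of the methods listed above can be tried together in a short sketch. The data, timestamps, and time zone are invented for illustration, and the calls follow the pandas API that these Modin methods mirror; the example imports plain pandas so it runs without Modin, and swapping the import for `import modin.pandas as pd` is the intended way to run it under Modin.

```python
import pandas as pd  # with Modin installed: import modin.pandas as pd

# Hypothetical readings on a datetime index, invented for illustration.
idx = pd.to_datetime(
    [
        "2020-01-01 08:00", "2020-01-01 20:00",
        "2020-01-02 08:00", "2020-01-02 20:00",
        "2020-01-03 08:00", "2020-01-03 20:00",
    ]
)
df = pd.DataFrame({"value": [1, 2, 2, 3, 3, 4]}, index=idx)

# at_time / between_time: select rows by time of day.
morning = df.at_time("08:00")
around_eight = df.between_time("07:00", "09:00")

# duplicated / drop_duplicates: flag and remove repeated rows.
dup_mask = df.duplicated()
deduped = df.drop_duplicates()

# assign: add a derived column without mutating the original frame.
with_double = df.assign(double=lambda d: d["value"] * 2)

# tz_localize / tz_convert: attach a time zone, then convert it.
tokyo = df.tz_localize("UTC").tz_convert("Asia/Tokyo")

# combine_first: fill gaps in one Series from another.
a = pd.Series([1.0, None, 3.0])
b = pd.Series([10.0, 20.0, 30.0])
filled = a.combine_first(b)

# truncate / take: trim by index label, or pick rows by position.
middle_day = df.truncate(
    before=pd.Timestamp("2020-01-02"), after=pd.Timestamp("2020-01-02 23:59")
)
ends = df.take([0, -1])

print(morning, deduped, filled, sep="\n\n")
```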
Code Quality + Testing 💯
- Update pytest and clean up tests a bit (#903)
- CI updates (#924, #925, #926, #927, #928, #929, #930, #931, #933, #936, #937, #938)
- Fix Windows Remove file test error (#943)
- Add test script for simple execution of all unit tests (#948)
- Fix pyarrow.parquet import in tests (#952)
- Support parameters with run-tests.sh (#959)
- Fix environment variables for CI and Master test suite (#964)
- Make test_dataframe.py more granular with test suite (#966)
- Cache pip dependencies between builds to speed up process (#967)
- Fix master CI workflow order and simplify workflow names (#968)
- Update master build to allow coverage to be run (#969)
- Optimize CI with minimal pip installs (#971)
- Use versioneer for versioning using VCS (#1028)
Backend enhancements + Performance 🚀
- Improve performance of setting a column from an existing one (#942)
- Increase `n_workers` for Dask from default to number of cores (#965)
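The notes do not show how Modin sets `n_workers` internally, so the following is only a sketch of the general idea: size a local Dask cluster to the number of cores. The helper names `default_n_workers` and `make_client` are hypothetical; `Client(n_workers=...)` is the `dask.distributed` call that forwards the value to the local cluster, and it is imported lazily so the sketch loads without Dask installed.

```python
import os


def default_n_workers() -> int:
    """Return the number of logical cores, falling back to 1 if unknown."""
    return os.cpu_count() or 1


def make_client():
    """Start a local Dask client with one worker per core.

    Hypothetical helper: requires `dask.distributed` and is not called here.
    """
    from dask.distributed import Client  # lazy import, only needed when called

    return Client(n_workers=default_n_workers())


print(default_n_workers())
```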
Documentation 📃
- Update README to have a more accurate API coverage section (#974)
- Move API Coverage section in README to a more appropriate place (#975)
- Update README to add advanced usage (#988)
Dependencies 🔗
- Enforce only Python3+ on future releases (#907)
- Change Coverage version to avoid sqlite3 errors (#916)
- Remove top level import of `py` (#935)
- Update Ray version to latest (#941)
- Restructure import attempt to only try Ray if on a non-Windows machine (#945)
- Set `pure=False` for Dask `Client.submit` and `hash=False` for `… (#957)
Contributors this release
The following users contributed code to Modin since the last release.
@ecoughlan (First time contributor) ⭐️
@aeroaks (First time contributor) ⭐️
@eavidan (Returning contributor) 🌟
@devin-petersohn (Maintainer)
🎉🎉 Thank you! 🎉🎉