github modin-project/modin 0.7.0
Modin 0.7.0

Modin 0.7.0 release notes

Modin 0.7 comes with the largest expansion of the API since the first release. Modin now supports over 83% of the pandas API, up from 71% in the last release. A number of long-awaited features have been implemented, including I/O support on Dask for Parquet and other column stores, and groupby with a list of column names.
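
As a rough illustration of grouping by a list of column names followed by a reduction, here is a minimal sketch. The data and column names are made up for the example, and the fallback to plain pandas is an assumption added only so the snippet runs where Modin is not installed.

```python
try:
    import modin.pandas as pd  # assumes Modin is installed
except ImportError:
    import pandas as pd  # fallback so the sketch still runs without Modin

# Illustrative data; the column names are hypothetical.
df = pd.DataFrame(
    {
        "region": ["east", "east", "west", "west", "west"],
        "product": ["a", "b", "a", "a", "b"],
        "units": [3, 5, 2, 4, 1],
    }
)

# Group by a list of column names, then reduce each group with sum().
totals = df.groupby(["region", "product"]).sum()
print(totals)
```

The result is indexed by the (region, product) pairs, with one summed `units` value per pair.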

Bugfixes + Pandas Concordance (🐛 + 🐼)

  • Allow merging of named Series (#879)
  • Correctly merge CategoricalDtype dtypes (#889)
  • Fix issue where certain arguments were not defaulting to pandas (#890)
  • Send full path to workers on read_csv (#899)
  • Remove __array_prepare__ from Series API (#900)
  • Add Series.str.title to API (#901)
  • Fix df.squeeze when axis=0 on a 1x1 dataframe (#902)
  • Fix skiprows logic for read_csv (#918)
  • read_sql() will default to pandas when chunksize is given (#920)
  • Fix DeprecationWarning: invalid escape sequence \d (#950)
  • Fix apply when args is set (#953)
  • Fix inplace updates on partitions (#962)
  • Fix bug where certain encodings were throwing an error (#980)
  • Fix inplace operations without inplace keyword on empty dataframes (#983)
  • Support console with repr like pandas (#984)
  • Fix count when numeric_only=False (#1002)
  • Fix bug in loc where slice on columns only threw Exception (#1024)
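
A few of the calls touched by these fixes can be exercised as below. This is only a sketch of the pandas-style behavior the fixes aim to match, not a reproduction of the original bug reports; the data is invented and the pandas fallback import is an assumption so the snippet runs without Modin.

```python
try:
    import modin.pandas as pd  # assumes Modin is installed
except ImportError:
    import pandas as pd  # fallback so the sketch still runs without Modin

# Illustrative frame with one missing value per column.
df = pd.DataFrame({"name": ["x", "y", None], "score": [1.0, None, 3.0]})

# count() with numeric_only=False counts non-missing values in every column.
counts = df.count(numeric_only=False)

# loc with a slice on the columns only, keeping all rows.
subset = df.loc[:, "name":"score"]

# squeeze(axis=0) on a 1x1 frame collapses the row axis into a Series.
one_by_one = pd.DataFrame({"a": [7]})
squeezed = one_by_one.squeeze(axis=0)

print(counts)
print(subset)
print(squeezed)
```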

New Functionality ✨

  • Support for duplicated() and drop_duplicates() (#892)
  • Create SeriesGroupBy wrapper to default to pandas and return to Modin (#908)
  • Bring I/O support to Dask for everything supported (#955) ⭐️
  • Add support for grouping by multiple columns when doing a reduction (#987) ⭐️
  • Implement DataFrame.at_time and Series.at_time (#991)
  • Add implementation for between_time for Series and `DataFram… (#992)
  • Implement combine for Series and DataFrame (#995)
  • Add implementation for combine_first for DataFrame and `Seri… (#996)
  • Add implementation for droplevel for Series and DataFrame (#1000)
  • Implement assign for DataFrame (#998)
  • Add implementation for first for Series and DataFrame (#1006)
  • Add implementation for last for DataFrame and Series (#1007)
  • Add implementation for swapaxes for DataFrame and Series (#1010)
  • Add implementation for tz_convert for Series and DataFrame (#1013)
  • Implement tz_localize for Series and DataFrame (#1014)
  • Add implementation for tshift (#1016)
  • Add implementation: swaplevel for Series and DataFrame (#1018)
  • Add implementation: reorder_levels for DataFrame and Series (#1022)
  • Add implementation: take for Series and DataFrame (#1020)
  • Add implementation: truncate for Series and DataFrame (#1026)
  • Fix bug where Parsing error was thrown when text spanned multip… (#1027)
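
Several of the methods listed above can be tried together in a short sketch. It assumes the pandas semantics for each call; all data, column names, and level names are invented for illustration, and the pandas fallback import is an assumption so the snippet runs without Modin.

```python
try:
    import modin.pandas as pd  # assumes Modin is installed
except ImportError:
    import pandas as pd  # fallback so the sketch still runs without Modin

# duplicated() flags repeated rows; drop_duplicates() removes them.
df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 1, 2]})
dup_mask = df.duplicated()
deduped = df.drop_duplicates()

# assign() returns a new frame with a derived column.
with_double = deduped.assign(double=lambda d: d["val"] * 2)

# combine_first() fills missing values from another frame.
left = pd.DataFrame({"x": [1.0, None]})
right = pd.DataFrame({"x": [10.0, 20.0]})
filled = left.combine_first(right)

# at_time() and between_time() select rows by time of day
# and need a DatetimeIndex.
idx = pd.to_datetime(
    ["2020-01-01 08:00", "2020-01-01 09:00",
     "2020-01-01 10:00", "2020-01-01 11:00"]
)
ts = pd.Series([1, 2, 3, 4], index=idx)
at_nine = ts.at_time("09:00")
morning = ts.between_time("08:00", "09:30")

# tz_localize() attaches a time zone; tz_convert() changes it.
utc = ts.tz_localize("UTC")
eastern = utc.tz_convert("America/New_York")

# swaplevel() and droplevel() operate on a MultiIndex.
levels = pd.MultiIndex.from_tuples(
    [("a", 1), ("b", 2)], names=["outer", "inner"]
)
nested = pd.DataFrame({"v": [1, 2]}, index=levels)
swapped = nested.swaplevel("outer", "inner")
flat = nested.droplevel("inner")
```

Only the import line differs from a plain pandas script, which is the point of the expanded API coverage.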

Code Quality + Testing 💯

  • Update pytest and clean up tests a bit (#903)
  • CI updates (#924, #925, #926, #927, #928, #929, #930, #931, #933, #936, #937, #938)
  • Fix Windows Remove file test error (#943)
  • Add test script for simple execution of all unit tests (#948)
  • Fix pyarrow.parquet import in tests (#952)
  • Support parameters with run-tests.sh (#959)
  • Fix environment variables for CI and Master test suite (#964)
  • Make test_dataframe.py more granular with test suite (#966)
  • Cache pip dependencies between builds to speed up process (#967)
  • Fix master CI workflow order and simplify workflow names (#968)
  • Update master build to allow coverage to be run (#969)
  • Optimize CI with minimal pip installs (#971)
  • Use versioneer for versioning using VCS (#1028)

Backend enhancements + Performance 🚀

  • Improve performance of setting a column from an existing one (#942)
  • Increase n_workers for Dask from default to number of cores (#965)

Documentation 📃

  • Update README to have a more accurate API coverage section (#974)
  • Move API Coverage section in README to a more appropriate place (#975)
  • Update README to add advanced usage (#988)

Dependencies 🔗

  • Enforce only Python3+ on future releases (#907)
  • Change Coverage version to avoid sqlite3 errors (#916)
  • Remove top level import of py (#935)
  • Update Ray version to latest (#941)
  • Restructure import attempt to only try Ray if on a non-Windows machine (#945)
  • Set pure=False for Dask Client.submit and hash=False for `… (#957)

Contributors this release

The following users contributed code to Modin since the last release.

@ecoughlan (First time contributor) ⭐️
@aeroaks (First time contributor) ⭐️
@eavidan (Returning contributor) 🌟
@devin-petersohn (Maintainer)

🎉🎉 Thank you! 🎉🎉
