github modin-project/modin 0.20.0
Modin 0.20.0

This release adds Dask parallel implementations of several functions (to_parquet, to_sql, read_fwf, and some experimental functions) that were previously parallelized only on other engines.
It also adds support for pyhdk 0.5 and includes many bug fixes and several performance enhancements.

Key Features and Updates Since 0.19.0

  • Stability and Bugfixes
    • FIX-#2850: use modin.pandas.Series instead of pandas.Series in the where function (#5883)
    • FIX-#3925: Fixed AssertionError on columns and index drop (#5156)
    • FIX-#4227: Calling FactoryDispatcher.get_factory also initializes the engine (#4228)
    • FIX-#4635: allow passing Modin functions to apply (#5915)
    • FIX-#4924: fix read_excel when header is None (#5919)
    • FIX-#5309: series iloc/loc raises IndexingError if a key is too long (#5784)
    • FIX-#5373: Fix Series.shift() for named Series (#5823)
    • FIX-#5432: don't return None when astype is used with the copy=False parameter (#5918)
    • FIX-#5454: add missing methods for SeriesGroupBy and DataFrameGroupBy objects (#5866)
    • FIX-#5509: default to pandas for read_parquet if any additional kwargs are passed to the engine (#5911)
    • FIX-#5566: Enable test_indexing test on the HDK engine and add to CI (#5567)
    • FIX-#5576: Enable test_join_sort test on the HDK engine and add to CI (#5578)
    • FIX-#5580: HDK-BUG: 'AVG|SUM' is only valid on integer and floating point (#5583)
    • FIX-#5618: don't ignore 'errors' parameter for astype (#5895)
    • FIX-#5653: implement convert_dtypes as a full-axis operation instead of using a map approach (#5885)
    • FIX-#5737: BUG: String columns are converted to Categorical, if exported from HDK (#5738)
    • FIX-#5767: cast pathlib.Path to str for read_parquet (#5860)
    • FIX-#5770: Enable test_series test on the HDK engine and add to CI (#5771)
    • FIX-#5774: Correctly calculate shape of single row (#5775)
    • FIX-#5776: fix IndexError when concatenating a dict of Series along columns (#5804)
    • FIX-#5781: Fix sort in descending order for columns with highly dense values (#5783)
    • FIX-#5787: Enable test_reduce test on the HDK engine and add to CI (#5788)
    • FIX-#5794: Enable test_default test on the HDK engine and add to CI (#5795)
    • FIX-#5806: Enable test_io test on the HDK engine and add to CI (#5807)
    • FIX-#5810: Enable test_binary test on the HDK engine (#5811)
    • FIX-#5819: Fix np.argmax/argmin on 1D arrays (#5820)
    • FIX-#5829: fix ndarray assignment via loc (#5847)
    • FIX-#5846: add Series.str.removeprefix/removesuffix/fullmatch methods (#5845)
    • FIX-#5849: add Series.dt.day_of_week/day_of_year/isocalendar/asfreq methods (#5848)
    • FIX-#5859: Fix '.sort_values()' when there's only one row partition (#5869)
    • FIX-#5862: fix the "Inline strong start-string without end-string" warning for read_custom_text (#5861)
    • FIX-#5870: Enable test_general test on the HDK engine and add to CI (#5871)
    • FIX-#5888: Fix to_parquet for S3 (#5912)
    • FIX-#5891: BUG: HDK: Query execution fails because the query contains an unsupported self-join pattern (#5892)
    • FIX-#5927: Enable test_map_metadata test on the HDK engine and add to CI (#5929)
    • FIX-#5934: Enable test_window test on the HDK engine and add to CI (#5935)
    • FIX-#5941: TEST: test_io.py fails on HDK (#5942)
    • FIX-#5976: correct use of dtypes cache for concat op (#5975)
    • FIX-#5977: use wrapper.materialize instead of wait_partitions; use AWS env vars in pytest_sessionstart function (#5981)
  • Performance enhancements
    • PERF-#5590: Precompute columns and dtypes metadata for '.merge()' (#5594)
    • PERF-#5670: create self._identity in partitions only at the "debug" logging level (#5679)
    • PERF-#5674: reduce data transfer in the _launch_tasks function (#5678)
    • PERF-#5675: make index calculation for the read_csv function lazy; introduce ModinIndex (#5677)
    • PERF-#5740: allow the read_csv, read_fwf, read_table, and read_custom_text functions to be executed fully asynchronously; introduce ModinDtypes (#5713)
    • PERF-#5777: Filter out empty bins during range-based reshuffling (#5779)
    • PERF-#5778: Avoid extra materialization during range-based reshuffling (#5780)
    • PERF-#5808: Delay metadata computations for '.sort_values' result (#5828)
    • PERF-#5837: Defer index materialization for MapReduce implemented groupby (#5948)
  • Refactor Codebase
    • REFACTOR-#2863: remove 'other_name' from broadcast_apply (#5882)
    • REFACTOR-#5414: Move partition.get into base class (#5408)
    • REFACTOR-#5417: fix FutureWarning: the mangle_dupe_cols keyword is deprecated (#5407)
    • REFACTOR-#5683: remove Engine.subscribe(_update_engine) in DataFrame/Series constructors (#5855)
    • REFACTOR-#5786: align logging of Dask partitions with other executions (#5785)
    • REFACTOR-#5799: Clean up numpy array operations (#5800)
    • REFACTOR-#5830: rename experimental dispatchers and parsers (#5864)
    • REFACTOR-#5874: move lazy_metadata_decorator into utils.py (#5872)
    • REFACTOR-#5875: use default implementations for dt methods from the base query compiler (#5873)
    • REFACTOR-#5902: use __make_read for non experimental IO classes (#5898)
    • REFACTOR-#5908: remove unused parameters from 'run_exec_plan' (#5907)
    • REFACTOR-#5910: remove the unused '_dtypes_for_cols' internal function (#5909)
    • REFACTOR-#5922: let upload-coverage action fail if there is no .coverage file (#5921)
    • REFACTOR-#5923: add pragma: no cover for functions that are used in apply_full_axis (#5920)
  • Update testing suite
    • TEST-#2544: delay codecov notifications until all reports have been sent (#5782)
    • TEST-#4261: test rolling with axis=1, win_type=, and center=True (#5881)
    • TEST-#5477: fix typo: read_stata kwargs -> read_sas kwargs (#5854)
    • TEST-#5790: add ASV configs for Dask and Unidist (#5789)
    • TEST-#5802: update some actions in CI (#5801)
    • TEST-#5826: remove _propagate_index_objs internal function usage from tests (#5813)
    • TEST-#5832: Suppress pytest coverage messages in terminal (#5833)
    • TEST-#5851: test api of cat/sparse accessors (#5850)
    • TEST-#5878: exclude modin/experimental/batch/test/ folder from computing coverage (#5877)
    • TEST-#5897: Add more robust tests for numpy API (#5900)
    • TEST-#5913: Cancel CI for commits to the same branch (#5914)
    • TEST-#5933: Add assert_array_equals utility to numpy tests (#5947)
    • TEST-#5943: Rebalance tests between different CI jobs (#5890)
    • TEST-#5977: Add AWS mock keys to moto in push-to-master.yml (#5978)
  • Documentation improvements
    • DOCS-#0000: fix pip install command for macOS (#5749)
    • DOCS-#5659: Supplement the quickstart notebook with a note regarding the OOM issue (#5821)
    • DOCS-#5852: Mention the experimental read_custom_text API in the docs (#5853)
    • DOCS-#5957: Compress Import.gif as it's too large (#5958)
  • New Features
    • FEAT-#4624: add to_parquet parallel implementation for Dask (#5876)
    • FEAT-#5497: add several experimental functions for Dask (#5496)
    • FEAT-#5880: add to_sql parallel implementation for Dask (#5879)
    • FEAT-#5901: add read_fwf parallel implementation for Dask (#5899)
    • FEAT-#5930: Bump pyhdk version to 0.5 (#5931)

Contributors

@MSHADroo
@AndreyPavlenko
@RehanSD
@YarShev
@anmyachev
@dchigarev
@mvashishtha
@noloerino
@pyrito
@vnlitvinov