Modin 0.25.0

12 months ago

This release introduces the modin.utils.execute function to improve the benchmarking experience and includes a new version of HDK, 0.9.
It also includes performance optimizations for sort_values, value_counts, 2D setitem and several others, as well as many bug fixes.
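
Because Modin may defer or run work asynchronously, a naive timer can stop before the computation has actually finished. The sketch below shows one way modin.utils.execute might be used for benchmarking. The helper names timed and benchmark_sort, the column name, and the row count are hypothetical and chosen only for illustration; the Modin imports are kept inside benchmark_sort so the timing helper itself has no dependencies.

```python
import time


def timed(build, finalize=lambda result: None):
    """Run build(), pass its result to finalize(), and return (result, seconds).

    finalize is called before the clock stops, so any pending work it
    forces is included in the measurement.
    """
    start = time.perf_counter()
    result = build()
    finalize(result)
    return result, time.perf_counter() - start


def benchmark_sort(n_rows=100_000):
    """Hypothetical benchmark: time sort_values on a Modin DataFrame.

    Assumes Modin >= 0.25.0 and NumPy are installed.
    """
    import numpy as np
    import modin.pandas as pd
    from modin.utils import execute

    # Descending integers, so the sort has real work to do.
    df = pd.DataFrame({"a": np.arange(n_rows)[::-1]})

    # execute() triggers lazy computations on the result and waits for
    # them to complete, so the elapsed time covers the whole sort.
    return timed(lambda: df.sort_values("a"), finalize=execute)
```

Calling benchmark_sort() returns the sorted frame together with the elapsed seconds. Without the execute call, the measured time could cover only the scheduling of the sort rather than its completion, depending on the engine in use.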

Key Features and Updates Since 0.24.0

  • Stability and Bugfixes
    • FIX-#4507: Do not call ray.get() inside of the kernel executing call queues (#6633)
    • FIX-#6585: Avoid FutureWarnings in rolling unless necessary (#6586)
    • FIX-#6600: Fix usage of list of UDF functions in Series.groupby.agg (#6613)
    • FIX-#6602: Refactor join to avoid distributing a dict object warning (#6612)
    • FIX-#6604: HDK: Added support for list to DataFrame.agg() (#6606)
    • FIX-#6607: Fix incorrect cache after .sort_values() (#6608)
    • FIX-#6624: Add FutureWarnings for first/last/bool (#6625)
    • FIX-#6628: Allow groupby.diff() for dates (#6631)
    • FIX-#6632: Return Series instead of Dataframe for groupby.apply in case of experimental groupby (#6649)
    • FIX-#6635: HDK: read_csv(): treat object dtype as string (#6636)
    • FIX-#6637: Fix skiprows parameter usage for read_excel (#6638)
    • FIX-#6642: Fix modin.numpy.array.sum on HDK (#6643)
    • FIX-#6647: Added init file to make modin/experimental/sql/hdk/query.py part of modin package (#6646)
    • FIX-#6651: Make sure Series.between works correctly (#6656)
    • FIX-#6680: Specify navigation_with_keys=True to fix docs build (#6681)
  • Performance enhancements
    • PERF-#2813: Distributed from_pandas() for numerical data in Ray (#6640)
    • PERF-#5533: Improved sort_values by reducing the number of partitions (#6589)
    • PERF-#6362: Implement 2D setitem without to-pandas conversion (#6618)
    • PERF-#6614: HDK: Use MODIN_CPUS instead of os.cpu_count() for the fragment size calculation (#6615)
    • PERF-#6629: HDK: Avoid LazyProxyCategoricalDtype materialization on merge (#6630)
    • PERF-#6645: Avoid label synchronization for dot operation (#6644)
    • PERF-#6653: value_counts(): Eliminate redundant sorting. (#6654)
    • PERF-#6661: Do not convert columns dtypes if the new dtypes are the same (#6662)
  • Refactor Codebase
    • REFACTOR-#6622: Don't use deprecated random_integers func (#6623)
  • Update testing suite
    • TEST-#5489: Allow for pytest to print warnings in tests output (#6621)
  • Documentation improvements
    • DOCS-#4085: Replace vague links to actual names of the pages/sections in docs (#4096)
    • DOCS-#6658: Add a note how to enable object spilling in a multi-node Ray cluster (#6659)
  • New Features
    • FEAT-#5221: Add execute to trigger lazy computations and wait for them to complete (#6648)
    • FEAT-#5634: Introduce materialize parameter for partition.ip func (#6650)
    • FEAT-#6675: Bump pyhdk version to 0.9 (#6676)

Contributors

@AndreyPavlenko
@Egor-Krivov
@Garra1980
@YarShev
@anmyachev
@dchigarev
