This release introduces the `modin.utils.execute` function to improve the benchmarking experience and includes a new version of HDK, 0.9. It also includes performance optimizations for `sort_values`, `value_counts`, 2D setitem, and several others, as well as many bug fixes.
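As a rough illustration of why such a function helps with benchmarking: Modin may run operations lazily or asynchronously, so a timer that stops as soon as a call returns can under-report the real cost. The sketch below assumes that `modin.utils.execute` accepts one or more Modin objects and blocks until their pending computations finish. The helper names `timed` and `demo` are hypothetical and are not part of Modin.

```python
import time
from typing import Any, Callable, Tuple


def timed(func: Callable[[], Any], wait: Callable[[Any], Any]) -> Tuple[Any, float]:
    """Call func, pass its result to wait, and return (result, elapsed seconds).

    `wait` is expected to block until the result is fully computed,
    so the measured time covers the whole operation.
    """
    start = time.perf_counter()
    result = func()
    wait(result)  # force completion before the timer is read
    return result, time.perf_counter() - start


def demo() -> None:
    """Time a sort on a small frame. Assumes Modin >= 0.25.0 is installed."""
    # Imported here so that `timed` stays usable without Modin.
    import modin.pandas as pd
    from modin.utils import execute

    df = pd.DataFrame({"a": [3, 1, 2, 1], "b": [10, 20, 30, 40]})
    _, seconds = timed(lambda: df.sort_values("a"), execute)
    print(f"sort_values: {seconds:.4f} s")
```

Calling `demo()` in an environment with Modin and a working execution engine prints the elapsed time for `sort_values`. Passing the wait function as an argument keeps the timing helper independent of Modin, so the same helper can time eager libraries with a no-op wait.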
Key Features and Updates Since 0.24.0
- Stability and Bugfixes
  - FIX-#4507: Do not call `ray.get()` inside of the kernel executing call queues (#6633)
  - FIX-#6585: Avoid `FutureWarning`s in `rolling` unless necessary (#6586)
  - FIX-#6600: Fix usage of list of UDF functions in `Series.groupby.agg` (#6613)
  - FIX-#6602: Refactor `join` to avoid `distributing a dict object` warning (#6612)
  - FIX-#6604: HDK: Added support for list to `DataFrame.agg()` (#6606)
  - FIX-#6607: Fix incorrect cache after `.sort_values()` (#6608)
  - FIX-#6624: Add `FutureWarning`s for `first/last/bool` (#6625)
  - FIX-#6628: Allow `groupby.diff()` for dates (#6631)
  - FIX-#6632: Return Series instead of Dataframe for `groupby.apply` in case of experimental groupby (#6649)
  - FIX-#6635: HDK: `read_csv()`: treat object dtype as string (#6636)
  - FIX-#6637: Fix `skiprows` parameter usage for `read_excel` (#6638)
  - FIX-#6642: Fix `modin.numpy.array.sum` on HDK (#6643)
  - FIX-#6647: Added init file to make `modin/experimental/sql/hdk/query.py` part of modin package (#6646)
  - FIX-#6651: Make sure `Series.between` works correctly (#6656)
  - FIX-#6680: Specify `navigation_with_keys=True` to fix docs build (#6681)
- Performance enhancements
  - PERF-#2813: Distributed `from_pandas()` for numerical data in Ray (#6640)
  - PERF-#5533: Improved `sort_values` by reducing the number of partitions (#6589)
  - PERF-#6362: Implement 2D setitem without to-pandas conversion (#6618)
  - PERF-#6614: HDK: Use `MODIN_CPUS` instead of `os.cpu_count()` for the fragment size calculation (#6615)
  - PERF-#6629: HDK: Avoid `LazyProxyCategoricalDtype` materialization on `merge` (#6630)
  - PERF-#6645: Avoid label synchronization for `dot` operation (#6644)
  - PERF-#6653: `value_counts()`: Eliminate redundant sorting. (#6654)
  - PERF-#6661: Do not convert columns dtypes if the new dtypes are the same (#6662)
- Refactor Codebase
- Update testing suite
- Documentation improvements
- New Features
Contributors
@AndreyPavlenko
@Egor-Krivov
@Garra1980
@YarShev
@anmyachev
@dchigarev