Highlights
- Autoscaler has added Azure support. (#7080, #7515, #7558, #7494)
  - The Ray autoscaler lets you launch a distributed Ray cluster with a single command-line call!
  - It works on Azure, AWS, GCP, Kubernetes, YARN, Slurm, and local nodes.
- Distributed reference counting is turned on by default. (#7628, #7337)
  - This means all Ray objects are tracked and garbage collected only when all references go out of scope. It can be turned off with `ray.init(_internal_config=json.dumps({"distributed_ref_counting_enabled": 0}))`.
  - When the object store is full of objects that are still in scope, you can turn on least-recently-used eviction to force-remove objects with `ray.init(lru_evict=True)`.
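For illustration, a minimal sketch of what the default reference-counting behavior means in practice (the exact garbage-collection timing is an internal detail):

```python
import numpy as np
import ray

ray.init()

x_id = ray.put(np.zeros(10**6))  # the driver holds a reference to the object
print(ray.get(x_id).sum())       # the object stays available while x_id is in scope
del x_id                         # last reference gone: the object is now
                                 # eligible for garbage collection
```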
- A new command, `ray memory`, is added to help debug memory usage. (#7589)
  - It shows all object IDs that are in scope, along with their reference types, sizes, and creation sites. Example output is shown below.
  - Read more in the docs: https://ray.readthedocs.io/en/latest/memory-management.html.
```
> ray memory
-----------------------------------------------------------------------------------------------------
 Object ID                                 Reference Type        Object Size   Reference Creation Site
=====================================================================================================
; worker pid=51230
ffffffffffffffffffffffff0100008801000000  PINNED_IN_MEMORY      8231          (deserialize task arg) __main__..sum_task
; driver pid=51174
45b95b1c8bd3a9c4ffffffff010000c801000000  USED_BY_PENDING_TASK  ?             (task call) memory_demo.py:<module>:13
ffffffffffffffffffffffff0100008801000000  USED_BY_PENDING_TASK  8231          (put object) memory_demo.py:<module>:6
ef0a6c221819881cffffffff010000c801000000  LOCAL_REFERENCE       ?             (task call) memory_demo.py:<module>:14
-----------------------------------------------------------------------------------------------------
```
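For context, a hypothetical `memory_demo.py` along these lines would produce entries like the ones above (object IDs and line numbers will differ on your machine):

```python
# memory_demo.py -- hypothetical script matching the creation sites above.
import numpy as np
import ray

ray.init()

@ray.remote
def sum_task(arr):
    return float(arr.sum())

array_id = ray.put(np.zeros(1024))     # appears as "(put object)"
result_id = sum_task.remote(array_id)  # appears as "(task call)"
other_id = sum_task.remote(array_id)   # a second pending "(task call)"
print(ray.get(result_id))
```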
API changes
- Change `actor.__ray_kill__()` to `ray.kill(actor)` (see the sketch after this list). (#7360)
- Deprecate the `use_pickle` flag for serialization. (#7474)
- Remove `experimental.NoReturn`. (#7475)
- Remove the `experimental.signal` API. (#7477)
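A quick illustration of the new termination call (the `Worker` actor here is a made-up example):

```python
import ray

ray.init()

@ray.remote
class Worker:
    def ping(self):
        return "pong"

worker = Worker.remote()
print(ray.get(worker.ping.remote()))  # "pong"
ray.kill(worker)  # replaces the old worker.__ray_kill__()
```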
Core
- Add Apache 2 license header to C++ files. (#7520)
- Reduce per-worker memory usage to 50 MB. (#7573)
- Add an option to fall back to LRU eviction on OutOfMemory. (#7410)
- Reference counting for actor handles. (#7434)
- Reference counting for returning object IDs created by a different process. (#7221)
- Use `prctl(PR_SET_PDEATHSIG)` on Linux instead of the reaper. (#7150)
- Route asyncio plasma through the raylet instead of a direct plasma connection. (#7234)
- Remove the static concurrency limit from the gRPC server. (#7544)
- Remove `get_global_worker()` and `RuntimeContext`. (#7638)
- Fix known issues from the 0.8.2 release.
RLlib
- New features:
- Bug fix highlights:
Tune
- Integrate Dragonfly optimizer. (#5955)
- Fix HyperBand errors. (#7563)
- Access the trial name and trial ID inside trainables (see the sketch after this list). (#7378)
- Add a new `repeater` class for high-variance trials. (#7366)
- Prevent deletion of checkpoints from user-initiated restoration. (#7501)
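A minimal sketch of reading these from a class-based trainable, assuming they are exposed as `self.trial_name` and `self.trial_id` properties (check the Tune docs for the exact attribute names):

```python
from ray import tune

class MyTrainable(tune.Trainable):
    def _setup(self, config):
        # trial_name / trial_id are assumed property names (#7378).
        print("Running trial:", self.trial_name, self.trial_id)

    def _train(self):
        return {"score": 1.0}

tune.run(MyTrainable, stop={"training_iteration": 1})
```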
Libraries
- [Parallel Iterators] Allow for operator chaining after repartition (see the first sketch after this list). (#7268)
- [Parallel Iterators] Repartition functionality. (#7163)
- [Serve] `@serve.route` returns a handle; add `handle.scale` and `handle.set_max_batch_size`. (#7569)
- [RaySGD] Rename PyTorchTrainer to TorchTrainer. (#7425)
- [RaySGD] Custom training API. (#7211)
- [RaySGD] Breaking user API changes (see the second sketch after this list): (#7384)
  - `data_creator` fed to TorchTrainer now must return a data loader rather than datasets.
  - TorchTrainer automatically sets up a `DistributedSampler` if a DataLoader is returned.
  - `data_loader_config` and `batch_size` are no longer parameters for TorchTrainer.
  - TorchTrainer parallelism is now set via `num_workers`.
  - All TorchTrainer arguments must now be named parameters.
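A small sketch of parallel-iterator chaining after `repartition`, assuming the `ray.util.iter` helpers `from_items`, `for_each`, and `gather_sync` (treat the exact names as assumptions if you are on a different version):

```python
import ray
from ray.util.iter import from_items

ray.init()

it = from_items([1, 2, 3, 4, 5, 6], num_shards=2)
# Operator chaining after repartition (#7268):
doubled = it.repartition(3).for_each(lambda x: x * 2)
print(list(doubled.gather_sync()))
```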
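And a minimal sketch of the new TorchTrainer conventions; the import path and creator signatures below are assumptions based on the notes above, so consult the RaySGD docs for your version:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from ray.util.sgd.torch import TorchTrainer  # assumed import path

def model_creator(config):
    return nn.Linear(1, 1)

def optimizer_creator(model, config):
    return torch.optim.SGD(model.parameters(), lr=1e-2)

def data_creator(config):
    # Must return a DataLoader (not a raw dataset); TorchTrainer attaches
    # a DistributedSampler automatically.
    dataset = TensorDataset(torch.randn(64, 1), torch.randn(64, 1))
    return DataLoader(dataset, batch_size=8)

trainer = TorchTrainer(
    model_creator=model_creator,
    data_creator=data_creator,
    optimizer_creator=optimizer_creator,
    loss_creator=nn.MSELoss,  # assumed parameter name
    num_workers=2,            # parallelism is now set here
)
trainer.train()
```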
Java
- New Java actor API. (#7414)
  - The `@RayRemote` annotation is removed.
  - Instead of `Ray.call(ActorClass::method, actor)`, the new API is `actor.call(ActorClass::method)`.
- Allow passing internal config from the raylet to Java workers. (#7532)
- Enable direct call by default. (#7408)
- Pass large objects by reference. (#7595)
Others
- Progress towards Ray Streaming, including a Python API. (#7070, #6755, #7152, #7582)
- Progress towards GCS Service for GCS fault tolerance. (#7292, #7592, #7601, #7166)
- Progress towards cross language call between Java and Python. (#7614, #7634)
- Progress towards Windows compatibility. (#7529, #7509, #7658, #7315)
- Improvements to the K8s Operator. (#7521, #7621, #7498, #7459, #7622)
- New documentation for Ray Dashboard. (#7304)
Known issues
- Ray currently doesn't work on Python 3.5.0, but works on 3.5.3 and above.
Thanks
We thank the following contributors for their work on this release:
@rkooo567, @maximsmol, @suquark, @mitchellstern, @micafan, @clarkzinzow, @Jimpachnet, @mwbrulhardt, @ujvl, @chaokunyang, @robertnishihara, @jovany-wang, @hyeonjames, @zhijunfu, @datayjz, @fyrestone, @eisber, @stephanie-wang, @allenyin55, @BalaBalaYi, @simon-mo, @thedrow, @ffbin, @amogkam, @tisonkun, @richardliaw, @ijrsvt, @wumuzi520, @mehrdadn, @raulchen, @landcold7, @ericl, @edoakes, @sven1977, @ashione, @jorenretel, @gramhagen, @kfstorm, @anthonyhsyu, @pcmoritz