Highlights
- Autoscaler has added Azure support. (#7080, #7515, #7558, #7494)
  - The Ray autoscaler lets you launch a distributed Ray cluster with a single command-line call!
  - It works on Azure, AWS, GCP, Kubernetes, YARN, Slurm, and local nodes.
- Distributed reference counting is turned on by default. (#7628, #7337)
  - This means all Ray objects are tracked and garbage collected only when all references go out of scope. It can be turned off with `ray.init(_internal_config=json.dumps({"distributed_ref_counting_enabled": 0}))`.
  - When the object store is full of objects that are still in scope, you can turn on least-recently-used eviction to force-remove objects with `ray.init(lru_evict=True)`.
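For illustration, a minimal sketch of what the default reference-counting behavior means in practice (the exact garbage-collection timing is an internal detail):

```python
import numpy as np
import ray

ray.init()

x_id = ray.put(np.zeros(10**6))  # the driver holds a reference to the object
print(ray.get(x_id).sum())       # the object stays available while x_id is in scope
del x_id                         # last reference gone: the object is now
                                 # eligible for garbage collection
```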
- A new command, `ray memory`, is added to help debug memory usage. (#7589)
  - It shows all object IDs that are in scope, along with their reference types, sizes, and creation sites. Example output is shown below.
  - Read more in the docs: https://ray.readthedocs.io/en/latest/memory-management.html.
```
> ray memory
-----------------------------------------------------------------------------------------------------
 Object ID                                 Reference Type        Object Size   Reference Creation Site
=====================================================================================================
; worker pid=51230
ffffffffffffffffffffffff0100008801000000  PINNED_IN_MEMORY      8231          (deserialize task arg) __main__..sum_task
; driver pid=51174
45b95b1c8bd3a9c4ffffffff010000c801000000  USED_BY_PENDING_TASK  ?             (task call) memory_demo.py:<module>:13
ffffffffffffffffffffffff0100008801000000  USED_BY_PENDING_TASK  8231          (put object) memory_demo.py:<module>:6
ef0a6c221819881cffffffff010000c801000000  LOCAL_REFERENCE       ?             (task call) memory_demo.py:<module>:14
-----------------------------------------------------------------------------------------------------
```
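For context, a hypothetical `memory_demo.py` along these lines would produce entries like the ones above (object IDs and line numbers will differ on your machine):

```python
# memory_demo.py -- hypothetical script matching the creation sites above.
import numpy as np
import ray

ray.init()

@ray.remote
def sum_task(arr):
    return float(arr.sum())

array_id = ray.put(np.zeros(1024))     # appears as "(put object)"
result_id = sum_task.remote(array_id)  # appears as "(task call)"
other_id = sum_task.remote(array_id)   # a second pending "(task call)"
print(ray.get(result_id))
```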
API changes
- Change `actor.__ray_kill__()` to `ray.kill(actor)` (see the sketch after this list). (#7360)
- Deprecate the `use_pickle` flag for serialization. (#7474)
- Remove `experimental.NoReturn`. (#7475)
- Remove the `experimental.signal` API. (#7477)
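A quick illustration of the new termination call (the `Worker` actor here is a made-up example):

```python
import ray

ray.init()

@ray.remote
class Worker:
    def ping(self):
        return "pong"

worker = Worker.remote()
print(ray.get(worker.ping.remote()))  # "pong"
ray.kill(worker)  # replaces the old worker.__ray_kill__()
```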
Core
- Add Apache 2 license header to C++ files. (#7520)
- Reduce per-worker memory usage to 50 MB. (#7573)
- Add an option to fall back to LRU eviction on OutOfMemory. (#7410)
- Reference counting for actor handles. (#7434)
- Reference counting for returning object IDs created by a different process. (#7221)
- Use `prctl(PR_SET_PDEATHSIG)` on Linux instead of the reaper. (#7150)
- Route asyncio plasma through the raylet instead of a direct plasma connection. (#7234)
- Remove the static concurrency limit from the gRPC server. (#7544)
- Remove `get_global_worker()` and `RuntimeContext`. (#7638)
- Fix known issues from the 0.8.2 release.
RLlib
- New features:
- Bug fix highlights:
Tune
- Integrate Dragonfly optimizer. (#5955)
- Fix HyperBand errors. (#7563)
- Access the trial name and trial ID inside trainables (see the sketch after this list). (#7378)
- Add a new `repeater` class for high-variance trials. (#7366)
- Prevent deletion of checkpoints from user-initiated restoration. (#7501)
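A minimal sketch of reading these from a class-based trainable, assuming they are exposed as `self.trial_name` and `self.trial_id` properties (check the Tune docs for the exact attribute names):

```python
from ray import tune

class MyTrainable(tune.Trainable):
    def _setup(self, config):
        # trial_name / trial_id are assumed property names (#7378).
        print("Running trial:", self.trial_name, self.trial_id)

    def _train(self):
        return {"score": 1.0}

tune.run(MyTrainable, stop={"training_iteration": 1})
```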
Libraries
- [Parallel Iterators] Allow for operator chaining after repartition (see the first sketch after this list). (#7268)
- [Parallel Iterators] Repartition functionality. (#7163)
- [Serve] `@serve.route` returns a handle; add `handle.scale` and `handle.set_max_batch_size`. (#7569)
- [RaySGD] Rename PyTorchTrainer to TorchTrainer. (#7425)
- [RaySGD] Custom training API. (#7211)
- [RaySGD] Breaking user API changes (see the second sketch after this list): (#7384)
  - `data_creator` fed to TorchTrainer now must return a data loader rather than datasets.
  - TorchTrainer automatically sets up a `DistributedSampler` if a DataLoader is returned.
  - `data_loader_config` and `batch_size` are no longer parameters for TorchTrainer.
  - TorchTrainer parallelism is now set via `num_workers`.
  - All TorchTrainer arguments must now be named parameters.
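A small sketch of parallel-iterator chaining after `repartition`, assuming the `ray.util.iter` helpers `from_items`, `for_each`, and `gather_sync` (treat the exact names as assumptions if you are on a different version):

```python
import ray
from ray.util.iter import from_items

ray.init()

it = from_items([1, 2, 3, 4, 5, 6], num_shards=2)
# Operator chaining after repartition (#7268):
doubled = it.repartition(3).for_each(lambda x: x * 2)
print(list(doubled.gather_sync()))
```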
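And a minimal sketch of the new TorchTrainer conventions; the import path and creator signatures below are assumptions based on the notes above, so consult the RaySGD docs for your version:

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from ray.util.sgd.torch import TorchTrainer  # assumed import path

def model_creator(config):
    return nn.Linear(1, 1)

def optimizer_creator(model, config):
    return torch.optim.SGD(model.parameters(), lr=1e-2)

def data_creator(config):
    # Must return a DataLoader (not a raw dataset); TorchTrainer attaches
    # a DistributedSampler automatically.
    dataset = TensorDataset(torch.randn(64, 1), torch.randn(64, 1))
    return DataLoader(dataset, batch_size=8)

trainer = TorchTrainer(
    model_creator=model_creator,
    data_creator=data_creator,
    optimizer_creator=optimizer_creator,
    loss_creator=nn.MSELoss,  # assumed parameter name
    num_workers=2,            # parallelism is now set here
)
trainer.train()
```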
Java
- New Java actor API. (#7414)
  - The `@RayRemote` annotation is removed.
  - Instead of `Ray.call(ActorClass::method, actor)`, the new API is `actor.call(ActorClass::method)`.
- Allow passing internal config from the raylet to Java workers. (#7532)
- Enable direct call by default. (#7408)
- Pass large objects by reference. (#7595)
Others
- Progress towards Ray Streaming, including a Python API. (#7070, #6755, #7152, #7582)
- Progress towards GCS Service for GCS fault tolerance. (#7292, #7592, #7601, #7166)
- Progress towards cross language call between Java and Python. (#7614, #7634)
- Progress towards Windows compatibility. (#7529, #7509, #7658, #7315)
- Improvements to the K8s Operator. (#7521, #7621, #7498, #7459, #7622)
- New documentation for Ray Dashboard. (#7304)
Known issues
- Ray currently doesn't work on Python 3.5.0, but works on 3.5.3 and above.
Thanks
We thank the following contributors for their work on this release:
@rkooo567, @maximsmol, @suquark, @mitchellstern, @micafan, @clarkzinzow, @Jimpachnet, @mwbrulhardt, @ujvl, @chaokunyang, @robertnishihara, @jovany-wang, @hyeonjames, @zhijunfu, @datayjz, @fyrestone, @eisber, @stephanie-wang, @allenyin55, @BalaBalaYi, @simon-mo, @thedrow, @ffbin, @amogkam, @tisonkun, @richardliaw, @ijrsvt, @wumuzi520, @mehrdadn, @raulchen, @landcold7, @ericl, @edoakes, @sven1977, @ashione, @jorenretel, @gramhagen, @kfstorm, @anthonyhsyu, @pcmoritz