Ray 0.8.3


Highlights

  • Autoscaler has added Azure Support. (#7080, #7515, #7558, #7494)
    • The Ray autoscaler helps you launch a distributed Ray cluster with a single command-line call!
    • It works on Azure, AWS, GCP, Kubernetes, YARN, Slurm, and local nodes.
  • Distributed reference counting is turned on by default. (#7628, #7337)
    • This means all Ray objects are tracked and garbage collected only once all references to them go out of scope. It can be turned off with: ray.init(_internal_config=json.dumps({"distributed_ref_counting_enabled": 0})).
    • When the object store is full of objects that are still in scope, you can turn on least-recently-used eviction to force-remove objects using ray.init(lru_evict=True) (see the sketch after the memory report below).
  • A new command ray memory is added to help debug memory usage: (#7589)
> ray memory
-----------------------------------------------------------------------------------------------------
 Object ID                                Reference Type       Object Size   Reference Creation Site
=====================================================================================================
; worker pid=51230
ffffffffffffffffffffffff0100008801000000  PINNED_IN_MEMORY            8231   (deserialize task arg) __main__..sum_task
; driver pid=51174
45b95b1c8bd3a9c4ffffffff010000c801000000  USED_BY_PENDING_TASK           ?   (task call) memory_demo.py:<module>:13
ffffffffffffffffffffffff0100008801000000  USED_BY_PENDING_TASK        8231   (put object) memory_demo.py:<module>:6
ef0a6c221819881cffffffff010000c801000000  LOCAL_REFERENCE                ?   (task call) memory_demo.py:<module>:14
-----------------------------------------------------------------------------------------------------
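
As a rough illustration of the reference-counting behavior described above (a minimal sketch, not taken from the release itself): an object placed in the object store stays pinned while a reference to it is in scope and becomes eligible for garbage collection once the last reference is dropped. Running ray memory from a shell while such a driver is alive produces entries like the ones shown above.

import numpy as np
import ray

ray.init()

# The object stays pinned in the object store while x_id is in scope.
x_id = ray.put(np.zeros(1024))
print(ray.get(x_id).shape)  # (1024,)

# Dropping the last reference makes the object eligible for garbage collection.
del x_id

ray.shutdown()

# If the store fills up with objects that are still in scope, least-recently-used
# eviction can be enabled instead, as described above:
#   ray.init(lru_evict=True)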

API changes

  • Change actor.__ray_kill__() to ray.kill(actor) (see the example below). (#7360)
  • Deprecate use_pickle flag for serialization. (#7474)
  • Remove experimental.NoReturn. (#7475)
  • Remove experimental.signal API. (#7477)
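
A minimal sketch of the renamed kill API (names other than ray.kill and the removed actor.__ray_kill__() are illustrative):

import ray

ray.init()

@ray.remote
class Worker:
    def ping(self):
        return "pong"

a = Worker.remote()
print(ray.get(a.ping.remote()))  # "pong"

# Previously: a.__ray_kill__()
# As of this release:
ray.kill(a)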

Core

  • Add Apache 2 license header to C++ files. (#7520)
  • Reduce per worker memory usage to 50MB. (#7573)
  • Option to fallback to LRU on OutOfMemory. (#7410)
  • Reference counting for actor handles. (#7434)
  • Reference counting for returning object IDs created by a different process. (#7221)
  • Use prctl(PR_SET_PDEATHSIG) on Linux instead of reaper. (#7150)
  • Route asyncio plasma through raylet instead of direct plasma connection. (#7234)
  • Remove static concurrency limit from gRPC server. (#7544)
  • Remove get_global_worker(), RuntimeContext. (#7638)
  • Fix known issues from 0.8.2 release:
    • Fix passing duplicate by-reference arguments. (#7306)
    • Raise the gRPC message size limit to 100 MB. (#7269)

RLlib

  • New features:
    • Exploration API improvements. (#7373, #7314, #7380)
    • SAC: add discrete action support. (#7320, #7272)
    • Add high-performance external application connector. (#7641)
  • Bug fix highlights:
    • Fix a PPO Torch memory leak and unnecessary torch.Tensor creation and garbage collection. (#7238)
    • Rename sample_batch_size => rollout_fragment_length (see the config sketch after this list). (#7503)
    • Fix bugs and speed up SegmentTree.
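
A minimal config sketch for the sample_batch_size => rollout_fragment_length rename; the trainer, environment, and other keys below are illustrative, not taken from the release notes:

import ray
from ray import tune

ray.init()

tune.run(
    "PPO",
    stop={"training_iteration": 1},
    config={
        "env": "CartPole-v0",
        "num_workers": 1,
        # Renamed in this release; `sample_batch_size` was the old key.
        "rollout_fragment_length": 200,
    },
)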

Tune

  • Integrate Dragonfly optimizer. (#5955)
  • Fix HyperBand errors. (#7563)
  • Access trial name and trial ID inside a trainable (see the sketch after this list). (#7378)
  • Add a new Repeater class for high-variance trials. (#7366)
  • Prevent deletion of checkpoint from user-initiated restoration. (#7501)
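
A minimal sketch of reading the trial name and trial ID from inside a class-based trainable; the underscore-style _setup/_train methods and the trial_name/trial_id properties reflect the Tune API of this era and should be checked against the 0.8.3 docs:

from ray import tune

class MyTrainable(tune.Trainable):
    def _setup(self, config):
        # Trial name and trial ID are now accessible inside the trainable.
        print("starting trial:", self.trial_name, self.trial_id)

    def _train(self):
        return {"score": 1.0}

tune.run(MyTrainable, num_samples=2, stop={"training_iteration": 1})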

Libraries

  • [Parallel Iterators] Allow operator chaining after repartition (see the sketch after this list). (#7268)
  • [Parallel Iterators] Repartition functionality. (#7163)
  • [Serve] @serve.route returns a handle, add handle.scale, handle.set_max_batch_size. (#7569)
  • [RaySGD] PyTorchTrainer --> TorchTrainer. (#7425)
  • [RaySGD] Custom training API. (#7211)
  • [RaySGD] Breaking user API changes (see the TorchTrainer sketch after this list): (#7384)
    • The data_creator passed to TorchTrainer must now return a DataLoader rather than datasets.
    • TorchTrainer automatically sets a DistributedSampler if a DataLoader is returned.
    • data_loader_config and batch_size are no longer parameters for TorchTrainer.
    • TorchTrainer parallelism is now set by num_workers.
    • All TorchTrainer args now must be named parameters.
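
A minimal sketch of chaining an operator after repartition with the parallel iterator API (the specific items and lambdas are illustrative):

import ray
from ray.util.iter import from_items

ray.init()

it = (
    from_items([1, 2, 3, 4, 5, 6], num_shards=2)
    .repartition(3)                # repartition functionality (#7163)
    .for_each(lambda x: x * 10)    # operator chaining after repartition (#7268)
)
print(sorted(it.gather_sync()))    # [10, 20, 30, 40, 50, 60]

And a minimal sketch of the new TorchTrainer API implied by the breaking changes above; the import path and creator keyword names are assumptions based on the RaySGD docs of this era, so verify them against the 0.8.3 documentation:

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

import ray
from ray.util.sgd import TorchTrainer  # import path may differ across versions


def model_creator(config):
    return nn.Linear(1, 1)


def optimizer_creator(model, config):
    return torch.optim.SGD(model.parameters(), lr=1e-2)


def data_creator(config):
    # Breaking change: return a DataLoader (not raw datasets); the batch size
    # is now set here instead of being a TorchTrainer argument.
    x = torch.randn(256, 1)
    return DataLoader(TensorDataset(x, 2 * x), batch_size=32)


ray.init()
trainer = TorchTrainer(
    model_creator=model_creator,      # all arguments must be named parameters
    data_creator=data_creator,
    optimizer_creator=optimizer_creator,
    loss_creator=nn.MSELoss,
    num_workers=2,                    # parallelism is set by num_workers
)
print(trainer.train())
trainer.shutdown()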

Java

  • New Java actor API (#7414)
    • @RayRemote annotation is removed.
    • Instead of Ray.call(ActorClass::method, actor), the new API is actor.call(ActorClass::method).
  • Allow passing internal config from raylet to Java worker. (#7532)
  • Enable direct call by default. (#7408)
  • Pass large object by reference. (#7595)

Known issues

  • Ray currently doesn't work on Python 3.5.0, but works on 3.5.3 and above.

Thanks

We thank the following contributors for their work on this release:
@rkooo567, @maximsmol, @suquark, @mitchellstern, @micafan, @clarkzinzow, @Jimpachnet, @mwbrulhardt, @ujvl, @chaokunyang, @robertnishihara, @jovany-wang, @hyeonjames, @zhijunfu, @datayjz, @fyrestone, @eisber, @stephanie-wang, @allenyin55, @BalaBalaYi, @simon-mo, @thedrow, @ffbin, @amogkam, @tisonkun, @richardliaw, @ijrsvt, @wumuzi520, @mehrdadn, @raulchen, @landcold7, @ericl, @edoakes, @sven1977, @ashione, @jorenretel, @gramhagen, @kfstorm, @anthonyhsyu, @pcmoritz
