Highlights
- Experimental support for Windows is now available for single node Ray usage. Check out the Windows section below for known issues and other details.
- Have you had trouble monitoring GPU or memory usage while using Ray? The Ray dashboard now supports GPU monitoring and a memory view.
- Want to use RLlib with Unity? RLlib officially supports the Unity3D adapter! Please check out the documentation.
- Ray Serve is ready for feedback! We've already heard from many users, and Ray Serve is being used in production. Please reach out with your use cases, ideas, documentation improvements, and feedback, ideally on the Ray Slack in #serve. We'd love to hear from you. See the Serve section below for more details.
Core
- We’ve introduced a new feature to automatically retry failed actor tasks after an actor has been restarted by Ray (by specifying `max_restarts` in `@ray.remote`). Try it out with `max_task_retries=-1`, where -1 indicates that the system can retry the task until it succeeds.
API Changes
- To enable automatic restarts of a failed actor, you must now use `max_restarts` in the `@ray.remote` decorator instead of `max_reconstructions`. You can use -1 to indicate infinity, i.e., the system should always restart the actor if it fails unexpectedly.
- We’ve merged the named and detached actor APIs. To create an actor that will survive past the duration of its job (a “detached” actor), specify `name=<str>` in its remote constructor (`Actor.options(name='<str>').remote()`). To delete the actor, you can use `ray.kill`.
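A minimal sketch of the new options in one place (the `Counter` actor, its name, and the restart count are hypothetical, and running this requires a working Ray installation):

```python
import ray

ray.init()

# Restart this actor up to 5 times if it dies unexpectedly;
# max_task_retries=-1 lets the system retry a failed actor task
# until it succeeds.
@ray.remote(max_restarts=5, max_task_retries=-1)
class Counter:
    def __init__(self):
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count

# A named actor is "detached": it survives past the duration of its job
# and can be retrieved by name from other jobs.
counter = Counter.options(name="global_counter").remote()
print(ray.get(counter.increment.remote()))

# Delete the detached actor once it is no longer needed.
ray.kill(counter)
```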
RLlib
- PyTorch: IMPALA now has a PyTorch version, and all `rllib/examples` scripts work with either TensorFlow or PyTorch (`--torch` command line option).
- Switched to using the distributed execution API by default (replaces Policy Optimizers) for all algorithms.
- Unity3D adapter (supports all Env types: multi-agent, external env, vectorized) with example scripts for running locally or in the cloud.
- Added support for variable length observation Spaces ("Repeated").
- Added support for arbitrarily nested action spaces.
- Added experimental GTrXL (Transformer/Attention net) support to RLlib + learning tests for PPO and IMPALA.
- QMIX now supports complex observation spaces.
API Changes
- Retired the `use_pytorch` and `eager` flags in configs; use `framework=[tf|tfe|torch]` instead.
- Deprecated PolicyOptimizers in favor of the new distributed execution API.
- Retired support for Model(V1) class. Custom Models should now only use the ModelV2 API. There is still a warning when using ModelV1, which will be changed into an error message in the next release.
- Retired TupleActions (in favor of arbitrarily nested action Spaces).
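For instance, the new `framework` config key replaces the retired flags. A hedged sketch (assumes RLlib and PyTorch are installed; the environment and config values are illustrative):

```python
from ray.rllib.agents.ppo import PPOTrainer

config = {
    "env": "CartPole-v0",
    # Previously: "use_pytorch": True (or "eager": True for TF eager mode).
    "framework": "torch",  # one of "tf", "tfe", or "torch"
}
trainer = PPOTrainer(config=config)
result = trainer.train()  # runs one training iteration
```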
Ray Tune / RaySGD
- There is now a Dataset API for handling large datasets with RaySGD. (#7839)
- You can now filter by an average of the last results using the `ExperimentAnalysis` tool. (#8445)
- BayesOptSearch received numerous contributions, enabling preliminary random search and warm starting. (#8541, #8486, #8488)
API Changes
- `tune.report` is now the right way to use the Tune function API; `tune.track` is deprecated. (#8388)
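A minimal sketch of the new reporting call (the trainable function and its metric are made up for illustration; this requires Ray Tune to be installed):

```python
from ray import tune

def trainable(config):
    for step in range(10):
        score = config["lr"] * step   # placeholder metric
        tune.report(score=score)      # replaces the deprecated tune.track API

analysis = tune.run(
    trainable,
    config={"lr": tune.grid_search([0.01, 0.1])},
)
```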
Serve
- New APIs to inspect and manage Serve objects:
  - `serve.create_endpoint` now requires specifying the backend directly. You can remove `serve.set_traffic` if there's only one backend per endpoint. (#8764)
  - `serve.init` API cleanup; several options were removed.
  - `serve.init` now supports namespacing with `name`. You can run multiple Serve clusters with different names on the same Ray cluster. (#8449)
- You can specify session affinity when splitting traffic across backends using the `X-SERVE-SHARD-KEY` HTTP header. (#8449)
- Various documentation improvements.
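Putting the updated Serve APIs together, a hedged sketch (the backend and endpoint names are hypothetical, and exact signatures may differ slightly between versions):

```python
from ray import serve

def hello(request):
    return "hello"

# Namespaced Serve instance: multiple Serve clusters with different
# names can share the same Ray cluster.
serve.init(name="my_serve_cluster")

serve.create_backend("hello_backend", hello)
# The backend must now be specified directly when creating the endpoint:
serve.create_endpoint("hello_endpoint", backend="hello_backend", route="/hello")
```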
Dashboard / Metrics
- The Machine View of the dashboard now shows information about GPU utilization such as:
- Average GPU/GRAM utilization at a node and cluster level
- Worker-level information about how many GPUs each worker is assigned as well as its GRAM use.
- The dashboard has a new Memory View tab that should be very useful for debugging memory issues. It has:
- Information about objects in the Ray object store, including size and call-site
- Information about reference counts and what is keeping an object pinned in the Ray object store.
Small changes
- Idle workers are automatically sorted to the end of the worker list in the Machine View.
Autoscaler
- Improved logging output. Errors are more clearly propagated and excess output has been reduced. (#7198, #8751, #8753)
- Added support for k8s services.
API Changes
- `ray up` now accepts remote URLs that point to the desired cluster YAML. (#8279)
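For example (the URL below is a placeholder, not a real cluster config):

```shell
# Launch a cluster directly from a remotely hosted cluster YAML.
ray up https://example.com/configs/my-cluster.yaml
```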
Windows support
- Windows wheels are now available for basic experimental usage (via `ray.init()`).
- Windows support is currently unstable. Unusual, unattended, or production usage is not recommended.
- Various functionality may still lack support, including Ray Serve, Ray SGD, the autoscaler, the dashboard, non-ASCII file paths, etc.
- Please check the latest nightly wheels & known issues (#9114), and let us know if any issue you encounter has not yet been addressed.
- Wheels are available for Python 3.6, 3.7, and 3.8. (#8369)
- redis-py has been patched for Windows sockets. (#8386)
Others
- Moving towards highly available Ray (#8650, #8639, #8606, #8601, #8591, #8442)
- Java Support (#8730, #8640, #8637)
- Ray streaming improvements (#8612, #8594, #7464)
- Parallel iterator improvements (#8140, #7931, #8712)
Thanks
We thank the following contributors for their work on this release:
@pcmoritz, @akharitonov, @devanderhoff, @ffbin, @anabranch, @jasonjmcghee, @kfstorm, @mfitton, @alecbrick, @simon-mo, @konichuvak, @aniryou, @wuisawesome, @robertnishihara, @ramanNarasimhan77, @09wakharet, @richardliaw, @istoica, @ThomasLecat, @sven1977, @ceteri, @acxz, @iamhatesz, @JarnoRFB, @rkooo567, @mehrdadn, @thomasdesr, @janblumenkamp, @ujvl, @edoakes, @maximsmol, @krfricke, @amogkam, @gehring, @ijrsvt, @internetcoffeephone, @LucaCappelletti94, @chaokunyang, @WangTaoTheTonic, @fyrestone, @raulchen, @ConeyLiu, @stephanie-wang, @suquark, @ashione, @Coac, @JosephTLucas, @ericl, @AmeerHajAli, @pdames