Highlights
- Ray is moving towards 1.0! It has had several important naming changes:
  - `ObjectID`s are now called `ObjectRef`s because they are not just IDs.
  - The Ray Autoscaler is now called the Ray Cluster Launcher. The autoscaler will become a module of the Ray Cluster Launcher.
- The Ray Cluster Launcher now has a much cleaner and more concise output style. Try it out with `ray up --log-new-style`. The new output style will be enabled by default (with opt-out) in a later release.
- Windows is now officially supported by RLlib. Multi-node support for Windows is still in progress.
Cluster Launcher/CLI (formerly autoscaler)
- Highlight: This release contains a new colorful, concise output style for `ray up` and `ray down`, available with the `--log-new-style` flag. It will be enabled by default (with opt-out) in a later release. Full output style coverage for Cluster Launcher commands will also be available in a later release. (#9322, #9943, #9960, #9690)
- Documentation improvements (with guides and new sections) (#9687)
- Improved Cluster Launcher Docker support (#9001, #9105, #8840)
- Ray now has Docker images available on Docker Hub. Please check out the `ray` image (#9732, #9556, #9458, #9281)
- Azure improvements (#8938)
- Improved on-prem cluster autoscaler (#9663)
- Add option for continuous sync of file mounts (#9544)
- Add `ray status` debug tool and `ray --version` (#9091, #8886). `ray memory` now also supports `redis_password` (#9492)
- Bug fixes for the Kubernetes cluster launcher mode (#9968)
- Various improvements: disabling the cluster config cache (#8117), Python API requires keyword arguments (#9256), removed fingerprint checking for SSH (#9133), initial support for multiple worker types (#9096), various changes to the internal node provider interface (#9340, #9443)
Core
- Support Python type checking for Ray tasks (#9574)
- Rename `ObjectID` => `ObjectRef` (#9353); see the sketch after this list
- New GCS Actor manager on by default (#8845, #9883, #9715, #9473, #9275)
- Work towards placement groups (#9039)
- Plasma store process is merged with raylet (#8939, #8897)
- Option to automatically reconstruct objects stored in plasma after a failure. See the documentation for more information. (#9394, #9557, #9488)
- Many bug fixes.
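The sketch below illustrates the two API-facing changes above: type hints on Ray tasks (#9574) and the `ObjectID` => `ObjectRef` rename; the task itself is an arbitrary example.

```python
import ray

ray.init()

# Ray tasks can now carry standard Python type hints.
@ray.remote
def add(x: int, y: int) -> int:
    return x + y

# Calling .remote() returns an ObjectRef (formerly ObjectID), not the value.
ref = add.remote(1, 2)
assert ray.get(ref) == 3
```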
RLlib
- New algorithm: “Model-Agnostic Meta-Learning” (MAML), an algorithm that learns and generalizes well across a distribution of environments.
- New algorithm: “Model-Based Meta-Policy-Optimization” (MB-MPO), our first model-based RL algorithm.
- Windows is now officially supported by RLlib.
- Native TensorFlow 2.x support. Use `framework="tf2"` in your config to tap into TF2's full potential (see the sketch after this list). Also: SAC, DDPG, DQN Rainbow, ES, and ARS now run in TF1.x eager mode.
- DQN PyTorch support for full Rainbow setup (including distributional DQN).
- Python type hints for Policy, Model, Offline, Evaluation, and Env classes.
- Deprecated “Policy Optimizer” package (in favor of new distributed execution API).
- Enhanced test coverage and stability.
- Flexible multi-agent replay modes and `replay_sequence_length`. We now allow a) storing sequences (over time) in replay buffers and b) retrieving “lock-stepped” multi-agent samples.
- Environments: Unity3D soccer game (tuned example/benchmark) and DM Control Suite wrapper and examples.
- Various bug fixes: QMIX not learning, DDPG torch bugs, IMPALA learning rate updates, PyTorch custom loss, PPO not learning MuJoCo due to an action clipping bug, DQN w/o dueling layer error.
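Below is a minimal sketch of the new `framework="tf2"` option, launched via Tune; the environment and stopping criterion are illustrative choices, not taken from the release notes.

```python
from ray import tune

# Run PPO with RLlib's native TensorFlow 2.x support.
tune.run(
    "PPO",
    config={
        "env": "CartPole-v0",  # illustrative environment
        "framework": "tf2",    # native TF2 eager execution
    },
    stop={"training_iteration": 10},
)
```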
Tune
- API Changes:
  - You can now stop experiments upon convergence with Bayesian Optimization (#8808)
  - `DistributedTrainableCreator`, a simple wrapper for distributed parameter tuning with multi-node DistributedDataParallel models (#9550, #9739); see the sketch after this list
  - New integration and tutorial for using Ray Tune with Weights and Biases (Logger and native API) (#9725)
- Tune now provides a Scikit-learn compatible wrapper for hyperparameter tuning (#9129)
- New tutorials for integrations like XGBoost (#9060), multi GPU PyTorch (#9338), PyTorch Lightning (#9151, #9451), and Huggingface-Transformers (#9789)
- CLI Progress reporting improvements (#8802, #9537, #9525)
- Various bug fixes: handling of NaN values (#9381), Tensorboard logging improvements (#9297, #9691, #8918), enhanced cross-platform compatibility (#9141), re-structured testing (#9609), documentation reorganization and versioning (#9600, #9427, #9448)
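Below is a minimal sketch of `DistributedTrainableCreator`; the `ray.tune.integration.torch` import path and the use of `tune.report` are assumptions about this release's API, and the model and data are placeholders.

```python
import torch
import torch.nn as nn
from ray import tune
from ray.tune.integration.torch import DistributedTrainableCreator  # assumed path

def train_func(config):
    # The wrapper is expected to set up the torch.distributed process group
    # on each worker before this function runs.
    model = nn.parallel.DistributedDataParallel(nn.Linear(4, 1))
    optimizer = torch.optim.SGD(model.parameters(), lr=config["lr"])
    for _ in range(10):
        loss = model(torch.randn(8, 4)).pow(2).mean()  # placeholder data/loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        tune.report(loss=loss.item())

# Each trial spans two workers running DistributedDataParallel training.
trainable = DistributedTrainableCreator(train_func, num_workers=2)
tune.run(trainable, config={"lr": tune.grid_search([0.01, 0.1])})
```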
RaySGD
Serve
- Horizontal scalability: Serve will now start one HTTP server per Ray node. (#9523)
- Various performance improvements matching Serve to FastAPI (#9490, #8709, #9531, #9479, #9225, #9216, #9485)
- API changes (see the sketch after this list):
  - `serve.shadow_traffic(endpoint, backend, fraction)` duplicates and sends a fraction of the incoming traffic to a specific backend. (#9106)
  - `serve.shutdown()` cleans up the current Serve instance in the Ray cluster. (#8766)
  - An exception will be raised if `num_replicas` exceeds the maximum resources in the cluster (#9005)
- Added doc examples for how to perform metric monitoring and model composition.
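A minimal sketch of the two new calls; the surrounding `serve.init` / `create_backend` / `create_endpoint` setup is an assumption about the 0.8-era Serve API, and the backend names are illustrative.

```python
from ray import serve

serve.init()  # assumed 0.8-era entry point

def model_v1(request):
    return "v1"

def model_v2(request):
    return "v2"

serve.create_backend("model:v1", model_v1)
serve.create_backend("model:v2", model_v2)
serve.create_endpoint("predict", backend="model:v1", route="/predict")

# Mirror 20% of incoming traffic to model:v2; responses still come from model:v1.
serve.shadow_traffic("predict", "model:v2", 0.2)

# Tear down the Serve instance in the Ray cluster when finished.
serve.shutdown()
```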
Dashboard
- Configurable Dashboard Port: The port on which the dashboard runs is now configurable using the command-line argument `--dashboard-port` and the argument `dashboard_port` to `ray.init` (see the sketch after this list).
- GPU monitoring improvements
- For machines with more than one GPU, the GPU and GRAM utilization is now broken out on a per-GPU basis.
- Assignments to physical GPUs are now shown at the worker level.
- Sortable Machine View: It is now possible to sort the machine view by almost any of its columns by clicking next to the column title. In addition, whereas workers are normally grouped by node, you can now ungroup them if you only want to see details about individual workers.
- Actor Search Bar: It is now possible to search for actors by their title (the class name of the actor in Python, together with the arguments it received).
- Logical View UI Updates: This includes things like color-coded names for each of the actor states, a more grid-like layout, and tooltips for the various data.
- Sortable Memory View: Like the machine view, the memory view now has sortable columns and can be grouped / ungrouped by node.
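For example (the port number is an arbitrary illustrative choice):

```python
import ray

# Serve the dashboard on a custom port; the CLI equivalent is
# `ray start --dashboard-port=8266`.
ray.init(dashboard_port=8266)
```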
Windows Support
Others
- Ray Streaming Library Improvements (#9240, #8910, #8780)
- Java Support Improvements (#9371, #9033, #9037, #9032, #8858, #9777, #9836, #9377)
- Parallel Iterator Improvements (#8964, #8978)
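A minimal sketch of the parallel iterator API (`ray.util.iter`) mentioned above; the items and shard count are illustrative.

```python
import ray
from ray.util.iter import from_items

ray.init()

# Build a parallel iterator over two shards and square each element.
it = from_items([1, 2, 3, 4], num_shards=2)
squares = it.for_each(lambda x: x * x)

# Gather results from all shards into a local iterator; shard order may interleave.
print(squares.gather_sync().take(4))
```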
Thanks
We thank the following contributors for their work on this release:
@jsuarez5341, @amitsadaphule, @krfricke, @williamFalcon, @richardliaw, @heyitsmui, @mehrdadn, @robertnishihara, @gabrieleoliaro, @amogkam, @fyrestone, @mimoralea, @edoakes, @andrijazz, @ElektroChan89, @kisuke95, @justinkterry, @SongGuyang, @barakmich, @bloodymeli, @simon-mo, @TomVeniat, @lixin-wei, @alanwguo, @zhuohan123, @michaelzhiluo, @ijrsvt, @pcmoritz, @LecJackS, @sven1977, @ashione, @JerryLeeCS, @raphaelavalos, @stephanie-wang, @ruifangChen, @vnlitvinov, @yncxcw, @weepingwillowben, @goulou, @acmore, @wuisawesome, @gramhagen, @anabranch, @internetcoffeephone, @Alisahhh, @henktillman, @deanwampler, @p-christ, @Nicolaus93, @WangTaoTheTonic, @allenyin55, @kfstorm, @rkooo567, @ConeyLiu, @09wakharet, @piojanu, @mfitton, @KristianHolsheimer, @AmeerHajAli, @pdames, @ericl, @VishDev12, @suquark, @stefanbschneider, @raulchen, @dcfidalgo, @chappers, @aaarne, @chaokunyang, @sumanthratna, @clarkzinzow, @BalaBalaYi, @maximsmol, @zhongchun, @wumuzi520, @ffbin