Ray-2.46.0

Release Highlights

The 2.46 Ray release comes with a few core highlights:

  1. Ray Data now supports hash shuffling for repartitioning and aggregations, along with support for joins (see the sketch after this list). This enables many new data processing workloads to run on Ray Data. Please give it a try and let us know if you have any feedback!
  2. Ray Serve LLM now supports vLLM v1, keeping it forward-compatible with upcoming vLLM releases and unlocking the significant performance improvements that come with vLLM's v1 refactor.
  3. There is a new Train Grafana dashboard that provides in-depth metrics on training workloads.
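
A minimal sketch of the new Ray Data pieces on toy data. The groupby aggregation is stable API; the repartition `keys` argument and the `Dataset.join` parameter names (`join_type`, `num_partitions`, `on`) are assumptions read off the feature descriptions, so verify them against the 2.46 API reference:

```python
import ray

# Toy datasets keyed by "id".
users = ray.data.from_items([{"id": i, "name": f"user-{i}"} for i in range(100)])
orders = ray.data.from_items(
    [{"id": i % 100, "amount": float(i)} for i in range(1000)]
)

# Aggregation: group orders by key and sum the amounts.
totals = orders.groupby("id").sum("amount")

# Hash-shuffle repartitioning; the `keys` argument is an assumption based
# on the feature description (#52664) -- check the 2.46 API reference.
by_key = orders.repartition(8, keys=["id"])

# Join support (#52728); `join_type`, `num_partitions`, and `on` are
# assumptions -- verify against the released Dataset.join signature.
joined = users.join(orders, join_type="inner", num_partitions=8, on=("id",))
print(joined.take(3))
```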

Ray Libraries

Ray Data

🎉 New Features:

  • Added support for hash-shuffle based repartitioning and aggregations (#52664)
  • Added support for joins using hash shuffle (#52728); both are sketched under Release Highlights above
  • [LLM] Upgraded vLLM support to 0.8.5 (#52344)

💫 Enhancements:

  • Add memory attribute to ExecutionResources (#51127)
  • Support ray_remote_args for read_tfrecords (#52450); see the sketch after this list
  • [data.dashboard] Skip reporting internal metrics (#52666)
  • Add PhysicalOperator.min_max_resource_usage_bounds (#52502)
  • Speed up printing the schema (#52612)
  • [data.dashboard] Dataset logger for worker (#52706)
  • Support new pyiceberg version (#51744)
  • Support num_cpus, memory, concurrency, batch_size for preprocess (#52574)
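
The ray_remote_args support called out above, as a sketch; the S3 path is hypothetical and num_cpus is just one example of a per-task resource override:

```python
import ray

# Assumed per #52450: read_tfrecords now forwards ray_remote_args to its
# read tasks, e.g. to reserve more CPUs per task.
ds = ray.data.read_tfrecords(
    "s3://my-bucket/records/",        # hypothetical path
    ray_remote_args={"num_cpus": 2},  # resources for each read task
)
print(ds.schema())
```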

🔨 Fixes:

  • Handle Arrow Array null types in to_numpy (#52572)
  • Fix S3 serialization wrapper compatibility with RetryingPyFileSystem (#52568)
  • Fixed the Optimizer to apply rules until the plan stabilizes (#52663)
  • Fixed the FuseOperators rule to properly handle transformations that drastically change the size of the dataset (#52570)

📖 Documentation:

  • [LLM] Improved concurrency settings and prompts to achieve better throughput (#52634)

Ray Train

🎉 New Features:

  • Add initial Train Grafana dashboard (#52709)
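
The dashboard fills in from running Train jobs once Prometheus and Grafana are set up for the Ray dashboard. A minimal run that would feed it; the loop body is illustrative rather than a real training step:

```python
from ray.train import ScalingConfig, report
from ray.train.torch import TorchTrainer

def train_loop_per_worker(config):
    # Illustrative loop: report a metric each "epoch" so the new
    # dashboard panels have something to plot.
    for epoch in range(3):
        report({"epoch": epoch, "loss": 1.0 / (epoch + 1)})

trainer = TorchTrainer(
    train_loop_per_worker,
    scaling_config=ScalingConfig(num_workers=2),
)
trainer.fit()
```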

💫 Enhancements:

  • Lazily import torch FSDP in the ray.train.torch module to improve import performance and avoid unnecessary dependencies (#52707); the general pattern is sketched after this list
  • Deserialize the user-defined training function directly on workers, improving efficiency (#52684)
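
The lazy-import pattern behind the first item, as a generic sketch (module-level __getattr__ per PEP 562); this is not Ray's actual code, and the module name is hypothetical:

```python
# mypackage/torch_utils.py -- hypothetical module name.
# Deferring the heavy import keeps `import mypackage.torch_utils` cheap;
# torch.distributed.fsdp is only loaded on first attribute access.

def __getattr__(name):
    if name == "FullyShardedDataParallel":
        from torch.distributed.fsdp import FullyShardedDataParallel
        return FullyShardedDataParallel
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
```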

🔨 Fixes:

  • Fixed error when no arguments are passed into TorchTrainer (#52693)

📖 Documentation:

  • Added new XGBoostTrainer user guide (#52355)

🏗 Architecture refactoring:

  • Re-enabled isort for python/ray/train to maintain code formatting consistency (#52717)

Ray Tune

📖 Documentation:

  • Fixed typo in Ray Tune PyTorch Lightning docs (#52756)

Ray Serve

💫 Enhancements:

  • [LLM] Refactored LLMServer and LLMEngine to stay close to vLLM's chat formatting logic (#52597)
  • Bump vllm from 0.8.2 to 0.8.5 in /python (#52344)
  • [LLM] Add router replicas and batch size to llm config (#52655)

🔨 Fixes:

  • Fixed request cancellation not propagating correctly across deployments (#52591)
  • Fixed BackpressureError not being properly propagated in FastAPI ingress deployments (#52397)
  • Fixed a hanging issue when awaiting deployment responses (#52561)
  • [Serve.llm] Made Ray Serve LLM compatible with vLLM v1 (#52668); see the sketch below
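
With v1 compatibility in place, a Serve LLM app can opt into the new engine. A minimal sketch assuming the documented LLMConfig / build_openai_app entry points; the model ID is a placeholder, and VLLM_USE_V1 is vLLM's own v1 opt-in switch:

```python
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config={"model_id": "my-model"},  # placeholder model ID
    deployment_config={
        "autoscaling_config": {"min_replicas": 1, "max_replicas": 2},
    },
    # Opt into the vLLM v1 engine (#52668).
    runtime_env={"env_vars": {"VLLM_USE_V1": "1"}},
)

app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```

Router replica count and batch size are also configurable per #52655; the exact field names are in the LLMConfig reference.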

📖 Documentation:

  • [Serve][LLM] Add doc for deploying DeepSeek (#52592)

RLlib

🎉 New Features:

  • Offline evaluation with a loss function for the offline RL pipeline. Introduces three new callbacks: on_offline_evaluate_start, on_offline_evaluate_end, and on_offline_eval_runners_recreated (#52308); see the sketch below
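
A sketch of subscribing to the new hooks. The base-class import path and the hook signatures are assumptions (the release note gives only the callback names), so treat this as shape, not verified API:

```python
from ray.rllib.callbacks.callbacks import RLlibCallback  # assumed import path

class OfflineEvalLogger(RLlibCallback):
    # Signatures assumed: the release note documents these hooks
    # only by name (#52308).
    def on_offline_evaluate_start(self, *, algorithm=None, **kwargs):
        print("offline evaluation starting")

    def on_offline_evaluate_end(self, *, algorithm=None, **kwargs):
        print("offline evaluation finished")

    def on_offline_eval_runners_recreated(self, **kwargs):
        print("offline eval runners were recreated")
```

Wired in the usual way, e.g. config.callbacks(OfflineEvalLogger) on an AlgorithmConfig.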

💫 Enhancements:

  • New custom_data attribute for SingleAgentEpisode and MultiAgentEpisode to store custom metrics. Deprecates add|get_temporary_timestep_data() (#52603); see the sketch below
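
A sketch of the replacement pattern: custom_data is a plain dict on the episode, so a callback can accumulate per-timestep values there instead of calling add_temporary_timestep_data(). Import path, hook signature, and the reward accessor are assumptions:

```python
from ray.rllib.callbacks.callbacks import RLlibCallback  # assumed import path

class EpisodeMetrics(RLlibCallback):
    def on_episode_step(self, *, episode, **kwargs):
        # custom_data is a plain dict on the episode; accumulate whatever
        # per-timestep values previously went through
        # add_temporary_timestep_data().
        episode.custom_data.setdefault("step_rewards", []).append(
            episode.get_rewards(-1)  # last reward; accessor assumed
        )
```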

Ray Core

💫 Enhancements:

  • Only get serialization context once for all .remote args (#52690)
  • Add grpc server success and fail count metric (#52711)

🔨 Fixes:

  • Fixed a leak of open plasma store memory (shm/fallback) held by workers (#52622)
  • Ensured unused pipes for dashboard subprocesses are closed (#52678)
  • Expand protection against dead processes in reporter agent (#52657)
  • [cgraph] Separate metadata and data in cross-node shared memory transport (#52619)
  • Fix JobID check for detached actor tasks (#52405)
  • Fix potential log loss of tail_job_logs (#44709)

🏗 Architecture refactoring:

  • Cancel tasks when an owner dies instead of checking if an owner is dead during scheduling (#52516)
  • Unify GcsAioClient and GcsClient (#52735)
  • Remove worker context dependency from the task receiver (#52740)

Dashboard

🎉 New Features:

  • Added the Ray Train Grafana dashboard with a few built-in metrics; more to come.

Thanks!

Thank you to everyone who contributed to this release!
@kevin85421, @edoakes, @wingkitlee0, @alexeykudinkin, @chris-ray-zhang, @sophie0730, @zcin, @raulchen, @matthewdeng, @abrarsheikh, @popojk, @Jay-ju, @ruisearch42, @eicherseiji, @lk-chen, @justinvyu, @dayshah, @kouroshHakha, @NeilGirdhar, @omatthew98, @ishaan-mehta, @davidxia, @ArthurBook, @GeneDer, @srinathk10, @dependabot[bot], @JoshKarpel, @aslonnie, @khluu, @can-anyscale, @israbbani, @saihaj, @MortalHappiness, @alanwguo, @bveeramani, @iamjustinhsu, @Ziy1-Tan, @xingyu-long, @simonsays1980, @fscnick, @chuang0221, @sven1977, @jjyao
