Release Highlights
- Initial support for prefill disaggregation has landed in Ray Serve LLM (#53092). This is critical for production LLM serving use cases.
- Ray Data features a variety of performance improvements (locality-based scheduling, non-blocking execution) as well as improvements to observability, preprocessors, and other stability fixes.
- Ray Serve now supports custom request routing algorithms, which is critical for high-throughput serving of large models.
Ray Libraries
Ray Data
🎉 New Features:
- Add save modes support to file data sinks (#52900)
- Added flattening capability to the Concatenator preprocessor to support output vectorization use cases; see the sketch below (#53378)
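For illustration, a minimal sketch of flattened concatenation. `Concatenator` is the existing preprocessor, but the `flatten` flag and its exact semantics are assumptions based on the #53378 description, so verify the parameter name against the preprocessor docs.

```python
# Hedged sketch: combine feature columns into one vector column.
# NOTE: the `flatten` argument is an assumption based on #53378;
# check the Concatenator docs for the exact parameter name.
import ray
from ray.data.preprocessors import Concatenator

ds = ray.data.from_items(
    [{"a": [1.0, 2.0], "b": 3.0}, {"a": [4.0, 5.0], "b": 6.0}]
)

# With flattening, list-valued columns are unnested into the output
# vector rather than kept as nested elements (assumed behavior).
prep = Concatenator(columns=["a", "b"], output_column_name="features", flatten=True)
print(prep.transform(ds).take_all())
```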
💫 Enhancements:
- Re-enable actor locality-based scheduling and improve the algorithm for ranking candidate locations for a bundle (#52861)
- Make pipeline execution non-blocking by default while the actor pool scales up to its minimum number of actors (#52754)
- Progress bar and dashboard improvements to properly show the names of partial functions (#52280)
🔨 Fixes:
- Make Ray Data `from_torch` respect the Dataset length (#52804)
- Fix flaky aggregation test (#53383)
- Fix race condition bug in fault tolerance by disabling the `on_exit` hook (#53249)
- Fix `move_tensors_to_device` utility for the list/tuple[tensor] case (#53109)
- Fix `ActorPool` scaling to avoid scaling down when the input queue is empty (#53009)
- Fix internal queue accounting for all operators with an internal queue (#52806)
- Fix backpressure for `FileBasedDatasource`. This fixes potential OOMs for workloads using `FileBasedDatasource`s (#52852)
📖 Documentation:
- Fix code snippets in docs so they run as written (#52748)
- Improve AggregateFnV2 docstrings and examples (#52911)
- Improved documentation for vectorizers and API visibility in Data (#52456)
Ray Train
🎉 New Features:
- Added support for configuring Ray Train worker actor runtime environments. (#52421)
- Included Grafana panel data in Ray Train export for improved monitoring. (#53072)
- Introduced a structured logging environment variable to standardize log formats. (#52952)
- Added metrics for `TrainControllerState` to enhance observability. (#52805)
💫 Enhancements:
- Added logging of controller state transitions to aid debugging and analysis. (#53344)
- Improved handling of `Noop` scaling decisions for smoother scaling logic. (#53180)
🔨 Fixes:
- Improved `move_tensors_to_device` utility to correctly handle `list`/`tuple` of tensors. (#53109)
- Fixed GPU transfer support for non-contiguous tensors. (#52548)
- Increased timeout in `test_torch_device_manager` to reduce flakiness. (#52917)
📖 Documentation:
- Added a note about PyTorch DataLoader's multiprocessing and forkserver usage. (#52924)
- Fixed various docstring format and indentation issues. (#52855, #52878)
- Removed unused "configuration-overview" documentation page. (#52912)
- General typo corrections. (#53048)
🏗 Architecture refactoring:
- Deduplicated ML doctest runners in CI for efficiency. (#53157)
- Converted isort configuration to Ruff for consistency. (#52869)
- Removed unused `PARALLEL_CI` blocks and combined imports. (#53087, #52742)
Ray Tune
💫 Enhancements:
- Updated `test_train_v2_integration` to use the correct `RunConfig`. (#52882)
📖 Documentation:
- Replaced `session.report` with `tune.report` and corrected import paths. (#52801)
- Removed outdated graphics cards reference in docs. (#52922)
- Fixed various docstring format issues. (#52879)
Ray Serve
🎉 New Features:
- Added support for implementing custom request routing algorithms; see the sketch after this list. (#53251)
- Introduced an environment variable to prioritize custom resources during deployment scheduling. (#51978)
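A rough sketch of a custom router under the new API from #53251. The module path `ray.serve.request_router`, the `RequestRouter.choose_replicas` signature, and `RequestRouterConfig` are assumptions on my part; verify them against the custom request routing guide added in this release.

```python
# Hedged sketch of a uniform-random request router (#53251).
# Module paths and signatures below are assumptions; consult the new
# custom request routing docs for the authoritative API.
import random
from typing import List, Optional

from ray import serve
from ray.serve.config import RequestRouterConfig
from ray.serve.request_router import PendingRequest, RequestRouter, RunningReplica


class UniformRequestRouter(RequestRouter):
    async def choose_replicas(
        self,
        candidate_replicas: List[RunningReplica],
        pending_request: Optional[PendingRequest] = None,
    ) -> List[List[RunningReplica]]:
        # Pick one replica uniformly at random; returning a ranked list
        # of candidate groups lets Serve fall back if the pick is busy.
        return [[random.choice(candidate_replicas)]]


@serve.deployment(
    request_router_config=RequestRouterConfig(
        # Hypothetical import path for the router class defined above.
        request_router_class="mymodule.UniformRequestRouter",
    )
)
class MyDeployment:
    def __call__(self) -> str:
        return "hello"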
💫 Enhancements:
- The ingress API now accepts a builder function in addition to an ASGI app object. (#52892)
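For illustration, a sketch of the builder-function form: previously `serve.ingress` took a constructed ASGI app object, and per #52892 it should now also accept a function that returns one. Treat the exact accepted signature as an assumption.

```python
# Hedged sketch of passing a builder function to serve.ingress (#52892).
from fastapi import FastAPI

from ray import serve


def build_app() -> FastAPI:
    # Constructed lazily on each replica instead of being serialized.
    app = FastAPI()

    @app.get("/ping")
    def ping() -> dict:
        return {"ok": True}

    return app


@serve.deployment
@serve.ingress(build_app)  # an ASGI app object is still accepted as before
class Api:
    pass


app = Api.bind()  # deploy with: serve.run(app)
```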
🔨 Fixes:
- Fixed `runtime_env` validation for `py_modules`. (#53186)
- Disallowed special characters in Serve deployment and application names. (#52702)
- Added a descriptive error message when a deployment name is not found. (#45181)
📖 Documentation:
- Updated the guide on serving models with Triton Server in Ray Serve.
- Added documentation for custom request routing algorithms.
Ray Serve/Data LLM
🎉 New Features:
- Added initial support for prefill decode disaggregation (#53092)
- Expose vLLM metrics to the `serve.llm` API (#52719)
- Add an Embedding API (#52229)
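For context, a minimal sketch of the baseline `serve.llm` deployment that these features extend. `LLMConfig` and `build_openai_app` exist in recent releases, while the model names and autoscaling numbers below are placeholders; prefill/decode disaggregation (#53092) is configured on top of this and is best taken from the new docs rather than guessed here.

```python
# Minimal serve.llm sketch: an OpenAI-compatible endpoint.
# Model IDs and autoscaling values are placeholders.
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",  # name exposed by the API
        model_source="Qwen/Qwen2.5-0.5B-Instruct",
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
)

app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```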
💫 Enhancements:
- Allow setting `name_prefix` in `build_llm_deployment` (#53316)
- Minor bug fix for #53144: stop tokens cannot be null (#53288)
- Add missing `repetition_penalty` vLLM sampling parameter (#53222)
- Mitigate `serve.llm` streaming overhead by properly batching stream chunks (#52766)
- Fix `test_batch_vllm` leaking resources by using a larger `wait_for_min_actors_s`
🔨 Fixes:
- `LLMRouter.check_health()` should check `LLMServer.check_health()` (#53358)
- Fix runtime passthrough and auto-executor class selection (#53253)
- Update `check_health` return type (#53114)
- Fix duplication of the `<bos>` token (#52853)
- In stream batching, the first part of the stream was always consumed and not streamed back from the router (#52848)
RLlib
🎉 New Features:
- Add GPU inference to offline evaluation. (#52718)
💫 Enhancements:
- Do-over of examples for connector pipelines. (#52604)
- Cleanup of meta learning classes and examples. (#52680)
🔨 Fixes:
- Fixed weight syncing in offline evaluation. (#52757)
- Fixed bug in `split_and_zero_pad` utility function (related to complex structures vs. simple values or `np.array`s). (#52818)
Ray Core
💫 Enhancements:
- `uv run` integration is now enabled by default, so you no longer need to set `RAY_RUNTIME_ENV_HOOK` (#53060)
- Record GCS process metrics (#53171)
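A minimal sketch of what the default-on `uv run` integration means in practice: launch the driver with `uv run` and the same environment is used by Ray workers, with no `RAY_RUNTIME_ENV_HOOK` needed. The `emoji` dependency is just an illustrative third-party package.

```python
# driver.py — run with:  uv run --with emoji driver.py
# With #53060, the uv environment propagates to workers by default.
import ray


@ray.remote
def hello() -> str:
    import emoji  # resolved from the uv-managed environment on the worker

    return emoji.emojize("Ray :thumbs_up:")


ray.init()
print(ray.get(hello.remote()))
```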
🔨 Fixes:
- Improvements for using `RuntimeEnv` in the Job Submission API. (#52704)
- Close unused pipe file descriptors of child processes of the Raylet (#52700)
- Fix race condition when canceling task that hasn't started yet (#52703)
- Implement a thread pool and call the CPython API on all threads within the same concurrency group (#52575)
- cgraph: Fix execution schedules with collective operations (#53007)
- cgraph: Fix scalar tensor serialization edge case with `serialize_to_numpy_or_scalar` (#53160)
- Fix the issue where a valid `RestartActor` RPC is ignored (#53330)
- Fix reference counter crashes during worker graceful shutdown (#53002)
Dashboard
🎉 New Features:
- train: Add dynolog for on-demand GPU profiling for Torch training (#53191)
💫 Enhancements:
- Add configurability of 'orgId' param for requesting Grafana dashboards (#53236)
🔨 Fixes:
- Fix Grafana dashboard dropdowns for the Data and Train dashboards (#52752)
- Fix dashboard handling of daylight saving time (#52755)
Ray Container Images
💫 Enhancements:
- Upgrade `h11` (#53361), `requests`, `starlette`, `jinja2` (#52951), `pyopenssl`, and `cryptography` (#52941)
- Generate multi-arch image indexes (#52816)
Docs
🎉 New Features:
- End-to-end example: Entity recognition with LLMs (#52342)
- End-to-end example: XGBoost tutorial (#52383)
- End-to-end tutorial for audio transcription and LLM as judge curation (#53189)
💫 Enhancements:
- Add pydoclint to pre-commit (#52974)
Thanks!
Thank you to everyone who contributed to this release!
@NeilGirdhar, @ok-scale, @JiangJiaWei1103, @brandonscript, @eicherseiji, @ktyxx, @MichalPitr, @GeneDer, @rueian, @khluu, @bveeramani, @ArturNiederfahrenhorst, @c8ef, @lk-chen, @alanwguo, @simonsays1980, @codope, @ArthurBook, @kouroshHakha, @Yicheng-Lu-llll, @jujipotle, @aslonnie, @justinvyu, @machichima, @pcmoritz, @saihaj, @wingkitlee0, @omatthew98, @can-anyscale, @nadongjun, @chris-ray-zhang, @dizer-ti, @matthewdeng, @ryanaoleary, @janimo, @crypdick, @srinathk10, @cszhu, @TimothySeah, @iamjustinhsu, @mimiliaogo, @angelinalg, @gvspraveen, @kevin85421, @jjyao, @elliot-barn, @xingyu-long, @LeoLiao123, @thomasdesr, @ishaan-mehta, @noemotiovon, @hipudding, @davidxia, @omahs, @MengjinYan, @dengwxn, @MortalHappiness, @alhparsa, @emmanuel-ferdman, @alexeykudinkin, @KunWuLuan, @dev-goyal, @sven1977, @akyang-anyscale, @GokuMohandas, @raulchen, @abrarsheikh, @edoakes, @JoshKarpel, @bhmiller, @seanlaii, @ruisearch42, @dayshah, @Bye-legumes, @petern48, @richardliaw, @rclough, @israbbani, @jiwq