Release Highlights
Ray Data:
- We’ve implemented a variety of performance enhancements, including improved actor/node autoscaling with budget-aware decisions; faster/more accurate shuffle accounting; reduced Parquet metadata footprint; and out-of-order execution for higher throughput.
- We’ve also added anti/semi joins, a stratify option for train_test_split, and Snowflake connectors.
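For the stratified split, a minimal sketch is below; it assumes the new stratify argument accepts a column name (verify against the Dataset.train_test_split API reference).

```python
import ray

# Toy dataset with an imbalanced label column.
ds = ray.data.from_items(
    [{"x": i, "label": "pos" if i % 4 == 0 else "neg"} for i in range(100)]
)

# stratify is assumed to take a column name, so both splits keep a similar
# "pos"/"neg" ratio; check the train_test_split reference for the exact signature.
train_ds, test_ds = ds.train_test_split(test_size=0.25, stratify="label")
print(train_ds.count(), test_ds.count())
```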
Ray Core:
- Performance and robustness cleanups around the GCS publish path and raylet internals; simpler OpenTelemetry flagging; a new user-facing API to wait for GPU tensors to be freed; plus assorted test/infra tidy-ups.
Ray Train:
- We’ve introduced a new JaxTrainer with SPMD support for TPUs.
Ray Serve:
- Custom Autoscaling per Deployment: Serve now supports user-defined autoscaling policies via AutoscalingContext and AutoscalingPolicy, enabling fine-grained scaling logic at the deployment level (a hedged sketch follows these highlights). This is part of a larger effort to add support for autoscaling based on custom metrics in Serve; see this RFC for more details.
- Async Inference (Initial Support): Ray Serve introduces asynchronous inference execution, laying the foundation for better throughput and latency in async workloads. Please see this RFC for more details.
- Major Performance Gains: This version of Ray Serve brings double-digit percentage improvements in both throughput and latency. See the Ray Serve notes below for more details.
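A rough, hypothetical sketch of what a per-deployment policy might look like; the policy signature, the fields on AutoscalingContext, and the way the policy is attached via autoscaling_config are assumptions here, so rely on the linked RFC and the Serve autoscaling docs for the actual interface.

```python
# Hypothetical sketch only: names and wiring below are assumptions, not the
# confirmed Serve API; see the custom-autoscaling RFC for the real interface.
from ray import serve


def queue_depth_policy(ctx) -> int:
    """Toy policy: one replica per 10 queued requests, clamped to the bounds.

    `ctx` stands in for the new AutoscalingContext, assumed to expose current
    metrics and the deployment's replica bounds.
    """
    target = max(1, ctx.total_queued_requests // 10)
    return max(ctx.min_replicas, min(target, ctx.max_replicas))


@serve.deployment(
    autoscaling_config={
        "min_replicas": 1,
        "max_replicas": 10,
        "policy": queue_depth_policy,  # assumed hook for a user-defined policy
    }
)
class Model:
    async def __call__(self, request) -> str:
        return "ok"
```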
Ray Serve/Data LLM:
- We’ve refactored Ray Serve LLM to be fully compatible with the default vllm serve frontend, and it now supports vLLM 0.10.
- We’ve added a prefix-cache-aware router (PrefixCacheAffinityRouter) for optimized cache utilization, dynamic cache management via reset-prefix-cache remote methods, and enhanced LMCacheConnectorV1 with kv_transfer_config support.
Ray Libraries
Ray Data
🎉 New Features:
- Wrapped batch indices in a BatchMetadata object to make per-batch metadata explicit. (#55643)
- Added support for Anti/Semi Join types. (#55272)
- Introduced an Issue Detection Framework. (#55155)
- Added an option to enable out-of-order execution for better performance. (#54504)
- Introduced a StreamingSplit logical operator for DAG rewrite. (#54994)
- Added a stratify parameter to train_test_split. (#54624)
- Added Snowflake connectors. (#51429)
- Updated Hudi integration to support incremental query. (#54301)
- Added an Actor location tracker. (#54590)
- Added BundleQueue.has_next. (#54710)
- Made DEFAULT_OBJECT_STORE_MEMORY_LIMIT_FRACTION configurable. (#54873)
- Added Expression support & a with_columns API (see the sketch after this list). (#54322)
- Allocate GPU resources in ResourceManager. (#54445)
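For the new expression support, a small sketch follows; the ray.data.expressions import path, the operator overloading on expressions, and the exact with_columns signature (a mapping of column names to expressions) are assumptions to verify against the Ray Data API reference.

```python
import ray
from ray.data.expressions import col  # assumed module path for expressions

ds = ray.data.range(5)  # rows: {"id": 0}, {"id": 1}, ...

# Derive a column from an expression instead of a Python UDF; with_columns is
# assumed to take a mapping of new column names to expressions.
ds2 = ds.with_columns({"id_squared": col("id") * col("id")})
print(ds2.take(3))
```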
💫 Enhancements:
- Decoupled actor and node autoscaling; autoscaling now also considers budget. (#55673, #54902)
- Faster hash-shuffle resource usage calculation; more accurate shuffle progress totals. (#55503, #55543)
- Reduced Parquet metadata storage usage. (#54821)
- Export API improvements: refresh dataset/operator state, sanitize metadata, and truncate exported metadata. (#55355, #55379, #55216, #54623)
- Metrics & observability: task metric improvements, external-buffer block-count metric, row-based metrics, clearer operator names in logs, single debug log when aggregators are ready. (#55429, #55022, #54693, #52949, #54483)
- Dashboard: added “Max Bytes to Read” panel/budget, panels for blocks-per-task and bytes-per-block, and streaming executor duration. (#55024, #55020, #54614)
- Planner/execution & infra cleanups: ExecutionResources and StatsManager cleanup, planner interface refactor, node trackers init, removed ray.get in _MapWorker ctor, removed target_shuffle_max_block_size. (#54694, #55400, #55018, #54665, #54734, #55158)
- Behavior/interop tweaks: map_batches defaults to row_modification=False and avoids pushing past limit; limited operator pushdown; prefetch for PandasJSONDatasource; use cloudpickle for Arrow tensor extension ser/des; bumped Arrow to 21.0; schema warning tone change. (#54992, #54457, #54667, #54831, #55426, #54630)
- Removed randomize-blocks reorder rule for more stable behavior. (#55278)
🔨 Fixes:
- AutoscalingActorPool now properly downscales after execution. (#55565)
- StatsManager handles StatsActor loss on disconnect. (#55163)
- Handle missing chunks key when Databricks UC query returns zero rows. (#54526)
- Handle empty fragments in sampling when num_row_groups=0. (#54822)
- Restored handling of PyExtensionType to keep compatibility with previously written datasets. (#55498)
- Prevent negative resource budget when concurrency exceeds the global limit; fixed resource-manager log calculation. (#54986, #54878)
- Default write_parquet warning removed; handled unhashable types in OneHotEncoding. (#54864, #54863)
- Overwrite mode now maps to the correct Arrow behavior for parallel writes. (#55118)
- Added back from_daft Arrow-version checks. (#54907)
- Pandas chained in-place assignment warning resolved. (#54486)
- Test stability/infra: fixed flaky tests, adjusted bounds and sizes, added additional release tests/chaos variants for image workloads, increased join test size, adjusted sorting release test to produce 1 GB blocks. (#55485, #55489, #54806, #55120, #54716, #55402, #54971)
📖 Documentation:
- Added a user guide for aggregations. (#53568)
- Added a code snippet in docs for partitioned writes. (#55002)
- Updated links to Lance documentation. (#54836)
Ray Train
🎉 New Features:
- Introduced JaxTrainer with SPMD support on TPUs (#55207)
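A minimal sketch of how the new JaxTrainer is expected to be used, modeled on the existing Ray Train trainer pattern (a train_loop_per_worker plus a ScalingConfig); the import path and the TPU-related ScalingConfig fields are assumptions, so check the JaxTrainer API reference for the exact signature.

```python
# Sketch only: the import path and TPU-specific ScalingConfig fields below are
# assumptions modeled on other Ray Train trainers.
import jax

from ray.train import ScalingConfig
from ray.train.v2.jax import JaxTrainer  # assumed module path


def train_loop_per_worker(config: dict) -> None:
    # Real SPMD training (sharding, jit/pjit, checkpoint reporting) goes here;
    # each worker sees its local TPU devices.
    print("local devices:", jax.local_devices())


trainer = JaxTrainer(
    train_loop_per_worker=train_loop_per_worker,
    scaling_config=ScalingConfig(
        num_workers=4,                      # e.g. one worker per TPU host
        use_tpu=True,                       # assumed flag for TPU scheduling
        resources_per_worker={"TPU": 4},
    ),
)
result = trainer.fit()
```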
💫 Enhancements:
- ray.train.get_dataset_shard now lazily configures dataset sharding for better startup behavior (#55230)
- Clearer worker error logging (#55222)
- Fail fast when placement group requirements can never be satisfied (#54402)
- New ControllerError surfaced and handled via failure policy for improved resiliency (#54801, #54833)
- TrainStateActor periodically checks controller health and aborts when necessary (#53818)
🔨 Fixes:
- Resolve circular import in ray.train.v2.lightning.lightning_utils (#55668)
- Fix XGBoost v2 callback behavior (#54787)
- Suppress a spurious type error (#50994)
- Reduce test flakiness: remove randomness and bump a data-integration test size (#55315, #55633)
📖 Documentation:
- New LightGBMTrainer user guide (#54492)
- Fix code-snippet syntax highlighting (#54909)
- Minor correction in experiment-tracking guide comment (#54605)
🏗 Architecture refactoring:
- Public Train APIs routed through TrainFnUtils for consistency (#55226)
- LoggingManager utility for Train logging (#55121)
- Convert DEFAULT variables from strings to bools (#55581)
Ray Tune
🎉 New Features:
- Add video FPS support to WandbLoggerCallback (#53638)
💫 Enhancements:
- Typing: reset_config now explicitly returns bool (#54581)
- CheckpointManager supports recording scoring metric only (#54642)
Ray Serve
🎉 New Features:
- Async inference support in Ray Serve (initial phase). Provides basic asynchronous inference execution, with follow-up work planned for failed/unprocessed queues and additional tests. #54824
- Per-deployment custom autoscaling controls. Introduces AutoscalingContext and AutoscalingPolicy classes, enabling user-defined autoscaling strategies at the deployment level. #55253
- Same event loop router. Adds option to run the Serve router in the same event loop as the proxy, yielding ~17% throughput improvement. #55030
💫 Enhancements:
- Async get_current_servable_instance(). Converts the FastAPI dependency to async def, removing threadpool overhead and boosting performance: 35% higher RPS and reduced latency. #55457
- Access log optimization. Cached contexts in request path logging improved request throughput by ~16% with lower average latency. #55166
- Batching improvements. Default batch wait timeout increased from 0.0s to 0.01s (10 ms) to enable meaningful batching; see the sketch after this list. #55126
- HTTP receive refactor. Cleaned up handling of replica-side HTTP receive tasks. #54543 / #54565
- Configurable replica router backoff. Added knobs for retry/backoff control when routing to replicas. #54723
- Autoscaling ergonomics. Marked per-deployment autoscaling metrics push interval config as deprecated for consistency. #55102
- Health check & env var safety. Introduced warnings for invalid/zero/negative environment variable values, with migration path planned for Ray 2.50.0. #55464, #54944
- Improved CLI UX. serve config now prints No configuration was found. instead of an empty string. #54767
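To make the batching change above concrete, here is a small serve.batch example; if your workload depended on the previous 0.0 s default, pin batch_wait_timeout_s explicitly as shown.

```python
from typing import List

from ray import serve


@serve.deployment
class BatchedModel:
    # The default batch wait timeout is now 10 ms; set it explicitly if you
    # need a specific value (e.g. the old behavior of 0.0 s).
    @serve.batch(max_batch_size=8, batch_wait_timeout_s=0.01)
    async def handle_batch(self, inputs: List[str]) -> List[str]:
        # Process the whole batch at once and return one result per input.
        return [s.upper() for s in inputs]

    async def __call__(self, request) -> str:
        text = (await request.body()).decode()
        return await self.handle_batch(text)


app = BatchedModel.bind()
# serve.run(app) would deploy it locally.
```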
🔨 Fixes:
- Removed brittle ray._private dependency usage. #55659
- HTTP route test fixes. Migrated to get_application_url() to avoid hardcoded URLs, reducing flakiness on Windows. #55623, #54974, #54924, #54911, #54704, #54903, #54882, #54877, #54631, #53933
- Semaphore bug fix. Corrected race where more workers than allowed could acquire the semaphore. #55147
- LongPollClient cancellation. Prevented spurious cancellation of listen_for_change. #54832
- Backpressure error code. gRPC now returns RESOURCE_EXHAUSTED instead of UNAVAILABLE on overload. #54537
- Logging improvements. Added request IDs to proxy access logs; avoided duplicate shutdown log lines. #54657, #54534
- Test stability. Various waits, deflakes, and sync fixes across Serve tests. #54794, #54522, #54585
📖 Documentation:
- Unexpected queuing behavior. Documented quirks in handle request queuing. #54542
🏗 Architecture refactoring:
- Router/handle internals refactored for clarity and future feature expansion. #55635
- Model composition benchmarks. Added benchmarking to track performance of common composition patterns. #55549
- Constants refactor. Utility functions moved out of constants.py for better readability and stricter env var validation. #54944, #55464
- Ray internals migration. Moved usage, ray_option_utils, and selected constants from _private to _common. #54915, #54578
Ray Serve/Data LLM
🎉 New Features:
- Prefix cache-aware router with PrefixCacheAffinityRouter for optimized cache utilization. (#55218, #55588)
- Reset prefix cache remote method for dynamic cache management. (#55658)
- LMCacheConnectorV1 support for kv_transfer_config to enhance key-value transfer configurations. (#54579)
- LLMServer and LLMEngine major refactor for 100% vLLM serve frontend compatibility. (#54554)
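For the vLLM-compatible serving path, a minimal sketch of standing up an OpenAI-compatible app with ray.serve.llm is shown below; the configuration field names follow the Serve LLM docs but should be verified against the current API reference, and the model shown is just an example.

```python
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",                       # name exposed to clients
        model_source="Qwen/Qwen2.5-0.5B-Instruct",  # example HF model
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
    engine_kwargs=dict(max_model_len=8192),         # passed through to vLLM
)

app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```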
💫 Enhancements:
- vLLM engine upgrade to version 0.10.0 with improved performance and compatibility. (#55067)
- Enhanced error handling for invalid model_id parameters with clearer error messages. (#55589)
- Improved telemetry handling with better race condition management for push operations. (#55558)
- Optimized deployment defaults with better configuration values to prevent bottlenecks. (#54696)
- LoRA workflow improvements with refactored downloading and utility functions. (#54946)
- LLMServer refactor to synchronous initialization for better reliability. (#54835)
- Mistral tokenizer support for tekken tokenizer compatibility. (#54666)
- Smart batching logic that skips batching when batch_interval_ms == 0. (#54751)
- Dashboard enhancements with improved LLM metrics and monitoring capabilities. (#54797)
🔨 Fixes:
- Pyright linting corrections for Ray Serve LLM examples. (#55284)
- Test stability improvements for DeepSeek model and vLLM engine processor tests. (#55401, #55120)
- Serialization fixes for ChatCompletionRequest tool_calls ValidatorIterator objects. (#55538)
📖 Documentation:
- Prefix cache router documentation with comprehensive usage examples. (#55218)
- Multi-LoRA documentation improvements with clearer setup instructions. (#54788)
- STRICT_PACK strategy FAQ documentation explaining data.llm packing behavior. (#55505)
🏗 Architecture refactoring:
- Docker image optimizations with UCX and NCCL updates, plus GKE GPU operator compatibility paths. (#54598, #55206)
RLlib
🎉 New Features:
- Implemented Implicit Q-Learning (IQL). (#55304, #55422)
- DreamerV3 is now available in PyTorch. (#45463, #55140)
- Discrete actions support for SAC. (#53982)
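As a quick illustration of the new discrete-action support for SAC, here is a sketch using the standard RLlib config-builder pattern on a discrete-action environment; the iteration count and result keys are illustrative.

```python
from ray.rllib.algorithms.sac import SACConfig

# CartPole-v1 has a discrete action space, which SAC now supports.
config = SACConfig().environment("CartPole-v1").framework("torch")
algo = config.build()

for _ in range(3):
    result = algo.train()
    # Mean episode return on the new API stack (may be None early in training).
    print(result.get("env_runners", {}).get("episode_return_mean"))

algo.stop()
```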
💫 Enhancements:
- Upgraded RLlink protocol for external env/simulator training. (#53550)
- Performance improvements in Offline RL API through switching to iter_torch_batches. (#54277)
- Added an example for curriculum learning in Atari Pong. (#55304)
🔨 Fixes:
- Corrected TensorType handling. (#55694)
- Fixed a bug with multi-learner setups in Offline RL API. (#55693)
- Addressed ImportError in Atari examples. (#54967)
- Fixed some bugs in the docs for IQL and CQL. (#55614)
- Increased default timesteps on two experiments. (#54185)
- Fixed TorchMultiCategorical.to_deterministic when having a different number of categories and logits with a time dimension. (#54414)
- Added missing documentation for SACConfig's training(). (#53918)
- Fixed a bug in restore_from_path such that connector states are also restored on remote EnvRunners. (#54672)
- Fixed missing support for config.count_steps_by = "agent_steps". (#54885)
- Added missing colon to CUBLAS_WORKSPACE_CONFIG. (#53913)
- Removed rllib_contrib completely from RLlib. (#55182)
🏗 Architecture refactoring:
- Deprecated TensorFlow support from new API stack. (#55042)
- Deprecated input/output specs from RLModule. (#55141)
- Deprecated the --enable-new-api-stack flag from all scripts. (#54853, #54702)
Ray Core
💫 Enhancements:
- [core][gpu-objects] Garbage collection (#53911)
- [core] Support pip_install_options for pip (#53551)
- [core][gpu-objects] Move data transfers to a background thread (#54256)
- [core][gpu-objects] Pass tensor_transport to store_task_errors even if the actor task throws an exception (#55427)
- [core][gpu-objects] Exception handling for application errors (#55442)
- [core][gpu-object] Add a user-facing call to wait for tensor to be freed (#55076)
- [Core] Bind Ray internal servers to the specified node IP instead of 0.0.0.0, which improves security (#55178, #55210, #55298)
- [core] Fallback unserializable exceptions to their string representation (#55476)
🔨 Fixes:
- [core] Fix objects_valid check failure with except from BaseException (#55602)
- [core][gpu-objects] Avoid triggering a KeyError by the GPU object GC callback for intra-actor communication (#54556)
- [core] fix checking for uv existence during ray_runtime setup (#54141)
- [core][autoscaler][v1] add heartbeat timeout logic to determine node activity status (#54030)
- [core] prevent sending SIGTERM after calling Worker::MarkDead (#54377)
- [Core] Fixed the bug where the head was unable to submit tasks after Redis is turned on. (#54267)
- [Core] [Azure] query for supported Microsoft.Network/virtualNetworks API versions instead of relying on resource_client.DEFAULT_API_VERSION (#54874)
- [core] Fix possible race by checking node cache status instead of just subscription (#54745)
- [core] Fix get actor timeout multiplier (#54525)
- [core]: Use a temporary file to share default worker path in runtime env (#53653)
- [core] Fix check fail when task buffer periodical runner runs before RayEvent is initialized (#55249)
- [core] Patch grpc with RAY_num_grpc_threads to control grpc thread count (#54988)
- [core][gpu-objects] Always write to GPUObjectStore to avoid _get_tensor_meta() from hanging indefinitely. (#55433)
- [Core] Core Worker GetObjStatus GRPC Fault Tolerance (#54567)
📖 Documentation:
- Added guide on using type hints with Ray Core. (#55013)
Dashboard
💫 Enhancements:
- Grafana: new Operator filter for Data; Prometheus adds a RayNodeType label for nodes. (#55493, #55192)
🔨 Fixes:
- Removed references to a deleted Data metrics panel. (#55478)
Ray Images
💫 Enhancements:
- Upgraded protobuf to v4 (#54496)
Docs
💫 Enhancements:
- KubeRay docs: added InteractiveMode quick-start details; expanded Core type-hints guidance; added Serve LLM example coverage and a Data LLM batching FAQ. (#55570, #55284)
Thanks!
Thank you to everyone who contributed to this release!
@pavitrabhalla, @Daraan, @Sparks0219, @daiping8, @abrarsheikh, @sven1977, @Toshaksha, @bveeramani, @MengjinYan, @GokuMohandas, @codope, @nadongjun, @SolitaryThinker, @matthewdeng, @elliot-barn, @isimluk, @avibasnet31, @OneSizeFitsQuorum, @Future-Outlier, @marosset, @jackfrancis, @kshanmol, @eicherseiji, @dayshah, @iamjustinhsu, @Qiaolin-Yu, @goutamvenkat-anyscale, @Yicheng-Lu-llll, @yantarou, @rclough, @zcin, @NeilGirdhar, @VarunBhandary, @400Ping, @akshay-anyscale, @vickytsang, @xushiyan, @JasonLi1909, @n-elia, @simonsays1980, @dragongu, @Kishanthan, @ruisearch42, @jectpro7, @TimothySeah, @liulehui, @rueian, @HollowMan6, @akyang-anyscale, @axreldable, @czgdp1807, @alanwguo, @justinvyu, @ok-scale, @my-vegetable-has-exploded, @landscapepainter, @fscnick, @machichima, @mpashkovskii, @ZacAttack, @gvspraveen, @sword865, @lmsh7, @Ziy1-Tan, @rebel-scottlee, @sampan-s-nayak, @coqian, @can-anyscale, @Bye-legumes, @win5923, @MortalHappiness, @angelinalg, @khluu, @aslonnie, @krishnakalyan3, @minosvasilias, @x-tong, @xinyuangui2, @raulchen, @Yangruipis, @edoakes, @kevin85421, @wingkitlee0, @Fokko, @cristianjd, @srinathk10, @owenowenisme, @JoshKarpel, @MengqingCao, @leopardracer, @westonpace, @LeslieWongCV, @VassilisVassiliadis, @crypdick, @alexeykudinkin, @mjacar, @kunling-anyscale, @saihaj, @kouroshHakha, @ema-pe, @markjm, @avigyabb, @dshepelev15, @mauvilsa, @omatthew98, @nrghosh, @ryanaoleary, @Aydin-ab, @lk-chen, @stephanie-wang, @harshit-anyscale, @jjyao, @bullgom, @Yevet, @israbbani