Release Highlights
Ray Data:
- We’ve implemented a variety of performance enhancements, including improved actor/node autoscaling with budget-aware decisions; faster/more accurate shuffle accounting; reduced Parquet metadata footprint; and out-of-order execution for higher throughput.
- We’ve also added anti/semi joins, a stratify option for train_test_split, and Snowflake connectors.
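For the stratified split, a minimal sketch is below; it assumes the new stratify argument accepts a column name (verify against the Dataset.train_test_split API reference).

```python
import ray

# Toy dataset with an imbalanced label column.
ds = ray.data.from_items(
    [{"x": i, "label": "pos" if i % 4 == 0 else "neg"} for i in range(100)]
)

# stratify is assumed to take a column name, so both splits keep a similar
# "pos"/"neg" ratio; check the train_test_split reference for the exact signature.
train_ds, test_ds = ds.train_test_split(test_size=0.25, stratify="label")
print(train_ds.count(), test_ds.count())
```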
Ray Core:
- Performance and robustness cleanups around the GCS publish path and raylet internals; simpler OpenTelemetry flagging; a new user-facing API to wait for GPU tensors to be freed; plus assorted test/infra tidy-ups.
Ray Train:
- We’ve introduced a new JaxTrainer with SPMD support for TPUs.
Ray Serve:
- Custom Autoscaling per Deployment: Serve now supports user-defined autoscaling policies via AutoscalingContext and AutoscalingPolicy, enabling fine-grained scaling logic at the deployment level (a hedged sketch follows these highlights). This is part of a larger effort to add support for autoscaling based on custom metrics in Serve; see this RFC for more details.
- Async Inference (Initial Support): Ray Serve introduces asynchronous inference execution, laying the foundation for better throughput and latency in async workloads. Please see this RFC for more details.
- Major Performance Gains: This version of Ray Serve brings double-digit percentage improvements in both throughput and latency. See the Ray Serve notes below for more details.
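A rough, hypothetical sketch of what a per-deployment policy might look like; the policy signature, the fields on AutoscalingContext, and the way the policy is attached via autoscaling_config are assumptions here, so rely on the linked RFC and the Serve autoscaling docs for the actual interface.

```python
# Hypothetical sketch only: names and wiring below are assumptions, not the
# confirmed Serve API; see the custom-autoscaling RFC for the real interface.
from ray import serve


def queue_depth_policy(ctx) -> int:
    """Toy policy: one replica per 10 queued requests, clamped to the bounds.

    `ctx` stands in for the new AutoscalingContext, assumed to expose current
    metrics and the deployment's replica bounds.
    """
    target = max(1, ctx.total_queued_requests // 10)
    return max(ctx.min_replicas, min(target, ctx.max_replicas))


@serve.deployment(
    autoscaling_config={
        "min_replicas": 1,
        "max_replicas": 10,
        "policy": queue_depth_policy,  # assumed hook for a user-defined policy
    }
)
class Model:
    async def __call__(self, request) -> str:
        return "ok"
```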
Ray Serve/Data LLM:
- We’ve refactored Ray Serve LLM to be fully compatible with the default vllm serve frontend, and it now supports vLLM 0.10.
- We’ve added a prefix-cache-aware router (PrefixCacheAffinityRouter) for optimized cache utilization, dynamic cache management via reset-prefix-cache remote methods, and enhanced LMCacheConnectorV1 with kv_transfer_config support.
Ray Libraries
Ray Data
🎉 New Features:
- Wrapped batch indices in a BatchMetadata object to make per-batch metadata explicit. (#55643)
- Added support for Anti/Semi Join types. (#55272)
- Introduced an Issue Detection Framework. (#55155)
- Added an option to enable out-of-order execution for better performance. (#54504)
- Introduced a StreamingSplit logical operator for DAG rewrite. (#54994)
- Added a stratify parameter to train_test_split. (#54624)
- Added Snowflake connectors. (#51429)
- Updated Hudi integration to support incremental query. (#54301)
- Added an Actor location tracker. (#54590)
- Added BundleQueue.has_next. (#54710)
- Made DEFAULT_OBJECT_STORE_MEMORY_LIMIT_FRACTION configurable. (#54873)
- Added Expression support & a with_columns API (see the sketch after this list). (#54322)
- Allocate GPU resources in ResourceManager. (#54445)
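For the new expression support, a small sketch follows; the ray.data.expressions import path, the operator overloading on expressions, and the exact with_columns signature (a mapping of column names to expressions) are assumptions to verify against the Ray Data API reference.

```python
import ray
from ray.data.expressions import col  # assumed module path for expressions

ds = ray.data.range(5)  # rows: {"id": 0}, {"id": 1}, ...

# Derive a column from an expression instead of a Python UDF; with_columns is
# assumed to take a mapping of new column names to expressions.
ds2 = ds.with_columns({"id_squared": col("id") * col("id")})
print(ds2.take(3))
```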
💫 Enhancements:
- Decoupled actor and node autoscaling; autoscaling now also considers budget. (#55673, #54902)
- Faster hash-shuffle resource usage calculation; more accurate shuffle progress totals. (#55503, #55543)
- Reduced Parquet metadata storage usage. (#54821)
- Export API improvements: refresh dataset/operator state, sanitize metadata, and truncate exported metadata. (#55355, #55379, #55216, #54623)
- Metrics & observability: task metric improvements, external-buffer block-count metric, row-based metrics, clearer operator names in logs, single debug log when aggregators are ready. (#55429, #55022, #54693, #52949, #54483)
- Dashboard: added “Max Bytes to Read” panel/budget, panels for blocks-per-task and bytes-per-block, and streaming executor duration. (#55024, #55020, #54614)
- Planner/execution & infra cleanups: ExecutionResources and StatsManager cleanup, planner interface refactor, node trackers init, removed ray.get in _MapWorker ctor, removed target_shuffle_max_block_size. (#54694, #55400, #55018, #54665, #54734, #55158)
- Behavior/interop tweaks: map_batches defaults to row_modification=False and avoids pushing past limit; limited operator pushdown; prefetch for PandasJSONDatasource; use cloudpickle for Arrow tensor extension ser/des; bumped Arrow to 21.0; schema warning tone change. (#54992, #54457, #54667, #54831, #55426, #54630)
- Removed randomize-blocks reorder rule for more stable behavior. (#55278)
🔨 Fixes:
- AutoscalingActorPool now properly downscales after execution. (#55565)
- StatsManager handles StatsActor loss on disconnect. (#55163)
- Handle missing chunks key when Databricks UC query returns zero rows. (#54526)
- Handle empty fragments in sampling when num_row_groups=0. (#54822)
- Restored handling of PyExtensionType to keep compatibility with previously written datasets. (#55498)
- Prevent negative resource budget when concurrency exceeds the global limit; fixed resource-manager log calculation. (#54986, #54878)
- Default write_parquet warning removed; handled unhashable types in OneHotEncoding. (#54864, #54863)
- Overwrite mode now maps to the correct Arrow behavior for parallel writes. (#55118)
- Added back from_daft Arrow-version checks. (#54907)
- Pandas chained in-place assignment warning resolved. (#54486)
- Test stability/infra: fixed flaky tests, adjusted bounds and sizes, added additional release tests/chaos variants for image workloads, increased join test size, adjusted sorting release test to produce 1 GB blocks. (#55485, #55489, #54806, #55120, #54716, #55402, #54971)
📖 Documentation:
- Added a user guide for aggregations. (#53568)
- Added a code snippet in docs for partitioned writes. (#55002)
- Updated links to Lance documentation. (#54836)
Ray Train
🎉 New Features:
- Introduced JaxTrainer with SPMD support on TPUs (#55207)
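A minimal sketch of how the new JaxTrainer is expected to be used, modeled on the existing Ray Train trainer pattern (a train_loop_per_worker plus a ScalingConfig); the import path and the TPU-related ScalingConfig fields are assumptions, so check the JaxTrainer API reference for the exact signature.

```python
# Sketch only: the import path and TPU-specific ScalingConfig fields below are
# assumptions modeled on other Ray Train trainers.
import jax

from ray.train import ScalingConfig
from ray.train.v2.jax import JaxTrainer  # assumed module path


def train_loop_per_worker(config: dict) -> None:
    # Real SPMD training (sharding, jit/pjit, checkpoint reporting) goes here;
    # each worker sees its local TPU devices.
    print("local devices:", jax.local_devices())


trainer = JaxTrainer(
    train_loop_per_worker=train_loop_per_worker,
    scaling_config=ScalingConfig(
        num_workers=4,                      # e.g. one worker per TPU host
        use_tpu=True,                       # assumed flag for TPU scheduling
        resources_per_worker={"TPU": 4},
    ),
)
result = trainer.fit()
```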
💫 Enhancements:
- ray.train.get_dataset_shard now lazily configures dataset sharding for better startup behavior (#55230)
- Clearer worker error logging (#55222)
- Fail fast when placement group requirements can never be satisfied (#54402)
- New ControllerError surfaced and handled via failure policy for improved resiliency (#54801, #54833)
- TrainStateActor periodically checks controller health and aborts when necessary (#53818)
🔨 Fixes:
- Resolve circular import in ray.train.v2.lightning.lightning_utils (#55668)
- Fix XGBoost v2 callback behavior (#54787)
- Suppress a spurious type error (#50994)
- Reduce test flakiness: remove randomness and bump a data-integration test size (#55315, #55633)
📖 Documentation:
- New LightGBMTrainer user guide (#54492)
- Fix code-snippet syntax highlighting (#54909)
- Minor correction in experiment-tracking guide comment (#54605)
🏗 Architecture refactoring:
- Public Train APIs routed through TrainFnUtils for consistency (#55226)
- LoggingManager utility for Train logging (#55121)
- Convert DEFAULT variables from strings to bools (#55581)
Ray Tune
🎉 New Features:
- Add video FPS support to WandbLoggerCallback (#53638)
💫 Enhancements:
- Typing: reset_config now explicitly returns bool (#54581)
- CheckpointManager supports recording scoring metric only (#54642)
Ray Serve
🎉 New Features:
- Async inference support in Ray Serve (initial phase). Provides basic asynchronous inference execution, with follow-up work planned for failed/unprocessed queues and additional tests. #54824
- Per-deployment custom autoscaling controls. Introduces AutoscalingContext and AutoscalingPolicy classes, enabling user-defined autoscaling strategies at the deployment level. #55253
- Same event loop router. Adds option to run the Serve router in the same event loop as the proxy, yielding ~17% throughput improvement. #55030
💫 Enhancements:
- Async get_current_servable_instance(). Converts the FastAPI dependency to async def, removing threadpool overhead and boosting performance: 35% higher RPS and reduced latency. #55457
- Access log optimization. Cached contexts in request path logging improved request throughput by ~16% with lower average latency. #55166
- Batching improvements. Default batch wait timeout increased from 0.0s to 0.01s (10 ms) to enable meaningful batching; see the sketch after this list. #55126
- HTTP receive refactor. Cleaned up handling of replica-side HTTP receive tasks. #54543 / #54565
- Configurable replica router backoff. Added knobs for retry/backoff control when routing to replicas. #54723
- Autoscaling ergonomics. Marked per-deployment autoscaling metrics push interval config as deprecated for consistency. #55102
- Health check & env var safety. Introduced warnings for invalid/zero/negative environment variable values, with migration path planned for Ray 2.50.0. #55464, #54944
- Improved CLI UX. serve config now prints No configuration was found. instead of an empty string. #54767
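To make the batching change above concrete, here is a small serve.batch example; if your workload depended on the previous 0.0 s default, pin batch_wait_timeout_s explicitly as shown.

```python
from typing import List

from ray import serve


@serve.deployment
class BatchedModel:
    # The default batch wait timeout is now 10 ms; set it explicitly if you
    # need a specific value (e.g. the old behavior of 0.0 s).
    @serve.batch(max_batch_size=8, batch_wait_timeout_s=0.01)
    async def handle_batch(self, inputs: List[str]) -> List[str]:
        # Process the whole batch at once and return one result per input.
        return [s.upper() for s in inputs]

    async def __call__(self, request) -> str:
        text = (await request.body()).decode()
        return await self.handle_batch(text)


app = BatchedModel.bind()
# serve.run(app) would deploy it locally.
```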
🔨 Fixes:
- Removed brittle ray._private dependency usage. #55659
- HTTP route test fixes. Migrated to get_application_url() to avoid hardcoded URLs, reducing flakiness on Windows. #55623, #54974, #54924, #54911, #54704, #54903, #54882, #54877, #54631, #53933
- Semaphore bug fix. Corrected race where more workers than allowed could acquire the semaphore. #55147
- LongPollClient cancellation. Prevented spurious cancellation of listen_for_change. #54832
- Backpressure error code. gRPC now returns RESOURCE_EXHAUSTED instead of UNAVAILABLE on overload. #54537
- Logging improvements. Added request IDs to proxy access logs; avoided duplicate shutdown log lines. #54657, #54534
- Test stability. Various waits, deflakes, and sync fixes across Serve tests. #54794, #54522, #54585
📖 Documentation:
- Unexpected queuing behavior. Documented quirks in handle request queuing. #54542
🏗 Architecture refactoring:
- Router/handle internals refactored for clarity and future feature expansion. #55635
- Model composition benchmarks. Added benchmarking to track performance of common composition patterns. #55549
- Constants refactor. Utility functions moved out of constants.py for better readability and stricter env var validation. #54944, #55464
- Ray internals migration. Moved usage, ray_option_utils, and selected constants from _private to _common. #54915, #54578
Ray Serve/Data LLM
🎉 New Features:
- Prefix cache-aware router with PrefixCacheAffinityRouter for optimized cache utilization. (#55218, #55588)
- Reset prefix cache remote method for dynamic cache management. (#55658)
- LMCacheConnectorV1 support for kv_transfer_config to enhance key-value transfer configurations. (#54579)
- LLMServer and LLMEngine major refactor for 100% vLLM serve frontend compatibility. (#54554)
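For the vLLM-compatible serving path, a minimal sketch of standing up an OpenAI-compatible app with ray.serve.llm is shown below; the configuration field names follow the Serve LLM docs but should be verified against the current API reference, and the model shown is just an example.

```python
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",                       # name exposed to clients
        model_source="Qwen/Qwen2.5-0.5B-Instruct",  # example HF model
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
    engine_kwargs=dict(max_model_len=8192),         # passed through to vLLM
)

app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```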
💫 Enhancements:
- vLLM engine upgrade to version 0.10.0 with improved performance and compatibility. (#55067)
- Enhanced error handling for invalid model_id parameters with clearer error messages. (#55589)
- Improved telemetry handling with better race condition management for push operations. (#55558)
- Optimized deployment defaults with better configuration values to prevent bottlenecks. (#54696)
- LoRA workflow improvements with refactored downloading and utility functions. (#54946)
- LLMServer refactor to synchronous initialization for better reliability. (#54835)
- Mistral tokenizer support for tekken tokenizer compatibility. (#54666)
- Smart batching logic that skips batching when batch_interval_ms == 0. (#54751)
- Dashboard enhancements with improved LLM metrics and monitoring capabilities. (#54797)
🔨 Fixes:
- Pyright linting corrections for Ray Serve LLM examples. (#55284)
- Test stability improvements for DeepSeek model and vLLM engine processor tests. (#55401, #55120)
- Serialization fixes for ChatCompletionRequest tool_calls ValidatorIterator objects. (#55538)
📖 Documentation:
- Prefix cache router documentation with comprehensive usage examples. (#55218)
- Multi-LoRA documentation improvements with clearer setup instructions. (#54788)
- STRICT_PACK strategy FAQ documentation explaining data.llm packing behavior. (#55505)
🏗 Architecture refactoring:
- Docker image optimizations with UCX and NCCL updates, plus GKE GPU operator compatibility paths. (#54598, #55206)
RLlib
🎉 New Features:
- Implemented Implicit Q-Learning (IQL). (#55304, #55422)
- DreamerV3 is now available in PyTorch. (#45463, #55140)
- Discrete actions support for SAC. (#53982)
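As a quick illustration of the new discrete-action support for SAC, here is a sketch using the standard RLlib config-builder pattern on a discrete-action environment; the iteration count and result keys are illustrative.

```python
from ray.rllib.algorithms.sac import SACConfig

# CartPole-v1 has a discrete action space, which SAC now supports.
config = SACConfig().environment("CartPole-v1").framework("torch")
algo = config.build()

for _ in range(3):
    result = algo.train()
    # Mean episode return on the new API stack (may be None early in training).
    print(result.get("env_runners", {}).get("episode_return_mean"))

algo.stop()
```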
💫 Enhancements:
- Upgraded RLlink protocol for external env/simulator training. (#53550)
- Performance improvements in Offline RL API through switching to iter_torch_batches. (#54277)
- Added an example for curriculum learning in Atari Pong. (#55304)
🔨 Fixes:
- Corrected TensorType handling. (#55694)
- Fixed a bug with multi-learner setups in Offline RL API. (#55693)
- Addressed ImportError in Atari examples. (#54967)
- Fixed some bugs in the docs for IQL and CQL. (#55614)
- Increased default timesteps on two experiments. (#54185)
- Fixed TorchMultiCategorical.to_deterministic when having a different number of categories and logits with a time dimension. (#54414)
- Added missing documentation for SACConfig's training(). (#53918)
- Fixed a bug in restore_from_path such that connector states are also restored on remote EnvRunners. (#54672)
- Fixed missing support for config.count_steps_by = "agent_steps". (#54885)
- Added missing colon to CUBLAS_WORKSPACE_CONFIG. (#53913)
- Removed rllib_contrib completely from RLlib. (#55182)
🏗 Architecture refactoring:
- Deprecated TensorFlow support from new API stack. (#55042)
- Deprecated input/output specs from RLModule. (#55141)
- Deprecated the --enable-new-api-stack flag from all scripts. (#54853, #54702)
Ray Core
💫 Enhancements:
- [core][gpu-objects] Garbage collection (#53911)
- [core] Support pip_install_options for pip (#53551)
- [core][gpu-objects] Move data transfers to a background thread (#54256)
- [core][gpu-objects] Pass tensor_transport to store_task_errors even if the actor task throws an exception (#55427)
- [core][gpu-objects] Exception handling for application errors (#55442)
- [core][gpu-object] Add a user-facing call to wait for tensor to be freed (#55076)
- [Core] Bind Ray internal servers to the specified node IP instead of 0.0.0.0, which improves security (#55178, #55210, #55298)
- [core] Fallback unserializable exceptions to their string representation (#55476)
🔨 Fixes:
- [core] Fix objects_valid check failure with except from BaseException (#55602)
- [core][gpu-objects] Avoid triggering a KeyError by the GPU object GC callback for intra-actor communication (#54556)
- [core] fix checking for uv existence during ray_runtime setup (#54141)
- [core][autoscaler][v1] add heartbeat timeout logic to determine node activity status (#54030)
- [core] prevent sending SIGTERM after calling Worker::MarkDead (#54377)
- [Core] Fixed the bug where the head was unable to submit tasks after Redis is turned on. (#54267)
- [Core] [Azure] query for supported Microsoft.Network/virtualNetworks API versions instead of relying on resource_client.DEFAULT_API_VERSION (#54874)
- [core] Fix possible race by checking node cache status instead of just subscription (#54745)
- [core] Fix get actor timeout multiplier (#54525)
- [core]: Use a temporary file to share default worker path in runtime env (#53653)
- [core] Fix check fail when task buffer periodical runner runs before RayEvent is initialized (#55249)
- [core] Patch grpc with RAY_num_grpc_threads to control grpc thread count (#54988)
- [core][gpu-objects] Always write to GPUObjectStore to avoid _get_tensor_meta() from hanging indefinitely. (#55433)
- [Core] Core Worker GetObjStatus GRPC Fault Tolerance (#54567)
📖 Documentation:
- Added guide on using type hints with Ray Core. (#55013)
Dashboard
💫 Enhancements:
- Grafana: new Operator filter for Data; Prometheus adds a RayNodeType label for nodes. (#55493, #55192)
🔨 Fixes:
- Removed references to a deleted Data metrics panel. (#55478)
Ray Images
💫 Enhancements:
- Upgraded protobuf to v4 (#54496)
Docs
💫 Enhancements:
- KubeRay docs: added InteractiveMode quick-start details; expanded Core type-hints guidance; added Serve LLM example coverage and a Data LLM batching FAQ. (#55570, #55284)
Thanks!
Thank you to everyone who contributed to this release!
@pavitrabhalla, @Daraan, @Sparks0219, @daiping8, @abrarsheikh, @sven1977, @Toshaksha, @bveeramani, @MengjinYan, @GokuMohandas, @codope, @nadongjun, @SolitaryThinker, @matthewdeng, @elliot-barn, @isimluk, @avibasnet31, @OneSizeFitsQuorum, @Future-Outlier, @marosset, @jackfrancis, @kshanmol, @eicherseiji, @dayshah, @iamjustinhsu, @Qiaolin-Yu, @goutamvenkat-anyscale, @Yicheng-Lu-llll, @yantarou, @rclough, @zcin, @NeilGirdhar, @VarunBhandary, @400Ping, @akshay-anyscale, @vickytsang, @xushiyan, @JasonLi1909, @n-elia, @simonsays1980, @dragongu, @Kishanthan, @ruisearch42, @jectpro7, @TimothySeah, @liulehui, @rueian, @HollowMan6, @akyang-anyscale, @axreldable, @czgdp1807, @alanwguo, @justinvyu, @ok-scale, @my-vegetable-has-exploded, @landscapepainter, @fscnick, @machichima, @mpashkovskii, @ZacAttack, @gvspraveen, @sword865, @lmsh7, @Ziy1-Tan, @rebel-scottlee, @sampan-s-nayak, @coqian, @can-anyscale, @Bye-legumes, @win5923, @MortalHappiness, @angelinalg, @khluu, @aslonnie, @krishnakalyan3, @minosvasilias, @x-tong, @xinyuangui2, @raulchen, @Yangruipis, @edoakes, @kevin85421, @wingkitlee0, @Fokko, @cristianjd, @srinathk10, @owenowenisme, @JoshKarpel, @MengqingCao, @leopardracer, @westonpace, @LeslieWongCV, @VassilisVassiliadis, @crypdick, @alexeykudinkin, @mjacar, @kunling-anyscale, @saihaj, @kouroshHakha, @ema-pe, @markjm, @avigyabb, @dshepelev15, @mauvilsa, @omatthew98, @nrghosh, @ryanaoleary, @Aydin-ab, @lk-chen, @stephanie-wang, @harshit-anyscale, @jjyao, @bullgom, @Yevet, @israbbani