Ray-2.49.0


Release Highlights

Ray Data:

  • We’ve implemented a variety of performance enhancements, including improved actor/node autoscaling with budget-aware decisions; faster/more accurate shuffle accounting; reduced Parquet metadata footprint; and out-of-order execution for higher throughput.
  • We’ve also added anti/semi join support, a stratify option for train_test_split, and Snowflake connectors (see the sketch below).
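
A minimal sketch of the new Data features called out above (anti/semi joins and stratified splits). The `stratify` argument and the join-type strings are assumptions taken from the PR titles, not verified signatures; check the Ray Data API reference for the exact forms.

```python
import ray

ds = ray.data.from_items([{"id": i, "label": i % 2} for i in range(100)])

# Stratified split (#54624): `stratify` is assumed to name the label column
# whose distribution should be preserved across the two splits.
train, test = ds.train_test_split(test_size=0.25, stratify="label")

# Anti/semi joins (#55272): the join-type strings and keyword names below are
# assumptions -- consult the Dataset.join reference for the exact values.
other = ray.data.from_items([{"id": i} for i in range(0, 100, 2)])
semi = ds.join(other, join_type="left_semi", num_partitions=4, on=("id",))
anti = ds.join(other, join_type="left_anti", num_partitions=4, on=("id",))
print(semi.count(), anti.count())
```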

Ray Core:

  • Performance and robustness cleanups around the GCS publish path and raylet internals; simpler OpenTelemetry flagging; a new user-facing API to wait for GPU tensors to be freed; plus assorted test/infra tidy-ups.

Ray Train:

  • We’ve introduced a new JaxTrainer with SPMD support for TPUs.
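
A rough sketch of what a JaxTrainer setup might look like, modeled on the other Ray Train trainers. The import path and any TPU-specific scaling options are assumptions, so check the Ray Train docs for the actual API.

```python
import jax
from ray.train import ScalingConfig
from ray.train.v2.jax import JaxTrainer  # import path is an assumption


def train_loop_per_worker():
    # Each Train worker runs this function; SPMD mesh/sharding setup and the
    # actual training step would go here.
    print("JAX devices on this worker:", jax.devices())


trainer = JaxTrainer(
    train_loop_per_worker=train_loop_per_worker,
    # TPU-specific fields (e.g. slice topology) are omitted; a plain worker
    # count is shown only to illustrate the trainer-style constructor.
    scaling_config=ScalingConfig(num_workers=4),
)
result = trainer.fit()
```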

Ray Serve:

  • Custom Autoscaling per Deployment: Serve now supports user-defined autoscaling policies via AutoscalingContext and AutoscalingPolicy, enabling fine-grained scaling logic at the deployment level (see the sketch after this list). This is part of a larger effort to support autoscaling on custom metrics in Serve; see the RFC for more details.
  • Async Inference (Initial Support): Ray Serve introduces asynchronous inference execution, laying the foundation for better throughput and latency in async workloads. Please see this RFC for more details.
  • Major Performance Gains: This release of Ray Serve brings double-digit percentage improvements in both throughput and latency. See the Ray Serve section below for details.
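
As a rough illustration of the per-deployment autoscaling hook described above, here is a hypothetical policy subclass. The import path, method name, and context fields are assumptions based only on the class names in the note; refer to the RFC and Serve docs for the real interface.

```python
# All names below are assumptions: the release note only tells us that
# AutoscalingContext and AutoscalingPolicy exist.
from ray.serve.autoscaling_policy import AutoscalingContext, AutoscalingPolicy


class QueueDepthPolicy(AutoscalingPolicy):
    """Hypothetical policy: roughly one replica per 10 queued requests."""

    def get_decision_num_replicas(self, ctx: AutoscalingContext) -> int:
        # Field names on the context are illustrative only.
        target = ctx.total_queued_requests // 10
        return max(ctx.min_replicas, min(ctx.max_replicas, target))


# Attaching the policy to a deployment's autoscaling config is described in
# the RFC and is not reproduced here.
```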

Ray Serve/Data LLM:

  • We’ve refactored Ray Serve LLM to be fully compatible with the default vllm serve frontend, and it now supports vLLM 0.10 (see the sketch below).
  • We’ve added a prefix cache-aware router (PrefixCacheAffinityRouter) for optimized cache utilization, remote methods to reset the prefix cache for dynamic cache management, and kv_transfer_config support in LMCacheConnectorV1.
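
For orientation, a minimal Serve LLM deployment using the OpenAI-compatible builder API that the vLLM-compatibility work targets. Model IDs and engine kwargs are placeholders, and wiring up PrefixCacheAffinityRouter is left to the Serve LLM docs.

```python
from ray import serve
from ray.serve.llm import LLMConfig, build_openai_app

llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="my-qwen",                        # name clients will call
        model_source="Qwen/Qwen2.5-0.5B-Instruct", # placeholder HF model
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
    engine_kwargs=dict(max_model_len=8192),        # passed through to vLLM
)

# Builds an OpenAI-compatible app and serves it.
app = build_openai_app({"llm_configs": [llm_config]})
serve.run(app)
```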

Ray Libraries

Ray Data

🎉 New Features:

  • Wrapped batch indices in a BatchMetadata object to make per-batch metadata explicit. (#55643)
  • Added support for Anti/Semi Join types. (#55272)
  • Introduced an Issue Detection Framework. (#55155)
  • Added an option to enable out-of-order execution for better performance. (#54504)
  • Introduced a StreamingSplit logical operator for DAG rewrite. (#54994)
  • Added a stratify parameter to train_test_split. (#54624)
  • Added Snowflake connectors. (#51429)
  • Updated Hudi integration to support incremental query. (#54301)
  • Added an Actor location tracker. (#54590)
  • Added BundleQueue.has_next. (#54710)
  • Made DEFAULT_OBJECT_STORE_MEMORY_LIMIT_FRACTION configurable. (#54873)
  • Added Expression support and a with_columns API (see the sketch after this list). (#54322)
  • Allocate GPU resources in ResourceManager. (#54445)
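
A small sketch of the expression support added in #54322. The `ray.data.expressions` import and the mapping form accepted by with_columns are assumptions based on the PR title; check the API reference for the exact signature.

```python
import ray
# `col`/`lit` helpers and the with_columns signature are assumptions.
from ray.data.expressions import col, lit

ds = ray.data.from_items([{"a": 1, "b": 2}, {"a": 3, "b": 4}])

# Assumed form: a mapping from new column name to an expression built
# from existing columns and literals.
ds2 = ds.with_columns({"a_plus_b": col("a") + col("b"), "one": lit(1)})
print(ds2.take_all())
```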

💫 Enhancements:

  • Decoupled actor and node autoscaling; autoscaling now also considers budget. (#55673, #54902)
  • Faster hash-shuffle resource usage calculation; more accurate shuffle progress totals. (#55503, #55543)
  • Reduced Parquet metadata storage usage. (#54821)
  • Export API improvements: refresh dataset/operator state, sanitize metadata, and truncate exported metadata. (#55355, #55379, #55216, #54623)
  • Metrics & observability: task metric improvements, external-buffer block-count metric, row-based metrics, clearer operator names in logs, single debug log when aggregators are ready. (#55429, #55022, #54693, #52949, #54483)
  • Dashboard: added “Max Bytes to Read” panel/budget, panels for blocks-per-task and bytes-per-block, and streaming executor duration. (#55024, #55020, #54614)
  • Planner/execution & infra cleanups: ExecutionResources and StatsManager cleanup, planner interface refactor, node trackers init, removed ray.get in _MapWorker ctor, removed target_shuffle_max_block_size. (#54694, #55400, #55018, #54665, #54734, #55158)
  • Behavior/interop tweaks: map_batches defaults to row_modification=False and avoids pushing past limit; limited operator pushdown; prefetch for PandasJSONDatasource; use cloudpickle for Arrow tensor extension ser/des; bumped Arrow to 21.0; schema warning tone change. (#54992, #54457, #54667, #54831, #55426, #54630)
  • Removed randomize-blocks reorder rule for more stable behavior. (#55278)

🔨 Fixes:

  • AutoscalingActorPool now properly downscales after execution. (#55565)
  • StatsManager handles StatsActor loss on disconnect. (#55163)
  • Handle missing chunks key when Databricks UC query returns zero rows. (#54526)
  • Handle empty fragments in sampling when num_row_groups=0. (#54822)
  • Restored handling of PyExtensionType to keep compatibility with previously written datasets. (#55498)
  • Prevent negative resource budget when concurrency exceeds the global limit; fixed resource-manager log calculation. (#54986, #54878)
  • Default write_parquet warning removed; handled unhashable types in OneHotEncoding. (#54864, #54863)
  • Overwrite mode now maps to the correct Arrow behavior for parallel writes. (#55118)
  • Added back from_daft Arrow-version checks. (#54907)
  • Pandas chained in-place assignment warning resolved. (#54486)
  • Test stability/infra: fixed flaky tests, adjusted bounds and sizes, added additional release tests/chaos variants for image workloads, increased join test size, adjusted sorting release test to produce 1 GB blocks. (#55485, #55489, #54806, #55120, #54716, #55402, #54971)

📖 Documentation:

  • Added a user guide for aggregations. (#53568)
  • Added a code snippet in docs for partitioned writes. (#55002)
  • Updated links to Lance documentation. (#54836)

Ray Train

🎉 New Features:

  • Introduced JaxTrainer with SPMD support on TPUs (#55207)

💫 Enhancements:

  • ray.train.get_dataset_shard now lazily configures dataset sharding for better startup behavior (#55230)
  • Clearer worker error logging (#55222)
  • Fail fast when placement group requirements can never be satisfied (#54402)
  • New ControllerError surfaced and handled via failure policy for improved resiliency (#54801, #54833)
  • TrainStateActor periodically checks controller health and aborts when necessary (#53818)

🔨 Fixes:

  • Resolve circular import in ray.train.v2.lightning.lightning_utils (#55668)
  • Fix XGBoost v2 callback behavior (#54787)
  • Suppress a spurious type error (#50994)
  • Reduce test flakiness: remove randomness and bump a data-integration test size (#55315, #55633)

📖 Documentation:

  • New LightGBMTrainer user guide (#54492)
  • Fix code-snippet syntax highlighting (#54909)
  • Minor correction in experiment-tracking guide comment (#54605)

🏗 Architecture refactoring:

  • Public Train APIs routed through TrainFnUtils for consistency (#55226)
  • LoggingManager utility for Train logging (#55121)
  • Converted DEFAULT variables from strings to bools (#55581)

Ray Tune

🎉 New Features:

  • Added video FPS support to WandbLoggerCallback (#53638)

💫 Enhancements:

  • Typing: reset_config now explicitly returns bool (#54581)
  • CheckpointManager supports recording scoring metric only (#54642)

🔨 Fixes:

  • Fix XGBoost v2 callback integration (#54787)
  • Correct type for RunConfig.progress_reporter (#48439)

Ray Serve

🎉 New Features:

  • Async inference support in Ray Serve (initial phase). Provides basic asynchronous inference execution, with follow-up work planned for failed/unprocessed queues and additional tests. #54824
  • Per-deployment custom autoscaling controls. Introduces AutoscalingContext and AutoscalingPolicy classes, enabling user-defined autoscaling strategies at the deployment level. #55253
  • Same event loop router. Adds option to run the Serve router in the same event loop as the proxy, yielding ~17% throughput improvement. #55030

💫 Enhancements:

  • Async get_current_servable_instance(). Converts the FastAPI dependency to async def, removing threadpool overhead and boosting performance: 35% higher RPS and reduced latency. #55457
  • Access log optimization. Cached contexts in request path logging improved request throughput by ~16% with lower average latency. #55166
  • Batching improvements. The default batch wait timeout increased from 0.0s to 0.01s (10 ms) so that batching is meaningful by default (see the sketch after this list). #55126
  • HTTP receive refactor. Cleaned up handling of replica-side HTTP receive tasks. #54543 / #54565
  • Configurable replica router backoff. Added knobs for retry/backoff control when routing to replicas. #54723
  • Autoscaling ergonomics. Marked per-deployment autoscaling metrics push interval config as deprecated for consistency. #55102
  • Health check & env var safety. Introduced warnings for invalid/zero/negative environment variable values, with migration path planned for Ray 2.50.0. #55464, #54944
  • Improved CLI UX. serve config now prints "No configuration was found." instead of an empty string. #54767
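
To show where the new 10 ms default applies, here is a minimal @serve.batch sketch; the decorator and its max_batch_size / batch_wait_timeout_s arguments are the long-standing Serve batching API, and the values are placeholders.

```python
from typing import List

from ray import serve


@serve.deployment
class BatchedModel:
    # Explicitly setting batch_wait_timeout_s keeps behavior stable across
    # Ray versions; 0.01 mirrors the new default described above.
    @serve.batch(max_batch_size=8, batch_wait_timeout_s=0.01)
    async def handle_batch(self, requests: List[str]) -> List[str]:
        # Process the whole batch at once and return one result per request.
        return [f"batch of {len(requests)}" for _ in requests]

    async def __call__(self, request) -> str:
        return await self.handle_batch("payload")


app = BatchedModel.bind()
```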

🔨 Fixes:

  • Removed brittle ray._private dependency usage. #55659
  • HTTP route test fixes. Migrated to get_application_url() to avoid hardcoded URLs, reducing flakiness on Windows. #55623, #54974, #54924, #54911, #54704, #54903, #54882, #54877, #54631, #53933
  • Semaphore bug fix. Corrected race where more workers than allowed could acquire the semaphore. #55147
  • LongPollClient cancellation. Prevented spurious cancellation of listen_for_change. #54832
  • Backpressure error code. gRPC now returns RESOURCE_EXHAUSTED instead of UNAVAILABLE on overload. #54537
  • Logging improvements. Added request IDs to proxy access logs; avoided duplicate shutdown log lines. #54657, #54534
  • Test stability. Various waits, deflakes, and sync fixes across Serve tests. #54794, #54522, #54585

📖 Documentation:

  • Unexpected queuing behavior. Documented quirks in handle request queuing. #54542

🏗 Architecture refactoring:

  • Router/handle internals refactored for clarity and future feature expansion. #55635
  • Model composition benchmarks. Added benchmarking to track performance of common composition patterns. #55549
  • Constants refactor. Utility functions moved out of constants.py for better readability and stricter env var validation. #54944, #55464
  • Ray internals migration. Moved usage, ray_option_utils, and selected constants from _private to _common. #54915, #54578

Ray Serve/Data LLM

🎉 New Features:

  • Prefix cache-aware router with PrefixCacheAffinityRouter for optimized cache utilization. (#55218, #55588)
  • Reset prefix cache remote method for dynamic cache management. (#55658)
  • LMCacheConnectorV1 support for kv_transfer_config to enhance key-value transfer configurations. (#54579)
  • LLMServer and LLMEngine major refactor for 100% vLLM serve frontend compatibility. (#54554)

💫 Enhancements:

  • vLLM engine upgrade to version 0.10.0 with improved performance and compatibility. (#55067)
  • Enhanced error handling for invalid model_id parameters with clearer error messages. (#55589)
  • Improved telemetry handling with better race condition management for push operations. (#55558)
  • Optimized deployment defaults with better configuration values to prevent bottlenecks. (#54696)
  • LoRA workflow improvements with refactored downloading and utility functions. (#54946)
  • LLMServer refactor to synchronous initialization for better reliability. (#54835)
  • Mistral tokenizer support for tekken tokenizer compatibility. (#54666)
  • Smart batching logic that skips batching when batch_interval_ms == 0. (#54751)
  • Dashboard enhancements with improved LLM metrics and monitoring capabilities. (#54797)

🔨 Fixes:

  • Pyright linting corrections for Ray Serve LLM examples. (#55284)
  • Test stability improvements for DeepSeek model and vLLM engine processor tests. (#55401, #55120)
  • Serialization fixes for ChatCompletionRequest tool_calls ValidatorIterator objects. (#55538)

📖 Documentation:

  • Prefix cache router documentation with comprehensive usage examples. (#55218)
  • Multi-LoRA documentation improvements with clearer setup instructions. (#54788)
  • STRICT_PACK strategy FAQ documentation explaining data.llm packing behavior. (#55505)

🏗 Architecture refactoring:

  • Docker image optimizations with UCX and NCCL updates, plus GKE GPU operator compatibility paths. (#54598, #55206)

RLlib

🎉 New Features:

  • Implemented Implicit Q-Learning (IQL). (#55304, #55422)
  • DreamerV3 is now available in PyTorch. (#45463, #55140)
  • Discrete actions support for SAC. (#53982)
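
A minimal sketch of pointing SAC at a discrete-action environment now that discrete actions are supported. Hyperparameters are illustrative, and build_algo() assumes the new API stack (use build() on older configs).

```python
from ray.rllib.algorithms.sac import SACConfig

config = (
    SACConfig()
    .environment("CartPole-v1")  # discrete action space
    .training(gamma=0.99)        # placeholder hyperparameter, not tuned
)
algo = config.build_algo()       # config.build() on older RLlib versions
print(algo.train())
```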

💫 Enhancements:

  • Upgraded RLlink protocol for external env/simulator training. (#53550)
  • Performance improvements in Offline RL API through switching to iter_torch_batches. (#54277)
  • Added an example for curriculum learning in Atari Pong. (#55304)

🔨 Fixes:

  • Corrected TensorType handling. (#55694)
  • Fixed a bug with multi-learner setups in Offline RL API. (#55693)
  • Addressed ImportError in Atari examples. (#54967)
  • Fixed some bugs in the docs for IQL and CQL. (#55614)
  • Increased default timesteps on two experiments. (#54185)
  • Fixed TorchMultiCategorical.to_deterministic when the number of categories differs and the logits include a time dimension. (#54414)
  • Added missing documentation for SACConfig's training(). (#53918)
  • Fixed bug in restore_from_path such that connector states are also restored on remote EnvRunners. (#54672)
  • Fixed missing support for config.count_steps_by = "agent_steps". (#54885)
  • Added missing colon to CUBLAS_WORKSPACE_CONFIG. (#53913)
  • Removed rllib_contrib completely from RLlib. (#55182)

🏗 Architecture refactoring:

  • Deprecated TensorFlow support in the new API stack. (#55042)
  • Deprecated input/output specs in RLModule. (#55141)
  • Deprecated the --enable-new-api-stack flag in all scripts. (#54853, #54702)

Ray Core

📖 Documentation:

  • Added guide on using type hints with Ray Core. (#55013)
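
The guide covers patterns like the following; this sketch just shows standard Python type hints on a remote function.

```python
import ray


@ray.remote
def add(a: int, b: int) -> int:
    return a + b


# add.remote(...) returns an object ref; ray.get() gives back the annotated int.
ref = add.remote(1, 2)
total: int = ray.get(ref)
assert total == 3
```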

🏗 Architecture refactoring:

  • Migrated metric collection from OpenCensus to OpenTelemetry. (#53098, #53740)

Dashboard

💫 Enhancements:

  • Grafana: new Operator filter for Data dashboards; Prometheus now adds a RayNodeType label for nodes. (#55493, #55192)

🔨 Fixes:

  • Removed references to a deleted Data metrics panel. (#55478)

Ray Images

💫 Enhancements:

  • Upgraded protobuf to v4 (#54496)

Docs

💫 Enhancements:

  • Docs updates: added KubeRay InteractiveMode quick-start details, expanded Core type-hints guidance, Serve LLM example coverage, and a Data LLM batching FAQ. (#55570, #55284)

🔨 Fixes:

  • Various formatting, syntax-highlighting, and lint fixes across Train/Tune/Serve LLM docs. (#55284, #54763)

Thanks!

Thank you to everyone who contributed to this release!
@pavitrabhalla, @Daraan, @Sparks0219, @daiping8, @abrarsheikh, @sven1977, @Toshaksha, @bveeramani, @MengjinYan, @GokuMohandas, @codope, @nadongjun, @SolitaryThinker, @matthewdeng, @elliot-barn, @isimluk, @avibasnet31, @OneSizeFitsQuorum, @Future-Outlier, @marosset, @jackfrancis, @kshanmol, @eicherseiji, @dayshah, @iamjustinhsu, @Qiaolin-Yu, @goutamvenkat-anyscale, @Yicheng-Lu-llll, @yantarou, @rclough, @zcin, @NeilGirdhar, @VarunBhandary, @400Ping, @akshay-anyscale, @vickytsang, @xushiyan, @JasonLi1909, @n-elia, @simonsays1980, @dragongu, @Kishanthan, @ruisearch42, @jectpro7, @TimothySeah, @liulehui, @rueian, @HollowMan6, @akyang-anyscale, @axreldable, @czgdp1807, @alanwguo, @justinvyu, @ok-scale, @my-vegetable-has-exploded, @landscapepainter, @fscnick, @machichima, @mpashkovskii, @ZacAttack, @gvspraveen, @sword865, @lmsh7, @Ziy1-Tan, @rebel-scottlee, @sampan-s-nayak, @coqian, @can-anyscale, @Bye-legumes, @win5923, @MortalHappiness, @angelinalg, @khluu, @aslonnie, @krishnakalyan3, @minosvasilias, @x-tong, @xinyuangui2, @raulchen, @Yangruipis, @edoakes, @kevin85421, @wingkitlee0, @Fokko, @cristianjd, @srinathk10, @owenowenisme, @JoshKarpel, @MengqingCao, @leopardracer, @westonpace, @LeslieWongCV, @VassilisVassiliadis, @crypdick, @alexeykudinkin, @mjacar, @kunling-anyscale, @saihaj, @kouroshHakha, @ema-pe, @markjm, @avigyabb, @dshepelev15, @mauvilsa, @omatthew98, @nrghosh, @ryanaoleary, @Aydin-ab, @lk-chen, @stephanie-wang, @harshit-anyscale, @jjyao, @bullgom, @Yevet, @israbbani
