Highlights
- This release features new modules in Ray Serve and Ray Data for integration with large language models, marking the first step of addressing #50639. Existing Ray Data and Ray Serve have limited support for LLM deployments, where users have to manually configure and manage the underlying LLM engine. In this release, we offer APIs for both batch inference and serving of LLMs within Ray in `ray.data.llm` and `ray.serve.llm`. See the notes below for more details.
- Ray Train V2 is available to try starting in Ray 2.43! Run your next Ray Train job with the `RAY_TRAIN_V2_ENABLED=1` environment variable. See the migration guide for more information.
- A new integration with `uv run` makes it easy to specify Python dependencies for both the driver and workers in a consistent way and enables quick iteration when developing Ray applications (#50160, #50462). Check out our blog post for details.
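As a rough illustration of the `uv run` workflow, a driver script can declare its dependencies with inline script metadata and be launched directly with `uv run`; the script below is a minimal sketch (the `emoji` dependency and the task body are placeholders), and the blog post covers the exact setup for propagating the same environment to workers.

```python
# driver.py -- launch with: uv run driver.py
# /// script
# dependencies = ["ray", "emoji"]
# ///
import emoji
import ray


@ray.remote
def hello() -> str:
    # With the uv integration, workers can run with the same dependencies
    # that `uv run` resolved for the driver.
    return emoji.emojize("Ray and uv :thumbs_up:")


ray.init()
print(ray.get(hello.remote()))
```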
Ray Libraries
Ray Data
🎉 New Features:
- Ray Data LLM: We are introducing a new module in Ray Data for batch inference with LLMs. It offers a new `Processor` abstraction that interoperates with existing Ray Data pipelines. This abstraction can be configured in two ways (a minimal usage sketch follows this list):
  - Using the `vLLMEngineProcessorConfig`, which configures vLLM to load model replicas for high-throughput model inference
  - Using the `HttpRequestProcessorConfig`, which sends HTTP requests to an OpenAI-compatible endpoint for inference
  - Documentation for these features can be found here.
- Implement accurate memory accounting for `UnionOperator` (#50436)
- Implement accurate memory accounting for all-to-all operations (#50290)
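A rough sketch of the vLLM-backed path is shown below; the model name, sampling parameters, and the preprocess/postprocess field names are illustrative, so consult the documentation for the exact configuration options.

```python
import ray
from ray.data.llm import vLLMEngineProcessorConfig, build_llm_processor

# Configure a vLLM-backed processor; the model and settings are placeholders.
config = vLLMEngineProcessorConfig(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    concurrency=1,
    batch_size=64,
)

processor = build_llm_processor(
    config,
    # Map each input row into chat messages plus sampling parameters.
    preprocess=lambda row: dict(
        messages=[{"role": "user", "content": row["prompt"]}],
        sampling_params=dict(temperature=0.3, max_tokens=64),
    ),
    # Keep only the generated text in the output rows.
    postprocess=lambda row: dict(answer=row["generated_text"]),
)

ds = ray.data.from_items([{"prompt": "Write a haiku about distributed systems."}])
ds = processor(ds)
ds.show(limit=1)
```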
💫 Enhancements:
- Support class constructor args for filter() (#50245)
- Persist ParquetDatasource metadata. (#50332)
- Rebasing `ShufflingBatcher` onto `try_combine_chunked_columns` (#50296)
- Improve warning message if required dependency isn't installed (#50464)
- Move data-related test logic out of core tests directory (#50482)
- Pass executor as an argument to ExecutionCallback (#50165)
- Add operator id info to task+actor (#50323)
- Abstracting common methods, removing duplication in `ArrowBlockAccessor` and `PandasBlockAccessor` (#50498)
- Warn if map UDF is too large (#50611)
- Replace `AggregateFn` with `AggregateFnV2`, cleaning up aggregation infrastructure (#50585)
- Simplify Operator.repr (#50620)
- Adding in `TaskDurationStats` and `on_execution_step` callback (#50766)
- Print Resource Manager stats in release tests (#50801)
🔨 Fixes:
- Fix invalid escape sequences in `grouped_data.py` docstrings (#50392)
- Deflake `test_map_batches_async_generator` (#50459)
- Avoid memory leak with `pyarrow.infer_type` on datetime arrays (#50403)
- Fix parquet partition cols to support tensor types (#50591)
- Fixing aggregation protocol to be appropriately associative (#50757)
📖 Documentation:
- Remove "Stable Diffusion Batch Prediction with Ray Data" example (#50460)
Ray Train
🎉 New Features:
- Ray Train V2 is available to try starting in Ray 2.43! Run your next Ray Train job with the
RAY_TRAIN_V2_ENABLED=1
environment variable. See the migration guide for more information.
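Opting in only requires setting the environment variable before launching an existing Ray Train script; a minimal sketch (the training function and worker count are illustrative):

```python
# Run with: RAY_TRAIN_V2_ENABLED=1 python train_v2_example.py
import ray.train
from ray.train.torch import TorchTrainer


def train_func():
    # Distributed training loop goes here; metric reporting works as before.
    ray.train.report({"loss": 0.0})


trainer = TorchTrainer(
    train_func,
    scaling_config=ray.train.ScalingConfig(num_workers=2),
)
result = trainer.fit()
print(result.metrics)
```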
💫 Enhancements:
- Add a training ingest benchmark release test (#50019, #50299) with a fault tolerance variant (#50399)
- Add telemetry for Trainer usage in V2 (#50321)
- Add pydantic as a `ray[train]` extra install (#46682)
- Add state tracking to Train V2 to make run status, run attempts, and training worker metadata observable (#50515)
🔨 Fixes:
- Increase doc test parallelism (#50326)
- Disable TF test for py312 (#50382)
- Increase test timeout to deflake (#50796)
📖 Documentation:
- Add missing xgboost pip install in example (#50232)
🏗 Architecture refactoring:
- Add deprecation warnings pointing to a migration guide for Ray Train V2 (#49455, #50101, #50322)
- Refactor internal Train controller state management (#50113, #50181, #50388)
Ray Tune
🔨 Fixes:
- Fix worker node failure test (#50109)
📖 Documentation:
- Update all doc examples off of ray.train imports (#50458)
- Update all ray/tune/examples off of ray.train imports (#50435)
- Fix typos in persistent storage guide (#50127)
- Remove Binder notebook links in Ray Tune docs (#50621)
🏗 Architecture refactoring:
- Update RLlib to use ray.tune imports instead of ray.air and ray.train (#49895)
Ray Serve
🎉 New Features:
- Ray Serve LLM: We are introducing a new module in Ray Serve to easily integrate open source LLMs into your Ray Serve deployments. This opens up the powerful capability of composing complex applications with multiple LLMs, a use case in emerging applications like agentic workflows. Ray Serve LLM offers a couple of core components, including:
  - `VLLMService`: A prebuilt deployment that offers a full-featured vLLM engine integration, with support for features such as LoRA multiplexing and multimodal language models.
  - `LLMRouter`: An out-of-the-box OpenAI-compatible model router that can route across multiple LLM deployments.
  - Documentation can be found at https://docs.ray.io/en/releases-2.43.0/serve/llm/overview.html
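A rough sketch of wiring these components together is shown below; the module paths, config fields, and model shown are assumptions based on the components named above, so consult the linked documentation for the exact API.

```python
from ray import serve
from ray.serve.llm.configs import LLMConfig
from ray.serve.llm.deployments import VLLMService, LLMRouter

# Describe the model to serve; the model id/source and autoscaling
# settings here are placeholders.
llm_config = LLMConfig(
    model_loading_config=dict(
        model_id="qwen-0.5b",
        model_source="Qwen/Qwen2.5-0.5B-Instruct",
    ),
    deployment_config=dict(
        autoscaling_config=dict(min_replicas=1, max_replicas=2),
    ),
)

# One vLLM-backed deployment per model, fronted by an OpenAI-compatible router.
vllm_deployment = VLLMService.as_deployment(
    llm_config.get_serve_options(name_prefix="VLLM:")
).bind(llm_config)
app = LLMRouter.as_deployment().bind([vllm_deployment])
serve.run(app)
```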
💫 Enhancements:
- Add `required_resources` to REST API (#50058)
🔨 Fixes:
- Fix batched requests hanging after cancellation (#50054)
- Properly propagate backpressure error (#50311)
RLlib
🎉 New Features:
- Added env vectorization support for multi-agent (new API stack). (#50437)
💫 Enhancements:
- APPO/IMPALA: various acceleration efforts; reached 100k ts/sec on the Atari benchmark with 400 EnvRunners and 16 (multi-node) GPU Learners (#50760, #50162, #50249, #50353, #50368, #50379, #50440, #50477, #50527, #50528, #50600, #50309)
- Offline RL:
🔨 Fixes:
- Fix SPOT preemption tolerance for large AlgorithmConfig: pass by reference to RolloutWorker (#50688)
- Fix `on_workers/env_runners_recreated` callback being called twice (#50172)
- Fix `default_resource_request`: aggregator actors missing in placement group for local Learner (#50219, #50475)
📖 Documentation:
- Docs re-do (new API stack):
Ray Core and Ray Clusters
Ray Core
💫 Enhancements:
- [Core] Enable users to configure python standard log attributes for structured logging (#49871)
- [Core] Prestart worker with runtime env (#49994)
- [compiled graphs] Support experimental_compile(_default_communicator=comm) (#50023)
- [Core] ray.util.Queue Empty and Full exceptions extend queue.Empty and queue.Full (#50261) (see the example after this list)
- [Core] Initial port of Ray to Python 3.13 (#47984)
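For the `ray.util.Queue` change above, code that catches the standard-library queue exceptions should now also work against Ray's distributed queue; a small sketch (the timeout and items are arbitrary):

```python
import queue

import ray
from ray.util.queue import Queue

ray.init()
q = Queue(maxsize=1)
q.put("item")

try:
    q.put_nowait("second item")     # queue is full
except queue.Full:
    print("caught standard queue.Full")

q.get()
try:
    q.get(block=True, timeout=0.5)  # queue is now empty
except queue.Empty:
    print("caught standard queue.Empty")
```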
🔨 Fixes:
- [Core] Ignore stale ReportWorkerBacklogRequest (#50280)
- [Core] Fix check failure due to negative available resource (#50517)
Ray Clusters
📖 Documentation:
- Update the KubeRay docs to v1.3.0.
Ray Dashboard
🎉 New Features:
- Additional filters for job list page (#50283)
Thanks
Thank you to everyone who contributed to this release! 🥳
@liuxsh9, @justinrmiller, @CheyuWu, @400Ping, @scottsun94, @bveeramani, @bhmiller, @tylerfreckmann, @hefeiyun, @pcmoritz, @matthewdeng, @dentiny, @erictang000, @gvspraveen, @simonsays1980, @aslonnie, @shorbaji, @LeoLiao123, @justinvyu, @israbbani, @zcin, @ruisearch42, @khluu, @kouroshHakha, @sijieamoy, @SergeCroise, @raulchen, @anson627, @bluenote10, @allenyin55, @martinbomio, @rueian, @rynewang, @owenowenisme, @Betula-L, @alexeykudinkin, @crypdick, @jujipotle, @saihaj, @EricWiener, @kevin85421, @MengjinYan, @chris-ray-zhang, @SumanthRH, @chiayi, @comaniac, @angelinalg, @kenchung285, @tanmaychimurkar, @andrewsykim, @MortalHappiness, @sven1977, @richardliaw, @omatthew98, @fscnick, @akyang-anyscale, @cristianjd, @Jay-ju, @spencer-p, @win5923, @wxsms, @stfp, @letaoj, @JDarDagran, @jjyao, @srinathk10, @edoakes, @vincent0426, @dayshah, @davidxia, @DmitriGekhtman, @GeneDer, @HYLcool, @gameofby, @can-anyscale, @ryanaoleary, @eddyxu