ray-project/kuberay v1.5.0

Highlights

Ray Label Selector API

Ray v2.49 introduced a label selector API. Correspondingly, KubeRay v1.5 now features a top-level API for defining Ray labels and resources. This new top-level API is the preferred method going forward, replacing the previous practice of setting labels and custom resources within rayStartParams.

The new API will be consumed by the Ray autoscaler, improving autoscaling decisions based on task and actor label selectors. Furthermore, labels configured through this API are mirrored directly into the Pods. This mirroring allows users to more seamlessly combine Ray label selectors with standard Kubernetes label selectors when managing and interacting with their Ray clusters.

You can use the new API in the following way:

apiVersion: ray.io/v1
kind: RayCluster
spec:
  ...
  headGroupSpec:
    rayStartParams: {}
    resources:
      Custom1: "1"
    labels:
      ray.io/zone: us-west-2a
      ray.io/region: us-west-2
  workerGroupSpecs:
  - replicas: 1
    rayStartParams: {}
    resources:
      Custom1: "1"
    labels:
      ray.io/zone: us-west-2a
      ray.io/region: us-west-2
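
Because these labels are mirrored onto the Pods, standard Kubernetes label selectors can target them directly. Below is a minimal sketch of a Service that selects the Pods in one zone; the Service name and port are illustrative, not part of the KubeRay API:

apiVersion: v1
kind: Service
metadata:
  name: ray-us-west-2a
spec:
  # Standard Kubernetes label selector matching a Ray label that KubeRay
  # mirrors onto the Pods (value taken from the example above).
  selector:
    ray.io/zone: us-west-2a
  ports:
  - name: serve
    port: 8000
    targetPort: 8000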

RayJob Sidecar submission mode

The RayJob resource now supports a new value for spec.submissionMode called SidecarMode.
Sidecar mode directly addresses a key limitation of both K8sJobMode and HttpMode: both require network connectivity to the Ray cluster from an external Pod or from the KubeRay operator in order to submit the job. With Sidecar mode, job submission is orchestrated by a sidecar container that KubeRay injects into the Head Pod. This eliminates the need for an external client to handle the submission process and reduces job submission failures caused by network issues.

To use this feature, set spec.submissionMode to SidecarMode in your RayJob:

apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: my-rayjob
spec:
  submissionMode: "SidecarMode"
  ...

Advanced deletion policies for RayJob

KubeRay now supports a more advanced and flexible API for expressing deletion policies within the RayJob specification. This new design moves beyond the single boolean field spec.shutdownAfterJobFinishes and allows users to define different cleanup strategies with configurable TTL values based on the Ray job's status.

This API unlocks new use cases that require specific resource retention after a job completes or fails. For example, users can now implement policies that:

  • Preserve only the Head Pod for a set duration after job failure to facilitate debugging.
  • Retain the entire Ray Cluster for a longer TTL after a successful run for post-analysis or data retrieval.

By linking specific TTLs to Ray job statuses (e.g., success, failure) and deletion policies (e.g., DeleteWorkers, DeleteCluster, DeleteSelf), users gain fine-grained control over resource cleanup and cost management.

Below is an example of how to use this new, flexible API structure:

apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-deletion-rules
spec:
  deletionStrategy:
    deletionRules:
    - policy: DeleteWorkers
      condition:
        jobStatus: FAILED
        ttlSeconds: 100
    - policy: DeleteCluster
      condition:
        jobStatus: FAILED
        ttlSeconds: 600
    - policy: DeleteCluster
      condition:
        jobStatus: SUCCEEDED
        ttlSeconds: 0

This feature is disabled by default and requires enabling the RayJobDeletionPolicy feature gate.
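
Feature gates are configured on the KubeRay operator. Below is a minimal sketch using the kuberay-operator Helm chart values; the featureGates structure is an assumption, so check the values.yaml of the chart version you install:

# kuberay-operator Helm chart values (sketch; verify against your chart
# version). The same pattern applies to the other gates mentioned in these
# notes, such as RayServiceIncrementalUpgrade and RayMultiHostIndexing.
featureGates:
  - name: RayJobDeletionPolicy
    enabled: true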

Incremental upgrade support for RayService

KubeRay v1.5 introduces support for zero-downtime incremental upgrades of RayService. This new feature improves the upgrade process by leveraging the Gateway API and Ray autoscaling to incrementally migrate user traffic from the existing Ray cluster to the newly upgraded one.

This approach is more efficient and reliable than the previous mechanism, which required creating the upgraded Ray cluster at full capacity and then shifting all traffic at once, potentially causing disruptions and unnecessary resource usage. By contrast, the incremental approach gradually scales up the new cluster and migrates traffic in smaller, controlled steps, improving stability and resource utilization during the upgrade.

To enable this feature, set the following fields in RayService:

apiVersion: ray.io/v1
kind: RayService
metadata:
  name: example-rayservice
spec:
  upgradeStrategy:
    type: "NewClusterWithIncrementalUpgrade"
    clusterUpgradeOptions:
      maxSurgePercent: 40
      stepSizePercent: 5
      intervalSeconds: 10
      gatewayClassName: "cluster-gateway"

This feature is disabled by default and requires enabling the RayServiceIncrementalUpgrade feature gate.
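
The gatewayClassName above must reference a GatewayClass provided by a Gateway API implementation installed in the cluster. Below is a minimal sketch; the controllerName is implementation-specific and purely illustrative:

apiVersion: gateway.networking.k8s.io/v1
kind: GatewayClass
metadata:
  name: cluster-gateway
spec:
  # Set this to the controller of whichever Gateway API implementation you
  # run; the value here is illustrative only.
  controllerName: example.com/gateway-controller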

Improved multi-host support for RayCluster

Previous KubeRay versions supported multi-host worker groups via the numOfHosts API, but this support lacked fundamental capabilities required for managing multi-host accelerators. First, there was no logical grouping of worker Pods belonging to the same multi-host unit (or slice), so it was not possible to run operations like “replace all workers in this group”. In addition, there was no ordered indexing, which is often required for coordinating multi-host workers when using TPUs.

When using multi-host worker groups in KubeRay v1.5, KubeRay automatically sets the following labels on each multi-host Ray worker Pod:

labels:
  ray.io/worker-group-replica-name: tpu-group-af03de
  ray.io/worker-group-replica-index: "0"
  ray.io/replica-host-index: "1"

Below is a description of each label and its purpose:

  • ray.io/worker-group-replica-name: a unique identifier for each replica (i.e., host group or slice) in a worker group. This label enables KubeRay to rediscover all other Pods in the same replica and apply group-level operations.
  • ray.io/worker-group-replica-index: an ordered index of the replica within the worker group. This label is particularly important for cases like multi-slice TPUs, where each slice must be aware of its slice index.
  • ray.io/replica-host-index: an ordered index of the host within its replica (host group or slice).

These changes collectively enable reliable, production-level scaling and management of multi-host GPU workers or TPU slices.
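
For reference, a multi-host worker group is still configured through numOfHosts. Below is a minimal sketch that would produce the labels shown above on each worker Pod; the group name and sizes are illustrative:

apiVersion: ray.io/v1
kind: RayCluster
spec:
  ...
  workerGroupSpecs:
  - groupName: tpu-group   # illustrative name
    replicas: 2            # two multi-host replicas (slices)
    numOfHosts: 4          # four worker Pods per replica
    rayStartParams: {}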

This feature is disabled by default and requires enabling the RayMultiHostIndexing feature gate.

Breaking Changes

For RayCluster objects created by a RayJob, KubeRay will no longer attempt to recreate the Head Pod if it fails or is deleted after its initial successful provisioning. To retry failed jobs, use spec.backoffLimit, which results in KubeRay provisioning a new RayCluster.
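
A minimal sketch of a RayJob that allows retries this way (the name and limit are illustrative):

apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: my-rayjob
spec:
  # Each retry provisions a fresh RayCluster rather than reusing a cluster
  # whose Head Pod has failed.
  backoffLimit: 2
  ...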

CHANGELOG
