github ray-project/kuberay v1.5.0-rc.0

Changelog

  • d59cbd8 Fix rayClusterScaleExpectation deletion to use request object when instance is nil (#4039)
  • 480e128 Inject the --block option to ray start command automatically (#932)
  • 7850773 Remove ray-cluster.without-block.yaml (#675)
  • 38ac168 [Telemetry] Inject env identifying KubeRay (#562)
  • 97425fa AGC gateway api example (#4076)
  • dc203e2 Add DeepSeek example RayService (#3838)
  • 7c0aa63 Add FAQ page (#1150)
  • dcb97ce Add Grafana Dashboard for KubeRay Operator (#3676)
  • bf7e497 Add Helm chart unit tests to ray-cluster (#3374)
  • 113909f Add Helm chart unittests to CI (#3280)
  • d0e8b57 Add KubeRay e2e Test for custom idleTimeoutSeconds with v2 Autoscaler (#2725)
  • 6818a08 Add KubeRay related blogs (#1147)
  • 042e6b4 Add NumOfHosts to RayCluster helm-chart template (#1969)
  • 5adc91a Add NumOfHosts to WorkerGroupSpec (CRD change only) (#1834)
  • 80a6d58 Add Ray cluster spec for TPU pods (#1292)
  • abb5291 Add RayCluster YAML for verl example (#3833)
  • f232b5b Add RayClusterProvisioned Condition Type (#2301)
  • cbaf5d7 Add RayClusterReady Condition Type (#2271)
  • 732453e Add RayJob training example using pytorch resnet image classifier (#2107)
  • a0afea2 Add RayService Manifests for Stable Diffusion TPU Examples (#2198)
  • 753dc05 Add RayService sample test (#1377)
  • c835117 Add TPU to Known Custom Accelerators for generated rayStartCommand (#2495)
  • f05fa2e Add ray.io/originated-from labels (#1830)
  • 87407ac Add a document for profiling (#1299)
  • 08792ca Add a document to outline the default settings for rayStartParams in Kuberay (#1057)
  • b92d95a Add a flag to enable/disable worker init container injection (#1069)
  • 8851088 Add a grouping for 'google.golang.org/*' to avoid inconsistency between sub-projects (#3470)
  • ceb9f01 Add a sample RayJob to fine-tune a PyTorch lightning text classifier (#1891)
  • a851490 Add a test util function for killing the head Pod and wait (#3890)
  • 073de1f Add a util function to convert string and bytes array (#2621)
  • 3e20a9d Add a variant of the ray data processing job with GCSFuse CSI driver (#2401)
  • 7fe4050 Add a warning to discourage users from launching a KubeRay-incompatible autoscaler. (#1102)
  • 8b61b73 Add all and worker node type to kubectl ray log (#2442)
  • 966d9b3 Add apply configurations to generated client (#1818)
  • ef0129e Add basic Helm chart unittests for kuberay-operator (#3253)
  • cd239ab Add basic e2e test for kubectl plugin (#2287)
  • e1edb4c Add batch-scheduler option, deprecate enable-batch-scheduler option (#2300)
  • e330c03 Add common containerEnv section to Helm Chart (#1932)
  • 70ef243 Add consistency check for deepcopy generated files (#1127)
  • 36267ed Add dashboard component to master (#3566)
  • cb2914d Add deletecollection for multi-namespace role (#2) (#2231)
  • 8bb3222 Add dependabot.yml for enabling "Dependabot version updates" (#3357)
  • 9a0b9d0 Add dnsConfig to head, worker and additional workers (#2377)
  • 80ce664 Add documentation for API Server monitoring (#1479)
  • 0bd28e7 Add documentations for the release process of Helm charts (#723)
  • 7f3fe8b Add e2e KubeRay operator upgrade test (#3060)
  • e9f3155 Add e2e test for kubectl ray job submit (#2614)
  • 4fc48ce Add e2e test make sure resource quota error is surfaced (#3087)
  • ff45923 Add end to end tests to apiserver (#1460)
  • 4b0f7cb Add env and patch permission. (#740)
  • 1bcfa9e Add env variable comment to kuberay-operator
  • d93c3c9 Add example and tutorial to explain how to create custom metrics for Prometheus (#914)
  • a34a42a Add flag leader-election-namespace (#1624)
  • a2ebc61 Add gofumpt instructions from internal doc (#1180)
  • 044008d Add instruction to skip unit tests in DEVELOPMENT.md (#1171)
  • abafd17 Add kubectl plugin with basic command and deprecate cli (#2243)
  • 3e68606 Add kubectl ray cluster log command (#2296)
  • 12babc8 Add kubectl ray create cluster (#2607)
  • 61a282f Add kubectl ray delete rayservice/job/cluster (#2635)
  • 8c64e60 Add kubectl-plugin pre-commit (#2255)
  • 11c75ea Add kuberay operator servicemonitor (#3717)
  • 25d5568 Add kubernetes dependency in python client library (#998)
  • c8f826b Add kubernetes event to inform user of upgrade strategy (#2592)
  • 106f8fd Add missing labels on RayCluster TPU manifests (#1987)
  • 4a12d78 Add more grouping to resolve inconsistencies when bumping versions (#3554)
  • 7856027 Add rayVersion in the RayCluster chart (#975)
  • 4021766 Add rayjob yaml generation to ray job submit command (#2644)
  • d22d752 Add release command and guidance for KubeRay cli (#834)
  • e9544fc Add reminders to avoid RBAC synchronization bug (#576)
  • 08da595 Add seccompProfile to KubeRay operator deployment for PSS compliance (#3931)
  • 522807d Add seccompProfile.type=RuntimeDefault to kuberay-operator. (#1955)
  • b5b4232 Add structured config and default sidecar container configuration (#1822)
  • 224a444 Add support for openshift routes (#1183)
  • 43ed246 Add support for parsing neuron core resource limit and pass it as ray… (#2409)
  • 2de3fe5 Add support for pvcs to apiserver (#1118)
  • 3cc6116 Add support for tolerations, env, annotations and labels (#1070)
  • aeba37e Add test for autoscaler and its desired state (#2601)
  • 76633c5 Add test for configurable k8s job backoff limit (#2134)
  • 865affa Add tools and docs for changelog generator (#833)
  • e36183d Add top-level Labels and Resources Structured fields to HeadGroupSpec and WorkerGroupSpec (#4106)
  • 36102a0 Add topology spread constraints test for RayCluster (#2472)
  • 658bd9e Add unit test for cluster get and add steps in workflows (#2263)
  • e6722b0 Add v4 TPU manifests samples (#1968)
  • 33ccc9a Add v6e TPU Ray CR Manifests (#2445)
  • b227924 Add vLLM TPU example RayService manifest (#3000)
  • f8ed876 Add validating webhook (#1584)
  • ecd6eca Add validation for RAY_enable_autoscaler_v2 environment variable (#3963)
  • d950d59 Add volcano taskSpec annotations to pod (#1754)
  • 925effe Add workerGroupSpec.idleTimeoutSeconds to v1 RayCluster CRD (#2558)
  • 4e1454e Added Pod securityContext value to Helm charts (#2160)
  • 1f728c5 Added Python API server client (#1561)
  • 280902f Added Ray-Serve Config For LLMs (#3517)
  • bc90674 Added security to the API server (#1677)
  • ccd88cc Added support for ephemeral volumes and ingress creation support (#1409)
  • 803374e Adding API server support for service account (#1148)
  • 9af8215 Adding a test for the document for the Pod security standard (#866)
  • 6e4ac23 Adding capability to create ray cluster with serve support -clean (#1672)
  • d10103d Adding example of manually setting up NGINX Ingress (#699)
  • 61adf56 Align Init Container's ImagePullPolicy with Ray Container's ImagePullPolicy (#1080)
  • 761559e Align RayJob's ManagedBy with RayCluster's ManagedBy. (#2630)
  • 584da5a Alkanso/python client (#901)
  • 59d703f Allow E2E tests to run with arbitrary k8s cluster (#1306)
  • c857ca4 Allow annotations in ray cluster helm chart (#574)
  • 847585d Allow app.kubernetes.io/component to be overridden (#3198)
  • 828afba Allow configuration of restartPolicy (#2197)
  • 153f35c Allow manually creating init containers in Kuberay helm charts (#1287)
  • ff66bcb Allow to install and remove operator via scripts (#1545)
  • 4892ac1 Api server makefile (#1301)
  • f0b5ea4 Api server refactor/allow multiple job statuses in jobe2e (#3363)
  • 7de5f10 Api server refactor/allow multiple job statuses in servicee2e (#3375)
  • 6901e4d Best practice for fault-tolerant redis with kuberay (#2684)
  • be4f988 Build Headless Service for Multi-Host TPU Worker Pods (#1920)
  • 51b64f6 Buildkite autoscaler e2e (#2199)
  • 6c235d8 Bump @babel/runtime from 7.24.1 to 7.27.1 in /dashboard (#3591)
  • 530318b Bump Kubernetes dependencies to v0.34.x (#4147)
  • 91245ad Bump braces from 3.0.2 to 3.0.3 in /dashboard (#3590)
  • 3858146 Bump crd-ref-docs to v0.2.0 for Go 1.24+ compatibility (#4029)
  • 410e8fb Bump github.com/Masterminds/semver/v3 in /ray-operator (#3500)
  • 168dd43 Bump github.com/emicklei/go-restful in /ray-operator (#1348)
  • 2590a0b Bump github.com/jarcoal/httpmock from 1.2.0 to 1.4.0 in /ray-operator (#3536)
  • 196f959 Bump github.com/onsi/gomega from 1.36.2 to 1.37.0 in /apiserver (#3475)
  • 00b4b14 Bump github.com/prometheus/client_golang in /apiserver (#3394)
  • 2a20425 Bump github.com/rs/zerolog from 1.33.0 to 1.34.0 in /apiserver (#3393)
  • 4a4471a Bump github.com/spf13/cobra from 1.8.1 to 1.9.1 in /kubectl-plugin (#3499)
  • 587c6ff Bump go to 1.22.4 to fix ray-operator vulnerabilities (#2325)
  • 00c926e Bump go.mongodb.org/mongo-driver from 1.3.4 to 1.5.1 in /apiserver (#1407)
  • f6a5c73 Bump golang.org/x/net from 0.14.0 to 0.17.0 in /experimental (#1701)
  • 7e72627 Bump golang.org/x/net from 0.26.0 to 0.33.0 in /proto (#2723)
  • 3a6aac4 Bump golang.org/x/net from 0.33.0 to 0.38.0 in /experimental (#3407)
  • 4a3a373 Bump golang.org/x/net in /cli (#1405)
  • a8f730e Bump golang.org/x/net in /proto (#1345)
  • aafe2e0 Bump golang.org/x/net to v0.33.0 fix upstream vulnerability (#2799)
  • 26deb40 Bump golang.org/x/sys in /cli (#1347)
  • 2292e61 Bump golang.org/x/sys in /proto (#1346)
  • 53b7026 Bump golang.org/x/text from 0.3.5 to 0.3.8 in /proto (#1344)
  • a9255ce Bump google.golang.org/grpc from 1.64.0 to 1.64.1 in /cli (#2229)
  • 8bdd7de Bump google.golang.org/grpc from 1.64.0 to 1.64.1 in /experimental (#2248)
  • 0d16293 Bump google.golang.org/protobuf from 1.32.0 to 1.33.0 in /cli (#1993)
  • 7d49b26 Bump google.golang.org/protobuf from 1.32.0 to 1.33.0 in /experimental (#1992)
  • 6671427 Bump google.golang.org/protobuf from 1.34.2 to 1.36.6 in /experimental (#3395)
  • 8778327 Bump google.golang.org/protobuf from 1.36.5 to 1.36.6 in /apiserver (#3391)
  • f605b6c Bump nanoid from 3.3.7 to 3.3.11 in /dashboard (#3589)
  • 05b77e1 Bump next from 15.2.3 to 15.2.4 in /dashboard (#3709)
  • 1c07bc1 Bump sigs.k8s.io/controller-runtime from 0.19.0 to 0.20.4 in /apiserver (#3392)
  • 06ccd09 Bump the golangci-lint version in the api server makefile (#1342)
  • 16e44d3 Bump the google-golang group across 5 directories with 3 updates (#3493)
  • 102d9e9 Bump the kubernetes group across 3 directories with 9 updates (#3390)
  • 8fcfb9d Bump tj-actions/verify-changed-files in /.github/workflows (#1795)
  • 3738f78 CVE fix - Upgrade golang.org/x/net (#2081)
  • df0565a Change Kuberay operator Deployment strategy type to Recreate (#566)
  • 085dbb5 Change the rules in role.yaml and multiple_namespaces_role.yaml to use the same template in _helpers.tpl to ensure consistency. (#2244)
  • 7c6aedf Changes required to make a build after update of component-base (#3004)
  • bf3fd63 Check existing pods for suspended RayCluster before calling DeleteCollection (#1745)
  • db42cc5 Chore: fix indentation issues in RayJob sample YAML (#3874)
  • 046d4c4 Clean up WorkersToDelete field during the CI test (#1763)
  • 8282e6b Configuration Test Framework Prototype (#605)
  • f52b8bc Connect Ray client with TLS using Nginx Ingress on Kind cluster (#1051)
  • 86506d6 Convert byte slice and string without copy (#2628)
  • 1c6c4ae Correct sumGPUs to include MIGs in count (#3933)
  • cce10a6 Cross-reference docs. (#703)
  • 4b5085f Customize the Prometheus export port (#954)
  • 561a098 Delete [raycluster|rayjob|rayservice]_types_test.go unnecessary tests (#2935)
  • 3a7a17f Delete ray_v1alpha1_rayjob.batch-inference.yaml (#1360)
  • 7bc9c94 Dependencies: Upgrade golang.org/x packages (#1281)
  • 9831375 Deprecate Kuberay CLI for Ray Kubectl plugin (#2246)
  • 197fcc2 Do not update pod labels if they haven't changed (#1304)
  • 1cbac51 Documentation and example for running simple NLP service on kuberay (#1340)
  • 2ac9c44 Don't print redundant time unit in the log message (#2335)
  • ee0a895 Don’t assign the rayv1.Failed to the State field (#2258)
  • cf1c6f7 Downgrade kind from v0.20.0 to v0.11.1 (#1313)
  • 33ba385 Drop unused configmaps/status permission + configurable binary path (#2478)
  • d1d9e29 Enable test framework to install operator with custom config and put operator in a namespace with enforced PSS in security testing (#876)
  • efed875 Enhancements to e2e test, adding Autoscaling (#1765)
  • dbcc686 Ensure all temp files are deleted after the compatibility test (#886)
  • 6490749 Ensure container ports without names are also included in the head node service (#891)
  • e93ebcc Example Pod to connect Ray client to a remote Ray cluster with TLS enabled (#994)
  • 2d52001 Example RayCluster spec with Labels and label_selector API (#4136)
  • 28d07c9 Expose entire head pod Service to the user (#1040)
  • c610f70 Expose security context in helm chart. (#773)
  • e430a93 Exposing Serve Service (#1117)
  • dc17fb4 Exposing min/max replica counts for default worker group (#1963)
  • ba50bfa Fall back to CPU requests if limit is not specified (#2365)
  • f6b4f17 Feature/cron scheduling rayjob 2426 (#3836)
  • 35fe6f9 Fix CI (#1145)
  • 271b25d Fix FromAsCasing warning. (#2830)
  • f5fb7d4 Fix Log to indicate we are Using DashboardPort in RayService (#2001)
  • 1ced2b9 Fix RayCluster auth sample to include --config-file in kube-rbac-proxy (#2604)
  • bc0562d Fix apiserver linter (#3296)
  • 348ef38 Fix broken link in documentation (#3697)
  • 7e21b5d Fix duplicated volume issue (#690)
  • 389ba00 Fix finalizer typo and re-create manifests (#631)
  • c9665df Fix for Sample YAML Config Test - 2.4.0 Failure due to 'suspend' Field (#1096)
  • 4182477 Fix for deprecate-cli deploy error (#2251)
  • 3571d52 Fix in HeadPod Service Generation logic which was causing frequent reconciliation (#1056)
  • 0d848f9 Fix incorrect comment in raycluster_controller.go (#3003)
  • 5769a65 Fix issue where unescaped semicolons caused task execution failures. (#3691)
  • 19ddf04 Fix issue with head pod not monitored by Prometheus under certain condition (#963)
  • 3c53af6 Fix issue with operator OOM restart (#946)
  • 7dcdb26 Fix light weight job submitter e2e flaky test (#4092)
  • 5bf70e8 Fix logging issue for FetchHeadServiceURL (#2216)
  • 648d841 Fix misconfiguration. (#602)
  • 9265154 Fix mkDocs (#1448)
  • ed4b75c Fix ray nightly image env var setup (#3826)
  • 2b8947c Fix release actions (#1323)
  • 422098d Fix typo (#1232)
  • 8b2acf5 Fix typo (#1241)
  • 9404492 Fix typo in DEVELOPMENT.md (#1698)
  • ed7f3db Fix upgrade gomega (#3483)
  • 047699f Fix v6e TPU Scripts and RayJob CRs (#2447)
  • 242c7b9 Fix versioning in sample manifests (#1857)
  • 8d25e9d Fix/make helm and kustomize consistent (#2624)
  • 321f985 Fix: Helm lint and test CI failed (#3505)
  • cc6b7ba Fix: Typo (#1295)
  • 86f896b Fixed download URL for Helm chart (#573)
  • 4432b78 Fixed processing of job submitter (#1562)
  • 1ec290f Fixed the issue with jobSubmitter resources (#1676)
  • 2c26cff Fixes to shorten generated Route name with consideration for namespace (#1883)
  • 91361e3 Fixing Python client handling of env from (#1845)
  • 5c9db54 Flip Min and max replicas for apiserver workerNodeSpec (#1638)
  • 4772827 Follow up 3992: Remove logs and add comments (#4006)
  • e7f0c2c Generate RayCluster Hash on KubeRay Version Change (#2320)
  • cb86f9f Get details of only declarative serve apps (#4084)
  • de675e0 Handle nil HostPath type in GetVolumeHostPathType and add unit tests. (#3965)
  • 2d38f51 Helm chart ray-cluster template reference fix (#1469)
  • 607ac1f Helm: add service type configuration to head group for ray-cluster (#614)
  • bdbf379 Improve Grafana Dashboard (#3734)
  • 1634d70 Improve flexibility in RayCluster yaml test (#1812)
  • eb66a26 Improve log message wording when service already exists during reconciliation (#4096)
  • 753429d Improve the observability of the init container (#1149)
  • 4199879 Include KUBERAY_VERSION in the user-agent (#2042)
  • 9c55fc4 Increase head node memory limit for RayService sample to avoid OOM (#4089)
  • 656602f Increase rayJob e2e timeout (#4124)
  • 8fb4ee9 Increased time precision using uint (#1675)
  • 6823da1 Init dashboardClientFunc and httpProxyClientFunc by the config arg (#2092)
  • 87dde22 Inject cluster name as an environment variable into head and worker pods (#934)
  • 928d690 Integrate with rayci (#3215)
  • 738801d Integration: KAI Scheduler (#3886)
  • ca9348d Kuberay 0.5.0 docs validation update docs for GCS FT (#1004)
  • c7edeae Make KubeRay Operator Image FIPS compliant (#1633)
  • 9662bd9 Make k8s job backoff limit configurable for RayJob (#2091)
  • 8ec59e5 Make sure kubectl ray logs only get ray container logs (#2649)
  • 1d98fec MobileNet example (#1175)
  • 2bb04c9 Move BatchSchedulerManager into reconciler option (#3935)
  • 5fde3c6 Move matching labels to association.go (#2734)
  • 249610c Numerous fixes to the API server to make RayJob APIs working (#1447)
  • 974bedf One word typo fix in docs and README (#1068)
  • a1ef760 Only build/push Multi Arch images when merging to master (#1764)
  • 510827f Only try once in HTTP health check commands (#3469)
  • e7fbf7d Operator support for openShift (#1371)
  • fe29409 Parametrize ray operator makefile to support other container engines (#1121)
  • 2c97ac3 Pin operator version in single namespace installation (#1210)
  • d0b6337 Pin to working config + stable release (#3885)
  • 44fc973 Post release 1.0.0 (#1651)
  • fc1e2d0 Post release 1.1.0 (#2040)
  • 14f96fc Properly set env field based on containerEnv values (#2175)
  • 56b4d14 Publish Multi Arch images (#1716)
  • 346ddd0 Ray serve gke gateway ingress (#1978)
  • b8f6d06 RayCluster Headless Worker Service Should PublishNotReadyAddresses (#2375)
  • 0a3c181 RayCluster Helm: Make volumeMounts and volumes optional for workers (#1689)
  • 15daa54 RayCluster updates status frequently (#1211)
  • 8e3296e RayClusterProvisioned status should be set while cluster is being provisioned for the first time (#2304)
  • 362da3d RayJob Volcano Integration (#3972)
  • acafbfe RayJob: don't delete submitter job when ShutdownAfterJobFinishes=true (#1881)
  • 0216b33 RayJob: inject RAY_DASHBOARD_ADDRESS environment variable for user-provided submitter templates (#1852)
  • 621e9c7 RayService event can't set redis password in both GCSFaultTolerance and rayStartParam (#3153)
  • 795db0d RayService object's Status is being updated due to frequent reconciliation (#1065)
  • aeb8b03 RayService: Omits Min and Max replicas from hash calculation (#2172)
  • 6c4a77d Rayjob event can't set redis password in both GCSFaultTolerance and rayStartParam (#3093)
  • 26372c2 Read cluster domain from env (#951)
  • 62ad934 Refactor Apiserver e2e run in cluster (#3529)
  • f0ff2c1 Refactor UpgradeStrategy to UpgradeSpec.Type (#2678)
  • 79c6c20 Refactor configuration test framework to follow Pylint conventions (#671)
  • ec642e7 Refactor multiple cases in single test function with array (#2857)
  • 160ab10 Refactor to Ensure Consistent Use of CRDType (#1892)
  • a7197c5 Refactor validateRayServiceSpec (#2711)
  • 5f158a6 Release v0.5.0 doc validation (#997)
  • 31c1e6a Release v0.5.0 doc validation part 2 (#999)
  • 9dd516d Release v0.5.0 python client library validation (#1006)
  • e4e8727 Release v0.6.0 doc validation (#1271)
  • f256ddd Remove GOARCH in ray-operator/Dockerfile to support multi-arch images (#1442)
  • 728e1cb Remove ray-pod.tls.yaml (#3762)
  • 22cc61d Remove default option for batch scheduler name (#2371)
  • c3b17f3 Remove extraneous arguments from examples (#2051)
  • e2e4208 Remove generate target from build/test targets (#1874)
  • 9109436 Remove helm-chart-releaser (#721)
  • 16fd58b Remove ingress.enabled from KubeRay operator chart (#812)
  • cc2e144 Remove kustomize from helm, as it is not required (#1370)
  • 2ae7574 Remove miniReplicas in raycluster-cluster.yaml (#1473)
  • fffe778 Remove preStop hooks from Ray CR Samples (#2724)
  • fb7a486 Remove redundant log line that is failing golangci-lint (#2366)
  • 1dbd949 Remove unnecessary raycluster log in kai-scheduler logger (#3997)
  • 21a3611 Remove unused fields from KubeRay operator and RayCluster charts (#839)
  • 5b0b9af Remove unused icon from dashboard (#3599)
  • ffda626 Remove vLLM examples in favor of Ray Serve LLM (#3786)
  • 8be0a21 Removed use of BUILD_FLAGS in apiserver makefile (#1336)
  • 082389e Reorganize python client library (#984)
  • e4cf15f Replace kubectl wait command with RayClusterAddCREvent (#705)
  • 4b6f1df Reuse contexts across ray operator controllers (#1126)
  • c30fae2 Revert "Bump crd-ref-docs to v0.2.0 for Go 1.24+ compatibility (#4029)" (#4031)
  • e77b095 Revert "Disable async serve handler in Ray Service cluster (#447)" (#606)
  • 8b47826 Revert "Feature/cron scheduling rayjob 2426 (#3836)" (#3911)
  • ba1a000 Revert "Fix issue where unescaped semicolons caused task execution failures. (#3691)" (#3771)
  • a16f910 Revert "[BUG] Fix Dockerfile Error: WARN: FromAsCasing: 'as' and 'FROM' Keywords' Casing Do Not match (#2527)" (#2529)
  • 064e0ef Revert "[Bug][CI] Multi-platform build fails with docker driver in GitHub Actions (#3570)" (#3573)
  • 493eb82 Revert "[CI] Skip redis raycluster test (#1465)" (#1490)
  • 5373748 Revert "[CRD] Delete CRD v1alpha1 (#1771)" (#1784)
  • d8ffec4 Revert "[release] Update Ray image to 2.34.0 (#2303)" (#2413)
  • 3479347 Revert "kubectl ray job submit: provide empty entrypoint (#3127)" (#3165)
  • 5d38eda Revise sample configs, increase memory requests, update Ray versions (#761)
  • b2a701d Rewrite detached actor test with go (#2722)
  • 4c2c046 Set imagePullPolicy in manager.yaml (#1710)
  • 6359d3c Show cluster name in kubectl get rayjob (#2065)
  • d0683a9 Single go.mod file (#3640)
  • 7f77e46 Standardize imports of github.com/ray-project/kuberay/ray-operator/apis/ray/v1alpha1 (#1112)
  • 979b909 Support --address flag for kubectl ray job submit (#3922)
  • 72a63ac Support Apache YuniKorn as one batch scheduler option (#2184)
  • 8694093 Support disable leader election for manager go binary via Values.yaml to mitigate kuberay restarts (#2262)
  • f27e4ac Support for Image pull policy (#2101)
  • df5577f Support gang scheduling with Apache YuniKorn (#2396)
  • 79f757c Support json structured logging (#1912)
  • 86abaab Support suspension of RayClusters (#1711)
  • 55b99e6 Support to set QPS and burst by configuration. (#3969)
  • 413b8ab Support uppercase default resource names for top-level Resources (#4137)
  • dbd6b72 TPU Multi-Host Support (#1913)
  • 3a2be0b Update APIServer docs for release v0.4.0 (#778)
  • ff89298 Update Autoscaler YAML for the Autoscaler tutorial (#1400)
  • 91921f2 Update CHANGELOG for v1.0.0 (#1650)
  • 0becdd8 Update Dockerfile to address closed CVEs (#1488)
  • 2057d76 Update Dockerfiles to address CVE-2023-44487 (HTTP/2 Rapid Reset) (#1540)
  • 25eb751 Update GCS fault tolerance YAML (#1404)
  • 87c5541 Update KubeRay release documentation (#3226)
  • 0c16aa6 Update KubeRay versions. (#821)
  • 7f986b6 Update Kuberay doc to version 1.0.0 rc.0 (#1441)
  • ba5f7e0 Update RayCluster values.yaml (#3950)
  • fbdf317 Update RayServices section title (#3906)
  • 6d0c637 Update TPU Ray CR manifests to use Ray 2.41.0 (#2965)
  • 714aea6 Update V6e TPU Ray Samples (#2448)
  • 729c1b7 Update Volcano integration doc (#1380)
  • 02135a4 Update apiserver chart location in readme (#896)
  • b438b50 Update bug-report.yml (#1906)
  • af8fb0c Update contribution doc to show users how to reach out via slack (#936)
  • aae9fac Update doc and base image for Go 1.19 (#1330)
  • 06a0564 Update feature-request.yml (#1907)
  • 3283254 Update gcs-ft.md (#777)
  • 247b7ca Update grafana dashboards to ray 2.49.2 + add README instructions on how to do the update (#4111)
  • bab00be Update kind version (#1957)
  • bde5e9a Update kuberay mcad integration doc (#1373)
  • 38e3527 Update latest release to v1.0.0-rc.0 in tests (#1467)
  • 15ce568 Update operator development instruction (#1458)
  • be10373 Update overwrite-container-cmd example (#1722)
  • 7a185af Update ray operator Dockerfile (#1213)
  • 944a042 Update ray-operator documentation and image version in ray-cluster.heterogeneous.yaml (#585)
  • b12a722 Update samples to use Ray 2.41.0 images (#2964)
  • 2e35bff Update securityContext values.yaml for kuberay-operator to safe defaults. (#1896)
  • e4a9645 Update swagger-initializer.js (#2543)
  • 12c0a90 Update test config (#654)
  • e6b2920 Update update-ray-job.kueue-toy-sample.yaml (#3782)
  • b9f0209 Update v6e-256 KubeRay Sample (#2466)
  • 6b12c18 Updated API server documentation (#1435)
  • 71984fb Updated default timeout seconds for probes (#2265)
  • f144145 Updates to the apiserver swagger-ui (#1410)
  • 2793492 Updating logrus and net packages in go.mod (#1495)
  • e3bdc83 Upgrade Kubernetes dependencies to v0.28.3 and Golang to 1.20 (#1648)
  • f2d94ff Upgrade dependencies to address CVEs (#1865)
  • b73daa9 Upgrade golang linter for precommit hook (#3319)
  • 1213d15 Upgrade manifests kustomize v5 (#2352)
  • 31d8a8c Upgrade to Go 1.19 (#1325)
  • ce960e2 Upgrade to address High CVEs (#1731)
  • 9be8abd Use Go 1.24.0 in go module (#3835)
  • a4893a8 Use ImplementationSpecific in ray-cluster.separate-ingress.yaml (#3781)
  • 838bc19 Use a default user agent 'kuberay-operator' instead of the default user-agent from controller-runtime (#1982)
  • 0fa7d3f Use ctrl log and create logger in function in kai-scheduler (#3995)
  • 4845306 Use ctrl logger in Volcano scheduler to include context (#4023)
  • 747708b Use helm-docs to generate README for chart kuberay-operator automatically (#3331)
  • c1dbdf1 Use standard golang image as build image and distroless image as base image for kuberay operator. (#1967)
  • 2e173a1 Use webhook.CustomValidator instead of deprecated webhook.Validator. (#2803)
  • 6cbb5df Use longer exec probe timeouts for Head pods (#2353)
  • f0abc1d [0.4.0 Release] Minor doc improvements (#780)
  • 6f5047c [0.4.0 release] Update changelog for KubeRay 0.4.0 (#836)
  • c45fcf0 [1/N] [Lint] Group imports by sections (#3428)
  • 732a675 [1/N][apiserver] Fix half of linter issues for apiserver (#3328)
  • b16de0c [2.5.0 Release] Change version numbers 2.4.0 -> 2.5.0 (#1151)
  • a068e7b [2/N] [Lint] Group imports by sections (#3429)
  • 8944703 [2/N] [apiserver] Fix second-half apiserver lint (#3338)
  • 05c5e6b [3/N] [Lint] Group imports by sections (#3430)
  • 1eac370 [API Server] Add Ray Job output - start/end time and ray cluster name (#2533)
  • f3353b2 [API Server] Add security context to Ray Cluster (#2538)
  • 773a475 [API Server] Add v2 related helm (#3677)
  • 846416e [API Server] consolidate e2e test (#3674)
  • a8ec758 [APIServer][Docs] Identify API server as community-managed and optional (#753)
  • 5c0e2e9 [APIserver] [Ray Job] Added Job submission support to the API server (#1639)
  • 796bf06 [Apiserver] Determine the minimum resource requirements for KubeRay API server e2e tests (#3526)
  • 2ba0dd7 [Apiserver] Set the right amount of resource in e2e test (#3465)
  • a361dc3 [Apiserver] Use Eventually from Gomega instead of wait from apimachinery (#3433)
  • af6a005 [Apiserver][Refactor] Use polling in autoscaler e2e test (#3402)
  • 5f51977 [Autoscaler V2] Polish Autoscaler V2 YAML (#2064)
  • d125ab7 [Autoscaler] Improve TestRayClusterAutoscalerAddNewWorkerGroup (#3682)
  • 9e14ba6 [Autoscaler] Print the value of WorkerGroupSpec.Replicas (#3005)
  • c159491 [Autoscaler][Sample] Add comment for AUTOSCALER_UPDATE_INTERVAL_S (#3294)
  • 759ab3a [Autoscaler][Sample] Add comment for RAY_LOGGER_LEVEL (#4104)
  • 3f69f01 [Autoscaler][Test] Fix flaky idleTimeoutSeconds test (#2862)
  • 9c55794 [BUG] Fix Dockerfile Error: WARN: FromAsCasing: 'as' and 'FROM' Keywords' Casing Do Not match (#2527)
  • 9c28b7d [Benchmark] KubeRay memory / scalability benchmark (#1324)
  • 8ad2c1b [Bug] Add default value for entrypoint flags in job_submit.go (#3808)
  • 20636f9 [Bug] All worker Pods are deleted if using KubeRay v1.0.0 CRD with KubeRay operator v1.1.0 image (#2087)
  • 7ad3acf [Bug] Allow zero replica for workers for Helm (#968)
  • 2586468 [Bug] Autoscaler doesn't support TLS (#1119)
  • cceb7a5 [Bug] Avoid assigning an entry to a map that is nil (#1715)
  • ec40186 [Bug] Change image repository for make deploy (#2059)
  • f56c66f [Bug] Clean up WorkersToDelete after the scaling process finishes (#1747)
  • 39562b5 [Bug] Enable ResourceQuota by adding Resources for the health-check init container (#1043)
  • 3f7b34c [Bug] Fail to create ingress due to the deprecation of the ingress.class annotation (#646)
  • 7fd3927 [Bug] Fix RayCluster with an overridden app.kubernetes.io/name (#2147) (#2166)
  • af0c7a2 [Bug] Fix flakiness of RayService e2e tests (#1385)
  • b0096b0 [Bug] Fix flaky sample YAML tests (#1590)
  • f3ec71b [Bug] Fix flaky test: should be able to update all Pods to Running (#893)
  • c420135 [Bug] Fix null map handling in BuildServiceForHeadPod function (#1095)
  • f1e961a [Bug] Fix rebase error (#1897)
  • c683ad1 [Bug] Fix the filename of text summarizer YAML (#1415)
  • cf41e24 [Bug] Issue with glibc version GLIBC_2.34 and GLIBC_2.32 not found in earlier operator tags (#2272)
  • e4d4839 [Bug] KubeRay does not work on M1 macs. (#869)
  • 791ea37 [Bug] KubeRay operator failed to watch endpoint (#2080)
  • c22fbfa [Bug] KubeRay operator fails to get serve deployment status due to 500 Internal Server Error (#1173)
  • 7aea947 [Bug] KubeRay tries to create ClusterRoleBinding when singleNamespaceInstall and rbacEnable are set to true (#1190)
  • a0e59be [Bug] Long image pull time will trigger blue-green upgrade after the head is ready (#1231)
  • e2a6ae8 [Bug] Misuse of Docker API and misunderstanding of Ray HA cause test_detached_actor flaky (#619)
  • 1ab5a00 [Bug] Misuse of Docker API and misunderstanding of Ray HA cause test_ray_serve flaky (#650)
  • d46b431 [Bug] Modification of nameOverride will cause label selector mismatch for head node (#572)
  • cbc9b0b [Bug] Pod reconciliation fails if worker pod name is supplied (#587)
  • 47b4e80 [Bug] Ray operator crashes when specifying RayCluster with resources.limits but no resources.requests (#2077)
  • 52af139 [Bug] RayService restarts repeatedly with Autoscaler (#1037)
  • ac56e33 [Bug] RayService with GCS FT HA issue (#1551)
  • 2bd5c9e [Bug] Re-enable flaky kubectl plugin e2e test "should reconnect after pod connection is lost" (#3116)
  • 79c7c87 [Bug] Re-enable flaky kubectl plugin e2e test in kubectl_ray_job_submit_test.go (#3124)
  • a87f9a6 [Bug] Reconciler error when changing the value of nameOverride in values.yaml of helm installation for Ray Cluster (#1966)
  • 0cabd14 [Bug] Service (Serve) changing port from 8000 to 9000 doesn't work (#1081)
  • 60de974 [Bug] Shallow copy causes different worker configurations (#714)
  • 01b4883 [Bug] Sidecar mode shouldn't restart head pod when head pod is deleted (#4141) (#4156)
  • 5dab94c [Bug] Submitter K8s Job fails even though the RayJob has a JobDeploymentStatus Complete and a JobStatus SUCCEEDED (#1919)
  • d05964c [Bug] TestRayServiceInPlaceUpdate is flaky (#2620)
  • 457d67a [Bug] Update wait function in test_detached_actor (#635)
  • 82c925b [Bug] autoscaler not working properly in rayjob (#1064)
  • 3581b91 [Bug] client_golang used by KubeRay has a vulnerability (#728)
  • 2b136c9 [Bug] compatibility test for the nightly Ray image fails (#1055)
  • 1186737 [Bug] error: git cmd when following docs (#831)
  • ddb5e52 [Bug] fix RayActorOptionSpec.items.spec.serveConfig.deployments.rayActorOptions.memory int32 data type (#1220)
  • c880029 [Bug] kubectl plugin e2e test is flaky (#3147)
  • 17264a6 [Bug] label rayNodeType is useless (#698)
  • 0672956 [Bug] rayStartParams is required at this moment. (#1031)
  • bc6be0e [Bug][Autoscaler] Operator does not remove workers (#1139)
  • 0d813b4 [Bug][CI] Multi-platform build fails with docker driver in GitHub Actions (#3570)
  • deec37c [Bug][Doc] Increase default operator resource requirements, improve docs (#727)
  • ca929e9 [Bug][Doc] fix the link error of operator document (#1046)
  • d632ac1 [Bug][GCS FT] Clean up the Redis key before the head Pod is deleted (#1989)
  • 2019b4b [Bug][GCS FT] Worker pods crash unexpectedly when gcs_server on head pod is killed (#1036)
  • 0e959cf [Bug][RayCluster] Fix RAY_REDIS_ADDRESS parsing with redis scheme and multiple addresses (#1556)
  • 664b19a [Bug][RayJob] Avoid nil pointer dereference (#1756)
  • 5da4a04 [Bug][RayJob] Check dashboard readiness before creating job pod (#1381) (#1429)
  • 5a974fc [Bug][RayJob] Fix FailedToGetJobStatus by allowing transition to Running (#1583)
  • f106737 [Bug][RayJob] RayJob with custom head service name (#1332)
  • 9b26ba7 [Bug][RayService] KubeRay does not recreate Serve applications if a head Pod without GCS FT recovers from a failure. (#1420)
  • c9802e9 [Bug][apiserver] fix apiserver create rayservice missing serve port (#734)
  • 72ca169 [Bug][breaking change] Unauthorized 401 error on fetching Ray Custom Resources from K8s API server (#1128)
  • f6a172f [Bug][k8s compatibility] k8s v1.20.7 ClusterIP svc do not updated under RayService (#1110)
  • 3875356 [Bug][kubectl-plugin] Wrong behavior for InteractiveMode RayJob with BackoffLimit set (#3555)
  • 99505a5 [Build][kubectl-plugin] Add release script for kubectl plugin (#2407)
  • 4b75753 [CI] Add kind-in-Docker test to Buildkite CI (#1243)
  • 1d1b8ce [CI] Add apiserver e2e test to buildkite (#3351)
  • ba6a7a2 [CI] Add shellcheck and fix error of it (#2933)
  • 4ca05ab [CI] Add workflow to manually trigger release image push (#801)
  • 268a776 [CI] Auto download golang tools in pre-commit (#2917)
  • bd7feba [CI] Bump Go version to 1.23 to support E2E Operator Version Upgrade tests (#3406)
  • 0e53381 [CI] Change Pre-commit-shellcheck-to-shellcheck-py (#2974)
  • 4db24e5 [CI] Composable kube resource logger when test failed (#3070)
  • 7542c5e [CI] Create release tag for ray-operator Go module (#1574)
  • e595ee4 [CI] Deflaky TestRayServiceGCSFaultTolerance (#2660)
  • 03f1a2e [CI] Don't need to publish the security proxy image (#1885)
  • b56a973 [CI] Don't push new images to DockerHub (#1923)
  • 05e9279 [CI] Downgrade runner image from ubuntu-latest to ubuntu-22.04 (#2714)
  • 00abf6e [CI] Enable testifylint empty rule (#2908)
  • 1830a6d [CI] Enable testifylint error-nil rule (#2907)
  • 3e97888 [CI] Enable testifylint expected-actual rule (#2914)
  • 67ed6ce [CI] Enable testifylint float-compare rule (#2910)
  • 17d6067 [CI] Enable testifylint require-error rule (#2909)
  • bc2bd71 [CI] Enable testifylint bool-compare rule (#2911)
  • 2ac2a92 [CI] Enable testifylint formatter rule (#2915)
  • cdee6f4 [CI] Enable testifylint len rule (#2945)
  • 4d2795b [CI] Enable testifylint rule (#2896)
  • 67596d3 [CI] Fix MultiArch image push (#3575)
  • a1e8c56 [CI] Fix RayService CI (#2525)
  • 2a9e647 [CI] Fix apiserver test in image-release process (#1880)
  • 02909a2 [CI] Fix autoscaler e2e test flakiness caused by timeout (#3668)
  • 894f31e [CI] Fix image release pipeline (#1878)
  • 0ac9942 [CI] Fix lint error (require-error) (#2931)
  • 535a405 [CI] Fix variable initializations used in test case declarations (#1775)
  • abd3f87 [CI] Fix: /etc/docker/daemon.json: No such file or directory (#3565)
  • f3ed172 [CI] Generate CRD json schema separately in pre-commit (#2930)
  • 08b9908 [CI] Install kuberay operator in buildkite test (#1308)
  • 4bd2dab [CI] Jail flaky test: TestRayServiceInPlaceUpdate (#2638)
  • fa67724 [CI] Make release.yaml only be triggered manually (#2798)
  • 353e87f [CI] Move e2e tests to buildkite (#2639)
  • cce897b [CI] Only run test_ray_serve for Ray 2.6.0 and later (#1288)
  • c7a6894 [CI] Pin crd-ref-docs to v0.0.10 (#1988)
  • 39a8480 [CI] Pin go version in CRD consistency check (#794)
  • 3db8d23 [CI] Pin kustomize to v5.3.0 (#2067)
  • 4f85055 [CI] Publish KubeRay operator / apiserver images to Quay (#1307)
  • dec8137 [CI] Reenable rayjob sample yaml latest test (#1464)
  • 2c5a6d0 [CI] Refactor pipeline and test RayCluster sample yamls (#1321)
  • 77d0bba [CI] Remove RayService tests from comopatibility-test.py (#1395)
  • 56cdfb6 [CI] Remove compatibility-test.py and modified CI (#2882)
  • 0e9d177 [CI] Remove create tag step from release (#3249)
  • 629bc8f [CI] Remove extraPortMappings from kind configurations (#1366)
  • 2e23506 [CI] Remove test_security.py and all python test dependencies in CI (#3123)
  • 1fdf04c [CI] Remove unnecessary kind load $RAY_IMAGE for e2e sample YAML tests (#1863)
  • 085c29d [CI] Remove unnecessary release.yaml workflow (#1168)
  • a1cf47d [CI] Remove unnecessary sample YAML symbolic links (#2118)
  • 0b61523 [CI] Replace lint CI with pre-commit (#2129)
  • df7cfe1 [CI] Run sample job YAML tests in buildkite (#1315)
  • 4bb1226 [CI] Skip kubectl plugin flaky e2e tests (#2800)
  • 84c35ac [CI] Skip redis raycluster test (#1465)
  • 21058dc [CI] Skip the flaky compatibility test test_detached_actor until https://github.com/ray-project/ray/issues/41343 (#1694)
  • 75a63a5 [CI] Split Autoscaler e2e tests into 2 buildkite runners (#3715)
  • 0288281 [CI] Stop publishing images to DockerHub (#1926)
  • da763f2 [CI] Stop to publish new images to DockerHub (#1702)
  • 83f3095 [CI] Unjail TestRayServiceInPlaceUpdate (#2650)
  • 4fbdb9e [CI] Update latest ray version 2.5.0 -> 2.6.3 (#1320)
  • 0561ba1 [CI] Upload logs as artifacts to BuildKite (#3405)
  • ef7cf5e [CI] Use golang:1.24-bookworm (Debian 12) in CI for Python-3.11 support (#3949)
  • e801dc1 [CI] Use quay as the default image registry (#1939)
  • 9e37e19 [CI] Verify kubectl in kind-in-docker step (#1305)
  • a9aa9a3 [CI] apply resource logger to ray cluster test (#3075)
  • 945698b [CI] apply resource logger to ray service test (#3081)
  • 23c9e5b [CI] dump failed test k8s resources (#3025)
  • 3114a0c [CI] fix locust versions (#3100)
  • 60bc89d [CI] fix missing Go module release step (#3644)
  • c764021 [CI] split rayservice e2e test into another runner and decrease timeout to 30m (#2667)
  • e9073fc [CI] stream operator logs from kind in go e2e tests (#1793)
  • 03969c9 [CI]: Kuberay operator e2e tests (#1575)
  • f123a44 [CI]: change kubectl plugin e2e test to buildkite (#2861)
  • bd53766 [CI][#2905] Improvement: enable testifylint compares rule (#2977)
  • f82e7ea [CI][Buildkite] An example test for Buildkite (#919)
  • 1a8895e [CI][Buildkite] Fix the PATH issue (#952)
  • 0e1c248 [CI][GitHub-Actions] Upgrade actions/upload-artifact to v4 (#2373)
  • cbde878 [CI][HELM] Use chart-testing to install Helm charts (#3412)
  • e96dedc [CI][Hotfix] Increase the timeout of Test E2E from 30m to 1h (#2664)
  • 1d4a403 [CI][RayService] deflaky the TestAutoscalingRayService (#3119)
  • d723f50 [CRD] Delete CRD v1alpha1 (#1771)
  • 77e299b [CRD] Inject CRD version to the Autoscaler sidecar container (#1496)
  • 96c4d66 [CRD] Set maxDescLen to 0 (#1449)
  • 7b00aca [CRD] Sync v1alpha1 CRD with v1 CRD (#1788)
  • b7bc7ae [CRD][1/n] Create v1 CRDs (#1481)
  • 1184bc8 [CRD][2/n] Update from CRD v1alpha1 to v1 (#1482)
  • 7336ea6 [Chore] Add RayJob InteractiveMode sample yaml (#3062)
  • 491fbde [Chore] Add golangci-lint rules (#2128)
  • d901fd0 [Chore] Add kubectl plugin and dashboard to components in issue template (#3678)
  • 49a5725 [Chore] Add pre-commit hooks (#2127)
  • ca98d1f [Chore] Create example Modin RayJob (#2221)
  • e0318a3 [Chore] Delete redundant pod existance checking (#2113)
  • 41c9e91 [Chore] Fix golangci-lint rule: gosec (#2163)
  • fb58429 [Chore] Fix lint errors caused by casting int to int32 (#2368)
  • 445b941 [Chore] Improve the appearance of compute resources status in the output of kubectl describe (#1802)
  • 2b31c30 [Chore] Make error as a local variable (#2841)
  • 80ab11c [Chore] Modify pre-commit yaml to allow golangci-lint version with prefix "v" (#2824)
  • b16fb3f [Chore] Remove CHANGELOG.md (#3819)
  • 3471f99 [Chore] Remove duplicate make command (#4145)
  • e02751a [Chore] Run operator outside the cluster (#2090)
  • 5d3d9d3 [Chore] Turn off golangci-lint rules except ray-operator (#2138)
  • 7a43534 [Chore] Turn off no-commit-to-branch rule (#2139)
  • 7cc3548 [Chore] Upgrade Ray to 2.46.0 follow-up (#3722)
  • 35e913a [Chore] Use Ray 2.9.0 for Apache YuniKorn example (#2427)
  • 949875a [Chore] Use new golangci-lint rules only for ray-operator (#2152)
  • 6eeca32 [Chore] Use safe YAML for helm-chart-verify-rbac (#2230)
  • 5894146 [Chore] make err as local variable in if-statement (#2718)
  • d2ae625 [Chore] make ingressClassName as a local variable (#2815)
  • dd46cb4 [Chore] remove redundant var declaration (#2811)
  • 6350033 [Chore] remove unnecessary line break in log (#2709)
  • 5db3012 [Chore] specify the capacity on calling make (#2719)
  • 20ed56f [Chore] update comment for headGroupSpec and entrypoint (#2802)
  • d97e37a [Chore][CI] Limit the release-image-build github workflow to only take tag as input (#3117)
  • 9b0eda4 [Chore][CI] Remove StreamKubeRayOperatorLogs (#2637)
  • 0c09b05 [Chore][CI] Upgrade ray version to 2.40 except for TestRayServiceInPlaceUpdate (#2629)
  • 7b81970 [Chore][Comment] Fix wrong comment (#2294)
  • 54ba287 [Chore][Linter] Upgrade golangci-lint to 1.60.3 (#2362)
  • 784b7f3 [Chore][Log] Delete error loggings right before returned errors (#2103)
  • b08a5ae [Chore][Minor] Add .gitignore to kubectl-plugin (#2383)
  • ca7db14 [Chore][RayJob] Remove the TODO of verifying the schema of RayJobInfo because it is already correct (#1911)
  • 3514856 [Chore][Sample-yaml] Upgrade pytorch-lightning to 1.8.5 for ray-job.pytorch-distributed-training.yaml (#3796)
  • 296d480 [Chore][Samples] Rename ray-cluster.mini.yaml and add workerGroupSpecs (#2100)
  • 708d758 [Chore][YuniKorn] Add sample yaml file for Apache YuniKorn (#2412)
  • 135f129 [Chore][kubectl-plugin] Fix wrong homepage link in krew template file (#2461)
  • ab17363 [Chore][precommit] Replace grep with awk in pre-commit hooks for BSD compatibility (#2541)
  • ea0b9c5 [Community] Add KubeRay community guide (#3859)
  • 38a07e9 [Community][2/N] Governance model (#3977)
  • 30c5d74 [Compatibility] Update Redis image for compatibility tests (#2852)
  • c88b174 [DOCS] Apiserver improve docs readability (#3564)
  • d1b07df [DOCS] KubeRay APIServer V2 document (#3594)
  • 4ac20b3 [DOCS] document step to do before running e2e test (#3385)
  • aeab361 [Dashboard-client] Add proper error checking in dashboard client (#3953)
  • 39d7e71 [Dashboard-client] replace http method from string to constant (#3961)
  • b87480e [Doc] Add helm update command to chart validation step in release process (#1165)
  • 6565845 [Doc] Add a YAML to explain why some worker pod are not ready in RayService (#3139)
  • f5e0ef5 [Doc] Add blogs and talks to readme (#1691)
  • 1359dd5 [Doc] Add git fetch --tags command to release instructions (#1164)
  • 41018bc [Doc] Add gke bucket yaml (#1372)
  • 44ff72c [Doc] Cannot build kuberay with Go 1.16 (#575)
  • e52dd3b [Doc] Copyedit dev guide (#1012)
  • ffac2c8 [Doc] Delete unused docs (#1440)
  • 83fea90 [Doc] Deprecate ServiceUnhealthySecondThreshold and DeploymentUnhealthySecondThreshold (#1688)
  • e9a2698 [Doc] Develop Ray Serve Python script on KubeRay (#1250)
  • 9c53a72 [Doc] Fix Doc Typos (#2060)
  • 7391341 [Doc] Fix Yaml Typos (#2049)
  • 856a33e [Doc] Fix release doc format (#1578)
  • b26f106 [Doc] Fix the order of comments in sample Job YAML file (#1242)
  • 1ee5f95 [Doc] GKE GPU cluster setup (#1223)
  • 04388da [Doc] Improve DEVELOPMENT.md by adding more guidances (#1794)
  • c16cac4 [Doc] Improve FAQ page and RayService troubleshooting guide (#1225)
  • 3b81601 [Doc] Improve RayService doc (#1235)
  • cb12484 [Doc] Reference helm chart version in helm-chart/kuberay-operator/README.md.gotmpl with go template (#3763)
  • 73eef73 [Doc] Remove KubeRay CLI references and add Python client details (#2521)
  • 3754d34 [Doc] Support CRD docs generation (#1625)
  • cc1ff48 [Doc] Support consistency check for API reference in CI (#1655)
  • d78d34f [Doc] Update README (#1433)
  • be22ecf [Doc] Update README (#3695)
  • 6e1f1bd [Doc] Update nav to include missing files and reorganize nav (#1011)
  • 9425e7f [Doc] Update release docs (#1621)
  • 6c0fbbe [Doc] Update version from 0.4.0 to 0.5.0 on remaining kuberay docs files (#1018)
  • 7a1e322 [Doc] Upload a screenshot for the Serve page in Ray dashboard (#1236)
  • adde70c [Doc] [RayJob] Add documentation for submitterPodTemplate (#1228)
  • d55dfc3 [Doc] add ray cluster uv sample yaml (#3720)
  • f3ebea7 [Doc][CI] Align K8s version in Doc and CI with minimal required version (#3628)
  • 98496f4 [Doc][Fix] correct the indention of storageClass in ray-cluster.persistent-redis.yaml (#3780)
  • 167a71d [Doc][Website] Add complete document link (#1224)
  • fa26bb2 [Doc][Website] Update KubeRay introduction and fix layout issues (#1042)
  • 8430410 [Docs] Add kubectl plugin create cluster sample yaml config files (#3804)
  • fd4ab91 [Docs] Align development guide with Makefile docker-build logic (#3248)
  • 89e980f [Docs] Correct command to load KubeRay operator image (#3387)
  • 192d1ea [Docs] Revise release note docs (#835)
  • 36f32ed [Docs] Update Security Guidance on Dashboard Ingress (#1413)
  • 0532645 [Docs] add sample RayCluster using kube-rbac-proxy for dashboard access control (#2578)
  • ebf8a53 [Docs] add sample RayCluster with FluentBit sidecar to persist Ray logs (#2602)
  • c693140 [Docs] update development md (#3230)
  • 7fb46ab [Docs][Development] Delete linting docs (#2145)
  • f37a4cc [Docs][kubectl-plugin] Add doc for install via Krew (#2458)
  • dcbdbfc [Docs][kubectl-plugin] Add instructions for downloading from GitHub release (#2450)
  • 06367a3 [Docs][ray-operator] Add types of tests and debug tips to development doc (#3401)
  • 0a56cd4 [Enhancement] GPU RayCluster doesn't work on GKE Autopilot (#1470)
  • eb59de4 [Enhancement] Remove unused variables in constant.go (#1474)
  • e009704 [Experimental] Fix Makefile tool check: replace -s with test -s (#3970)
  • 9e68367 [FEAT] show event message when raycluster not found in clusterSelector in rayjob (#4125)
  • 9321b2d [FIX][DOC] development markdown example (#2687)
  • 35b96f1 [Feat] Add e2e test for applying ray-job.interactive-mode.yaml (#3779)
  • b81af7c [Feat] Add sample yaml for RayJob clusterSelector config (#2505)
  • 6186a7d [Feat] Deprecate ForcedClusterUpgrade (#2075)
  • f3430b0 [Feat] Remove RayService sample YAML Python tests (#2565)
  • 2278768 [Feat]: Add a field to configure whether to add a proxy actor on the head Pod to the K8s serve service or not (#2598)
  • 5d3bceb [Feat][Kubectl-Plugin] Implement kubectl session for RayJob and RayService (#2379)
  • 6786350 [Feat][Kubectl-Plugin]Implement kubectl ray job submit (#2394)
  • ea314d7 [Feat][RayCluster] Introduce the RayClusterStatus.Conditions field (#2214)
  • d2b3338 [Feat][RayCluster] Make the Head service headless (#2117)
  • ca39dc9 [Feat][RayCluster] Use a new RayClusterReplicaFailure condition to reflect the result of reconcilePods (#2259)
  • cc94c6a [Feat][RayJob] Delete RayJob CR after job termination (#2225)
  • cf4a877 [Feat][RayJob] UserMode SubmissionMode (#2364)
  • 6079dc5 [Feat][Sample-yaml] Deprecated python sample yaml test cleanup (#2507)
  • bc61ad9 [Feat][apiserver] Support CORS config (#3711)
  • 84839a8 [Feat][kubectl-plugin] Add Long, Example, shell completion for kubectl ray log (#2405)
  • 4e3340c [Feat][kubectl-plugin] Add dynamic shell completion for kubectl ray get node & workergroup (#3154)
  • f69885b [Feat][kubectl-plugin] Add dynamic shell completion for kubectl ray session (#2390)
  • 800ac16 [Feat][kubectl-plugin] Add instructions for static shell completion (#2384)
  • bee1b71 [Feat][kubectl-plugin] Add kubectl ray version command (#2424)
  • 32d8cde [Feat][kubectl-plugin] Create cluster with TPUs (--worker-tpu, --num-of-hosts) and TPUs' validation (#3258)
  • 090fad0 [Feat][kubectl-plugin] Include LICENSE file into kubectl plugin tar (#2422)
  • 6e8b0b0 [Feat][kubectl-plugin] Retry port-forward when connection lost (#2704)
  • 52e330b [Feat][kubectl-plugin] Support -v flag for kubectl ray job submit (#3524)
  • 39d42fb [Feature] Add Kubernetes manifest validation in pre-commit. (#2380)
  • f7edc22 [Feature] Add ManagedBy field to RayCluster (#2597)
  • e6af2cc [Feature] Add ManagedBy field to RayJob (#2589)
  • 4bce739 [Feature] Add a chart-test script to enable chart lint error reproduction on laptop (#563)
  • 99abccf [Feature] Add a flag to make zero downtime upgrades optional (#1564)
  • 0bbdec2 [Feature] Add allow CORS in apiserversdk (#4059)
  • 3fe9605 [Feature] Add an e2e test for Autoscaler to scale up by manually updating (#2634)
  • 9d25660 [Feature] Add an e2e test for K8s Job submitter failures (#2688)
  • 96d1ac2 [Feature] Add an example for RayService high availability (#1566)
  • dcaf6a5 [Feature] Add apiserver unit test(pkg/util/cluster.go) (#3348)
  • d56356b [Feature] Add cleanup for terminated RayJob/RayCluster metrics (#3923)
  • ed44425 [Feature] Add default init container in workers to wait for GCS to be ready (#973)
  • 6b3836e [Feature] Add e2e test for UpdateRayService function (#3446)
  • 6687955 [Feature] Add e2e test for setting RayCluster deletion delay in RayService (#3912)
  • 0474e8d [Feature] Add e2e tests for Autoscaler V2 (#2588)
  • c85646f [Feature] Add eslint and Prettier to ray dashboard (#3975)
  • 5990b05 [Feature] Add initializing timeout for RayService (#4143)
  • cd9b2e8 [Feature] Add python client test to action (#993)
  • a0ee1c8 [Feature] Add service account section in helm chart (#969)
  • f45155b [Feature] Add timeout for apiserver grpc server (#3427)
  • 39e8028 [Feature] Add timestamps for logs in e2e tests (#3006)
  • 7db8f69 [Feature] Add unit test for update service request validation (#3546)
  • e11a9b7 [Feature] Adding RAY_CLOUD_INSTANCE_ID as unique id for Ray node (#1759)
  • de8bc26 [Feature] Allow RayCluster Helm chart to specify different images for different worker groups (#1352)
  • 002e375 [Feature] Allow custom labels&annotations for kuberay operator (#1276)
  • c13498b [Feature] Auto detect MIG GPUs and pass them into Ray’s logical resources. (#3567)
  • 34e394f [Feature] Consistency check for RBAC (#577)
  • 633ff63 [Feature] Define a general-purpose cleanup method for CREvent (#849)
  • e128863 [Feature] Disable zero downtime upgrade for a RayService using RayServiceSpec (#2468)
  • 13eb7b2 [Feature] Display reconcile failures as events (ServiceAccount) (#2290)
  • b6b00c8 [Feature] Docker support for chart-testing (#623)
  • 2600854 [Feature] Enable namespaced installs via helm chart (#860)
  • 40775c5 [Feature] Expose initContainer image in RayCluster chart (#674)
  • 2ee95cc [Feature] Fix auto upgrade prometheus (#3449)
  • 6cbb8e7 [Feature] Fix dependency upgrade for gomock (#3558)
  • 4714892 [Feature] Improve and fix Prometheus & Grafana integrations (#895)
  • 244003b [Feature] Improve observability for flaky RayJob test (#1587)
  • c6df15e [Feature] Improve the observability of integration tests (#775)
  • 551de65 [Feature] Include CR UID in kuberay metrics (#4003)
  • c6bafa3 [Feature] Make Ray and Logs links proxy to their Ray dashboards (#4112)
  • 3aebd8c [Feature] Make head serviceType optional (#851)
  • 1ed0b7f [Feature] Make replicas optional for WorkerGroupSpec (#1443)
  • 3bb01e8 [Feature] Manually fix controller runtime package upgrade (#3448)
  • a53d942 [Feature] Manually fix net package upgrade (#3447)
  • 1a94b43 [Feature] Manually upgrade k8s package group (#3486)
  • 8c8222c [Feature] Move some functions from prototype test framework to a new utils file (#837)
  • c552d3c [Feature] Override the block option of rayStartParams to true (#1718)
  • 2fb9465 [Feature] Print KubeRay logs in Buildkite runner when tests fail (#2690)
  • 49e7520 [Feature] Provide multi-arch images for apiserver and security proxy (#4131)
  • 78b9828 [Feature] REP 54: Add PodName to the HeadInfo (#2266)
  • ad06bbd [Feature] Ray container must be the first application container (#1379)
  • fd27b75 [Feature] Ray restricted podsecuritystandards for enterprise security and Kubeflow integration (#750)
  • 4fdb87d [Feature] RayService HA test - GCS fault tolerance + kill GCS process (#2590)
  • dd7ed90 [Feature] Refactor test framework & test kuberay-operator chart with configuration framework (#759)
  • ffcf704 [Feature] Remove Docker container and NodePort from compatibility test (#844)
  • 3129b87 [Feature] Remove checking CRD in Volcano scheduler initialization (#4011)
  • d0debd1 [Feature] Replace service name with Fully Qualified Domain Name (#938)
  • 1d3f537 [Feature] Run config tests with the latest release of KubeRay operator (#858)
  • ea6e8d1 [Feature] Running end-to-end tests on local machine (#589)
  • 8a35f18 [Feature] Separate controller namespace and CRD namespaces for KubeRay-Operator Dashboard (#4088)
  • fd06b5b [Feature] Set default appProtocol for Ray head service to tcp (#668)
  • 6691b70 [Feature] Split ray.io/originated-from into ray.io/originated-from-cr-name and ray.io/originated-from-crd (#1864)
  • a9beafb [Feature] Support ARM image for test (#2699)
  • f22a75a [Feature] Support Volcano Network Topology Aware Scheduling for kuberay (#4105)
  • d6aef8b [Feature] Support Volcano for batch scheduling (#755)
  • 017e58f [Feature] Support configurable RayCluster deletion delay in RayService (#3864)
  • baccb09 [Feature] Support environment variables for KubeRay operator chart (#978)
  • a45e4ab [Feature] Support for overwriting the generated ray start command with a user-specified container command (#1704)
  • 6c9f859 [Feature] Support inject specific env vars to all Ray containers in all RayCluster CRs by configuration (#4103)
  • 9bc5d85 [Feature] Support suspend in RayJob (#926)
  • 4aa53f4 [Feature] Sync for manifests and helm chart (#564)
  • 56b2f61 [Feature] Sync logs to local file (#632)
  • ca6d792 [Feature] TLS authentication (#989)
  • b4b1ce7 [Feature] Test sample RayCluster YAMLs to catch invalid or out of date ones (#678)
  • 65a7703 [Feature] Test sample RayService YAML to catch invalid or out of date one (#731)
  • 71e260f [Feature] The default ImagePullPolicy should be IfNotPresent (#947)
  • f6a401a [Feature] Upgrade ginkgo (#3503)
  • 37cf2ac [Feature] Upgrade golang version (#3461)
  • 9620772 [Feature] Upgrade grpc gateway version manually (#3491)
  • 1be2ae0 [Feature] Upgrade net package (#3485)
  • f4412f6 [Feature] Use image of Ray head container as the default Ray Autoscaler container (#1401)
  • 92c2907 [Feature] Validation of RayFTEnabled is false and GcsFaultToleranceOption is not nil (#2726)
  • fa74914 [Feature] Warn Users When Updating the RayClusterSpec in RayJob CR (#1778)
  • 36b112e [Feature] Watch CR in multiple namespaces with namespaced RBAC resources (#1106)
  • 27728d7 [Feature] [API Server] Support activeDeadlineSeconds in API Server RayJob resource (#3335)
  • bc17cd9 [Feature] [Fix] Ensure Correct Logs Display for Go Test Logs in Buildkite Runner (#2837)
  • bbdff70 [Feature] [KubeRay DashBoard] Reimplement and replace the Compute Template section in the New Job (#4119)
  • 89f5fba [Feature] [RayJobs] Use finalizers to implement stopping a job upon cluster deletion (#735)
  • b6bcf10 [Feature] [scheduler-plugins] Support second scheduler mode (#3852)
  • 09aad7e [Feature] integrate RayDashboard with apiserver V2 (#4054)
  • 2db5c5d [Feature] update yarn version from v1 to latest (#3945)
  • f2d7c1f [Feature]: Add a new event type FailedToDeleteWorkerPodCollection (#2680)
  • 536ca35 [Feature][APIServer v2] Support Compute Template in APIServer v2 (#3959)
  • 491c488 [Feature][APIServer] Support decimal memory values in KubeRay APIServer (#3956)
  • 8a31bfd [Feature][APIServer] add retry for http client (#3551)
  • 5a766fd [Feature][Doc] Access S3 bucket from Pods in EKS (#958)
  • 2a84a4b [Feature][Doc] End-to-end KubeRay Operator development process on Kind (#826)
  • f4b2823 [Feature][Doc] Explain that RBAC should be synchronized manually (#641)
  • 1c648a3 [Feature][Doc] Kubeflow integration (#937)
  • 3ac1b5a [Feature][Docs] AWS Application Load Balancer (ALB) support (#658)
  • 0564748 [Feature][Docs] Explain how to specify container command for head pod (#912)
  • cfa1203 [Feature][GCS FT] Best-effort redis cleanup job for 5 minutes (#1766)
  • 72ba3a3 [Feature][GCS FT] Clean up Redis once a GCS FT-Enabled RayCluster is deleted (#1412)
  • 310911c [Feature][Helm] Align the key of minReplicas and maxReplicas (#663)
  • 0adc508 [Feature][Helm] Enable sidecar configuration in Helm chart (#604)
  • 4e9fdb0 [Feature][Helm] Expose the autoscalerOptions (#666)
  • 5ca90b3 [Feature][Hotfix] Add observedGeneration to the status of CRDs (#979)
  • 9835cc8 [Feature][Observability] Scrape Autoscaler and Dashboard metrics (#1493)
  • 692138b [Feature][Ray-operator] Improve RayJob validation for shutdownAfterJobFinishes and ttlSecondsAfterFinished (#3653)
  • 5231dbf [Feature][RayCluster]: Deprecate the RayCluster .Status.State field (#2288)
  • d025792 [Feature][RayCluster]: Generate GCS FT Redis Cleanup Job creation events (#2382)
  • 5062a8c [Feature][RayCluster]: Implement the HeadReady condition (#2261)
  • b5f14f1 [Feature][RayCluster]: introduce RayClusterSuspending and RayClusterSuspended conditions (#2403)
  • b2dbb15 [Feature][RayJob] Remove the deprecated RuntimeEnv from CRD. Use RuntimeEnvYAML instead. (#1792)
  • fab00b5 [Feature][RayJob] Support light-weight job submission (#1893)
  • 809bfb2 [Feature][RayJob] Support light-weight job submission with entrypoint_num_cpus, entrypoint_num_gpus and entrypoint_resources (#1904)
  • 6d5020f [Feature][RayJob] Use Use RayContainerIndex instead of 0 (#1427)
  • 73e6c5d [Feature][RayJob]: Generate submitter and RayCluster creation/deletion events (#2389)
  • 1283a62 [Feature][RayService] Set default ports (#3262)
  • 72e9933 [Feature][autoscaler v2] Set RAY_NODE_TYPE_NAME when starting ray node (#1973)
  • da78df4 [Feature][kubectl-plugin] Expose setting shutdownAfterJobFinishes and ttlSecondsAfterFinished in ray job submit (#3627)
  • 22c2b45 [Feature][kubectl-plugin] Implement kubectl ray session (#2298)
  • c86b03b [Feature][kubectl-plugin] Quick fix for Job Submission ID (#2469)
  • 4e5a916 [Feature][kubectl-plugin] add KubeRay operator version query (#2443)
  • 6ca956b [Feature][kubectl-plugin] e2e test for 'kubectl ray log' (#2486)
  • 1bc821e [Feature][kubectl-plugin] return usage error when no entrypoint input (#2503)
  • 5cb2f56 [Feature][kubectl-plugin]'ray log command' Add check and cleanup directory when no ray node exist (#2473)
  • fdf7251 [Fix] Adjust crd path to verify changed files (#3103)
  • a56b091 [Fix] Consistent parsing of custom accelerator resources (#2464)
  • 6e70fd2 [Fix] Directly fail if RayJob metadata is invalid (#3981)
  • 2f2c1a2 [Fix] RayCluster fails to transit Status.State to Ready when numOfHosts > 1 (#3353)
  • 2300814 [Fix] Standardize Buildkite Display Format Across All Tests (#2992)
  • 9068102 [Fix] Update Ray Service Troubleshooting Link (#2727)
  • fd9c90c [Fix] Use go 1.22 on Buildkite autoscaler e2e tests (#2211)
  • 7d53e78 [Fix] changelog-generator.py failed to parse some commit messages (#3818)
  • 9559227 [Fix][CI] E2E tests do not reflect error (#3021)
  • 795f799 [Fix][CI] Fix ray operator image build error by setting up docker buildx (#3750)
  • 93e32d0 [Fix][CI] Fix revive error (#2183)
  • 5124ef8 [Fix][CI] Redirect stderr to stdout in Test Autoscaler E2E (nightly operator) (#3074)
  • f6bf32f [Fix][CI] kubectl plugin krew index CI error (#3015)
  • dea87ff [Fix][Envtest] Decorate container nodes with Ordered (#2285)
  • b903d40 [Fix][HelmChart] Move service.headService -> head.headService in values.yaml (#1998)
  • 990ffe3 [Fix][Helm] Fix ClusterRole for volcano if .Values.batchScheduler.name is set (#2474)
  • c7fe15b [Fix][Operator] Explictly wait for pod not found for satisfying the delete scale exectation (#3520)
  • 6c168a0 [Fix][RayCluster] Make the RayClusterReplicaFailureReason to capture the correct reason (#2282)
  • 084368a [Fix][RayCluster] fix missing pod name in CreatedWorkerPod and FailedToCreateWorkerPod events (#3057)
  • 96fbbc1 [Fix][RayJob] Invalid quote for RayJob submitter (#2949)
  • 40f5ddb [Fix][RayService] Raise error if spec.rayClusterConfig.headGroupSpec.headService.metadata.name is set (#2440)
  • efbd35e [Fix][RayService] Use LRU cache for ServeConfigs (#2683)
  • baa2cc6 [Fix][Release] Fix Krew release indenetation error (#3823)
  • b8c4e5c [Fix][Release] Fix KubeRay dahsboard image build pipeline (#3702)
  • a69252e [Fix][Sample-Yaml] Increase ray head CPU resource for pytorch minst (#2330)
  • f687794 [Fix][kubectl-plugin] Create separate namespaces for each kubectl plugin e2e test (#2745)
  • 3efef20 [Fix][kubectl-plugin] Don't print wrapped error for job submit startup (#3027)
  • c8d34f4 [Fix][kubectl-plugin] Fix no context nil error SIGSEGV in tests (#2892)
  • 909f66e [Fix][kubectl-plugin] Release bot opens PRs to Krew repo with unexpected whitespace changes (#3090)
  • abb0bf4 [Fix][kubectl-plugin] Remove controller-runtime logger warning in kubectl ray job submit (#3669)
  • b8484af [Fix][kubectl-plugin] Remove filepath.Clean for ray job submit workingDir (#3518)
  • 029cd78 [Fix][kubectl-plugin] make tests use a temporary kube config (#2894)
  • a860884 [Fix][kubectl-plugin] ray job submit runtime-env-json null error (#3063)
  • c094153 [Fix][kubectl-plugin]: make version handle digests (#2876)
  • d8b7c69 [Fix][precommit] Fix pre-commit golangci-lint always success (#2140)
  • 4b46822 [Fix]remove broken link in doc (#3519)
  • a614b1d [Follow Up][Test] Support to set QPS and burst by configuration (#3999)
  • 4bbaa06 [GCS FT] Add e2e tests for configuring GCS FT with annotations (#2766)
  • 1c9de23 [GCS FT] Consider the case of sidecar containers (#1386)
  • fe26dc4 [GCS FT] Enhance observability of redis cleanup job (#1709)
  • 55b1d39 [GCS FT] Give readiness / liveness probes good default values (#1364)
  • 6fa2d3a [GCS FT] Improve GCS FT cleanup UX (#1592)
  • 7f95a6c [GCS FT] More validations for configuring GCS FT with envs and annotations (#2772)
  • a81ea81 [GCS FT] Redis e2e cleanup check (#2773)
  • 937297c [GCS FT] Unify configuring Gcs FT into a single function (#2755)
  • e79e0b9 [GCS FT][Refactor] Redefine the behavior for deleting Pods and stop listening to Kubernetes events (#1341)
  • 6375221 [Golang] Remove go get (#1283)
  • 10cc898 [Grafana] Add a Cluster variable to the Grafana Dashboard to enable filtering of different RayClusters (#2685)
  • f6637d7 [Grafana] Add flag for enabling auto load dashboards (#3689)
  • 9f013a3 [Grafana] Allow auto-load dashboard jsons (#3643)
  • e89ae34 [Grafana] Update Grafana dashboard (#2106)
  • 848d400 [Grafana] Update Grafana dashboard (#3726)
  • 3425b4b [Grafana] Use PodMonitor instead of ServiceMonitor for the Head Node to avoid metric duplication (#2689)
  • 9e4e709 [Grafana] Use Range option instead of instant (#4062)
  • 627f529 [Grafana][Observability] Embed Grafana dashboard panels into Ray dashboard (#1278)
  • 17ee134 [HELM] Add Helm unit tests for chart kuberay-apiserver (#3361)
  • 9658af3 [HELM] Define name templates for all resources (#3381)
  • 5265ee0 [HELM] Fix serviceAccount name inconsistency in templates (#3451)
  • 75ea7ae [HELM] Typo correction (operatorComand -> operatorCommand) (#3450)
  • 22f570e [Helm Chart] Set honorLabel of serviceMonitor to true (#3805)
  • 9d46862 [Helm] Add gcsFaultToleranceOptions in RayCluster chart (#3881)
  • ef9206c [Helm] Add missing environment variables to operator chart (#3867)
  • 6114969 [Helm] Add priorityClassName for kuberay-operator chart (#3703)
  • 3a512da [Helm] Clean up RayCluster Helm chart ahead of KubeRay 0.4.0 release (#751)
  • 799f073 [Helm] Enable leader election when leaderElectionEnabled is not set (#2284)
  • 9296c22 [Helm] Make Kube Client QPS and Burst configurable for kuberay-operator (#4002)
  • ae91985 [Helm] Make reconcile concurrency configurable for kuberay-operator (#3962)
  • b65e4a0 [Helm] Use helm-docs to generate README for chart api-server automatically (#3916)
  • a099da3 [Helm] Use helm-docs to generate README for chart ray-cluster automatically (#3887)
  • 6db864d [Helm] add sizeLimit for emptyDir (#2532)
  • cde251a [Helm][RBAC] Introduce the option crNamespacedRbacEnable to enable or disable the creation of Role/RoleBinding for RayCluster preparation (#1162)
  • 831b55b [Helm][ray-cluster] Fix parsing envFrom field in additionalWorkerGroups (#1039)
  • 3a925f3 [Hotfix] Extend Autoscaler e2e tests timeout (#3665)
  • 981c943 [Hotfix] Increase the timeout of the ProxyActor health check (#2082)
  • 165291e [Hotfix][Bug] Avoid unnecessary zero-downtime upgrade (#1581)
  • 1fe5ae7 [Hotfix][Bug] suspend is not a stateless operation (#1741)
  • 9ad6b1b [Hotfix][CI] Pin setup-envtest dep (#2038)
  • 00dc45a [Hotfix][release blocker][RayJob] HTTP client from submitting jobs before dashboard initialization completes (#1000)
  • 12b9df2 [Kueue] Add a sample YAML for Kueue toy sample (#1956)
  • 4fc1799 [Logging] Avoid using fmt.Sprintf inside logging functions (#2508)
  • df03863 [Logging] Remove duplicate info in CR logs (#2531)
  • 23b08e0 [Logging] add context info for yunikorn logger (#2522)
  • 105e880 [Metric] kuberay_job_deployment_status (#3656)
  • d40692f [Metrics] Remove serviceMonitor.yaml (#3795)
  • fdd4bdb [Minor] Remove redundant variable (#2281)
  • 7905bcf [N/N][Lint] Group imports by sections (#3454)
  • 0775292 [Nit] Remove redundant code snippet (#1810)
  • f77ee03 [Perf] Add NUM_WORKERS and CPUS_PER_WORKER env to the mnist workload (#2126)
  • afab558 [Perf] Add a CPU-based image resizing workload using Ray Data (#2135)
  • c099de4 [Perf] Add a CPU-based training workload (#2116)
  • b5f237d [Perf] Improve perf-test YAMLs and README (#2110)
  • c83b1bd [Post Ray 2.2.0 Release] Update Ray versions to Ray 2.2.0 (#822)
  • 8da54d4 [Post Ray 2.3 Release] Update Ray versions to Ray 2.3.0 (#925)
  • 473dfdb [Post Ray 2.4 Release] Update Ray versions to Ray 2.4.0 (#1049)
  • cc4155b [Post Ray 2.7.0 Release] Update Ray versions to Ray 2.7.0 (#1423)
  • 666679f [Post Ray 2.8.0 Release] Update Ray versions to Ray 2.8.0 (#1678)
  • df3cc35 [Post release v0.5.0] Remove block from rayStartParams (#1015)
  • ba814ef [Post release v0.5.0] Remove block from rayStartParams for python client and KubeRay operator tests (#1050)
  • 67a0f44 [Post release v0.5.0] Remove serviceType (#1013)
  • 4234e5b [Post release v0.5.0] Update CHANGELOG.md (#1026)
  • dfc197f [Post release v0.5.0] Update release doc (#1028)
  • 72d1c21 [Post release v0.6.0] Update CHANGELOG.md (#1274)
  • ded9454 [Post v0.5.0] Remove init containers from YAML files (#1010)
  • 74496a0 [Post v1.0.0-rc.1] Reenable sample YAML tests for latest release and update some docs (#1544)
  • 4eed014 [Post v1.1.0] Run the sample YAML tests with KubeRay v1.1.0 (#2039)
  • 022ff0d [Prometheus] Add kuberay_cluster_provisioned_duration_seconds metric (#3212)
  • 2409109 [Prometheus] Add kuberay_cluster_info metric (#3535)
  • f7102b2 [Prometheus] Add serviceMonitor for KubeRay Operator (#3530)
  • 6cefc40 [Prometheus] Refactor kuberay_cluster_provisioned_duration_seconds (#3497)
  • 238cb4e [Quay] Sanity check for KubeRay repository setup (#1300)
  • fb1463f [REFACTOR]: refactor execute pod cmd with client-go function (#2467)
  • eef1d89 [Ray 2.3.0] Update --redis-password for RayCluster (#929)
  • 827814c [Ray 2.9.0 Release] Update Ray versions from 2.8.0 to 2.9.0 (#1770)
  • f652d5d [Ray Observability] Disk usage in Dashboard (#1152)
  • 7d0eae4 [Ray-operator] Feature flag login bash (#3679)
  • d4784a5 [RayCluster controller] Add headServiceAnnotations field to RayCluster CR (#841)
  • d1eeaab [RayCluster controller] [Bug] Unconditionally reconcile RayCluster every 60s instead of only upon change (#850)
  • 944b60c [RayCluster] Add multi-host indexing labels (#3998)
  • ffac341 [RayCluster] Add serviceName to status.headInfo (#2089)
  • 6463f25 [RayCluster] IsAutoscalingEnabled takes RayClusterSpec (#3111)
  • 13df016 [RayCluster] Make headpod name back to non-deterministic (#3872)
  • b9d8b1a [RayCluster] Make headpod name deterministic (#3028)
  • 04f5b71 [RayCluster] Toggle usage of deterministic/non-deterministic head pod name with feature flag (#3873)
  • d169f5c [RayCluster] Update sample yamls to use the new gcsFaultToleranceOptions option (#2856)
  • 829aad5 [RayCluster] Validate GCSFaultToleranceOptions and redis password (#2754)
  • eba1459 [RayCluster] Validate RayClusterSpec for empty containers and GCS FT (#2749)
  • 8d60d61 [RayCluster] don't allow overriding ray.io/cluster label (#2555)
  • 8feef9d [RayCluster] e2e test for GCS FT with Redis Username (#2855)
  • 9bd31cf [RayCluster] grant pods and pods/resize patch permissions for IPPR (#3960)
  • 28c729f [RayCluster] improve generated pod names for Ray clusters
  • a788963 [RayCluster] support suspending worker groups (#2663)
  • f11a1f5 [RayCluster] yunikorn batchscheduler respect gang scheduling (#4075)
  • 94636bd [RayCluster]Upgrade volcano to 1.11.0 (#3159)
  • 8362483 [RayCluster][CI] add e2e tests for RayClusterStatusCondition (#2661)
  • c62910f [RayCluster][CI] add e2e tests for the RayClusterSuspended status condition (#2686)
  • 801f081 [RayCluster][Expectation] Add a test to ensure expectations work well during scaling down (#3543)
  • c2f3823 [RayCluster][Feature] Make RayClusterStatusConditions feature gate Beta and enabled by default (#2562)
  • 7a768f9 [RayCluster][Feature] add GcsFaultToleranceOptions to the RayCluster CRD [1/N] (#2715)
  • 991b9c7 [RayCluster][Feature] add redis password to head pod from GcsFaultToleranceOptions (#2731)
  • 0055bf3 [RayCluster][Feature] add redis username to head pod from GcsFaultToleranceOptions (#2760)
  • 7bb82db [RayCluster][Feature] reject redis username to head pod outside of GcsFaultToleranceOptions (#2796)
  • 82e2554 [RayCluster][Feature] setup GCS FT annotations and the RAY_REDIS_ADDRESS env by the GcsFaultToleranceOptions (#2721)
  • d86ea62 [RayCluster][Feature] skip suspending worker groups if the in-tree autoscaler is enabled (#2748)
  • a4d7dd0 [RayCluster][Fix] Add expectations of RayCluster (#2150)
  • 42f299a [RayCluster][Fix] DesiredReplicas, MinReplicas and MaxReplicas should respect workerGroupSpec.Suspend (#2728)
  • 6c1c16e [RayCluster][Fix] evicted head-pod can be recreated or restarted (#2217)
  • b5bcb86 [RayCluster][Fix] leave .Status.State untouched when there is a reconcile error (#2622)
  • ae880c4 [RayCluster][Refactor] use RayClusterAllPodsAssociationOptions instead (#2756)
  • 17809bc [RayCluster][Status][1/n] Remove ClusterState Unhealthy (#2068)
  • 4da1838 [RayJob] Add Cluster Name For Rayjob. (#2046)
  • 8f06197 [RayJob] Add Failure Feedback (log and event) for Failed k8s Creation Task (#2306)
  • fe981a2 [RayJob] Add JobDeploymentStatusFailed Status and Reason Field to Enhance Observability for Flyte/RayJob Integration (#1942)
  • 1f44bdc [RayJob] Add Tests for Atomic Suspend Operation (#2050)
  • bb5b788 [RayJob] Add RayJobInfo to RayJob CRD status (#3673)
  • f53b42a [RayJob] Add additional print columns for RayJob (#1895)
  • 775715b [RayJob] Add default CPU and memory for job submitter pod (#1319)
  • 7386427 [RayJob] Add e2e sample yaml test for shutdownAfterJobFinishes (#1269)
  • aa17363 [RayJob] Add field to expose entrypoint num cpus in rayjob (#1359)
  • 5de4a42 [RayJob] Add runtime env YAML field (#1338)
  • 8682b2d [RayJob] Add spec.backoffLimit for retrying RayJobs with new clusters (#2192)
  • 9a4de56 [RayJob] ClusterSelector shouldn't support SidecarMode (#4074)
  • b33642f [RayJob] Deflaky RayJob e2e tests (#2963)
  • af4f6ac [RayJob] Delete the Kubernetes Job and its Pods immediately when suspending (#1791)
  • 9382c1f [RayJob] Enable job log streaming by setting PYTHONUNBUFFERED in job container (#1375)
  • 58a3ff0 [RayJob] Enhance RayJob DeletionStrategy to Support Multi-Stage Deletion (#4040)
  • 528abc3 [RayJob] Fix RayJob status reconciliation (#1539)
  • 370fc44 [RayJob] Follow up of RayJob deletion policy PR (#2763)
  • edfc34f [RayJob] Improve dashboard client log (#1903)
  • 4fb4578 [RayJob] Inject RAY_SUBMISSION_ID env variable for user provided submitter template (#1868)
  • 91fcd3e [RayJob] Propagate error traceback string when GetJobInfo doesn't return valid JSON (#943)
  • f191a75 [RayJob] RayJob deletion policy validation (#2771)
  • 931f970 [RayJob] Refactor Rayjob E2E Tests to Use Server-Side Apply (#1927)
  • 34a8d9f [RayJob] Rewrite RayJob envtest (#1916)
  • 2281d9e [RayJob] Set missing CPU limit (#1899)
  • f9b2cb1 [RayJob] Set the timeout of the HTTP client from 2 mins to 2 seconds (#1910)
  • 7639b9d [RayJob] Sidecar Mode (#3971)
  • 2583d85 [RayJob] Submit job using K8s job instead of checking Status and using DashboardHTTPClient (#1177)
  • 0074129 [RayJob] Support ActiveDeadlineSeconds (#1933)
  • f5d7131 [RayJob] Support deletion policies based on job status (#3731)
  • c78c75b [RayJob] Transition to Complete if the JobStatus is STOPPED (#1855)
  • 6b027b4 [RayJob] Unified checkBackoffLimitAndUpdateStatusIfNeeded codepath and add an e2e test for retry (#2215)
  • 55a6688 [RayJob] UserMode -> InteractiveMode and check rayjob.spec.jobId instead of annotation (#2446)
  • 024aaef [RayJob] Validate RayJob spec (#1813)
  • 6087689 [RayJob] Validate whether runtimeEnvYAML is a valid YAML string (#1898)
  • c9fa013 [RayJob] Yunikorn Integration (#3948)
  • d0f1c3c [RayJob] [Doc] Add real-world Ray Job use case tutorial for KubeRay (#1361)
  • 7f15e13 [RayJob] add Failing RayJob in HTTPMode e2e test for rayjob with retry (#2242)
  • 27b1dca [RayJob] add Failing submitter K8s Job e2e test for rayjob with retry (#2226)
  • bd33d54 [RayJob] add Light-weight RayJob Submitter (#3943)
  • 1efaf68 [RayJob] add RayJob pass Deadline e2e-test with retry (#2241)
  • 7e04b22 [RayJob] allow create verb for services/proxy, which is required for HTTPMode (#2321)
  • 0106303 [RayJob] avoid RayCluster resource leak in k8s job mode(#3903) (#4080)
  • 0544f8b [RayJob] implement deletion policy API (#2643)
  • 72a7767 [RayJob] remove redundant RayJob status-transition logs in reconciler (#3976)
  • 1c5f3e8 [RayJob]: Add RayJob with RayCluster spec e2e test (#1636)
  • 631cd7c [RayJob]: Always use target RayCluster image as default RayJob submitter image (#1548)
  • b0fee80 [RayJob][10/n] Add finalizer to the RayJob when the RayJob status is JobDeploymentStatusNew (#1780)
  • c0b6b0d [RayJob][Chore] make err as a local variable (#2789)
  • 0274faa [RayJob][Doc] Fix RayJob sample config. (#807)
  • bcc8c09 [RayJob][Fix] Use --no-wait for job submission to avoid carrying the error return code to the log tailing (#3216)
  • c45d959 [RayJob][Kueue] Move limitation check to validateRayJobSpec (#1854)
  • 0ed5e7e [RayJob][Refactor] use ray job status and ray job logs to be tolerant of duplicated job submissions (#2579)
  • ce4ec27 [RayJob][Status][1/n] Redefine the definition of JobDeploymentStatusComplete (#1719)
  • 4ff389b [RayJob][Status][11/n] Refactor the suspend operation (#1782)
  • 349068d [RayJob][Status][12/n] Resume suspended RayJob (#1783)
  • 448e33d [RayJob][Status][13/n] Make suspend operation atomic by introducing the new status Suspending (#1798)
  • a9c7abb [RayJob][Status][14/n] Decouple the Initializing status and Running status (#1801)
  • f654665 [RayJob][Status][15/n] Unify the codepath for the status transition to Suspended (#1805)
  • 83327f2 [RayJob][Status][16/n] Refactor Running status (#1807)
  • 1eed068 [RayJob][Status][17/n] Unify the codepath for status updates (#1814)
  • c55f3cc [RayJob][Status][18/n] Control the entire lifecycle of the Kubernetes submitter Job using KubeRay (#1831)
  • d5d7e5f [RayJob][Status][19/n] Transition to Complete if the K8s Job fails (#1833)
  • ba42038 [RayJob][Status][2/n] Redefine ready for RayCluster to avoid using HTTP requests to check dashboard status (#1733)
  • 8760d90 [RayJob][Status][3/n] Define JobDeploymentStatusInitializing (#1737)
  • 62bbc13 [RayJob][Status][4/n] Remove some JobDeploymentStatus and updateState function calls (#1743)
  • 1594e88 [RayJob][Status][5/n] Refactor getOrCreateK8sJob (#1750)
  • d49a7af [RayJob][Status][6/n] Redefine JobDeploymentStatusComplete and clean up K8s Job after TTL (#1762)
  • 59503c6 [RayJob][Status][7/n] Define JobDeploymentStatusNew explicitly (#1772)
  • cac7648 [RayJob][Status][8/n] Only a RayJob with the status Running can transition to Complete at this moment (#1774)
  • 6af407d [RayJob][Status][9/n] RayJob should not pass any changes to RayCluster (#1776)
  • 3e64de3 [RayJob][Test] make sure annotation populated to RayCluster (#3199)
  • 7f33c1d [RayJob][Test] refactor TestValidateRayJobSpec with table test (#3223)
  • 834aed3 [RayService] Add New Status: NumServeEndpoints (#1901)
  • bbb65b4 [RayService] Add RayService High Availability Test Doc (#1986)
  • e062d07 [RayService] Add RayService alb ingress CR (#1169)
  • 1c276f7 [RayService] Add a safeguard and remove the dead code to ensure that both clusters are not empty before reconciling serve (#2778)
  • c13949e [RayService] Add an envtest for RayService happy path (#2868)
  • c807790 [RayService] Add an envtest for autoscaler (#2872)
  • 62faf27 [RayService] Add checks of RayService conditions in e2e tests (#2864)
  • 8db4f6d [RayService] Add e2e tests (#1167)
  • 0ee3983 [RayService] Add logs and remove in-place update for the TestOldHeadPodFailDuringUpgrade e2e test (#2819)
  • 44c0d50 [RayService] Add support for multi-app config in yaml-string format (#1156)
  • 77a1023 [RayService] Add unit tests for isZeroDowntimeUpgradeEnabled (#2871)
  • 355de9a [RayService] Add zero-downtime triggered test after rayVersion is updated (#2881)
  • de3e037 [RayService] Address Recent Flakiness in RayService Zero Downtime Rollout Test (#1979)
  • a6cf6e0 [RayService] Allow updating WorkerGroupSpecs without rolling out new cluster (#1734)
  • f46f328 [RayService] Always check the readiness of head Pods for both pending / active clusters if cluster exists (#2783)
  • edd332b [RayService] Avoid Duplicate Serve Service (#1867)
  • 1a4254c [RayService] Avoid passing RayServiceStatus to functions in reconcileServe (#2828)
  • f932962 [RayService] Avoid sending health check requests to the head Pod when excludeHeadPodFromServeSvc is true (#2776)
  • 80cab41 [RayService] Calculate status based on K8s resources (#2818)
  • 47d55fe [RayService] Change runtime env for e2e autoscaling test (#1178)
  • 850fd48 [RayService] Compare cached hashed config before triggering update (#655)
  • 5a5534f [RayService] Create k8s events after creating/updating k8s resources (#2873)
  • 41ee4db [RayService] Deflaky RayService envtest (#2962)
  • 2970e36 [RayService] Deprecate the built-in ingress support of RayService (#1843)
  • a03f721 [RayService] Fixed issue where the custom serve port is not reflected in the serve health check for worker Pods (#1816)
  • deb29bd [RayService] Ignore deployments status to decide whether to deploy serve application (#1014)
  • 33ee672 [RayService] Mark ServiceStatus as deprecated (#2863)
  • 18bee57 [RayService] Merge initConditions into calculateConditions (#2866)
  • 0143fef [RayService] More envtests that follow the most common scenario in the RayService code path (#2880)
  • 96a2ce6 [RayService] Move HTTP Proxy's Health Check to Readiness Probe for Workers (#1808)
  • 81d7608 [RayService] Move cleanUpRayClusterInstance from reconcileRayCluster to Reconcile (#2838)
  • e1bee82 [RayService] Move the cluster switch logic from reconcileServe to Reconcile (#2777)
  • 495c0aa [RayService] Move the update of RayClusterStatus to calculateStatus (#2826)
  • a31e094 [RayService] Passing serve applications to calculateStatus and avoid calling Status().Update(...) inside reconcileServe (#2831)
  • 5b8e9c6 [RayService] Refactor createRayClusterInstance (#2874)
  • 1265980 [RayService] Refactor reconcileRayCluster to avoid updating CR status in the function (#2859)
  • bb31661 [RayService] Refactor updateRayClusterInstance (#2875)
  • ab93442 [RayService] Refactor envtests (#2888)
  • 8e1b922 [RayService] Refactor fake http proxy client and test (#2636)
  • e616dc4 [RayService] Refactor to Rely More on RayService Status in RayService E2E Tests (#1928)
  • bccb358 [RayService] Refactor unit tests for ShouldPrepareNewCluster (#2928)
  • 17a534d [RayService] Remove WaitForServeDeploymentReady (#2842)
  • 26cdacd [RayService] Remove HealthLastUpdateTime from ServeDeploymentStatus (#2825)
  • 9263dc6 [RayService] Remove updateStatusForActiveCluster (#2827)
  • 9c9797b [RayService] Remove everything related to Ray Serve V1 API (#1790)
  • 019a6cd [RayService] Remove outdated env tests (#2886)
  • 3c080dc [RayService] Remove serve v1 API (#1779)
  • 9e4aa8a [RayService] Remove the dependencies between constructRayClusterForRayService and the reconciler to make it more unit testable (#2853)
  • 7ef5654 [RayService] Rename Restarting to PreparingNewCluster (#2785)
  • 3959509 [RayService] Revisit the conditions under which a RayService is considered unhealthy and the default threshold (#1293)
  • 2f8ee7f [RayService] Setting observedGeneration inside calculateStatus (#2869)
  • 0df4d8a [RayService] Skip update events without change (#811)
  • c3f3736 [RayService] Stable Diffusion example (#1181)
  • b0649c4 [RayService] Submit requests to the Dashboard after the head Pod is running and ready (#1074)
  • 2acc219 [RayService] Support Incremental Zero-Downtime Upgrades (#3166)
  • 7940407 [RayService] Track whether Serve app is ready before switching clusters (#730)
  • 64da63b [RayService] Trim Redis Cleanup job less than 63 chars (#2846)
  • 7fd79f8 [RayService] Unify multi-app and single-app codepath (#1787)
  • 46355ed [RayService] Unify the cluster switch over logic together (#2805)
  • ecd1539 [RayService] Update docs to use multi-app (#1179)
  • 25f787b [RayService] Use DashboardPort for RayService instead of DashboardAgentPort (#1742)
  • f7cf955 [RayService] Use Ready condition in e2e tests (#2849)
  • 8ea39da [RayService] Use Ready condition in e2e tests (#2854)
  • 4e912b9 [RayService] Use original ClusterIP for new head service (#2343)
  • 9be883f [RayService] Use waitGroup to ensure goroutine completion in rayservice_ha_test (#2657)
  • b753f1a [RayService] a safeguard for preventing overriding the pending cluster during an upgrade (#2887)
  • f88b2fe [RayService] adapter vllm 0.6.1.post2 (#2823)
  • b66763d [RayService] don't update serveConfigV2 in current ray cluster if ray… (#3559)
  • a612670 [RayService] e2e for check the readiness of head Pods for both pending / active clusters (#2806)
  • 8f75ad5 [RayService] e2e for redeploying RayServe application after recreating a new Head Pod (#2834)
  • 78d030a [RayService] fix kubebuilder printcolumn annotations for RayService (#1981)
  • 0056fbf [RayService] make RayClusterSpec required (#3169)
  • 19924c3 [RayService] make checkIfNeedSubmitServeApplications more unit testable (#2822)
  • e11fe54 [RayService] refactor envtest by adding a util function rayServiceTemplate (#2833)
  • d64bf59 [RayService] reword the comment on ServiceStatus = rayv1.Running (#2848)
  • 2e8f532 [RayService][Bug] Serve Service May Select Pods That Are Actually Unready for Serving Traffic (#1856)
  • 19054cb [RayService][Doc] RayService troubleshooting handbook (#1221)
  • 73f4f21 [RayService][HA] Fix flaky tests (#1823)
  • 6c2281c [RayService][Health-Check][1/n] Offload the health check responsibilities to K8s and RayCluster (#1656)
  • 4557a01 [RayService][Health-Check][2/n] Remove the hotfix to prevent unnecessary HTTP requests (#1658)
  • aa42f8b [RayService][Health-Check][3/n] Update the definition of HealthLastUpdateTime for DashboardStatus (#1659)
  • 07d14de [RayService][Health-Check][4/n] Remove the health check for Ray Serve applications (#1660)
  • 584132c [RayService][Health-Check][5/n] Remove unused variable deploymentUnhealthySecondThreshold (#1664)
  • ed56a95 [RayService][Health-Check][6/n] Remove ServiceUnhealthySecondThreshold (#1665)
  • 2767768 [RayService][Health-Check][7/n] Remove LastUpdateTime from multiple places (#1666)
  • c54c3d9 [RayService][Health-Check][8/n] Add readiness / liveness probes (#1674)
  • aad2fc6 [RayService][Hotfix] Hotfix for Flaky Zero Downtime Rollout Test (#1837)
  • dd7789c [RayService][Observability] Add actionable logging messages for users when they do not specify ports for Ray Serve (#1218)
  • 384a921 [RayService][Observability] Add more logging for RayService troubleshooting (#1230)
  • 45d3a4f [RayService][Observability] Add more loggings about networking issues (#1282)
  • 881008f [RayService][Refactor] Avoid flooding Kubernetes events (#2546)
  • 3c8904c [RayService][Refactor] Change the ServeConfigs to nested map (#2591)
  • 75dbbdf [RayService][Refactor] Remove ctrlResult (#2545)
  • c620582 [RayService][Status][1/n] Remove DashboardStatus (#1839)
  • 0575bd1 [RayService][Status][2/n] Remove WaitForDashboard (#1840)
  • 57c6397 [RayService][Test] create curl pod waiting until running (#3740)
  • 594eafc [RayService][Test] make sure annotation populated to RayCluster (#3210)
  • c3b3354 [RayService][Test] util for creating empty RayClusterSpec in test (#3182)
  • 39d1456 [RayService][refactor] Remove updateState (#2705)
  • da6b356 [Refactor] Add a util function IsAutoscalingEnabled and refactor validations of RayJob deletion policy (#2775)
  • c814963 [Refactor] Define the value type of the concurrent map explicitly to avoid type conversion (#1789)
  • c5d7de6 [Refactor] Do not use RAYCLUSTER_DEFAULT_REQUEUE_SECONDS_ENV as timeout of status check in tests (#1755)
  • b898828 [Refactor] Eliminate redundant range variable capture with Go 1.22 scoped iteration (#4044)
  • 7a96221 [Refactor] Encapsulate RayCluster metrics in a custom Prometheus collector (#3310)
  • 0b72901 [Refactor] Encapsulate RayJob metrics in a custom Prometheus collector (#3444)
  • b875b85 [Refactor] Extract KubectlApplyYaml and yaml deserialization to support package (#2498)
  • 59ae107 [Refactor] Fix CreatedWorkerPod for worker Pod deletion event and refactor logs (#2346)
  • f38951f [Refactor] Follow-up for PR 1930 (#2124)
  • a616a45 [Refactor] Format API server Makefile for consistency (#3435)
  • a83d3c1 [Refactor] Improve API server developer experience (#3458)
  • 4ff8316 [Refactor] Improve developer experience of API server e2e-test (#3466)
  • 4492fe2 [Refactor] Make port name variables consistent and meaningful (#1389)
  • 7f02eb7 [Refactor] Merge raycluster_gcs_ft_test.go and raycluster_gcsft_test.go (#3008)
  • 298539d [Refactor] Move ValidateRayJobStatus to validation.go and create its unit test (#2813)
  • 8c53bd5 [Refactor] Move ValidateRayClusterSpec to validation.go and its unit test to validation_test.go (#2790)
  • 8dd2496 [Refactor] Move validateRayClusterStatus function to validation.go and move unit test to validation_test.go (#2780)
  • 84f7368 [Refactor] Move constant.go from common to utils to avoid circular dependency (#1726)
  • 3d1c6c3 [Refactor] Move function ValidateRayJobSpec to validation.go and its unit test (#2812)
  • 28ab5c9 [Refactor] Move functions that don’t rely on the controller to non-controller member functions (#2747)
  • 0867021 [Refactor] Move test name from map key to struct field (#2865)
  • 5ccf361 [Refactor] Move validateRayServiceSpec to validation.go and its unit test to validation_test.go (#2816)
  • 3a1fedb [Refactor] Parameterize TestGetAndCheckServeStatus (#1450)
  • b775821 [Refactor] RayJob Spec ClusterSelector validation logic (#4032)
  • 83104b7 [Refactor] Refactor testRayJob global variable to avoid test side effects (#4017)
  • 3d533b4 [Refactor] Remove Dashboard Agent service (#1207)
  • 3748746 [Refactor] Remove any unnecessary logger (#1894)
  • bafb009 [Refactor] Remove cleanupInvalidVolumeMounts (#2104)
  • 03eb92c [Refactor] Remove duplicate definition of get_ray_cluster_status (#3608)
  • eee9d94 [Refactor] Remove global utils.GetRayXXXClientFuncs (#1727)
  • 5007993 [Refactor] Rename EnableAgentService to EnableServeService (#1673)
  • 76889ca [Refactor] Rename raycluster_controller_fake_test.go to XXX_unit_test.go (#2074)
  • 4836d01 [Refactor] Renaming RayHttpProxyClient attribute UseProxy #1980 (#2093)
  • 542f246 [Refactor] Replace Hard-Coded HTTP Values with Constants (#2702)
  • 0f2f441 [Refactor] Rewrite RayCluster envtest (#1949)
  • dcc8b71 [Refactor] Run golangci-li...
