Changelog
- d59cbd8 Fix rayClusterScaleExpectation deletion to use request object when instance is nil (#4039)
- 480e128 Inject the --block option to ray start command automatically (#932)
- 7850773 Remove ray-cluster.without-block.yaml (#675)
- 38ac168 [Telemetry] Inject env identifying KubeRay. (#562)
- 97425fa AGC gateway api example (#4076)
- dc203e2 Add DeepSeek example RayService (#3838)
- 7c0aa63 Add FAQ page (#1150)
- dcb97ce Add Grafana Dashboard for KubeRay Operator (#3676)
- bf7e497 Add Helm chart unit tests to ray-cluster (#3374)
- 113909f Add Helm chart unittests to CI (#3280)
- d0e8b57 Add KubeRay e2e Test for custom idleTimeoutSeconds with v2 Autoscaler (#2725)
- 6818a08 Add KubeRay related blogs (#1147)
- 042e6b4 Add NumOfHosts to RayCluster helm-chart template (#1969)
- 5adc91a Add NumOfHosts to WorkerGroupSpec (CRD change only) (#1834)
- 80a6d58 Add Ray cluster spec for TPU pods (#1292)
- abb5291 Add RayCluster YAML for verl example (#3833)
- f232b5b Add RayClusterProvisioned Condition Type (#2301)
- cbaf5d7 Add RayClusterReady Condition Type (#2271)
- 732453e Add RayJob training example using pytorch resnet image classifier (#2107)
- a0afea2 Add RayService Manifests for Stable Diffusion TPU Examples (#2198)
- 753dc05 Add RayService sample test (#1377)
- c835117 Add TPU to Known Custom Accelerators for generated rayStartCommand (#2495)
- f05fa2e Add ray.io/originated-from labels (#1830)
- 87407ac Add a document for profiling (#1299)
- 08792ca Add a document to outline the default settings for rayStartParams in Kuberay (#1057)
- b92d95a Add a flag to enable/disable worker init container injection (#1069)
- 8851088 Add a grouping for 'google.golang.org/*' to avoid inconsistency between sub-projects (#3470)
- ceb9f01 Add a sample RayJob to fine-tune a PyTorch lightning text classifier (#1891)
- a851490 Add a test util function for killing the head Pod and wait (#3890)
- 073de1f Add a util function to convert string and bytes array (#2621)
- 3e20a9d Add a variant of the ray data processing job with GCSFuse CSI driver (#2401)
- 7fe4050 Add a warning to discourage users from launching a KubeRay-incompatible autoscaler. (#1102)
- 8b61b73 Add all and worker node type to kubectl ray log (#2442)
- 966d9b3 Add apply configurations to generated client (#1818)
- ef0129e Add basic Helm chart unittests for kuberay-operator (#3253)
- cd239ab Add basic e2e test for kubectl plugin (#2287)
- e1edb4c Add batch-scheduler option, deprecate enable-batch-scheduler option (#2300)
- e330c03 Add common containerEnv section to Helm Chart (#1932)
- 70ef243 Add consistency check for deepcopy generated files (#1127)
- 36267ed Add dashboard component to master (#3566)
- cb2914d Add deletecollection for multi-namespace role (#2) (#2231)
- 8bb3222 Add dependabot.yml for enabling "Dependabot version updates" (#3357)
- 9a0b9d0 Add dnsConfig to head, worker and additional workers (#2377)
- 80ce664 Add documentation for API Server monitoring (#1479)
- 0bd28e7 Add documentations for the release process of Helm charts (#723)
- 7f3fe8b Add e2e KubeRay operator upgrade test (#3060)
- e9f3155 Add e2e test for kubectl ray job submit (#2614)
- 4fc48ce Add e2e test make sure resource quota error is surfaced (#3087)
- ff45923 Add end to end tests to apiserver (#1460)
- 4b0f7cb Add env and patch permission. (#740)
- 1bcfa9e Add env variable comment to kuberay-operator
- d93c3c9 Add example and tutorial to explain how to create custom metrics for Prometheus (#914)
- a34a42a Add flag leader-election-namespace (#1624)
- a2ebc61 Add gofumpt instructions from internal doc (#1180)
- 044008d Add instruction to skip unit tests in DEVELOPMENT.md (#1171)
- abafd17 Add kubectl plugin with basic command and deprecate cli (#2243)
- 3e68606 Add kubectl ray cluster log command (#2296)
- 12babc8 Add kubectl ray create cluster (#2607)
- 61a282f Add kubectl ray delete rayservice/job/cluster (#2635)
- 8c64e60 Add kubectl-plugin pre-commit (#2255)
- 11c75ea Add kuberay operator servicemonitor (#3717)
- 25d5568 Add kubernetes dependency in python client library (#998)
- c8f826b Add kubernetes event to inform user of upgrade strategy (#2592)
- 106f8fd Add missing labels on RayCluster TPU manifests (#1987)
- 4a12d78 Add more grouping to resolve inconsistencies when bumping versions (#3554)
- 7856027 Add rayVersion in the RayCluster chart (#975)
- 4021766 Add rayjob yaml generation to ray job submit command (#2644)
- d22d752 Add release command and guidance for KubeRay cli (#834)
- e9544fc Add reminders to avoid RBAC synchronization bug (#576)
- 08da595 Add seccompProfile to KubeRay operator deployment for PSS compliance (#3931)
- 522807d Add seccompProfile.type=RuntimeDefault to kuberay-operator. (#1955)
- b5b4232 Add structured config and default sidecar container configuration (#1822)
- 224a444 Add support for openshift routes (#1183)
- 43ed246 Add support for parsing neuron core resource limit and pass it as ray… (#2409)
- 2de3fe5 Add support for pvcs to apiserver (#1118)
- 3cc6116 Add support for tolerations, env, annotations and labels (#1070)
- aeba37e Add test for autoscaler and its desired state (#2601)
- 76633c5 Add test for configurable k8s job backoff limit (#2134)
- 865affa Add tools and docs for changelog generator (#833)
- e36183d Add top-level Labels and Resources Structed fields to HeadGroupSpec and WorkerGroupSpec (#4106)
- 36102a0 Add topology spread constraints test for RayCluster (#2472)
- 658bd9e Add unit test for cluster get and add steps in workflows (#2263)
- e6722b0 Add v4 TPU manifests samples (#1968)
- 33ccc9a Add v6e TPU Ray CR Manifests (#2445)
- b227924 Add vLLM TPU example RayService manifest (#3000)
- f8ed876 Add validating webhook (#1584)
- ecd6eca Add validation for RAY_enable_autoscaler_v2 environment variable (#3963)
- d950d59 Add volcano taskSpec annotations to pod (#1754)
- 925effe Add workerGroupSpec.idleTimeoutSeconds to v1 RayCluster CRD (#2558)
- 4e1454e Added Pod securityContext value to Helm charts (#2160)
- 1f728c5 Added Python API server client (#1561)
- 280902f Added Ray-Serve Config For LLMs (#3517)
- bc90674 Added security to the API server (#1677)
- ccd88cc Added support for ephemeral volumes and ingress creation support (#1409)
- 803374e Adding API server support for service account (#1148)
- 9af8215 Adding a test for the document for the Pod security standard (#866)
- 6e4ac23 Adding capability to create ray cluster with serve support -clean (#1672)
- d10103d Adding example of manually setting up NGINX Ingress (#699)
- 61adf56 Align Init Container's ImagePullPolicy with Ray Container's ImagePullPolicy (#1080)
- 761559e Align RayJob's ManagedBy with RayCluster's ManagedBy. (#2630)
- 584da5a Alkanso/python client (#901)
- 59d703f Allow E2E tests to run with arbitrary k8s cluster (#1306)
- c857ca4 Allow annotations in ray cluster helm chart (#574)
- 847585d Allow app.kubernetes.io/component to be overriden (#3198)
- 828afba Allow configuration of restartPolicy (#2197)
- 153f35c Allow manually creating init containers in Kuberay helm charts (#1287)
- ff66bcb Allow to install and remove operator via scripts (#1545)
- 4892ac1 Api server makefile (#1301)
- f0b5ea4 Api server refactor/allow multiple job statuses in jobe2e (#3363)
- 7de5f10 Api server refactor/allow multiple job statuses in servicee2e (#3375)
- 6901e4d Best practice for fault-tolerant redis with kuberay (#2684)
- be4f988 Build Headless Service for Multi-Host TPU Worker Pods (#1920)
- 51b64f6 Buildkite autoscaler e2e (#2199)
- 6c235d8 Bump @babel/runtime from 7.24.1 to 7.27.1 in /dashboard (#3591)
- 530318b Bump Kubernetes dependencies to v0.34.x (#4147)
- 91245ad Bump braces from 3.0.2 to 3.0.3 in /dashboard (#3590)
- 3858146 Bump crd-ref-docs to v0.2.0 for Go 1.24+ compatibility (#4029)
- 410e8fb Bump github.com/Masterminds/semver/v3 in /ray-operator (#3500)
- 168dd43 Bump github.com/emicklei/go-restful in /ray-operator (#1348)
- 2590a0b Bump github.com/jarcoal/httpmock from 1.2.0 to 1.4.0 in /ray-operator (#3536)
- 196f959 Bump github.com/onsi/gomega from 1.36.2 to 1.37.0 in /apiserver (#3475)
- 00b4b14 Bump github.com/prometheus/client_golang in /apiserver (#3394)
- 2a20425 Bump github.com/rs/zerolog from 1.33.0 to 1.34.0 in /apiserver (#3393)
- 4a4471a Bump github.com/spf13/cobra from 1.8.1 to 1.9.1 in /kubectl-plugin (#3499)
- 587c6ff Bump go to 1.22.4 to fix ray-operator vulnerabilities (#2325)
- 00c926e Bump go.mongodb.org/mongo-driver from 1.3.4 to 1.5.1 in /apiserver (#1407)
- f6a5c73 Bump golang.org/x/net from 0.14.0 to 0.17.0 in /experimental (#1701)
- 7e72627 Bump golang.org/x/net from 0.26.0 to 0.33.0 in /proto (#2723)
- 3a6aac4 Bump golang.org/x/net from 0.33.0 to 0.38.0 in /experimental (#3407)
- 4a3a373 Bump golang.org/x/net in /cli (#1405)
- a8f730e Bump golang.org/x/net in /proto (#1345)
- aafe2e0 Bump golang.org/x/net to v0.33.0 fix upstream vulnerability (#2799)
- 26deb40 Bump golang.org/x/sys in /cli (#1347)
- 2292e61 Bump golang.org/x/sys in /proto (#1346)
- 53b7026 Bump golang.org/x/text from 0.3.5 to 0.3.8 in /proto (#1344)
- a9255ce Bump google.golang.org/grpc from 1.64.0 to 1.64.1 in /cli (#2229)
- 8bdd7de Bump google.golang.org/grpc from 1.64.0 to 1.64.1 in /experimental (#2248)
- 0d16293 Bump google.golang.org/protobuf from 1.32.0 to 1.33.0 in /cli (#1993)
- 7d49b26 Bump google.golang.org/protobuf from 1.32.0 to 1.33.0 in /experimental (#1992)
- 6671427 Bump google.golang.org/protobuf from 1.34.2 to 1.36.6 in /experimental (#3395)
- 8778327 Bump google.golang.org/protobuf from 1.36.5 to 1.36.6 in /apiserver (#3391)
- f605b6c Bump nanoid from 3.3.7 to 3.3.11 in /dashboard (#3589)
- 05b77e1 Bump next from 15.2.3 to 15.2.4 in /dashboard (#3709)
- 1c07bc1 Bump sigs.k8s.io/controller-runtime from 0.19.0 to 0.20.4 in /apiserver (#3392)
- 06ccd09 Bump the golangci-lint version in the api server makefile (#1342)
- 16e44d3 Bump the google-golang group across 5 directories with 3 updates (#3493)
- 102d9e9 Bump the kubernetes group across 3 directories with 9 updates (#3390)
- 8fcfb9d Bump tj-actions/verify-changed-files in /.github/workflows (#1795)
- 3738f78 CVE fix - Upgrade golang.org/x/net (#2081)
- df0565a Change Kuberay operator Deployment strategy type to Recreate (#566)
- 085dbb5 Change the rules in role.yaml and multiple_namespaces_role.yaml to use the same template in _helpers.tpl to ensure consistency. (#2244)
- 7c6aedf Changes required make a build after update of component-base (#3004)
- bf3fd63 Check existing pods for suspended RayCluster before calling DeleteCollection (#1745)
- db42cc5 Chore: fix indentation issues in RayJob sample YAML (#3874)
- 046d4c4 Clean up WorkersToDelete field during the CI test (#1763)
- 8282e6b Configuration Test Framework Prototype (#605)
- f52b8bc Connect Ray client with TLS using Nginx Ingress on Kind cluster (#1051)
- 86506d6 Convert byte slice and string without copy (#2628)
- 1c6c4ae Correct sumGPUs to include MIGs in count (#3933)
- cce10a6 Cross-reference docs. (#703)
- 4b5085f Customize the Prometheus export port (#954)
- 561a098 Delete [raycluster|rayjob|rayservice]_types_test.go unnecessary tests (#2935)
- 3a7a17f Delete ray_v1alpha1_rayjob.batch-inference.yaml (#1360)
- 7bc9c94 Dependencies: Upgrade golang.org/x packages (#1281)
- 9831375 Deprecate Kuberay CLI for Ray Kubectl plugin (#2246)
- 197fcc2 Do not update pod labels if they haven't changed (#1304)
- 1cbac51 Documentation and example for running simple NLP service on kuberay (#1340)
- 2ac9c44 Don't print redundant time unit in the log message (#2335)
- ee0a895 Don’t assign the rayv1.Failed to the State field (#2258)
- cf1c6f7 Downgrade kind from to v0.20.0 to v0.11.1 (#1313)
- 33ba385 Drop unused configmaps/status permission + configurable binary path (#2478)
- d1d9e29 Enable test framework to install operator with custom config and put operator in a namespace with enforced PSS in security testing (#876)
- efed875 Enhancements to e2e test, adding Autoscaling (#1765)
- dbcc686 Ensure all temp files are deleted after the compatibility test (#886)
- 6490749 Ensure container ports without names are also included in the head node service (#891)
- e93ebcc Example Pod to connect Ray client to remote a Ray cluster with TLS enabled (#994)
- 2d52001 Example RayCluster spec with Labels and label_selector API (#4136)
- 28d07c9 Expose entire head pod Service to the user (#1040)
- c610f70 Expose security context in helm chart. (#773)
- e430a93 Exposing Serve Service (#1117)
- dc17fb4 Exposing min/max replica counts for default worker group (#1963)
- ba50bfa Fall back to CPU requests if limit is not specified (#2365)
- f6b4f17 Feature/cron scheduling rayjob 2426 (#3836)
- 35fe6f9 Fix CI (#1145)
- 271b25d Fix FromAsCasing warning. (#2830)
- f5fb7d4 Fix Log to indicate we are Using DashboardPort in RayService (#2001)
- 1ced2b9 Fix RayCluster auth sample to include --config-file in kube-rbac-proxy (#2604)
- bc0562d Fix apiserver linter (#3296)
- 348ef38 Fix broken link in documentation (#3697)
- 7e21b5d Fix duplicated volume issue (#690)
- 389ba00 Fix finalizer typo and re-create manifests (#631)
- c9665df Fix for Sample YAML Config Test - 2.4.0 Failure due to 'suspend' Field (#1096)
- 4182477 Fix for deprecate-cli deploy error (#2251)
- 3571d52 Fix in HeadPod Service Generation logic which was causing frequent reconciliation (#1056)
- 0d848f9 Fix incorrect comment in raycluster_controller.go (#3003)
- 5769a65 Fix issue where unescaped semicolons caused task execution failures. (#3691)
- 19ddf04 Fix issue with head pod not monitered by Prometheus under certain condition (#963)
- 3c53af6 Fix issue with operator OOM restart (#946)
- 7dcdb26 Fix light weight job submitter e2e flaky test (#4092)
- 5bf70e8 Fix logging issue for FetchHeadServiceURL (#2216)
- 648d841 Fix misconfiguration. (#602)
- 9265154 Fix mkDocs (#1448)
- ed4b75c Fix ray nightly image env var setup (#3826)
- 2b8947c Fix release actions (#1323)
- 422098d Fix typo (#1232)
- 8b2acf5 Fix typo (#1241)
- 9404492 Fix typo in DEVELOPMENT.md (#1698)
- ed7f3db Fix upgrade gomega (#3483)
- 047699f Fix v6e TPU Scripts and RayJob CRs (#2447)
- 242c7b9 Fix versioning in sample manifests (#1857)
- 8d25e9d Fix/make helm and kustomize consistent (#2624)
- 321f985 Fix: Helm lint and test CI failed (#3505)
- cc6b7ba Fix: Typo (#1295)
- 86f896b Fixed download URL for Helm chart (#573)
- 4432b78 Fixed processing of job submitter (#1562)
- 1ec290f Fixed the issue with jobSubmitter resources (#1676)
- 2c26cff Fixes to shorten generated Route name with consideration for namespace (#1883)
- 91361e3 Fixing Python client handling of env from (#1845)
- 5c9db54 Flip Min and max replicas for apiserver workerNodeSpec (#1638)
- 4772827 Follow up 3992: Remove logs and add comments (#4006)
- e7f0c2c Generate RayCluster Hash on KubeRay Version Change (#2320)
- cb86f9f Get details of only declarative serve apps (#4084)
- de675e0 Handle nil HostPath type in GetVolumeHostPathType and add unit tests. (#3965)
- 2d38f51 Helm chart ray-cluster template reference fix (#1469)
- 607ac1f Helm: add service type configuration to head group for ray-cluster (#614)
- bdbf379 Improve Grafana Dashboard (#3734)
- 1634d70 Improve flexibility in RayCluster yaml test (#1812)
- eb66a26 Improve log message wording when service already exists during reconciliation (#4096)
- 753429d Improve the observability of the init container (#1149)
- 4199879 Include KUBERAY_VERSION in the user-agent (#2042)
- 9c55fc4 Increase head node memory limit for RayService sample to avoid OOM (#4089)
- 656602f Increase rayJob e2e timeout (#4124)
- 8fb4ee9 Increased time precision using uint (#1675)
- 6823da1 Init dashboardClientFunc and httpProxyClientFunc by the config arg (#2092)
- 87dde22 Inject cluster name as an environment variable into head and worker pods (#934)
- 928d690 Integrate with rayci (#3215)
- 738801d Integration: KAI Scheduler (#3886)
- ca9348d Kuberay 0.5.0 docs validation update docs for GCS FT (#1004)
- c7edeae Make KubeRay Operator Image FIPS compliant (#1633)
- 9662bd9 Make k8s job backoff limit configurable for RayJob (#2091)
- 8ec59e5 Make sure kubectl ray logs only get ray container logs (#2649)
- 1d98fec MobileNet example (#1175)
- 2bb04c9 Move BatchSchedulerManager into reconciler option (#3935)
- 5fde3c6 Move matching labels to association.go (#2734)
- 249610c Numerous fixes to the API server to make RayJob APIs working (#1447)
- 974bedf One word typo fix in docs and README (#1068)
- a1ef760 Only build/push Multi Arch images when merging to master (#1764)
- 510827f Only try once in HTTP health check commands (#3469)
- e7fbf7d Operator support for openShift (#1371)
- fe29409 Parametrize ray operator makefile to support other container engines (#1121)
- 2c97ac3 Pin operator version in single namespace installation (#1210)
- d0b6337 Pin to working config + stable release (#3885)
- 44fc973 Post release 1.0.0 (#1651)
- fc1e2d0 Post release 1.1.0 (#2040)
- 14f96fc Properly set env field based on containerEnv values (#2175)
- 56b4d14 Publish Multi Arch images (#1716)
- 346ddd0 Ray serve gke gateway ingress (#1978)
- b8f6d06 RayCluster Headless Worker Service Should PublishNotReadyAddresses (#2375)
- 0a3c181 RayCluster Helm: Make volumeMounts and volumes optional for workers (#1689)
- 15daa54 RayCluster updates status frequently (#1211)
- 8e3296e RayClusterProvisioned status should be set while cluster is being provisioned for the first time (#2304)
- 362da3d RayJob Volcano Integration (#3972)
- acafbfe RayJob: don't delete submitter job when ShutdownAfterJobFinishes=true (#1881)
- 0216b33 RayJob: inject RAY_DASHBOARD_ADDRESS envariable variable for user provided submiter templates (#1852)
- 621e9c7 RayService event can't set redis password in both GCSFaultTolerance and rayStartParam (#3153)
- 795db0d RayService object's Status is being updated due to frequent reconciliation (#1065)
- aeb8b03 RayService: Omits Min and Max replicas from hash calculation (#2172)
- 6c4a77d Rayjob event can't set redis password in both GCSFaultTolerance and rayStartParam (#3093)
- 26372c2 Read cluster domain from env (#951)
- 62ad934 Refactor Apiserver e2e run in cluster (#3529)
- f0ff2c1 Refactor UpgradeStrategy to UpgradeSpec.Type (#2678)
- 79c6c20 Refactor configuration test framework to follow Pylint conventions (#671)
- ec642e7 Refactor multiple cases in single test function with array (#2857)
- 160ab10 Refactor to Ensure Consistent Use of CRDType (#1892)
- a7197c5 Refactor validateRayServiceSpec (#2711)
- 5f158a6 Release v0.5.0 doc validation (#997)
- 31c1e6a Release v0.5.0 doc validation part 2 (#999)
- 9dd516d Release v0.5.0 python client library validation (#1006)
- e4e8727 Release v0.6.0 doc validation (#1271)
- f256ddd Remove GOARCH in ray-operator/Dockfile to support multi-arch images (#1442)
- 728e1cb Remove ray-pod.tls.yaml (#3762)
- 22cc61d Remove default option for batch scheduler name (#2371)
- c3b17f3 Remove extranous arguments from examples (#2051)
- e2e4208 Remove generate target from build/test targets (#1874)
- 9109436 Remove helm-chart-releaser (#721)
- 16fd58b Remove ingress.enabled from KubeRay operator chart (#812)
- cc2e144 Remove kustomize from helm, as it is not required (#1370)
- 2ae7574 Remove miniReplicas in raycluster-cluster.yaml (#1473)
- fffe778 Remove preStop hooks from Ray CR Samples (#2724)
- fb7a486 Remove redundant log line that is failing golangci-lint (#2366)
- 1dbd949 Remove unecessary raycluster log in kai-scheduler logger (#3997)
- 21a3611 Remove unused fields from KubeRay operator and RayCluster charts (#839)
- 5b0b9af Remove unused icon from dashboard (#3599)
- ffda626 Remove vLLM examples in favor of Ray Serve LLM (#3786)
- 8be0a21 Removed use of the of BUILD_FLAGS in apiserver makefile (#1336)
- 082389e Reorganize python client library (#984)
- e4cf15f Replace kubectl wait command with RayClusterAddCREvent (#705)
- 4b6f1df Reuse contexts across ray operator controllers (#1126)
- c30fae2 Revert "Bump crd-ref-docs to v0.2.0 for Go 1.24+ compatibility (#4029)" (#4031)
- e77b095 Revert "Disable async serve handler in Ray Service cluster (#447)" (#606)
- 8b47826 Revert "Feature/cron scheduling rayjob 2426 (#3836)" (#3911)
- ba1a000 Revert "Fix issue where unescaped semicolons caused task execution failures. (#3691)" (#3771)
- a16f910 Revert "[BUG] Fix Dockerfile Error: WARN: FromAsCasing: 'as' and 'FROM' Keywords' Casing Do Not match (#2527)" (#2529)
- 064e0ef Revert "[Bug][CI] Multi-platform build fails with docker driver in GitHub Actions (#3570)" (#3573)
- 493eb82 Revert "[CI] Skip redis raycluster test (#1465)" (#1490)
- 5373748 Revert "[CRD] Delete CRD v1alpha1 (#1771)" (#1784)
- d8ffec4 Revert "[release] Update Ray image to 2.34.0 (#2303)" (#2413)
- 3479347 Revert "kubectl ray job submit: provide empty entrypoint (#3127)" (#3165)
- 5d38eda Revise sample configs, increase memory requests, update Ray versions (#761)
- b2a701d Rewrite detached actor test with go (#2722)
- 4c2c046 Set imagePullPolicy in manager.yaml (#1710)
- 6359d3c Show cluster name in kubectl get rayjob (#2065)
- d0683a9 Single go.mod file (#3640)
- 7f77e46 Standardize imports of github.com/ray-project/kuberay/ray-operator/apis/ray/v1alpha1 (#1112)
- 979b909 Support --address flag for kubectl ray job submit (#3922)
- 72a63ac Support Apache YuniKorn as one batch scheduler option (#2184)
- 8694093 Support disable leader election for manager go binary via Values.yaml to mitigate kuberay restarts (#2262)
- f27e4ac Support for Image pull policy (#2101)
- df5577f Support gang scheduling with Apache YuniKorn (#2396)
- 79f757c Support json structured logging (#1912)
- 86abaab Support suspension of RayClusters (#1711)
- 55b99e6 Support to set QPS and burst by configuration. (#3969)
- 413b8ab Support uppercase default resource names for top-level Resources (#4137)
- dbd6b72 TPU Multi-Host Support (#1913)
- 3a2be0b Update APIServer docs for release v0.4.0 (#778)
- ff89298 Update Autoscaler YAML for the Autoscaler tutorial (#1400)
- 91921f2 Update CHANGELOG for v1.0.0 (#1650)
- 0becdd8 Update Dockerfile to address closed CVEs (#1488)
- 2057d76 Update Dockerfiles to address CVE-2023-44487 (HTTP/2 Rapid Reset) (#1540)
- 25eb751 Update GCS fault tolerance YAML (#1404)
- 87c5541 Update KubeRay release documentation (#3226)
- 0c16aa6 Update KubeRay versions. (#821)
- 7f986b6 Update Kuberay doc to version 1.0.0 rc.0 (#1441)
- ba5f7e0 Update RayCluster values.yaml (#3950)
- fbdf317 Update RayServices section title (#3906)
- 6d0c637 Update TPU Ray CR manifests to use Ray 2.41.0 (#2965)
- 714aea6 Update V6e TPU Ray Samples (#2448)
- 729c1b7 Update Volcano integration doc (#1380)
- 02135a4 Update apiserver chart location in readme (#896)
- b438b50 Update bug-report.yml (#1906)
- af8fb0c Update contribution doc to show users how to reach out via slack (#936)
- aae9fac Update doc and base image for Go 1.19 (#1330)
- 06a0564 Update feature-request.yml (#1907)
- 3283254 Update gcs-ft.md (#777)
- 247b7ca Update grafana dashboards to ray 2.49.2 + add README instructions on how to do the update (#4111)
- bab00be Update kind version (#1957)
- bde5e9a Update kuberay mcad integration doc (#1373)
- 38e3527 Update latest release to v1.0.0-rc.0 in tests (#1467)
- 15ce568 Update operator development instruction (#1458)
- be10373 Update overwrite-container-cmd example (#1722)
- 7a185af Update ray operator Dockerfile (#1213)
- 944a042 Update ray-operator documentation and image version in ray-cluster.heterogeneous.yaml (#585)
- b12a722 Update samples to use Ray 2.41.0 images (#2964)
- 2e35bff Update securityContext values.yaml for kuberay-operator to safe defaults. (#1896)
- e4a9645 Update swagger-initializer.js (#2543)
- 12c0a90 Update test config (#654)
- e6b2920 Update update-ray-job.kueue-toy-sample.yaml (#3782)
- b9f0209 Update v6e-256 KubeRay Sample (#2466)
- 6b12c18 Updated API server documentation (#1435)
- 71984fb Updated default timeout seconds for probes (#2265)
- f144145 Updates to the apiserver swagger-ui (#1410)
- 2793492 Updating logrus and net packages in go.mod (#1495)
- e3bdc83 Upgrade Kubernetes dependencies to v0.28.3 and Golang to 1.20 (#1648)
- f2d94ff Upgrade dependencies to address CVEs (#1865)
- b73daa9 Upgrade golang linter for precommit hook (#3319)
- 1213d15 Upgrade manifests kustomize v5 (#2352)
- 31d8a8c Upgrade to Go 1.19 (#1325)
- ce960e2 Upgrade to address High CVEs (#1731)
- 9be8abd Use Go 1.24.0 in go module (#3835)
- a4893a8 Use ImplementationSpecific in ray-cluster.separate-ingress.yaml (#3781)
- 838bc19 Use a default user agent 'kuberay-operator' instead of the default user-agent from controller-runtime (#1982)
- 0fa7d3f Use ctrl log and create logger in function in kai-scheduler (#3995)
- 4845306 Use ctrl logger in Volcano scheduler to include context (#4023)
- 747708b Use helm-docs to generate README for chart kuberay-operator automatically (#3331)
- c1dbdf1 Use standard golang image as build image and distroless image as base image for kuberay operator. (#1967)
- 2e173a1 Use webhook.CustomValidator instead of deprecated webhook.Validator. (#2803)
- 6cbb5df User longer exec probe timeouts for Head pods (#2353)
- f0abc1d [0.4.0 Release] Minor doc improvements (#780)
- 6f5047c [0.4.0 release] Update changelog for KubeRay 0.4.0 (#836)
- c45fcf0 [1/N] [Lint] Group imports by sections (#3428)
- 732a675 [1/N][apiserver] Fix half of linter issues for apiserver (#3328)
- b16de0c [2.5.0 Release] Change version numbers 2.4.0 -> 2.5.0 (#1151)
- a068e7b [2/N] [Lint] Group imports by sections (#3429)
- 8944703 [2/N] [apiserver] Fix second-half apiserver lint (#3338)
- 05c5e6b [3/N] [Lint] Group imports by sections (#3430)
- 1eac370 [API Server] Add Ray Job output - start/end time and ray cluster name (#2533)
- f3353b2 [API Server] Add security context to Ray Cluster (#2538)
- 773a475 [API Server] Add v2 related helm (#3677)
- 846416e [API Server] consolidate e2e test (#3674)
- a8ec758 [APIServer][Docs] Identify API server as community-managed and optional (#753)
- 5c0e2e9 [APIserver] [Ray Job] Added Job submission support to the API server (#1639)
- 796bf06 [Apiserver] Determine the minimum resource requirements for KubeRay API server e2e tests (#3526)
- 2ba0dd7 [Apiserver] Set the right amount of resource in e2e test (#3465)
- a361dc3 [Apiserver] Use Eventually from Gomega instead of wait from apimachinery (#3433)
- af6a005 [Apiserver][Refactor] Use polling in autoscaler e2e test (#3402)
- 5f51977 [Autoscaler V2] Polish Autoscaler V2 YAML (#2064)
- d125ab7 [Autoscaler] Improve TestRayClusterAutoscalerAddNewWorkerGroup (#3682)
- 9e14ba6 [Autoscaler] Print the value of WorkerGroupSpec.Replicas (#3005)
- c159491 [Autoscaler][Sample] Add comment for AUTOSCALER_UPDATE_INTERVAL_S (#3294)
- 759ab3a [Autoscaler][Sample] Add comment for RAY_LOGGER_LEVEL (#4104)
- 3f69f01 [Autoscaler][Test] Fix flaky idleTimeoutSeconds test (#2862)
- 9c55794 [BUG] Fix Dockerfile Error: WARN: FromAsCasing: 'as' and 'FROM' Keywords' Casing Do Not match (#2527)
- 9c28b7d [Benchmark] KubeRay memory / scalability benchmark (#1324)
- 8ad2c1b [Bug] Add default value for entrypoint flags in job_submit.go (#3808)
- 20636f9 [Bug] All worker Pods are deleted if using KubeRay v1.0.0 CRD with KubeRay operator v1.1.0 image (#2087)
- 7ad3acf [Bug] Allow zero replica for workers for Helm (#968)
- 2586468 [Bug] Autoscaler doesn't support TLS (#1119)
- cceb7a5 [Bug] Avoid assigning an entry to a map that is nil (#1715)
- ec40186 [Bug] Change image repository for make deploy (#2059)
- f56c66f [Bug] Clean up WorkersToDelete after the scaling process finishes (#1747)
- 39562b5 [Bug] Enable ResourceQuota by adding Resources for the health-check init container (#1043)
- 3f7b34c [Bug] Fail to create ingress due to the deprecation of the ingress.class annotation (#646)
- 7fd3927 [Bug] Fix RayCluster with an overridden app.kubernetes.io/name (#2147) (#2166)
- af0c7a2 [Bug] Fix flakiness of RayService e2e tests (#1385)
- b0096b0 [Bug] Fix flaky sample YAML tests (#1590)
- f3ec71b [Bug] Fix flaky test: should be able to update all Pods to Running (#893)
- c420135 [Bug] Fix null map handling in BuildServiceForHeadPod function (#1095)
- f1e961a [Bug] Fix rebase error (#1897)
- c683ad1 [Bug] Fix the filename of text summarizer YAML (#1415)
- cf41e24 [Bug] Issue with glibc version GLIBC_2.34 and GLIBC_2.32 not found in earlier operator tags (#2272)
- e4d4839 [Bug] KubeRay does not work on M1 macs. (#869)
- 791ea37 [Bug] KubeRay operator failed to watch endpoint (#2080)
- c22fbfa [Bug] KubeRay operator fails to get serve deployment status due to 500 Internal Server Error (#1173)
- 7aea947 [Bug] KubeRay tries to create ClusterRoleBinding when singleNamespaceInstall and rbacEnable are set to true (#1190)
- a0e59be [Bug] Long image pull time will trigger blue-green upgrade after the head is ready (#1231)
- e2a6ae8 [Bug] Misuse of Docker API and misunderstanding of Ray HA cause test_detached_actor flaky (#619)
- 1ab5a00 [Bug] Misuse of Docker API and misunderstanding of Ray HA cause test_ray_serve flaky (#650)
- d46b431 [Bug] Modification of nameOverride will cause label selector mismatch for head node (#572)
- cbc9b0b [Bug] Pod reconciliation fails if worker pod name is supplied (#587)
- 47b4e80 [Bug] Ray operator crashes when specifying RayCluster with resources.limits but no resources.requests (#2077)
- 52af139 [Bug] RayService restarts repeatedly with Autoscaler (#1037)
- ac56e33 [Bug] RayService with GCS FT HA issue (#1551)
- 2bd5c9e [Bug] Re-enable flaky kubectl plugin e2e test "should reconnect after pod connection is lost" (#3116)
- 79c7c87 [Bug] Re-enable flaky kubectl plugin e2e test in kubectl_ray_job_submit_test.go (#3124)
- a87f9a6 [Bug] Reconciler error when changing the value of nameOverride in values.yaml of helm installation for Ray Cluster (#1966)
- 0cabd14 [Bug] Service (Serve) changing port from 8000 to 9000 doesn't work (#1081)
- 60de974 [Bug] Shallow copy causes different worker configurations (#714)
- 01b4883 [Bug] Sidecar mode shouldn't restart head pod when head pod is deleted (#4141) (#4156)
- 5dab94c [Bug] Submitter K8s Job fails even though the RayJob has a JobDeploymentStatus Complete and a JobStatus SUCCEEDED (#1919)
- d05964c [Bug] TestRayServiceInPlaceUpdate is flaky (#2620)
- 457d67a [Bug] Update wait function in test_detached_actor (#635)
- 82c925b [Bug] autoscaler not working properly in rayjob (#1064)
- 3581b91 [Bug] client_golang used by KubeRay has a vulnerability (#728)
- 2b136c9 [Bug] compatibility test for the nightly Ray image fails (#1055)
- 1186737 [Bug] error: git cmd when following docs (#831)
- ddb5e52 [Bug] fix RayActorOptionSpec.items.spec.serveConfig.deployments.rayActorOptions.memory int32 data type (#1220)
- c880029 [Bug] kubectl plugin e2e test is flaky (#3147)
- 17264a6 [Bug] label rayNodeType is useless (#698)
- 0672956 [Bug] rayStartParams is required at this moment. (#1031)
- bc6be0e [Bug][Autoscaler] Operator does not remove workers (#1139)
- 0d813b4 [Bug][CI] Multi-platform build fails with docker driver in GitHub Actions (#3570)
- deec37c [Bug][Doc] Increase default operator resource requirements, improve docs (#727)
- ca929e9 [Bug][Doc] fix the link error of operator document (#1046)
- d632ac1 [Bug][GCS FT] Clean up the Redis key before the head Pod is deleted (#1989)
- 2019b4b [Bug][GCS FT] Worker pods crash unexpectedly when gcs_server on head pod is killed (#1036)
- 0e959cf [Bug][RayCluster] Fix RAY_REDIS_ADDRESS parsing with redis scheme and multiple addresses (#1556)
- 664b19a [Bug][RayJob] Avoid nil pointer dereference (#1756)
- 5da4a04 [Bug][RayJob] Check dashboard readiness before creating job pod (#1381) (#1429)
- 5a974fc [Bug][RayJob] Fix FailedToGetJobStatus by allowing transition to Running (#1583)
- f106737 [Bug][RayJob] RayJob with custom head service name (#1332)
- 9b26ba7 [Bug][RayService] KubeRay does not recreate Serve applications if a head Pod without GCS FT recovers from a failure. (#1420)
- c9802e9 [Bug][apiserver] fix apiserver create rayservice missing serve port (#734)
- 72ca169 [Bug][breaking change] Unauthorized 401 error on fetching Ray Custom Resources from K8s API server (#1128)
- f6a172f [Bug][k8s compatibility] k8s v1.20.7 ClusterIP svc do not updated under RayService (#1110)
- 3875356 [Bug][kubectl-plugin] Wrong behavior for InteractiveMode RayJob with BackoffLimit set (#3555)
- 99505a5 [Build][kubectl-plugin] Add release script for kubectl plugin (#2407)
- 4b75753 [CI] Add kind-in-Docker test to Buildkite CI (#1243)
- 1d1b8ce [CI] Add apiserver e2e test to buildkite (#3351)
- ba6a7a2 [CI] Add shellcheck and fix error of it (#2933)
- 4ca05ab [CI] Add workflow to manually trigger release image push (#801)
- 268a776 [CI] Auto download golang tools in pre-commit (#2917)
- bd7feba [CI] Bump Go version to 1.23 to support E2E Operator Version Upgrade tests (#3406)
- 0e53381 [CI] Change Pre-commit-shellcheck-to-shellcheck-py (#2974)
- 4db24e5 [CI] Composable kube resource logger when test failed (#3070)
- 7542c5e [CI] Create release tag for ray-operator Go module (#1574)
- e595ee4 [CI] Deflaky TestRayServiceGCSFaultTolerance (#2660)
- 03f1a2e [CI] Don't need to publish the security proxy image (#1885)
- b56a973 [CI] Don't push new images to DockerHub (#1923)
- 05e9279 [CI] Downgrade runner image from ubuntu-latest to ubuntu-22.04 (#2714)
- 00abf6e [CI] Enable testifylint empty rule (#2908)
- 1830a6d [CI] Enable testifylint error-nil rule (#2907)
- 3e97888 [CI] Enable testifylint expected-actual rule (#2914)
- 67ed6ce [CI] Enable testifylint float-compare rule (#2910)
- 17d6067 [CI] Enable testifylint require-error rule (#2909)
- bc2bd71 [CI] Enable testifylint bool-compare rule (#2911)
- 2ac2a92 [CI] Enable testifylint formatter rule (#2915)
- cdee6f4 [CI] Enable testifylint len rule (#2945)
- 4d2795b [CI] Enable testifylint rule (#2896)
- 67596d3 [CI] Fix MultiArch image push (#3575)
- a1e8c56 [CI] Fix RayService CI (#2525)
- 2a9e647 [CI] Fix apiserver test in image-release process (#1880)
- 02909a2 [CI] Fix autoscaler e2e test flakiness caused by timeout (#3668)
- 894f31e [CI] Fix image release pipeline (#1878)
- 0ac9942 [CI] Fix lint error (require-error) (#2931)
- 535a405 [CI] Fix variable initializations used in test case declarations (#1775)
- abd3f87 [CI] Fix: /etc/docker/daemon.json: No such file or directory (#3565)
- f3ed172 [CI] Generate CRD json schema separately in pre-commit (#2930)
- 08b9908 [CI] Install kuberay operator in buildkite test (#1308)
- 4bd2dab [CI] Jail flaky test: TestRayServiceInPlaceUpdate (#2638)
- fa67724 [CI] Make release.yaml only be triggered manually (#2798)
- 353e87f [CI] Move e2e tests to buildkite (#2639)
- cce897b [CI] Only run test_ray_serve for Ray 2.6.0 and later (#1288)
- c7a6894 [CI] Pin crd-ref-docs to v0.0.10 (#1988)
- 39a8480 [CI] Pin go version in CRD consistency check (#794)
- 3db8d23 [CI] Pin kustomize to v5.3.0 (#2067)
- 4f85055 [CI] Publish KubeRay operator / apiserver images to Quay (#1307)
- dec8137 [CI] Reenable rayjob sample yaml latest test (#1464)
- 2c5a6d0 [CI] Refactor pipeline and test RayCluster sample yamls (#1321)
- 77d0bba [CI] Remove RayService tests from compatibility-test.py (#1395)
- 56cdfb6 [CI] Remove compatibility-test.py and modified CI (#2882)
- 0e9d177 [CI] Remove create tag step from release (#3249)
- 629bc8f [CI] Remove extraPortMappings from kind configurations (#1366)
- 2e23506 [CI] Remove test_security.py and all python test dependencies in CI (#3123)
- 1fdf04c [CI] Remove unnecessary kind load $RAY_IMAGE for e2e sample YAML tests (#1863)
- 085c29d [CI] Remove unnecessary release.yaml workflow (#1168)
- a1cf47d [CI] Remove unnecessary sample YAML symbolic links (#2118)
- 0b61523 [CI] Replace lint CI with pre-commit (#2129)
- df7cfe1 [CI] Run sample job YAML tests in buildkite (#1315)
- 4bb1226 [CI] Skip kubectl plugin flaky e2e tests (#2800)
- 84c35ac [CI] Skip redis raycluster test (#1465)
- 21058dc [CI] Skip the flaky compatibility test test_detached_actor until https://github.com/ray-project/ray/issues/41343 (#1694)
- 75a63a5 [CI] Split Autoscaler e2e tests into 2 buildkite runners (#3715)
- 0288281 [CI] Stop publishing images to DockerHub (#1926)
- da763f2 [CI] Stop to publish new images to DockerHub (#1702)
- 83f3095 [CI] Unjail TestRayServiceInPlaceUpdate (#2650)
- 4fbdb9e [CI] Update latest ray version 2.5.0 -> 2.6.3 (#1320)
- 0561ba1 [CI] Upload logs as artifacts to BuildKite (#3405)
- ef7cf5e [CI] Use golang:1.24-bookworm (Debian 12) in CI for Python-3.11 support (#3949)
- e801dc1 [CI] Use quay as the default image registry (#1939)
- 9e37e19 [CI] Verify kubectl in kind-in-docker step (#1305)
- a9aa9a3 [CI] apply resource logger to ray cluster test (#3075)
- 945698b [CI] apply resource logger to ray service test (#3081)
- 23c9e5b [CI] dump failed test k8s resources (#3025)
- 3114a0c [CI] fix locust versions (#3100)
- 60bc89d [CI] fix missing Go module release step (#3644)
- c764021 [CI] split rayservice e2e test into another runner and decrease timeout to 30m (#2667)
- e9073fc [CI] stream operator logs from kind in go e2e tests (#1793)
- 03969c9 [CI]: Kuberay operator e2e tests (#1575)
- f123a44 [CI]: change kubectl plugin e2e test to buildkite (#2861)
- bd53766 [CI][#2905] Improvement: enable testifylint compares rule (#2977)
- f82e7ea [CI][Buildkite] An example test for Buildkite (#919)
- 1a8895e [CI][Buildkite] Fix the PATH issue (#952)
- 0e1c248 [CI][GitHub-Actions] Upgrade actions/upload-artifact to v4 (#2373)
- cbde878 [CI][HELM] Use chart-testing to install Helm charts (#3412)
- e96dedc [CI][Hotfix] Increase the timeout of Test E2E from 30m to 1h (#2664)
- 1d4a403 [CI][RayService] deflaky the TestAutoscalingRayService (#3119)
- d723f50 [CRD] Delete CRD v1alpha1 (#1771)
- 77e299b [CRD] Inject CRD version to the Autoscaler sidecar container (#1496)
- 96c4d66 [CRD] Set maxDescLen to 0 (#1449)
- 7b00aca [CRD] Sync v1alpha1 CRD with v1 CRD (#1788)
- b7bc7ae [CRD][1/n] Create v1 CRDs (#1481)
- 1184bc8 [CRD][2/n] Update from CRD v1alpha1 to v1 (#1482)
- 7336ea6 [Chore] Add RayJob InteractiveMode sample yaml (#3062)
- 491fbde [Chore] Add golangci-lint rules (#2128)
- d901fd0 [Chore] Add kubectl plugin and dashboard to components in issue template (#3678)
- 49a5725 [Chore] Add pre-commit hooks (#2127)
- ca98d1f [Chore] Create example Modin RayJob (#2221)
- e0318a3 [Chore] Delete redundant pod existence checking (#2113)
- 41c9e91 [Chore] Fix golangci-lint rule: gosec (#2163)
- fb58429 [Chore] Fix lint errors caused by casting int to int32 (#2368)
- 445b941 [Chore] Improve the appearance of compute resources status in the output of kubectl describe (#1802)
- 2b31c30 [Chore] Make error as a local variable (#2841)
- 80ab11c [Chore] Modify pre-commit yaml to allow golangci-lint version with prefix "v" (#2824)
- b16fb3f [Chore] Remove CHANGELOG.md (#3819)
- 3471f99 [Chore] Remove duplicate make command (#4145)
- e02751a [Chore] Run operator outside the cluster (#2090)
- 5d3d9d3 [Chore] Turn off golangci-lint rules except ray-operator (#2138)
- 7a43534 [Chore] Turn off no-commit-to-branch rule (#2139)
- 7cc3548 [Chore] Upgrade Ray to 2.46.0 follow-up (#3722)
- 35e913a [Chore] Use Ray 2.9.0 for Apache YuniKorn example (#2427)
- 949875a [Chore] Use new golangci-lint rules only for ray-operator (#2152)
- 6eeca32 [Chore] Use safe YAML for helm-chart-verify-rbac (#2230)
- 5894146 [Chore] make err as local variable in if-statement (#2718)
- d2ae625 [Chore] make ingressClassName as a local variable (#2815)
- dd46cb4 [Chore] remove redundant var declaration (#2811)
- 6350033 [Chore] remove unnecessary line break in log (#2709)
- 5db3012 [Chore] specify the capacity on calling make (#2719)
- 20ed56f [Chore] update comment for headGroupSpec and entrypoint (#2802)
- d97e37a [Chore][CI] Limit the release-image-build github workflow to only take tag as input (#3117)
- 9b0eda4 [Chore][CI] Remove StreamKubeRayOperatorLogs (#2637)
- 0c09b05 [Chore][CI] Upgrade ray version to 2.40 except for TestRayServiceInPlaceUpdate (#2629)
- 7b81970 [Chore][Comment] Fix wrong comment (#2294)
- 54ba287 [Chore][Linter] Upgrade golangci-lint to 1.60.3 (#2362)
- 784b7f3 [Chore][Log] Delete error loggings right before returned errors (#2103)
- b08a5ae [Chore][Minor] Add .gitignore to kubectl-plugin (#2383)
- ca7db14 [Chore][RayJob] Remove the TODO of verifying the schema of RayJobInfo because it is already correct (#1911)
- 3514856 [Chore][Sample-yaml] Upgrade pytorch-lightning to 1.8.5 for ray-job.pytorch-distributed-training.yaml (#3796)
- 296d480 [Chore][Samples] Rename ray-cluster.mini.yaml and add workerGroupSpecs (#2100)
- 708d758 [Chore][YuniKorn] Add sample yaml file for Apache YuniKorn (#2412)
- 135f129 [Chore][kubectl-plugin] Fix wrong homepage link in krew template file (#2461)
- ab17363 [Chore][precommit] Replace grep with awk in pre-commit hooks for BSD compatibility (#2541)
- ea0b9c5 [Community] Add KubeRay community guide (#3859)
- 38a07e9 [Community][2/N] Governance model (#3977)
- 30c5d74 [Compatibility] Update Redis image for compatibility tests (#2852)
- c88b174 [DOCS] Apiserver improve docs readability (#3564)
- d1b07df [DOCS] KubeRay APIServer V2 document (#3594)
- 4ac20b3 [DOCS] document step to do before running e2e test (#3385)
- aeab361 [Dashboard-client] Add proper error checking in dashboard client (#3953)
- 39d7e71 [Dashboard-client] replace http method from string to constant (#3961)
- b87480e [Doc] Add helm update command to chart validation step in release process (#1165)
- 6565845 [Doc] Add a YAML to explain why some worker pod are not ready in RayService (#3139)
- f5e0ef5 [Doc] Add blogs and talks to readme (#1691)
- 1359dd5 [Doc] Add git fetch --tags command to release instructions (#1164)
- 41018bc [Doc] Add gke bucket yaml (#1372)
- 44ff72c [Doc] Cannot build kuberay with Go 1.16 (#575)
- e52dd3b [Doc] Copyedit dev guide (#1012)
- ffac2c8 [Doc] Delete unused docs (#1440)
- 83fea90 [Doc] Deprecate ServiceUnhealthySecondThreshold and DeploymentUnhealthySecondThreshold (#1688)
- e9a2698 [Doc] Develop Ray Serve Python script on KubeRay (#1250)
- 9c53a72 [Doc] Fix Doc Typos (#2060)
- 7391341 [Doc] Fix Yaml Typos (#2049)
- 856a33e [Doc] Fix release doc format (#1578)
- b26f106 [Doc] Fix the order of comments in sample Job YAML file (#1242)
- 1ee5f95 [Doc] GKE GPU cluster setup (#1223)
- 04388da [Doc] Improve DEVELOPMENT.md by adding more guidances (#1794)
- c16cac4 [Doc] Improve FAQ page and RayService troubleshooting guide (#1225)
- 3b81601 [Doc] Improve RayService doc (#1235)
- cb12484 [Doc] Reference helm chart version in helm-chart/kuberay-operator/README.md.gotmpl with go template (#3763)
- 73eef73 [Doc] Remove KubeRay CLI references and add Python client details (#2521)
- 3754d34 [Doc] Support CRD docs generation (#1625)
- cc1ff48 [Doc] Support consistency check for API reference in CI (#1655)
- d78d34f [Doc] Update README (#1433)
- be22ecf [Doc] Update README (#3695)
- 6e1f1bd [Doc] Update nav to include missing files and reorganize nav (#1011)
- 9425e7f [Doc] Update release docs (#1621)
- 6c0fbbe [Doc] Update version from 0.4.0 to 0.5.0 on remaining kuberay docs files (#1018)
- 7a1e322 [Doc] Upload a screenshot for the Serve page in Ray dashboard (#1236)
- adde70c [Doc] [RayJob] Add documentation for submitterPodTemplate (#1228)
- d55dfc3 [Doc] add ray cluster uv sample yaml (#3720)
- f3ebea7 [Doc][CI] Align K8s version in Doc and CI with minimal required version (#3628)
- 98496f4 [Doc][Fix] correct the indention of storageClass in ray-cluster.persistent-redis.yaml (#3780)
- 167a71d [Doc][Website] Add complete document link (#1224)
- fa26bb2 [Doc][Website] Update KubeRay introduction and fix layout issues (#1042)
- 8430410 [Docs] Add kubectl plugin create cluster sample yaml config files (#3804)
- fd4ab91 [Docs] Align development guide with Makefile docker-build logic (#3248)
- 89e980f [Docs] Correct command to load KubeRay operator image (#3387)
- 192d1ea [Docs] Revise release note docs (#835)
- 36f32ed [Docs] Update Security Guidance on Dashboard Ingress (#1413)
- 0532645 [Docs] add sample RayCluster using kube-rbac-proxy for dashboard access control (#2578)
- ebf8a53 [Docs] add sample RayCluster with FluentBit sidecar to persist Ray logs (#2602)
- c693140 [Docs] update development md (#3230)
- 7fb46ab [Docs][Development] Delete linting docs (#2145)
- f37a4cc [Docs][kubectl-plugin] Add doc for install via Krew (#2458)
- dcbdbfc [Docs][kubectl-plugin] Add instructions for downloading from GitHub release (#2450)
- 06367a3 [Docs][ray-operator] Add types of tests and debug tips to development doc (#3401)
- 0a56cd4 [Enhancement] GPU RayCluster doesn't work on GKE Autopilot (#1470)
- eb59de4 [Enhancement] Remove unused variables in constant.go (#1474)
- e009704 [Experimental] Fix Makefile tool check: replace -s with test -s (#3970)
- 9e68367 [FEAT] show event message when raycluster not found in clusterSelector in rayjob (#4125)
- 9321b2d [FIX][DOC] development markdown example (#2687)
- 35b96f1 [Feat] Add e2e test for applying ray-job.interactive-mode.yaml (#3779)
- b81af7c [Feat] Add sample yaml for RayJob clusterSelector config (#2505)
- 6186a7d [Feat] Deprecate ForcedClusterUpgrade (#2075)
- f3430b0 [Feat] Remove RayService sample YAML Python tests (#2565)
- 2278768 [Feat]: Add a field to configure whether to add a proxy actor on the head Pod to the K8s serve service or not (#2598)
- 5d3bceb [Feat][Kubectl-Plugin] Implement kubectl session for RayJob and RayService (#2379)
- 6786350 [Feat][Kubectl-Plugin] Implement kubectl ray job submit (#2394)
- ea314d7 [Feat][RayCluster] Introduce the RayClusterStatus.Conditions field (#2214)
- d2b3338 [Feat][RayCluster] Make the Head service headless (#2117)
- ca39dc9 [Feat][RayCluster] Use a new RayClusterReplicaFailure condition to reflect the result of reconcilePods (#2259)
- cc94c6a [Feat][RayJob] Delete RayJob CR after job termination (#2225)
- cf4a877 [Feat][RayJob] UserMode SubmissionMode (#2364)
- 6079dc5 [Feat][Sample-yaml] Deprecated python sample yaml test cleanup (#2507)
- bc61ad9 [Feat][apiserver] Support CORS config (#3711)
- 84839a8 [Feat][kubectl-plugin] Add Long, Example, shell completion for kubectl ray log (#2405)
- 4e3340c [Feat][kubectl-plugin] Add dynamic shell completion for kubectl ray get node & workergroup (#3154)
- f69885b [Feat][kubectl-plugin] Add dynamic shell completion for kubectl ray session (#2390)
- 800ac16 [Feat][kubectl-plugin] Add instructions for static shell completion (#2384)
- bee1b71 [Feat][kubectl-plugin] Add kubectl ray version command (#2424)
- 32d8cde [Feat][kubectl-plugin] Create cluster with TPUs (--worker-tpu, --num-of-hosts) and TPUs' validation (#3258)
- 090fad0 [Feat][kubectl-plugin] Include LICENSE file into kubectl plugin tar (#2422)
- 6e8b0b0 [Feat][kubectl-plugin] Retry port-forward when connection lost (#2704)
- 52e330b [Feat][kubectl-plugin] Support -v flag for kubectl ray job submit (#3524)
- 39d42fb [Feature] Add Kubernetes manifest validation in pre-commit. (#2380)
- f7edc22 [Feature] Add ManagedBy field to RayCluster (#2597)
- e6af2cc [Feature] Add ManagedBy field to RayJob (#2589)
- 4bce739 [Feature] Add a chart-test script to enable chart lint error reproduction on laptop (#563)
- 99abccf [Feature] Add a flag to make zero downtime upgrades optional (#1564)
- 0bbdec2 [Feature] Add allow CORS in apiserversdk (#4059)
- 3fe9605 [Feature] Add an e2e test for Autoscaler to scale up by manually updating (#2634)
- 9d25660 [Feature] Add an e2e test for K8s Job submitter failures (#2688)
- 96d1ac2 [Feature] Add an example for RayService high availability (#1566)
- dcaf6a5 [Feature] Add apiserver unit test(pkg/util/cluster.go) (#3348)
- d56356b [Feature] Add cleanup for terminated RayJob/RayCluster metrics (#3923)
- ed44425 [Feature] Add default init container in workers to wait for GCS to be ready (#973)
- 6b3836e [Feature] Add e2e test for UpdateRayService function (#3446)
- 6687955 [Feature] Add e2e test for setting RayCluster deletion delay in RayService (#3912)
- 0474e8d [Feature] Add e2e tests for Autoscaler V2 (#2588)
- c85646f [Feature] Add eslint and Prettier to ray dashboard (#3975)
- 5990b05 [Feature] Add initializing timeout for RayService (#4143)
- cd9b2e8 [Feature] Add python client test to action (#993)
- a0ee1c8 [Feature] Add service account section in helm chart (#969)
- f45155b [Feature] Add timeout for apiserver grpc server (#3427)
- 39e8028 [Feature] Add timestamps for logs in e2e tests (#3006)
- 7db8f69 [Feature] Add unit test for update service request validation (#3546)
- e11a9b7 [Feature] Adding RAY_CLOUD_INSTANCE_ID as unique id for Ray node (#1759)
- de8bc26 [Feature] Allow RayCluster Helm chart to specify different images for different worker groups (#1352)
- 002e375 [Feature] Allow custom labels&annotations for kuberay operator (#1276)
- c13498b [Feature] Auto detect MIG GPUs and pass them into Ray’s logical resources. (#3567)
- 34e394f [Feature] Consistency check for RBAC (#577)
- 633ff63 [Feature] Define a general-purpose cleanup method for CREvent (#849)
- e128863 [Feature] Disable zero downtime upgrade for a RayService using RayServiceSpec (#2468)
- 13eb7b2 [Feature] Display reconcile failures as events (ServiceAccount) (#2290)
- b6b00c8 [Feature] Docker support for chart-testing (#623)
- 2600854 [Feature] Enable namespaced installs via helm chart (#860)
- 40775c5 [Feature] Expose initContainer image in RayCluster chart (#674)
- 2ee95cc [Feature] Fix auto upgrade prometheus (#3449)
- 6cbb8e7 [Feature] Fix dependency upgrade for gomock (#3558)
- 4714892 [Feature] Improve and fix Prometheus & Grafana integrations (#895)
- 244003b [Feature] Improve observability for flaky RayJob test (#1587)
- c6df15e [Feature] Improve the observability of integration tests (#775)
- 551de65 [Feature] Include CR UID in kuberay metrics (#4003)
- c6bafa3 [Feature] Make Ray and Logs links proxy to their Ray dashboards (#4112)
- 3aebd8c [Feature] Make head serviceType optional (#851)
- 1ed0b7f [Feature] Make replicas optional for WorkerGroupSpec (#1443)
- 3bb01e8 [Feature] Manually fix controller runtime package upgrade (#3448)
- a53d942 [Feature] Manually fix net package upgrade (#3447)
- 1a94b43 [Feature] Manually upgrade k8s package group (#3486)
- 8c8222c [Feature] Move some functions from prototype test framework to a new utils file (#837)
- c552d3c [Feature] Override the block option of rayStartParams to true (#1718)
- 2fb9465 [Feature] Print KubeRay logs in Buildkite runner when tests fail (#2690)
- 49e7520 [Feature] Provide multi-arch images for apiserver and security proxy (#4131)
- 78b9828 [Feature] REP 54: Add PodName to the HeadInfo (#2266)
- ad06bbd [Feature] Ray container must be the first application container (#1379)
- fd27b75 [Feature] Ray restricted podsecuritystandards for enterprise security and Kubeflow integration (#750)
- 4fdb87d [Feature] RayService HA test - GCS fault tolerance + kill GCS process (#2590)
- dd7ed90 [Feature] Refactor test framework & test kuberay-operator chart with configuration framework (#759)
- ffcf704 [Feature] Remove Docker container and NodePort from compatibility test (#844)
- 3129b87 [Feature] Remove checking CRD in Volcano scheduler initialization (#4011)
- d0debd1 [Feature] Replace service name with Fully Qualified Domain Name (#938)
- 1d3f537 [Feature] Run config tests with the latest release of KubeRay operator (#858)
- ea6e8d1 [Feature] Running end-to-end tests on local machine (#589)
- 8a35f18 [Feature] Separate controller namespace and CRD namespaces for KubeRay-Operator Dashboard (#4088)
- fd06b5b [Feature] Set default appProtocol for Ray head service to tcp (#668)
- 6691b70 [Feature] Split ray.io/originated-from into ray.io/originated-from-cr-name and ray.io/originated-from-crd (#1864)
- a9beafb [Feature] Support ARM image for test (#2699)
- f22a75a [Feature] Support Volcano Network Topology Aware Scheduling for kuberay (#4105)
- d6aef8b [Feature] Support Volcano for batch scheduling (#755)
- 017e58f [Feature] Support configurable RayCluster deletion delay in RayService (#3864)
- baccb09 [Feature] Support environment variables for KubeRay operator chart (#978)
- a45e4ab [Feature] Support for overwriting the generated ray start command with a user-specified container command (#1704)
- 6c9f859 [Feature] Support inject specific env vars to all Ray containers in all RayCluster CRs by configuration (#4103)
- 9bc5d85 [Feature] Support suspend in RayJob (#926)
- 4aa53f4 [Feature] Sync for manifests and helm chart (#564)
- 56b2f61 [Feature] Sync logs to local file (#632)
- ca6d792 [Feature] TLS authentication (#989)
- b4b1ce7 [Feature] Test sample RayCluster YAMLs to catch invalid or out of date ones (#678)
- 65a7703 [Feature] Test sample RayService YAML to catch invalid or out of date one (#731)
- 71e260f [Feature] The default ImagePullPolicy should be IfNotPresent (#947)
- f6a401a [Feature] Upgrade ginkgo (#3503)
- 37cf2ac [Feature] Upgrade golang version (#3461)
- 9620772 [Feature] Upgrade grpc gateway version manually (#3491)
- 1be2ae0 [Feature] Upgrade net package (#3485)
- f4412f6 [Feature] Use image of Ray head container as the default Ray Autoscaler container (#1401)
- 92c2907 [Feature] Validation of RayFTEnabled is false and GcsFaultToleranceOption is not nil (#2726)
- fa74914 [Feature] Warn Users When Updating the RayClusterSpec in RayJob CR (#1778)
- 36b112e [Feature] Watch CR in multiple namespaces with namespaced RBAC resources (#1106)
- 27728d7 [Feature] [API Server] Support activeDeadlineSeconds in API Server RayJob resource (#3335)
- bc17cd9 [Feature] [Fix] Ensure Correct Logs Display for Go Test Logs in Buildkite Runner (#2837)
- bbdff70 [Feature] [KubeRay DashBoard] Reimplement and replace the Compute Template section in the New Job (#4119)
- 89f5fba [Feature] [RayJobs] Use finalizers to implement stopping a job upon cluster deletion (#735)
- b6bcf10 [Feature] [scheduler-plugins] Support second scheduler mode (#3852)
- 09aad7e [Feature] integrate RayDashboard with apiserver V2 (#4054)
- 2db5c5d [Feature] update yarn version from v1 to latest (#3945)
- f2d7c1f [Feature]: Add a new event type FailedToDeleteWorkerPodCollection (#2680)
- 536ca35 [Feature][APIServer v2] Support Compute Template in APIServer v2 (#3959)
- 491c488 [Feature][APIServer] Support decimal memory values in KubeRay APIServer (#3956)
- 8a31bfd [Feature][APIServer] add retry for http client (#3551)
- 5a766fd [Feature][Doc] Access S3 bucket from Pods in EKS (#958)
- 2a84a4b [Feature][Doc] End-to-end KubeRay Operator development process on Kind (#826)
- f4b2823 [Feature][Doc] Explain that RBAC should be synchronized manually (#641)
- 1c648a3 [Feature][Doc] Kubeflow integration (#937)
- 3ac1b5a [Feature][Docs] AWS Application Load Balancer (ALB) support (#658)
- 0564748 [Feature][Docs] Explain how to specify container command for head pod (#912)
- cfa1203 [Feature][GCS FT] Best-effort redis cleanup job for 5 minutes (#1766)
- 72ba3a3 [Feature][GCS FT] Clean up Redis once a GCS FT-Enabled RayCluster is deleted (#1412)
- 310911c [Feature][Helm] Align the key of minReplicas and maxReplicas (#663)
- 0adc508 [Feature][Helm] Enable sidecar configuration in Helm chart (#604)
- 4e9fdb0 [Feature][Helm] Expose the autoscalerOptions (#666)
- 5ca90b3 [Feature][Hotfix] Add observedGeneration to the status of CRDs (#979)
- 9835cc8 [Feature][Observability] Scrape Autoscaler and Dashboard metrics (#1493)
- 692138b [Feature][Ray-operator] Improve RayJob validation for shutdownAfterJobFinishes and ttlSecondsAfterFinished (#3653)
- 5231dbf [Feature][RayCluster]: Deprecate the RayCluster .Status.State field (#2288)
- d025792 [Feature][RayCluster]: Generate GCS FT Redis Cleanup Job creation events (#2382)
- 5062a8c [Feature][RayCluster]: Implement the HeadReady condition (#2261)
- b5f14f1 [Feature][RayCluster]: introduce RayClusterSuspending and RayClusterSuspended conditions (#2403)
- b2dbb15 [Feature][RayJob] Remove the deprecated RuntimeEnv from CRD. Use RuntimeEnvYAML instead. (#1792)
- fab00b5 [Feature][RayJob] Support light-weight job submission (#1893)
- 809bfb2 [Feature][RayJob] Support light-weight job submission with entrypoint_num_cpus, entrypoint_num_gpus and entrypoint_resources (#1904)
- 6d5020f [Feature][RayJob] Use RayContainerIndex instead of 0 (#1427)
- 73e6c5d [Feature][RayJob]: Generate submitter and RayCluster creation/deletion events (#2389)
- 1283a62 [Feature][RayService] Set default ports (#3262)
- 72e9933 [Feature][autoscaler v2] Set RAY_NODE_TYPE_NAME when starting ray node (#1973)
- da78df4 [Feature][kubectl-plugin] Expose setting shutdownAfterJobFinishes and ttlSecondsAfterFinished in ray job submit (#3627)
- 22c2b45 [Feature][kubectl-plugin] Implement kubectl ray session (#2298)
- c86b03b [Feature][kubectl-plugin] Quick fix for Job Submission ID (#2469)
- 4e5a916 [Feature][kubectl-plugin] add KubeRay operator version query (#2443)
- 6ca956b [Feature][kubectl-plugin] e2e test for 'kubectl ray log' (#2486)
- 1bc821e [Feature][kubectl-plugin] return usage error when no entrypoint input (#2503)
- 5cb2f56 [Feature][kubectl-plugin] 'ray log command' Add check and cleanup directory when no ray node exist (#2473)
- fdf7251 [Fix] Adjust crd path to verify changed files (#3103)
- a56b091 [Fix] Consistent parsing of custom accelerator resources (#2464)
- 6e70fd2 [Fix] Directly fail if RayJob metadata is invalid (#3981)
- 2f2c1a2 [Fix] RayCluster fails to transit Status.State to Ready when numOfHosts > 1 (#3353)
- 2300814 [Fix] Standardize Buildkite Display Format Across All Tests (#2992)
- 9068102 [Fix] Update Ray Service Troubleshooting Link (#2727)
- fd9c90c [Fix] Use go 1.22 on Buildkite autoscaler e2e tests (#2211)
- 7d53e78 [Fix] changelog-generator.py failed to parse some commit messages (#3818)
- 9559227 [Fix][CI] E2E tests do not reflect error (#3021)
- 795f799 [Fix][CI] Fix ray operator image build error by setting up docker buildx (#3750)
- 93e32d0 [Fix][CI] Fix revive error (#2183)
- 5124ef8 [Fix][CI] Redirect stderr to stdout in Test Autoscaler E2E (nightly operator) (#3074)
- f6bf32f [Fix][CI] kubectl plugin krew index CI error (#3015)
- dea87ff [Fix][Envtest] Decorate container nodes with Ordered (#2285)
- b903d40 [Fix][HelmChart] Move service.headService -> head.headService in values.yaml (#1998)
- 990ffe3 [Fix][Helm] Fix ClusterRole for volcano if .Values.batchScheduler.name is set (#2474)
- c7fe15b [Fix][Operator] Explicitly wait for pod not found for satisfying the delete scale expectation (#3520)
- 6c168a0 [Fix][RayCluster] Make the RayClusterReplicaFailureReason to capture the correct reason (#2282)
- 084368a [Fix][RayCluster] fix missing pod name in CreatedWorkerPod and FailedToCreateWorkerPod events (#3057)
- 96fbbc1 [Fix][RayJob] Invalid quote for RayJob submitter (#2949)
- 40f5ddb [Fix][RayService] Raise error if spec.rayClusterConfig.headGroupSpec.headService.metadata.name is set (#2440)
- efbd35e [Fix][RayService] Use LRU cache for ServeConfigs (#2683)
- baa2cc6 [Fix][Release] Fix Krew release indentation error (#3823)
- b8c4e5c [Fix][Release] Fix KubeRay dashboard image build pipeline (#3702)
- a69252e [Fix][Sample-Yaml] Increase ray head CPU resource for pytorch mnist (#2330)
- f687794 [Fix][kubectl-plugin] Create separate namespaces for each kubectl plugin e2e test (#2745)
- 3efef20 [Fix][kubectl-plugin] Don't print wrapped error for job submit startup (#3027)
- c8d34f4 [Fix][kubectl-plugin] Fix no context nil error SIGSEGV in tests (#2892)
- 909f66e [Fix][kubectl-plugin] Release bot opens PRs to Krew repo with unexpected whitespace changes (#3090)
- abb0bf4 [Fix][kubectl-plugin] Remove controller-runtime logger warning in kubectl ray job submit (#3669)
- b8484af [Fix][kubectl-plugin] Remove filepath.Clean for ray job submit workingDir (#3518)
- 029cd78 [Fix][kubectl-plugin] make tests use a temporary kube config (#2894)
- a860884 [Fix][kubectl-plugin] ray job submit runtime-env-json null error (#3063)
- c094153 [Fix][kubectl-plugin]: make version handle digests (#2876)
- d8b7c69 [Fix][precommit] Fix pre-commit golangci-lint always success (#2140)
- 4b46822 [Fix] remove broken link in doc (#3519)
- a614b1d [Follow Up][Test] Support to set QPS and burst by configuration (#3999)
- 4bbaa06 [GCS FT] Add e2e tests for configuring GCS FT with annotations (#2766)
- 1c9de23 [GCS FT] Consider the case of sidecar containers (#1386)
- fe26dc4 [GCS FT] Enhance observability of redis cleanup job (#1709)
- 55b1d39 [GCS FT] Give readiness / liveness probes good default values (#1364)
- 6fa2d3a [GCS FT] Improve GCS FT cleanup UX (#1592)
- 7f95a6c [GCS FT] More validations for configuring GCS FT with envs and annotations (#2772)
- a81ea81 [GCS FT] Redis e2e cleanup check (#2773)
- 937297c [GCS FT] Unify configuring Gcs FT into a single function (#2755)
- e79e0b9 [GCS FT][Refactor] Redefine the behavior for deleting Pods and stop listening to Kubernetes events (#1341)
- 6375221 [Golang] Remove go get (#1283)
- 10cc898 [Grafana] Add a Cluster variable to the Grafana Dashboard to enable filtering of different RayClusters (#2685)
- f6637d7 [Grafana] Add flag for enabling auto load dashboards (#3689)
- 9f013a3 [Grafana] Allow auto-load dashboard jsons (#3643)
- e89ae34 [Grafana] Update Grafana dashboard (#2106)
- 848d400 [Grafana] Update Grafana dashboard (#3726)
- 3425b4b [Grafana] Use PodMonitor instead of ServiceMonitor for the Head Node to avoid metric duplication (#2689)
- 9e4e709 [Grafana] Use Range option instead of instant (#4062)
- 627f529 [Grafana][Observability] Embed Grafana dashboard panels into Ray dashboard (#1278)
- 17ee134 [HELM] Add Helm unit tests for chart kuberay-apiserver (#3361)
- 9658af3 [HELM] Define name templates for all resources (#3381)
- 5265ee0 [HELM] Fix serviceAccount name inconsistency in templates (#3451)
- 75ea7ae [HELM] Typo correction (operatorComand -> operatorCommand) (#3450)
- 22f570e [Helm Chart] Set honorLabel of serviceMonitor to true (#3805)
- 9d46862 [Helm] Add gcsFaultToleranceOptions in RayCluster chart (#3881)
- ef9206c [Helm] Add missing environment variables to operator chart (#3867)
- 6114969 [Helm] Add priorityClassName for kuberay-operator chart (#3703)
- 3a512da [Helm] Clean up RayCluster Helm chart ahead of KubeRay 0.4.0 release (#751)
- 799f073 [Helm] Enable leader election when leaderElectionEnabled is not set (#2284)
- 9296c22 [Helm] Make Kube Client QPS and Burst configurable for kuberay-operator (#4002)
- ae91985 [Helm] Make reconcile concurrency configurable for kuberay-operator (#3962)
- b65e4a0 [Helm] Use helm-docs to generate README for chart api-server automatically (#3916)
- a099da3 [Helm] Use helm-docs to generate README for chart ray-cluster automatically (#3887)
- 6db864d [Helm] add sizeLimit for emptyDir (#2532)
- cde251a [Helm][RBAC] Introduce the option crNamespacedRbacEnable to enable or disable the creation of Role/RoleBinding for RayCluster preparation (#1162)
- 831b55b [Helm][ray-cluster] Fix parsing envFrom field in additionalWorkerGroups (#1039)
- 3a925f3 [Hotfix] Extend Autoscaler e2e tests timeout (#3665)
- 981c943 [Hotfix] Increase the timeout of the ProxyActor health check (#2082)
- 165291e [Hotfix][Bug] Avoid unnecessary zero-downtime upgrade (#1581)
- 1fe5ae7 [Hotfix][Bug] suspend is not a stateless operation (#1741)
- 9ad6b1b [Hotfix][CI] Pin setup-envtest dep (#2038)
- 00dc45a [Hotfix][release blocker][RayJob] HTTP client from submitting jobs before dashboard initialization completes (#1000)
- 12b9df2 [Kueue] Add a sample YAML for Kueue toy sample (#1956)
- 4fc1799 [Logging] Avoid using fmt.Sprintf inside logging functions (#2508)
- df03863 [Logging] Remove duplicate info in CR logs (#2531)
- 23b08e0 [Logging] add context info for yunikorn logger (#2522)
- 105e880 [Metric] kuberay_job_deployment_status (#3656)
- d40692f [Metrics] Remove serviceMonitor.yaml (#3795)
- fdd4bdb [Minor] Remove redundant variable (#2281)
- 7905bcf [N/N][Lint] Group imports by sections (#3454)
- 0775292 [Nit] Remove redundant code snippet (#1810)
- f77ee03 [Perf] Add NUM_WORKERS and CPUS_PER_WORKER env to the mnist workload (#2126)
- afab558 [Perf] Add a CPU-based image resizing workload using Ray Data (#2135)
- c099de4 [Perf] Add a CPU-based training workload (#2116)
- b5f237d [Perf] Improve perf-test YAMLs and README (#2110)
- c83b1bd [Post Ray 2.2.0 Release] Update Ray versions to Ray 2.2.0 (#822)
- 8da54d4 [Post Ray 2.3 Release] Update Ray versions to Ray 2.3.0 (#925)
- 473dfdb [Post Ray 2.4 Release] Update Ray versions to Ray 2.4.0 (#1049)
- cc4155b [Post Ray 2.7.0 Release] Update Ray versions to Ray 2.7.0 (#1423)
- 666679f [Post Ray 2.8.0 Release] Update Ray versions to Ray 2.8.0 (#1678)
- df3cc35 [Post release v0.5.0] Remove block from rayStartParams (#1015)
- ba814ef [Post release v0.5.0] Remove block from rayStartParams for python client and KubeRay operator tests (#1050)
- 67a0f44 [Post release v0.5.0] Remove serviceType (#1013)
- 4234e5b [Post release v0.5.0] Update CHANGELOG.md (#1026)
- dfc197f [Post release v0.5.0] Update release doc (#1028)
- 72d1c21 [Post release v0.6.0] Update CHANGELOG.md (#1274)
- ded9454 [Post v0.5.0] Remove init containers from YAML files (#1010)
- 74496a0 [Post v1.0.0-rc.1] Reenable sample YAML tests for latest release and update some docs (#1544)
- 4eed014 [Post v1.1.0] Run the sample YAML tests with KubeRay v1.1.0 (#2039)
- 022ff0d [Prometheus] Add kuberay_cluster_provisioned_duration_seconds metric (#3212)
- 2409109 [Prometheus] Add kuberay_cluster_info metric (#3535)
- f7102b2 [Prometheus] Add serviceMonitor for KubeRay Operator (#3530)
- 6cefc40 [Prometheus] Refactor kuberay_cluster_provisioned_duration_seconds (#3497)
- 238cb4e [Quay] Sanity check for KubeRay repository setup (#1300)
- fb1463f [REFACTOR]: refactor execute pod cmd with client-go function (#2467)
- eef1d89 [Ray 2.3.0] Update --redis-password for RayCluster (#929)
- 827814c [Ray 2.9.0 Release] Update Ray versions from 2.8.0 to 2.9.0 (#1770)
- f652d5d [Ray Observability] Disk usage in Dashboard (#1152)
- 7d0eae4 [Ray-operator] Feature flag login bash (#3679)
- d4784a5 [RayCluster controller] Add headServiceAnnotations field to RayCluster CR (#841)
- d1eeaab [RayCluster controller] [Bug] Unconditionally reconcile RayCluster every 60s instead of only upon change (#850)
- 944b60c [RayCluster] Add multi-host indexing labels (#3998)
- ffac341 [RayCluster] Add serviceName to status.headInfo (#2089)
- 6463f25 [RayCluster] IsAutoscalingEnabled takes RayClusterSpec (#3111)
- 13df016 [RayCluster] Make headpod name back to non-deterministic (#3872)
- b9d8b1a [RayCluster] Make headpod name deterministic (#3028)
- 04f5b71 [RayCluster] Toggle usage of deterministic/non-deterministic head pod name with feature flag (#3873)
- d169f5c [RayCluster] Update sample yamls to use the new gcsFaultToleranceOptions option (#2856)
- 829aad5 [RayCluster] Validate GCSFaultToleranceOptions and redis password (#2754)
- eba1459 [RayCluster] Validate RayClusterSpec for empty containers and GCS FT (#2749)
- 8d60d61 [RayCluster] don't allow overriding ray.io/cluster label (#2555)
- 8feef9d [RayCluster] e2e test for GCS FT with Redis Username (#2855)
- 9bd31cf [RayCluster] grant pods and pods/resize patch permissions for IPPR (#3960)
- 28c729f [RayCluster] improve generated pod names for Ray clusters
- a788963 [RayCluster] support suspending worker groups (#2663)
- f11a1f5 [RayCluster] yunikorn batchscheduler respect gang scheduling (#4075)
- 94636bd [RayCluster] Upgrade volcano to 1.11.0 (#3159)
- 8362483 [RayCluster][CI] add e2e tests for RayClusterStatusCondition (#2661)
- c62910f [RayCluster][CI] add e2e tests for the RayClusterSuspended status condition (#2686)
- 801f081 [RayCluster][Expectation] Add a test to ensure expectations work well during scaling down (#3543)
- c2f3823 [RayCluster][Feature] Make RayClusterStatusConditions feature gate Beta and enabled by default (#2562)
- 7a768f9 [RayCluster][Feature] add GcsFaultToleranceOptions to the RayCluster CRD [1/N] (#2715)
- 991b9c7 [RayCluster][Feature] add redis password to head pod from GcsFaultToleranceOptions (#2731)
- 0055bf3 [RayCluster][Feature] add redis username to head pod from GcsFaultToleranceOptions (#2760)
- 7bb82db [RayCluster][Feature] reject redis username to head pod out side of GcsFaultToleranceOptions (#2796)
- 82e2554 [RayCluster][Feature] setup GCS FT annotations and the RAY_REDIS_ADDRESS env by the GcsFaultToleranceOptions (#2721)
- d86ea62 [RayCluster][Feature] skip suspending worker groups if the in-tree autoscaler is enabled (#2748)
- a4d7dd0 [RayCluster][Fix] Add expectations of RayCluster (#2150)
- 42f299a [RayCluster][Fix] DesiredReplicas, MinReplicas and MaxReplicas should respect workerGroupSpec.Suspend (#2728)
- 6c1c16e [RayCluster][Fix] evicted head-pod can be recreated or restarted (#2217)
- b5bcb86 [RayCluster][Fix] leave .Status.State untouched when there is a reconcile error (#2622)
- ae880c4 [RayCluster][Refactor] use RayClusterAllPodsAssociationOptions instead (#2756)
- 17809bc [RayCluster][Status][1/n] Remove ClusterState Unhealthy (#2068)
- 4da1838 [RayJob] Add Cluster Name For Rayjob. (#2046)
- 8f06197 [RayJob] Add Failure Feedback (log and event) for Failed k8s Creation Task (#2306)
- fe981a2 [RayJob] Add JobDeploymentStatusFailed Status and Reason Field to Enhance Observability for Flyte/RayJob Integration (#1942)
- 1f44bdc [RayJob] Add Tests for Atomic Suspend Operation (#2050)
- bb5b788 [RayJob] Add RayJobInfo to RayJob CRD status (#3673)
- f53b42a [RayJob] Add additional print columns for RayJob (#1895)
- 775715b [RayJob] Add default CPU and memory for job submitter pod (#1319)
- 7386427 [RayJob] Add e2e sample yaml test for shutdownAfterJobFinishes (#1269)
- aa17363 [RayJob] Add field to expose entrypoint num cpus in rayjob (#1359)
- 5de4a42 [RayJob] Add runtime env YAML field (#1338)
- 8682b2d [RayJob] Add spec.backoffLimit for retrying RayJobs with new clusters (#2192)
- 9a4de56 [RayJob] ClusterSelector shouldn't support SidecarMode (#4074)
- b33642f [RayJob] Deflaky RayJob e2e tests (#2963)
- af4f6ac [RayJob] Delete the Kubernetes Job and its Pods immediately when suspending (#1791)
- 9382c1f [RayJob] Enable job log streaming by setting PYTHONUNBUFFERED in job container (#1375)
- 58a3ff0 [RayJob] Enhance RayJob DeletionStrategy to Support Multi-Stage Deletion (#4040)
- 528abc3 [RayJob] Fix RayJob status reconciliation (#1539)
- 370fc44 [RayJob] Follow up of RayJob deletion policy PR (#2763)
- edfc34f [RayJob] Improve dashboard client log (#1903)
- 4fb4578 [RayJob] Inject RAY_SUBMISSION_ID env variable for user provided submitter template (#1868)
- 91fcd3e [RayJob] Propagate error traceback string when GetJobInfo doesn't return valid JSON (#943)
- f191a75 [RayJob] RayJob deletion policy validation (#2771)
- 931f970 [RayJob] Refactor Rayjob E2E Tests to Use Server-Side Apply (#1927)
- 34a8d9f [RayJob] Rewrite RayJob envtest (#1916)
- 2281d9e [RayJob] Set missing CPU limit (#1899)
- f9b2cb1 [RayJob] Set the timeout of the HTTP client from 2 mins to 2 seconds (#1910)
- 7639b9d [RayJob] Sidecar Mode (#3971)
- 2583d85 [RayJob] Submit job using K8s job instead of checking Status and using DashboardHTTPClient (#1177)
- 0074129 [RayJob] Support ActiveDeadlineSeconds (#1933)
- f5d7131 [RayJob] Support deletion policies based on job status (#3731)
- c78c75b [RayJob] Transition to Complete if the JobStatus is STOPPED (#1855)
- 6b027b4 [RayJob] Unified checkBackoffLimitAndUpdateStatusIfNeeded codepath and add an e2e test for retry (#2215)
- 55a6688 [RayJob] UserMode -> InteractiveMode and check rayjob.spec.jobId instead of annotation (#2446)
- 024aaef [RayJob] Validate RayJob spec (#1813)
- 6087689 [RayJob] Validate whether runtimeEnvYAML is a valid YAML string (#1898)
- c9fa013 [RayJob] Yunikorn Integration (#3948)
- d0f1c3c [RayJob] [Doc] Add real-world Ray Job use case tutorial for KubeRay (#1361)
- 7f15e13 [RayJob] add Failing RayJob in HTTPMode e2e test for rayjob with retry (#2242)
- 27b1dca [RayJob] add Failing submitter K8s Job e2e test for rayjob with retry (#2226)
- bd33d54 [RayJob] add Light-weight RayJob Submitter (#3943)
- 1efaf68 [RayJob] add RayJob pass Deadline e2e-test with retry (#2241)
- 7e04b22 [RayJob] allow create verb for services/proxy, which is required for HTTPMode (#2321)
- 0106303 [RayJob] avoid RayCluster resource leak in k8s job mode (#3903) (#4080)
- 0544f8b [RayJob] implement deletion policy API (#2643)
- 72a7767 [RayJob] remove redundant RayJob status-transition logs in reconciler (#3976)
- 1c5f3e8 [RayJob]: Add RayJob with RayCluster spec e2e test (#1636)
- 631cd7c [RayJob]: Always use target RayCluster image as default RayJob submitter image (#1548)
- b0fee80 [RayJob][10/n] Add finalizer to the RayJob when the RayJob status is JobDeploymentStatusNew (#1780)
- c0b6b0d [RayJob][Chore] make err as a local variable (#2789)
- 0274faa [RayJob][Doc] Fix RayJob sample config. (#807)
- bcc8c09 [RayJob][Fix] Use --no-wait for job submission to avoid carrying the error return code to the log tailing (#3216)
- c45d959 [RayJob][Kueue] Move limitation check to validateRayJobSpec (#1854)
- 0ed5e7e [RayJob][Refactor] use ray job status and ray job logs to be tolerant of duplicated job submissions (#2579)
- ce4ec27 [RayJob][Status][1/n] Redefine the definition of JobDeploymentStatusComplete (#1719)
- 4ff389b [RayJob][Status][11/n] Refactor the suspend operation (#1782)
- 349068d [RayJob][Status][12/n] Resume suspended RayJob (#1783)
- 448e33d [RayJob][Status][13/n] Make suspend operation atomic by introducing the new status Suspending (#1798)
- a9c7abb [RayJob][Status][14/n] Decouple the Initializing status and Running status (#1801)
- f654665 [RayJob][Status][15/n] Unify the codepath for the status transition to Suspended (#1805)
- 83327f2 [RayJob][Status][16/n] Refactor Running status (#1807)
- 1eed068 [RayJob][Status][17/n] Unify the codepath for status updates (#1814)
- c55f3cc [RayJob][Status][18/n] Control the entire lifecycle of the Kubernetes submitter Job using KubeRay (#1831)
- d5d7e5f [RayJob][Status][19/n] Transition to Complete if the K8s Job fails (#1833)
- ba42038 [RayJob][Status][2/n] Redefine ready for RayCluster to avoid using HTTP requests to check dashboard status (#1733)
- 8760d90 [RayJob][Status][3/n] Define JobDeploymentStatusInitializing (#1737)
- 62bbc13 [RayJob][Status][4/n] Remove some JobDeploymentStatus and updateState function calls (#1743)
- 1594e88 [RayJob][Status][5/n] Refactor getOrCreateK8sJob (#1750)
- d49a7af [RayJob][Status][6/n] Redefine JobDeploymentStatusComplete and clean up K8s Job after TTL (#1762)
- 59503c6 [RayJob][Status][7/n] Define JobDeploymentStatusNew explicitly (#1772)
- cac7648 [RayJob][Status][8/n] Only a RayJob with the status Running can transition to Complete at this moment (#1774)
- 6af407d [RayJob][Status][9/n] RayJob should not pass any changes to RayCluster (#1776)
- 3e64de3 [RayJob][Test] make sure annotation populated to RayCluster (#3199)
- 7f33c1d [RayJob][Test] refactor TestValidateRayJobSpec with table test (#3223)
- 834aed3 [RayService] Add New Status: NumServeEndpoints (#1901)
- bbb65b4 [RayService] Add RayService High Availability Test Doc (#1986)
- e062d07 [RayService] Add RayService alb ingress CR (#1169)
- 1c276f7 [RayService] Add a safeguard and remove the dead code to ensure that both clusters are not empty before reconciling serve (#2778)
- c13949e [RayService] Add an envtest for RayService happy path (#2868)
- c807790 [RayService] Add an envtest for autoscaler (#2872)
- 62faf27 [RayService] Add checks of RayService conditions in e2e tests (#2864)
- 8db4f6d [RayService] Add e2e tests (#1167)
- 0ee3983 [RayService] Add logs and remove in-place update for the TestOldHeadPodFailDuringUpgrade e2e test (#2819)
- 44c0d50 [RayService] Add support for multi-app config in yaml-string format (#1156)
- 77a1023 [RayService] Add unit tests for isZeroDowntimeUpgradeEnabled (#2871)
- 355de9a [RayService] Add zero-downtime triggered test after rayVersion is updated (#2881)
- de3e037 [RayService] Address Recent Flakiness in RayService Zero Downtime Rollout Test (#1979)
- a6cf6e0 [RayService] Allow updating WorkerGroupSpecs without rolling out new cluster (#1734)
- f46f328 [RayService] Always check the readiness of head Pods for both pending / active clusters if cluster exists (#2783)
- edd332b [RayService] Avoid Duplicate Serve Service (#1867)
- 1a4254c [RayService] Avoid passing RayServiceStatus to functions in reconcileServe (#2828)
- f932962 [RayService] Avoid sending health check requests to the head Pod when excludeHeadPodFromServeSvc is true (#2776)
- 80cab41 [RayService] Calculate status based on K8s resources (#2818)
- 47d55fe [RayService] Change runtime env for e2e autoscaling test (#1178)
- 850fd48 [RayService] Compare cached hashed config before triggering update (#655)
- 5a5534f [RayService] Create k8s events after creating/updating k8s resources (#2873)
- 41ee4db [RayService] Deflaky RayService envtest (#2962)
- 2970e36 [RayService] Deprecate the built-in ingress support of RayService (#1843)
- a03f721 [RayService] Fixed issue where the custom serve port is not reflected in the serve health check for worker Pods (#1816)
- deb29bd [RayService] Ignore deployments status to decide whether to deploy serve application (#1014)
- 33ee672 [RayService] Mark ServiceStatus as deprecated (#2863)
- 18bee57 [RayService] Merge initConditions into calculateConditions (#2866)
- 0143fef [RayService] More envtests that follow the most common scenario in the RayService code path (#2880)
- 96a2ce6 [RayService] Move HTTP Proxy's Health Check to Readiness Probe for Workers (#1808)
- 81d7608 [RayService] Move cleanUpRayClusterInstance from reconcileRayCluster to Reconcile (#2838)
- e1bee82 [RayService] Move the cluster switch logic from reconcileServe to Reconcile (#2777)
- 495c0aa [RayService] Move the update of RayClusterStatus to calculateStatus (#2826)
- a31e094 [RayService] Passing serve applications to calculateStatus and avoid calling Status().Update(...) inside reconcileServe (#2831)
- 5b8e9c6 [RayService] Refactor createRayClusterInstance (#2874)
- 1265980 [RayService] Refactor reconcileRayCluster to avoid updating CR status in the function (#2859)
- bb31661 [RayService] Refactor updateRayClusterInstance (#2875)
- ab93442 [RayService] Refactor envtests (#2888)
- 8e1b922 [RayService] Refactor fake http proxy client and test (#2636)
- e616dc4 [RayService] Refactor to Rely More on RayService Status in RayService E2E Tests (#1928)
- bccb358 [RayService] Refactor unit tests for ShouldPrepareNewCluster (#2928)
- 17a534d [RayService] Remove WaitForServeDeploymentReady (#2842)
- 26cdacd [RayService] Remove HealthLastUpdateTime from ServeDeploymentStatus (#2825)
- 9263dc6 [RayService] Remove updateStatusForActiveCluster (#2827)
- 9c9797b [RayService] Remove everything related to Ray Serve V1 API (#1790)
- 019a6cd [RayService] Remove outdated env tests (#2886)
- 3c080dc [RayService] Remove serve v1 API (#1779)
- 9e4aa8a [RayService] Remove the dependencies between constructRayClusterForRayService and the reconciler to make it more unit testable (#2853)
- 7ef5654 [RayService] Rename Restarting to PreparingNewCluster (#2785)
- 3959509 [RayService] Revisit the conditions under which a RayService is considered unhealthy and the default threshold (#1293)
- 2f8ee7f [RayService] Setting observedGeneration inside calculateStatus (#2869)
- 0df4d8a [RayService] Skip update events without change (#811)
- c3f3736 [RayService] Stable Diffusion example (#1181)
- b0649c4 [RayService] Submit requests to the Dashboard after the head Pod is running and ready (#1074)
- 2acc219 [RayService] Support Incremental Zero-Downtime Upgrades (#3166)
- 7940407 [RayService] Track whether Serve app is ready before switching clusters (#730)
- 64da63b [RayService] Trim Redis Cleanup job less than 63 chars (#2846)
- 7fd79f8 [RayService] Unify multi-app and single-app codepath (#1787)
- 46355ed [RayService] Unify the cluster switch over logic together (#2805)
- ecd1539 [RayService] Update docs to use multi-app (#1179)
- 25f787b [RayService] Use DashboardPort for RayService instead of DashboardAgentPort (#1742)
- f7cf955 [RayService] Use Ready condition in e2e tests (#2849)
- 8ea39da [RayService] Use Ready condition in e2e tests (#2854)
- 4e912b9 [RayService] Use original ClusterIP for new head service (#2343)
- 9be883f [RayService] Use waitGroup to ensure goroutine completion in rayservice_ha_test (#2657)
- b753f1a [RayService] a safeguard for preventing overriding the pending cluster during an upgrade (#2887)
- f88b2fe [RayService] adapter vllm 0.6.1.post2 (#2823)
- b66763d [RayService] don't update serveConfigV2 in current ray cluster if ray… (#3559)
- a612670 [RayService] e2e for check the readiness of head Pods for both pending / active clusters (#2806)
- 8f75ad5 [RayService] e2e for redeploying RayServe application after recreating a new Head Pod (#2834)
- 78d030a [RayService] fix kubebuilder printcolumn annotations for RayService (#1981)
- 0056fbf [RayService] make RayClusterSpec required (#3169)
- 19924c3 [RayService] make checkIfNeedSubmitServeApplications more unit testable (#2822)
- e11fe54 [RayService] refactor envtest by adding a util function rayServiceTemplate (#2833)
- d64bf59 [RayService] reword the comment on ServiceStatus = rayv1.Running (#2848)
- 2e8f532 [RayService][Bug] Serve Service May Select Pods That Are Actually Unready for Serving Traffic (#1856)
- 19054cb [RayService][Doc] RayService troubleshooting handbook (#1221)
- 73f4f21 [RayService][HA] Fix flaky tests (#1823)
- 6c2281c [RayService][Health-Check][1/n] Offload the health check responsibilities to K8s and RayCluster (#1656)
- 4557a01 [RayService][Health-Check][2/n] Remove the hotfix to prevent unnecessary HTTP requests (#1658)
- aa42f8b [RayService][Health-Check][3/n] Update the definition of HealthLastUpdateTime for DashboardStatus (#1659)
- 07d14de [RayService][Health-Check][4/n] Remove the health check for Ray Serve applications (#1660)
- 584132c [RayService][Health-Check][5/n] Remove unused variable deploymentUnhealthySecondThreshold (#1664)
- ed56a95 [RayService][Health-Check][6/n] Remove ServiceUnhealthySecondThreshold (#1665)
- 2767768 [RayService][Health-Check][7/n] Remove LastUpdateTime from multiple places (#1666)
- c54c3d9 [RayService][Health-Check][8/n] Add readiness / liveness probes (#1674)
- aad2fc6 [RayService][Hotfix] Hotfix for Flaky Zero Downtime Rollout Test (#1837)
- dd7789c [RayService][Observability] Add actionable logging messages for users when they do not specify ports for Ray Serve (#1218)
- 384a921 [RayService][Observability] Add more logging for RayService troubleshooting (#1230)
- 45d3a4f [RayService][Observability] Add more logging about networking issues (#1282)
- 881008f [RayService][Refactor] Avoid flooding Kubernetes events (#2546)
- 3c8904c [RayService][Refactor] Change the ServeConfigs to nested map (#2591)
- 75dbbdf [RayService][Refactor] Remove ctrlResult (#2545)
- c620582 [RayService][Status][1/n] Remove DashboardStatus (#1839)
- 0575bd1 [RayService][Status][2/n] Remove WaitForDashboard (#1840)
- 57c6397 [RayService][Test] create curl pod waiting until running (#3740)
- 594eafc [RayService][Test] make sure annotation populated to RayCluster (#3210)
- c3b3354 [RayService][Test] util for creating empty RayClusterSpec in test (#3182)
- 39d1456 [RayService][refactor] Remove updateState (#2705)
- da6b356 [Refactor] Add a util function IsAutoscalingEnabled and refactor validations of RayJob deletion policy (#2775)
- c814963 [Refactor] Define the value type of the concurrent map explicitly to avoid type conversion (#1789)
- c5d7de6 [Refactor] Do not use RAYCLUSTER_DEFAULT_REQUEUE_SECONDS_ENV as timeout of status check in tests (#1755)
- b898828 [Refactor] Eliminate redundant range variable capture with Go 1.22 scoped iteration (#4044)
- 7a96221 [Refactor] Encapsulate RayCluster metrics in a custom Prometheus collector (#3310)
- 0b72901 [Refactor] Encapsulate RayJob metrics in a custom Prometheus collector (#3444)
- b875b85 [Refactor] Extract KubectlApplyYaml and yaml deserialization to support package (#2498)
- 59ae107 [Refactor] Fix CreatedWorkerPod for worker Pod deletion event and refactor logs (#2346)
- f38951f [Refactor] Follow-up for PR 1930 (#2124)
- a616a45 [Refactor] Format API server Makefile for consistency (#3435)
- a83d3c1 [Refactor] Improve API server developer experience (#3458)
- 4ff8316 [Refactor] Improve developer experience of API server e2e-test (#3466)
- 4492fe2 [Refactor] Make port name variables consistent and meaningful (#1389)
- 7f02eb7 [Refactor] Merge raycluster_gcs_ft_test.go and raycluster_gcsft_test.go (#3008)
- 298539d [Refactor] Move ValidateRayJobStatus to validation.go and create its unit test (#2813)
- 8c53bd5 [Refactor] Move ValidateRayClusterSpec to validation.go and its unit test to validation_test.go (#2790)
- 8dd2496 [Refactor] Move validateRayClusterStatus function to validation.go and move unit test to validation_test.go (#2780)
- 84f7368 [Refactor] Move constant.go from common to utils to avoid circular dependency (#1726)
- 3d1c6c3 [Refactor] Move function ValidateRayJobSpec to validation.go and its unit test (#2812)
- 28ab5c9 [Refactor] Move functions that don’t rely on the controller to non-controller member functions (#2747)
- 0867021 [Refactor] Move test name from map key to struct field (#2865)
- 5ccf361 [Refactor] Move validateRayServiceSpec to validation.go and its unit test to validation_test.go (#2816)
- 3a1fedb [Refactor] Parameterize TestGetAndCheckServeStatus (#1450)
- b775821 [Refactor] RayJob Spec ClusterSelector validation logic (#4032)
- 83104b7 [Refactor] Refactor testRayJob global variable to avoid test side effects (#4017)
- 3d533b4 [Refactor] Remove Dashboard Agent service (#1207)
- 3748746 [Refactor] Remove any unnecessary logger (#1894)
- bafb009 [Refactor] Remove cleanupInvalidVolumeMounts (#2104)
- 03eb92c [Refactor] Remove duplicate definition of get_ray_cluster_status (#3608)
- eee9d94 [Refactor] Remove global utils.GetRayXXXClientFuncs (#1727)
- 5007993 [Refactor] Rename EnableAgentService to EnableServeService (#1673)
- 76889ca [Refactor] Rename raycluster_controller_fake_test.go to XXX_unit_test.go (#2074)
- 4836d01 [Refactor] Renaming RayHttpProxyClient attribute UseProxy #1980 (#2093)
- 542f246 [Refactor] Replace Hard-Coded HTTP Values with Constants (#2702)
- 0f2f441 [Refactor] Rewrite RayCluster envtest (#1949)
- dcc8b71 [Refactor] Run golangci-li...