Highlights
The KubeRay 0.5.0 release includes the following improvements.
- Interact with KubeRay via a Python client
- Integrate KubeRay with Kubeflow to provide an interactive development environment
- Integrate KubeRay with Ray TLS authentication
- Improve the user experience for KubeRay on AWS EKS
- Fix some Kubernetes networking issues
- Fix some stability bugs in RayJob and RayService
Contributors
The following individuals contributed to KubeRay 0.5.0. This list is alphabetical and incomplete.
@akanso @alex-treebeard @architkulkarni @cadedaniel @cskornel-doordash @davidxia @DmitriGekhtman @ducviet00 @gvspraveen @harryge00 @jasoonn @Jeffwan @kevin85421 @psschwei @scarlet25151 @sihanwang41 @wilsonwang371 @Yicheng-Lu-llll
Python client (alpha) (New!)
Kubeflow (New!)
- [Feature][Doc] Kubeflow integration (#937, @kevin85421)
- [Feature] Ray restricted Pod Security Standards for enterprise security and Kubeflow integration (#750, @kevin85421)
TLS authentication (New!)
- [Feature] TLS authentication (#989, @kevin85421)
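Ray's TLS support is configured through environment variables (RAY_USE_TLS, RAY_TLS_SERVER_CERT, RAY_TLS_SERVER_KEY, RAY_TLS_CA_CERT) that must be present in every head and worker container. The Python sketch below builds those entries in the shape of a Kubernetes container env list. The helper name ray_tls_env, the mount path /etc/ray/tls, and the file names tls.crt, tls.key, and ca.crt are hypothetical. This is an illustration, not the KubeRay operator's implementation.

```python
from typing import Dict, List


def ray_tls_env(cert_dir: str = "/etc/ray/tls") -> List[Dict[str, str]]:
    """Return Kubernetes-style env entries that enable Ray TLS.

    cert_dir is a hypothetical mount path for the certificate files;
    point it at wherever your Secret or volume is actually mounted.
    """
    # Strip a trailing slash so the joined paths stay clean.
    base = cert_dir.rstrip("/")
    settings = {
        "RAY_USE_TLS": "1",                        # enable TLS on Ray's gRPC channels
        "RAY_TLS_SERVER_CERT": f"{base}/tls.crt",  # this node's certificate (hypothetical name)
        "RAY_TLS_SERVER_KEY": f"{base}/tls.key",   # private key for that certificate
        "RAY_TLS_CA_CERT": f"{base}/ca.crt",       # CA certificate used to verify peers
    }
    # A Kubernetes container "env" field is a list of {"name": ..., "value": ...} objects.
    return [{"name": name, "value": value} for name, value in settings.items()]


if __name__ == "__main__":
    for entry in ray_tls_env():
        print(entry)
```

The same list would need to be attached to both the head group and every worker group, because Ray expects all nodes in a cluster to agree on whether TLS is enabled.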
AWS EKS (New!)
- [Feature][Doc] Access S3 bucket from Pods in EKS (#958, @kevin85421)
Kubernetes networking (New!)
- Read cluster domain from resolv.conf or env (#951, @harryge00)
- [Feature] Replace service name with Fully Qualified Domain Name (#938, @kevin85421)
- [Feature] Add default init container in workers to wait for GCS to be ready (#973, @kevin85421)
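The first two networking changes combine into one idea: determine the cluster's DNS domain (from an environment variable or from the search line in /etc/resolv.conf, falling back to cluster.local), then address Services by their fully qualified name, <service>.<namespace>.svc.<domain>. The Python sketch below illustrates that logic. The function names and the CLUSTER_DOMAIN variable name are assumptions for illustration; the operator's own Go code may differ in detail.

```python
import os
from typing import Optional

DEFAULT_CLUSTER_DOMAIN = "cluster.local"


def domain_from_resolv_conf(text: str) -> Optional[str]:
    """Extract the cluster domain from resolv.conf content, if present.

    Inside a Pod the file typically contains a line such as
    "search <ns>.svc.cluster.local svc.cluster.local cluster.local".
    """
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "search":
            for domain in fields[1:]:
                # The entry starting with "svc." is followed by the cluster domain.
                if domain.startswith("svc."):
                    return domain[len("svc."):]
    return None


def cluster_domain(resolv_conf: str = "/etc/resolv.conf",
                   env_var: str = "CLUSTER_DOMAIN") -> str:
    """Env var override first, then resolv.conf, then the Kubernetes default."""
    override = os.environ.get(env_var)  # env_var name is an assumption
    if override:
        return override
    try:
        with open(resolv_conf) as f:
            found = domain_from_resolv_conf(f.read())
        if found:
            return found
    except OSError:
        pass  # not running in a Pod, or the file is unreadable
    return DEFAULT_CLUSTER_DOMAIN


def service_fqdn(service: str, namespace: str, domain: Optional[str] = None) -> str:
    """Build the fully qualified Service name <service>.<namespace>.svc.<domain>."""
    return f"{service}.{namespace}.svc.{domain or cluster_domain()}"
```

Using the fully qualified name avoids relying on the Pod's DNS search list, which matters when the cluster uses a non-default domain.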
Observability
- Fix issue with head pod not monitored by Prometheus under certain conditions (#963, @Yicheng-Lu-llll)
- [Feature] Improve and fix Prometheus & Grafana integrations (#895, @kevin85421)
- Add example and tutorial to explain how to create custom metrics for Prometheus (#914, @Yicheng-Lu-llll)
- feat: enrich kubectl get output (#878, @davidxia)
RayCluster
- Fix issue with operator OOM restart (#946, @wilsonwang371)
- [Feature][Hotfix] Add observedGeneration to the status of CRDs (#979, @kevin85421)
- Customize the Prometheus export port (#954, @Yicheng-Lu-llll)
- [Feature] The default ImagePullPolicy should be IfNotPresent (#947, @kevin85421)
- Automatically inject the --block option into the ray start command (#932, @Yicheng-Lu-llll)
- Inject cluster name as an environment variable into head and worker pods (#934, @Yicheng-Lu-llll)
- Ensure container ports without names are also included in the head node service (#891, @Yicheng-Lu-llll)
- fix: .status.availableWorkerReplicas (#887, @davidxia)
- fix: only filter RayCluster events for reconciliation (#882, @davidxia)
- refactor: remove redundant import in raycluster_controller.go (#884, @davidxia)
- refactor: use equivalent, shorter Builder.Owns() method (#881, @davidxia)
- [RayCluster controller] [Bug] Unconditionally reconcile RayCluster every 60s instead of only upon change (#850, @architkulkarni)
- [Feature] Make head serviceType optional (#851, @kevin85421)
- [RayCluster controller] Add headServiceAnnotations field to RayCluster CR (#841, @cskornel-doordash)
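Adding observedGeneration to the CRD status follows a general Kubernetes convention: metadata.generation increments whenever the spec changes, and the controller records in status.observedGeneration which generation its status describes. A client can compare the two before trusting the status. The Python sketch below works on a plain dict (for example, a RayCluster object fetched as JSON); the helper name status_is_current is hypothetical.

```python
from typing import Any, Mapping


def status_is_current(obj: Mapping[str, Any]) -> bool:
    """Return True if the object's status reflects its latest spec generation."""
    # "or {}" also covers fields that are present but null.
    generation = (obj.get("metadata") or {}).get("generation")
    observed = (obj.get("status") or {}).get("observedGeneration")
    # If either field is missing we cannot tell, so treat the status as stale.
    if generation is None or observed is None:
        return False
    return observed >= generation
```

A client that has just updated a RayCluster spec can poll until this returns True before reading fields such as .status.availableWorkerReplicas.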
RayJob (alpha)
- [Hotfix][release blocker][RayJob] Prevent the HTTP client from submitting jobs before dashboard initialization completes (#1000, @kevin85421)
- [RayJob] Propagate error traceback string when GetJobInfo doesn't return valid JSON (#943, @architkulkarni)
- [RayJob][Doc] Fix RayJob sample config. (#807, @DmitriGekhtman)
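The RayJob hotfix addresses a race in which a job is submitted before the Ray dashboard can accept it. The general remedy is to poll a readiness check a bounded number of times and submit only once it succeeds. The Python sketch below shows that pattern. The probe and submit callables are hypothetical placeholders (a probe could be an HTTP GET against the dashboard), and this is not the KubeRay operator's Go implementation.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def submit_when_ready(probe: Callable[[], bool],
                      submit: Callable[[], T],
                      attempts: int = 30,
                      interval_s: float = 2.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Call submit() only after probe() returns True; give up after `attempts` tries."""
    for attempt in range(attempts):
        try:
            ready = probe()
        except Exception:
            ready = False  # treat connection errors as "not ready yet"
        if ready:
            return submit()
        if attempt < attempts - 1:
            sleep(interval_s)  # back off before probing again
    raise TimeoutError(f"dashboard not ready after {attempts} attempts")
```

The sleep function is injectable so the retry loop can be exercised in tests without real delays.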
RayService (alpha)
- [RayService] Skip update events without change (#811, @sihanwang41)
Helm
- Add rayVersion in the RayCluster chart (#975, @Yicheng-Lu-llll)
- [Feature] Support environment variables for KubeRay operator chart (#978, @kevin85421)
- [Feature] Add service account section in helm chart (#969, @ducviet00)
- Update apiserver chart location in README (#896, @psschwei)
- add sidecar container option (#920, @akihikokuroda)
- match selector of service to pod labels (#918, @akihikokuroda)
- [Feature] Add nodeSelector/affinity/tolerations values to kuberay-apiserver chart (#879, @alex-treebeard)
- [Feature] Enable namespaced installs via helm chart (#860, @alex-treebeard)
- Remove unused fields from KubeRay operator and RayCluster charts (#839, @kevin85421)
- [Bug] Remove an unused field (ingress.enabled) from KubeRay operator chart (#812, @kevin85421)
- [helm] Add memory limits and resource documentation. (#789, @DmitriGekhtman)
CI
- [Feature] Add python client test to action (#993, @jasoonn)
- [CI][Buildkite] Fix the PATH issue (#952, @kevin85421)
- [CI][Buildkite] An example test for Buildkite (#919, @kevin85421)
- refactor: Fix flaky tests by using RetryOnConflict (#904, @Yicheng-Lu-llll)
- Use k8sClient from client.New in controller test (#898, @Yicheng-Lu-llll)
- [Bug] Fix flaky test: should be able to update all Pods to Running (#893, @kevin85421)
- Enable test framework to install operator with custom config and put operator in a namespace with enforced PSS in security testing (#876, @Yicheng-Lu-llll)
- Ensure all temp files are deleted after the compatibility test (#886, @Yicheng-Lu-llll)
- Adding a test for the document for the Pod security standard (#866, @Yicheng-Lu-llll)
- [Feature] Run config tests with the latest release of KubeRay operator (#858, @kevin85421)
- [Feature] Define a general-purpose cleanup method for CREvent (#849, @kevin85421)
- [Feature] Remove Docker container and NodePort from compatibility test (#844, @kevin85421)
- Remove Docker from BasicRayTestCase (#840, @kevin85421)
- [Feature] Move some functions from prototype test framework to a new utils file (#837, @kevin85421)
- [CI] Add workflow to manually trigger release image push (#801, @DmitriGekhtman)
- [CI] Pin go version in CRD consistency check (#794, @DmitriGekhtman)
- [Feature] Improve the observability of integration tests (#775, @jasoonn)
Sample YAML files
- Improve ray-cluster.external-redis.yaml (#986, @Yicheng-Lu-llll)
- remove ray-cluster.getting-started.yaml (#987, @Yicheng-Lu-llll)
- [Feature] Read Redis password from Kubernetes Secret (#950, @kevin85421)
- [Ray 2.3.0] Update --redis-password for RayCluster (#929, @kevin85421)
- [Bug] KubeRay does not work on M1 Macs. (#869, @kevin85421)
- [Post Ray 2.3 Release] Update Ray versions to Ray 2.3.0 (#925, @cadedaniel)
- [Post Ray 2.2.0 Release] Update Ray versions to Ray 2.2.0 (#822, @DmitriGekhtman)
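Reading the Redis password from a Kubernetes Secret keeps the password out of the RayCluster manifest: the container receives it as an environment variable through secretKeyRef, and the ray start parameter refers to that variable. The Python sketch below builds both pieces as plain dicts. The helper name, the Secret key "password", and the variable name REDIS_PASSWORD are illustrative assumptions, and the comment about shell expansion assumes the operator launches ray start through a shell.

```python
from typing import Any, Dict


def redis_password_from_secret(secret_name: str, key: str = "password",
                               env_name: str = "REDIS_PASSWORD") -> Dict[str, Any]:
    """Build head-container env and rayStartParams that read the Redis password from a Secret."""
    # Kubernetes resolves secretKeyRef when the Pod starts, so the
    # password itself never appears in the RayCluster spec.
    env_entry = {
        "name": env_name,
        "valueFrom": {"secretKeyRef": {"name": secret_name, "key": key}},
    }
    # rayStartParams values are strings; "$REDIS_PASSWORD" is expected to be
    # expanded by the shell that runs "ray start" (assumption).
    ray_start_params = {"redis-password": f"${env_name}"}
    return {"env": [env_entry], "rayStartParams": ray_start_params}
```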
Documentation
- Update contribution doc to show users how to reach out via slack (#936, @gvspraveen)
- [Feature][Docs] Explain how to specify container command for head pod (#912, @kevin85421)
- [post-0.4.0 KubeRay release] update proto version to 0.4.0 (#830, @scarlet25151)
- [0.4.0 release] Update changelog for KubeRay 0.4.0 (#836, @DmitriGekhtman)
- [Docs] Revise release note docs (#835, @DmitriGekhtman)
- [release] Add release command and guidance for KubeRay CLI (#834, @Jeffwan)
- [Release] Add tools and docs for changelog generator (#833, @Jeffwan)
- [Bug] error: git cmd when following docs (#831, @kevin85421)
- [post-0.4.0 KubeRay release] Update KubeRay versions (#821, @DmitriGekhtman)
- [Feature][Doc] End-to-end KubeRay operator development process on Kind (#826, @kevin85421)
- [Release][Docs] Update release instructions (#819, @DmitriGekhtman)
- [docs] Tweaks to main README, add basic API Server README. (#809, @DmitriGekhtman)
- update docs for release v0.4.0 (#778, @scarlet25151)
- [docs] Update KubeRay operator README. (#808, @DmitriGekhtman)
- [Release] Update docs for release v0.4.0 (#779, @kevin85421)