github kubeflow/training-operator v1.6.0-rc.1
v1.6.0-rc.1 release

pre-release, 19 months ago

Note: Because scheduler-plugins has changed its API from sigs.k8s.io to x-k8s.io, future releases of the training operator (v1.7+) will not support scheduler-plugins v0.24.x or lower.

Merged pull requests:

Closed issues:

  • The default value for CleanPodPolicy is inconsistent. #1753
  • HPA support for PyTorch Elastic #1751
  • Bug: allowance of non DNS-1035 compliant PyTorchJob names results in service creation failures and missing state #1745
  • paddle-operator can not get podgroup status(inqueue) with volcano when enable gang #1729
  • *job API(master) cannot compatible with old job #1725
  • Support coscheduling plugin #1722
  • Number of worker threads used by the controller can't be configured #1706
  • Conformance: Training tests #1698
  • PyTorch and MPI Operator pulls hardcoded initContainer #1696
  • PaddlePaddle Training: why can't find pods #1694
  • Training-operator pod CrashLoopBackOff in K8s v1.23.6 with kubeflow1.6.1 #1693
  • [SDK] Create unify client for all Training Job types #1691
  • Support Kubernetes v1.25 #1682
  • panic happened when add podgroup watch #1679
  • OnDependentUpdateFunc for Job will panic when enable volcano scheduler #1678
  • There is no clusterrole of "MPI Jobs" in kubeflow 1.5. #1670
  • Change Kubernetes version for test #1665
  • Support for multiplatform container image (amd64 and arm64) #1664
  • Training Operator pod failed to start on OCP 4.10.30 with error "memory limit too low" #1661
  • After setting hostNetwork to true, mpi does not work #1657
  • What is the purpose of /examples/pytorch/elastic/etcd.yaml #1655
  • When will MPIJob support v2beta1 version? #1653
  • Kubernetes HPA doesn't work with elastic PytorchJob #1645
  • training-operator can not get podgroup status(inqueue) with volcano when enable gang #1630
  • Training operator fails to create HPA for TorchElastic jobs #1626
  • Release v1.5.0 tracking #1622
  • upgrade client-go #1599
  • training-operator may need to monitor PodGroup #1574
  • Error: invalid memory address or nil pointer dereference #1553
  • The pytorchJob training is slow #1532
  • pytorch elastic scheduler error #1504
