github kubeflow/training-operator v1.4.0

2 years ago

Merged pull requests:

Closed issues:

  • Question: What is the recommended way for Data Scientists to run a distributed training job #1535
  • Restore KUBEFLOW_NAMESPACE options #1522
  • Improve test coverage #1497
  • swagger.json missing Pytorchjob.Spec.ElasticPolicy #1483
  • [bug] Missing init container in PyTorchJob #1482
  • PytorchJob DDP training will stop if I delete a worker pod #1478
  • Write down e2e failure debug process #1467
  • How can i add the Priorityclass to the TFjob? #1466
  • github.com/go-logr/zapr.(*zapLogger).Error #1444
  • Display coverage % in GitHub actions list #1442
  • Add Go test to CI #1436
  • Podgroup is constantly created and deleted after tfjob is success or failure #1426
  • Cut official release of 1.3.0 #1425
  • Add "not maintained" notice to other operator repos #1423
  • Fail to install tf-operator in minikube because of the version of kubectl/kustomize #1381
  • Python SDK for Kubeflow Training Operator #1380
  • Rename this repo #1348
  • Universal Operator Phase III: Graduate operator to production grade #1318
