New Features
- Add clientset for MPIJob, PytorchJob, MXJob, and XGBoostJob #1610 (tenzen-y)
- Add all generation tools to Makefile #1609 (johnugeorge)
- Adding MPI python sdk #1608 (johnugeorge)
- Adding XGBoost Python sdk #1607 (johnugeorge)
- Generating MPI python sdk #1606 (johnugeorge)
- Update k8s dependencies to v0.24.1 #1604 (johnugeorge)
- Migrate test framework to GHA #1603 (johnugeorge)
- Add mpi in update-codegen.sh #1600 (ggaaooppeenngg)
- MXNet SDK with Status check fix #1618 (johnugeorge)
Bug Fixes
- fix: MPIJob worker still running when NotEnoughResources #1621 (hackerboy01)
- fix comments for pytorch-controller #1620 (hackerboy01)
- fix: requeue when expire time is not up yet #1614 (Garrybest)
- Look for fully-qualified job role label in Python sdk #1588 (person142)
- fix torch env typo #1573 (kuizhiqing)
- Restart job on failure for Always,OnFailure Policy #1572 (georgkaleido)
- Increase success threshold #1568 (haoxins)
- update status.startTime for pytorchjob and xgboostjob #1567 (cheimu)
- fix: add mpijobs to kubeflow training role #1565 (henrysecond1)
- fix PytorchJob status inaccuracy when task replicas scale down #1593 (PeterChg)
- fix: MPIJob cannot use gang-scheduling when --enable-gang-scheduling is set #1557 (cheimu)
- fix api reader issue #1551 (zw0610)
- fix label and CleanPodPolicy for mpi-controller #1550 (zw0610)
- fix UpdateJobStatusInApiServer when gang-scheduling is enabled #1549 (zw0610)
- fix: add namespace filtering when getting pods/services for jobs #1545 (henrysecond1)
- fix: set mpijob runPolicy.cleanPodPolicy to default none #1554 (cheimu)
Misc
- Update training controller image to latest #1625 (johnugeorge)
- Update SDK version to 1.5.0 #1624 (johnugeorge)
- Upgrade common to v0.4.3 #1623 (johnugeorge)
- Adding GHA for automatic image build and push #1615 (johnugeorge)
- Remove presubmit test depending on optional-test-infra #1596 (aws-kf-ci-bot)
- chore: stop action on first fail #1595 (jasonliu747)
- update img url in design doc #1591 (zw0610)
- Remove uncalled mpi-controller DeletePodsAndServices() #1558 (cheimu)
- Update MPIJob unit tests to use spec.runPolicy.cleanPodPolicy #1556 (cheimu)
- Remove table-logger dependency #1544 (person142)
- Bump pyyaml from 5.1 to 5.4 in /py/kubeflow/tf_operator #1542 (dependabot[bot])