Merged pull requests:
- extends path in __init__.py for SDK correctly #1531 (cakeislife100)
- Update manifests with latest image tag #1527 (johnugeorge)
- add option for mpi kubectl delivery #1525 (zw0610)
- restore option namespace in launch arguments #1524 (zw0610)
- remove unused scripts #1521 (zw0610)
- remove ChanYiLin from approvers #1513 (ChanYiLin)
- add StacktraceLevel for zapr #1512 (qiankunli)
- add unit tests for tensorflow controller #1511 (zw0610)
- add the example of MPIJob #1508 (hackerboy01)
- Added 2022 roadmap and migrated previous roadmap from kubeflow/common #1500 (terrytangyuan)
- Fix a typo in mpi controller log #1495 (LuBingtan)
- feat(pytorch): Add init container config to avoid DNS lookup failure #1493 (gaocegege)
- chore: Fix GitHub Actions script #1491 (tenzen-y)
- chore: Fix missspell in tfjob #1490 (tenzen-y)
- chore: Update OWNERS #1489 (gaocegege)
- Bump jinja2 from 2.10.1 to 2.11.3 in /py/kubeflow/tf_operator #1487 (dependabot[bot])
- fix comments for mpi-controller #1485 (hackerboy01)
- add expectation-related functions for other resources used in mpi-controller #1484 (zw0610)
- Add MPI job to README now that it's supported #1480 (terrytangyuan)
- add mpi doc #1477 (zw0610)
- Set Go version of base image to 1.17 #1476 (tenzen-y)
- update label for tf-controller #1474 (zw0610)
- Add Akuity to the list of adopters #1473 (terrytangyuan)
- Add PR template with doc checklist #1470 (andreyvelich)
- Add e2e failure debugging guidance #1469 (Jeffwan)
- chore: Add .gitattributes to ignore Jsonnet test code for linguist #1463 (terrytangyuan)
- Migrate additional examples from xgboost-operator #1461 (terrytangyuan)
- Minor edits to README.md #1460 (terrytangyuan)
- add mpi-operator(v1) to the unified operator #1457 (hackerboy01)
- fix tfjob status when enableDynamicWorker set true #1455 (zw0610)
- feat(pytorch): Support elastic training #1453 (gaocegege)
- fix: generate printer columns for job crds #1451 (henrysecond1)
- Fix README typo #1450 (davidxia)
- consistent naming for better readability #1449 (pramodrj07)
- Fix set scheduler error #1448 (qiankunli)
- Add CI to run the tests for Go #1440 (tenzen-y)
- fix: Add missing retrying package that failed the import #1439 (terrytangyuan)
- Generate a single `swagger.json` file for all frameworks #1437 (alembiewski)
- Update links and files with the new URL #1434 (andreyvelich)
- chore: update CHANGELOG.md #1432 (Jeffwan)
- Add acknowledgement section in README to credit all contributors #1422 (terrytangyuan)
- Add Cisco to Adopters List #1421 (andreyvelich)
- Add Python SDK for Kubeflow Training Operator #1420 (alembiewski)
- docs: Move myself to approvers #1419 (terrytangyuan)
- fix hyperlinks in the 'overview' section #1418 (pramodrj07)
- docs: Migrate adopters of all operators to this repo #1417 (terrytangyuan)
- Feature/support pytorchjob set queue of volcano #1415 (qiankunli)
- Bump controller-tools to 0.6.0 and enable GenerateEmbeddedObjectMeta #1409 (Jeffwan)
- Update scripts to generate sdk for all frameworks #1389 (Jeffwan)

Closed issues:
- Question: What is the recommended way for Data Scientists to run a distributed training job #1535
- Restore KUBEFLOW_NAMESPACE options #1522
- Improve test coverage #1497
- swagger.json missing Pytorchjob.Spec.ElasticPolicy #1483
- [bug] Missing init container in PyTorchJob #1482
- PytorchJob DDP training will stop if I delete a worker pod #1478
- Write down e2e failure debug process #1467
- How can i add the Priorityclass to the TFjob? #1466
- github.com/go-logr/zapr.(*zapLogger).Error #1444
- Display coverage % in GitHub actions list #1442
- Add Go test to CI #1436
- Podgroup is constantly created and deleted after tfjob is success or failure #1426
- Cut official release of 1.3.0 #1425
- Add "not maintained" notice to other operator repos #1423
- Fail to install tf-operator in minikube because of the version of kubectl/kustomize #1381
- Python SDK for Kubeflow Training Operator #1380
- Rename this repo #1348
- Universal Operator Phase III: Graduate operator to production grade #1318