What's Changed
- docs: improve snapshot tool doc by @enoodle in #157
- [Refactor] Reclaimable api improvements by @itsomri in #148
- add pr coverage report by @enoodle in #154
- scheduler: Add LastStartTimestamp to PodGroup by @ArmedGuy in #153
- Add min-runtime configuration to queues by @ArmedGuy in #155
- chore: coverage update will open pr by @enoodle in #159
- chore: add missing token in action by @enoodle in #160
- Document GPU Sharing with MPS by @omer-dayan in #158
- fix coverage pr reports and badge generation by @enoodle in #166
- fix update coverage badge by @enoodle in #171
- Run scenario filters on the no potential victims scenario by @davidLif in #164
- Update PodGrouper docs to match the latest implementation by @romanbaron in #177
- Prep changelog for v0.5 version branch by @itsomri in #174
- refactor out reclaimerinfo from API by @ArmedGuy in #172
- Update README.md by @ronendar in #175
- Pre creating binding request, delete any pending status updates for t… by @davidLif in #178
- Don't add nodepool label for empty nodepool by @itsomri in #176
- Scheduler and PodGrouper use configurable nodepool label key by @romanbaron in #179
- Update CONTRIBUTING.md with coverage suggestion by @enoodle in #180
- Updating README.md with biweekly meeting details by @EkinKarabulut in #181
- Use more peek and Fix for the implementation of popNextJob instead of… by @davidLif in #152
- Renamed internally used runai names by @romanbaron in #189
- [Refactor] Encapsulate reclaimer info in proportion plugin by @itsomri in #192
- deploy with snapshot plugin enabled by @enoodle in #203
- Keep updated pod-groups data in a separate syncmap to allow better cleanup by @davidLif in #199
- Changed PodGroup comparison and removed notToUpdateAnnotations by @romanbaron in #202
- binder cdi flag added by @christophemacabiau in #209
- remove redundant `replicas` value for binder by @slaupster in #200
- Changing Slack channel link to the kai-scheduler channel by @EkinKarabulut in #211
- Fix pod group status sync unitests by @davidLif in #212
- Roman/podgroup controller by @romanbaron in #215
- Removed `runai-job-id` and `runai/job-id` annotations from pods and p… by @romanbaron in #206
- Easy runai name renames by @romanbaron in #218
- Scheduler status update unitest improvement by @davidLif in #220
- Refactor scenario validators by @itsomri in #191
- Default priority class per workload type, read from configmap by @natasharomm in #216
- Made node role label keys configurable by @romanbaron in #217
- Add "local build" mode to running the e2e over kind by @davidLif in #219
- minruntime Plugin by @ArmedGuy in #162
- add queue controller by @enoodle in #214
- Do not create error events for successfully scheduled podGroups by @davidLif in #229
- add queue controller tests by @enoodle in #231
- Allow the pod-grouper ray plugin to generate a pod-group for 0 workers in a rayCluster by @davidLif in #230
- Fix default_status_updater so Annotation updates are properly applied by @ArmedGuy in #234
- Unitest fix - TestDefaultStatusUpdater_RecordJobStatusEvent by @davidLif in #232
- Fix flaky consolidation e2e test: wait only for undeleted jobs by @itsomri in #235
- Vendor neutrality migration guide by @romanbaron in #224
- propagate scheduler service namespace for leader election in HA by @iris-shain-runai in #236
New Contributors
- @ronendar made their first contribution in #175
- @christophemacabiau made their first contribution in #209
- @slaupster made their first contribution in #200
- @natasharomm made their first contribution in #216
- @iris-shain-runai made their first contribution in #236
⚠️ Migration guides:
This version introduces two breaking changes. To migrate from a previous release, follow the instructions here:
https://github.com/NVIDIA/KAI-Scheduler/tree/main/docs/migrationguides
Full Changelog: v0.5.1...v0.6.0