github kubeflow/trainer v2.0.0-rc.1

latest releases: v2.1.0, v2.1.0-rc.1, v2.1.0-rc.0...
pre-release5 months ago

This is the Kubeflow Trainer v2.0.0-rc.1 pre-release.

New Features

  • [release-2.0] feat: Add schedulingGates to PodSpecOverrides (#2705 by @astefanutti)
  • [release-2.0] feat: Mutable PodSpecOverrides for suspended TrainJob (#2698 by @astefanutti)
  • [Release 2.0] KEP-2170: Add the manifests overlay for Kubeflow Training V2 (#2692 by @Doris-xm)

Bug Fixes

  • [release-2.0] fix(module): Change Go module name to v2 (#2708 by @andreyvelich)
  • [cherry-pick] fix(manifests): Update manifests to enable LLM fine-tuning workflow w… (#2696 by @Electronic-Waste)
  • [release-2.0] fix(plugins): Fix some errors in torchtune mutation process. (#2693 by @Electronic-Waste)
  • [release-2.0] fix(rbac): Add required RBAC to update ClusterTrainingRuntimes on OpenShift (#2684 by @astefanutti)

Misc

  • [release-2.0] chore: Copy generated CRDs into Helm charts (#2704 by @astefanutti)
  • [cherry-pick] feat(example): Add alpaca-trianjob-yaml.ipynb. (#2670) (#2702 by @Electronic-Waste)
  • [release-2.0] chore: Replace the deprecated intstr.FromInt with intstr.FromInt32 (#2697 by @tenzen-y)
  • [release-2.0] chore: Remove the vendor specific parameters (#2694 by @tenzen-y)
  • [release-2.0] chore(runtime): Bump Torch to 2.7.1 and DeepSpeed to 0.17.1 (#2687 by @andreyvelich)
  • [release-2.0] chore(helm): Sync ClusterRule in Helm chart (#2688 by @astefanutti)

Don't miss a new trainer release

NewReleases is sending notifications on new releases.