This is the Kubeflow Trainer v2.2.0 release 🚀
You can now deploy the Trainer control plane and runtimes with a single Helm install:
helm install kubeflow-trainer oci://ghcr.io/kubeflow/charts/kubeflow-trainer \
--namespace kubeflow-system \
--create-namespace \
--version 2.2.0 \
--set runtimes.defaultEnabled=true
Install Kubeflow Python SDK:
pip install kubeflow
For more information, please see the Kubeflow Trainer docs.
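To confirm that the pip step above took effect in the active Python environment, a small check along the following lines may help. This is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of the Kubeflow SDK, and the distribution name `kubeflow` is taken from the pip command above.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not part of the Kubeflow SDK.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "kubeflow" is the distribution name used in the pip command above.
    version = installed_version("kubeflow")
    if version is None:
        print("kubeflow is not installed in this environment")
    else:
        print(f"kubeflow {version} is installed")
```

Running the script in the same virtual environment used for `pip install kubeflow` reports the installed SDK version, or says that the package is missing.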
Breaking Changes
- feat(api): BREAKING CHANGE: Replace PodTemplateOverrides with RuntimePatches API (#3309 by @astefanutti)
- feat(api): BREAKING CHANGE: Remove numProcPerNode from Torch API (#3239 by @andreyvelich)
- feat(api): BREAKING CHANGE: Remove ElasticPolicy API (#3235 by @andreyvelich)
- feat(api): Fix immutability of the TrainJob APIs (#3157 by @andreyvelich)
New Features
XGBoost & JAX Runtimes
- feat(runtimes): Add XGBoost runtime (KEP-2598) (#3200 by @Krishna-kg732)
- feat(docs): KEP-2598 XGBoost Runtime for Trainer V2 (#3118 by @Krishna-kg732)
- feat(runtimes): Add JAX training runtime (#3151 by @kaisoz)
- feat(cache): KEP-2655: Adding default runtime with cache and example (#2923 by @akshaychitneni)
Flux Runtime for MPI and HPC Workloads
- feat: support for flux framework as hpc manager (#3188 by @vsoch)
- feat: KEP 2841 Flux Policy to support Flux Framework (#2909 by @vsoch)
TrainJob Lifecycle
- feat(api): Set RuntimePatch.Time field automatically during admission (#3363 by @astefanutti)
- feat: add support for tracking TrainJob progress and training metrics (#3227 by @robert-bell)
- feat(docs): KEP-2779: Track TrainJob progress and expose training metrics (#2905 by @robert-bell)
- feat: add activeDeadlineSeconds (#3258 by @XploY04)
- feat(docs): proposal for adding TTLSecondsAfterFinished and ActiveDeadlineSeconds fields to TrainJob CRD (#3068 by @XploY04)
- chore: upstream istio support - superseding 3189 (#3259 by @sameerdattav)
- feat(runtimes): Use JobSet VolumeClaimPolicies APIs for LLM Runtimes (#3150 by @andreyvelich)
- feat(cache): KEP-2655 - Supporting readiness probes on cache nodes (#2904 by @akshaychitneni)
- feat(initializer): add s3 model and dataset initializers (#2728 by @rudeigerc)
- feat(api): Add securityContext support to PodTemplateSpecOverride in TrainJob (#3066 by @Sanskarzz)
Bug Fixes
- fix(initializer): add missing glob wildcard to .pt and .pth ignore p… (#3364 by @ghazariann)
- fix(ci): re-enable XGBoost E2E test (#3348 by @Krishna-kg732)
- fix(examples): Verify TrainJob Completion (#3344 by @andreyvelich)
- fix(docs): Update steps in release document (#3342 by @andreyvelich)
- fix: back to a10-1 for gpu e2e and time slicing (#3340 by @jaiakash)
- fix(ci): Generate valid release version for Python package (#3334 by @andreyvelich)
- fix(test): Fix Data Cache runtime in Helm Charts (#3241 by @andreyvelich)
- fix: failing e2e and gpu e2e tests (#3234 by @jaiakash)
- fix: align torch-distributed-with-cache runtime logic with unit tests (#3226 by @Goku2099)
- fix(ci): correct duplicate step name in test-go.yaml (#3202 by @puwun)
- fix: align torchao with torch 2.9.1 to fix GPU e2e failure (#3203 by @Goku2099)
- fix: Defer kubernetes imports to method level for use with local mode (#3167 by @Fiona-Waters)
- fix: service account test filename (#3153 by @aniketpati1121)
- fix(manifests): Remove jobset and lws patches from kustomize deployment (#3141 by @yosri-brh)
- fix: enable read-only root filesystem for trainer manager (#3119 by @Goku2099)
- fix: resourcePerNode override not applied with Volcano scheduler (#2982 by @sksingh2005)
- fix(operator): Prevent JobSet recreation when its TTL has expired (#3013 by @astefanutti)
- fix(operator): Use Patch to update TrainJob status (#3009 by @astefanutti)
- fix(manifests): Fix RBAC for ClusterTrainingRuntime Access (#3022 by @andreyvelich)
- fix(manifests): fix Prometheus metrics port mismatch (#3056 by @ChughShilpa)
- fix(manifests): Fix boolean values defaulting in Helm charts (#2913 by @astefanutti)
- fix(manifests): Fix Helm charts image name (#2915 by @andreyvelich)
- fix(manifests): Remove the default tag from the controller image (#2916 by @andreyvelich)
- fix: add appVersion field to Helm chart for Kubeflow Trainer (#3044 by @milinddethe15)
- fix(runtimes): Update pip version in the MLX runtime (#2908 by @andreyvelich)
- fix(examples): Fix SSL certificate error for local MNIST example (#2971 by @astefanutti)
- fix(ci): Fix kube-api-linter install (#3023 by @astefanutti)
- fix(ci): Fix new contributors GH actions workflow lint errors (#3024 by @astefanutti)
- fix(ci): Fix the Kubeflow SDK installation with Docker (#2926 by @andreyvelich)
Misc
- chore(runtimes): persist runtimes map and expose Runtimes function (#3367 by @kaisoz)
- chore: Remove deprecated Python models (#3318 by @andreyvelich)
- fix: rename JAX and Torch runtime plugin tests to descriptive names (#3283 by @Amir380-A)
- fix(examples): add parameters to Fashion MNIST training function (#3301 by @krishdef7)
- chore(ci): Ignore Coveralls Errors (#3260 by @andreyvelich)
- fix: Enforce single ML policy constraint with CEL validation for Torch, MPI, and JAX (#3225 by @Krishna-kg732)
- feat: Helm test workflow (#3228 by @Goku2099)
- feat: add production-ready MNIST example for PyTorch (#3063 by @Snehadas2005)
- feat(runtimes): add support for ClusterTrainingRuntimes in Helm chart (#3124 by @khushiiagrawal)
- feat(cache): add Helm chart configuration for data_cache (#3080 by @khushiiagrawal)
- feat(examples): add torch.compile to PyTorch local examples (#3076 by @Ishtiyaque-Alam)
- feat(manifests): Publish Trainer Helm Charts (#2906 by @adity1raut)
- fix(test): Ignore version increment in Helm Chart lint (#3240 by @andreyvelich)
- feat: Code Quality Checks workflow (#3224 by @Goku2099)
- chore: Add comprehensive unit tests for Config API (#2893 by @kapil27)
- chore(operator): Use SSA throughout runtime framework (#2877 by @astefanutti)
- chore(operator): Remove Unstructured objects caching (#3010 by @astefanutti)
- chore: Expose trainer API version via public ConfigMap (#3083 by @sameerdattav)
- chore: changed latest to dev in trainer manifests (#3146 by @sameerdattav)
- chore: use named ports for manager deployment and service (#3100 by @Goku2099)
- chore: fix make helm-lint (#3103 by @robert-bell)
- feat: add scaffolding for feature gates (#3102 by @robert-bell)
- chore: Add welcome workflow for new contributors (#3017 by @ryanHwH20)
- chore: Nominate @akshaychitneni as Kubeflow Trainer reviewer (#3149 by @andreyvelich)
- chore: add dependabot to trainer repo (#2930 by @kannon92)
- chore: only update k8s dependencies via patches (#2969 by @kannon92)
- chore: migrate to a10.2 gpu for gpu e2e (#3220 by @jaiakash)
- chore(examples): Add device to local process MNIST training example (#3006 by @astefanutti)
- chore(examples): Use DDP in local container MNIST training example (#3007 by @astefanutti)
- feat: replaced vm runner with test gpu arc from cncf (#3067 by @jaiakash)
- feat: add VERSION file (#3077 by @milinddethe15)
- chore(docs): added kubecon 2025 trainer talk (#3187 by @jaiakash)
- chore(docs): Create symlink for CLAUDE.md (#3182 by @andreyvelich)
- chore(docs): Update Trainer README with Data Cache and MPI use-cases (#3142 by @andreyvelich)
- chore(docs): Add Trainer v2.1 release news to the README (#3117 by @andreyvelich)
- feat(docs): Add AGENTS.md and Copilot instructions (#3121 by @andreyvelich)
- feat: Adding local execution example notebook (#2907 by @Fiona-Waters)
Dependencies Upgrade
- chore(deps): Bump Torch to 2.10 version (#3320 by @andreyvelich)
- chore(deps): bump transformers from 5.2.0 to 5.3.0 in /cmd/runtimes/deepspeed (#3297 by @dependabot[bot])
- chore(deps): bump datasets from 4.6.1 to 4.7.0 in /cmd/runtimes/mlx (#3291 by @dependabot[bot])
- chore(deps): bump datasets from 4.5.0 to 4.7.0 in /cmd/runtimes/deepspeed (#3298 by @dependabot[bot])
- chore(deps): update huggingface-hub requirement from <1.5,>=0.27.0 to >=0.27.0,<1.7 in /cmd/initializers/dataset (#3296 by @dependabot[bot])
- chore(deps): bump mlx-lm from 0.30.7 to 0.31.0 in /cmd/runtimes/mlx (#3295 by @dependabot[bot])
- chore(deps): bump docker/login-action from 3 to 4 (#3293 by @dependabot[bot])
- chore(deps): update huggingface-hub requirement from <1.5,>=0.27.0 to >=0.27.0,<1.7 in /cmd/initializers/model (#3292 by @dependabot[bot])
- chore(deps): bump aquasecurity/trivy-action from 0.34.2 to 0.35.0 (#3290 by @dependabot[bot])
- chore(deps): bump rust from 1.93-bullseye to 1.94-bullseye in /cmd/data_cache (#3289 by @dependabot[bot])
- chore(deps): bump clap from 4.5.59 to 4.5.60 in /pkg/data_cache/test (#3249 by @dependabot[bot])
- chore(deps): bump actions/upload-artifact from 6 to 7 (#3267 by @dependabot[bot])
- chore(deps): bump quinn-proto from 0.11.13 to 0.11.14 in /pkg/data_cache (#3305 by @dependabot[bot])
- chore(deps): bump tokio from 1.49.0 to 1.50.0 in /pkg/data_cache/test (#3288 by @dependabot[bot])
- chore(deps): bump tokio from 1.49.0 to 1.50.0 in /pkg/data_cache (#3299 by @dependabot[bot])
- chore(deps): bump deepspeed from 0.18.6 to 0.18.7 in /cmd/runtimes/deepspeed (#3294 by @dependabot[bot])
- chore(deps): bump the kubernetes group across 1 directory with 9 updates (#3287 by @dependabot[bot])
- chore(deps): bump datasets from 4.5.0 to 4.6.1 in /cmd/runtimes/mlx (#3272 by @dependabot[bot])
- chore(deps): bump aquasecurity/trivy-action from 0.34.1 to 0.34.2 (#3268 by @dependabot[bot])
- chore(deps): Bump Trivy version to v0.69.2 (#3265 by @andreyvelich)
- chore(deps): bump arrow-flight from 57.3.0 to 58.0.0 in /pkg/data_cache/test (#3248 by @dependabot[bot])
- chore(deps): bump tonic from 0.14.3 to 0.14.5 in /pkg/data_cache/test (#3246 by @dependabot[bot])
- chore(deps): bump mpioperator/base from v0.7.0 to v0.8.0 in /cmd/runtimes/deepspeed (#3243 by @dependabot[bot])
- chore(deps): bump actions/setup-go from 5 to 6 (#3245 by @dependabot[bot])
- chore(deps): bump mpioperator/base from v0.7.0 to v0.8.0 in /cmd/runtimes/mlx (#3244 by @dependabot[bot])
- chore(deps): bump futures from 0.3.31 to 0.3.32 in /pkg/data_cache/test (#3211 by @dependabot[bot])
- chore(deps): bump aquasecurity/trivy-action from 0.33.1 to 0.34.0 in /.github/workflows (#3222 by @dependabot[bot])
- chore(deps): bump futures from 0.3.31 to 0.3.32 in /pkg/data_cache (#3214 by @dependabot[bot])
- chore(deps): bump deepspeed from 0.18.5 to 0.18.6 in /cmd/runtimes/deepspeed (#3212 by @dependabot[bot])
- chore(deps): bump transformers from 4.57.6 to 5.2.0 in /cmd/runtimes/deepspeed (#3210 by @dependabot[bot])
- chore(deps): bump clap from 4.5.57 to 4.5.59 in /pkg/data_cache/test (#3206 by @dependabot[bot])
- chore(deps): update huggingface-hub requirement from <1.4,>=0.27.0 to >=0.27.0,<1.5 in /cmd/initializers/dataset (#3194 by @dependabot[bot])
- chore(deps): bump the kubernetes group with 7 updates (#3204 by @dependabot[bot])
- feat: Add the manager field to the podTemplateOverride object (#3020 by @kaisoz)
- chore(deps): bump mlx[cuda] from 0.30.5 to 0.30.6 in /cmd/runtimes/mlx (#3196 by @dependabot[bot])
- chore(deps): update huggingface-hub requirement from <1.4,>=0.27.0 to >=0.27.0,<1.5 in /cmd/initializers/model (#3198 by @dependabot[bot])
- chore(deps): bump mlx-lm from 0.30.5 to 0.30.6 in /cmd/runtimes/mlx (#3195 by @dependabot[bot])
- chore(deps): bump clap from 4.5.56 to 4.5.57 in /pkg/data_cache/test (#3193 by @dependabot[bot])
- chore(deps): bump sigs.k8s.io/structured-merge-diff/v6 from 6.3.2-0.20260122202528-d9cc6641c482 to 6.3.2 in the kubernetes group (#3190 by @dependabot[bot])
- chore(deps): bump arrow-flight from 57.2.0 to 57.3.0 in /pkg/data_cache/test (#3192 by @dependabot[bot])
- chore(deps): bump tonic from 0.14.2 to 0.14.3 in /pkg/data_cache/test (#3163 by @dependabot[bot])
- chore(deps): bump golang.org/x/crypto from 0.47.0 to 0.48.0 in the golang group (#3191 by @dependabot[bot])
- chore(deps): bump mlx[cuda] from 0.30.3 to 0.30.5 in /cmd/runtimes/mlx (#3162 by @dependabot[bot])
- chore(deps): bump time from 0.3.44 to 0.3.47 in /pkg/data_cache (#3180 by @dependabot[bot])
- chore(deps): bump deepspeed from 0.18.4 to 0.18.5 in /cmd/runtimes/deepspeed (#3161 by @dependabot[bot])
- chore(deps): bump github.com/onsi/gomega from 1.39.0 to 1.39.1 (#3159 by @dependabot[bot])
- chore(deps): bump clap from 4.5.54 to 4.5.56 in /pkg/data_cache/test (#3160 by @dependabot[bot])
- chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.5 to 2.28.1 (#3158 by @dependabot[bot])
- chore(deps): bump bytes from 1.11.0 to 1.11.1 in /pkg/data_cache (#3170 by @dependabot[bot])
- chore(deps): bump bytes from 1.11.0 to 1.11.1 in /pkg/data_cache/test (#3169 by @dependabot[bot])
- chore(deps): bump nvidia/cuda from 13.1.0-devel-ubuntu22.04 to 13.1.1-devel-ubuntu22.04 in /cmd/runtimes/deepspeed (#3131 by @dependabot[bot])
- chore(deps): bump nvidia/cuda from 13.1.0-devel-ubuntu22.04 to 13.1.1-devel-ubuntu22.04 in /cmd/runtimes/mlx (#3129 by @dependabot[bot])
- chore(deps): Bump JobSet v0.11.0 and LWS v0.8.0 (#3144 by @andreyvelich)
- chore(deps): bump tower from 0.5.2 to 0.5.3 in /pkg/data_cache (#3137 by @dependabot[bot])
- chore(deps): bump rust from 1.92-bullseye to 1.93-bullseye in /cmd/data_cache (#3132 by @dependabot[bot])
- chore(deps): Bump Go 1.25, k8s v1.35, and controller-runtime v0.23.1 (#3127 by @andreyvelich)
- chore(deps): bump mlx-lm from 0.30.4 to 0.30.5 in /cmd/runtimes/mlx (#3134 by @dependabot[bot])
- chore(deps): bump tokio from 1.48.0 to 1.49.0 in /pkg/data_cache (#3138 by @dependabot[bot])
- chore(deps): bump datasets from 4.4.2 to 4.5.0 in /cmd/runtimes/deepspeed (#3105 by @dependabot[bot])
- chore(deps): bump datasets from 4.4.2 to 4.5.0 in /cmd/runtimes/mlx (#3108 by @dependabot[bot])
- chore(deps): bump mlx[cuda] from 0.30.1 to 0.30.3 in /cmd/runtimes/mlx (#3107 by @dependabot[bot])
- chore(deps): bump transformers from 4.57.3 to 4.57.6 in /cmd/runtimes/deepspeed (#3106 by @dependabot[bot])
- chore(deps): bump mlx-lm from 0.30.2 to 0.30.4 in /cmd/runtimes/mlx (#3109 by @dependabot[bot])
- chore(runtimes): Bump Torch to 2.9.1 version (#3093 by @andreyvelich)
- chore(deps): bump axum from 0.7.9 to 0.8.8 in /pkg/data_cache (#3072 by @dependabot[bot])
- chore(deps): bump tonic from 0.12.3 to 0.14.2 in /pkg/data_cache/test (#3054 by @dependabot[bot])
- chore(deps): bump tower from 0.4.13 to 0.5.2 in /pkg/data_cache (#3074 by @dependabot[bot])
- chore(deps): update huggingface-hub requirement from <1.2,>=0.27.0 to >=0.27.0,<1.4 in /cmd/initializers/dataset (#3090 by @dependabot[bot])
- chore(deps): bump clap from 4.5.53 to 4.5.54 in /pkg/data_cache/test (#3070 by @dependabot[bot])
- chore(deps): bump github.com/onsi/gomega from 1.38.3 to 1.39.0 (#3085 by @dependabot[bot])
- chore(deps): bump tokio from 1.48.0 to 1.49.0 in /pkg/data_cache/test (#3069 by @dependabot[bot])
- chore(deps): update huggingface-hub requirement from <1.2,>=0.27.0 to >=0.27.0,<1.4 in /cmd/initializers/model (#3091 by @dependabot[bot])
- chore(deps): bump mlx-lm from 0.30.0 to 0.30.2 in /cmd/runtimes/mlx (#3089 by @dependabot[bot])
- chore(deps): bump deepspeed from 0.18.3 to 0.18.4 in /cmd/runtimes/deepspeed (#3088 by @dependabot[bot])
- chore(deps): bump arrow-flight from 57.1.0 to 57.2.0 in /pkg/data_cache/test (#3087 by @dependabot[bot])
- chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.3 to 2.27.5 (#3086 by @dependabot[bot])
- chore(deps): bump golang.org/x/crypto from 0.46.0 to 0.47.0 in the golang group (#3084 by @dependabot[bot])
- chore(deps): bump mlx[cuda] from 0.30.0 to 0.30.1 in /cmd/runtimes/mlx (#3053 by @dependabot[bot])
- chore(deps): bump tracing from 0.1.41 to 0.1.44 in /pkg/data_cache/test (#3051 by @dependabot[bot])
- chore(deps): bump arrow-flight from 55.2.0 to 57.1.0 in /pkg/data_cache/test (#3055 by @dependabot[bot])
- chore(deps): bump datasets from 4.4.1 to 4.4.2 in /cmd/runtimes/mlx (#3052 by @dependabot[bot])
- chore(deps): bump mlx-lm from 0.28.4 to 0.30.0 in /cmd/runtimes/mlx (#3050 by @dependabot[bot])
- chore(deps): bump datasets from 4.4.1 to 4.4.2 in /cmd/runtimes/deepspeed (#3049 by @dependabot[bot])
- chore(deps): bump bincode from 2.0.1 to 3.0.0 in /pkg/data_cache/test (#3048 by @dependabot[bot])
- chore(deps): bump sigs.k8s.io/kind from 0.30.0 to 0.31.0 in the kubernetes group (#3047 by @dependabot[bot])
- chore(deps): bump transformers from 4.57.2 to 4.57.3 in /cmd/runtimes/deepspeed (#3031 by @dependabot[bot])
- chore(deps): bump nvidia/cuda from 13.0.2-devel-ubuntu22.04 to 13.1.0-devel-ubuntu22.04 in /cmd/runtimes/mlx (#3036 by @dependabot[bot])
- chore(deps): bump the kubernetes group with 6 updates (#3035 by @dependabot[bot])
- chore(deps): bump actions/upload-artifact from 5 to 6 (#3038 by @dependabot[bot])
- chore(deps): bump nvidia/cuda from 13.0.2-devel-ubuntu22.04 to 13.1.0-devel-ubuntu22.04 in /cmd/runtimes/deepspeed (#3037 by @dependabot[bot])
- chore(deps): bump rust from 1.91-bullseye to 1.92-bullseye in /cmd/data_cache (#3040 by @dependabot[bot])
- chore(deps): bump deepspeed from 0.18.2 to 0.18.3 in /cmd/runtimes/deepspeed (#3039 by @dependabot[bot])
- chore(deps): bump mlx-lm from 0.28.3 to 0.28.4 in /cmd/runtimes/mlx (#3029 by @dependabot[bot])
- chore(deps): bump github.com/onsi/ginkgo/v2 from 2.27.2 to 2.27.3 (#3026 by @dependabot[bot])
- chore(deps): bump github.com/onsi/gomega from 1.38.2 to 1.38.3 (#3027 by @dependabot[bot])
- chore(deps): bump golang.org/x/crypto from 0.45.0 to 0.46.0 in the golang group (#3025 by @dependabot[bot])
- chore(deps): bump bytes from 1.10.1 to 1.11.0 in /pkg/data_cache (#3001 by @dependabot[bot])
- chore(deps): bump go.uber.org/zap from 1.27.0 to 1.27.1 (#2998 by @dependabot[bot])
- chore(deps): bump clap from 4.5.52 to 4.5.53 in /pkg/data_cache/test (#3004 by @dependabot[bot])
- chore(deps): bump arrow-flight from 57.0.0 to 57.1.0 in /pkg/data_cache/test (#3003 by @dependabot[bot])
- chore(deps): bump transformers from 4.57.1 to 4.57.2 in /cmd/runtimes/deepspeed (#3002 by @dependabot[bot])
- chore(deps): bump actions/checkout from 5 to 6 (#3000 by @dependabot[bot])
- chore(deps): bump mlx[cuda] from 0.29.4 to 0.30.0 in /cmd/runtimes/mlx (#2999 by @dependabot[bot])
- chore(deps): bump sigs.k8s.io/structured-merge-diff/v6 from 6.3.0 to 6.3.1 in the kubernetes group (#2996 by @dependabot[bot])
- chore(deps): bump github.com/open-policy-agent/cert-controller from 0.14.0 to 0.15.0 (#2997 by @dependabot[bot])
- chore(deps): bump golang.org/x/crypto from 0.44.0 to 0.45.0 (#2994 by @dependabot[bot])
- chore(deps): bump golang.org/x/crypto from 0.43.0 to 0.44.0 in the golang group (#2985 by @dependabot[bot])
- chore(deps): bump clap from 4.5.51 to 4.5.52 in /pkg/data_cache/test (#2990 by @dependabot[bot])
- chore(deps): bump async-trait from 0.1.88 to 0.1.89 in /pkg/data_cache (#2988 by @dependabot[bot])
- chore(deps): bump pytorch/pytorch from 2.9.0-cuda12.8-cudnn9-runtime to 2.9.1-cuda12.8-cudnn9-runtime in /cmd/trainers/torchtune (#2986 by @dependabot[bot])
- chore(deps): bump the kubernetes group with 6 updates (#2984 by @dependabot[bot])
- chore(deps): bump bytes from 1.10.1 to 1.11.0 in /pkg/data_cache/test (#2989 by @dependabot[bot])
- chore(deps): bump mlx-lm from 0.26.3 to 0.28.3 in /cmd/runtimes/mlx (#2950 by @dependabot[bot])
- chore(deps): update huggingface-hub requirement from <0.28,>=0.27.0 to >=0.27.0,<1.2 in /cmd/initializers/model (#2957 by @dependabot[bot])
- chore(deps): update huggingface-hub requirement from <0.28,>=0.27.0 to >=0.27.0,<1.2 in /cmd/initializers/dataset (#2955 by @dependabot[bot])
- chore(deps): bump datasets from 4.0.0 to 4.4.1 in /cmd/runtimes/deepspeed (#2944 by @dependabot[bot])
- chore(deps): bump mlx[cuda] from 0.28.0 to 0.29.3 in /cmd/runtimes/mlx (#2956 by @dependabot[bot])
- chore(deps): bump transformers from 4.55.0 to 4.57.1 in /cmd/runtimes/deepspeed (#2961 by @dependabot[bot])
- chore(deps): bump deepspeed from 0.17.4 to 0.18.2 in /cmd/runtimes/deepspeed (#2954 by @dependabot[bot])
- chore(deps): bump nvidia/cuda from 12.8.1-devel-ubuntu22.04 to 13.0.2-devel-ubuntu22.04 in /cmd/runtimes/deepspeed (#2939 by @dependabot[bot])
- chore(deps): bump pytorch/pytorch from 2.7.1-cuda12.8-cudnn9-runtime to 2.9.0-cuda12.8-cudnn9-runtime in /cmd/trainers/torchtune (#2934 by @dependabot[bot])
- chore(deps): bump datasets from 4.0.0 to 4.4.1 in /cmd/runtimes/mlx (#2943 by @dependabot[bot])
- chore(deps): bump nvidia/cuda from 12.8.1-devel-ubuntu22.04 to 13.0.2-devel-ubuntu22.04 in /cmd/runtimes/mlx (#2932 by @dependabot[bot])
- chore(deps): bump mpi4py from 4.1.0 to 4.1.1 in /cmd/runtimes/deepspeed (#2958 by @dependabot[bot])
- chore(deps): bump bincode from 1.3.3 to 2.0.1 in /pkg/data_cache/test (#2949 by @dependabot[bot])
- chore(deps): bump tonic from 0.12.3 to 0.14.2 in /pkg/data_cache/test (#2962 by @dependabot[bot])
- chore(deps): bump serde from 1.0.225 to 1.0.228 in /pkg/data_cache/test (#2959 by @dependabot[bot])
- chore(deps): bump actions/checkout from 4 to 5 (#2974 by @dependabot[bot])
- chore(deps): bump serde from 1.0.215 to 1.0.228 in /pkg/data_cache (#2978 by @dependabot[bot])
- chore(deps): bump actions/setup-go from 5 to 6 (#2975 by @dependabot[bot])
- chore(deps): bump amannn/action-semantic-pull-request from 5.5.3 to 6.1.1 (#2976 by @dependabot[bot])
- chore(deps): bump arrow-flight from 55.2.0 to 57.0.0 in /pkg/data_cache/test (#2973 by @dependabot[bot])
- chore(deps): bump actions/setup-python from 5 to 6 (#2977 by @dependabot[bot])
- chore(deps): bump python from 3.11-slim-bookworm to 3.14-slim-bookworm in /cmd/initializers/model (#2951 by @dependabot[bot])
- chore(deps): bump python from 3.11-slim-bookworm to 3.14-slim-bookworm in /cmd/initializers/dataset (#2941 by @dependabot[bot])
- chore(deps): bump sentencepiece from 0.2.0 to 0.2.1 in /cmd/runtimes/deepspeed (#2948 by @dependabot[bot])
- chore(deps): bump tokio from 1.47.1 to 1.48.0 in /pkg/data_cache/test (#2963 by @dependabot[bot])
- chore(deps): bump clap from 4.5.43 to 4.5.51 in /pkg/data_cache/test (#2965 by @dependabot[bot])
- chore(deps): bump tokio from 1.46.1 to 1.48.0 in /pkg/data_cache (#2966 by @dependabot[bot])
- chore(deps): bump aquasecurity/trivy-action from 0.28.0 to 0.33.1 (#2947 by @dependabot[bot])
- chore(deps): bump actions/stale from 9 to 10 (#2942 by @dependabot[bot])
- chore(deps): bump mpioperator/base from v0.6.0 to v0.7.0 in /cmd/runtimes/deepspeed (#2938 by @dependabot[bot])
- chore(deps): bump golang from 1.24 to 1.25 in /cmd/trainer-controller-manager (#2935 by @dependabot[bot])
- chore(deps): bump actions/github-script from 7 to 8 (#2937 by @dependabot[bot])
- chore(deps): bump actions/upload-artifact from 4 to 5 (#2936 by @dependabot[bot])
- chore(deps): bump mpioperator/base from v0.6.0 to v0.7.0 in /cmd/runtimes/mlx (#2933 by @dependabot[bot])
- chore(deps): bump rust from 1.85-bullseye to 1.91-bullseye in /cmd/data_cache (#2931 by @dependabot[bot])
- chore(deps): bump github/codeql-action from 3 to 4 (#2953 by @dependabot[bot])
- chore(deps): bump github.com/onsi/ginkgo/v2 from 2.25.3 to 2.27.2 (#2952 by @dependabot[bot])
- chore(deps): bump sigs.k8s.io/controller-runtime from 0.22.3 to 0.22.4 in the kubernetes group (#2940 by @dependabot[bot])
- chore(deps): bump golang.org/x/crypto from 0.41.0 to 0.43.0 in the golang group (#2945 by @dependabot[bot])