Summary
v8.4.62 is mainly a reliability release focused on preventing trained models from being lost at the end of training, with additional improvements to Platform docs, dataset/API documentation, test stability, and CI efficiency.
Key Changes
- Major training fix: checkpoints are no longer discarded just because EMA hits NaN/Inf during save checks
  - The most important change in this release, from PR #24731 by @glenn-jocher, fixes a bug where good models could finish training successfully but still fail to save any checkpoint.
  - This especially affected some runs using AdamW + AMP, where validation could corrupt the live EMA weights and cause repeated warnings like "Skipping checkpoint save... EMA contains NaN/Inf".
  - The fix now:
    - keeps validation from modifying the live EMA in place
    - checks finiteness on the original fp32 EMA, not an already-converted fp16 copy
    - safely clamps overflow during checkpoint serialization instead of skipping the save
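The three parts of the fix can be illustrated with a minimal, standard-library-only sketch. This is not the Ultralytics implementation: the function names (`to_fp16`, `validate_in_half`, `all_finite`, `serialize_half`) and the weight names are hypothetical, a plain dict of Python floats stands in for the fp32 EMA, and `struct`'s half-precision format stands in for the fp16 conversion.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(value: float) -> float:
    """Round-trip a float through half precision; out-of-range values become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", value))[0]
    except OverflowError:
        return math.copysign(math.inf, value)


def validate_in_half(ema: dict) -> dict:
    """Hypothetical validation step: builds a half-precision copy and leaves `ema` untouched."""
    return {name: to_fp16(w) for name, w in ema.items()}


def all_finite(weights: dict) -> bool:
    """Return True only if every weight is a finite number."""
    return all(math.isfinite(w) for w in weights.values())


def serialize_half(ema: dict) -> dict:
    """Clamp each weight into the fp16 range before conversion so overflow cannot produce inf."""
    return {name: to_fp16(max(-FP16_MAX, min(FP16_MAX, w))) for name, w in ema.items()}


# Assumed example weights: one value is large but still finite in full precision.
ema = {"conv.weight": 0.5, "bn.running_var": 1.0e5}

half_copy = validate_in_half(ema)   # overflows to inf, but only in the copy
ok_to_save = all_finite(ema)        # finiteness is checked on the original weights
checkpoint = serialize_half(ema) if ok_to_save else None
```

In this sketch the half-precision copy contains an infinite value, while the original weights stay finite, so the save proceeds and the oversized weight is stored as the largest finite fp16 value rather than blocking the checkpoint.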
- Validation is now safer during AMP training
  - Validation still benefits from mixed precision speedups, but it no longer permanently "poisons" the EMA model.
  - This prevents a failure mode where one bad validation step could block checkpoint saving for the rest of training.
- New test coverage for fp16 overflow checkpoint handling
  - Added tests to ensure models with large-but-finite EMA weights are still saved correctly.
  - This helps protect against regressions in future releases.
- Big Ultralytics Platform docs refresh
  - PR #24726 by @glenn-jocher significantly improved accuracy across Platform docs.
  - Updates include:
    - corrected UI labels and workflows
    - expanded Platform API reference
    - clearer dataset, annotation, deployment, training, billing, teams, and integrations docs
    - newly documented API capabilities like dataset embeddings, class management, GPU availability, import flows, and more
- Fixed broken COCO evaluation links
  - PR #24723 replaces dead COCO evaluation server links with the canonical COCO upload instructions.
  - Also fixes outdated docs author profile links.
- Less flaky data-related tests
  - PR #24724 reduces unnecessary downloads in tests and reuses cached assets when possible.
  - This should make CI more dependable and faster.
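The "reuse cached assets" idea can be sketched as a download-if-missing helper. This is an illustrative pattern only, not the code from PR #24724: `cached_asset`, `fake_fetch`, and the file name `bus.jpg` are hypothetical placeholders, and the fetch function is injected so the example runs without network access.

```python
import tempfile
from pathlib import Path
from typing import Callable


def cached_asset(name: str, cache_dir: Path, fetch: Callable[[str], bytes]) -> Path:
    """Return the local path for `name`, calling `fetch` only when no cached copy exists."""
    path = cache_dir / name
    if not path.exists():
        cache_dir.mkdir(parents=True, exist_ok=True)
        path.write_bytes(fetch(name))
    return path


calls = []


def fake_fetch(name: str) -> bytes:
    """Stand-in for a network download; records each call so reuse can be checked."""
    calls.append(name)
    return b"image-bytes"


with tempfile.TemporaryDirectory() as tmp:
    first = cached_asset("bus.jpg", Path(tmp), fake_fetch)   # triggers the fetch
    second = cached_asset("bus.jpg", Path(tmp), fake_fetch)  # served from the cache
    same_path = first == second

download_count = len(calls)
```

Because the second request finds the file already on disk, the fetch function runs only once, which is the property that removes repeated downloads and their associated network flakiness from a test suite.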
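- Lean CI improvements
  - PR #24725 slims CI git clones: blobless clones for docs publishing, shallow clones for Conda tests, and removal of an unused checkout.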
Purpose & Impact
- Prevents losing trained models
  - The headline fix matters most to users training YOLO models locally or in automated pipelines.
  - If your run trained well but ended with "no checkpoint was saved," this release directly addresses that issue.
- Improves training stability and trustworthiness
  - Users can have more confidence that successful training runs will actually produce saved checkpoints, especially when using AMP for faster training.
- Better experience for common training setups
  - This is especially impactful for users training with AdamW + AMP, where the bug had been widely reported.
  - In practical terms: fewer surprise failures, less wasted compute, and less need for workarounds like disabling AMP.
- More accurate docs for the Ultralytics Platform
  - Platform users should now find the docs easier to follow and more aligned with what they actually see in the app.
  - This lowers confusion for both new and advanced users working with datasets, training, deployment, billing, and APIs.
- Improved developer and CI reliability
  - Faster, lighter CI and more stable tests help maintain release quality and reduce false failures behind the scenes.
- Cleaner external documentation links
  - Broken COCO benchmark links are fixed, making it easier for users to find the right evaluation submission path.
Overall, v8.4.62 is not a major model-feature release, but it is a high-value stability update, especially for anyone training YOLO models with mixed precision and expecting reliable checkpoint saves.
What's Changed
- Fix broken COCO eval server and docs author links by @glenn-jocher in #24723
- Bump codecov/codecov-action from v6 to v7 in /.github/workflows by @UltralyticsAssistant in #24722
- Fix flaky data asset tests by @glenn-jocher in #24724
- Slim CI git clones: blobless docs publish, shallow Conda tests, drop unused checkout by @glenn-jocher in #24725
- Improve Platform docs accuracy and API coverage by @glenn-jocher in #24726
- Prevent NaN/Inf EMA from discarding training checkpoints by @glenn-jocher in #24731
Full Changelog: v8.4.61...v8.4.62