Summary
Ultralytics v8.4.53 mainly improves training reliability on NVIDIA GPUs by automatically recovering from more CUDA memory-related failures, while also polishing semantic segmentation stability, documentation clarity, and CI robustness.
Key Changes
- Smarter GPU memory recovery during training
  The biggest change in this release, from PR #24569 by @glenn-jocher, expands the existing auto-retry logic for large-batch GPU training.
  - Previously, Ultralytics could retry after a standard CUDA out-of-memory error.
  - Now it also retries for related backend memory failures such as:
    - `CUDNN_STATUS_INTERNAL_ERROR`
    - "unable to find an engine to execute this computation"
  - When this happens during the first epoch on a single GPU, training can automatically reduce batch size and continue instead of crashing.
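The retry behavior described above can be sketched in a few lines. This is an illustration only, not the code from PR #24569: `MEMORY_ERROR_PATTERNS`, `is_memory_error`, `train_with_retry`, and `fake_train` are hypothetical names, halving the batch size is an assumed reduction strategy, and the first-epoch and single-GPU conditions are not modeled.

```python
# Message fragments treated as memory-related failures. The last two come from
# the release notes; the generic "out of memory" fragment is an assumption.
MEMORY_ERROR_PATTERNS = (
    "out of memory",
    "CUDNN_STATUS_INTERNAL_ERROR",
    "unable to find an engine to execute this computation",
)


def is_memory_error(exc: Exception) -> bool:
    """Return True when the exception message looks like a GPU memory failure."""
    message = str(exc).lower()
    return any(pattern.lower() in message for pattern in MEMORY_ERROR_PATTERNS)


def train_with_retry(train_fn, batch_size: int, max_retries: int = 3) -> int:
    """Call train_fn(batch_size), halving batch_size after each memory failure.

    Returns the batch size that finally succeeded.
    """
    for attempt in range(max_retries + 1):
        try:
            train_fn(batch_size)
            return batch_size
        except RuntimeError as exc:
            # Re-raise unrelated errors, and give up once retries are exhausted
            # or the batch cannot shrink any further.
            if not is_memory_error(exc) or attempt == max_retries or batch_size <= 1:
                raise
            batch_size = max(1, batch_size // 2)
    raise AssertionError("unreachable")


def fake_train(batch_size: int) -> None:
    """Stand-in for a training step that only fits in memory at batch <= 16."""
    if batch_size > 16:
        raise RuntimeError("cuDNN error: CUDNN_STATUS_INTERNAL_ERROR")


print(train_with_retry(fake_train, batch_size=64))
```

Matching on message text is used here because, per the notes above, the newly handled failures arrive as backend errors rather than as a plain out-of-memory exception.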
- Semantic segmentation validation is more reliable
  PR #24552 fixes issues in semantic validation so metrics are preserved correctly after evaluation.
  - Prevents stats from being cleared too early
  - Makes mIoU handling more robust when some classes are missing
  - Avoids NaN-style metric problems in sparse-class cases
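To see why missing classes matter for mIoU, consider a minimal sketch that computes per-class IoU from a confusion matrix and skips classes with an empty union. The `mean_iou` helper is hypothetical, and whether the Ultralytics validator handles absent classes exactly this way is an assumption.

```python
import math


def mean_iou(confusion):
    """Mean IoU over classes; confusion[t][p] counts true class t predicted as p."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(n)) - tp
        union = tp + fp + fn
        if union == 0:
            # Class absent from both labels and predictions: 0/0 is undefined,
            # so leave it out instead of letting a NaN poison the mean.
            continue
        ious.append(tp / union)
    return sum(ious) / len(ious) if ious else math.nan


# Three classes, but class 2 never appears in labels or predictions.
confusion = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 0],
]
print(mean_iou(confusion))
```

Averaging all three classes naively would divide zero by zero for the absent class; restricting the mean to classes that actually occur keeps the metric finite.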
- Segmentation loss is numerically more stable
  PR #24554 improves Dice loss stability by forcing key intermediate calculations to use `float32`.
  - Helps avoid precision-related instability
  - Especially useful in mixed-precision or lower-precision training setups
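The effect of intermediate precision on Dice loss can be shown with a small pure-Python emulation. This is a sketch under stated assumptions, not the change in PR #24554: `to_half` and `dice_loss` are hypothetical helpers, half precision is emulated with the standard `struct` module, and Python's native float stands in for the wider dtype.

```python
import math
import struct


def to_half(x: float) -> float:
    """Round x to IEEE 754 half precision, returning +/-inf on overflow."""
    try:
        return struct.unpack("e", struct.pack("e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def dice_loss(pred, target, store=float, eps=1.0):
    """Dice loss where `store` models the dtype holding each intermediate sum."""
    intersection = store(math.fsum(p * t for p, t in zip(pred, target)))
    pred_sum = store(math.fsum(pred))
    target_sum = store(math.fsum(target))
    return 1.0 - (2.0 * intersection + eps) / (pred_sum + target_sum + eps)


# A 300x300 mask that the prediction matches exactly.
pixels = 300 * 300
pred = [1.0] * pixels
target = [1.0] * pixels

loss_half = dice_loss(pred, target, store=to_half)  # sums exceed the half-precision range
loss_full = dice_loss(pred, target)  # sums kept in a wider dtype
print(math.isnan(loss_half), loss_full)
```

With sums stored in half precision, each intermediate exceeds the largest representable value (65504) and becomes infinite, so the ratio turns into NaN; keeping the same sums in a wider type yields the expected loss for a perfect prediction.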
- Much better docs for prediction outputs, especially semantic segmentation
  PR #24558 is a major documentation improvement.
  - Clearly explains the `Results` object for each task
  - Documents the new/important `semantic_mask` output in detail
  - Adds comparisons between instance segmentation and semantic segmentation
  - Clarifies output shapes, dtypes, masks, polygons, boxes, and per-task behavior
- Broader YOLO26 and semantic task visibility across docs and metadata
  Several updates improve how Ultralytics presents current capabilities:
  - Package metadata now reflects YOLO26, Platform, and oriented object detection
  - Project classifier changed to Production/Stable
  - Many docs now consistently mention semantic segmentation as a first-class supported task
- CI and test workflow became more stable
  Multiple changes reduce avoidable test failures:
  - IMX export test is skipped in broader CI where it was causing instability
  - Training-related tests are skipped on Raspberry Pi, where they are not practical
- Small but useful documentation and packaging fixes
  - README task banner now renders correctly on PyPI
  - OBB validation docs now note the actual default confidence threshold of 0.01, which is used to reduce memory usage
  - Minor doc fixes for tracking links, benchmark hardware, glossary links, and formatting
Purpose & Impact
- Fewer frustrating training crashes on GPUs
  If you train with aggressive batch sizes, this release can help Ultralytics recover automatically from more real-world CUDA memory failures instead of stopping unexpectedly.
- Better experience for users pushing hardware limits
  Large-batch training on single GPUs should now be more forgiving, especially when failures come from CUDA backend libraries rather than a plain out-of-memory exception.
- More dependable semantic segmentation workflows
  Users working with semantic segmentation should see:
  - more stable validation metrics
  - fewer confusing missing/invalid values
  - improved training stability from safer Dice loss calculations
- Easier to understand model outputs
  The new docs make it much clearer what prediction results contain for each task, which is especially helpful for:
  - beginners trying to use `Results`
  - developers building pipelines around semantic outputs
  - teams comparing instance vs semantic segmentation behavior
- Better confidence in production use
  The "Production/Stable" metadata and improved CI signal a continued push toward reliability for real deployments.
- Low-risk release with practical quality-of-life improvements
  This is not a major architecture or model release, but it delivers meaningful gains in stability, clarity, and usability, especially for training and semantic segmentation users.
What's Changed
- Clean up semantic CI follow-up by @glenn-jocher in #24544
- Fix tasks banner not rendering on PyPI by @raimbekovm in #24545
- Update pyproject metadata for YOLO26 and Platform by @glenn-jocher in #24547
- Fix Cityscapes glossary link by @glenn-jocher in #24550
- Fix semantic docs typo by @Laughing-q in #24551
- Fix semantic `nt_per_image` printing issue by @Laughing-q in #24552
- Fix NaN Dice loss during validation by @Laughing-q in #24554
- Skip training on Raspberry Pi Test CIs by @lakshanthad in #24556
- Add segmentation results output details to task docs by @glenn-jocher in #24558
- Note OBB validation conf default of 0.01 by @raimbekovm in #24561
- Fix stale line anchor for `track_high_thresh` in track docs by @raimbekovm in #24559
- Docs: align markdown table columns by @Laughing-q in #24566
- Update CI tests by @Laughing-q in #24567
- `ultralytics 8.4.53` Retry CUDA backend memory errors during training by @glenn-jocher in #24569
Full Changelog: v8.4.52...v8.4.53