Summary (single-line synopsis)
Ultralytics v8.4.13 makes training more resilient: when a CUDA out-of-memory (OOM) error occurs during the first epoch, training automatically retries with a smaller batch size.
Key Changes
- Auto-retry on CUDA OOM during training (major change)
  - If a CUDA OOM error occurs during the first epoch of single-GPU training, Ultralytics retries up to 3 times, halving the batch size each time (down to a minimum of 1).
  - After each batch reduction, the training pipeline (dataloaders, optimizer, and scheduler) is rebuilt so training can continue cleanly.
- New internal training helper
  - Adds a `_build_train_pipeline()` method that rebuilds the dataloaders, optimizer, and scheduler when the batch size changes (used by the new OOM recovery flow).
- More reliable ONNX export for OBB + NMS
  - When exporting OBB (oriented bounding box) models to ONNX with NMS enabled, `simplify=True` is now forced to avoid a known runtime issue (a TopK-related error in some ONNX Runtime versions).
- DGX system detection + TensorRT handling
  - Adds `is_dgx()` detection and uses it (along with Jetson JetPack 7 detection) to trigger a TensorRT version check/reinstall path, improving export reliability on those systems.
- Packaging stability fix: pin setuptools
  - Pins build requirements to `setuptools<=81.0.0` to avoid breakages introduced by newer setuptools versions (notably affecting `tensorflow.js` export tooling).
- Docs & examples refresh (YOLO26 messaging + tracking content)
  - Tracking docs now embed a newer multi-object tracking video featuring YOLO26 with BoT-SORT/ByteTrack.
  - Exporter docs/examples are updated to show YOLO26 (`yolo26n.pt`) and to mention ExecuTorch/Axelera export options (documentation signposting).
- Example dependency update
  - Updates `protobuf` in the RT-DETR ONNX Runtime Python example.
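The first-epoch OOM recovery flow can be sketched as a retry loop. This is a minimal, self-contained simulation, not Ultralytics' implementation: `SimulatedOOMError`, `train_first_epoch`, and the `max_batch_that_fits` threshold are hypothetical stand-ins. A real trainer would presumably catch PyTorch's `torch.cuda.OutOfMemoryError` and rebuild its pipeline at the marked point.

```python
class SimulatedOOMError(RuntimeError):
    """Hypothetical stand-in for a CUDA out-of-memory error."""


def train_first_epoch(batch_size: int, max_batch_that_fits: int) -> str:
    """Hypothetical first epoch: raises if the batch does not fit in GPU memory."""
    if batch_size > max_batch_that_fits:
        raise SimulatedOOMError(f"OOM at batch={batch_size}")
    return f"epoch 1 finished at batch={batch_size}"


def train_with_oom_retry(batch_size: int, max_batch_that_fits: int, max_retries: int = 3):
    """Run the first epoch, retrying up to `max_retries` times and halving the batch (min 1) after each OOM."""
    attempts = []  # batch sizes tried, kept for inspection
    for retry in range(max_retries + 1):
        attempts.append(batch_size)
        try:
            return train_first_epoch(batch_size, max_batch_that_fits), attempts
        except SimulatedOOMError:
            if retry == max_retries or batch_size == 1:
                raise  # out of retries, or the batch cannot shrink further
            batch_size = max(batch_size // 2, 1)
            # Hypothetical hook: a real trainer would rebuild dataloaders, optimizer, and scheduler here


message, tried = train_with_oom_retry(batch_size=64, max_batch_that_fits=20)
print(message)  # epoch 1 finished at batch=16
print(tried)    # [64, 32, 16]
```

Note that "retry up to 3 times" allows at most four attempts in total (the original batch plus three halvings); if the last attempt still fails, the error propagates instead of being swallowed.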
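The OBB + NMS export fix boils down to overriding one argument. The helper below is hypothetical (`resolve_simplify` is not an Ultralytics function); it only mirrors the rule described above, so that an OBB model exported to ONNX with NMS always ends up with `simplify=True`, whatever the caller passed.

```python
def resolve_simplify(task: str, fmt: str, nms: bool, simplify: bool) -> bool:
    """Hypothetical mirror of the v8.4.13 rule: force simplify=True for OBB ONNX exports with NMS."""
    if task == "obb" and fmt == "onnx" and nms:
        return True  # avoids the TopK-related ONNX Runtime error described above
    return simplify  # every other combination keeps the user's choice


print(resolve_simplify("obb", "onnx", nms=True, simplify=False))     # True
print(resolve_simplify("detect", "onnx", nms=True, simplify=False))  # False
```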
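The DGX + JetPack 7 trigger for the TensorRT check can be sketched as two small predicates. This is an assumption-laden illustration, not the library's `is_dgx()`: the sketch assumes DGX OS exposes an `/etc/dgx-release` marker file, and `is_dgx_sketch` and `needs_tensorrt_check` are hypothetical names.

```python
from pathlib import Path
from typing import Optional

# Assumption: DGX OS ships a release file at this path that mentions "DGX"
DGX_RELEASE_FILE = Path("/etc/dgx-release")


def is_dgx_sketch(release_file: Path = DGX_RELEASE_FILE) -> bool:
    """Hypothetical DGX check: True if the release file exists and mentions 'DGX'."""
    try:
        return "DGX" in release_file.read_text()
    except OSError:  # a missing or unreadable file is treated as "not a DGX"
        return False


def needs_tensorrt_check(dgx: bool, jetpack_major: Optional[int]) -> bool:
    """Hypothetical trigger for the TensorRT version check/reinstall path (DGX or JetPack 7)."""
    return dgx or jetpack_major == 7


print(needs_tensorrt_check(is_dgx_sketch(), jetpack_major=None))
```

Passing the file path as a parameter keeps the check testable on machines that are not DGX systems.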
Purpose & Impact
- Fewer training crashes for everyday users
  - If you start single-GPU training with a batch size that's slightly too large for your GPU, Ultralytics can now self-correct during the first epoch and continue instead of failing immediately. This is especially helpful for beginners and for "first-epoch spikes" in memory use.
- Less manual trial-and-error
  - Reduces the common loop of "OOM → lower batch → restart training", saving time and frustration.
- More dependable deployment exports
  - ONNX exports for OBB models with embedded NMS should work more reliably out of the box, with fewer runtime surprises.
- More predictable builds/CI
  - Pinning setuptools helps prevent sudden packaging/tooling failures across environments.
- Clearer guidance aligned with YOLO26
  - Docs and examples increasingly steer users toward YOLO26 as the recommended model for training, tracking, and export workflows.
What's Changed
- feat: NVIDIA DGX device variants check by @onuralpszr in #23573
- Add https://youtu.be/qQkzKISt5GE to docs by @RizwanMunawar in #23582
- Bump protobuf from 6.31.1 to 6.33.5 in /examples/RTDETR-ONNXRuntime-Python in the pip group across 1 directory by @dependabot[bot] in #23572
- docs: exporter documentation for new model formats and examples updated by @onuralpszr in #23585
- Force `simplify=True` for OBB export with NMS by @Y-T-G in #23580
- Pin `setuptools` version by @Burhan-Q in #23589
- `ultralytics 8.4.13` Retry smaller batch on training CUDA OOM by @glenn-jocher in #23590
Full Changelog: v8.4.12...v8.4.13