## 🌟 Summary
Ultralytics 8.3.212 makes training more robust and predictable by hardening the Trainer against edge cases (such as deleted save directories and transient non-finite losses) and by simplifying the optimizer step for modern PyTorch.
## 📊 Key Changes
- Trainer stability and I/O resilience (primary change in PR "ultralytics 8.3.212 Improve Trainer robustness to `save_dir` deletion" by @glenn-jocher)
  - Backward and optimizer steps now always run through the AMP GradScaler; the non-finite loss guard was removed so batches are no longer silently skipped.
  - Timed stopping remains unchanged and synchronized across DDP ranks.
  - Safer metrics reading: `read_results_csv()` now returns `{}` on read failures instead of raising.
  - Safer saving: the Trainer ensures directories exist before writing weights and metrics (e.g., `best.pt`, `last.pt`, `results.csv`).
- Optimizer step simplification (PR "Revert optimizer_step() nan error changes" by @glenn-jocher); see the sketch after this list
  - Removed legacy version branches and PyTorch 1.9 handling.
  - Unified behavior: unscale, clip gradients (`clip_grad_norm_` with `max_norm=10.0`), `scaler.step()`, `scaler.update()`, zero grads.
- Faster CI (internal-only change; PR "Use uv for docker.yml pip install tests" by @glenn-jocher)
  - The Docker test step now installs pytest with `uv` for faster, deterministic installs.
- Version bump to 8.3.212.
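
The unified optimizer step listed above maps onto standard PyTorch AMP calls. Below is a minimal sketch assuming PyTorch 2.x; `model`, `optimizer`, and `loss` are placeholder names, and this is an illustration of the sequence, not the exact Ultralytics implementation:

```python
import torch

scaler = torch.amp.GradScaler("cuda")  # AMP gradient scaler


def optimizer_step(model: torch.nn.Module, optimizer: torch.optim.Optimizer, loss: torch.Tensor) -> None:
    scaler.scale(loss).backward()  # backward always runs through the scaler
    scaler.unscale_(optimizer)  # unscale so clipping sees true gradient magnitudes
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
    scaler.step(optimizer)  # GradScaler skips the step internally if grads are non-finite
    scaler.update()
    optimizer.zero_grad()
```

Note that `scaler.step()` already guards against non-finite gradients, which is why a separate loss-level guard is unnecessary.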
## 🎯 Purpose & Impact
- More resilient training runs
  - Prevents stalled or inconsistent training by avoiding silent skips on transient NaN/Inf losses; the GradScaler safely handles invalid gradients instead.
  - Reduces surprises in distributed training: timed stopping is unchanged and synchronized across ranks.
- Robust file handling on local or network storage
  - Automatically re-creates missing directories for checkpoints and logs, which helps if `save_dir` is deleted mid-run or on flaky filesystems (see the sketch after this list).
  - Avoids crashes when reading `results.csv`; failures return `{}` so training and downstream tools can continue gracefully.
- Modernized, cleaner codebase for PyTorch 2.x
  - Less legacy branching, simpler optimizer step logic, and consistent gradient clipping.
- Faster, more reliable CI with no user-facing behavior change
  - Speeds up internal testing with `uv`, improving development velocity without affecting end users.
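
The directory re-creation behavior can be pictured with a small sketch. This is a hypothetical helper, not the Trainer's actual code; the idea is simply to ensure the parent directory exists before each write:

```python
from pathlib import Path

import torch


def safe_save(ckpt: dict, path: Path) -> None:
    # Re-create the parent directory if it vanished mid-run (e.g., deleted save_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    torch.save(ckpt, path)


# Directories are recreated even if 'runs/detect/train' was removed mid-run
safe_save({"epoch": 1}, Path("runs/detect/train/weights/last.pt"))
```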
Tip: If you programmatically consume training metrics, you can safely handle missing/locked CSVs:
```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
model.train(data="coco8.yaml", epochs=1)  # any short training run

# read_results_csv() returns {} if results.csv is missing or unreadable, instead of raising
metrics = model.trainer.read_results_csv()
print(metrics or "no metrics available")
```
## What's Changed
- Use `uv` for docker.yml pip install tests by @glenn-jocher in #22356
- Revert optimizer_step() nan error changes by @glenn-jocher in #22360
- ultralytics 8.3.212 Improve Trainer robustness to `save_dir` deletion by @glenn-jocher in #22358
**Full Changelog**: v8.3.211...v8.3.212