v8.3.218 - `ultralytics 8.3.218` Enable multi-GPU validation during training (#22377)

🌟 Summary

Better, faster multi-GPU training: v8.3.218 enables true multi-GPU validation during training with correct cross-GPU metric aggregation and a new contiguous sampler for stable evaluation. 🚀

📊 Key Changes

  • Multi-GPU validation during training ✅
    • The validation DataLoader and Validator are now created on all ranks for proper DistributedDataParallel (DDP) execution.
    • Rank-aware device selection ensures each process validates on its own GPU (see the first sketch after this list).
  • New ContiguousDistributedSampler 🧩
    • Preserves dataset ordering by assigning contiguous, batch-aligned chunks per GPU (see the second sketch after this list).
    • Automatically used when shuffle=False (e.g., rect/size-grouped evaluation) to prevent interleaved indices.
    • Falls back to PyTorch’s DistributedSampler when shuffle=True.
  • Correct cross-GPU metric aggregation 📈
    • Validation losses are reduced across GPUs.
    • Detection/classification validators gather stats from all ranks and compute results on rank 0 only.
    • EMA buffers are synchronized from rank 0 to all GPUs to keep validation consistent.
  • Trainer flow improvements 🛠️
    • Validation is executed outside the inner training step for cleaner DDP behavior.
    • Final evaluation flow streamlined; only necessary work is done on rank 0, with safe synchronization for others.
  • Documentation update 📚
    • Added reference docs for ContiguousDistributedSampler.
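
To make the rank-aware flow concrete, here is a first, minimal sketch under simplified assumptions. The helpers validation_device and wait_for_rank0 are hypothetical names, not ultralytics APIs; LOCAL_RANK is the environment variable set by PyTorch's DDP launchers.

    # Hypothetical sketch: pin each process to its own GPU and keep ranks
    # in lockstep around rank-0-only work (e.g., final metric computation).
    import os
    import torch
    import torch.distributed as dist

    def validation_device():
        # Each process validates on its own GPU instead of all sharing cuda:0.
        local_rank = int(os.environ.get("LOCAL_RANK", 0))
        return torch.device(f"cuda:{local_rank}")

    def wait_for_rank0():
        # Non-zero ranks block here while rank 0 finishes its exclusive work,
        # providing the safe synchronization mentioned above.
        if dist.is_initialized():
            dist.barrier()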
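
A second sketch shows the contiguous-chunk idea behind ContiguousDistributedSampler. It illustrates the described behavior, not the actual implementation; ContiguousChunkSampler and its arguments are hypothetical.

    # Hypothetical sketch: give each rank a contiguous, batch-aligned slice
    # of an ordered dataset (e.g., images grouped by aspect ratio for rect=True).
    import math
    from torch.utils.data import Sampler

    class ContiguousChunkSampler(Sampler):
        def __init__(self, dataset_len, batch_size, rank, world_size):
            # Round the per-rank share up to whole batches so size-grouped
            # batches are never split or interleaved across GPUs.
            batches = math.ceil(dataset_len / batch_size)
            per_rank = math.ceil(batches / world_size)
            start = min(rank * per_rank * batch_size, dataset_len)
            end = min(start + per_rank * batch_size, dataset_len)
            self.indices = range(start, end)  # contiguous and order-preserving

        def __iter__(self):
            return iter(self.indices)

        def __len__(self):
            return len(self.indices)

Because each rank receives a contiguous slice, GPU 0 evaluates the first batches, GPU 1 the next, and so on, so rect-style size grouping stays intact within every batch.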

🎯 Purpose & Impact

  • More reliable multi-GPU results ✅
    • Proper aggregation means metrics and losses now reflect the full distributed dataset, avoiding misleading per-rank results (see the sketch after this list).
  • Faster and more stable validation ⚡
    • Contiguous sampling avoids mixing image sizes across GPUs, which reduces padding overhead and improves determinism, especially with rect=True.
  • Seamless distributed training 🧠
    • Users can train with multiple GPUs and get accurate, consistent validation without extra setup.
  • Backward compatible ✔️
    • Single-GPU behavior is unchanged; most users don’t need to modify their scripts.
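
As a hedged illustration of the aggregation step, the sketch below uses plain torch.distributed collectives and assumes an initialized process group; aggregate_validation and sync_ema_buffers are illustrative names, not actual ultralytics functions.

    # Illustrative sketch of cross-GPU validation aggregation.
    import torch
    import torch.distributed as dist

    def sync_ema_buffers(ema_model):
        # Broadcast EMA buffers from rank 0 so every GPU validates identical weights.
        for buf in ema_model.buffers():
            dist.broadcast(buf, src=0)

    def aggregate_validation(loss, stats):
        # Average the validation loss across all GPUs.
        dist.all_reduce(loss, op=dist.ReduceOp.SUM)
        loss /= dist.get_world_size()
        # Gather every rank's prediction stats; only rank 0 computes final metrics.
        gathered = [None] * dist.get_world_size()
        dist.all_gather_object(gathered, stats)
        if dist.get_rank() == 0:
            merged = [x for rank_stats in gathered for x in rank_stats]
            return loss, merged
        return loss, None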

Quick tip to run distributed training and benefit from these improvements:

  • CLI:
    • yolo detect train data=coco128.yaml model=yolo11n.pt device=0,1,2,3
  • Python:
    from ultralytics import YOLO

    # Passing a list of GPU indices to device= enables DDP training
    model = YOLO("yolo11n.pt")
    model.train(data="coco128.yaml", device=[0, 1], imgsz=640, epochs=50)

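Note that when more than one GPU is listed, Ultralytics launches the DDP worker processes itself, so no manual torchrun invocation is needed; with this release, the validation phase also runs on all listed GPUs and the printed metrics reflect the aggregated results.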
Happy training and validating across GPUs! 🎉

What's Changed

  • ultralytics 8.3.218 Enable multi-GPU validation during training by @Y-T-G in #22377

Full Changelog: v8.3.217...v8.3.218
