🌟 Summary
Better, faster multi-GPU training: v8.3.218 enables true multi-GPU validation during training with correct cross-GPU metric aggregation and a new contiguous sampler for stable evaluation. 🚀
📊 Key Changes
- Multi-GPU validation during training ✅
- Validation DataLoader and Validator are now created on all ranks for proper DistributedDataParallel (DDP) execution.
- Rank-aware device selection ensures each process validates on its own GPU.
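Rank-aware selection follows the standard DDP pattern of reading the launcher-provided `LOCAL_RANK`; a minimal sketch of the idea (the function name is illustrative, not the Ultralytics API):

```python
import os

def select_validation_device() -> str:
    """Pick this process's own GPU from the LOCAL_RANK environment variable
    set by torchrun/DDP launchers (illustrative sketch, not Ultralytics code)."""
    local_rank = int(os.environ.get("LOCAL_RANK", 0))  # 0 for single-process runs
    return f"cuda:{local_rank}"
```

With this pattern, each spawned process validates on its own GPU instead of all processes piling onto `cuda:0`.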
- New ContiguousDistributedSampler 🧩
- Preserves dataset ordering by assigning contiguous, batch-aligned chunks per GPU.
- Automatically used when `shuffle=False` (e.g., rect/size-grouped evaluation) to prevent interleaved indices.
- Falls back to PyTorch's `DistributedSampler` when `shuffle=True`.
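The idea behind batch-aligned contiguous chunks can be sketched in a few lines. This is a simplified illustration of the concept only, not the actual `ContiguousDistributedSampler` implementation:

```python
import math

def contiguous_chunks(dataset_len: int, num_replicas: int, batch_size: int):
    """Split indices 0..dataset_len-1 into contiguous, batch-aligned chunks,
    one per GPU, preserving dataset order (conceptual sketch, not Ultralytics code)."""
    num_batches = math.ceil(dataset_len / batch_size)
    batches_per_rank = math.ceil(num_batches / num_replicas)
    chunks = []
    for rank in range(num_replicas):
        start = rank * batches_per_rank * batch_size
        end = min(start + batches_per_rank * batch_size, dataset_len)
        # max() guards against ranks past the end of a small dataset
        chunks.append(list(range(start, max(start, end))))
    return chunks
```

Because each rank receives an unbroken, batch-aligned slice, size-grouped (rect) batches stay together instead of being interleaved round-robin across GPUs.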
- Correct cross-GPU metric aggregation 📈
- Validation losses are reduced across GPUs.
- Detection/classification validators gather stats from all ranks and compute results on rank 0 only.
- EMA buffers are synchronized from rank 0 to all GPUs to keep validation consistent.
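Why the reduction matters: averaging per-rank mean losses is biased whenever ranks process different numbers of samples, so per-rank sums and counts are combined first. A plain-Python sketch of that arithmetic (the real code uses distributed collectives; names here are illustrative):

```python
def aggregate_val_loss(rank_loss_sums, rank_sample_counts):
    """Combine per-rank loss totals into one dataset-wide mean loss,
    as a distributed all-reduce would (conceptual sketch, not Ultralytics code)."""
    return sum(rank_loss_sums) / sum(rank_sample_counts)

# Example: rank 0 saw 100 samples (mean loss 1.0), rank 1 saw 10 (mean loss 2.0).
# Correct mean = (100*1.0 + 10*2.0) / 110 ≈ 1.09, not (1.0 + 2.0) / 2 = 1.5.
```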
- Trainer flow improvements 🛠️
- Validation is executed outside the inner training step for cleaner DDP behavior.
- Final evaluation flow streamlined; only necessary work is done on rank 0, with safe synchronization for others.
- Documentation update 📚
- Added reference docs for `ContiguousDistributedSampler`.
Links:
- See the implementing PR: Enable multi-GPU validation during training (#22377)
- Issues addressed: Multi-GPU val during train, Cross-GPU aggregation, Sampler ordering issues
🎯 Purpose & Impact
- More reliable multi-GPU results ✅
- Proper aggregation means metrics and losses now reflect the full distributed dataset, avoiding misleading per-rank results.
- Faster and more stable validation ⚡
- Contiguous sampling avoids mixing image sizes across GPUs, which reduces padding/overhead and improves determinism, especially with `rect=True`.
- Seamless distributed training 🧠
- Users can train with multiple GPUs and get accurate, consistent validation without extra setup.
- Backward compatible ✔️
- Single-GPU behavior is unchanged; most users don’t need to modify their scripts.
Quick tip to run distributed training and benefit from these improvements:
- CLI:
```bash
yolo detect train data=coco128.yaml model=yolo11n.pt device=0,1,2,3
```
- Python:
```python
from ultralytics import YOLO

model = YOLO("yolo11n.pt")
model.train(data="coco128.yaml", device=[0, 1], imgsz=640, epochs=50)
```
Happy training and validating across GPUs! 🎉
What's Changed
Full Changelog: v8.3.217...v8.3.218