🌟 Summary
Ultralytics 8.3.224 boosts segmentation performance and stability on GPUs, adds CPU-only ExecuTorch export/benchmarks for YOLO11, fixes multi‑GPU evaluation and ONNX device selection, and improves logging + docs for a smoother developer experience. 🚀
📊 Key Changes
- GPU-accelerated `crop_mask` with device-safe logic (PR #22575) ⚡
  - Ensures `boxes` live on the same device as `masks` and avoids slow Python loops on CUDA.
  - Uses the loop path only when n < 50 and the tensors are on CPU; uses the vectorized path on GPU.
- Correct multi-GPU COCO evaluation `jdict` gather (PR #22541) 🧩
  - Aggregates predictions across ranks using `dist.gather_object` and cleans up worker memory.
- ExecuTorch export and benchmark support (CPU-only) for YOLO11 (PR #22552) 🧠
  - Adds format guards (no YOLOWorldv2, no E2E, no Pose) and a benchmark table entry.
- Respect selected CUDA device in ONNX Runtime (PR #22546) 🎯
  - Passes `device_id` to `CUDAExecutionProvider` to avoid accidental GPU 0 usage.
- More reliable W&B logging (PR #22563) 📈
  - Commits metrics each epoch and aligns log order for real-time, clean dashboards.
- Rust ONNX Runtime example fix and dependency updates (PR #22557) 🦀
  - Uses `download-binaries` for ONNX Runtime to reduce setup friction and version conflicts.
- Massive docstring/style cleanup across codebase (PRs #22554, #22565) 📚
  - Standardizes one-line summaries, arg sections, and readability with no logic changes.
- Export formats updated: ExecuTorch is now listed as CPU=True, GPU=False in the exporter matrix 🧪
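
The multi-GPU `jdict` fix in the list above comes down to collecting each rank's predictions on one process and flattening them before evaluation. The following is a minimal sketch of that merge step only, not the Ultralytics implementation. The function name `merge_rank_predictions` and the sample records are hypothetical, and the per-rank lists are simulated with plain Python lists so the snippet runs without GPUs. In a real distributed run they would be collected with `torch.distributed.gather_object`.

```python
from typing import Any


def merge_rank_predictions(gathered: list[list[dict[str, Any]]]) -> list[dict[str, Any]]:
    """Flatten per-rank prediction lists into one list, preserving rank order."""
    merged: list[dict[str, Any]] = []
    for rank_predictions in gathered:
        merged.extend(rank_predictions)
    return merged


# Simulated result of a gather on rank 0: one list of COCO-style records per rank.
# (Hypothetical data; a real run would fill this via torch.distributed.gather_object.)
gathered = [
    [{"image_id": 1, "category_id": 0, "score": 0.91}],  # from rank 0
    [
        {"image_id": 2, "category_id": 3, "score": 0.47},  # from rank 1
        {"image_id": 3, "category_id": 0, "score": 0.88},
    ],
]

jdict = merge_rank_predictions(gathered)
print(len(jdict))
```

Evaluating only the local list on rank 0 would silently drop every image handled by the other ranks, which is the kind of gap a cross-rank gather closes.
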
 
🎯 Purpose & Impact
- Faster, safer segmentation on GPU 💨
  - Avoids device mismatch errors and slow loops; users should see snappier, more stable mask processing.
- Accurate distributed validation ✅
  - Multi-GPU validation now correctly aggregates predictions for COCO-style evaluation and JSON exports.
- Broader deployment options with ExecuTorch 🧩
  - Enables CPU-only export and benchmarking for YOLO11, which is handy for mobile/edge experimentation, with clear guardrails.
- Correct GPU selection for ONNX inference 🔧
  - Ensures the specified GPU (e.g., `cuda:1`) is honored, improving reliability in multi-GPU environments.
- Better experiment tracking in W&B 📊
  - Immediate per-epoch updates and aligned steps yield clearer, real-time dashboards with minimal overhead.
- Easier Rust example usage 🛠️
  - Reduces dependency issues and gives a quicker “just run it” developer path.
- Cleaner docs and IDE help ✨
  - Consistent docstrings improve readability, navigation, and autocompletion across modules.
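
To illustrate the ONNX GPU selection point above, here is a minimal sketch of turning a device string such as `cuda:1` into an ONNX Runtime providers list that carries `device_id`. The helper name `build_providers` is hypothetical and this is not the Ultralytics code. It only builds the list, so it runs without `onnxruntime` installed.

```python
def build_providers(device: str) -> list:
    """Build an ONNX Runtime providers list for a device string like "cuda:1" or "cpu"."""
    if device.startswith("cuda"):
        # "cuda:1" -> index "1"; a bare "cuda" falls back to GPU 0.
        _, _, index = device.partition(":")
        device_id = int(index) if index else 0
        # Provider options travel as a (name, options) tuple; CPU stays as a fallback.
        return [("CUDAExecutionProvider", {"device_id": device_id}), "CPUExecutionProvider"]
    return ["CPUExecutionProvider"]


providers = build_providers("cuda:1")
print(providers)
```

With `onnxruntime` installed, a list like this can be passed as `onnxruntime.InferenceSession("model.onnx", providers=providers)`. Without an explicit `device_id`, the CUDA provider defaults to device 0.
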
 
 
Quick tips:
- Export to ExecuTorch (CPU-only) for YOLO11: `yolo export model=yolo11n.pt format=executorch device=cpu`
- ONNX inference now uses the intended GPU automatically when one is available; no code changes are needed.
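
The CLI export tip above can also be run from Python. This is a minimal sketch, assuming `ultralytics` 8.3.224 or later and the ExecuTorch dependencies are installed. The wrapper name `export_to_executorch` is hypothetical, and the import is deferred so the snippet itself loads without the package.

```python
def export_to_executorch(weights: str = "yolo11n.pt"):
    """Export YOLO11 weights to ExecuTorch on CPU (hypothetical convenience wrapper)."""
    from ultralytics import YOLO  # deferred import; requires the ultralytics package

    model = YOLO(weights)
    # Mirrors the CLI call: format=executorch device=cpu
    return model.export(format="executorch", device="cpu")
```

Calling `export_to_executorch()` downloads the weights if they are missing and runs the export. Per the format guards listed earlier, YOLOWorldv2, E2E, and Pose models are not supported for this format.
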
 
What's Changed
- Fix gather `jdict` for COCO evaluation by @Laughing-q in #22541
- feat: ✨ Add ExecuTorch benchmark and update export settings and add format validation in benchmarks by @onuralpszr in #22552
- Fix docstrings by @glenn-jocher in #22554
- Commit Weights & Biases logs after each epoch to force dashboard update by @Y-T-G in #22563
- Fixing rust example (fixes #22556) by @andrenatal in #22557
- Update Google-style docstrings by @glenn-jocher in #22565
- Specify device ID for ONNX inference on CUDA by @Y-T-G in #22546
- `ultralytics 8.3.224` Accelerate `crop_mask` on GPU with `is_cuda` check by @Laughing-q in #22575
New Contributors
- @andrenatal made their first contribution in #22557
 
Full Changelog: v8.3.223...v8.3.224