Summary
Ultralytics v8.4.40 introduces per-image precision/recall/F1 tracking during validation (led by PR #24089 from @Laughing-q), making it much easier to see exactly which images your model handles well or poorly.
Key Changes
- New per-image validation metrics added to results: `precision`, `recall`, `f1`, `tp`, `fp`, `fn` for each image.
  - Exposed via `metrics.box.image_metrics` (and also for `seg` and `pose` where applicable).
- Detection validation pipeline updated to store the image name and compute image-level stats consistently with the validation matching logic.
- Distributed (multi-GPU) validation now gathers and merges `image_metrics` correctly across ranks, so results remain complete in larger training setups.
- Metrics classes extended with:
  - `image_metrics` storage
  - update helpers
  - clear/reset helpers to prevent stale metrics between runs.
- Docs updated across validation/task guides (detect, segment, pose, OBB, insights, custom trainer) with examples showing how to access per-image metrics.
- Version bump: `8.4.39` → `8.4.40`
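The per-image fields listed above relate to each other through the standard precision/recall/F1 definitions. The following is a minimal sketch of that relationship and of merging results from several ranks, not the library's implementation. The helper names `image_stats` and `merge_rank_metrics`, the "image name to stats" dict layout, and the 0.0 fallback for empty denominators are all assumptions made for illustration.

```python
from typing import Dict, List

ImageStats = Dict[str, float]


def image_stats(tp: int, fp: int, fn: int) -> ImageStats:
    """Derive precision, recall and F1 for one image from its match counts."""
    # Assumption: empty denominators fall back to 0.0; the library may differ.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1,
            "tp": tp, "fp": fp, "fn": fn}


def merge_rank_metrics(per_rank: List[Dict[str, ImageStats]]) -> Dict[str, ImageStats]:
    """Combine per-rank dicts (hypothetical layout: image name -> stats)."""
    merged: Dict[str, ImageStats] = {}
    for rank_metrics in per_rank:
        # Assumption: each image is validated on exactly one rank,
        # so keys do not collide between ranks.
        merged.update(rank_metrics)
    return merged


# Two simulated ranks, each holding the images it validated.
rank0 = {"a.jpg": image_stats(tp=3, fp=1, fn=0)}
rank1 = {"b.jpg": image_stats(tp=1, fp=0, fn=3)}
merged = merge_rank_metrics([rank0, rank1])

for name, stats in merged.items():
    print(name, round(stats["precision"], 3), round(stats["recall"], 3), round(stats["f1"], 3))
```

In a real run the values would come from `metrics.box.image_metrics` after `metrics = model.val()`; the exact container type is worth checking against the updated validation docs before relying on this layout.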
Purpose & Impact
- Faster debugging of weak samples: You can now pinpoint problematic images directly instead of relying only on dataset-wide averages.
- Better dataset curation: Find images causing high false positives/false negatives and decide whether to relabel, augment, or rebalance.
- More actionable model evaluation: Teams get practical, image-level insight for error analysis and iterative improvement.
- Reliable at scale: Works cleanly in multi-GPU validation, so enterprise and research workflows benefit too.
- Broad usability: Useful for both beginners and advanced users working with YOLO models, especially YOLO26 validation workflows.
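The debugging and curation workflows described above could look roughly like the sketch below, which ranks images by F1 and flags those with many false positives or false negatives. The sample dict is a hypothetical stand-in for `metrics.box.image_metrics`; its layout, the file names, the helper names `worst_by_f1` and `flag_for_review`, and the thresholds are assumptions rather than part of the Ultralytics API.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for `metrics.box.image_metrics`. In practice it would
# come from something like:
#   from ultralytics import YOLO
#   metrics = YOLO("path/to/model.pt").val()
#   image_metrics = metrics.box.image_metrics
# The "image name -> stats" layout below is an assumption.
image_metrics: Dict[str, Dict[str, float]] = {
    "street_01.jpg": {"precision": 0.900, "recall": 0.750, "f1": 0.818, "tp": 9, "fp": 1, "fn": 3},
    "night_07.jpg": {"precision": 0.250, "recall": 0.500, "f1": 0.333, "tp": 2, "fp": 6, "fn": 2},
    "crowd_12.jpg": {"precision": 1.000, "recall": 0.333, "f1": 0.500, "tp": 4, "fp": 0, "fn": 8},
}


def worst_by_f1(metrics: Dict[str, Dict[str, float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k images with the lowest F1, worst first."""
    ranked = sorted(metrics.items(), key=lambda item: item[1]["f1"])
    return [(name, stats["f1"]) for name, stats in ranked[:k]]


def flag_for_review(
    metrics: Dict[str, Dict[str, float]], max_fp: int = 5, max_fn: int = 5
) -> Dict[str, List[str]]:
    """Group images whose false-positive or false-negative count exceeds a threshold."""
    flagged: Dict[str, List[str]] = {"false_positives": [], "false_negatives": []}
    for name, stats in metrics.items():
        if stats["fp"] > max_fp:
            flagged["false_positives"].append(name)
        if stats["fn"] > max_fn:
            flagged["false_negatives"].append(name)
    return flagged


worst = worst_by_f1(image_metrics, k=2)
flagged = flag_for_review(image_metrics)
print("Lowest F1:", worst)
print("Needs review:", flagged)
```

Images flagged for false positives are often candidates for relabeling or hard-negative augmentation, while those flagged for false negatives may point to missing annotations or under-represented object scales.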
What's Changed
- `ultralytics 8.4.40` Per-image Precision and Recall by @Laughing-q in #24089
Full Changelog: v8.4.39...v8.4.40