🌟 Summary (single-line synopsis)
Ultralytics v8.4.17 makes NDJSON dataset conversions resplit-friendly: reusing existing images, cleaning stale labels, and avoiding unnecessary downloads for faster iteration 🚀📦
📊 Key Changes
- NDJSON dataset re-split support (priority change) ♻️🧹
  - Detects when the dataset output folder already exists and reuses previously downloaded images when you change `train`/`val`/`test` splits.
  - For non-classification tasks (detect/segment/pose), removes the existing `labels/` directory before reconversion to prevent stale annotations.
  - If an image is missing from the current split, moves (renames) it from another split (train/val/test) before falling back to downloading it from its URL.
  - Adds a content hash (ignoring signed/rotating URLs) stored in `data.yaml` to skip reconversion when nothing meaningful has changed 🧾🔒
  - Deletes orphaned images that are no longer part of the dataset after a resplit, reducing “stale background files” that can silently affect training 🗑️
- More reliable EdgeTPU exports 🧩⚙️
  - Automatically disables `end2end` mode for EdgeTPU exports (and logs a warning), aligning EdgeTPU with other limited backends that don’t support the required ops.
- OpenVINO INT8 export dependency handling improved 📦✅
  - Adds PyTorch 2.3 detection and adjusts `nncf` requirements to reduce install/export conflicts across Torch versions.
- Clearer disk space error messages 💾🔎
  - Fixes incorrect “GB” reporting and now shows MB for sizes under 1 GB, making download/asset errors easier to understand.
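The content-hash check can be illustrated with a minimal sketch. It assumes each NDJSON record is a JSON object with a `url` field, and it treats the entire query string and fragment as volatile signing data, which is a simplification (some hosts put meaningful parameters in the query). The function names are hypothetical; this is not Ultralytics' actual implementation.

```python
import hashlib
import json
from urllib.parse import urlsplit, urlunsplit


def strip_signature(url: str) -> str:
    """Drop the query string and fragment, where signed/rotating tokens usually live."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))


def ndjson_content_hash(ndjson_text: str, url_key: str = "url") -> str:
    """SHA-256 over every NDJSON record, with URLs normalized and JSON keys sorted.

    Blank lines and key order are ignored; any change to a record's other
    fields (for example its split) changes the digest.
    """
    digest = hashlib.sha256()
    for line in ndjson_text.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        # Assumption: the image URL sits under a top-level "url" key.
        if isinstance(record, dict) and isinstance(record.get(url_key), str):
            record[url_key] = strip_signature(record[url_key])
        digest.update(json.dumps(record, sort_keys=True, separators=(",", ":")).encode())
        digest.update(b"\n")
    return digest.hexdigest()


# Usage: two exports that differ only in URL signature and key order hash identically.
a = '{"file": "a.jpg", "split": "train", "url": "https://cdn.example.com/a.jpg?X-Amz-Signature=abc"}'
b = '{"split": "train", "url": "https://cdn.example.com/a.jpg?X-Amz-Signature=xyz", "file": "a.jpg"}'
assert ndjson_content_hash(a) == ndjson_content_hash(b)
assert ndjson_content_hash(a) != ndjson_content_hash(a.replace('"train"', '"val"'))  # a resplit changes it
```

Storing this digest in `data.yaml` and comparing it with a freshly computed one is what allows reconversion to be skipped when only the signatures rotated.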
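The `nncf` change gates a dependency on the installed PyTorch version. The sketch below shows that gating under stated assumptions: the `nncf<3` pin for PyTorch 2.2 and below comes from the release notes, while the unpinned `nncf` requirement for 2.3+ is a placeholder, since the actual requirement string isn't given here. Both helper names are hypothetical.

```python
def torch_minor(version: str) -> tuple[int, int]:
    """Parse strings like '2.2.1+cu121' or '2.3.0a0' into (major, minor)."""
    major, minor = version.split("+")[0].split(".")[:2]
    return int(major), int(minor)


def nncf_requirement(torch_version: str) -> str:
    """Pick an nncf pip requirement for the installed torch version.

    Torch 2.2 and below get nncf<3 (per the release notes); the unpinned
    requirement for torch 2.3+ is an assumption for illustration.
    """
    return "nncf<3" if torch_minor(torch_version) < (2, 3) else "nncf"
```

In practice you would pass `torch.__version__`. Comparing `(major, minor)` tuples rather than raw strings matters: a string comparison would wrongly sort `"2.10"` before `"2.3"`.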
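The disk space fix switches units to MB below 1 GB. A minimal sketch of that unit choice follows. It assumes binary units (1 GB = 1024³ bytes), and the message wording and function names are hypothetical; Ultralytics' own rounding and text may differ.

```python
MB = 1024**2  # assumption: binary units
GB = 1024**3


def format_size(num_bytes: float) -> str:
    """Report sizes under 1 GB in MB and everything else in GB."""
    if num_bytes < GB:
        return f"{num_bytes / MB:.1f} MB"
    return f"{num_bytes / GB:.2f} GB"


def disk_space_message(required: float, free: float) -> str:
    """Hypothetical error text comparing free and required space in readable units."""
    return f"Insufficient free disk space: {format_size(free)} available, {format_size(required)} required."
```

For example, `disk_space_message(2 * GB, 300 * MB)` yields `"Insufficient free disk space: 300.0 MB available, 2.00 GB required."` rather than a misleading `0.29 GB`.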
🎯 Purpose & Impact
- Faster dataset iteration for NDJSON workflows ⚡
  Resplitting a dataset no longer forces full re-downloads, saving time, bandwidth, and friction when you’re refining splits during curation.
- Fewer training gotchas from stale files 🧼
  Clearing old labels and removing orphaned images helps prevent subtle mismatches (images and labels out of sync) that can degrade training quality.
- More dependable deployment exports 📤
  EdgeTPU exports should fail less often due to unsupported `end2end` behavior, and OpenVINO INT8 export setups are smoother across PyTorch versions.
- Better user experience when storage is low 🧰
  Disk space errors now report accurate sizes in appropriate units, reducing confusion during downloads and conversions.
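The resplit behavior behind these benefits (reuse an image already in place, otherwise move it from another split, download only as a last resort, then prune orphans and stale labels) can be sketched as follows. This assumes a YOLO-style layout of `images/<split>/<name>` plus a `labels/` directory and a caller-supplied download function; all names are hypothetical and this is not the library's code.

```python
import shutil
from pathlib import Path
from typing import Callable

SPLITS = ("train", "val", "test")


def reset_labels(root: Path, task: str) -> None:
    """Remove labels/ before reconversion so stale annotations can't survive (skipped for classify)."""
    if task != "classify":
        shutil.rmtree(root / "labels", ignore_errors=True)


def resolve_image(root: Path, split: str, name: str, download: Callable[[Path], None]) -> str:
    """Place images/<split>/<name>, preferring local files; return 'reused', 'moved' or 'downloaded'."""
    target = root / "images" / split / name
    if target.exists():
        return "reused"
    target.parent.mkdir(parents=True, exist_ok=True)
    for other in SPLITS:
        candidate = root / "images" / other / name
        if other != split and candidate.exists():
            candidate.rename(target)  # same filesystem: a cheap rename instead of a re-download
            return "moved"
    download(target)  # hypothetical downloader writes the file to `target`
    return "downloaded"


def remove_orphans(root: Path, wanted: set[tuple[str, str]]) -> list[Path]:
    """Delete images whose (split, name) pair is no longer in the dataset; return what was removed."""
    removed = []
    for split in SPLITS:
        folder = root / "images" / split
        if not folder.is_dir():
            continue
        for path in sorted(folder.iterdir()):
            if path.is_file() and (split, path.name) not in wanted:
                path.unlink()
                removed.append(path)
    return removed
```

Ordering matters here: orphans are pruned only after every wanted image has been resolved, so a file that merely changed split is moved rather than deleted and then downloaded again.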
What's Changed
- Fix incorrect disk space in error message by @Y-T-G in #23727
- Pin `nncf<3` for PyTorch 2.2 and below by @Y-T-G in #23726
- Disable `end2end` for EdgeTPU by @Y-T-G in #23724
- `ultralytics 8.4.17` NDJSON dataset re-split support by @glenn-jocher in #23735
Full Changelog: v8.4.16...v8.4.17