🌟 Summary
Segmentation gets leaner and more reliable: segment masks are now ~4× lighter with consistent uint8 handling, plus smoother first-iteration latency thanks to NMS warmup and a new dataloader pin_memory option. 🚀🧠
📊 Key Changes
- Segmentation masks optimized and standardized (PR: Segment masks now 4× lighter with `.byte()` by @glenn-jocher)
  - Mask tensors now use `.byte()` (uint8) across processing and plotting, reducing memory and avoiding dtype mismatches.
  - `process_mask` and `process_mask_native` return uint8 masks instead of bool.
  - `masks2segments` consumes byte masks directly (no extra cast).
  - Evaluation fix: predicted masks cast to `float()` before IoU to prevent edge-case errors.
  - Image/tensor conversions use `.byte()` to ensure consistent uint8 NumPy output.
  - See details in the current PR: Segment masks now 4× lighter with `.byte()` optimization.
- Dataloader and backend improvements (PR: Add NMS warmup for clearer post-processing latency by @Y-T-G)
  - New `pin_memory` parameter in `build_dataloader(..., pin_memory: bool = True)`.
  - Validation sets `pin_memory=self.training`, reducing host memory pinning during eval by default.
  - Autobackend now warms up Non-Max Suppression (NMS) after the first forward pass for smoother post-processing latency.
  - More in the PR: Add NMS warmup for clearer post-processing latency.
- Version bump to 8.3.217.
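To put the ~4× figure in context, here is a minimal sketch of per-batch mask memory at two dtypes. It uses NumPy as a stand-in for the PyTorch tensors the library works with (`astype(np.uint8)` plays the role of `.byte()`), and it assumes the comparison baseline is float32, which these notes do not state explicitly. The batch size and resolution are arbitrary illustrative values.

```python
import numpy as np

# Hypothetical batch: 32 instance masks at 640x640 (illustrative values only).
rng = np.random.default_rng(0)
probs = rng.random((32, 640, 640), dtype=np.float32)

# Threshold to a binary mask, then store it at two different dtypes.
binary = probs > 0.5
masks_f32 = binary.astype(np.float32)  # 4 bytes per pixel (assumed baseline)
masks_u8 = binary.astype(np.uint8)     # 1 byte per pixel

print(f"float32: {masks_f32.nbytes / 2**20:.1f} MiB")
print(f"uint8:   {masks_u8.nbytes / 2**20:.1f} MiB")
print(f"ratio:   {masks_f32.nbytes / masks_u8.nbytes:.0f}x")
```

The saving comes purely from element size, so it scales linearly with batch size and resolution.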
🎯 Purpose & Impact
- Faster, lighter segmentation workflows 💾⚡
  - ~4× smaller mask tensors reduce memory footprint and can improve throughput on large-batch or high-resolution segmentation tasks.
  - Fewer dtype conversions in the critical path minimize overhead and potential inconsistencies.
- More robust and consistent results ✅
  - Unified dtype handling (uint8 for images/masks, float for IoU) reduces dtype-related bugs and evaluation edge cases.
- Smoother first-iteration performance 🚀
  - NMS warmup eliminates “cold start” spikes in post-processing latency on supported devices.
- Better memory control for training/eval 🧰
  - The new `pin_memory` flag allows fine-grained control to balance throughput and system memory behavior; disabled by default in validation for stability.
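The "uint8 for masks, float for IoU" split can be illustrated with a short sketch. This is not the Ultralytics implementation: `mask_iou` here is a hypothetical NumPy helper written for illustration, and the notes do not say which edge case the float cast fixes. In general, 8-bit integer arithmetic can only represent values up to 255, so casting before accumulating pixel counts is a reasonable precaution.

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-7) -> np.ndarray:
    """Pairwise IoU between flattened binary masks of shape (N, P) and (M, P)."""
    # Cast to float before any arithmetic so pixel counts are not limited to 8 bits.
    pred = pred.astype(np.float32)
    gt = gt.astype(np.float32)
    inter = pred @ gt.T  # (N, M) counts of overlapping foreground pixels
    union = pred.sum(1)[:, None] + gt.sum(1)[None] - inter
    return inter / (union + eps)

# Two masks stored compactly as uint8, flattened to 400 pixels each.
a = np.zeros((1, 400), dtype=np.uint8)
b = np.zeros((1, 400), dtype=np.uint8)
a[0, :300] = 1      # 300 foreground pixels
b[0, 100:400] = 1   # 300 foreground pixels, 200 of them shared with `a`

iou = mask_iou(a, b)  # overlap 200 px, union 400 px
print(iou)
```

Storage stays at one byte per pixel; only the short-lived operands of the IoU computation are promoted to float.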
Minimal examples:
- Enable/disable pinned memory:
  ```python
  from ultralytics.data.build import build_dataloader

  # dataset = ...
  dl = build_dataloader(dataset, batch=16, workers=8, pin_memory=False)
  ```
- NMS warmup happens automatically during backend warmup; no code changes required.
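The general idea behind a warmup step can be sketched in plain Python: make one throwaway call on dummy data so that one-time setup cost is not charged to the first real batch. The `LazyPostprocessor` class below is hypothetical and only simulates setup cost with a sleep. It is not Ultralytics code and does not perform real NMS.

```python
import time

class LazyPostprocessor:
    """Hypothetical stand-in for a post-processing step with one-time setup cost."""

    def __init__(self) -> None:
        self._ready = False

    def __call__(self, scores: list) -> list:
        if not self._ready:
            time.sleep(0.05)  # simulated one-time initialization
            self._ready = True
        return sorted(scores, reverse=True)[:3]  # placeholder "keep top 3" logic

def timed(fn, arg) -> float:
    """Return the wall-clock seconds taken by a single call fn(arg)."""
    start = time.perf_counter()
    fn(arg)
    return time.perf_counter() - start

post = LazyPostprocessor()
warmup_time = timed(post, [0.1, 0.2])                 # dummy call absorbs the setup cost
first_real_time = timed(post, [0.9, 0.4, 0.7, 0.3])   # first real batch skips the setup

print(f"warmup call:     {warmup_time * 1e3:.2f} ms")
print(f"first real call: {first_real_time * 1e3:.2f} ms")
```

Without the dummy call, the setup cost would land in the first measured call, which is the kind of "cold start" spike the release describes.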
Contributors: @glenn-jocher, @Y-T-G, @Laughing-q
Links:
- Segment masks now 4× lighter with `.byte()` optimization (PR #22427)
- Add NMS warmup for clearer post-processing latency (PR #22425)
What's Changed
- Add NMS warmup for clearer post-processing latency by @Y-T-G in #22425
- `ultralytics 8.3.217` Segment masks now 4× lighter with `.byte()` optimization by @glenn-jocher in #22427
Full Changelog: v8.3.216...v8.3.217