🌟 Summary
Ultralytics 8.3.237 adds full SAM 3 image & video segmentation support (including text & exemplar prompts and tracking), improves export behavior (ONNX FP16 on CPU, Edge TPU/IMX deps), and polishes training, validation, and docs for smoother day‑to‑day use. 🚀
📊 Key Changes
- 🧠 SAM 3 integration (image & video)
  - New SAM 3 model builder stack (build_sam3.py) with ViT backbone, transformer encoder/decoder, text encoder, geometry encoders, and video tracker (SAM3Model, SAM3SemanticModel, and SAM3-specific modules).
  - SAM entrypoint now detects sam3.pt and builds the SAM 3 tracker via build_interactive_sam3.
- 🎛️ New SAM 3 predictors & APIs
  - Added predictors and public exports:
    - SAM3Predictor – SAM3-style interactive segmentation.
    - SAM3SemanticPredictor – text & exemplar based concept segmentation on images.
    - SAM3VideoPredictor – video tracking with box prompts.
    - SAM3VideoSemanticPredictor – video concept tracking (text + boxes + masklets).
  - Wired into ultralytics.models.sam.__all__ and SAM’s task_map, so SAM("sam3.pt") routes to the right predictor.
- 🧩 SAM pipeline upgrades (SAM / SAM2 / SAM3)
  - Predictor.setup_source now accepts an explicit stride, and SAM/SAM2/SAM3 predictors use it to enforce square image sizes and consistent feature shapes.
  - SAM modules updated to support SAM3:
    - More flexible MemoryEncoder and MaskDownSampler (interpolation to fixed sizes, higher-res mask handling).
    - Memory attention can accept custom attention modules; SAM3 uses RoPE-based attention and new positional utilities (get_abs_pos, concat_rel_pos).
    - SAM2Model.set_imgsz generalized (no longer hardcoded stride 16) and specialized SAM3Model added with improved mask post-processing and non‑overlap suppression.
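The stride-based square sizing mentioned above can be pictured with a small sketch. The helper name square_imgsz and the choice to round the longer side up are assumptions made for illustration; this is not the Ultralytics implementation, only one way such a constraint could be expressed.

```python
import math


def square_imgsz(imgsz, stride):
    """Return a square (h, w) size whose side is a multiple of `stride`.

    Hypothetical helper for illustration only; not the Ultralytics code.
    """
    if stride <= 0:
        raise ValueError("stride must be positive")
    # Accept either a single int or an (h, w) pair and take the longer side.
    side = imgsz if isinstance(imgsz, int) else max(imgsz)
    # Round up so each side maps to a whole number of feature-map cells.
    side = math.ceil(side / stride) * stride
    return (side, side)


print(square_imgsz(1024, 16))         # (1024, 1024)
print(square_imgsz((630, 1000), 14))  # (1008, 1008)
```

Rounding up rather than down keeps the whole input inside the padded square; a real predictor may make a different choice.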
- 🖼️ SAM 3 docs & usage examples
  - New reference docs under docs/en/reference/models/sam/sam3/* for all SAM3 modules (encoder, decoder, geometry encoders, text encoder, tokenizer, etc.).
  - docs/en/models/sam-3.md rewritten from “API preview” into concrete usage:
    - Clear warning that SAM 3 weights are not auto-downloaded – users must manually download sam3.pt from the SAM 3 repo.
    - Instructions to download the BPE vocab (bpe_simple_vocab_16e6.txt.gz) for text prompts.
    - Full Python examples for:
      - Text prompts (SAM3SemanticPredictor)
      - Box exemplar prompts
      - Reusing image features across multiple queries
      - Video concept tracking with boxes (SAM3VideoPredictor)
      - Video concept tracking with text (SAM3VideoSemanticPredictor)
      - SAM2-style visual prompts via SAM("sam3.pt") while clarifying the difference vs. concept segmentation.
- 🔢 ONNX FP16 export on CPU
  - The FP16-on-CPU restriction now applies only to TorchScript (JIT) export, with a clearer warning that half=True for TorchScript takes effect only on GPU.
  - ONNX export now supports half=True on CPU:
    - Converts model weights to FP16 using onnxruntime.transformers.float16.convert_float_to_float16(keep_io_types=True).
    - Failures downgrade gracefully with a warning instead of aborting export.
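A minimal sketch of the convert-or-fall-back pattern described above. The wrapper name to_fp16_or_fallback is hypothetical and the exporter's real control flow may differ; only the convert_float_to_float16 call with keep_io_types=True comes from the note itself.

```python
import warnings


def to_fp16_or_fallback(onnx_model):
    """Try to convert an ONNX model to FP16; return (model, converted_flag)."""
    try:
        # Imported lazily so a missing onnxruntime also takes the fallback path.
        from onnxruntime.transformers import float16

        # keep_io_types=True leaves the graph inputs/outputs in FP32.
        converted = float16.convert_float_to_float16(onnx_model, keep_io_types=True)
        return converted, True
    except Exception as e:  # broad on purpose: any failure keeps the FP32 model
        warnings.warn(f"FP16 conversion failed, keeping FP32 model: {e}")
        return onnx_model, False


# Not a real ONNX model, so conversion fails and the input is returned unchanged.
model, ok = to_fp16_or_fallback(object())
print(ok)  # False
```

With a real model loaded via onnx.load, the same call would be expected to return the FP16 graph and True.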
- 🐧 Edge TPU & IMX export dependency management
  - export_edgetpu: shell apt-get calls replaced with centralized check_apt_requirements(["edgetpu-compiler"]).
  - export_imx: Java installs now use check_apt_requirements() for:
    - openjdk-21-jre on Ubuntu / Debian Trixie.
    - openjdk-17-jre on Raspberry Pi / Debian Bookworm.
  - check_apt_requirements() now runs apt update with check=True, surfacing update failures instead of silently ignoring them.
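The effect of check=True can be shown with a short standalone sketch. It uses the Python interpreter as a stand-in for apt update so it runs anywhere; run_step is a hypothetical name and this is not the check_apt_requirements() implementation.

```python
import subprocess
import sys


def run_step(cmd):
    """Run a command and raise if it exits with a non-zero status."""
    # check=True turns a non-zero exit status into CalledProcessError.
    return subprocess.run(cmd, check=True)


ok_cmd = [sys.executable, "-c", "pass"]
bad_cmd = [sys.executable, "-c", "raise SystemExit(3)"]

run_step(ok_cmd)  # succeeds silently
try:
    run_step(bad_cmd)
except subprocess.CalledProcessError as e:
    print(f"step failed with exit code {e.returncode}")  # exit code 3
```

Without check=True the second call would return normally and the failure would only be visible by inspecting returncode.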
- 🔄 More flexible resume‑training overrides
  - When resuming training, you can now override more runtime/logging parameters without restarting:
    - save_period, workers, cache, patience, time, freeze, val, plots.
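The override behavior can be sketched as a whitelist merge. The names RESUME_OVERRIDABLE and merge_resume_args are hypothetical, and the tuple lists only the keys named in this note; the trainer may accept other keys as well and its actual merge logic may differ.

```python
# Keys named in this release as newly overridable on resume (illustrative list).
RESUME_OVERRIDABLE = (
    "save_period", "workers", "cache", "patience", "time", "freeze", "val", "plots",
)


def merge_resume_args(checkpoint_args, user_overrides):
    """Start from the checkpoint's saved args and apply only whitelisted overrides."""
    merged = dict(checkpoint_args)  # copy so the saved args stay untouched
    for key in RESUME_OVERRIDABLE:
        if key in user_overrides:
            merged[key] = user_overrides[key]
    return merged


saved = {"epochs": 100, "workers": 8, "plots": True, "patience": 100}
merged = merge_resume_args(saved, {"workers": 2, "plots": False, "epochs": 10})
print(merged)  # {'epochs': 100, 'workers': 2, 'plots': False, 'patience': 100}
```

In this sketch epochs stays at the checkpoint value because it is not in the whitelist, while workers and plots take the new values.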
- 📏 RT-DETR validation scaling fix
  - Simplified RT-DETR validation transforms; removed a custom ratio_pad injection and replaced it with a clean Compose([]).
  - Added a no-op scale_preds() override to make scaling behavior explicit and safe for future changes.
- 🧭 OBB plotting robustness
  - OBBValidator.plot_predictions() reworked to:
    - Accept prediction dicts instead of raw tensors.
    - Early‑return cleanly for empty predictions.
    - Use plot_images() directly, avoiding redundant xywh2xyxy conversions and mismatched formats.
- 📚 Data augmentation docs: scale range clarified
  - scale hyperparameter doc updated from “≥ 0.0” to 0.0 - 1.0 in the guide and macro tables, matching real‑world usage and preventing unstable configs.
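Why the upper bound matters can be seen in a small sketch, assuming the scale gain is sampled uniformly from [1 - scale, 1 + scale]. The function names are hypothetical and the sampling rule is stated as an assumption, not quoted from the augmentation code.

```python
import random


def scale_gain_bounds(scale):
    """Bounds of the gain under the assumed uniform(1 - scale, 1 + scale) rule."""
    return 1.0 - scale, 1.0 + scale


def sample_scale_gain(scale, rng=random):
    """Sample a gain, rejecting values outside the documented 0.0 - 1.0 range."""
    if not 0.0 <= scale <= 1.0:
        raise ValueError(f"scale must be within 0.0 - 1.0, got {scale}")
    low, high = scale_gain_bounds(scale)
    return rng.uniform(low, high)


print(scale_gain_bounds(0.5))  # (0.5, 1.5)
print(scale_gain_bounds(1.5))  # (-0.5, 2.5)
```

Under this assumption, any scale above 1.0 allows a negative lower bound, which would correspond to a degenerate or mirrored resize rather than a plain zoom.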
🎯 Purpose & Impact
- 🧠 Richer segmentation capabilities with SAM 3
  - Unlocks concept-level segmentation: find “all persons”, “all buses”, “person with red hat”, etc., using text or exemplar boxes rather than only point/box prompts.
  - Brings video concept tracking to Ultralytics: track semantics (e.g., “person”, “bicycle”) or specific instances across frames with SAM3’s memory-based tracker.
  - Advanced APIs (feature reuse, semantic + instance outputs, presence scores) enable efficient pipelines and research use cases.
- 🧪 More predictable, robust SAM/SAM2/SAM3 behavior
  - Enforcing square image sizes via shared stride handling avoids subtle spatial shape bugs and mismatches in encoders/decoders.
  - Improved memory encoding and non-overlap suppression reduce spurious overlaps and noisy tracks, especially in crowded scenes.
- 🚀 Better export experience across devices
  - ONNX FP16 on CPU lets you reduce model size and improve performance where GPU isn’t available while keeping I/O types stable.
  - Centralized apt handling for Edge TPU and IMX exports is more robust and easier to debug, especially on varied Debian/Ubuntu-based systems.
- 🧪 Easier experiment management & training control
  - Expanded resume overrides let you adapt jobs mid‑run (e.g., change workers, cache strategy, early stopping, validation frequency, plot generation) without throwing away progress.
- 🎯 More reliable evaluation & visualization
  - RT-DETR validation now avoids fragile manual ratio_pad hacks and is prepared for future scaling logic via scale_preds().
  - OBB plots are more stable, especially for empty detections or batched outputs, giving cleaner visual diagnostics.
- 📖 Clearer documentation & safer configs
  - SAM 3 docs now match the actual shipped API and explicitly call out weight & vocab requirements, helping users get started without guesswork.
  - Clarified scale augmentation bounds (0–1) help avoid extreme settings that could degrade training stability or accuracy.
What's Changed
- ONNX FP16 export on CPU by @glenn-jocher in #22927
- Update IMX and Edge TPU exports with check_apt_requirements function by @lakshanthad in #22925
- Correct scale range in data augmentation guide by @Y-T-G in #22907
- Expand overrideable arguments for resumed training by @Y-T-G in #22903
- Fix box scaling in predictions.json for RTDETR by @Y-T-G in #22817
- fix: 🐞 remove redundant xywh2xyxy conversion in OBBValidator.plot_predictions by @onuralpszr in #22765
- ultralytics 8.3.237 SAM3 integration by @Laughing-q in #22897
Full Changelog: v8.3.236...v8.3.237