pypi ultralytics 8.3.237
v8.3.237 - `ultralytics 8.3.237` SAM3 integration (#22897)

3 days ago

🌟 Summary

Ultralytics 8.3.237 adds full SAM 3 image & video segmentation support (including text & exemplar prompts and tracking), improves export behavior (ONNX FP16 on CPU, Edge TPU/IMX deps), and polishes training, validation, and docs for smoother day‑to‑day use. 🚀


📊 Key Changes

  • 🧠 SAM 3 integration (image & video)

    • New SAM 3 model builder stack (build_sam3.py) with ViT backbone, transformer encoder/decoder, text encoder, geometry encoders, and video tracker (SAM3Model, SAM3SemanticModel and SAM3-specific modules).
    • SAM entrypoint now detects sam3.pt and builds the SAM 3 tracker via build_interactive_sam3.
  • 🎛️ New SAM 3 predictors & APIs

    • Added predictors and public exports:
      • SAM3Predictor – SAM3-style interactive segmentation.
      • SAM3SemanticPredictor – text & exemplar based concept segmentation on images.
      • SAM3VideoPredictor – video tracking with box prompts.
      • SAM3VideoSemanticPredictor – video concept tracking (text + boxes + masklets).
    • Wired into ultralytics.models.sam.__all__ and SAM’s task_map, so SAM("sam3.pt") routes to the right predictor.
  • 🧩 SAM pipeline upgrades (SAM / SAM2 / SAM3)

    • Predictor.setup_source now accepts an explicit stride, and SAM/SAM2/SAM3 predictors use it to enforce square image sizes and consistent feature shapes.
    • SAM modules updated to support SAM3:
      • More flexible MemoryEncoder and MaskDownSampler (interpolation to fixed sizes, higher-res mask handling).
      • Memory attention can accept custom attention modules; SAM3 uses RoPE-based attention and new positional utilities (get_abs_pos, concat_rel_pos).
      • SAM2Model.set_imgsz generalized (stride no longer hardcoded to 16), and a specialized SAM3Model added with improved mask post-processing and non‑overlap suppression.
  • 🖼️ SAM 3 docs & usage examples

    • New reference docs under docs/en/reference/models/sam/sam3/* for all SAM3 modules (encoder, decoder, geometry encoders, text encoder, tokenizer, etc.).
    • docs/en/models/sam-3.md rewritten from “API preview” into concrete usage:
      • Clear warning that SAM 3 weights are not auto-downloaded – users must manually download sam3.pt from the SAM 3 repo.
      • Instructions to download the BPE vocab (bpe_simple_vocab_16e6.txt.gz) for text prompts.
      • Full Python examples for:
        • Text prompts (SAM3SemanticPredictor)
        • Box exemplar prompts
        • Reusing image features across multiple queries
        • Video concept tracking with boxes (SAM3VideoPredictor)
        • Video concept tracking with text (SAM3VideoSemanticPredictor)
        • SAM2-style visual prompts via SAM("sam3.pt") while clarifying the difference vs. concept segmentation.
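The text-prompt workflow described above can be sketched as follows. This is a minimal sketch, not the documented example: the constructor arguments (`overrides=`, `bpe_path=`), the `set_image()` call, and the `text=` keyword are assumptions about the `SAM3SemanticPredictor` interface, and `missing_sam3_assets` and `segment_by_text` are hypothetical helpers. The pre-flight check reflects the note that `sam3.pt` and the BPE vocab must be downloaded manually.

```python
from pathlib import Path


def missing_sam3_assets(weights="sam3.pt", bpe="bpe_simple_vocab_16e6.txt.gz"):
    """Return the required local files that are not present (hypothetical helper)."""
    return [p for p in (weights, bpe) if not Path(p).is_file()]


def segment_by_text(image, prompts, weights="sam3.pt", bpe="bpe_simple_vocab_16e6.txt.gz"):
    """Run text-prompted concept segmentation on one image (hypothetical helper)."""
    missing = missing_sam3_assets(weights, bpe)
    if missing:
        # SAM 3 weights are not auto-downloaded, so fail early with a clear message.
        raise FileNotFoundError(f"Download these files manually first: {missing}")

    # Imported lazily so the asset check works without ultralytics installed.
    from ultralytics.models.sam import SAM3SemanticPredictor

    # Assumed interface: predictor settings via `overrides`, vocab via `bpe_path`.
    overrides = dict(conf=0.25, task="segment", mode="predict", model=weights)
    predictor = SAM3SemanticPredictor(overrides=overrides, bpe_path=bpe)
    predictor.set_image(image)  # assumed: encodes the image once for reuse
    return predictor(text=prompts)  # assumed: text prompts passed as a list


if __name__ == "__main__":
    results = segment_by_text("path/to/image.jpg", ["person", "bus"])
```

Encoding the image once and then issuing several prompts corresponds to the "reusing image features across multiple queries" item; whether repeated calls actually reuse features depends on the shipped predictor.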
  • 🔢 ONNX FP16 export on CPU

    • The half=True restriction on CPU is now scoped to TorchScript (JIT) export only, with a clearer warning that FP16 TorchScript export requires a GPU.
    • ONNX export now supports half=True on CPU:
      • Converts model weights to FP16 using onnxruntime.transformers.float16.convert_float_to_float16(keep_io_types=True).
      • Failures downgrade gracefully with a warning instead of aborting export.
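The conversion-with-fallback behavior described above can be sketched roughly as follows. This is an illustrative sketch, not the exporter's actual code: `try_convert_onnx_to_fp16` is a hypothetical helper, and it assumes `onnx` and `onnxruntime` are installed. If they are not, or the file cannot be loaded, the function warns and returns `False` instead of raising.

```python
import warnings


def try_convert_onnx_to_fp16(onnx_path: str) -> bool:
    """Convert an ONNX file to FP16 in place; warn and return False on any failure."""
    try:
        import onnx
        from onnxruntime.transformers import float16

        model = onnx.load(onnx_path)
        # keep_io_types=True leaves graph inputs/outputs as FP32 so callers need no changes.
        model_fp16 = float16.convert_float_to_float16(model, keep_io_types=True)
        onnx.save(model_fp16, onnx_path)
        return True
    except Exception as err:  # downgrade gracefully rather than abort the export
        warnings.warn(f"FP16 conversion failed, keeping the FP32 model: {err}")
        return False


if __name__ == "__main__":
    print(try_convert_onnx_to_fp16("model.onnx"))
```

From the user's side, this should correspond to something like `YOLO("yolo11n.pt").export(format="onnx", half=True, device="cpu")`, where the model name is only an example.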
  • 🐧 Edge TPU & IMX export dependency management

    • export_edgetpu: shell apt-get calls replaced with centralized check_apt_requirements(["edgetpu-compiler"]).
    • export_imx: Java installs now use check_apt_requirements() for:
      • openjdk-21-jre on Ubuntu / Debian Trixie.
      • openjdk-17-jre on Raspberry Pi / Debian Bookworm.
    • check_apt_requirements() now runs apt update with check=True, surfacing update failures instead of silently ignoring them.
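The effect of `check=True` can be illustrated with the standard-library `subprocess` module. The snippet below is a generic demonstration, not Ultralytics code: a child Python process that exits with status 1 stands in for a failing `apt update`.

```python
import subprocess
import sys

# Stand-in for a failing `apt update`: a child process that exits with status 1.
failing_cmd = [sys.executable, "-c", "import sys; sys.exit(1)"]

# Without check=True, the failure is only visible if the caller inspects returncode.
silent = subprocess.run(failing_cmd)
print("returncode without check:", silent.returncode)

# With check=True, a non-zero exit status raises, so the failure cannot be missed.
try:
    subprocess.run(failing_cmd, check=True)
    surfaced = False
except subprocess.CalledProcessError as err:
    surfaced = True
    print("raised CalledProcessError with returncode", err.returncode)
```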
  • 🔄 More flexible resume‑training overrides

    • When resuming training, you can now override more runtime/logging parameters without restarting:
      • save_period, workers, cache, patience, time, freeze, val, plots.
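One way to picture the override behavior is an allow-list merge between the arguments stored in the checkpoint and the arguments passed at resume time. The sketch below is a simplified model, not the trainer's implementation: `merge_resume_args` is a hypothetical helper, and the allow-list contains only the keys named in this release (other keys may also be overridable in practice).

```python
# Keys named in these release notes as newly overridable when resuming.
RESUME_OVERRIDABLE = ("save_period", "workers", "cache", "patience", "time", "freeze", "val", "plots")


def merge_resume_args(ckpt_args: dict, overrides: dict) -> dict:
    """Return checkpoint args with allow-listed keys replaced by user overrides."""
    merged = dict(ckpt_args)  # copy so the checkpoint args are not mutated
    for key in RESUME_OVERRIDABLE:
        if key in overrides:
            merged[key] = overrides[key]
    return merged


ckpt_args = {"epochs": 100, "workers": 8, "patience": 100, "plots": True}
# "epochs" is not in this sketch's allow-list, so the checkpoint value is kept.
merged = merge_resume_args(ckpt_args, {"workers": 2, "patience": 20, "epochs": 5})
print(merged)  # {'epochs': 100, 'workers': 2, 'patience': 20, 'plots': True}
```

In practice this would look like `model.train(resume=True, workers=2, patience=20)` on a model loaded from the last checkpoint.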
  • 📏 RT-DETR validation scaling fix

    • Simplified RT-DETR validation transforms: removed a custom ratio_pad injection and replaced it with a clean Compose([]).
    • Added a no-op scale_preds() override to make scaling behavior explicit and safe for future changes.
  • 🧭 OBB plotting robustness

    • OBBValidator.plot_predictions() reworked to:
      • Accept prediction dicts instead of raw tensors.
      • Early‑return cleanly for empty predictions.
      • Use plot_images() directly, avoiding redundant xywh2xyxy conversions and mismatched formats.
  • 📚 Data augmentation docs: scale range clarified

    • scale hyperparameter doc updated from “≥ 0.0” to 0.0 - 1.0 in the guide and macro tables, matching real‑world usage and preventing unstable configs.
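To see why the 0.0 to 1.0 bound matters, the sketch below assumes the scale gain `s` is used to sample a resize factor uniformly from `[1 - s, 1 + s]`. That sampling rule is an assumption about how the augmentation works, not something stated in these notes, and `scale_factor_range` is a hypothetical helper. Under that assumption, any `s` above 1.0 would allow a zero or negative factor.

```python
def scale_factor_range(scale: float) -> tuple[float, float]:
    """Return the (min, max) resize factor implied by a scale gain, assuming U(1 - s, 1 + s)."""
    if not 0.0 <= scale <= 1.0:
        raise ValueError(f"scale must be within 0.0-1.0, got {scale}")
    return (1.0 - scale, 1.0 + scale)


print(scale_factor_range(0.5))  # (0.5, 1.5)
```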

🎯 Purpose & Impact

  • 🧠 Richer segmentation capabilities with SAM 3

    • Unlocks concept-level segmentation: find “all persons”, “all buses”, “person with red hat”, etc., using text or exemplar boxes rather than only point/box prompts.
    • Brings video concept tracking to Ultralytics: track semantics (e.g., “person”, “bicycle”) or specific instances across frames with SAM3’s memory-based tracker.
    • Advanced APIs (feature reuse, semantic + instance outputs, presence scores) enable efficient pipelines and research use cases.
  • 🧪 More predictable, robust SAM/SAM2/SAM3 behavior

    • Enforcing square image sizes via shared stride handling avoids subtle spatial shape bugs and mismatches in encoders/decoders.
    • Improved memory encoding and non-overlap suppression reduce spurious overlaps and noisy tracks, especially in crowded scenes.
  • 🚀 Better export experience across devices

    • ONNX FP16 on CPU lets you reduce model size and improve performance where GPU isn’t available while keeping I/O types stable.
    • Centralized apt handling for Edge TPU and IMX exports is more robust and easier to debug, especially on varied Debian/Ubuntu-based systems.
  • 🧪 Easier experiment management & training control

    • Expanded resume overrides let you adapt jobs mid‑run (e.g., change workers, cache strategy, early stopping, validation frequency, plot generation) without throwing away progress.
  • 🎯 More reliable evaluation & visualization

    • RT-DETR validation now avoids fragile manual ratio_pad hacks and is prepared for future scaling logic via scale_preds().
    • OBB plots are more stable, especially for empty detections or batched outputs, giving cleaner visual diagnostics.
  • 📖 Clearer documentation & safer configs

    • SAM 3 docs now match the actual shipped API and explicitly call out weight & vocab requirements, helping users get started without guesswork.
    • Clarified scale augmentation bounds (0–1) help avoid extreme settings that could degrade training stability or accuracy.

What's Changed

Full Changelog: v8.3.236...v8.3.237
