v8.3.196 - `ultralytics 8.3.196` `torch.compile` acceleration for 30% faster training (#21975)


🌟 Summary

Optional torch.compile acceleration lands across train/val/predict for up to ~30% faster runs, plus dataloader throughput boosts, CoreML export reliability, unified device handling, and smoother setup/CI/paths. ⚡️🧩

📊 Key Changes

  • torch.compile acceleration (primary)

    • New compile arg in train/val/predict (default False) with end-to-end wiring via config/CLI/API.
    • New helpers: attempt_compile(...) to safely enable compile and disable_dynamo(...) to opt out specific code paths.
    • Integrations:
      • Trainer compiles the model after initializing loss, marks dynamic tensors for stability, and unwraps models for EMA/checkpointing.
      • Validator can compile for standalone val; training-time final eval avoids compile for speed/stability.
      • Predictor supports compile=True for accelerated inference.
    • Utility rename: de_parallel → unwrap_model (handles both parallel and compiled models).
    • Docs updated for the new args and torch utils.
  • Faster data loading

    • Doubled the default prefetch_factor to 4 when num_workers > 0; the argument is automatically dropped on older PyTorch (<2.0), where it is unsupported, to avoid errors.
    • Safer drop_last behavior during compile-enabled training to improve shape stability.
  • Unified, safer device handling

    • Centralized “move batch tensors to device” logic across detection/pose/segment/YOLOE to reduce CPU/GPU mismatches and duplicated code.
  • CoreML export robustness

    • Cleanup of CoreML NMS pipeline: direct use of spec outputs, explicit shape setting when needed, consistent IO names, and simpler wiring for more reliable exports on macOS/Linux/Windows.
  • Plotting stability

    • Added @plt_settings() to feature_visualization(...) for backend-safe, non-blocking feature map plots.
  • Config directory resolution

    • Smarter get_user_config_dir(): honors YOLO_CONFIG_DIR, follows OS conventions (XDG on Linux), and gracefully falls back to writable paths like /tmp.
  • Compatibility and CI polish

    • PyTorch–TorchVision compatibility matrix updated for PyTorch 2.8 / TorchVision 0.23 and PyTorch 2.9 / TorchVision 0.24.
    • GitHub Actions bumps: setup-python v6 and actions/stale v10.
    • Minor typo fix in a deprecation warning.
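The "safely enable compile" helper above follows a common try-and-fall-back pattern. Here is a minimal sketch of that idea; the real attempt_compile(...) lives in ultralytics' torch utilities and calls torch.compile directly, while this sketch takes the compile step as a parameter so the fallback behavior is clear on its own:

```python
# Sketch of the "attempt to compile, fall back gracefully" pattern (an
# illustration, not the actual ultralytics implementation).
def attempt_compile(model, compile_fn):
    """Return compile_fn(model), or the original model if compilation fails."""
    try:
        return compile_fn(model)
    except Exception:
        # Compilation is an optimization, never a requirement: on any backend
        # or tracing error, keep the eager-mode model and continue.
        return model
```

With PyTorch installed, compile_fn would simply be torch.compile, so callers get a compiled model when the backend supports it and the unmodified model otherwise.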

🎯 Purpose & Impact

  • Speedups you can feel
    • Training can be up to ~30% faster with torch.compile; inference can also benefit on supported devices (CUDA/CPU/MPS). 🚀
  • One-line opt-in
    • CLI: yolo train model=yolo11n.pt data=coco8.yaml epochs=100 compile=True
    • Python:
      from ultralytics import YOLO
      model = YOLO("yolo11n.pt")
      model.train(data="coco8.yaml", epochs=100, compile=True)
      # Also works for val/predict:
      model.val(compile=True)
      model.predict("img.jpg", compile=True)
  • Fewer runtime hiccups
    • Centralized device transfer cuts down on “tensor on CPU vs GPU” issues. ✅
    • Dataloader tweaks reduce bottlenecks and avoid PyTorch version pitfalls. 🧠
  • Better exports and environments
    • CoreML exports are more consistent across platforms, improving deployment on Apple ecosystems. 🍎
    • Updated PyTorch–TorchVision checks reduce install/runtime confusion on new stacks. 🧩
  • Smoother headless/CI and container use
    • Plotting and config dir improvements prevent blocked UIs, backend errors, and permission issues in constrained environments. 🛡️
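The dataloader version-pitfall fix amounts to gating prefetch_factor on both worker count and PyTorch version. A rough sketch of that logic (assumed shape, not the actual ultralytics code):

```python
# prefetch_factor is only valid when workers are used, and the argument
# itself only exists in PyTorch >= 2.0, so build DataLoader kwargs
# conditionally instead of passing it unconditionally.
def dataloader_kwargs(num_workers, torch_version):
    """Build DataLoader keyword args that are safe for the given PyTorch version."""
    kwargs = {"num_workers": num_workers}
    if num_workers > 0 and torch_version >= (2, 0):
        kwargs["prefetch_factor"] = 4  # doubled default for throughput
    return kwargs
```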

Primary PR: “ultralytics 8.3.196 torch.compile acceleration” by @glenn-jocher (adds compile flag, utilities, and engine integrations).
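Among those utilities, the de_parallel → unwrap_model rename reflects that a model may now be wrapped twice. A sketch of what such a helper does (an assumed implementation; torch.compile exposes the original module as `_orig_mod`, and DataParallel/DDP expose it as `.module`):

```python
# Peel off both wrappers to reach the bare module, e.g. for EMA updates
# or checkpoint saving.
def unwrap_model(model):
    """Return the underlying module behind compile and parallel wrappers."""
    m = getattr(model, "_orig_mod", model)  # compiled wrapper -> original module
    return getattr(m, "module", m)          # parallel wrapper -> inner module
```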

What's Changed

Full Changelog: v8.3.195...v8.3.196
