🌟 Summary
Optional `torch.compile` acceleration lands across train/val/predict for up to ~30% faster runs, plus dataloader throughput boosts, CoreML export reliability, unified device handling, and smoother setup/CI/paths. ⚡️🧩
📊 Key Changes
- `torch.compile` acceleration (primary)
  - New `compile` arg in train/val/predict (default `False`) with end-to-end wiring via config/CLI/API.
  - New helpers: `attempt_compile(...)` to safely enable compile and `disable_dynamo(...)` to opt specific code paths out (a general-pattern sketch follows below).
  - Integrations:
    - Trainer compiles the model after initializing loss, marks dynamic tensors for stability, and unwraps models for EMA/checkpointing.
    - Validator can compile for standalone val; training-time final eval avoids compile for speed/stability.
    - Predictor supports `compile=True` for accelerated inference.
  - Utility rename: `de_parallel` ➝ `unwrap_model` (handles both parallel and compiled models).
  - Docs updated for the new args and torch utils.
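The general pattern behind these helpers is sketched below; `attempt_compile` and `unwrap_model` are real helper names from this release, but the signatures and bodies here are illustrative assumptions, not the actual Ultralytics implementations:

```python
import torch
import torch.nn as nn


def attempt_compile(model: nn.Module) -> nn.Module:
    """Illustrative: try torch.compile and fall back to the eager model on failure."""
    if not hasattr(torch, "compile"):  # torch < 2.0: no compile support
        return model
    try:
        return torch.compile(model)
    except Exception as e:  # compile can fail on unsupported ops/backends
        print(f"torch.compile unavailable, continuing in eager mode: {e}")
        return model


def unwrap_model(model: nn.Module) -> nn.Module:
    """Illustrative: strip DataParallel/DDP and torch.compile wrappers to reach the base module."""
    if isinstance(model, (nn.DataParallel, nn.parallel.DistributedDataParallel)):
        model = model.module
    return getattr(model, "_orig_mod", model)  # compiled models keep the original module in _orig_mod
```

Unwrapping matters for EMA/checkpointing because a compiled model stores the original module in `_orig_mod`; saving the unwrapped module's `state_dict()` keeps checkpoints independent of the compile wrapper.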
- Faster data loading
  - Doubled default `prefetch_factor` to 4 when `num_workers > 0`, and it is omitted automatically on older PyTorch (<2.0) to avoid errors (see the sketch below).
  - Safer `drop_last` behavior during compile-enabled training to improve shape stability.
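As a rough sketch of the dataloader-side logic described above (the function name `build_loader` and the exact version guard are assumptions, not the Ultralytics builder):

```python
import torch
from torch.utils.data import DataLoader


def build_loader(dataset, batch_size: int, num_workers: int, compiled: bool = False) -> DataLoader:
    """Hypothetical sketch: pass prefetch_factor only when it is valid, and stabilize shapes for compile."""
    extra = {}
    if num_workers > 0 and int(torch.__version__.split(".")[0]) >= 2:
        extra["prefetch_factor"] = 4  # doubled from the old default of 2
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        drop_last=compiled,  # avoid a ragged last batch that would force torch.compile recompilation
        **extra,
    )
```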
- Unified, safer device handling
  - Centralized "move batch tensors to device" logic across detection/pose/segment/YOLOE to reduce CPU/GPU mismatches and duplicated code (sketch below).
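The centralized transfer likely reduces to a small helper along these lines; `batch_to_device` is a hypothetical name for illustration:

```python
import torch


def batch_to_device(batch: dict, device: torch.device, non_blocking: bool = True) -> dict:
    """Hypothetical helper: move every Tensor in the batch dict to `device`, leaving other values untouched."""
    return {
        k: v.to(device, non_blocking=non_blocking) if isinstance(v, torch.Tensor) else v
        for k, v in batch.items()
    }
```

Each task's `preprocess_batch` can then delegate to this one code path instead of re-implementing the loop per task.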
- CoreML export robustness
  - Cleanup of the CoreML NMS pipeline: direct use of spec outputs, explicit shape setting when needed, consistent IO names, and simpler wiring for more reliable exports on macOS/Linux/Windows (sketch below).
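For a flavor of spec-level IO cleanup, here is a minimal sketch using coremltools' real `rename_feature` utility; the function `tidy_coreml_io` and the target output names are illustrative assumptions, not the exporter's actual code:

```python
import coremltools as ct


def tidy_coreml_io(model: "ct.models.MLModel") -> "ct.models.MLModel":
    """Hypothetical cleanup: give exported outputs stable, consistent names via the spec."""
    spec = model.get_spec()
    current = [out.name for out in spec.description.output]
    for old, new in zip(current[:2], ("confidence", "coordinates")):  # example target names
        if old != new:
            ct.utils.rename_feature(spec, old, new)  # real coremltools utility
    return ct.models.MLModel(spec)
```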
- Plotting stability
  - Added `@plt_settings()` to `feature_visualization(...)` for backend-safe, non-blocking feature map plots (sketch below).
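A decorator like `@plt_settings()` typically swaps in a non-interactive backend around the call; the sketch below shows the general idea and is not the actual Ultralytics implementation:

```python
import functools

import matplotlib


def plt_settings(backend: str = "Agg"):
    """Illustrative decorator: run a plotting function under a non-interactive backend, then restore."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            original = matplotlib.get_backend()
            try:
                matplotlib.use(backend)  # headless-safe; no window, no blocking plt.show()
                return func(*args, **kwargs)
            finally:
                matplotlib.use(original)  # restore the caller's backend

        return wrapper

    return decorator
```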
- Config directory resolution
  - Smarter `get_user_config_dir()`: honors `YOLO_CONFIG_DIR`, follows OS conventions (XDG on Linux), and gracefully falls back to writable paths like `/tmp` (sketch below).
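The resolution order described above might look roughly like this sketch; the platform branches and fallback path are assumptions for illustration:

```python
import os
import platform
from pathlib import Path


def get_user_config_dir(sub_dir: str = "Ultralytics") -> Path:
    """Illustrative resolution: env override -> OS convention -> writable fallback."""
    if env := os.getenv("YOLO_CONFIG_DIR"):  # explicit override wins
        path = Path(env)
    elif platform.system() == "Windows":
        path = Path.home() / "AppData" / "Roaming" / sub_dir
    elif platform.system() == "Darwin":  # macOS
        path = Path.home() / "Library" / "Application Support" / sub_dir
    else:  # Linux and friends: honor XDG
        path = Path(os.getenv("XDG_CONFIG_HOME", str(Path.home() / ".config"))) / sub_dir
    try:
        path.mkdir(parents=True, exist_ok=True)
        return path
    except OSError:  # e.g. read-only home in containers
        return Path("/tmp") / sub_dir
```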
- Compatibility and CI polish
  - TorchVision compatibility matrix updated for PyTorch 2.8 / torchvision 0.23 and PyTorch 2.9 / torchvision 0.24.
  - GitHub Actions bumps: actions/setup-python v6 and actions/stale v10.
  - Minor typo fix in a deprecation warning.
🎯 Purpose & Impact
- Speedups you can feel
  - Training can be up to ~30% faster with `torch.compile`; inference can also benefit on supported devices (CUDA/CPU/MPS). 🚀
- One-line opt-in
  - CLI: `yolo train model=yolo11n.pt data=coco8.yaml epochs=100 compile=True`
  - Python:

    ```python
    from ultralytics import YOLO

    model = YOLO("yolo11n.pt")
    model.train(data="coco8.yaml", epochs=100, compile=True)

    # Also works for val/predict:
    model.val(compile=True)
    model.predict("img.jpg", compile=True)
    ```
- Fewer runtime hiccups
  - Centralized device transfer cuts down on "tensor on CPU vs GPU" issues. ✅
  - Dataloader tweaks reduce bottlenecks and avoid PyTorch version pitfalls. 🧠
- Better exports and environments
  - CoreML exports are more consistent across platforms, improving deployment on Apple ecosystems. 🍎
  - Updated PyTorch–TorchVision checks reduce install/runtime confusion on new stacks. 🧩
- Smoother headless/CI and container use
  - Plotting and config-dir improvements prevent blocked UIs, backend errors, and permission issues in constrained environments. 🛡️
Primary PR: “ultralytics 8.3.196 torch.compile acceleration” by @glenn-jocher (adds compile flag, utilities, and engine integrations).
What's Changed
- Add `@plt_settings()` decorator to `feature_visualization()` by @glenn-jocher in #21973
- Cleanup CoreML NMS pipeline code by @Y-T-G in #21970
- Double default Dataloader `prefetch_factor` to 4 by @glenn-jocher in #21974
- Update torchvision compat matrix with 2.8 and 2.9 by @glenn-jocher in #21978
- Fix overly verbose USER_CONFIG_DIR checks by @glenn-jocher in #21980
- Fix missing Tensors on device in Trainer and Validator `preprocess_batch` methods by @glenn-jocher in #21981
- Bump actions/setup-python from 5 to 6 in /.github/workflows by @dependabot[bot] in #21984
- Bump actions/stale from 9 to 10 in /.github/workflows by @dependabot[bot] in #21983
- Fix typo in `deprecation_warn` method by @RizwanMunawar in #21987
- ultralytics 8.3.196 `torch.compile` acceleration for 30% faster training by @glenn-jocher in #21975
Full Changelog: v8.3.195...v8.3.196