🌟 Summary
ultralytics 8.3.199 boosts startup speed with lazy model loading, refines export/runtime stability, and modernizes GPU Docker docs, delivering faster imports, smoother deployments, and clearer tooling. ⚡🐳
📊 Key Changes
- Lazy model loading for faster imports (primary)
  - Models like YOLO, SAM, RTDETR, NAS, YOLOE, FastSAM, and YOLOWorld are now loaded on first access via `__getattr__`, preserving the public API. `import ultralytics` is about 3% faster.
    See PR: 3% Faster Imports with Lazy Loading (#21985) by @RizwanMunawar
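The mechanism behind this change is module-level `__getattr__` (PEP 562). The sketch below illustrates the general technique with stand-in names (`_LAZY_ATTRS`, a `demo` module, and `math.sqrt` in place of a heavy model class); it is not the actual ultralytics implementation.

```python
import importlib
import types

# attr -> (module, symbol); math.sqrt stands in for a heavy model class
_LAZY_ATTRS = {"sqrt": ("math", "sqrt")}


def _make_lazy_module(name, lazy_attrs):
    """Build a module whose listed attributes import lazily on first access."""
    mod = types.ModuleType(name)

    def __getattr__(attr):  # PEP 562: called only when normal lookup fails
        if attr in lazy_attrs:
            module_name, symbol = lazy_attrs[attr]
            value = getattr(importlib.import_module(module_name), symbol)
            setattr(mod, attr, value)  # cache: later lookups skip __getattr__
            return value
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    mod.__getattr__ = __getattr__
    return mod


demo = _make_lazy_module("demo", _LAZY_ATTRS)
print(demo.sqrt(9.0))  # the deferred import happens here, on first access
```

Because the attribute is cached in the module dict after the first lookup, subsequent accesses bypass `__getattr__` entirely, so the laziness costs nothing once a model class has been used.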
- More consistent export outputs
  - The quantized export NMS wrapper now returns unpackable tensors `(boxes, scores, labels, n_valid)` for non-keypoint tasks; keypoint outputs are unchanged.
    See PR: Fix imx object detection export outputs (#22045) by @Laughing-q
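A hedged sketch of consuming such a four-tuple downstream. The shape conventions here are assumptions for illustration (outputs padded to a fixed `max_det`, with `n_valid` giving the count of real detections per image); plain lists stand in for exported tensors.

```python
def unpack_detections(boxes, scores, labels, n_valid):
    """Trim padded per-image NMS outputs down to the valid detections."""
    results = []
    for b, s, l, n in zip(boxes, scores, labels, n_valid):
        results.append({"boxes": b[:n], "scores": s[:n], "labels": l[:n]})
    return results


# One image, padded to max_det=4, but only 2 real detections.
boxes = [[[0, 0, 10, 10], [5, 5, 20, 20], [0, 0, 0, 0], [0, 0, 0, 0]]]
scores = [[0.9, 0.8, 0.0, 0.0]]
labels = [[1, 3, 0, 0]]
n_valid = [2]

dets = unpack_detections(boxes, scores, labels, n_valid)
print(dets[0]["labels"])  # only the two valid labels survive the trim
```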
- Smarter TensorRT installation on Linux
  - Auto-installs the CUDA-matching TensorRT wheel (e.g., `tensorrt-cu12`) and blocks known-bad versions for more reliable exports.
    See PR: Specify CUDA version during TensorRT installation (#22060) by @Y-T-G
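The idea of matching the wheel to the CUDA version can be sketched as below. The function name and the version mapping are illustrative assumptions, not the actual ultralytics selection logic.

```python
def tensorrt_package(cuda_version):
    """Return a TensorRT pip package name matching a CUDA major version."""
    major = cuda_version.split(".")[0]
    if major in ("11", "12"):
        return f"tensorrt-cu{major}"  # CUDA-specific wheel, e.g. tensorrt-cu12
    return "tensorrt"  # fall back to the generic wheel


print(tensorrt_package("12.4"))  # a CUDA 12.x toolkit maps to tensorrt-cu12
```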
- Safer torch.compile defaults
  - `attempt_compile()` now warns on `mode="max-autotune"` and uses `max-autotune-no-cudagraphs` instead; docs updated accordingly.
    See PR: Add warning and default to no-cudagraphs (#22040) by @Y-T-G
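The warn-and-substitute behavior described above can be sketched as follows. The function name and message wording are illustrative, not the ultralytics `attempt_compile()` API.

```python
import warnings


def resolve_compile_mode(mode):
    """Swap the CUDA-Graphs-enabling mode for its safer variant, with a warning."""
    if mode == "max-autotune":
        warnings.warn(
            "mode='max-autotune' enables CUDA Graphs, which can cause issues; "
            "using 'max-autotune-no-cudagraphs' instead"
        )
        return "max-autotune-no-cudagraphs"
    return mode  # all other modes pass through unchanged


print(resolve_compile_mode("max-autotune"))
```

The resolved string would then be passed straight to `torch.compile(model, mode=...)`.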
- Clearer hyperparameter tuning plots
  - A new default, `plot_tune_results(..., exclude_zero_fitness_points=True)`, filters out zero-fitness points for cleaner visuals.
    See PR: Exclude zero-fitness points in Tuner plots (#22047) by @glenn-jocher
- Robustness fix for custom model parsing
  - Prevents undefined `scale` errors in `parse_model()` when `scales` isn't provided.
    See PR: Fix undefined variable in parse_model (#22054) by @Y-T-G
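A minimal sketch of the kind of guard this implies: only select a scale when the config actually defines a `scales` table, so a missing section cannot leave the variable undefined. The function, keys, and default are hypothetical stand-ins, not the real `parse_model()` internals.

```python
def resolve_scale(cfg, default="n"):
    """Return the scale key to apply, or None when no scales are defined."""
    scales = cfg.get("scales")
    if not scales:
        return None  # no scaling table: skip scaling instead of raising NameError
    return cfg.get("scale") or default  # explicit choice, else fall back to "n"


print(resolve_scale({"scales": {"n": [0.50, 0.25, 1024]}}))  # table present: a key is chosen
print(resolve_scale({}))  # no scales section: None, not an error
```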
- GPU test coverage re-enabled
  - ONNX export with NMS for OBB is re-enabled; CUDA export tests and GPU benchmarks now run when GPUs are available.
    See PR: Re-enable TensorRT export in GPU tests (#22062) by @Laughing-q
- Docker docs modernized for NVIDIA Container Toolkit
  - Replaces the deprecated NVIDIA Docker approach with the NVIDIA Container Toolkit; adds distro-specific install steps and standardizes `--runtime=nvidia`.
    See PRs: Docker Quickstart update (#21994), Standardize GPU Docker commands (#22052) by @onuralpszr
- CI reliability and maintenance
  - GPU runner updated to `gpu-latest`; Slack alerts now target specific failed jobs; the runner image version is parameterized.
    See PRs: Update GPU runner label (#22051) by @glenn-jocher, Refine Slack notifications (#22012) by @lakshanthad, Parametrize runner version in Dockerfile (#22049) by @glenn-jocher
- New reference docs for lazy imports
  - Adds a reference page explaining lazy imports in `ultralytics/__init__.py`.
🎯 Purpose & Impact
- Faster startup and the same API
  - Importing Ultralytics is quicker with zero code changes. You can still do `from ultralytics import YOLO` or `ultralytics.YOLO("yolo11n.pt")`.
- More reliable deployment pipelines 🧰
  - Standardized NMS export outputs simplify integration with ONNX/TensorRT and downstream code.
  - Correct TensorRT package selection reduces install/export friction on Linux.
- Safer compilation defaults 🛡️
  - `torch.compile` now prefers `max-autotune-no-cudagraphs`, avoiding CUDA Graphs issues while keeping performance benefits.
- Cleaner experiment insights 📈
  - Tuning plots focus on meaningful runs by default, making it easier to spot what works.
- Improved docs and GPU usability 🧪
  - NVIDIA Container Toolkit guidance and consistent `--runtime=nvidia` examples make GPU containers more predictable across distros.
- Better CI signal and stability
  - Targeted Slack alerts and updated runners improve reliability without affecting user-facing features.
Helpful snippets:

- Import remains the same:

  ```python
  from ultralytics import YOLO

  model = YOLO("yolo11n.pt")
  ```

- Tuner plots with zero-fitness points visible (previous behavior):

  ```python
  from ultralytics.utils.plotting import plot_tune_results

  plot_tune_results("tune_results.csv", exclude_zero_fitness_points=False)
  ```

- GPU Docker run example:

  ```bash
  sudo docker run -it --ipc=host --runtime=nvidia --gpus all ultralytics/ultralytics:latest
  ```
What's Changed
- docs: Update Docker Quickstart Guide to include NVIDIA Container Toolkit by @onuralpszr in #21994
- Fix `imx` object detection export outputs by @Laughing-q in #22045
- Fix Slack notifications on scheduled CI failure by @lakshanthad in #22012
- Exclude zero-fitness points from Tuner plots by @glenn-jocher in #22047
- Add warning when using `mode="max-autotune"` with `compile` by @Y-T-G in #22040
- Parameterize runner version in Dockerfile-runner by @glenn-jocher in #22049
- Update ci.yml for A100 GPU DDP runners by @glenn-jocher in #22051
- Re-enable TensorRT export in GPU tests by @Laughing-q in #22062
- Fix undefined variable error in `parse_model()` by @Y-T-G in #22054
- docs: Update Docker commands to use NVIDIA runtime for GPU support by @onuralpszr in #22052
- Specify CUDA version during TensorRT installation by @Y-T-G in #22060
- 3% Faster Ultralytics Imports with Lazy Model Loading by @RizwanMunawar in #21985
Full Changelog: v8.3.198...v8.3.199