🌟 Summary
ultralytics 8.3.199 boosts startup speed with lazy model loading, refines export/runtime stability, and modernizes GPU Docker docs, delivering faster imports, smoother deployments, and clearer tooling. ⚡🐳
📊 Key Changes
- Lazy model loading for faster imports (primary)
  - Models like YOLO, SAM, RTDETR, NAS, YOLOE, FastSAM, and YOLOWorld are now loaded on first access via `__getattr__`, preserving the public API. `import ultralytics` is about 3% faster.
    See PR: 3% Faster Imports with Lazy Loading (#21985) by @RizwanMunawar
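The mechanism behind this change is module-level `__getattr__` (PEP 562). The sketch below illustrates the general technique with stand-in names (`_LAZY_ATTRS`, a `demo` module, and `math.sqrt` in place of a heavy model class); it is not the actual ultralytics implementation.

```python
import importlib
import types

# attr -> (module, symbol); math.sqrt stands in for a heavy model class
_LAZY_ATTRS = {"sqrt": ("math", "sqrt")}


def _make_lazy_module(name, lazy_attrs):
    """Build a module whose listed attributes import lazily on first access."""
    mod = types.ModuleType(name)

    def __getattr__(attr):  # PEP 562: called only when normal lookup fails
        if attr in lazy_attrs:
            module_name, symbol = lazy_attrs[attr]
            value = getattr(importlib.import_module(module_name), symbol)
            setattr(mod, attr, value)  # cache: later lookups skip __getattr__
            return value
        raise AttributeError(f"module {name!r} has no attribute {attr!r}")

    mod.__getattr__ = __getattr__
    return mod


demo = _make_lazy_module("demo", _LAZY_ATTRS)
print(demo.sqrt(9.0))  # the deferred import happens here, on first access
```

Because the attribute is cached in the module dict after the first lookup, subsequent accesses bypass `__getattr__` entirely, so the laziness costs nothing once a model class has been used.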
- More consistent export outputs
  - The quantized export NMS wrapper now returns unpackable tensors `(boxes, scores, labels, n_valid)` for non-keypoint tasks; keypoint outputs are unchanged.
    See PR: Fix imx object detection export outputs (#22045) by @Laughing-q
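A hedged sketch of consuming such a four-tuple downstream. The shape conventions here are assumptions for illustration (outputs padded to a fixed `max_det`, with `n_valid` giving the count of real detections per image); plain lists stand in for exported tensors.

```python
def unpack_detections(boxes, scores, labels, n_valid):
    """Trim padded per-image NMS outputs down to the valid detections."""
    results = []
    for b, s, l, n in zip(boxes, scores, labels, n_valid):
        results.append({"boxes": b[:n], "scores": s[:n], "labels": l[:n]})
    return results


# One image, padded to max_det=4, but only 2 real detections.
boxes = [[[0, 0, 10, 10], [5, 5, 20, 20], [0, 0, 0, 0], [0, 0, 0, 0]]]
scores = [[0.9, 0.8, 0.0, 0.0]]
labels = [[1, 3, 0, 0]]
n_valid = [2]

dets = unpack_detections(boxes, scores, labels, n_valid)
print(dets[0]["labels"])  # only the two valid labels survive the trim
```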
- Smarter TensorRT installation on Linux
  - Auto-installs the CUDA-matching TensorRT wheel (e.g., `tensorrt-cu12`) and blocks known-bad versions for more reliable exports.
    See PR: Specify CUDA version during TensorRT installation (#22060) by @Y-T-G
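The idea of matching the wheel to the CUDA version can be sketched as below. The function name and the version mapping are illustrative assumptions, not the actual ultralytics selection logic.

```python
def tensorrt_package(cuda_version):
    """Return a TensorRT pip package name matching a CUDA major version."""
    major = cuda_version.split(".")[0]
    if major in ("11", "12"):
        return f"tensorrt-cu{major}"  # CUDA-specific wheel, e.g. tensorrt-cu12
    return "tensorrt"  # fall back to the generic wheel


print(tensorrt_package("12.4"))  # a CUDA 12.x toolkit maps to tensorrt-cu12
```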
- Safer torch.compile defaults
  - `attempt_compile()` now warns on `mode="max-autotune"` and uses `max-autotune-no-cudagraphs` instead; docs updated accordingly.
    See PR: Add warning and default to no-cudagraphs (#22040) by @Y-T-G
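The warn-and-substitute behavior described above can be sketched as follows. The function name and message wording are illustrative, not the ultralytics `attempt_compile()` API.

```python
import warnings


def resolve_compile_mode(mode):
    """Swap the CUDA-Graphs-enabling mode for its safer variant, with a warning."""
    if mode == "max-autotune":
        warnings.warn(
            "mode='max-autotune' enables CUDA Graphs, which can cause issues; "
            "using 'max-autotune-no-cudagraphs' instead"
        )
        return "max-autotune-no-cudagraphs"
    return mode  # all other modes pass through unchanged


print(resolve_compile_mode("max-autotune"))
```

The resolved string would then be passed straight to `torch.compile(model, mode=...)`.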
- Clearer hyperparameter tuning plots
  - A new default, `plot_tune_results(..., exclude_zero_fitness_points=True)`, filters out zero-fitness points for cleaner visuals.
    See PR: Exclude zero-fitness points in Tuner plots (#22047) by @glenn-jocher
- Robustness fix for custom model parsing
  - Prevents undefined `scale` errors in `parse_model()` when `scales` isn't provided.
    See PR: Fix undefined variable in parse_model (#22054) by @Y-T-G
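A minimal sketch of the kind of guard this implies: only select a scale when the config actually defines a `scales` table, so a missing section cannot leave the variable undefined. The function, keys, and default are hypothetical stand-ins, not the real `parse_model()` internals.

```python
def resolve_scale(cfg, default="n"):
    """Return the scale key to apply, or None when no scales are defined."""
    scales = cfg.get("scales")
    if not scales:
        return None  # no scaling table: skip scaling instead of raising NameError
    return cfg.get("scale") or default  # explicit choice, else fall back to "n"


print(resolve_scale({"scales": {"n": [0.50, 0.25, 1024]}}))  # table present: a key is chosen
print(resolve_scale({}))  # no scales section: None, not an error
```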
- GPU test coverage re-enabled
  - ONNX export with NMS for OBB is re-enabled; CUDA export tests and GPU benchmarks now run when GPUs are available.
    See PR: Re-enable TensorRT export in GPU tests (#22062) by @Laughing-q
- Docker docs modernized for NVIDIA Container Toolkit
  - Replaces the deprecated NVIDIA Docker approach with the NVIDIA Container Toolkit; adds distro-specific install steps and standardizes `--runtime=nvidia`.
    See PRs: Docker Quickstart update (#21994), Standardize GPU Docker commands (#22052) by @onuralpszr
- CI reliability and maintenance
  - GPU runner updated to `gpu-latest`; Slack alerts now target specific failed jobs; the runner image version is parameterized.
    See PRs: Update GPU runner label (#22051) by @glenn-jocher, Refine Slack notifications (#22012) by @lakshanthad, Parametrize runner version in Dockerfile (#22049) by @glenn-jocher
- New reference docs for lazy imports
  - Adds a reference page explaining lazy imports in `ultralytics/__init__.py`.
🎯 Purpose & Impact
- Faster startup and the same API
  - Importing Ultralytics is quicker with zero code changes. You can still do `from ultralytics import YOLO` or `ultralytics.YOLO("yolo11n.pt")`.
- More reliable deployment pipelines 🧰
  - Standardized NMS export outputs simplify integration with ONNX/TensorRT and downstream code.
  - Correct TensorRT package selection reduces install/export friction on Linux.
- Safer compilation defaults 🛡️
  - `torch.compile` now prefers `max-autotune-no-cudagraphs`, avoiding CUDA Graphs issues while keeping performance benefits.
- Cleaner experiment insights 📈
  - Tuning plots focus on meaningful runs by default, making it easier to spot what works.
- Improved docs and GPU usability 🧪
  - NVIDIA Container Toolkit guidance and consistent `--runtime=nvidia` examples make GPU containers more predictable across distros.
- Better CI signal and stability
  - Targeted Slack alerts and updated runners improve reliability without affecting user-facing features.
Helpful snippets:

- Import remains the same:

  ```python
  from ultralytics import YOLO

  model = YOLO("yolo11n.pt")
  ```

- Tuner plots with zero-fitness points visible (previous behavior):

  ```python
  from ultralytics.utils.plotting import plot_tune_results

  plot_tune_results("tune_results.csv", exclude_zero_fitness_points=False)
  ```

- GPU Docker run example:

  ```bash
  sudo docker run -it --ipc=host --runtime=nvidia --gpus all ultralytics/ultralytics:latest
  ```
What's Changed
- docs: Update Docker Quickstart Guide to include NVIDIA Container Toolkit by @onuralpszr in #21994
- Fix `imx` object detection export outputs by @Laughing-q in #22045
- Fix Slack notifications on scheduled CI failure by @lakshanthad in #22012
- Exclude zero-fitness points from Tuner plots by @glenn-jocher in #22047
- Add warning when using `mode="max-autotune"` with `compile` by @Y-T-G in #22040
- Parameterize runner version in Dockerfile-runner by @glenn-jocher in #22049
- Update ci.yml for A100 GPU DDP runners by @glenn-jocher in #22051
- Re-enable TensorRT export in GPU tests by @Laughing-q in #22062
- Fix undefined variable error in `parse_model()` by @Y-T-G in #22054
- docs: Update Docker commands to use NVIDIA runtime for GPU support by @onuralpszr in #22052
- Specify CUDA version during TensorRT installation by @Y-T-G in #22060
- 3% Faster Ultralytics Imports with Lazy Model Loading by @RizwanMunawar in #21985
Full Changelog: v8.3.198...v8.3.199