## Summary
Expands safe FP16 (half-precision) export support across ONNX and TorchScript, improves NCNN export reliability by switching to the ONNX pipeline, and delivers faster RT-DETR inference via anchor caching, plus clearer docs and smoother dependency handling.
## Key Changes
- **FP16 export behavior (PR #22316)**
  - Extends FP16 handling to both ONNX and TorchScript exports.
  - Enforces the GPU-only rule for FP16: if `half=True` is set on CPU, export now warns and automatically uses `half=False`.
  - Expands tests to cover ONNX FP16 combinations for higher reliability.
  - Docs updated to clearly state FP16 constraints (not compatible with INT8 or CPU-only export).
- **NCNN export pipeline update (PR #22315)**
  - NCNN export now uses ONNX → PNNX instead of TorchScript → PNNX for better compatibility and stability.
  - Automatically triggers an ONNX export when `format='ncnn'`; TorchScript is only produced if explicitly requested.
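The ONNX-first pipeline means an NCNN request implicitly pulls in an ONNX export. A minimal sketch of that selection rule (the function name and structure here are hypothetical, not Ultralytics internals):

```python
def formats_to_produce(requested):
    """Return which export artifacts get built for a request (hypothetical sketch).

    NCNN now converts via ONNX -> PNNX, so asking for 'ncnn' implicitly adds
    'onnx'; TorchScript is built only when explicitly requested.
    """
    needed = set(requested)
    if "ncnn" in needed:
        needed.add("onnx")  # ONNX is the intermediate for NCNN conversion
    return needed
```

So `formats_to_produce(["ncnn"])` yields both `"ncnn"` and `"onnx"`, while `"torchscript"` appears only if explicitly requested.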
- **RT-DETR performance optimization (PR #22318)**
  - Caches decoder anchors and masks; regenerates them only if input shapes change or `dynamic=True`.
  - Reduces redundant computation to speed up training and inference.
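The caching idea can be sketched in plain Python: keep the generated anchors keyed by the feature-map shapes and rebuild only on a shape change or when `dynamic=True`. This is an illustrative sketch, not the actual RTDETR head code (`AnchorCache` and its methods are hypothetical):

```python
class AnchorCache:
    """Shape-keyed cache for decoder anchors (illustrative sketch, not Ultralytics code)."""

    def __init__(self, dynamic=False):
        self.dynamic = dynamic  # dynamic shapes force regeneration on every call
        self._key = None
        self._anchors = None

    def get(self, shapes):
        key = tuple(shapes)
        if self.dynamic or key != self._key:  # rebuild only when needed
            self._key = key
            self._anchors = self._build(shapes)
        return self._anchors

    @staticmethod
    def _build(shapes):
        # Toy anchor generation: one (x, y) grid point per cell, per level
        return [(x, y) for h, w in shapes for y in range(h) for x in range(w)]
```

Repeated calls with the same shapes return the cached object, skipping regeneration in the common fixed-shape inference loop.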
- **Smarter dependency checks with interchangeable packages (PR #22321)**
  - `check_requirements()` now supports alternatives like `("onnxruntime", "onnxruntime-gpu")`, making installs more flexible.
  - Internal usage updated (e.g., ONNX benchmarking) to accept either CPU or GPU variants.
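"Interchangeable" here means any one of the listed packages satisfies the requirement. A rough sketch of that resolution rule, assuming a hypothetical helper (not the real `check_requirements` internals, which also compare distribution metadata and versions):

```python
from importlib.util import find_spec


def requirement_met(req):
    """Treat a tuple as alternatives: any importable member satisfies it.

    Hypothetical helper; real dependency checks inspect installed distribution
    metadata and version constraints, which this sketch skips.
    """
    names = req if isinstance(req, tuple) else (req,)
    return any(find_spec(name) is not None for name in names)
```

With this rule, a CPU environment with only `onnxruntime` and a GPU environment with only `onnxruntime-gpu` both pass the same requirement.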
- **Validation docs clarity (PR #22319)**
  - Clarifies that validation uses `model.names` (the model's class set), not the dataset YAML classes.
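In other words, per-class validation results are labeled from the model's own `names` mapping, so renaming classes in the dataset YAML does not change the reported labels. A toy illustration (the helper below is hypothetical, not library code):

```python
def label_per_class_results(per_class_map, model_names):
    """Attach class labels to per-class metrics using model.names (toy sketch)."""
    return {model_names[idx]: value for idx, value in per_class_map.items()}


model_names = {0: "person", 1: "bicycle"}          # what the model was trained with
dataset_yaml_names = {0: "pedestrian", 1: "bike"}  # ignored when labeling results

results = label_per_class_results({0: 0.71, 1: 0.58}, model_names)
```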
- **Better examples for `check_requirements` usage (PR #22317)**
  - Shows passing custom pip args (e.g., the PyTorch CPU index) and updates version constraints.
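Those custom pip arguments end up appended to the generated `pip install` command. A simplified sketch of how that composition could look (`build_pip_command` is hypothetical, not the library's internals):

```python
def build_pip_command(packages, cmds=""):
    """Compose a pip install command with optional extra arguments (hypothetical sketch)."""
    quoted = " ".join(f'"{p}"' for p in packages)
    return f"pip install --no-cache-dir {quoted} {cmds}".strip()


# e.g. point pip at the PyTorch CPU wheel index
cmd = build_pip_command(["torch"], cmds="--index-url https://download.pytorch.org/whl/cpu")
```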
- **CI and maintenance improvements**
## Purpose & Impact
- **Safer, clearer FP16 exports on GPU only**
  - Prevents confusing or invalid CPU FP16 exports by auto-correcting and warning.
  - Users get predictable, faster FP16 ONNX/TorchScript exports on supported hardware.
- **More reliable NCNN exports**
  - ONNX-based NCNN conversion reduces TorchScript-related issues and simplifies the export path.
  - Improves portability and success rate across platforms.
- **Faster RT-DETR workflows**
  - Anchor/mask caching trims unnecessary computation, accelerating common inference loops.
- **Smoother installs and environment setup**
  - Interchangeable package support (e.g., `onnxruntime` vs. `onnxruntime-gpu`) reduces friction across CPU/GPU setups.
  - Updated examples help users configure installs for their hardware quickly.
- **Reduced test flakes and cleaner maintenance**
  - More robust tests and streamlined CI/dependency settings lower maintenance noise without user-facing changes.
Quick examples:

Export ONNX in FP16 on GPU:

```python
from ultralytics import YOLO

YOLO("yolo11n.pt").export(format="onnx", half=True, device=0)  # GPU FP16
```

Attempting FP16 export on CPU will warn and auto-disable FP16:

```python
YOLO("yolo11n.pt").export(format="onnx", half=True, device="cpu")  # warns and uses half=False
```

Check requirements with interchangeable packages:

```python
from ultralytics.utils.checks import check_requirements

check_requirements([("onnxruntime", "onnxruntime-gpu"), "numpy"])
```
## What's Changed

- Show `check_requirements` pip install extras example by @glenn-jocher in #22317
- Add `missing_ok=True` to TensorRT INT8 cache cleanup test by @glenn-jocher in #22320
- Fix validation data argument and class names description by @jb297686 in #22319
- Drop deprecated dependabot fields by @Borda in #22322
- Interchangeable packages feature for `check_requirements()` by @glenn-jocher in #22321
- Cache anchors in RTDETR head to avoid repeated initialization by @Y-T-G in #22318
- Bump astral-sh/setup-uv from 6 to 7 in /.github/workflows by @dependabot[bot] in #22327
- Create NCNN from ONNX model instead of TorchScript by @Y-T-G in #22315
- ultralytics 8.3.208 Expand FP16 export support by @glenn-jocher in #22316
## New Contributors
**Full Changelog**: v8.3.207...v8.3.208