Summary
v8.4.72 is a small but important stability release that mainly fixes a TensorRT INT8 export crash on some RTX GPUs, while also improving export environment reliability and cleaning up docs/CI.
Key Changes
- Fixed TensorRT INT8 export crashes on certain RTX cards
  - The most important change in this release is from PR #24876 by @glenn-jocher.
  - Exporting models with `format="engine", int8=True` could fail on some RTX GPUs, including cases where ONNX Runtime exposed both TensorRT execution providers.
  - Ultralytics changed the INT8 calibration step to use CUDA with CPU fallback instead of triggering conflicting TensorRT provider combinations.
- Improved TensorRT/ONNX export image compatibility for CUDA 12
  - PR #24877 pins `onnxruntime-gpu` to below 1.27.0 in the export Docker image.
  - This avoids breakage caused by newer ONNX Runtime GPU builds that now expect CUDA 13, while the export image still uses CUDA 12.8.
- GitHub Actions checkout updated from v6 to v7
  - PR #24873 refreshes CI workflows to a newer `actions/checkout` version.
  - This is a maintenance update for the project's automation pipeline.
- Documentation updates and cleanup
  - PR #24863 refreshes the Queue Management guide with a newer YOLO26 video and updated wording.
  - PR #24875 simplifies the Triton C++ example README by removing extra contributor footer content.
  - PR #24851 adds clearer page titles for account settings and YOLO configuration docs, helping site clarity and SEO.
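
The "CUDA with CPU fallback" behavior described for the INT8 calibration fix can be pictured with a small sketch. This is not the code from PR #24876; it only illustrates the idea of choosing an explicit provider list that skips every TensorRT-based provider. The helper name `pick_calibration_providers` is hypothetical, and the provider strings are assumed to match the names ONNX Runtime reports.

```python
# Illustrative sketch only; not the implementation from PR #24876.
# The helper name is hypothetical and the provider strings are assumed
# to match what ONNX Runtime reports on an affected RTX system.

CUDA_EP = "CUDAExecutionProvider"
CPU_EP = "CPUExecutionProvider"


def pick_calibration_providers(available):
    """Return CUDA (if present) followed by CPU, ignoring TensorRT providers."""
    chosen = [ep for ep in (CUDA_EP, CPU_EP) if ep in available]
    # Guard against an empty list by always falling back to the CPU provider.
    return chosen or [CPU_EP]


# Example provider list for a machine that exposes both TensorRT providers.
rtx_box = [
    "NvTensorRTRTXExecutionProvider",
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]

print(pick_calibration_providers(rtx_box))
print(pick_calibration_providers(["CPUExecutionProvider"]))
```

With ONNX Runtime installed, the input list would typically come from `onnxruntime.get_available_providers()`, and the result could be passed as the `providers` argument of `onnxruntime.InferenceSession`.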
Purpose & Impact
- More reliable INT8 TensorRT exports on RTX hardware
  - Users exporting optimized TensorRT engines should see fewer failures on affected NVIDIA RTX systems.
  - This is especially helpful for developers deploying compact, fast INT8 inference pipelines.
- Better support for modern RTX environments
  - The fix avoids a low-level ONNX Runtime provider conflict that was causing exports to crash before the model was even fully built.
  - In simple terms: INT8 export now works more reliably on hardware that previously broke unexpectedly.
- Safer Docker-based export workflows
  - Pinning `onnxruntime-gpu` prevents version mismatches that could break ONNX inference and export tests inside CUDA 12 environments.
  - This should make CI and container-based deployment setups more predictable.
- No major model architecture changes
  - This release does not introduce a new model or training feature.
  - The focus is on export stability, environment compatibility, and documentation polish.
- Practical benefit for users
  - If you use Ultralytics for training and then export to TensorRT INT8 for production, this release is worth adopting.
  - If you mainly use standard training or FP16 export, the impact is smaller but still positive thanks to general reliability improvements.
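
For anyone maintaining their own CUDA 12 environment, the `onnxruntime-gpu<1.27.0` pin can be checked with a rough sketch like the one below. It is a simplified comparison on the numeric release segment only, not a full PEP 440 implementation, and the function names `parse_release` and `below_upper_bound` are hypothetical.

```python
import re


def parse_release(version):
    """Return the leading numeric release as a 3+ tuple, e.g. '1.26.1.post1' -> (1, 26, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"no numeric release segment in {version!r}")
    parts = [int(p) for p in match.group().split(".")]
    # Pad short versions so that '1.27' compares equal to '1.27.0'.
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def below_upper_bound(version, upper="1.27.0"):
    """True if `version` is strictly below `upper` (numeric release segment only)."""
    return parse_release(version) < parse_release(upper)


for candidate in ("1.26.1", "1.27.0", "1.28.2"):
    print(candidate, below_upper_bound(candidate))
```

In a real environment the installed version string could be read with `importlib.metadata.version("onnxruntime-gpu")`; for anything beyond a quick sanity check, a dedicated version-parsing library is the more robust choice.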
What's Changed
- Add https://youtu.be/TEVPiGCxB0o to docs by @RizwanMunawar in #24863
- docs: small cleanup triton contributions by @onuralpszr in #24875
- Pin onnxruntime-gpu<1.27.0 for CUDA 12 export image by @glenn-jocher in #24877
- Bump actions/checkout from v6 to v7 in /.github/workflows by @UltralyticsAssistant in #24873
- Fix duplicate title tags on Spanish account settings and YOLO config pages by @miles-deans-ultralytics in #24851
- Fix TensorRT INT8 export crash on RTX cards exposing the NvTensorRTRTX EP by @glenn-jocher in #24876
Full Changelog: v8.4.71...v8.4.72