pypi ultralytics 8.4.72
v8.4.72 - Fix TensorRT INT8 export crash on RTX cards exposing the NvTensorRTRTX EP (#24876)

3 hours ago

🌟 Summary

v8.4.72 is a small but important stability release that mainly fixes a TensorRT INT8 export crash on some RTX GPUs 🚀, while also improving export environment reliability and cleaning up docs/CI.

📊 Key Changes

  • Fixed TensorRT INT8 export crashes on certain RTX cards 🔧

    • The most important change in this release is from PR #24876 by @glenn-jocher.
    • Exporting models with format="engine", int8=True could fail on some RTX GPUs, including cases where ONNX Runtime exposed both TensorRT execution providers.
    • Ultralytics changed the INT8 calibration step to use CUDA with CPU fallback instead of triggering conflicting TensorRT provider combinations.
  • Improved TensorRT/ONNX export image compatibility for CUDA 12 🐳

    • PR #24877 pins onnxruntime-gpu to below 1.27.0 in the export Docker image.
    • This avoids breakage caused by newer ONNX Runtime GPU builds that now expect CUDA 13, while the export image still uses CUDA 12.8.
  • GitHub Actions checkout updated from v6 to v7 ⚙️

    • PR #24873 refreshes CI workflows to a newer actions/checkout version.
    • This is a maintenance update for the project's automation pipeline.
  • Documentation updates and cleanup 📚

    • PR #24863 refreshes the Queue Management guide with a newer YOLO26 video and updated wording.
    • PR #24875 simplifies the Triton C++ example README by removing extra contributor footer content.
    • PR #24851 adds clearer page titles for account settings and YOLO configuration docs, helping site clarity and SEO.
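
The calibration change in the first item above can be pictured as a provider filter. The following is a minimal sketch, not the code from PR #24876: the helper name calibration_providers is hypothetical, and only the provider name strings are real ONNX Runtime identifiers. It takes a list of available providers (for example, what onnxruntime.get_available_providers() returns) and keeps CUDA with CPU fallback, so neither TensorRT provider is ever selected.

```python
# Hypothetical helper for illustration; not the Ultralytics implementation.
CUDA_EP = "CUDAExecutionProvider"
CPU_EP = "CPUExecutionProvider"


def calibration_providers(available):
    """Return CUDA (if present) followed by CPU, ignoring every other provider."""
    providers = [CUDA_EP] if CUDA_EP in available else []
    providers.append(CPU_EP)  # CPU is always kept as the fallback
    return providers


# An RTX machine where ONNX Runtime exposes both TensorRT execution providers.
rtx_available = [
    "NvTensorRTRTXExecutionProvider",
    "TensorrtExecutionProvider",
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]
print(calibration_providers(rtx_available))
```

A list built this way could be passed as the providers argument of onnxruntime.InferenceSession, which tries providers in the order given.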

🎯 Purpose & Impact

  • More reliable INT8 TensorRT exports on RTX hardware 💡

    • Users exporting optimized TensorRT engines should see fewer failures on affected NVIDIA RTX systems.
    • This is especially helpful for developers deploying compact, fast INT8 inference pipelines.
  • Better support for modern RTX environments 🖥️

    • The fix avoids a low-level ONNX Runtime provider conflict that was causing exports to crash before the model was even fully built.
    • In simple terms: INT8 export now works more reliably on hardware where it previously crashed unexpectedly.
  • Safer Docker-based export workflows 🛡️

    • Pinning onnxruntime-gpu prevents version mismatches that could break ONNX inference and export tests inside CUDA 12 environments.
    • This should make CI and container-based deployment setups more predictable.
  • No major model architecture changes 📌

    • This release does not introduce a new model or training feature.
    • The focus is on export stability, environment compatibility, and documentation polish.
  • Practical benefit for users ✅

    • If you use Ultralytics for training and then export to TensorRT INT8 for production, this release is worth adopting.
    • If you mainly use standard training or FP16 export, the impact is smaller but still positive thanks to general reliability improvements.
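
The onnxruntime-gpu pin mentioned above can also be checked by hand in a CUDA 12 environment before running an export. This is a minimal sketch that assumes plain dotted numeric version strings; the helpers parse_version and satisfies_pin are hypothetical, and the 1.27.0 bound is the one described for PR #24877.

```python
def parse_version(version):
    """Turn '1.26.1' into (1, 26, 1); assumes plain dotted integers only."""
    return tuple(int(part) for part in version.split("."))


def satisfies_pin(installed, upper_bound="1.27.0"):
    """True when the installed version is strictly below the upper bound."""
    return parse_version(installed) < parse_version(upper_bound)


print(satisfies_pin("1.26.1"))  # True
print(satisfies_pin("1.27.0"))  # False
```

The installed version string can be read with importlib.metadata.version("onnxruntime-gpu"). In a requirements file, the same constraint is written as onnxruntime-gpu<1.27.0.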

What's Changed

Full Changelog: v8.4.71...v8.4.72
