Summary
Ultralytics v8.4.63 mainly adds TensorRT 11 export support with FP16 and INT8 quantization via NVIDIA ModelOpt, while also expanding built-in multi-object tracking with several new tracker options and improving reliability, performance, and docs.
Key Changes
- Major export upgrade: TensorRT 11 support
- The standout update in this release is PR #24735 by @onuralpszr.
- Ultralytics can now export models to NVIDIA TensorRT 11. Export previously failed on this version because of API changes in TensorRT 11.
- For FP16 and INT8 export on TensorRT 11, Ultralytics now uses NVIDIA ModelOpt instead of the older TensorRT builder flags and calibrator interface.
- This keeps export working across:
  - TensorRT 7–10 with the legacy path
  - TensorRT 11 with the new strongly-typed ModelOpt path
- Tested successfully for fp32, fp16, and int8, including INT8 with dynamic shapes on TensorRT 11.
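The version split described here can be sketched as a small dispatch helper. This is an illustrative sketch only, not the exporter's actual code: `select_trt_export_path` and `build_export_kwargs` are hypothetical names, and versions outside 7 to 11 are simply rejected because the notes do not cover them. The `format`, `half`, `int8`, `dynamic`, and `data` keys are existing Ultralytics export arguments.

```python
from typing import Optional


def select_trt_export_path(trt_version: str) -> str:
    """Pick the export path for a TensorRT version string (hypothetical helper)."""
    major = int(trt_version.split(".")[0])
    if 7 <= major <= 10:
        return "legacy"  # builder flags and calibrator interface
    if major == 11:
        return "modelopt"  # precision is baked into the ONNX graph before engine build
    raise ValueError(f"TensorRT {trt_version} is not covered by this sketch")


def build_export_kwargs(precision: str, dynamic: bool = False, data: Optional[str] = None) -> dict:
    """Translate a precision name into Ultralytics export keyword arguments."""
    if precision not in {"fp32", "fp16", "int8"}:
        raise ValueError(f"Unknown precision: {precision}")
    kwargs = {
        "format": "engine",
        "half": precision == "fp16",
        "int8": precision == "int8",
        "dynamic": dynamic,
    }
    if precision == "int8" and data is not None:
        kwargs["data"] = data  # dataset YAML that supplies calibration images
    return kwargs
```

Assuming `ultralytics` and TensorRT are installed, the result could be passed straight to an export call, for example `YOLO("yolo11n.pt").export(**build_export_kwargs("int8", dynamic=True, data="coco8.yaml"))`, where the weights and dataset names are only examples.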
- Better quantization workflow for modern NVIDIA deployment
- FP16 is now applied by baking mixed precision into the ONNX graph before engine build.
- INT8 is now applied through explicit quantization in the ONNX graph with calibration data.
- This is especially important because TensorRT 11 removed the old methods many exporters depended on.
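To make "explicit quantization with calibration data" more concrete, here is a conceptual sketch of symmetric per-tensor INT8 quantization. It is not ModelOpt's implementation, whose calibration algorithm and granularity may differ, and all function names are hypothetical. The idea is that calibration data fixes a scale, and quantize/dequantize steps using that scale are written into the graph rather than chosen by the engine builder.

```python
from typing import Iterable


def calibrate_scale(calibration_batches: Iterable[Iterable[float]]) -> float:
    """Derive a symmetric INT8 scale from the largest absolute calibration value."""
    amax = max(abs(v) for batch in calibration_batches for v in batch)
    if amax == 0:
        raise ValueError("Calibration data is all zeros; cannot derive a scale")
    return amax / 127.0


def quantize(x: float, scale: float) -> int:
    """Quantize step: divide by the scale, round, and clamp to the INT8 range."""
    return max(-128, min(127, round(x / scale)))


def dequantize(q: int, scale: float) -> float:
    """Dequantize step: map the INT8 value back to an approximate float."""
    return q * scale
```

Values beyond the calibrated range saturate at the INT8 limits, which is why representative calibration data matters.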
- Big tracking expansion: 4 new built-in trackers
- PR #24371 added four new multi-object trackers alongside BoT-SORT and ByteTrack:
  - OC-SORT
  - Deep OC-SORT
  - FastTracker
  - TrackTrack
- These are now documented and wired into the tracking system with YAML configs: ocsort.yaml, deepocsort.yaml, fasttrack.yaml, and tracktrack.yaml.
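Selecting one of these trackers might look like the sketch below. The YAML file names come from the notes above, plus the existing `botsort.yaml` and `bytetrack.yaml`; `TRACKER_CONFIGS`, `tracker_config`, and `run_tracking` are hypothetical helpers, and the sketch assumes the new configs are passed through the existing `tracker` argument of `model.track` in the same way as the older ones.

```python
# Tracker names mapped to their YAML configs (keys are the YAML file stems).
TRACKER_CONFIGS = {
    "botsort": "botsort.yaml",
    "bytetrack": "bytetrack.yaml",
    "ocsort": "ocsort.yaml",
    "deepocsort": "deepocsort.yaml",
    "fasttrack": "fasttrack.yaml",
    "tracktrack": "tracktrack.yaml",
}


def tracker_config(name: str) -> str:
    """Return the YAML config file name for a tracker key (case-insensitive)."""
    key = name.lower()
    if key not in TRACKER_CONFIGS:
        raise ValueError(f"Unknown tracker '{name}', expected one of {sorted(TRACKER_CONFIGS)}")
    return TRACKER_CONFIGS[key]


def run_tracking(weights: str, source: str, tracker: str = "ocsort"):
    """Run tracking on a video source with the chosen tracker config."""
    # Imported lazily so tracker_config() works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.track(source=source, tracker=tracker_config(tracker))
```

A call such as `run_tracking("yolo11n.pt", "video.mp4", tracker="tracktrack")` would then use tracktrack.yaml; the weights and video names are placeholders.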
- Tracking docs and selection guidance improved
- Tracking docs were expanded to explain the strengths of each tracker and how to choose between them.
- This makes the tracking feature much easier to use for both beginners and advanced users.
- Video stream loading is safer
- PR #24749 improves cleanup when stream initialization fails.
- If one stream fails after others have already opened, Ultralytics now properly closes those partially-opened resources instead of leaking threads or capture handles.
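The cleanup pattern described here can be illustrated in isolation. This is a generic sketch, not the actual `LoadStreams` code: `FakeStream` stands in for a capture handle plus reader thread, and `open_streams` is a hypothetical helper.

```python
class FakeStream:
    """Stand-in for a capture handle; URLs starting with 'bad' fail to open."""

    def __init__(self, url: str):
        if url.startswith("bad"):
            raise ConnectionError(f"Failed to open {url}")
        self.url = url
        self.closed = False

    def close(self) -> None:
        self.closed = True


def open_streams(urls, factory=FakeStream):
    """Open every stream, or close the ones already opened and re-raise."""
    opened = []
    try:
        for url in urls:
            opened.append(factory(url))
    except Exception:
        # Release everything that was opened before the failure so that
        # no handles or threads outlive the failed initialization.
        for stream in opened:
            stream.close()
        raise
    return opened
```

The caller still sees the original exception, but nothing that was opened before the failure is left dangling.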
- AI Gym pose workflow is faster
- PR #24744 reduces repeated GPU-to-CPU syncs in the workout monitoring loop by transferring keypoints to CPU in one go instead of piece by piece.
- Same behavior, better efficiency.
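The difference between the two transfer patterns can be sketched as follows. This is not the AI Gym code itself: `CountingTensor` is a minimal stand-in that counts device-to-host transfers so the example runs without a GPU, and both function names are hypothetical. The functions only rely on `[]`, `.cpu()`, and `.tolist()`, which PyTorch tensors also provide.

```python
class CountingTensor:
    """Minimal stand-in for a GPU tensor that counts .cpu() transfers."""

    def __init__(self, rows, counter=None):
        self.rows = rows
        self.counter = counter if counter is not None else {"transfers": 0}

    def __getitem__(self, index):
        # Indexing returns a view that shares the transfer counter.
        return CountingTensor(self.rows[index], self.counter)

    def cpu(self):
        self.counter["transfers"] += 1
        return self

    def tolist(self):
        return self.rows


def keypoints_per_point(kpts, indices):
    """Slow pattern: one device-to-host transfer for every keypoint."""
    return [kpts[i].cpu().tolist() for i in indices]


def keypoints_batched(kpts, indices):
    """Faster pattern: transfer the whole tensor once, then index on the host."""
    host = kpts.cpu().tolist()
    return [host[i] for i in indices]
```

Both functions return the same values; only the number of transfers, and therefore synchronization points, differs.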
- Validation mixed precision handling simplified
- PR #24736 consolidates validation autocast into one cleaner scope during training validation.
- This helps keep mixed-precision behavior simpler and more robust.
- Documentation improvements
- Added new Rust inference documentation for running YOLO models through ONNX Runtime without Python.
- Expanded tracking docs and reference pages.
- Updated TensorRT and Jetson docs to explain TensorRT 11 behavior and DLA limitations.
- Refreshed guides like Coral Edge TPU and semantic image search.
Purpose & Impact
- TensorRT 11 users can export again
- This is the biggest user-facing change in the release.
- If you deploy on modern NVIDIA systems using TensorRT 11, exports that were failing should now work again.
- This is especially valuable for production inference pipelines and edge/server deployment.
- Future-proofs NVIDIA deployment
- TensorRT 11 changed how precision and quantization are handled.
- By moving to a ModelOpt-based workflow, Ultralytics stays compatible with newer NVIDIA tooling instead of relying on removed APIs.
- Smaller, faster engines remain accessible
- FP16 and INT8 exports are still available even with TensorRT 11's breaking changes.
- That means users can continue optimizing for:
  - faster inference
  - lower memory use
  - better deployment efficiency
- Dynamic INT8 export support is especially useful
- Supporting INT8 with dynamic shapes on TensorRT 11 can help users deploying across varying input sizes without giving up quantization benefits.
- Tracking becomes more flexible for real-world scenarios
- The new trackers give users more choices depending on their needs:
  - simple baseline tracking
  - crowded-scene tracking
  - appearance-aware tracking
  - occlusion-heavy tracking
- This can improve results in surveillance, sports, traffic, and retail use cases.
- More stable long-running applications
- Stream cleanup improvements reduce the chance of lingering resources in apps that open cameras or network streams.
- This matters most for production or multi-stream systems.
- Small but meaningful performance wins
- AI Gym and validation updates help reduce overhead and improve runtime efficiency without changing how users interact with the API.
Overall, v8.4.63 is a strong deployment-focused release. The headline improvement is restored and modernized TensorRT 11 export support, alongside a major boost to tracking capabilities and several reliability and performance refinements.
What's Changed
- Consolidate validation autocast scope by @glenn-jocher in #24736
- Convert SAM3 docstrings to Google style by @glenn-jocher in #24741
- Merge isolated_model and isolated_task_model into isolated_model_path by @Laughing-q in #24742
- Avoid per-keypoint gpu sync in workout monitoring loop by @raimbekovm in #24744
- New TrackTrack, FastTracker, OC-SORT and Deep OC-SORT Trackers by @onuralpszr in #24371
- Reduce CI PyTorch index flakiness by @glenn-jocher in #24753
- Fix Codecov OIDC uploads by @glenn-jocher in #24754
- docs: π Add Rust Inference documentation by @onuralpszr in #24712
- Report SystemLogger disk list by @glenn-jocher in #24758
- Fix instances count logging by @Y-T-G in #24738
- Clean up partially-initialized LoadStreams on failure by @raimbekovm in #24749
- Restructure Coral Edge TPU guide and fix FAQ Edge TPU model filename by @raimbekovm in #24745
- Restructure semantic image search guide by @raimbekovm in #24746
- NVIDIA TensorRT 11 support with ModelOpt FP16 and INT8 quantization by @onuralpszr in #24735
Full Changelog: v8.4.62...v8.4.63