ultralytics 8.4.65 on Python PyPI

🌟 Summary

v8.4.65 is mainly a mobile inference performance release 📱⚡. The biggest upgrade improves QNN exports for Snapdragon NPUs, and semantic segmentation outputs get faster by moving costly postprocessing (ArgMax) inside the model graph.

📊 Key Changes

🚀 Major QNN export performance upgrade in PR #24790 by @glenn-jocher
- QNN exports now use channel-last (NHWC) input, which better matches Qualcomm Hexagon NPU hardware and camera-native image layouts.
- This avoids extra data reordering at inference time, reducing overhead on both the app side and the NPU side.
- The change is implemented through new export wrappers such as QNNModel, replacing fragile post-export graph editing.
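The reordering that channel-last input avoids can be illustrated with a minimal pure-Python sketch. It assumes a camera frame stored as interleaved height × width × channel (HWC) pixels, with nested lists standing in for tensors; `hwc_to_chw` is a hypothetical helper for illustration, not an Ultralytics function. A channel-first (NCHW) model needs this extra pass over every element on each frame, while an NHWC model can consume the buffer as-is.

```python
def hwc_to_chw(frame):
    """Reorder an H x W x C nested list into C x H x W (hypothetical helper).

    This is the per-frame transpose an app must perform before feeding a
    channel-first (NCHW) model; a channel-last (NHWC) model skips it.
    """
    height = len(frame)
    width = len(frame[0])
    channels = len(frame[0][0])
    return [
        [[frame[y][x][c] for x in range(width)] for y in range(height)]
        for c in range(channels)
    ]


# A tiny 2 x 2 RGB "camera frame" in interleaved (channel-last) order.
frame_hwc = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (10, 20, 30)],
]

frame_chw = hwc_to_chw(frame_hwc)
# Each channel is now a separate H x W plane.
print(frame_chw[0])  # red plane: [[255, 0], [0, 10]]
```

On real hardware this copy touches every pixel of every frame, which is the overhead the NHWC input layout is meant to remove.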
🧠 Semantic segmentation exports are much faster on-device
- For QNN and CoreML semantic models, export now embeds ArgMax / class-map generation directly in the graph.
- Instead of returning large floating-point logits that apps must decode afterward, exported models can now return a compact per-pixel class map directly.
- A new ClassMapModel wrapper handles this behavior during export.
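To make the in-graph ArgMax idea concrete, here is a minimal pure-Python sketch under stated assumptions: nested lists stand in for tensors, `fake_model` stands in for a real network, and `ClassMapWrapper` is a hypothetical class that only illustrates the wrapper pattern; it is not the actual ClassMapModel implementation. The wrapper runs the model and reduces C × H × W logits to an H × W map of class IDs before anything is returned to the caller.

```python
from typing import Callable, List

Logits = List[List[List[float]]]  # C x H x W scores
ClassMap = List[List[int]]        # H x W class IDs


def argmax_over_classes(logits: Logits) -> ClassMap:
    """Return, for every pixel, the index of the highest-scoring class."""
    num_classes, height, width = len(logits), len(logits[0]), len(logits[0][0])
    return [
        [max(range(num_classes), key=lambda c: logits[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


class ClassMapWrapper:
    """Hypothetical wrapper: run a model, then apply ArgMax to its logits.

    The consumer receives an H x W map of class IDs instead of the
    much larger C x H x W float logits.
    """

    def __init__(self, model: Callable[[object], Logits]):
        self.model = model

    def __call__(self, image) -> ClassMap:
        return argmax_over_classes(self.model(image))


def fake_model(_image) -> Logits:
    # 3 classes over a 2 x 2 image; each pixel has a clear winner.
    return [
        [[0.9, 0.1], [0.2, 0.3]],    # class 0
        [[0.05, 0.8], [0.1, 0.2]],   # class 1
        [[0.05, 0.1], [0.7, 0.5]],   # class 2
    ]


wrapped = ClassMapWrapper(fake_model)
print(wrapped(None))  # [[0, 1], [2, 2]]
```

Wrapping the model, rather than editing an exported graph afterwards, keeps the postprocessing step in ordinary model code that the exporter traces like any other layer.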
📱 Better mobile runtime compatibility
- Semantic predict/validation code was updated so Ultralytics can correctly handle both:
  - traditional logits outputs
  - new exported class-map outputs
- This makes the faster export behavior work more cleanly across deployment and evaluation workflows.
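A consumer that accepts both output styles can be sketched as follows. This is a hypothetical illustration, not the Ultralytics predictor code: nested lists stand in for tensors, `to_class_map` is an invented helper name, and nesting depth is used to tell raw logits (three levels) apart from an exported class map (two levels).

```python
def argmax_over_classes(logits):
    """Per-pixel index of the highest-scoring class in C x H x W logits."""
    num_classes, height, width = len(logits), len(logits[0]), len(logits[0][0])
    return [
        [max(range(num_classes), key=lambda c: logits[c][y][x]) for x in range(width)]
        for y in range(height)
    ]


def to_class_map(output):
    """Accept either C x H x W logits or an H x W class map (hypothetical helper)."""
    if isinstance(output[0][0], list):  # three levels deep: raw logits
        return argmax_over_classes(output)
    # Two levels deep: already a class map. Casting to int is an assumption,
    # covering runtimes that might return class IDs as floats.
    return [[int(v) for v in row] for row in output]


logits = [
    [[0.9, 0.1], [0.2, 0.3]],
    [[0.05, 0.8], [0.1, 0.2]],
    [[0.05, 0.1], [0.7, 0.5]],
]
exported_map = [[0.0, 1.0], [2.0, 2.0]]

# Both output styles decode to the same class map.
print(to_class_map(logits) == to_class_map(exported_map))  # True
```

Checking nesting depth is only one way to distinguish the formats; code working with real tensors would more likely inspect their shape or dtype.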
📉 Measured impact highlighted in docs
- QNN semantic decoding times, previously erratic, are now far more stable and predictable.
- CoreML semantic exports also saw notable end-to-end speedups from in-graph class-map generation.
🛠️ RKNN export improvements
- RKNN export now properly supports FP16 intermediate ONNX export.
- Added RKNN INT8 export test coverage to improve reliability.
- RKNN export docs/tables were updated accordingly.
🐳 Docker build reliability improvements
- Dockerfiles now use more cache-conscious install patterns and extra cleanup steps to reduce disk pressure during builds.
- The Docker workflow was simplified by removing an extra assistant-trigger step.
📚 Docs and export guidance improvements
- Expanded and refreshed docs for CoreML and QNN, including clearer deployment guidance and performance notes.
- Export docs now better explain YOLO26 end-to-end detection output formats.
- Tracking docs now clarify support for OBB models as well.
- Rust inference docs were updated to ultralytics-inference version 0.0.19.
✅ CI and workflow fixes
- Docs redeploys now trigger more reliably when documentation changes land on main.
- CUDA training tests are now explicitly skipped on Jetson devices, making CI behavior cleaner and easier to maintain.

🎯 Purpose & Impact

⚡ Faster mobile inference on Snapdragon devices
Users deploying YOLO on Qualcomm hardware should see lower overhead and better real-world efficiency, especially in camera-based apps where image buffers are already channel-last.
🧩 Less app-side postprocessing work
By returning semantic class maps directly from exported models, apps spend less CPU time decoding large segmentation outputs.
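A back-of-the-envelope size comparison shows why this matters. The sketch assumes float32 logits (4 bytes per value) and uint8 class IDs (1 byte, enough for up to 256 classes); the 512 × 512 resolution and 19-class count are illustrative values, not figures from this release, and real exported dtypes may differ.

```python
def output_bytes(height, width, num_classes, logit_bytes=4, class_id_bytes=1):
    """Compare raw output sizes for per-class logits vs. a per-pixel class map.

    Assumes float32 logits and uint8 class IDs by default.
    """
    logits = num_classes * height * width * logit_bytes
    class_map = height * width * class_id_bytes
    return logits, class_map


logits, class_map = output_bytes(512, 512, num_classes=19)
print(logits, class_map, logits // class_map)  # 19922944 262144 76
```

Under these assumptions the logits are num_classes × 4 times larger than the class map, and decoding them also costs a comparison across every class at every pixel on the app's CPU.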
📈 Better and more stable semantic segmentation performance
This is especially important for real-time mobile experiences, where unpredictable postprocessing delays can cause lag or jitter.
🔒 More robust export pipeline
Replacing manual ONNX graph surgery with clean wrapper modules makes exports easier to maintain and less error-prone over time.
📱🍎 Benefits extend beyond QNN
CoreML semantic exports also gain from the same in-graph class-map idea, so Apple-device deployment gets a speed boost too.
🛠️ Improved reliability for deployment workflows
Better RKNN testing, cleaner Docker builds, and more dependable docs publishing all reduce friction for developers working across edge and production environments.
👥 Broad user impact
- Mobile and edge developers get the biggest benefit from this release.
- Semantic segmentation users should see the clearest speed and usability gains.
- General users also benefit from clearer docs, cleaner CI, and more reliable export behavior.

In short: v8.4.65 is a strong deployment-focused release 🎉, with the standout improvement being faster, more hardware-friendly QNN exports and smarter semantic segmentation outputs for mobile AI.

What's Changed

Improve skipping training on Jetson Test CIs by @lakshanthad in #24557
Trigger docs redeploys on changes by @glenn-jocher in #24784
Replace Enterprise '50+ members' with custom team size in Platform docs by @mykolaxboiko in #24785
Normalize YAML docs links by @glenn-jocher in #24788
Fix docs deploy change detection by @glenn-jocher in #24787
Improve docs, comments, and correctness by @glenn-jocher in #24789
Enable RKNN FP16 export and add RKNN INT8 test by @Laughing-q in #24762
Remove Docker workflow assistant trigger by @glenn-jocher in #24783
docs: ultralytics-inference version to 0.0.19 in rust docs by @onuralpszr in #24793
QNN channel-last + semantic ArgMax performance improvements by @glenn-jocher in #24790

Full Changelog: v8.4.64...v8.4.65

ultralytics 8.4.65 v8.4.65 - QNN channel-last + semantic ArgMax performance improvements (#24790) on Python PyPI
