pypi ultralytics 8.4.65
v8.4.65 - QNN channel-last + semantic ArgMax performance improvements (#24790)


🌟 Summary

v8.4.65 is mainly a mobile inference performance release 📱⚡, with the biggest upgrade improving QNN exports for Snapdragon NPUs and speeding up semantic segmentation outputs by moving costly work inside the model graph.

📊 Key Changes

  • 🚀 Major QNN export performance upgrade in PR #24790 by @glenn-jocher

    • QNN exports now use channel-last (NHWC) input, which better matches Qualcomm Hexagon NPU hardware and camera-native image layouts.
    • This avoids extra data reordering at inference time, reducing overhead on both the app side and the NPU side.
    • Implemented through new export wrapper modules such as QNNModel, rather than fragile post-export graph editing.
  • 🧠 Semantic segmentation exports are much faster on-device

    • For QNN and CoreML semantic models, export now embeds ArgMax / class-map generation directly in the graph.
    • Instead of returning large floating-point logits that apps must decode afterward, exported models can now return a compact per-pixel class map directly.
    • New ClassMapModel wrapper handles this behavior during export.
  • 📱 Better mobile runtime compatibility

    • Semantic predict/validation code was updated so Ultralytics can correctly handle both:
      • traditional logits outputs
      • new exported class-map outputs
    • This makes the faster export behavior work more cleanly across deployment and evaluation workflows.
  • 📉 Measured impact highlighted in docs

    • QNN semantic segmentation latency is now much more stable: previously erratic decoding times have been replaced by far more predictable behavior.
    • CoreML semantic exports also saw notable end-to-end speedups from in-graph class-map generation.
  • 🛠️ RKNN export improvements

    • RKNN export now properly supports FP16 intermediate ONNX export.
    • Added RKNN INT8 export test coverage to improve reliability.
    • RKNN export docs/tables were updated accordingly.
  • 🐳 Docker build reliability improvements

    • Dockerfiles now use more cache-conscious install patterns and extra cleanup steps to reduce disk pressure during builds.
    • The Docker workflow was simplified by removing an extra assistant-trigger step.
  • 📚 Docs and export guidance improvements

    • Expanded and refreshed docs for CoreML and QNN, including clearer deployment guidance and performance notes.
    • Export docs now better explain YOLO26 end-to-end detection output formats.
    • Tracking docs now clarify support for OBB models as well.
    • Rust inference docs were updated to ultralytics-inference version 0.0.19.
  • CI and workflow fixes

    • Docs redeploys now trigger more reliably when documentation changes land on main.
    • CUDA training tests are now explicitly skipped on Jetson devices, making CI behavior cleaner and easier to maintain.
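
The two wrapper ideas above (channel-last input for QNN and in-graph ArgMax for semantic models) can be sketched in PyTorch. This is a minimal illustration under stated assumptions, not the actual QNNModel or ClassMapModel code: the class names ChannelLastInput and ClassMapHead are hypothetical, and a 1x1 convolution stands in for a real segmentation network. Because the permute and the argmax live inside forward(), an exporter tracing the wrapped module would capture them as graph operations, so an app could feed camera-native NHWC buffers and receive per-pixel class IDs directly.

```python
import torch
import torch.nn as nn


class ChannelLastInput(nn.Module):
    """Hypothetical wrapper (not Ultralytics' QNNModel): accepts NHWC input.

    The NHWC -> NCHW permute happens inside forward(), so it becomes part of
    any graph traced from this module instead of running in app code.
    """

    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, H, W, C), e.g. a camera-native image buffer
        return self.model(x.permute(0, 3, 1, 2))


class ClassMapHead(nn.Module):
    """Hypothetical wrapper (not Ultralytics' ClassMapModel): emits a class map.

    Replaces per-class logits of shape (N, num_classes, H, W) with an
    integer map of shape (N, H, W) holding the winning class per pixel.
    """

    def __init__(self, model: nn.Module):
        super().__init__()
        self.model = model

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        logits = self.model(x)  # (N, num_classes, H, W)
        return logits.argmax(dim=1)  # (N, H, W), int64 class IDs


if __name__ == "__main__":
    torch.manual_seed(0)
    # Stand-in "segmentation model": 1x1 conv producing 4 class logits per pixel
    seg = nn.Conv2d(3, 4, kernel_size=1).eval()
    wrapped = ClassMapHead(ChannelLastInput(seg)).eval()

    nhwc = torch.rand(1, 64, 64, 3)  # channel-last input
    with torch.no_grad():
        class_map = wrapped(nhwc)
        reference = seg(nhwc.permute(0, 3, 1, 2)).argmax(dim=1)

    print(class_map.shape, class_map.dtype)  # torch.Size([1, 64, 64]) torch.int64
    assert torch.equal(class_map, reference)
```

In a real export, the wrapped module rather than the bare model would be handed to the exporter. Whether Ultralytics also narrows the class-map dtype is not stated in these notes.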

🎯 Purpose & Impact

  • Faster mobile inference on Snapdragon devices
    Users deploying YOLO on Qualcomm hardware should see lower overhead and better real-world efficiency, especially in camera-based apps where image buffers are already channel-last.

  • 🧩 Less app-side postprocessing work
    By returning semantic class maps directly from exported models, apps no longer need to spend as much CPU time decoding large segmentation outputs.

  • 📈 Better and more stable semantic segmentation performance
    This is especially important for real-time mobile experiences, where unpredictable postprocessing delays can cause lag or jitter.

  • 🔒 More robust export pipeline
    Replacing manual ONNX graph surgery with clean wrapper modules makes exports easier to maintain and less error-prone over time.

  • 📱🍎 Benefits extend beyond QNN
    CoreML semantic exports also gain from the same in-graph class-map idea, so Apple-device deployment gets a speed boost too.

  • 🛠️ Improved reliability for deployment workflows
    Better RKNN testing, cleaner Docker builds, and more dependable docs publishing all reduce friction for developers working across edge and production environments.

  • 👥 Broad user impact

    • Mobile and edge developers get the biggest benefit from this release.
    • Semantic segmentation users should see the clearest speed and usability gains.
    • General users also benefit from clearer docs, cleaner CI, and more reliable export behavior.
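
The app-side saving from returning class maps can be illustrated with a small NumPy sketch of a decoder that accepts either output format, similar in spirit to the updated predict/validation handling described above. The function name to_class_map, the uint8 map dtype, and the tensor sizes are hypothetical; the sizes are chosen only to show the arithmetic and are not measured results.

```python
import numpy as np


def to_class_map(output: np.ndarray) -> np.ndarray:
    """Hypothetical decoder accepting either exported output format.

    - logits:    (num_classes, H, W) float array -> argmax over the class axis
    - class map: (H, W) integer array            -> returned unchanged
    """
    if output.ndim == 3 and np.issubdtype(output.dtype, np.floating):
        return output.argmax(axis=0)  # app-side decode: reads every logit
    if output.ndim == 2 and np.issubdtype(output.dtype, np.integer):
        return output  # already decoded inside the model graph
    raise ValueError(f"unexpected output shape/dtype: {output.shape}, {output.dtype}")


if __name__ == "__main__":
    # Hypothetical sizes for illustration only (not measured Ultralytics numbers)
    num_classes, h, w = 19, 512, 512
    rng = np.random.default_rng(0)
    logits = rng.standard_normal((num_classes, h, w), dtype=np.float32)
    class_map = logits.argmax(axis=0).astype(np.uint8)

    # Both formats decode to the same per-pixel classes
    assert np.array_equal(to_class_map(logits), to_class_map(class_map))
    print(logits.nbytes, class_map.nbytes)  # 19922944 262144
```

Under these assumptions (float32 logits, uint8 map) the output shrinks by a factor of 4 × num_classes, 76× here, and the per-pixel argmax over all classes no longer runs on the app CPU.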

In short: v8.4.65 is a strong deployment-focused release 🎉, with the standout improvement being faster, more hardware-friendly QNN exports and smarter semantic segmentation outputs for mobile AI.

What's Changed

Full Changelog: v8.4.64...v8.4.65
