ultralytics 8.4.87 on Python PyPI

🌟 Summary

🚀 Ultralytics v8.4.87 delivers a cleaner, safer GPU device-selection system, plus stability and performance fixes for training, inference, tracking, exports, and dataset checks.

📊 Key Changes

Clean-sheet CUDA device selection 🧭
- Added parse_device() to normalize device inputs such as "cuda:0", "0,1", lists/tuples, torch.device objects, and -1 for idle-GPU auto-selection.
- select_device() no longer mutates CUDA_VISIBLE_DEVICES, making device selection predictable across repeated calls and long-running Python processes.
- Explicit single-GPU requests now use torch.cuda.set_device() instead of environment-variable remapping.
- Trainer, DDP setup, validation, autobatch, and distributed barriers now consistently use resolved CUDA device indices.
- Added documentation for ultralytics.utils.torch_utils.parse_device.
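
To illustrate the kind of normalization described above, here is a minimal, torch-free sketch. It is not the Ultralytics implementation: the function name normalize_device_spec, its return values, and the "auto" sentinel are all hypothetical, and torch.device inputs are left out so the example runs with the standard library alone.

```python
from typing import List, Sequence, Union

DeviceSpec = Union[str, int, Sequence[Union[int, str]]]


def normalize_device_spec(device: DeviceSpec) -> Union[str, List[int]]:
    """Hypothetical helper: map a device spec to "cpu", "auto", or a list of GPU indices."""
    if isinstance(device, int):
        items = [device]
    elif isinstance(device, str):
        text = device.strip().lower()
        if text == "cpu":
            return "cpu"
        if text.startswith("cuda"):
            text = text[4:].lstrip(":")
            if not text:  # bare "cuda" is treated as GPU 0 in this sketch
                return [0]
        text = text.strip("[]()")
        items = [part.strip() for part in text.split(",") if part.strip()]
    else:
        items = list(device)

    if not items:
        raise ValueError(f"empty device spec: {device!r}")

    indices = [int(item) for item in items]
    if indices == [-1]:
        return "auto"  # assumption: the caller picks an idle GPU later
    if any(index < 0 for index in indices):
        raise ValueError(f"negative GPU index in device spec: {device!r}")
    if len(set(indices)) != len(indices):
        raise ValueError(f"duplicate GPU index in device spec: {device!r}")
    return indices
```

Note that the sketch only reads its input and never touches os.environ, which mirrors the release's goal of keeping CUDA_VISIBLE_DEVICES unchanged during device selection.
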
Stronger GPU training tests 🧪
- Added a cold-process nonzero-GPU training test to better match real CLI and Ultralytics Platform training behavior.
- The test verifies that training on a nonzero GPU (device=1 or higher) works correctly from a fresh process, without relying on previous CUDA initialization.
Fixed DataLoader worker cleanup at training shutdown 🧹
- Added a close() method to InfiniteDataLoader.
- Training now explicitly shuts down persistent train and validation workers before Python exits.
- Helps prevent end-of-run "DataLoader worker ... killed by signal: Terminated" errors after results are already saved.
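
The general pattern behind this fix, an explicit and idempotent close() for long-lived workers, can be sketched as follows. This is a toy stand-in rather than the InfiniteDataLoader code: the class PersistentWorkerPool is hypothetical, and it uses threads instead of the worker processes a PyTorch DataLoader would use, so that it runs with the standard library alone.

```python
import queue
import threading
from typing import Callable, Optional


class PersistentWorkerPool:
    """Toy stand-in for a loader that keeps background workers alive between epochs."""

    def __init__(self, num_workers: int = 2) -> None:
        self._tasks: "queue.Queue[Optional[Callable[[], None]]]" = queue.Queue()
        self._closed = False
        self._workers = [
            threading.Thread(target=self._run, daemon=True) for _ in range(num_workers)
        ]
        for worker in self._workers:
            worker.start()

    def _run(self) -> None:
        while True:
            task = self._tasks.get()
            if task is None:  # sentinel: stop this worker
                break
            task()

    def submit(self, task: Callable[[], None]) -> None:
        if self._closed:
            raise RuntimeError("pool is closed")
        self._tasks.put(task)

    def close(self) -> None:
        """Stop and join all workers; safe to call more than once."""
        if self._closed:
            return
        self._closed = True
        for _ in self._workers:
            self._tasks.put(None)  # one sentinel per worker
        for worker in self._workers:
            worker.join()
```

Calling close() in a try/finally block at the end of a run shuts the workers down deliberately, instead of leaving them to be torn down while the interpreter exits.
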
Improved inference warmup for standard NMS ⚡
- AutoBackend.warmup() now preloads torchvision for non-end-to-end models.
- This helps later non-max suppression calls use faster torchvision NMS when appropriate, reducing first-inference latency after warmup.
Corrected dataset file-speed reporting 💾
- Fixed an inverted condition in check_file_speeds().
- Slow storage, such as network-mounted datasets, should now trigger the intended warning instead of being incorrectly reported as “Fast image access ✅”.
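
To show the direction of the comparison that matters here, a minimal sketch is given below. It is not the check_file_speeds() code: the function names and the 50 MB/s threshold are hypothetical and chosen only for illustration.

```python
import time
from pathlib import Path
from typing import Iterable, Tuple


def measure_read(files: Iterable[Path]) -> Tuple[int, float]:
    """Read each file fully and return (total bytes, total seconds)."""
    total_bytes = 0
    start = time.perf_counter()
    for path in files:
        total_bytes += len(Path(path).read_bytes())
    return total_bytes, time.perf_counter() - start


def classify_read_speed(total_bytes: int, total_seconds: float, min_mb_per_s: float = 50.0) -> str:
    """Return "slow" when throughput is BELOW the threshold, otherwise "fast"."""
    if total_seconds <= 0:
        return "fast"  # too quick to measure
    mb_per_s = total_bytes / (1 << 20) / total_seconds
    # An inverted check (">" instead of "<") would flag fast disks and pass slow ones.
    return "slow" if mb_per_s < min_mb_per_s else "fast"
```
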
Tracking ReID device alignment 🎯
- Trackers now pass the predictor device into ReID encoders.
- ReID models are initialized and run on the same device as prediction where applicable, improving consistency for tracking workflows.
Export reliability improvements 📦
- TensorFlow SavedModel export now distinguishes CUDA vs non-CUDA export paths more carefully.
- CPU exports hide TensorFlow GPUs where possible to avoid unnecessary GPU memory use.
- ONNX Runtime and Paddle dependency checks now better handle interchangeable CPU/GPU package variants to avoid unnecessary or conflicting installs.
- Paddle export now uses the actual export device to decide whether GPU Paddle is needed.
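
One way to treat interchangeable package variants as satisfying the same requirement is sketched below, using only importlib.metadata from the standard library. The helper first_installed is hypothetical and is not the Ultralytics dependency checker; the two distribution names are the CPU and GPU variants of ONNX Runtime on PyPI.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Iterable, Optional


def first_installed(candidates: Iterable[str]) -> Optional[str]:
    """Return the first distribution name in `candidates` that is installed, else None."""
    for name in candidates:
        try:
            version(name)
        except PackageNotFoundError:
            continue
        return name
    return None


# Either variant satisfies the requirement, so an install is needed only if neither is present.
ORT_VARIANTS = ("onnxruntime", "onnxruntime-gpu")
needs_install = first_installed(ORT_VARIANTS) is None
```

Checking the whole group before installing avoids pulling in the CPU package on top of an existing GPU build, or the reverse.
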

🎯 Purpose & Impact

More reliable GPU behavior 🚀
- Users should see fewer surprises when training, validating, predicting, or exporting repeatedly in the same Python session.
- This is especially important for notebooks, services, CI, distributed training, and production systems where changing CUDA_VISIBLE_DEVICES mid-process can cause hard-to-debug issues.
Better support for nonzero GPU training 🖥️
- Training on GPUs other than cuda:0 is now more robust, including the cold-start CLI usage common in production and Ultralytics Platform environments.
Cleaner shutdowns after training ✅
- Persistent DataLoader workers are now cleaned up explicitly, reducing noisy shutdown crashes and improving confidence that completed runs exit cleanly.
Lower latency after warmup ⚡
- Standard detection workflows can benefit from smoother post-warmup inference performance by ensuring faster NMS paths are ready when needed.
More accurate dataset diagnostics 📊
- Users with slow disks or network storage will receive correct warnings, helping them identify dataset I/O bottlenecks that can slow training.
More consistent tracking and export workflows 🔄
- ReID tracking components now better follow the selected prediction device.
- Export paths are less likely to allocate unwanted GPU memory or install conflicting runtime packages.

What's Changed

Add cold-process nonzero-device GPU train test by @glenn-jocher in #25019
Fix inverted read-speed condition in dataset file speed check by @ahmet-f-gumustas in #25025
Fix leaked dataloader workers at end of training (atexit killed by signal: Terminated crash) by @Bovey0809 in #25024
Preload torchvision during warmup for non-end2end NMS path by @Y-T-G in #25023
Clean-sheet device selection: stop mutating CUDA_VISIBLE_DEVICES by @glenn-jocher in #25021

Full Changelog: v8.4.86...v8.4.87
