Highlights

[BETA] Transforms and augmentations

Major speedups

The new transforms in torchvision.transforms.v2 support image classification, segmentation, detection, and video tasks. They are now 10%-40% faster than before! This is mostly achieved thanks to 2X-4X improvements made to v2.Resize(), which now natively supports uint8 tensors in bilinear and bicubic modes. Results are also now closer to PIL's! Check out our performance recommendations to learn more.

Additionally, torchvision now ships with libjpeg-turbo instead of libjpeg, which should significantly speed up the JPEG decoding utilities (read_image, decode_jpeg) and avoid compatibility issues with PIL.

CutMix and MixUp

Long-awaited support for the CutMix and MixUp augmentations is now here! Check out our tutorial to learn how to use them.

Towards stable V2 transforms

In the previous release, 0.15, we BETA-released a new set of transforms in torchvision.transforms.v2 with native support for tasks like segmentation, detection, or videos. We have now stabilized the design decisions of these transforms and made further improvements in terms of speed, usability, support for new transforms, etc.

We're keeping the torchvision.transforms.v2 and torchvision.tv_tensors namespaces as BETA until 0.17 as a precaution, but we do not expect disruptive API changes in the future.

Whether you’re new to Torchvision transforms, or you’re already experienced with them, we encourage you to start with Getting started with transforms v2 in order to learn more about what can be done with the new v2 transforms.

Browse our main docs for general information and performance tips. The available transforms and functionals are listed in the API reference. Additional information and tutorials can also be found in our example gallery, e.g. Transforms v2: End-to-end object detection/segmentation example or How to write your own v2 transforms.

[BETA] MPS support

The nms and roi-align kernels (roi_align, roi_pool, ps_roi_align, ps_roi_pool) now support MPS. Thanks to Li-Huai (Allan) Lin for this contribution!

Detailed Changes

Deprecations / Breaking changes

All changes below happened in the transforms.v2 and datapoints namespaces, which were BETA and protected with a warning. We do not expect other disruptive changes to these APIs moving forward!

[transforms.v2] to_grayscale() is not deprecated anymore (#7707)
[transforms.v2] Renaming: torchvision.datapoints.Datapoint -> torchvision.tv_tensors.TVTensor (#7904, #7894)
[transforms.v2] Renaming: BoundingBox -> BoundingBoxes (#7778)
[transforms.v2] Renaming: BoundingBoxes.spatial_size -> BoundingBoxes.canvas_size (#7734)
[transforms.v2] All public methods on TVTensor classes (previously: Datapoint classes) were removed
[transforms.v2] transforms.v2.utils is now private (#7863)
[transforms.v2] Remove wrap_like class method and add tv_tensors.wrap() function (#7832)

New Features

[transforms.v2] Add support for MixUp and CutMix (#7731, #7784)
[transforms.v2] Add PermuteChannels transform (#7624)
[transforms.v2] Add ToPureTensor transform (#7823)
[ops] Add MPS kernels for nms and roi ops (#7643)

Improvements

[io] Added support for CMYK images in decode_jpeg (#7741)
[io] Package torchvision with libjpeg-turbo instead of libjpeg (#7672, #7840)
[models] Downloaded weights are now sha256-validated (#7219)
[transforms.v2] Massive Resize speed-up by adding native uint8 support for bilinear and bicubic modes (#7557, #7668)
[transforms.v2] Enforce pickleability for v2 transforms and wrapped datasets (#7860)
[transforms.v2] Allow catch-all "others" key in fill dicts. (#7779)
[transforms.v2] Allow passthrough for Resize (#7521)
[transforms.v2] Add scale option to ToDtype. Remove ConvertDtype. (#7759, #7862)
[transforms.v2] Improve UX for Compose (#7758)
[transforms.v2] Allow users to choose whether to return TVTensor subclasses or pure Tensor (#7825)
[transforms.v2] Remove import-time warning for v2 namespaces (#7853, #7897)
[transforms.v2] Speedup hsv2rgb (#7754)
[models] Add filter parameters to list_models() (#7718)
[models] Assert RAFT input resolution is 128 x 128 or higher (#7339)
[ops] Replaced gpuAtomicAdd by fastAtomicAdd (#7596)
[utils] Add GPU support for draw_segmentation_masks (#7684)
[ops] Add deterministic, pure-Python roi_align implementation (#7587)
[tv_tensors] Make TVTensors deepcopyable (#7701)
[datasets] Only return small set of targets by default from dataset wrapper (#7488)
[references] Added support for v2 transforms and tensors / tv_tensors backends (#7732, #7511, #7869, #7665, #7629, #7743, #7724, #7742)
[doc] A lot of documentation improvements (#7503, #7843, #7845, #7836, #7830, #7826, #7484, #7795, #7480, #7772, #7847, #7695, #7655, #7906, #7889, #7883, #7881, #7867, #7755, #7870, #7849, #7854, #7858, #7621, #7857, #7864, #7487, #7859, #7877, #7536, #7886, #7679, #7793, #7514, #7789, #7688, #7576, #7600, #7580, #7567, #7459, #7516, #7851, #7730, #7565, #7777)

Bug Fixes

[datasets] Fix split=None in MovingMNIST (#7449)
[io] Fix heap buffer overflow in decode_png (#7691)
[io] Fix blurry screen in video decoder (#7552)
[models] Fix weight download URLs for some models (#7898)
[models] Fix ShuffleNet ONNX export (#7686)
[models] Fix detection models with PyTorch 2.0 (#7592, #7448)
[ops] Fix segfault in DeformConv2d when mask is None (#7632)
[transforms.v2] Stricter SanitizeBoundingBoxes labels_getter heuristic (#7880)
[transforms.v2] Make sure RandomPhotometricDistort transforms all images the same (#7442)
[transforms.v2] Fix v2.Lambda’s transformed types (#7566)
[transforms.v2] Don't call round() on float images for Resize (#7669)
[transforms.v2] Let SanitizeBoundingBoxes preserve output type (#7446)
[transforms.v2] Fixed int type support for sigma in GaussianBlur (#7887)
[transforms.v2] Fixed issue with jitted AutoAugment transforms (#7839)
[transforms] Fix Resize pass-through logic (#7519)
[utils] Fix color in draw_segmentation_masks (#7520)

Others

[tests] Various test improvements / fixes (#7693, #7816, #7477, #7783, #7716, #7355, #7879, #7874, #7882, #7447, #7856, #7892, #7902, #7884, #7562, #7713, #7708, #7712, #7703, #7641, #7855, #7842, #7717, #7905, #7553, #7678, #7908, #7812, #7646, #7841, #7768, #7828, #7820, #7550, #7546, #7833, #7583, #7810, #7625, #7651)
[CI] Various CI improvements (#7485, #7417, #7526, #7834, #7622, #7611, #7872, #7628, #7499, #7616, #7475, #7639, #7498, #7467, #7466, #7441, #7524, #7648, #7640, #7551, #7479, #7634, #7645, #7578, #7572, #7571, #7591, #7470, #7574, #7569, #7435, #7635, #7590, #7589, #7582, #7656, #7900, #7815, #7555, #7694, #7558, #7533, #7547, #7505, #7502, #7540, #7573)
[Code Quality] Various code quality improvements (#7559, #7673, #7677, #7771, #7770, #7710, #7709, #7687, #7454, #7464, #7527, #7462, #7662, #7593, #7797, #7805, #7786, #7831, #7829, #7846, #7806, #7814, #7606, #7613, #7608, #7597, #7792, #7781, #7685, #7702, #7500, #7804, #7747, #7835, #7726, #7796)

Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:
Adam J. Stewart, Aditya Oke, Andrey Talman, Camilo De La Torre, Christoph Reich, Danylo Baibak, David Chiu, David Garcia, Dennis M. Pöpperl, Dhuige, Duc Mguyen, Edward Z. Yang, Eric Sauser, Fansure Grin, Huy Do, Illia Vysochyn, Johannes, Kai Wana, Kobrin Eli, kurtamohler, Li-Huai (Allan) Lin, Liron Ilouz, Masahiro Hiramori, Mateusz Guzek, Max Chuprov, Minh-Long Luu (刘明龙), Minliang Lin, mpearce25, Nicolas Granger, Nicolas Hug, Nikita Shulga, Omkar Salpekar, Paul Mulders, Philip Meier, ptrblck, puhuk, Radek Bartoň, Richard Barnes, Riza Velioglu, Sahil Goyal, Shu, Sim Sun, SvenDS9, Tommaso Bianconcini, Vadim Zubov, vfdev-5

pytorch/vision v0.16.0: TorchVision 0.16 - Transforms speedups, CutMix/MixUp, and MPS support!