pytorch/vision v0.15.1
TorchVision 0.15 - New transforms API!


Highlights

[BETA] New transforms API

TorchVision is extending its Transforms API! Here is what’s new:

  • You can use them not only for Image Classification but also for Object Detection, Instance & Semantic Segmentation and Video Classification.
  • You can use new functional transforms for transforming Videos, Bounding Boxes and Segmentation Masks.

The API is completely backward compatible with the previous one, and the interface remains the same to ease migration and adoption. We are now releasing this new API as Beta in the torchvision.transforms.v2 namespace, and we would love to get early feedback from you to improve its functionality. Please reach out to us if you have any questions or suggestions.

import torchvision.transforms.v2 as transforms

# Exactly the same interface as V1:
trans = transforms.Compose([
    transforms.ColorJitter(contrast=0.5),
    transforms.RandomRotation(30),
    transforms.CenterCrop(480),
])
imgs, bboxes, masks, labels = trans(imgs, bboxes, masks, labels)
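
The v2 transforms dispatch on input type. For detection or segmentation inputs, plain tensors can be wrapped in the torchvision.datapoints classes shipped alongside this API, so the transforms know which inputs are boxes or masks. A minimal sketch (shapes and coordinates are made up for illustration):

import torch
import torchvision.transforms.v2 as transforms
from torchvision import datapoints

img = datapoints.Image(torch.randint(0, 256, (3, 512, 512), dtype=torch.uint8))
bboxes = datapoints.BoundingBox(
    [[100, 100, 200, 200]],        # one box, made-up coordinates
    format="XYXY",
    spatial_size=(512, 512),
)
masks = datapoints.Mask(torch.zeros(1, 512, 512, dtype=torch.uint8))

trans = transforms.Compose([
    transforms.RandomRotation(30),
    transforms.CenterCrop(480),
])
img, bboxes, masks = trans(img, bboxes, masks)  # boxes and masks stay in sync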

You can read more about these new transforms in our docs, and you can also check out our examples.

Note that this API is still Beta. While we do not expect major breaking changes, some APIs may still change according to user feedback. Please submit any feedback you may have in #6753, and you can also check out #7319 to learn more about the APIs that we suspect might involve future changes.

[BETA] New Video Swin Transformer

We added a Video SwinTransformer model, based on the Video Swin Transformer paper.

import torch
from torchvision.models.video import swin3d_t

# Input is (batch, channels, frames, height, width); sizes here are arbitrary.
video = torch.rand(1, 3, 32, 800, 600)
model = swin3d_t(weights="DEFAULT")  # swin3d_s and swin3d_b are also available
model.eval()
with torch.inference_mode():
    prediction = model(video)
print(prediction)

The models have the following accuracies on the Kinetics-400 dataset:

Model      Acc@1   Acc@5
swin3d_t   77.7    93.5
swin3d_s   79.5    94.1
swin3d_b   79.4    94.4

We would like to thank oke-aditya for this contribution.

Detailed Changes (PRs)

BC-breaking changes

[models] Fixed a bug inside ops.MLP when backpropagating with dropout>0 by implicitly setting the inplace argument of nn.Dropout to False (#7209); a short sketch follows this section
[models, transforms] remove functionality scheduled for 0.15 after deprecation (#7176)
We removed deprecated functionalities according to the deprecation cycle: gen_bar_updater, model_urls/quant_model_urls in models.
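
As an illustration of the ops.MLP fix above, a minimal sketch (sizes are arbitrary): with dropout > 0, the internal nn.Dropout no longer runs in-place, so gradients flow correctly.

import torch
from torchvision.ops import MLP

mlp = MLP(in_channels=32, hidden_channels=[64, 16], dropout=0.5)
x = torch.randn(8, 32, requires_grad=True)
mlp(x).sum().backward()  # previously could fail due to implicit in-place dropout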

Deprecations

[transforms] Change default of antialias parameter from None to 'warn' (#7160)
For all transforms / functionals that take an antialias parameter, we changed its default from None to a "warn" value that behaves exactly like None but raises a warning prompting users to explicitly set it to True, False or None. In v0.17.0 we plan to remove "warn" and set the default to True.
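
To keep today's behavior and silence the warning, pass antialias explicitly; a minimal sketch:

import torch
from torchvision import transforms

img = torch.rand(3, 256, 256)                    # arbitrary image tensor
resize = transforms.Resize(224, antialias=True)  # explicit value, no warning
out = resize(img)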

[transforms] Deprecate functional_pil and functional_tensor and make them private (#7269)
As of v0.15.0, torchvision.transforms.functional_pil and torchvision.transforms.functional_tensor are private and will be removed in v0.17.0. Please use torchvision.transforms.functional or torchvision.transforms.v2.functional instead.
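
Migration is typically a one-line import change; a sketch using adjust_brightness as an example:

import torch
from torchvision.transforms import functional as F
# was: from torchvision.transforms import functional_tensor as F_t

img = torch.rand(3, 64, 64)          # arbitrary tensor for illustration
out = F.adjust_brightness(img, 0.5)  # public equivalent of the private helper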

[transforms] Undeprecate PIL int constants for interpolation (#7241)
We restored support for integer interpolation modes (Pillow constants), which had been deprecated since v0.13.0, as PIL un-deprecated those constants as well.
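
Concretely, passing a Pillow integer constant as the interpolation argument is accepted again without a deprecation warning; a sketch (InterpolationMode values remain the recommended spelling):

from PIL import Image
from torchvision import transforms

# Integer Pillow constants such as Image.BILINEAR work again.
resize = transforms.Resize(224, interpolation=Image.BILINEAR)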

New Features

[transforms] New transforms API (see highlight)
[models] Add Video SwinTransformer (see highlight) (#6521)

Improvements

[transforms] introduce nearest-exact interpolation (#6754); see the sketch after this list
[transforms] add sequence fill support for ElasticTransform (#7141)
[transforms] perform out of bounds check for single values and two tuples in ColorJitter (#7133)
[datasets] Fix download of the SBU dataset (#7046) (#7051)
[hub] Add video models to torchhub (#7083)
[hub] Expose maxvit and swin_v2 models to torchhub (#7078)
[io] suppress warning in VideoReader (#6976, #6971)
[io] Set the video decoder probesize for retrieving stream info based on the decode settings (#6900) (#6950)
[io] improve warning message for missing image extension (#7150)
[io] Support reading video from memory in the new API (#6771)
[models] Allow dropout overwrites on EfficientNet (#7031)
[models] Don't use named args in MHA calls to allow applying pytorch forward hooks to ViT (#6956)
[onnx] Support exporting RoiAlign align=True to ONNX with opset 16 (#6685)
[ops] Handle invalid reduction values (#6675)
[datasets] Add MovingMNIST dataset (#7042)
Add torchvision maintainers guide (#7109)
[Documentation] Various doc improvements (#7041, #6947, #6690, #7142, #7156, #7025, #7048, #7074, #6936, #6694, #7161, #7164, #6912, #6854, #6926, #7065, #6813)
[CI] Various CI improvements (#6864, #6863, #6855, #6856, #6803, #6893, #6865, #6804, #6866, #6742, #7273, #6999, #6713, #6972, #6954, #6968, #6987, #7004, #7010, #7014, #6915, #6797, #6759, #7060, #6857, #7212, #7199, #7186, #7183, #7178, #7163, #7181, #6789, #7110, #7088, #6955, #6788, #6970)
[tests] Various tests improvements (#7020, #6939, #6658, #7216, #6996, #7363, #7379, #7218, #7286, #6901, #7059, #7202, #6708, #7013, #7206, #7204, #7233)
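
As a usage note for the nearest-exact interpolation item above, a minimal sketch (sizes are arbitrary): NEAREST_EXACT matches the pixel-center convention of PIL / scikit-image nearest-neighbour resizing, avoiding the known shift of the legacy NEAREST mode.

import torch
from torchvision.transforms import Resize, InterpolationMode

img = torch.rand(3, 100, 100)  # arbitrary tensor for illustration
# antialias is set explicitly to sidestep the "warn" default noted above
resize = Resize(64, interpolation=InterpolationMode.NEAREST_EXACT, antialias=False)
out = resize(img)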

Bug Fixes

[datasets] fix MNIST byte flipping (#7081)
[models] properly support deepcopying and serialization of model weights (#7107)
[models] Use inplace=None as default in ops.MLP (#7209)
[models] Fix dropout issue in swin transformers (#7224)
[reference scripts] Fix quantized classification reference - missing args (#7072)
[models, tests] [FBcode->GH] Fix GRACE_HOPPER file internal discovery (#6719)
[transforms] Replace getbands() with get_image_num_channels() (#6941)
[transforms] Switch view() with reshape() on equalize (#6772)
[transforms] make RandomErasing scriptable for integer value (#7134)
[video] fix bug in output format for pyav (#6672)
[video, datasets] [bugfix] Fix the output format for VideoClips.subset (#6700)
[onnx] Fix dtype for NonMaxSuppression (#7056)

Code Quality

[datasets] Remove unused import (#7245)
[models] Fix error message typo (#6682)
[models] make weights deepcopyable (#6883)
[models] Fix missing f-string prefix in error message (#6684)
[onnx] Rephrase ONNX RoiAlign warning for aligned=True (#6704)
[onnx] Misc ONNX improvements (#7249)
[ops] Raise kernel launch errors instead of just printing error messages in CUDA ops (#7080)
[ops, tests] Remove torch.jit.fuser("fuser2") in test (#7069)
[tests] replace assert torch.allclose with torch.testing.assert_allclose (#6895)
[transforms] Remove old TODO about using _log_api_usage_once() (#7277)
[transforms] Fixed repr for ElasticTransform (#6758)
[transforms] Use is False for some antialias checks (#7234)
[datasets, models] Various type-hints improvements (#6844, #6929, #6843, #7087, #6735, #6845, #6846)
[all] switch to C++17 following the core library (#7116)

Prototype

Most of these PRs (not all) relate to the transforms V2 work (#7122, #7120, #7113, #7270, #7037, #6665, #6944, #6919, #7033, #7138, #6718, #6068, #7194, #6997, #6647, #7279, #7232, #7225, #6663, #7235, #7236, #7275, #6791, #6786, #7203, #7009, #7278, #7238, #7230, #7118, #7119, #6876, #7190, #6995, #6879, #6904, #6921, #6905, #6977, #6714, #6924, #6984, #6631, #7276, #6757, #7227, #7197, #7170, #7228, #7246, #7255, #7254, #7253, #7248, #7256, #7257, #7252, #6724, #7215, #7260, #7261, #7244, #7271, #7231, #6738, #7268, #7258, #6933, #6891, #6890, #7012, #6896, #6881, #6880, #6877, #7045, #6858, #6830, #6935, #6938, #6914, #6907, #6897, #6903, #6859, #6835, #6837, #6807, #6776, #6784, #6795, #7135, #6930, #7153, #6762, #6681, #7139, #6831, #6826, #6821, #6819, #6820, #6805, #6811, #6783, #6978, #6667, #6741, #6763, #6774, #6748, #6749, #6722, #6756, #6712, #6733, #6736, #6874, #6767, #6902, #6847, #6851, #6777, #6770, #6800, #6812, #6702, #7223, #6906, #7226, #6860, #6934, #6726, #6730, #7196, #7211, #7229, #7177, #6923, #6949, #6913, #6775, #7091, #7136, #7154, #6833, #6824, #6785, #6710, #6653, #6751, #6503, #7266, #6729, #6989, #7002, #6892, #6888, #6894, #6988, #6940, #6942, #6945, #6983, #6773, #6832, #6834, #6828, #6801, #7084)

Contributors

We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

Aditya Gandhamal, Aditya Oke, Aidyn-A, Akira Noda, Andrey Talman, Bowen Bao, Bruno Korbar, Chen Liu, cyy, David Berard, deepsghimire, Erjia Guan, F-G Fernandez, Jithun Nair, Joao Gomes, John Detloff, Justin Chu, Karan Desai, lezcano, mpearce25, Nghia, Nicolas Hug, Nikita Shulga, nps1ngh, Omkar Salpekar, Philip Meier, Robert Perrotta, RoiEX, Samantha Andow, Sergii Dymchenko, shunsuke yokokawa, Sim Sun, Toni Blaslov, toni057, Vasilis Vryniotis, vfdev-5, Vladislav Sovrasov, vsuryamurthy, Yosua Michael Maranatha, Yuxin Wu
