pytorch/vision v0.11.0 on GitHub

This release introduces the RegNet and EfficientNet architectures, a new FX-based utility to perform Feature Extraction, new data augmentation techniques such as RandAugment and TrivialAugment, updated training recipes that support EMA, Label Smoothing, Learning-Rate Warmup, Mixup and Cutmix, and many more.

Highlights

New Models

RegNet and EfficientNet are two popular architectures that can be scaled to different computational budgets. In this release we include 22 pre-trained weights for their classification variants. The models were trained on ImageNet and can be used as follows:

import torch
from torchvision import models

x = torch.rand(1, 3, 224, 224)

regnet = models.regnet_y_400mf(pretrained=True)
regnet.eval()
predictions = regnet(x)

efficientnet = models.efficientnet_b0(pretrained=True)
efficientnet.eval()
predictions = efficientnet(x)

The accuracies of the pre-trained models obtained on ImageNet val are seen below (see #4403, #4530 and #4293 for more details)

Model	Acc@1	Acc@5
regnet_x_400mf	72.834	90.95
regnet_x_800mf	75.212	92.348
regnet_x_1_6gf	77.04	93.44
regnet_x_3_2gf	78.364	93.992
regnet_x_8gf	79.344	94.686
regnet_x_16gf	80.058	94.944
regnet_x_32gf	80.622	95.248
regnet_y_400mf	74.046	91.716
regnet_y_800mf	76.42	93.136
regnet_y_1_6gf	77.95	93.966
regnet_y_3_2gf	78.948	94.576
regnet_y_8gf	80.032	95.048
regnet_y_16gf	80.424	95.24
regnet_y_32gf	80.878	95.34
EfficientNet-B0	77.692	93.532
EfficientNet-B1	78.642	94.186
EfficientNet-B2	80.608	95.31
EfficientNet-B3	82.008	96.054
EfficientNet-B4	83.384	96.594
EfficientNet-B5	83.444	96.628
EfficientNet-B6	84.008	96.916
EfficientNet-B7	84.122	96.908

We would like to thank Ross Wightman and Luke Melas-Kyriazi for contributing the weights of the EfficientNet variants.

FX-based Feature Extraction

A new Feature Extraction method has been added to our utilities. It uses PyTorch FX and enables us to retrieve the outputs of intermediate layers of a network which is useful for feature extraction and visualization. Here is an example of how to use the new utility:

import torch
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor


x = torch.rand(1, 3, 224, 224)

model = resnet50()

return_nodes = {
    "layer4.2.relu_2": "layer4"
}
model2 = create_feature_extractor(model, return_nodes=return_nodes)
intermediate_outputs = model2(x)

print(intermediate_outputs['layer4'].shape)

We would like to thank Alexander Soare for developing this utility.

New Data Augmentations

Two new Automatic Augmentation techniques were added: Rand Augment and Trivial Augment. Both methods can be used as drop-in replacement of the AutoAugment technique as seen below:

from torchvision import transforms

t = transforms.RandAugment()
# t = transforms.TrivialAugmentWide()
transformed = t(image)

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandAugment(),  # transforms.TrivialAugmentWide()
    transforms.ToTensor()])

We would like to thank Samuel G. Müller for contributing Trivial Augment and for his help on refactoring the AA package.

Updated Training Recipes

We have updated our training reference scripts to add support of Exponential Moving Average, Label Smoothing, Learning-Rate Warmup, Mixup, Cutmix and other SOTA primitives. The above enabled us to improve the classification Acc@1 of some pre-trained models by over 4 points. A major update of the existing pre-trained weights is expected on the next release.

Backward-incompatible changes

[models] Use torch instead of scipy for random initialization of inception and googlenet weights (#4256)

Deprecations

[models] Deprecate the C++ vision::models namespace (#4375)

New Features

[datasets] Add iNaturalist dataset (#4123)
[datasets] Download and Kinetics 400/600/700 Datasets (#3680)
[datasets] Added LFW Dataset (#4255)
[models] Add FX feature extraction as an alternative to intermediate_layer_getter (#4302) (#4418)
[models] Add RegNet Architecture in TorchVision (#4403) (#4530) (#4550)
[ops] Add new masks_to_boxes op (#4290) (#4469)
[ops] Add StochasticDepth implementation (#4301)
[reference scripts] Adding Mixup and Cutmix (#4379)
[transforms] Integration of TrivialAugment with the current AutoAugment Code (#4221)
[transforms] Adding RandAugment implementation (#4348)
[models] Add EfficientNet Architecture in TorchVision (#4293)

Improvements

Various documentation improvements (#4239) (#4251) (#4275) (#4342) (#3894) (#4159) (#4133) (#4138) (#4089) (#3944) (#4349) (#3754) (#4308) (#4352) (#4318) (#4244) (#4362) (#3863) (#4382) (#4484) (#4503) (#4376) (#4457) (#4505) (#4363) (#4361) (#4337) (#4546) (#4553) (#4565) (#4567) (#4574) (#4575) (#4383) (#4390) (#3409) (#4451) (#4340) (#3967) (#4072) (#4028) (#4132)
[build] Add CUDA-11.3 builds to torchvision (#4248)
[ci, tests] Skip some CPU-only tests on CircleCI machines with GPU (#4002) (#4025) (#4062)
[ci] New issue templates (#4299)
[ci] Various CI improvements, in particular putting back GPU testing on windows (#4421) (#4014) (#4053) (#4482) (#4475) (#3998) (#4388) (#4179) (#4394) (#4162) (#4065) (#3928) (#4081) (#4203) (#4011) (#4055) (#4074) (#4419) (#4067) (#4201) (#4200) (#4202) (#4496) (#3925)
[ci] ping maintainers in case a PR was not properly labeled (#3993) (#4012) (#4021) (#4501)
[datasets] Add bzip2 file compression support to datasets (#4097)
[datasets] Faster dataset indexing (#3939)
[datasets] Enable logging of internal dataset instanciations. (#4319) (#4090)
[datasets] Removed copy=False in torch.from_numpy in MNIST to avoid warning (#4184)
[io] Add warning for files with corrupt containers (#3961)
[models, tests] Add test to check that classification models are FX-compatible (#3662)
[tests] Speedup various tests (#3929) (#3933) (#3936)
[models] Allow custom activation in SqueezeExcitation of EfficientNet (#4448)
[models] Allow gradient backpropagation through GeneralizedRCNNTransform to inputs (#4327)
[ops, tests] Add JIT tests (#4472)
[ops] Make StochasticDepth FX-compatible (#4373)
[ops] Added backward pass on CPU and CUDA for interpolation with anti-alias option (#4208) (#4211)
[ops] Small refactoring to support opt mode for torchvision ops (fb internal specific) (#4080) (#4095)
[reference scripts] Added Exponential Moving Average support to classification reference script (#4381) (#4406) (#4407)
[reference scripts] Adding label smoothing on classification reference (#4335)
[reference scripts] Further enhance Classification Reference (#4444)
[reference scripts] Replaced to_tensor() with pil_to_tensor() + convert_image_dtype() (#4452)
[reference scripts] Update the metrics output on reference scripts (#4408)
[reference scripts] Warmup schedulers in References (#4411)
[tests] Add check for fx compatibility on segmentation and video models (#4131)
[tests] Mock redirection logic for tests (#4197)
[tests] Replace set_deterministic with non-deprecated spelling (#4212)
[tests] Skip building torchvision with ffmpeg when python==3.9 (#4417)
[tests] [jit] Make operation call accept Stack& instead Stack* (#63414) (#4380)
[tests] make tests that involve GDrive more robust (#4454)
[tests] remove dependency for dtype getters (#4291)
[transforms] Replaced example usage of ToTensor() by PILToTensor() + ConvertImageDtype() (#4494)
[transforms] Explicitly copying array in pil_to_tensor (#4566) (#4573)
[transforms] Make get_image_size and get_image_num_channels public. (#4321)
[transforms] adding gray images support for adjust_contrast and adjust_saturation (#4477) (#4480)
[utils] Support single color in utils.draw_bounding_boxes (#4075)
[video, documentation] Port the video_api.ipynb notebook to the example gallery (#4241)
[video, io, tests] Added check for invalid input file (#3932)
[video, io] remove deprecated function call (#3861) (#3989)
[video, tests] Removed test_audio_video_sync as it doesn't work as expected (#4050)
[video] Build torchvision with ffmpeg only on Linux and ignore ffmpeg on other platforms (#4413, #4410, #4041)

Bug Fixes

[build] Conda: Add numpy dependency (#4442)
[build] Explicitly exclude PIL 8.3.0 from compatible dependencies (#4148)
[build] More robust version check (#4285)
[ci] Fix broken clang format test. (#4320)
[ci] Remove mentions of conda-forge (#4082)
[ci] fixup '' -> '/./' for CI filter (#4059)
[datasets] Fix download from google drive which was downloading empty files in some cases (#4109)
[datasets] Fix splitting CelebA dataset (#4377)
[datasets] Add support for files with periods in name (#4099)
[io, tests] Don't check transparency channel for pil >= 8.3 in test_decode_png (#4167)
[io] Fix size_t issues across JPEG versions and platforms (#4439)
[io] Raise proper error when decoding 16-bits jpegs (#4101)
[io] Unpinned the libjpeg version and fixed jpeg_mem_dest's size type Wind… (#4288)
[io] deinterlacing PNG images with read_image (#4268)
[io] More robust ffmpeg version query in setup.py (#4254)
[io] Fixed read_image bug (#3948)
[models] Don't download backbone weights if pretrained=True (#4283)
[onnx, tests] Do not disable profiling executor in ONNX tests (#4324)
[ops, tests] Fix DeformConvTester::test_backward_cuda by setting threads per block to 512 (#3942)
[ops] Fix typing issue to make DeformConv2d scriptable (#4079)
[ops] Fixes deform_conv issue with large input/output (#4351)
[ops] Resolving tracing problem on StochasticDepth iterator. (#4372)
[ops] Port quantize_val and dequantize_val into torchvision to avoid at::native and android xplat incompatibility (#4311)
[reference scripts] Fix bug on EMA n_averaged estimation. (#4544) (#4545)
[tests] Avoid cmyk in nvjpeg tests (#4246)
[tests] Catch ValueError due to recent change to torch.testing.assert_close (#4165)
[tests] Fix failing tests by catching the proper exception from torch.testing (#4121)
[tests] Skip test if connection issues on fate (#4284)
[transforms] Fix RandAugment and TrivialAugment bugs (#4370)
[transforms] [FBcode->GH] [JIT] Add reference semantics to TorchScript classes (#44324) (#4166)
[utils] Handle grayscale images on draw_bounding_boxes (#4043) (#4049)
[video, io] Fixed missing audio with video_reader and pyav backend (#3934, #4064)

Code Quality

Various typing improvements (#4369) (#4168) (#4169) (#4170) (#4171) (#4224) (#4227) (#4395) (#4409) (#4232) (#4234 (#4236) (#4226) (#4416)
Renamed the “master” branch into “main” (#4306) (#4365)
[ci] (fb-internal only) Allow all torchvision test rules to run with RE (#4073)
[ci] add pre-commit hooks for convenient formatting checks (#4387)
[ci] Import hipify_python only when needed (#4031)
[io] Fixed a couple of typos and removed unnecessary bracket (#4345)
[io] use from_blob to avoid memcpy (#4118)
[models, ops] Moving common layers to ops (#4504)
[models, ops] Replace MobileNetV3's SqueezeExcitation with EfficientNet's one (#4487)
[models] Explicitely store a distance value that is reused (#4341)
[models] Use torch instead of scipy for random initialization of inception and googlenet weights (#4256)
[onnx, tests] Use test images from repo rather than internet for ONNX tests (#4176)
[onnx] Import ONNX utils from symbolic_opset11 module (#4230)
[ops] Fix clang formatting in deform_conv2d_kernel.cu (#3943)
[ops] Update gpu atomics include path (#4478) (reverted)
[reference scripts] Cleaned-up coco evaluation code (#4453)
[reference scripts] remove unused package in coco_eval.py (#4404)
[tests] Ported all tests to pytest (#3962) (#3996) (#3950) (#3964) (#3957) (#3959) (#3981) (#3952) (#3977) (#3974) (#3976) (#3983) (#3971) (#3988) (#3990) (#3985) (#3984) (#4030) (#3955)r (#4008) (#4010) (#4023) (#3954) (#4026) (#3953) (#4047) (#4185) (#3947) (#4045) (#4036) (#4034) (#3978) (#4046) (#3991) (#3930) (#4038) (#4037) (#4215) (#3972) (#3966) (#4114) (#4177) (#4280) (#3946) (#4233) (#4258) (#4035) (#4040) (#4000) (#4196) (#3922) (#4032)
[tests] Prevent tests from leaking their respective RNG (#4497) (#3926) (#4250)
[tests] Remove TestCase dependency for test_models_detection_anchor_utils.py (#4207)
[tests] Removed tests executing deprecated F_t.center/five/ten_crop methods (#4479)
[tests] Replace set_deterministic with non-deprecated spelling (#4212)
[tests] Remove torchvision/test/fakedata_generation.py (#4130)
[transforms, reference scripts] Added PILToTensor and ConvertImageDtype classes in reference scripts and used them to replace ToTensor(#4495, #4481)
[transforms] Refactor AutoAugment to support more augmentations. (#4338)
[transforms] Replace deprecated torch.lstsq with torch.linalg.lstsq (#3918)
[video] Drop virtual from private member functions of Decoder class (#4027)
[video] Fixed comparison warnings in audio_stream and video_stream (#4007)
[video] Fixed some ffmpeg deprecation warnings in decoder (#4003)

Contributors

We're grateful for our community, which helps us improving torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:

ABD-01, Adam J. Stewart, Aditya Oke, Alex Lin, Alexander Grund, Alexander Soare, Allen Goodman, Amani Kiruga, Anirudh, Beat Buesser, beet, Bert Maher, Bruno Korbar, Camilo De La Torre, cyy, D. Khuê Lê-Huu, David Fan, DevPranjal, dgenzel, dgenzel2, Dmitriy Genzel, Drishti Bhasin, Edward Z. Yang, Eli Uriegas, F-G Fernandez, Francisco Massa, Gary Miguel, Gaurav7888, IgorSusmelj, Ishan Kumar, Ivan Kobzarev, Jiawei Liu, Jithun Nair, Joao Gomes, Joe Early, Julien RIPOCHE, julienripoche, Kai Zhang, kingyiusuen, Loi Ly, Matti Picus, Meghan Lele, Muhammed Abdullah, Nicolas Hug, Nikita Shulga, ORippler, peterbell10, Philip Meier, Prabhat Roy, puhuk, Rajat Jaiswal, S Harish, Sahil Goyal, Samuel Gabriel, Santiago Castro, Saswat Das, Sepehr Sameni, Shengwei An, Shrill Shrestha, Shruti Pulstya, Sugato Ray, tanvimoharir, Vasilis Vryniotis, Vassilis C. Nicodemou, Vassilis Nicodemou, vfdev-5, Vincent Moens, Vivek Kumar, Yi Zhang, Yiwen Song, Yonghye Kwon, Yuchen Huang, Zhengxu Chen, Zhiqiang Wang, Zhongkai Zhu, zzk1st

pytorch/vision v0.11.0 RegNet, EfficientNet, FX Feature Extraction and more on GitHub