This release introduces the RegNet and EfficientNet architectures, a new FX-based utility to perform Feature Extraction, new data augmentation techniques such as RandAugment and TrivialAugment, updated training recipes that support EMA, Label Smoothing, Learning-Rate Warmup, Mixup and Cutmix, and many more.
Highlights
New Models
RegNet and EfficientNet are two popular architectures that can be scaled to different computational budgets. In this release we include 22 pre-trained weights for their classification variants. The models were trained on ImageNet and can be used as follows:
import torch
from torchvision import models
x = torch.rand(1, 3, 224, 224)
regnet = models.regnet_y_400mf(pretrained=True)
regnet.eval()
predictions = regnet(x)
efficientnet = models.efficientnet_b0(pretrained=True)
efficientnet.eval()
predictions = efficientnet(x)
The accuracies of the pre-trained models on the ImageNet validation set are shown below (see #4403, #4530 and #4293 for more details):
Model | Acc@1 | Acc@5 |
---|---|---|
regnet_x_400mf | 72.834 | 90.95 |
regnet_x_800mf | 75.212 | 92.348 |
regnet_x_1_6gf | 77.04 | 93.44 |
regnet_x_3_2gf | 78.364 | 93.992 |
regnet_x_8gf | 79.344 | 94.686 |
regnet_x_16gf | 80.058 | 94.944 |
regnet_x_32gf | 80.622 | 95.248 |
regnet_y_400mf | 74.046 | 91.716 |
regnet_y_800mf | 76.42 | 93.136 |
regnet_y_1_6gf | 77.95 | 93.966 |
regnet_y_3_2gf | 78.948 | 94.576 |
regnet_y_8gf | 80.032 | 95.048 |
regnet_y_16gf | 80.424 | 95.24 |
regnet_y_32gf | 80.878 | 95.34 |
EfficientNet-B0 | 77.692 | 93.532 |
EfficientNet-B1 | 78.642 | 94.186 |
EfficientNet-B2 | 80.608 | 95.31 |
EfficientNet-B3 | 82.008 | 96.054 |
EfficientNet-B4 | 83.384 | 96.594 |
EfficientNet-B5 | 83.444 | 96.628 |
EfficientNet-B6 | 84.008 | 96.916 |
EfficientNet-B7 | 84.122 | 96.908 |
We would like to thank Ross Wightman and Luke Melas-Kyriazi for contributing the weights of the EfficientNet variants.
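The Acc@1 and Acc@5 columns above are top-1 and top-5 accuracies. As a minimal sketch of how such metrics are typically computed from model outputs (the helper name `topk_accuracy` and the toy logits are illustrative assumptions, not part of torchvision):

```python
import torch


def topk_accuracy(logits: torch.Tensor, targets: torch.Tensor, ks=(1, 5)):
    """Return the top-k accuracy (in percent) for each k in `ks`."""
    max_k = max(ks)
    # Indices of the max_k highest-scoring classes per sample, shape (N, max_k)
    _, pred = logits.topk(max_k, dim=1)
    # correct[i, j] is True when the j-th ranked prediction of sample i is the target
    correct = pred.eq(targets.unsqueeze(1))
    return [correct[:, :k].any(dim=1).float().mean().item() * 100.0 for k in ks]


# Toy batch of 4 samples and 10 classes; class 0 always scores highest,
# class 1 second highest, and so on (stands in for real model outputs)
logits = -torch.arange(10, dtype=torch.float).repeat(4, 1)
targets = torch.tensor([0, 3, 7, 0])

acc1, acc5 = topk_accuracy(logits, targets)
print(f"Acc@1 = {acc1:.1f}, Acc@5 = {acc5:.1f}")
```

In practice `logits` would be the output of one of the models above on a batch of validation images, and `targets` the ground-truth class indices.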
FX-based Feature Extraction
A new Feature Extraction method has been added to our utilities. It uses PyTorch FX to retrieve the outputs of a network's intermediate layers, which is useful for feature extraction and visualization. Here is an example of how to use the new utility:
import torch
from torchvision.models import resnet50
from torchvision.models.feature_extraction import create_feature_extractor
x = torch.rand(1, 3, 224, 224)
model = resnet50()
# Map the graph node to extract to the key used in the output dict
return_nodes = {
    "layer4.2.relu_2": "layer4"
}
model2 = create_feature_extractor(model, return_nodes=return_nodes)
intermediate_outputs = model2(x)
print(intermediate_outputs['layer4'].shape)
We would like to thank Alexander Soare for developing this utility.
New Data Augmentations
Two new Automatic Augmentation techniques were added: RandAugment and TrivialAugment. Both can be used as drop-in replacements for AutoAugment, as shown below:
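import torch
from torchvision import transforms
# A random uint8 tensor stands in for a real image here (PIL images also work)
image = torch.randint(0, 256, (3, 256, 256), dtype=torch.uint8)
t = transforms.RandAugment()
# t = transforms.TrivialAugmentWide()
transformed = t(image)
# Both transforms can also be used inside a Compose pipeline applied to PIL images
transform = transforms.Compose([
    transforms.Resize(256),
    transforms.RandAugment(),  # or transforms.TrivialAugmentWide()
    transforms.ToTensor()])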
We would like to thank Samuel G. Müller for contributing TrivialAugment and for his help in refactoring the AA package.
Updated Training Recipes
We have updated our training reference scripts to add support for Exponential Moving Average, Label Smoothing, Learning-Rate Warmup, Mixup, Cutmix and other SOTA primitives. These additions enabled us to improve the classification Acc@1 of some pre-trained models by over 4 points. A major update of the existing pre-trained weights is expected in the next release.
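To give a feel for how these primitives fit together, here is a minimal sketch in plain PyTorch. It is not the code of the reference scripts: the toy model, the `mixup` helper, and all hyperparameter values (`decay`, `warmup_steps`, `alpha`) are assumptions made for illustration, and it assumes PyTorch 1.10 or later.

```python
import torch
from torch import nn
from torch.optim.swa_utils import AveragedModel

torch.manual_seed(0)
model = nn.Linear(8, 4)  # toy stand-in for a classification network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Label smoothing is built into CrossEntropyLoss
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)

# Linear learning-rate warmup over the first `warmup_steps` optimizer steps
warmup_steps = 5
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, (step + 1) / warmup_steps))

# Exponential moving average of the weights; `decay` is an illustrative value
decay = 0.99
ema_model = AveragedModel(
    model, avg_fn=lambda avg, new, num: decay * avg + (1.0 - decay) * new)


def mixup(x, y, num_classes, alpha=0.2):
    """Blend each sample with a randomly chosen partner; returns soft targets."""
    lam = torch.distributions.Beta(alpha, alpha).sample().item()
    perm = torch.randperm(x.size(0))
    y_onehot = nn.functional.one_hot(y, num_classes).float()
    return lam * x + (1 - lam) * x[perm], lam * y_onehot + (1 - lam) * y_onehot[perm]


lrs = []
for step in range(10):
    # Random data stands in for a real data loader
    x, y = torch.randn(16, 8), torch.randint(0, 4, (16,))
    x, y_soft = mixup(x, y, num_classes=4)
    loss = criterion(model(x), y_soft)  # soft (probability) targets
    optimizer.zero_grad()
    loss.backward()
    lrs.append(optimizer.param_groups[0]["lr"])
    optimizer.step()
    scheduler.step()
    ema_model.update_parameters(model)  # refresh the EMA copy after each step
```

At evaluation time the averaged weights in `ema_model` would be used instead of `model`. Cutmix follows the same pattern as `mixup` but pastes a rectangular patch from the partner image rather than blending whole images.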
Backward-incompatible changes
[models] Use torch instead of scipy for random initialization of inception and googlenet weights (#4256)
Deprecations
[models] Deprecate the C++ vision::models namespace (#4375)
New Features
[datasets] Add iNaturalist dataset (#4123)
[datasets] Add download support and Kinetics 400/600/700 datasets (#3680)
[datasets] Added LFW Dataset (#4255)
[models] Add FX feature extraction as an alternative to intermediate_layer_getter (#4302) (#4418)
[models] Add RegNet Architecture in TorchVision (#4403) (#4530) (#4550)
[ops] Add new masks_to_boxes op (#4290) (#4469)
[ops] Add StochasticDepth implementation (#4301)
[reference scripts] Adding Mixup and Cutmix (#4379)
[transforms] Integration of TrivialAugment with the current AutoAugment Code (#4221)
[transforms] Adding RandAugment implementation (#4348)
[models] Add EfficientNet Architecture in TorchVision (#4293)
Improvements
Various documentation improvements (#4239) (#4251) (#4275) (#4342) (#3894) (#4159) (#4133) (#4138) (#4089) (#3944) (#4349) (#3754) (#4308) (#4352) (#4318) (#4244) (#4362) (#3863) (#4382) (#4484) (#4503) (#4376) (#4457) (#4505) (#4363) (#4361) (#4337) (#4546) (#4553) (#4565) (#4567) (#4574) (#4575) (#4383) (#4390) (#3409) (#4451) (#4340) (#3967) (#4072) (#4028) (#4132)
[build] Add CUDA-11.3 builds to torchvision (#4248)
[ci, tests] Skip some CPU-only tests on CircleCI machines with GPU (#4002) (#4025) (#4062)
[ci] New issue templates (#4299)
[ci] Various CI improvements, in particular putting back GPU testing on windows (#4421) (#4014) (#4053) (#4482) (#4475) (#3998) (#4388) (#4179) (#4394) (#4162) (#4065) (#3928) (#4081) (#4203) (#4011) (#4055) (#4074) (#4419) (#4067) (#4201) (#4200) (#4202) (#4496) (#3925)
[ci] ping maintainers in case a PR was not properly labeled (#3993) (#4012) (#4021) (#4501)
[datasets] Add bzip2 file compression support to datasets (#4097)
[datasets] Faster dataset indexing (#3939)
[datasets] Enable logging of internal dataset instantiations. (#4319) (#4090)
[datasets] Removed copy=False in torch.from_numpy in MNIST to avoid warning (#4184)
[io] Add warning for files with corrupt containers (#3961)
[models, tests] Add test to check that classification models are FX-compatible (#3662)
[tests] Speedup various tests (#3929) (#3933) (#3936)
[models] Allow custom activation in SqueezeExcitation of EfficientNet (#4448)
[models] Allow gradient backpropagation through GeneralizedRCNNTransform to inputs (#4327)
[ops, tests] Add JIT tests (#4472)
[ops] Make StochasticDepth FX-compatible (#4373)
[ops] Added backward pass on CPU and CUDA for interpolation with anti-alias option (#4208) (#4211)
[ops] Small refactoring to support opt mode for torchvision ops (fb internal specific) (#4080) (#4095)
[reference scripts] Added Exponential Moving Average support to classification reference script (#4381) (#4406) (#4407)
[reference scripts] Adding label smoothing on classification reference (#4335)
[reference scripts] Further enhance Classification Reference (#4444)
[reference scripts] Replaced to_tensor() with pil_to_tensor() + convert_image_dtype() (#4452)
[reference scripts] Update the metrics output on reference scripts (#4408)
[reference scripts] Warmup schedulers in References (#4411)
[tests] Add check for fx compatibility on segmentation and video models (#4131)
[tests] Mock redirection logic for tests (#4197)
[tests] Replace set_deterministic with non-deprecated spelling (#4212)
[tests] Skip building torchvision with ffmpeg when python==3.9 (#4417)
[tests] [jit] Make operation call accept Stack& instead of Stack* (#63414) (#4380)
[tests] make tests that involve GDrive more robust (#4454)
[tests] remove dependency for dtype getters (#4291)
[transforms] Replaced example usage of ToTensor() by PILToTensor() + ConvertImageDtype() (#4494)
[transforms] Explicitly copying array in pil_to_tensor (#4566) (#4573)
[transforms] Make get_image_size and get_image_num_channels public. (#4321)
[transforms] adding gray images support for adjust_contrast and adjust_saturation (#4477) (#4480)
[utils] Support single color in utils.draw_bounding_boxes (#4075)
[video, documentation] Port the video_api.ipynb notebook to the example gallery (#4241)
[video, io, tests] Added check for invalid input file (#3932)
[video, io] remove deprecated function call (#3861) (#3989)
[video, tests] Removed test_audio_video_sync as it doesn't work as expected (#4050)
[video] Build torchvision with ffmpeg only on Linux and ignore ffmpeg on other platforms (#4413, #4410, #4041)
Bug Fixes
[build] Conda: Add numpy dependency (#4442)
[build] Explicitly exclude PIL 8.3.0 from compatible dependencies (#4148)
[build] More robust version check (#4285)
[ci] Fix broken clang format test. (#4320)
[ci] Remove mentions of conda-forge (#4082)
[ci] fixup '' -> '/./' for CI filter (#4059)
[datasets] Fix download from google drive which was downloading empty files in some cases (#4109)
[datasets] Fix splitting CelebA dataset (#4377)
[datasets] Add support for files with periods in name (#4099)
[io, tests] Don't check transparency channel for pil >= 8.3 in test_decode_png (#4167)
[io] Fix size_t issues across JPEG versions and platforms (#4439)
[io] Raise proper error when decoding 16-bits jpegs (#4101)
[io] Unpinned the libjpeg version and fixed jpeg_mem_dest's size type Wind… (#4288)
[io] deinterlacing PNG images with read_image (#4268)
[io] More robust ffmpeg version query in setup.py (#4254)
[io] Fixed read_image bug (#3948)
[models] Don't download backbone weights if pretrained=True (#4283)
[onnx, tests] Do not disable profiling executor in ONNX tests (#4324)
[ops, tests] Fix DeformConvTester::test_backward_cuda by setting threads per block to 512 (#3942)
[ops] Fix typing issue to make DeformConv2d scriptable (#4079)
[ops] Fixes deform_conv issue with large input/output (#4351)
[ops] Resolving tracing problem on StochasticDepth iterator. (#4372)
[ops] Port quantize_val and dequantize_val into torchvision to avoid at::native and android xplat incompatibility (#4311)
[reference scripts] Fix bug on EMA n_averaged estimation. (#4544) (#4545)
[tests] Avoid cmyk in nvjpeg tests (#4246)
[tests] Catch ValueError due to recent change to torch.testing.assert_close (#4165)
[tests] Fix failing tests by catching the proper exception from torch.testing (#4121)
[tests] Skip test if connection issues on fate (#4284)
[transforms] Fix RandAugment and TrivialAugment bugs (#4370)
[transforms] [FBcode->GH] [JIT] Add reference semantics to TorchScript classes (#44324) (#4166)
[utils] Handle grayscale images on draw_bounding_boxes (#4043) (#4049)
[video, io] Fixed missing audio with video_reader and pyav backend (#3934, #4064)
Code Quality
Various typing improvements (#4369) (#4168) (#4169) (#4170) (#4171) (#4224) (#4227) (#4395) (#4409) (#4232) (#4234) (#4236) (#4226) (#4416)
Renamed the “master” branch to “main” (#4306) (#4365)
[ci] (fb-internal only) Allow all torchvision test rules to run with RE (#4073)
[ci] add pre-commit hooks for convenient formatting checks (#4387)
[ci] Import hipify_python only when needed (#4031)
[io] Fixed a couple of typos and removed unnecessary bracket (#4345)
[io] use from_blob to avoid memcpy (#4118)
[models, ops] Moving common layers to ops (#4504)
[models, ops] Replace MobileNetV3's SqueezeExcitation with EfficientNet's one (#4487)
[models] Explicitly store a distance value that is reused (#4341)
[models] Use torch instead of scipy for random initialization of inception and googlenet weights (#4256)
[onnx, tests] Use test images from repo rather than internet for ONNX tests (#4176)
[onnx] Import ONNX utils from symbolic_opset11 module (#4230)
[ops] Fix clang formatting in deform_conv2d_kernel.cu (#3943)
[ops] Update gpu atomics include path (#4478) (reverted)
[reference scripts] Cleaned-up coco evaluation code (#4453)
[reference scripts] remove unused package in coco_eval.py (#4404)
[tests] Ported all tests to pytest (#3962) (#3996) (#3950) (#3964) (#3957) (#3959) (#3981) (#3952) (#3977) (#3974) (#3976) (#3983) (#3971) (#3988) (#3990) (#3985) (#3984) (#4030) (#3955) (#4008) (#4010) (#4023) (#3954) (#4026) (#3953) (#4047) (#4185) (#3947) (#4045) (#4036) (#4034) (#3978) (#4046) (#3991) (#3930) (#4038) (#4037) (#4215) (#3972) (#3966) (#4114) (#4177) (#4280) (#3946) (#4233) (#4258) (#4035) (#4040) (#4000) (#4196) (#3922) (#4032)
[tests] Prevent tests from leaking their respective RNG (#4497) (#3926) (#4250)
[tests] Remove TestCase dependency for test_models_detection_anchor_utils.py (#4207)
[tests] Removed tests executing deprecated F_t.center/five/ten_crop methods (#4479)
[tests] Replace set_deterministic with non-deprecated spelling (#4212)
[tests] Remove torchvision/test/fakedata_generation.py (#4130)
[transforms, reference scripts] Added PILToTensor and ConvertImageDtype classes in reference scripts and used them to replace ToTensor (#4495, #4481)
[transforms] Refactor AutoAugment to support more augmentations. (#4338)
[transforms] Replace deprecated torch.lstsq with torch.linalg.lstsq (#3918)
[video] Drop virtual from private member functions of Decoder class (#4027)
[video] Fixed comparison warnings in audio_stream and video_stream (#4007)
[video] Fixed some ffmpeg deprecation warnings in decoder (#4003)
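One of the entries above replaces the deprecated `torch.lstsq` with `torch.linalg.lstsq`. A minimal sketch of the newer call on a toy system (the matrices are made up for illustration):

```python
import torch

# Overdetermined system A @ x = b with a known exact solution
A = torch.tensor([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
x_true = torch.tensor([[2.0], [3.0]])
b = A @ x_true

# torch.linalg.lstsq takes (A, b); the deprecated torch.lstsq took (b, A)
solution = torch.linalg.lstsq(A, b).solution
print(solution)
```

Note the swapped argument order relative to the deprecated function, which is easy to miss when migrating.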
Contributors
We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and by providing feedback and suggestions. The following people have contributed patches for this release:
ABD-01, Adam J. Stewart, Aditya Oke, Alex Lin, Alexander Grund, Alexander Soare, Allen Goodman, Amani Kiruga, Anirudh, Beat Buesser, beet, Bert Maher, Bruno Korbar, Camilo De La Torre, cyy, D. Khuê Lê-Huu, David Fan, DevPranjal, dgenzel, dgenzel2, Dmitriy Genzel, Drishti Bhasin, Edward Z. Yang, Eli Uriegas, F-G Fernandez, Francisco Massa, Gary Miguel, Gaurav7888, IgorSusmelj, Ishan Kumar, Ivan Kobzarev, Jiawei Liu, Jithun Nair, Joao Gomes, Joe Early, Julien RIPOCHE, julienripoche, Kai Zhang, kingyiusuen, Loi Ly, Matti Picus, Meghan Lele, Muhammed Abdullah, Nicolas Hug, Nikita Shulga, ORippler, peterbell10, Philip Meier, Prabhat Roy, puhuk, Rajat Jaiswal, S Harish, Sahil Goyal, Samuel Gabriel, Santiago Castro, Saswat Das, Sepehr Sameni, Shengwei An, Shrill Shrestha, Shruti Pulstya, Sugato Ray, tanvimoharir, Vasilis Vryniotis, Vassilis C. Nicodemou, Vassilis Nicodemou, vfdev-5, Vincent Moens, Vivek Kumar, Yi Zhang, Yiwen Song, Yonghye Kwon, Yuchen Huang, Zhengxu Chen, Zhiqiang Wang, Zhongkai Zhu, zzk1st