This release brings several new additions to torchvision that improve support for deployment. Most notably, all models in torchvision are torchscript-compatible and can be exported to ONNX. Additionally, a few classification models have quantized weights.
Note: this is the last version of torchvision that officially supports Python 2.
Breaking changes
Updated KeypointRCNN pre-trained weights
The pre-trained weights for keypointrcnn_resnet50_fpn have been updated and now correspond to the results reported in the documentation. The previous weights corresponded to an intermediate training checkpoint. (#1609)
Corrected the implementation for MNASNet
The previous implementation contained a bug that affected all MNASNet variants other than mnasnet1_0: the first few layers were not scaled by the width multiplier, although all the remaining layers were. They are now scaled as well. We also provide a new checkpoint for mnasnet0_5, which gives 32.17 top1 error. (#1224)
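To make the fix concrete, here is a minimal sketch of applying a width multiplier uniformly to every layer, the first ones included. The base widths, the `scale_width` helper and the rounding to a multiple of 8 are illustrative assumptions, not torchvision's actual MNASNet code.

```python
# Hypothetical base channel widths for the first few layers (illustration only).
BASE_WIDTHS = [32, 16, 24, 40]


def scale_width(channels, width_mult, divisor=8):
    """Scale a channel count by width_mult and round to a multiple of divisor."""
    scaled = channels * width_mult
    # Round to the nearest multiple of divisor, never going below divisor itself.
    return max(divisor, int(scaled + divisor / 2) // divisor * divisor)


def scaled_widths(width_mult):
    """Apply the same multiplier to every layer, including the first ones."""
    return [scale_width(c, width_mult) for c in BASE_WIDTHS]


print(scaled_widths(1.0))
print(scaled_widths(0.5))
```

With a multiplier of 1.0 the widths are unchanged, which is consistent with mnasnet1_0 being unaffected by the bug.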
Highlights
TorchScript support for all models
All models in torchvision have native support for torchscript, for both training and testing. This includes complex models such as DeepLabV3, Mask R-CNN and Keypoint R-CNN.
Using torchscript with torchvision models is easy:
import torch
import torchvision

# get a pre-trained model
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
# convert to torchscript
model_script = torch.jit.script(model)
model_script.eval()
# compute predictions
predictions = model_script([torch.rand(3, 300, 300)])
Warning: the return type of the scripted versions of Faster R-CNN, Mask R-CNN and Keypoint R-CNN differs from that of their eager counterparts: the scripted models always return a tuple (losses, detections). This discrepancy will be addressed in a future release.
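Until then, calling code can normalize the two return types itself. The helper below is a minimal sketch; `extract_detections` is a hypothetical name, and it assumes the eager model in eval mode returns the list of detections directly, while the scripted model returns the (losses, detections) tuple described above.

```python
def extract_detections(output):
    """Return the detections from either an eager or a scripted R-CNN output.

    Assumes the scripted model returns a (losses, detections) tuple and the
    eager model (in eval mode) returns the detections list directly.
    """
    if isinstance(output, tuple) and len(output) == 2:
        _losses, detections = output
        return detections
    return output


# Stand-in outputs with the two shapes, using plain Python containers.
eager_output = [{"boxes": [], "labels": [], "scores": []}]
scripted_output = ({}, eager_output)

print(extract_detections(eager_output) is eager_output)
print(extract_detections(scripted_output) is eager_output)
```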
ONNX
All models in torchvision can now be exported to ONNX for deployment. This includes models such as Mask R-CNN.
import torch
import torchvision

# get a pre-trained model
model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()
inputs = [torch.rand(3, 300, 300)]
predictions = model(inputs)
# convert to ONNX
torch.onnx.export(model, inputs, "model.onnx",
                  do_constant_folding=True,
                  opset_version=11  # opset_version 11 required for Mask R-CNN
                  )
Warning: for Faster R-CNN / Mask R-CNN / Keypoint R-CNN, the exported model currently depends on the input shape used during export. Once the model has been exported to ONNX, make sure that every image fed to it has the same shape as the one used for the export. This behavior will be made more general in a future release.
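One way to guard against this is to check each image's shape before inference. The sketch below is only an illustration: `EXPORT_SHAPE` and `check_export_shape` are hypothetical names, and the shape (3, 300, 300) is taken from the export example above.

```python
# Shape of the dummy input used at export time (from the example above).
EXPORT_SHAPE = (3, 300, 300)


def check_export_shape(image_shape, export_shape=EXPORT_SHAPE):
    """Raise ValueError if an image does not match the shape used for export."""
    if tuple(image_shape) != tuple(export_shape):
        raise ValueError(
            "expected input of shape {}, got {}".format(
                tuple(export_shape), tuple(image_shape)
            )
        )


check_export_shape((3, 300, 300))  # passes silently

try:
    check_export_shape((3, 480, 640))
except ValueError as err:
    print("rejected:", err)
```

With a tensor, the same check can be called as `check_export_shape(image.shape)`; resizing or padding images to the export shape beforehand is left to the caller.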
Quantized models
torchvision now provides quantized models for ResNet, ResNext, MobileNetV2, GoogleNet, InceptionV3 and ShuffleNetV2, as well as reference scripts for quantizing your own model in references/classification/train_quantization.py (https://github.com/pytorch/vision/blob/master/references/classification/train_quantization.py). A pre-trained quantized model can be obtained with a few lines of code:
import torch
import torchvision

model = torchvision.models.quantization.mobilenet_v2(pretrained=True, quantize=True)
model.eval()
# run the model with quantized inputs and weights
out = model(torch.rand(1, 3, 224, 224))
We provide pre-trained quantized weights for the following models:
Model | Acc@1 | Acc@5 |
---|---|---|
MobileNet V2 | 71.658 | 90.150 |
ShuffleNet V2 | 68.360 | 87.582 |
ResNet 18 | 69.494 | 88.882 |
ResNet 50 | 75.920 | 92.814 |
ResNext 101 32x8d | 78.986 | 94.480 |
Inception V3 | 77.084 | 93.398 |
GoogleNet | 69.826 | 89.404 |
Torchscript support for torchvision.ops
torchvision ops are now natively supported by torchscript. This includes operators such as nms, roi_align and roi_pool, and for the ops that support backpropagation, both eager and torchscript modes are supported in autograd.
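For readers unfamiliar with nms, the sketch below shows the greedy non-maximum suppression it performs, written in plain Python for clarity. It is a reference illustration only, not torchvision's implementation; `box_iou_single` and `greedy_nms` are hypothetical names, and boxes are assumed to be (x1, y1, x2, y2) tuples.

```python
def box_iou_single(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold):
    """Return indices of kept boxes, sorted by decreasing score.

    A box is dropped when its IoU with an already kept, higher-scoring
    box is greater than iou_threshold.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(box_iou_single(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(greedy_nms(boxes, scores, iou_threshold=0.5))
```

In this example the second box overlaps the first with an IoU of about 0.68, so it is suppressed at a threshold of 0.5 and kept at a threshold of 0.7.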
New operators
Deformable Convolution (#1586) (#1660) (#1637)
As described in Deformable Convolutional Networks (https://arxiv.org/abs/1703.06211), torchvision now supports deformable convolutions. The module takes both the input and the offsets as arguments, and can be used as follows:
import torch
from torchvision import ops

module = ops.DeformConv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)
x = torch.rand(1, 1, 10, 10)
# number of channels for offset should be a multiple
# of 2 * module.weight.size(2) * module.weight.size(3), which corresponds
# to the kernel_size
offset = torch.rand(1, 2 * 3 * 3, 10, 10)
# the output requires both the input and the offsets
out = module(x, offset)
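The shape of the offset tensor in the snippet above can be derived with a little arithmetic. The helpers below are a sketch with hypothetical names (`offset_channels`, `conv_output_size`); they assume the offset carries two values (one per spatial axis) for each kernel position and offset group, and has the same spatial size as the convolution output.

```python
def offset_channels(kernel_size, offset_groups=1):
    """Number of offset channels: two values per kernel position and group."""
    if isinstance(kernel_size, int):
        kh, kw = kernel_size, kernel_size
    else:
        kh, kw = kernel_size
    return 2 * offset_groups * kh * kw


def conv_output_size(size, kernel_size, stride=1, padding=0, dilation=1):
    """Standard convolution output size along one spatial dimension."""
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# Values matching the example above: 3x3 kernel, padding=1, 10x10 input.
channels = offset_channels(3)
out_size = conv_output_size(10, kernel_size=3, padding=1)
print((1, channels, out_size, out_size))
```

For the example above this gives an offset of shape (1, 18, 10, 10), which matches `torch.rand(1, 2 * 3 * 3, 10, 10)`.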
If needed, the user can create their own wrapper module that imposes constraints on the offset. Here is an example, using a single convolution layer to compute the offset:
from torch import nn
from torchvision import ops


class BasicDeformConv2d(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size=1, stride=1,
                 dilation=1, groups=1, offset_groups=1):
        super().__init__()
        offset_channels = 2 * kernel_size * kernel_size
        # the offset map must have the same spatial size as the output of
        # the deformable convolution; with these settings that holds for
        # kernel_size=3
        self.conv2d_offset = nn.Conv2d(
            in_channels,
            offset_channels * offset_groups,
            kernel_size=3,
            stride=stride,
            padding=dilation,
            dilation=dilation,
        )
        self.conv2d = ops.DeformConv2d(
            in_channels,
            out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=dilation,
            dilation=dilation,
            groups=groups,
            bias=False
        )

    def forward(self, x):
        # predict the offsets from the input, then apply the deformable conv
        offset = self.conv2d_offset(x)
        return self.conv2d(x, offset)
Position-sensitive RoI Pool / Align (#1410)
torchvision now provides the Position-Sensitive Region of Interest (RoI) Pool and Align operators, as used in Light-Head R-CNN (https://arxiv.org/abs/1711.07264). They are available as ops.ps_roi_align and ops.ps_roi_pool, with the module equivalents ops.PSRoIAlign and ops.PSRoIPool, and have the same interface as RoIAlign / RoIPool.
New Features
TorchScript support
- Bugfix in BalancedPositiveNegativeSampler introduced during torchscript support (#1670)
- Make R-CNN models less verbose in script mode (#1671)
- Minor torchscript fixes for Mask R-CNN (#1639)
- remove BC-breaking changes (#1560)
- Make maskrcnn scriptable (#1407)
- Add Script Support for Video Resnet Models (#1393)
- fix ASPPPooling (#1575)
- Test that torchhub models are scriptable (#1242)
- Make GoogleNet & InceptionNet scriptable (#1349)
- Make fcn_resnet Scriptable (#1352)
- Make Densenet Scriptable (#1342)
- make resnext scriptable (#1343)
- make shufflenet and resnet scriptable (#1270)
ONNX
- Enable KeypointRCNN test (#1673)
- enable mask rcnn test (#1613)
- Changes to Enable KeypointRCNN ONNX Export (#1593)
- Disable Profiling in Failing Test (#1585)
- Enable ONNX Test for FasterRcnn (#1555)
- Support Exporting Mask Rcnn to ONNX (#1461)
- Lahaidar/export faster rcnn (#1401)
- Support Exporting RPN to ONNX (#1329)
- Support Exporting MultiScaleRoiAlign to ONNX (#1324)
- Support Exporting GeneralizedRCNNTransform to ONNX (#1325)
Quantization
- Update quantized shufflenet weights (#1715)
- Add commands to run quantized model with pretrained weights (#1547)
- Quantizable googlenet, inceptionv3 and shufflenetv2 models (#1503)
- Quantizable resnet and mobilenet models (#1471)
- Remove model download from test_quantized_models (#1526)
Improvements
Bugfixes
- Bugfix on GroupedBatchSampler for corner case where there are not enough examples in a category to form a batch (#1677)
- Fix rpn memory leak and dataType errors. (#1657)
- Fix torchvision install due to zipped egg (#1536)
Transforms
- Make shear operation area preserving (#1529)
- PILLOW_VERSION deprecation updates (#1501)
- Adds optional fill colour to rotate (#1280)
Ops
- Add Deformable Convolution operation. (#1586) (#1660) (#1637)
- Fix inconsistent NMS implementation between CPU and CUDA (#1556)
- Speed up nms_cuda (#1704)
- Implementation for Position-sensitive ROI Pool/Align (#1410)
- Remove cpp extensions in favor of torch ops (#1348)
- Make custom ops differentiable (#1314)
- Fix Windows build in Torchvision Custom op Registration (#1320)
- Revert "Register Torchvision Ops as Custom Ops (#1267)" (#1316)
- Register Torchvision Ops as Custom Ops (#1267)
- Use Tensor.data_ptr instead of .data (#1262)
- Fix header includes for cpu (#1644)
Datasets
- fixed test for windows by closing the created temporary files (#1662)
- VideoClips windows fixes (#1661)
- Fix VOC on Windows (#1641)
- update dead LSUN link (#1626)
- DatasetFolder should follow links when searching for data (#1580)
- add .tgz support to extract_archive (#1650)
- expose audio_channels as a parameter to kinetics dataset (#1559)
- Implemented integrity check (md5 hash) after dataset download (#1456)
- Move VideoClips dummy dataset to top level for pickling (#1649)
- Remove download for ImageNet (#1457)
- add tar.xz archive handler (#1361)
- Fix DeprecationWarning for collections.Iterable import in LSUN (#1417)
- Support empty target_type for CelebA dataset (#1351)
- VOC2007 support test set (#1340)
- Fix EMNIST download URL (#1297) (#1318)
- Refactored clip_sampler (#1562)
Documentation
- Fix documentation for NMS (#1614)
- More examples of functional transforms (#1402)
- Fixed doc of crop functionals (#1388)
- Added Training Sample code for fasterrcnn_resnet50_fpn (#1695)
- Fix rpn.py typo (#1276)
- Update README with minimum required version of PyTorch (#1272)
- fix alignment of README (#1396)
- fixed typo in DatasetFolder and ImageFolder (#1284)
Models
Utils
- Adding File object option to utils.save_image (#1301)
- Fix make_grid: support any number of channels in tensor (#1300)
- Fix bug of changing input tensor in utils.save_image (#1244)
Reference scripts
- add a README for training object detection models (#1612)
- Adding args for names of train and val directories (#1544)
- Fix broken bitwise operation in Similarity Reference loss (#1604)
- Fixing issue #1530 by starting ann_id to 1 in convert_to_coco_api (#1531)
- Add commands for model training (#1203)
- adding documentation for automatic mixed precision training (#1533)
- Fix reference training script for Mask R-CNN for PyTorch 1.2 (during evaluation after epoch, mask datatype became bool, pycocotools expects uint8) (#1413)
- fix a little bug about resume (#1628)
- Better explain lr and batch size in references/detection/train.py (#1233)
- update default parameters in references/detection (#1611)
- Removed code redundancy/refactored in video_classification (#1549)
- Fix comment in default arguments in references/detection (#1243)
Tests
- Correctness test implemented with old test architecture (#1511)
- Simplify and organize test_ops. (#1551)
- Replace asserts with assertEqual (#1488) (#1499) (#1497) (#1496) (#1498) (#1494) (#1487) (#1495)
- Add expected result tests (#1377)
- Add TorchHub tests to torchvision (#1319)
- Scriptability checks for Tensor Transforms (#1690)
- Add tests for results in script vs eager mode (#1430)
- Test for checking non mutating behaviour of tensor transforms (#1656)
- Disable download tests for Python2 (#1269)
- Fix randomresized params flaky (#1282)
CI
- Disable C++ models from being compiled without explicit request (#1535)
- Fix discrepancy in regenerate.py (#1583)
- soumith -> pytorch for docker images (#1577)
- [wip] try vs2019 toolchain (#1509)
- Make CI use PyTorch nightly (#1492)
- Try enabling Windows CUDA CI (#1486)
- Fix CUDA builds on Windows (#1485)
- Try fix Windows CircleCI (#1433)
- Fix CUDA CI (#1464)
- Change approach for rebase to master (#1427)
- Temporary fix for CI (#1411)
- Use PyTorch 1.3 for CI (#1467)
- Use links from S3 to install CUDA (#1472)
- Enable CUDA 9.2 builds for Windows (#1381)
- Fix nightly builds (#1374)
- Fix Windows CI after #1301 (#1368)
- Retry anaconda login for Windows builds (#1366)
- Fix nightly wheels builds for Windows (#1358)
- Fix CI for py2.7 cu100 wheels (#1354)
- Fix Windows CI (#1347)
- Windows build scripts (#1241)
- Make CircleCI checkout merge commit (#1344)
- use native python code generation logic (#1321)
- Add CircleCI (v2) (#1298)