Highlights
Encoding / Decoding images
Torchvision is extending its encoding/decoding capabilities. For this version, we added a GIF decoder which is available as torchvision.io.decode_gif(raw_tensor), torchvision.io.decode_image(raw_tensor), and torchvision.io.read_image(path_to_image).
We also added support for jpeg GPU encoding in torchvision.io.encode_jpeg(). This is 10X faster than the existing CPU jpeg encoder.
Stay tuned for more improvements coming in the next versions. We plan to improve jpeg GPU decoding, and add more image decoders (webp in particular).
Resizing according to the longest edge of an image
It is now possible to resize images by setting torchvision.transforms.v2.Resize(max_size=N): this will resize the longest edge of the image exactly to max_size, making sure the image dimensions don't exceed this value. Read more in the docs!
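The arithmetic behind longest-edge resizing can be sketched in plain Python. The helper resize_longest_edge below is hypothetical and is not torchvision's implementation; in particular, truncating the shorter edge toward zero is an assumption about rounding.

```python
from typing import Tuple


def resize_longest_edge(height: int, width: int, max_size: int) -> Tuple[int, int]:
    """Return (new_height, new_width) with the longest edge equal to max_size.

    Hypothetical helper: the shorter edge is scaled by the same factor and
    truncated to an integer (rounding mode is an assumption).
    """
    if height <= 0 or width <= 0 or max_size <= 0:
        raise ValueError("height, width and max_size must be positive")
    if height >= width:
        # Height is the longest edge: pin it to max_size, scale the width.
        return max_size, max(1, int(max_size * width / height))
    # Width is the longest edge: pin it to max_size, scale the height.
    return max(1, int(max_size * height / width)), max_size


# A 480x640 (H x W) image resized so that its longest edge is 320.
print(resize_longest_edge(480, 640, 320))
```

Because the longest edge is pinned to max_size and the other edge is scaled by the same factor, neither output dimension can exceed max_size.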
Detailed changes
Bug Fixes
[datasets] SBDataset: Only download noval file when image_set='train_noval' (#8475)
[datasets] Update the download url in class EMNIST (#8350)
[io] Fix compilation error when there is no libjpeg (#8342)
[reference scripts] Fix use of cutmix_alpha in classification training references (#8448)
[utils] Allow K=1 in draw_keypoints (#8439)
New Features
[io] Add decoder for GIF images (decode_gif(), decode_image(), read_image()) (#8406, #8419)
[transforms] Add GaussianNoise transform (#8381)
Improvements
[transforms] Allow v2 Resize to resize longer edge exactly to max_size (#8459)
[transforms] Add min_area parameter to SanitizeBoundingBoxes (#7735)
[transforms] Make adjust_hue() work with numpy 2.0 (#8463)
[transforms] Enable one-hot-encoded labels in MixUp and CutMix (#8427)
[transforms] Create kernel on-device for transforms.functional.gaussian_blur (#8426)
[io] Adding GPU acceleration to encode_jpeg (10X faster than CPU encoder) (#8391)
[io] read_video: accept BytesIO objects on pyav backend (#8442)
[io] Add compatibility with FFMPEG 7.0 (#8408)
[datasets] Add extra to install gdown (#8430)
[datasets] Support encoded RLE format for COCO segmentations (#8387)
[datasets] Added binary cat vs dog classification target type to Oxford pet dataset (#8388)
[datasets] Return labels for FER2013 if possible (#8452)
[ops] Force use of torch.compile on deterministic roi_align implementation (#8436)
[utils] Add float support to utils.draw_bounding_boxes() (#8328)
[feature_extraction] Add concrete_args to feature extraction tracing. (#8393)
[Docs] Various documentation improvements (#8429, #8467, #8469, #8332, #8262, #8341, #8392, #8386, #8385, #8411).
[Tests] Various testing improvements (#8454, #8418, #8480, #8455)
[Code quality] Various code quality improvements (#8404, #8402, #8345, #8335, #8481, #8334, #8384, #8451, #8470, #8413, #8414, #8416, #8412)
Contributors
We're grateful for our community, which helps us improve torchvision by submitting issues and PRs, and providing feedback and suggestions. The following persons have contributed patches for this release:
Adam J. Stewart, ahmadsharif1, AJS Payne, Andrew Lingg, Andrey Talman, Anner, Antoine Broyelle, cdzhan, deekay42, drhead, Edward Z. Yang, Emin Orhan, Fangjun Kuang, G, haarisr, Huy Do, Jack Newsom, JavaZero, Mahdi Lamb, Mantas, Nicolas Hug, nihui, Richard Barnes, Richard Zou, Richie Bendall, Robert-André Mauchin, Ross Wightman, Siddarth Ijju, vfdev