meta-pytorch/torchcodec v0.11.0 on GitHub

TorchCodec 0.11 is out! This release brings CUDA decoding improvements, HDR metadata and rotation support in VideoDecoder, and output FPS resampling!

CUDA decoder performance improvements

We made significant improvements to the CUDA decoder throughput, available via the “beta” backend:

#1232 fixed a bug in the decoder cache, allowing more than one decoder instance to be cached per video configuration. The fix doesn’t affect single-threaded pipelines, but drastically improves the throughput of multi-threaded decoding.
#1227 improved cache performance further.
#1243 added an LRU eviction policy for the cache, which improves the cache hit rate when decoding many different video configurations.
#1246 added set_nvdec_cache_capacity(), which lets users control the cache size. Larger caches are typically faster but use more memory.

These improvements are available via the “beta” backend.
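
To make the cache changes above more concrete, here is a minimal pure-Python sketch of an LRU cache that can hold several idle decoders for the same video configuration. It is an illustration only and not TorchCodec's implementation: the DecoderCache class, its put() and take() methods, and the configuration tuples are all hypothetical names.

```python
from collections import OrderedDict


class DecoderCache:
    """Toy LRU cache holding several idle decoders per configuration (hypothetical)."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Maps an entry id to (config, decoder), least recently cached first.
        self._entries = OrderedDict()
        self._next_id = 0

    def put(self, config, decoder):
        """Cache an idle decoder, evicting the least recently cached ones if full."""
        if self.capacity <= 0:
            return
        while len(self._entries) >= self.capacity:
            self._entries.popitem(last=False)
        self._entries[self._next_id] = (config, decoder)
        self._next_id += 1

    def take(self, config):
        """Remove and return the newest cached decoder for config, or None on a miss."""
        for entry_id in reversed(self._entries):
            if self._entries[entry_id][0] == config:
                return self._entries.pop(entry_id)[1]
        return None

    def __len__(self):
        return len(self._entries)


cache = DecoderCache(capacity=2)
cache.put(("h264", 1920, 1080), "decoder_a")
cache.put(("h264", 1920, 1080), "decoder_b")  # same configuration, second instance
cache.put(("hevc", 3840, 2160), "decoder_c")  # cache is full: decoder_a is evicted
print(cache.take(("h264", 1920, 1080)))
```

With one entry per configuration, two threads decoding the same kind of video would keep missing the cache; allowing several entries per configuration lets each thread reuse an idle decoder. A larger capacity means fewer evictions, at the cost of keeping more decoders alive.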

⚠️ Note that in the next release, the “beta” backend will become the default backend. This will be a transparent and backward-compatible change. Users who want to stay on the less efficient FFmpeg backend should use:

with set_cuda_backend("ffmpeg"):
    decoder = VideoDecoder(..., device="cuda")

Read more about this in the CUDA utilities section!

FPS Resampling

get_frames_played_in_range() now accepts a fps parameter to resample video at a target frame rate, duplicating or dropping frames as necessary to match the desired output FPS:

decoder = VideoDecoder(path)
# If a source video is 25 fps, a 1-second range will contain 25 frames.
# We can use the fps argument to resample to 5 fps, which gives us 5 frames:
frames_5fps = decoder.get_frames_played_in_range(start_seconds=1, stop_seconds=2, fps=5)

Read the VideoDecoder docs for more details!

(#1148)
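
As a rough mental model of the resampling described above, the sketch below computes which source frame indices would be picked for a given range and target FPS. It assumes a constant-frame-rate source, a half-open range [start, stop), and that each output timestamp maps to the source frame being played at that instant. The helper resampled_frame_indices() is hypothetical and is not part of TorchCodec.

```python
import math
from fractions import Fraction


def resampled_frame_indices(start_seconds, stop_seconds, source_fps, target_fps):
    """Return the source frame index shown at each output timestamp (hypothetical helper)."""
    start = Fraction(start_seconds)
    stop = Fraction(stop_seconds)
    step = 1 / Fraction(target_fps)  # time between two output frames
    indices = []
    k = 0
    while start + k * step < stop:
        timestamp = start + k * step
        # Exact rational arithmetic avoids float rounding at frame boundaries.
        indices.append(math.floor(timestamp * Fraction(source_fps)))
        k += 1
    return indices


# Downsampling 25 fps to 5 fps keeps every fifth frame.
print(resampled_frame_indices(1, 2, source_fps=25, target_fps=5))  # [25, 30, 35, 40, 45]
# Upsampling 5 fps to 10 fps duplicates each frame.
print(resampled_frame_indices(0, 1, source_fps=5, target_fps=10))
```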

Rotation Support

TorchCodec now automatically applies rotation metadata during video decoding on the CPU and on the “beta” CUDA backend.

decoder = VideoDecoder(path)
print(decoder.metadata.rotation)  # e.g. 90.0, or None

⚠️ Note that this is a BC-breaking change, but we consider it a bug fix. Read more about this in the VideoStreamMetadata docs!

(#1173, #1235)
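
One practical consequence of applying rotation is that a quarter-turn rotation swaps the width and height of the decoded frames. The sketch below shows that relationship for a rotation value like the one printed above. It assumes the rotation is a multiple of 90 degrees, and displayed_size() is a hypothetical helper, not a TorchCodec function.

```python
def displayed_size(width, height, rotation):
    """Return (width, height) after applying a rotation in degrees (hypothetical helper)."""
    if rotation is None:
        return width, height
    quarter_turns = round(rotation / 90) % 4
    if quarter_turns % 2 == 1:
        # 90 or 270 degrees: the frame is turned on its side.
        return height, width
    return width, height


print(displayed_size(1920, 1080, 90.0))
print(displayed_size(1920, 1080, None))
```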

HDR & Color Metadata

VideoDecoder metadata now exposes color-related metadata and pixel format, making it easy to identify HDR content:

metadata = VideoDecoder(path).metadata
print(metadata.color_primaries)     # e.g. "bt2020"
print(metadata.color_space)         # e.g. "bt2020nc"
print(metadata.color_transfer)      # e.g. "smpte2084"
print(metadata.pixel_format)        # e.g. "yuv420p10le"

Read more about these fields in the VideoStreamMetadata docs!

(#1271, #1261, #1267)
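
As one possible way to use these fields, the sketch below flags a stream as HDR when its transfer function is PQ or HLG, using the FFmpeg names "smpte2084" and "arib-std-b67". This is a heuristic for illustration, not an official TorchCodec check: looks_like_hdr() is a hypothetical helper, and the SimpleNamespace objects stand in for real metadata.

```python
from types import SimpleNamespace

# FFmpeg names of the transfer functions commonly used by HDR video:
# "smpte2084" is PQ (used by HDR10) and "arib-std-b67" is HLG.
HDR_TRANSFERS = {"smpte2084", "arib-std-b67"}


def looks_like_hdr(metadata):
    """Heuristic: treat a stream as HDR if it uses a PQ or HLG transfer function."""
    return metadata.color_transfer in HDR_TRANSFERS


# Stand-ins for decoder metadata, using the example values shown above.
hdr_stream = SimpleNamespace(
    color_primaries="bt2020",
    color_space="bt2020nc",
    color_transfer="smpte2084",
    pixel_format="yuv420p10le",
)
sdr_stream = SimpleNamespace(
    color_primaries="bt709",
    color_space="bt709",
    color_transfer="bt709",
    pixel_format="yuv420p",
)

print(looks_like_hdr(hdr_stream))  # True
print(looks_like_hdr(sdr_stream))  # False
```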

Installation enhancements

On Linux, pip install torchcodec now defaults to the CUDA 13.0 wheel to match the behavior of pip install torch. See updated instructions in our README.
Additionally, we have added aarch64 CUDA wheels to PyPI!

Bug Fixes

Fixed an audio decoding issue for audio with more than 8 channels. (#1166)
Fixed MKV decoding being up to 42x slower than MP4 in approximate seek mode in some situations. (#1259)
Fixed frame indexing for videos with non-zero start times in approximate seek mode. (#1209)
Fixed time-based samplers to use float64 instead of float32 to avoid precision errors. (#1294)
Improved BT.709 full-range CUDA color conversion on CUDA 12. (#1265)
Improved BT.601 CUDA color conversion accuracy. (#1270)
Fixed AudioDecoder thread oversubscription with certain codecs (FLAC, TAK, wavpack). (#1254)
Fixed SwScale error with non-32-aligned input. (#1295)
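
To give a sense of why the float64 change for time-based samplers matters, the sketch below rounds a timestamp to float32 and measures the error. The one-hour, 30 fps timestamp is an arbitrary example, and as_float32() is a hypothetical helper built on the standard struct module.

```python
import struct


def as_float32(value):
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", value))[0]


# Timestamp of the first frame after the one-hour mark in a 30 fps video.
timestamp = 3600 + 1 / 30
error = abs(as_float32(timestamp) - timestamp)
print(f"float32 rounding error: {error:.6f} seconds")
```

Near the one-hour mark, adjacent float32 values are about 0.00024 seconds apart, so a timestamp can shift by up to half of that. Such a shift can be enough to land on the wrong side of a frame boundary, whereas float64 keeps the error many orders of magnitude smaller.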
