NVIDIA/DALI v1.14.0 on GitHub

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

Added HEVC support to the CPU frames decoder (#3885).
Added the CPU audio resampling operator (#3840).
Added support for video processing and per-frame (temporal) arguments to the rotate operator (#3820).
Added support for variable batch size in the debug mode (#3799).
Performance optimizations:
- Optimized tiled transposition algorithm on small data types (#3730).
- Improved CropMirrorNormalize operator performance (#3771).

Fixed Issues

Fixed compatibility with TensorFlow 2.9 by adding type propagation to DALIDataset (#3875).
Added a missing check that the number of files and labels match in the experimental video reader (#3903).
Added a missing check that the number of samples is greater than or equal to the number of shards in readers (#3856).
Fixed scalar handling in the GPU cast operator (#3924).
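
The two reader checks listed above amount to simple input validation. The sketch below illustrates the idea in plain Python. The function `validate_reader_inputs` and its parameters are hypothetical names chosen for illustration; this is not DALI's implementation or API.

```python
def validate_reader_inputs(files, labels, num_shards):
    """Illustrative checks only; hypothetical helper, not DALI's internal code."""
    # When labels are given, every file needs exactly one label.
    if labels is not None and len(files) != len(labels):
        raise ValueError(
            f"Got {len(files)} files but {len(labels)} labels; the counts must match."
        )
    # Sharding only makes sense with at least one shard.
    if num_shards < 1:
        raise ValueError("num_shards must be at least 1.")
    # Each shard must be able to receive at least one sample.
    if len(files) < num_shards:
        raise ValueError(
            f"Got {len(files)} samples for {num_shards} shards; the number of "
            "samples must be greater than or equal to the number of shards."
        )


# Two files, two labels, two shards: all checks pass and nothing is raised.
validate_reader_inputs(["a.mp4", "b.mp4"], [0, 1], num_shards=2)
```

Failing early with a descriptive error is generally preferable to letting a mismatch surface later as an out-of-range access or an empty shard.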

Improvements

Add support for TensorFlow 2.9. (#3909)
Remove deprecated usage of numpy types int and long (#3898)
Add output_dtype and output_ndim arguments to Pipeline constructor (#3877)
Add HEVC support to the CPU frames decoder (#3885)
Add a C API call to get the max batch size (#3890)
Add bool to Pad supported types (#3895)
Adjust eps in test comparing readers (#3892)
Fix coverity issues. Do not re-throw worker thread error in the destructor. (#3886)
Fix memory leak in C API test (#3889)
Add tutorials references to ops docs - general section (#3869)
Refactor video tests (#3864)
Add NonsilentRegion GPU, implemented in terms of the CPU version (#3874)
Add a check of the decoding progress in the VideoReader (#3858)
Reduce libaviutils log verbosity to errors and above (#3871)
Extend C Api to fetch the layout and ndim from External Source (#3862)
Updated PyTorch-Lightning example with new strategy keyword for Trainer. (#3867)
Update clang version to 14.02 (#3863)
Improve cast operator performance (#3783)
Update CUTLASS to v2.9.0 (#3860)
Change how the CUDA public key is installed (#3866)
Audio resampling operator for CPU backend (#3840)
Dependencies update (#3831)
Optimization of tiled transposition algorithm on small data types (#3730)
Improve CropMirrorNormalize operator performance (#3771)
Fix typo (model -> module) (#3848)
Add a check against changing layout in ES (#3839)
Add cpu only and variable batch size tests to per-frame operator (#3850)
Fix missing f prefix on f-strings (#3847)
Fix handling of arguments with trailing newlines when generating operator docs (#3841)
Add support for sequence processing to rotate (#3820)
Fix TF DALIDataset tests that changed layout between iterations (#3836)
Add ndim argument to the external source operator (#3755)
Add operators cross-referencing to data loading index (#3823)
Features required for autoserialization in DALI Backend (#3795)
Remove gtest RandomBBoxCropTest tests (#3822)
Update user documentation footer copyright date (#3819)
Add operator cross-referencing to custom operators tutorials (#3818)
Fix the default value of resize min_filter in the documentation (#3816)
Benchmark for Transpose operator (#3785)
Add operator cross-referencing to data loading section (#3809)
Update shields.io badges in README.rst. (#3815)
Add operator cross-referencing to audio processing tutorials (#3806)
Add operator cross-referencing to video processing tutorials (#3808)
Add support for variable batch size and NVTX ranges in debug mode (#3799)
Shutdown() a WorkerThread in the destructor (#3810)
Improve the redirect (#3801)

Bug Fixes

Add tests for operator cast. Revert to plain batched cast kernel until the optimized one is fixed. (#3927)
Fix scalar handling in GPU cast. (#3924)
Add a check to the experimental video reader that the number of files and labels match (#3903)
Add type propagation implementation introduced in TF 2.8 (#3875)
Fix corruption: Change bool to int when querying pointer attributes. (#3873)
Make libtar and libsnd root paths customizable. (#3872)
Add a check that the number of samples is greater than or equal to the number of shards in readers (#3856)
Fix transposition kernel tests (#3859)
Fix default argument handling in cuda_vm_resource constructor (#3857)
Fix the test_coverage case in test_dali_cpu_only.py and test_dali_variable_batch_size.py (#3849)
Fix rotate assertion warning (#3852)
Make a curl failure fail the Dockerfile.build.aarch64-linux image build (#3821)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

There are no deprecated features in this DALI release.

Known issues:

The video loader operator requires that key frames occur at least once every 10 to 15 frames of the video stream.
If key frames occur less frequently than that, the returned frames might be out of sync.
Experimental VideoReaderDecoder does not support open GOP.
It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
The DALI TensorFlow plug-in might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with a TensorFlow version that does not have a prebuilt plug-in binary shipped with DALI, ensure that the compiler that was used to build TensorFlow exists on the system during the plug-in installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
Due to known issues with Meltdown/Spectre mitigations and DALI, DALI shows the best performance when running in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.14.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.14.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
and can run on the latest stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.14.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.14.0
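
After running one of the pip commands above, it can be useful to confirm which DALI wheel ended up in the environment. The following is a minimal sketch that uses only the Python standard library (`importlib.metadata`, Python 3.8 or later). The helper name `installed_dali_version` is hypothetical; the package names are taken from the pip commands in this section.

```python
from importlib import metadata

# Distribution names as they appear in the pip commands above.
DALI_PACKAGES = ("nvidia-dali-cuda110", "nvidia-dali-cuda102")


def installed_dali_version(packages=DALI_PACKAGES):
    """Return (package, version) for the first installed package, else None."""
    for name in packages:
        try:
            return name, metadata.version(name)
        except metadata.PackageNotFoundError:
            # This distribution is not installed; try the next candidate.
            continue
    return None


found = installed_dali_version()
if found is None:
    print("No DALI wheel found in this environment.")
else:
    print(f"{found[0]} {found[1]} is installed.")
```

Querying the installed distribution metadata avoids importing DALI itself, so the check also works on machines without a GPU or CUDA driver.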

Or use direct download links (CUDA 10.2):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

https://developer.download.nvidia.com/compute/redist/nvidia-dali/libsndfile-1.1.0.tar.gz
