NVIDIA/DALI v0.28.0 on GitHub

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

New operators:
- Affine transform generators, which are operators that generate scale, rotate, shear, translate, and crop transform matrices (#2309).
  - You can use the transforms.Combine operator to combine these matrices (#2317).
  - These transformations can be applied to data by using the CoordTransform operator.
- Added min, max, and clamp arithmetic operators (#2298).
- Cat and Stack Operators to concatenate and stack Tensors for the CPU and the GPU (#2301, #2339, #2350).
- The following reductions for the CPU and the GPU (#2342, #2379, #2395):
  - Min, Max, Sum, Mean, MeanSquare, RootMeanSquare, Std, Variance
- The MFCC operator for the GPU (#2423).
- The SelectMasks operator (#2381).
- Operators for batch reordering:
  - BatchPermutation for generating random reordering of the batch.
  - PermuteBatch, which reorders tensors in a batch, based on a list of provided indices (#2417).
- Operator Compose: PyTorch-style API to compose the operators (#2393).
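
To illustrate what generating, combining, and applying affine transforms means, here is a minimal sketch in plain Python. It does not use DALI, and none of the function names below are DALI's API; they are hypothetical helpers for 2D points with 3x3 homogeneous matrices. The composition order (first argument applied first) is an assumption of this sketch, not a statement about how transforms.Combine orders its inputs.

```python
import math

def scale(sx, sy):
    # 3x3 homogeneous matrix that scales x by sx and y by sy.
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]

def translate(tx, ty):
    # 3x3 homogeneous matrix that shifts points by (tx, ty).
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]

def rotate(angle_deg):
    # Counter-clockwise rotation about the origin.
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul(a, b):
    # Plain 3x3 matrix product a @ b.
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def combine(*transforms):
    # Fold the transforms into one matrix; the first argument is applied first
    # (an assumption of this sketch).
    result = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for t in transforms:
        result = matmul(t, result)
    return result

def apply_to_points(matrix, points):
    # Apply the affine part of the matrix to a list of (x, y) points.
    return [(matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
             matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])
            for x, y in points]

# Scale by 2, then shift one unit along x.
m = combine(scale(2.0, 2.0), translate(1.0, 0.0))
print(apply_to_points(m, [(1.0, 1.0)]))  # [(3.0, 2.0)]
```

Folding the matrices into one before touching the data means each coordinate is multiplied once, regardless of how many transforms were chained.
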
Improvements in existing operators:
- Added SeekFrames to the audio decoder. The redesign allows you to decide the decoded data type at runtime (#2334).
- Added the ability to handle UTF8 text to the NemoAsrReader (#2358).
- Added explicit file list support to the FileReader (#2389).
- Improvements in the COCO reader API (#2406).
  - The COCOReader API now outputs relative mask polygon coordinates when the option ratio is set to True (#2375).
- RandomBBoxCrop now optionally outputs the indices of the bounding boxes that passed the centroid filter (#2374).
Added late initialization of torch_gpu_device in the PyTorch plugin (#2411).
Added automatic constant-to-input promotion (#2361) and generalized handling of operator arguments (#2393).
Added an MNIST example for DALI and PyTorch Lightning (#2360).
Added the last_batch_policy to the framework iterator (#2269).
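
A last-batch policy decides what an iterator does when the data set size is not a multiple of the batch size. The sketch below is plain Python, not the DALI iterator. The policy labels "fill", "drop", and "partial" and the wrap-around padding are assumptions chosen for illustration; consult the DALI documentation for the exact option names and semantics.

```python
def batches(samples, batch_size, policy):
    """Yield batches; `policy` controls the handling of an incomplete last batch.

    Hypothetical policy labels used by this sketch:
      "fill"    - pad the last batch by wrapping around to the first samples
      "drop"    - discard the incomplete last batch
      "partial" - return the incomplete last batch as is
    """
    if policy not in ("fill", "drop", "partial"):
        raise ValueError(f"unknown policy: {policy!r}")
    for start in range(0, len(samples), batch_size):
        batch = samples[start:start + batch_size]
        if len(batch) == batch_size or policy == "partial":
            yield batch
        elif policy == "fill":
            missing = batch_size - len(batch)
            # Wrap around the data set to complete the batch.
            yield batch + [samples[i % len(samples)] for i in range(missing)]
        # For "drop", the incomplete batch is simply skipped.

data = list(range(5))
for policy in ("fill", "drop", "partial"):
    print(policy, list(batches(data, 2, policy)))
```

With five samples and a batch size of two, the three policies differ only in the final batch: padded, omitted, or shorter than the rest.
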
New builds:
- Python 3.9 is now enabled (#2333).
- The DALI wheels for CUDA 11 are built with CUDA 11.1 and use Enhanced Compatibility to work with CUDA 11.0 (#2302, #2356, #2367, and #2413).
- Added support for the SM_86 architecture (#2364).
- Added the ability to cross-build Python wheels for Jetson (#2313).

Bug fixes

Fix error when VideoReader is prematurely terminated (#2336)
Fix failure in affine transforms tests (#2337)
Fix the problem of output outliving the pipeline in python (#2341)
Fix lack of proper layout setting in the VideoReader (#2346)
Fix uniform generator operator (#2352)
Bugfixes: Default nfft value and to_snake_case implementation (#2353)
Fixes problems in the weekly build (#2372)
Fix a problem with reference to "incomplete" type (error in Clang/CUDA). (#2377)
Fix how DALI handles StopIteration from the ExternalSource (#2373)
Fix TL1_nodeps_build and TL0_cpu_only (#2391)
Fix CPU only mode for arithm operators (#2400)
Preserve shape of pseudoscalars in arithmetic ops. (#2359)

Improvements

Add affine transform generators: TransformScale, TransformRotation, TransformShear, TransformCrop (#2309)
Change code/docs language to be more inclusive (#2322)
Update nvidia-tensorflow test package to 20.9 and bump tensorflow-gpu minor versions (#2320)
Update example usage of DALIClassificationIterator in docs strings (#2306)
Reduce video reader memory consumption (#2308)
TensorJoin kernel for CPU (#2301)
Enable automatic python modules for operator (#2329)
Split GaussianBlur Python test (#2332)
Add CombineTransforms operator (#2317)
Append TensorListShapes (#2291)
Enable CUDA 11.1 builds (#2302)
Add min, max and clamp arithmetic ops (#2298)
Update TensorFlow plugin documentation (#2328)
Remove Python 3.5 support, enable Python 3.9 (#2333)
Enable nvJPEG2k build for CUDA 11.1 (#2343)
Add BUILD_DALI_NODEPS to allow building dali_core and dali_kernels without extra third party libraries present in the system (#2321)
Add SeekFrames to audio decoder. Redesign to allow deciding decoded data type at runtime. (#2334)
Add discrete mode to Uniform operator (#2340)
Test for utility CMake function (find_dali) (#2325)
Propagate new build options to other build utilities (#2349)
Add support for N-dim tensors to OneHot (#2345)
Add a separate option to preallocate nvJPEG2k memory (#2347)
Tensor join GPU (#2339)
Reductions: min, max (#2342)
Tensor concatenation and stacking (#2350)
Use inverse (source-to-destination) matrix in WarpAffine operator (#2338)
Disable more dependencies for nodeps build (#2355)
Update DALI trademark information (#2351)
Reduce GPU memory fraction in TF tests to 0.5. (#2357)
Automatic constant-to-input promotion. (#2361)
Add support for SM_86 architecture (#2364)
Use current class next implementation in init, to avoid special handling of first batch in child classes (#2363)
Add ability to cross-build Python wheels for Jetson (#2313)
Add NemoAsrReader handling of UTF8 text (#2358)
Enable CUDA 11 compatibility mode (#2356)
Add MNIST example for DALI and PyTorch Lightning (#2360)
Add last_batch_policy to the framework iterator (#2269)
COCOReader to output relative mask polygon coordinates when the option ratio is set to True (#2375)
RandomBBoxCrop to optionally output the indices of the bounding boxes that passed the centroid filter (#2374)
Enable compatibility layer in tests for CUDA 11 (#2367)
Reduce Sum Op (#2379)
Install DALI license, copyright and acknowledgments explicitly (#2392)
Add layout support to OneHot operator (#2388)
Generalized handling of operator arguments + operator Compose. (#2393)
GPU DCT kernel (#2398)
Bump up Nvidia TF version to 20.10 (#2397)
More reductions (#2395)
Late initialization of torch_gpu_device in pytorch plugin (#2411)
Add a link to CUDA Enhanced Compatibility Across Minor Releases guide (#2410)
Add explicit file list support to FileReader. (#2389)
Add TransformTranslation deprecation placeholder Op (#2412)
Bump up the CuPy to one that supports CUDA 11.0 (#2413)
Add a missing include in filesystem.cc (#2414)
Add a warning about the Python function incompatibility with TensorFlow (#2415)
Improvements in COCO reader API (#2406)
Add operators for batch reordering (#2417)
Add SelectMasks operator (#2381)
GPU MFCC operator. (#2423)
Make base image for dockers customizable at the build time (#2427)

Breaking API changes

Python 3.5 is no longer supported by the official DALI wheels.

Deprecated feature

Known issues:

The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur less frequently, then the returned frames may be out of sync.
The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==0.28.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==0.28.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit but can run on the latest stable CUDA 11.0 capable drivers (450.80 or later). Using the latest driver may enable additional functionality. More details can be found in the CUDA enhanced compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==0.28.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==0.28.0
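
After running the pip commands above, one way to confirm which wheel ended up in the environment is to query the installed package metadata. This is a minimal sketch that relies only on the Python standard library (importlib.metadata, Python 3.8 or later); the helper name installed_dali_version is hypothetical, and the package names are the ones used in the pip commands.

```python
from importlib import metadata

def installed_dali_version(package="nvidia-dali-cuda110"):
    # Return the installed version string of the given wheel,
    # or None if the package is not present in this environment.
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

for name in ("nvidia-dali-cuda100", "nvidia-dali-cuda110"):
    version = installed_dali_version(name)
    print(name, version if version else "not installed")
```
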

Or use direct download links (CUDA 10.0):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

https://developer.download.nvidia.com/compute/redist/nvidia-dali/libsndfile-1.0.28.tar.gz
