NVIDIA/DALI v1.3.0 on GitHub

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

New operator:
- Salt and Pepper noise (noise.salt_and_pepper) for CPU and GPU (#2889, #2934, #2956, and #2976).
Added experimental support for inputs via external_source in TensorFlow DALIDataset (#2949, #2993, and #2997).
Numpy reader improvements:
- ROI reading for CPU (#3011).
- Intra-sample threading on GPU (#3010).
Improved performance of the CPU color_space_conversion operator (#2987).
Improved performance of the brightness and contrast operators (#2981).
Added a C API call to check backend of an operator (#3031 and #3050).
Documentation improvements (#2936, #2960, #2979, #2972, #3013, and #3035).
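
For readers unfamiliar with the noise model behind the new operator, the following is a minimal pure-Python sketch of salt-and-pepper noise on a small grayscale image. It is not DALI's implementation and does not call the DALI API. The function name, parameter names, and default values are illustrative assumptions: each pixel is replaced with probability `prob`, and a replaced pixel becomes "salt" (the maximum value) with probability `salt_vs_pepper`, otherwise "pepper" (the minimum value).

```python
import random


def salt_and_pepper(image, prob=0.05, salt_vs_pepper=0.5,
                    salt_val=255, pepper_val=0, seed=None):
    """Return a noisy copy of a 2D grayscale image given as a list of rows.

    Illustrative sketch only; names and defaults are assumptions,
    not taken from the DALI source.
    """
    rng = random.Random(seed)
    noisy = []
    for row in image:
        new_row = []
        for pixel in row:
            if rng.random() < prob:
                # Corrupt this pixel: salt with probability salt_vs_pepper,
                # pepper otherwise.
                is_salt = rng.random() < salt_vs_pepper
                new_row.append(salt_val if is_salt else pepper_val)
            else:
                # Leave the pixel untouched.
                new_row.append(pixel)
        noisy.append(new_row)
    return noisy


# Usage: corrupt roughly a quarter of the pixels of a uniform gray image.
image = [[128] * 8 for _ in range(8)]
noisy = salt_and_pepper(image, prob=0.25, seed=0)
changed = sum(p != 128 for row in noisy for p in row)
print(f"{changed} of 64 pixels replaced")
```

The sketch copies the input rather than modifying it in place, so the original image stays available for comparison.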

Fixed issues

This DALI release includes the following fixes:

Fixed an issue in readers.nemo_asr that caused a system error due to keeping too many open files (#3003).
Fixed a bug that caused out-of-bounds memory access in mel_filter_bank (#2986).
Fixed a cudaErrorLaunchOutOfResources error that appeared in transpose operator on some GPUs (#2971).
Fixed handling of non-existing entries in readers.tfrecord (#2952).

Improvements

Rework numpy reader tests (#3036)
Extend HW decoder bench tool (#3043)
Remove space from file name (#3038)
Add experimental input support to TF DALIDataset (#2997)
Use BrightnessContrast as implementation of Brightness and Contrast ops (#2981)
Add C API call to check backend of an operator (#3031)
Fix Video reader documentation (#3035)
Enable DALI to build for CUDA 10.2 (#3007)
NumpyReader: Add support for ROI (#3016)
Add git hooks (#3023)
Update third party (#3009)
Add channel count checking in Dump Image (#3020)
Add parallel chunking support in GPU variant of the numpy reader operator (#3010)
NumpyReader to use HostWorkspace (#3011)
Update documentation of random.uniform to reflect data type conversion behavior (#3013)
Adjust tf code for experimental Dataset with inputs (#2993)
Add best-fit free tree. (#2996)
Refine torch audio pipeline tests: adding frame splicing, fix sequence length calculation, reflect pad start/end of the signal (#2992)
Rename free_tree to coalescing_free_tree. (#2995)
Use thread_pool in ColorSpaceConversion (#2987)
Move to CUDA 11.3 update 1 (#2990)
pool_resource: upstream lock & refactoring (#2988)
Add tests to cover OGG Vorbis, and FLAC audio formats (#2980)
Add synchronization and deferred deallocation to pool_resource (#2983)
Update FFmpeg, fix video container tests (#2918)
Add Preemphasis border policy (#2984)
Numba function operator, docs update (#2972)
Add a link to the DALI roadmap in the main readme and the documentation (#2979)
Add BOOL_SWITCH (#2974)
Add libopus to the binaries distributed with the wheel (#2969)
Add SaltAndPepper GPU operator (#2956)
Update documentation about supported TensorFlow versions by DALI (#2960)
Guard changes to default resources with a mutex. (#2955)
Add Salt and Pepper noise CPU operator (#2889)
Core allocation functions - improve alignment handling (#2947)
Add portable FP16 type & tests. (#2941)
RNGBase: Separate noise generation and application steps (#2934)
Add information about Open-CE effort that provides DALI (#2936)

Bug fixes

Remove mixed image decoder from GetBackendTest (#3050)
Fix pip download folder usage (#3028)
Avoid pre-commit hook for merge commits (#3032)
Coverity issue fixes. (#3021)
Add more connection attempts in setup_packages.py and increase the timeout to 100s (#3024)
Add 60s timeout for URL request in setup_packages.py (#3018)
Check CUDA API return values in device-side test helper. (#3017)
Run baseline pipelines on separate devices (#3012)
Multi paste refactor & fix (#3008)
Remove outdated warning about not supported ROI HW decoding (#2998)
NemoAsrLoader: Close file handles after reading metadata (#3003)
Improve Element Extract Op (#3004)
Temporarily disable test due to incompatible free list. (#3001)
Work around large alignas bug - align manually. (#3000)
Lifts the sm limitation that is tested in the numpy reader test (#2994)
MultiPaste: Fix in_ids argument type in the schema (#2965)
Fix a buffer overrun when the trailing dimension is collapsed. (#2986)
Add missing #include (#2985)
Enable SaltAndPepper GPU variable batch size tests (#2976)
Add missing tests to test_dali_variable_batch_size.py (#2982)
Change all reference to the master branch in the documentation (#2977)
Add missing tests to test_dali_cpu_only.py (#2964)
Add launch bounds to TransposeBatch kernel to avoid cudaErrorLaunchOutOfResources (#2971)
Fix deps docker with custom DALI_deps SHA (#2970)
Add coverage test for CPU only and variable batch size test (#2962)
Enable variable batch size tests (#2957)
Fix returning memory to upstream from pool resource (#2961)
Fix handling of non_existing entries in TFRecord reader (#2952)
Enable pool to return memory to the upstream upon Out-of-Memory. (#2951)
Fix mixed indent in tf.py (#2949)
Fix bug in default constructed curand_uniform_dist (#2946)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

There are no deprecated features in this DALI release.

Known issues:

The video loader operator requires that key frames occur at least once every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that was used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Install via pip for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==1.3.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==1.3.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
but can run on the latest stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.3.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.3.0
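
Before installing the CUDA 11 wheels, it can help to confirm that the installed driver meets the 450.80 minimum stated above. The following is a minimal sketch of that comparison. The helper name `driver_meets_minimum` is hypothetical, and the version string is assumed to come from a source such as `nvidia-smi --query-gpu=driver_version --format=csv,noheader`; here it is passed in as plain text so the example runs without a GPU.

```python
def driver_meets_minimum(driver_version, minimum=(450, 80)):
    """Check a dotted driver version string against a minimum (major, minor).

    Hypothetical helper for illustration; the default minimum reflects the
    "450.80 or later" requirement stated in these release notes.
    """
    parts = tuple(int(p) for p in driver_version.strip().split("."))
    # Compare only as many components as the minimum specifies, so a
    # patch component such as the trailing ".02" is ignored.
    return parts[:len(minimum)] >= minimum


# Usage with example version strings (illustrative values).
for version in ("450.80.02", "450.51.06", "460.32.03"):
    status = "OK" if driver_meets_minimum(version) else "too old"
    print(f"{version}: {status}")
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating "450.9" as newer than "450.80".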

Or use direct download links (CUDA 10.0):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

https://developer.download.nvidia.com/compute/redist/nvidia-dali/libsndfile-1.0.31.tar.gz
