github NVIDIA/DALI v1.4.0
DALI v1.4.0


Key Features and Enhancements

This DALI release includes the following key features and enhancements.

  • readers.numpy improvements:
    • Added ROI support in the GPU operator (#3034 and #3040).
    • Parallelized reading in the CPU operator (#3077).
    • Added a tutorial (#3095 and #3139).
  • DALI Dataset improvements:
  • Video reader improvements:
    • Added an option to pad missing frames at the end of sequence (#3002).
    • Added support for the VP8 and MJPEG formats (#3045).
  • Added CPU parallelization to the Slice and SliceFlipNormalizePermutePad kernels (#3062, #3068, and #3080).
  • Added an option to readers.nemo_asr to return indices of the entries in the manifest (#3085).
  • Improved the performance of the GPU image decoder by optimizing the memory allocations (#3067).

Fixed issues

This DALI release includes the following fixes:

  • Fixed a crash that happened when a functools.partial result was passed as a source to external_source (#3143).
  • Fixed the hardware image decoder to fall back to the hybrid implementation for unsupported file formats instead of throwing an error (#3086).
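
The functools.partial case above can be illustrated without DALI installed. The sketch below builds a batch source with functools.partial and shows how such an object differs from a plain function, which is one plausible reason code that inspects the callable could fail on it (the exact cause inside DALI is an assumption here, not taken from these notes). The make_batch function and its arguments are hypothetical names used only for illustration.

```python
import functools
import inspect


def make_batch(batch_size, iteration):
    """Hypothetical source: return `batch_size` samples for one iteration."""
    return [[iteration * batch_size + i] for i in range(batch_size)]


# Bind batch_size up front; the resulting callable takes only `iteration`.
source = functools.partial(make_batch, 4)

# A partial object is callable, but it is not a plain function:
# it has no __code__ attribute to inspect directly.
has_code = hasattr(source, "__code__")

# inspect.signature understands partial objects and reports only
# the parameters that are still unbound.
n_params = len(inspect.signature(source).parameters)

first_batch = source(0)
```

With this release, a callable like source should be accepted as the source argument of external_source. In a real pipeline the samples would typically be NumPy arrays rather than nested lists.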

Improvements

  • Add NumpyReader tutorial to the rendered documentation page (#3139)
  • Update docs analytics tracking (#3135)
  • VM async_pool - refactoring & tests (#3117)
  • Extend the video loader error message for vfr videos on how to disable the check in case of false positives (#3125)
  • Integer literal suffixes (#3122)
  • SliceCPU kernel to run plain memcpy when applicable (#3110)
  • CUDA VM memory resource (#3114)
  • Add Numpy Reader Tutorial (#3095)
  • Bump TensorFlow version in tests (#3107)
  • Efficient det code drop (#3115)
  • Move to CUDA 11.4 build (#3109)
  • Add batch support to DALI Dataset (#3089)
  • Update third party dependencies (#3093)
  • Add bitmask::append. (#3101)
  • Free list API cleanup. (#3100)
  • NemoAsrReader to optionally return indices of the entries in the manifest. (#3085)
  • Parallelize reading in NumpyReader CPU (#3077)
  • Bit mask utility (#3083)
  • Add ExecutionEngine to SliceFlipNormalizePermutePad CPU kernel, to allow parallel execution (#3080)
  • Add an ability to pad missing frames in the Video reader sequence (#3002)
  • Rework the TF DALIDataset input API (#3063)
  • Add ExecutionEngine to Slice CPU kernel, to allow parallel execution (#3068)
  • Use HW NVJPEG decoder memory pool even if size hint is not set (#3067)
  • CUDA Virtual Memory API wrappers. (#3064)
  • Add information about installing CUDA 10.2 DALI version (#3066)
  • Add image decoder memory hints for nvJPEG in DALI examples (#3029)
  • Add split shape utility (#3062)
  • Add ROI support to NumpyReader GPU (#3034)
  • Enable no_copy mode handling in TF DALI Dataset (#3058)
  • Add support for VP8 and MJPEG videos (#3045)
  • Make PyTorch Lightning example work with multiple GPUs (#3037)
  • Add override flags for no_copy option of External Source (#3041)
  • Add NumpyFileWrapper to numpy loader (#3054)
  • Add a mention of CPU-only arguments inputs in docs (#3039)
  • Minor changes in Slice GPU kernels, before reusing them in NumpyReader GPU (#3040)

Bug fixes

  • Fix hint handling: (#3145)
  • Add support for functools.partial in ExternalSource. (#3143)
  • Install libcufile (for GDS) as a part of the cuda base build step (#3142)
  • Add check of strerror_r return value in CUFile HandleIOError (#3141)
  • Disable VMAsyncPool CrossStream test on incompatible platforms. (#3140)
  • Fix the lack of execution of variable batch size test (#3134)
  • Throw std::bad_alloc when ordinary host memory runs out + tests for xxx_malloc resources. (#3131)
  • Fix allocation hint handling in CUDA VM resource (#3128)
  • Revert change from python to Python_EXECUTABLE (#3126)
  • Coverity issue fixes - bulk drop, July 2021 (#3124)
  • Make nvJPEG detect corrupted stream before offloading to HW decoder (#3113)
  • Add --no-index option to TL1_tensorflow-dali_test test (#3112)
  • Minor fixes (#3119)
  • DALI TF install tool: Copy files for import check, rather than symlink (#3116)
  • minor fixes (#3108)
  • Dali TF installation: check import before completing the installation (#3104)
  • Remove no longer applicable sed command from RN50 MXNet test (#3103)
  • Use DALI_extra instead of example_audio_file in the spectrogram example (#3106)
  • Unify apt-get invocations (#3094)
  • Make DALI extra download optional in tests (#3102)
  • Remove pre CUDA 10.0 support in TL1_tensorflow-dali_test (#3099)
  • Bug fixes (#3096)
  • MMUtilFixes: (#3098)
  • Fix override no copy flags for External Source C API (#3097)
  • Fix HW decoder fallback to the hybrid decoder (#3086)
  • Fix DALI installation for Python 3.9 (#3092)
  • Fix python test on aarch64 platform (#3091)
  • Move pycocotools to regular pip packages in SSD test (#3090)
  • Use PEP 503 compatible extra url index to install PyTorch (#3079)
  • Remove compiler name subdirectory in DALI TF prebuilt directory (#3078)
  • Disable MNIST dataset download for DALI pipelines (#3075)
  • Fix known FFmpeg n4.4 vulnerabilities (#3071)
  • Fix DALI TF Plugin build in TF 2.6 (#3074)
  • Fix error handling in Executor (#3069)
  • Fix typo inout -> input (#3070)
  • Fix error message when creating a TensorShape from iterators with more elements than expected (#3060)
  • Add warning about not using external_inputs in proto (#3057)
  • Fix usage of removed _ExternalSource in test (#3059)
  • Make the Python test utilities have local random state (#3055)
  • Fix batch size handling in PermuteBatch. (#3026)
  • Update FFmpeg to address CVE-2021-33815 (#3053)
  • Remove duplicated ExternalSource implementation (#3033)
  • Build the latest clang from source (#3025)

Breaking API changes

There are no breaking changes in this DALI release.

Deprecated features

There are no deprecated features in this DALI release.

Known issues:

  • The video loader operator requires that key frames occur at least once every 10 to 15 frames of the video stream. If key frames occur less frequently, the returned frames may be out of sync.
  • The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
    To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that was used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
  • Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
    • privileged=yes in Extra Settings for AWS data points
    • --privileged or --security-opt seccomp=unconfined for bare Docker

Binary builds

Note: Starting from version 1.4.0, DALI provides CUDA 10.2 builds instead of CUDA 10.0 builds.

Install via pip for CUDA 10:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.4.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.4.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
but can run on the latest stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.4.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.4.0
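
For scripted setups, the commands above follow a regular pattern: the package name carries a CUDA suffix (cuda102 or cuda110) and an optional tf-plugin part. A minimal sketch of a helper that assembles the same command line is shown below. The function dali_pip_command and the CUDA_SUFFIX table are hypothetical and not part of DALI; the index URL, package names, and version are copied from the commands above.

```python
# Hypothetical helper, not part of DALI: rebuilds the pip commands listed above.
EXTRA_INDEX = "https://developer.download.nvidia.com/compute/redist/"

# CUDA major version -> package suffix used by the 1.4.0 builds in these notes.
CUDA_SUFFIX = {10: "cuda102", 11: "cuda110"}


def dali_pip_command(cuda_major, version="1.4.0", tf_plugin=False):
    """Return the pip argument list for the given CUDA major version."""
    try:
        suffix = CUDA_SUFFIX[cuda_major]
    except KeyError:
        raise ValueError(
            f"no DALI {version} build listed for CUDA {cuda_major}"
        ) from None
    name = "nvidia-dali-tf-plugin" if tf_plugin else "nvidia-dali"
    return [
        "pip", "install",
        "--extra-index-url", EXTRA_INDEX,
        f"{name}-{suffix}=={version}",
    ]


command = dali_pip_command(11)
```

The returned list can be passed to subprocess.run as is, or joined with spaces to print the command. Only the two CUDA variants named in these notes are covered, and any other major version raises ValueError.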

Or use direct download links (CUDA 10.2):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

  • This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:
