NVIDIA/DALI v0.29.0 on GitHub

Key Features and Enhancements

This DALI release includes the following key features and enhancements.

New operators:
- NumpyReader GPU operator with support for GPUDirect Storage (#2477)
- nvJPEG2000 decoding enabled in the ImageDecoder operator (#2501)
- segmentation.RandomMaskPixel operator for creating random masks containing foreground pixels (#2445)
- OneHot for GPU (#2436)
Move all NVTX infrastructure into core and create DALI domain (#2472)
New Examples:
- Add mask processing to COCO Reader with Augmentations example (#2426)
- Add reductions example (#2457)
- Example of random_mask_pixel to perform biased random crop (#2474)
- Update ExternalSource framework examples (#2482)
Operator Improvements:
- Pad: Add support for per-sample shape and alignment requirements (#2432)
- RandomResizedCrop: enable channel-first and video support + add tests (#2430)
- PythonFunction Operator: support for output layouts (#2486)
- Optimize the DCT GPU kernel. (#2471)
- COCOReader: Support for uncompressed RLE masks (#2478)
- transforms.Rotation to accept scalar inputs (#2494)
Move to CUDA 11.1 update 1 (#2419)

Fixed issues

NumpyReader: Replace std::regex with a custom implementation to fix ABI incompatibility issues (#2489)
Fix the dimensionality of labels in SSDRandomCrop. (#2488)

Improvements

Move to CUDA 11.1 update 1 (#2419)
RandomResizedCrop: enable channel-first and video support + add tests (#2430)
Pad operator: Add support for per-sample shape and alignment requirements (#2432)
Update clang to 10.0 (#2424)
Add mask processing to COCO Reader with Augmentations example (#2426)
Make custom nvJPEG allocator return a relevant allocation status (#2438)
Make the custom nvJPEG allocator not throw and return only the status (#2443)
Add SearchableRLEMask utility (#2441)
Add GPU support to OneHot operator (#2436)
Reduce axes names (#2425)
Remove CUDA headers and generate stubs in runtime (#2420)
TensorVector update for iter-to-iter variable batch size (#2435)
Fix build with all options off, relax libclang required version (#2455)
Add support for UINT8 and INT8 outputs in CMN + scale and shift arguments (#2458)
COCOReader: Parse RLE masks only when pixelwise masks are requested (#2462)
Add reductions example (#2457)
Enables direct linking with libcuda.so instead of dlopen (#2459)
Add segmentation.RandomMaskPixel operator (#2445)
Skips the building of prebuilt DALI package for nvidia-tensorflow (#2451)
Pad to square tests (#2442)
Enable compile time generation of dynlink wrappers for nvml (#2463)
Deprecate squeeze_labels option from MXNet iterator and enhance .squeeze function to match numpy style interface (#2450)
Hide hidden ops and improve Enum docs quality (#2470)
Enforce uniform rank and type of the outputs read by CPU DataReader. (#2476)
Move all NVTX infrastructure into core and create DALI domain (#2472)
MXNet Iterator: Revert to squeeze_labels=True behavior by default (#2479)
Example of random_mask_pixel to perform biased random crop (#2474)
Update DALI dependency (#2483)
Update ExternalSource framework examples (#2482)
Optimize the DCT GPU kernel. (#2471)
Support the output layouts in the PythonFunction Operator (#2486)
transforms.Rotation to accept scalar inputs (#2494)
Rework tutorials general (#2480)
Add support for GPU based numpy reader (#2477)
Per sample ExternalSource (#2469)
Use atol instead of rtol (#2499)
Lifts the restriction and enables enable_frame_num and enable_timestamps for filenames (#2468)
Reenable nvJPEG2000 (#2501)
Disables GDS for the default build configuration (#2502)
COCOReader: Support for uncompressed RLE masks (#2478)
Memory manager - interfaces, utilities, monotonic resources, malloc resource (#2497)
Update Jetson compilation guide (#2508)
Makes sure that cuFile and nvJPEG2k are not possible to set when not supported (#2510)

Bug fixes

Fix seed in RandomResizedCrop test. (#2437)
QNX build fix (#2440)
Fix lack of proper loading of best_prec1 from the checkpoint (#2466)
Fix the dimensionality of labels in SSDRandomCrop. (#2488)
NumpyReader: Replace std::regex with custom implementation (#2489)
Fix CPU only mode in C API (#2496)
Fix bugs reported by static analysis (#2491)
Fix typo in STYLE_GUIDE.md (#2503)
Fix NVJPEG2K_ENABLED test macros (#2504)

Breaking API changes

Deprecated features

Deprecate squeeze_labels option from MXNet iterator and enhance .squeeze function to match numpy style interface (#2450)

Known issues:

The video loader operator requires that the key frames occur at a minimum every 10 to 15 frames of the video stream. If the key frames occur at a lesser frequency, then the returned frames may be out of sync.
The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
To use DALI with the TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when run in Docker with escalated privileges, for example:
- privileged=yes in Extra Settings for AWS data points
- --privileged or --security-opt seccomp=unconfined for bare Docker
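
The key-frame requirement of the video loader listed above can be checked before a video is used. The following is a minimal sketch, not part of DALI: the helper names and the default threshold of 10 frames (the stricter end of the "10 to 15 frames" range) are assumptions, and the key-frame indices are assumed to have been extracted beforehand with an external tool. The gap between the last key frame and the end of the stream is not checked.

```python
def max_keyframe_gap(keyframe_indices):
    """Return the largest distance, in frames, between consecutive key frames.

    Returns None when fewer than two key frames are given, because no gap
    can be measured in that case.
    """
    ordered = sorted(keyframe_indices)
    if len(ordered) < 2:
        return None
    return max(later - earlier for earlier, later in zip(ordered, ordered[1:]))


def keyframes_dense_enough(keyframe_indices, max_gap=10):
    """Check that key frames occur at least every `max_gap` frames.

    `max_gap` is a hypothetical parameter; 10 is the conservative end of the
    10 to 15 frame range mentioned in the known issue.
    """
    gap = max_keyframe_gap(keyframe_indices)
    return gap is not None and gap <= max_gap


if __name__ == "__main__":
    # Key frames every 10 frames satisfy the conservative threshold.
    print(keyframes_dense_enough([0, 10, 20, 30]))
    # Key frames every 30 frames do not, so returned frames may be out of sync.
    print(keyframes_dense_enough([0, 30, 60]))
```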

Binary builds

Install via pip for CUDA 10
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda100==0.29.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda100==0.29.0

or for CUDA 11:

The CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit, but it can run on the latest stable CUDA 11.0 capable drivers (450.80 or later). Using the latest driver may enable additional functionality. More details can be found in the enhanced CUDA compatibility guide.

pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==0.29.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==0.29.0
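
Before installing the CUDA 11 packages, it can help to confirm that the installed driver meets the 450.80 minimum stated above. The following is a minimal sketch under stated assumptions: the function names are hypothetical and not part of DALI, and the driver version string is assumed to be supplied by the caller (for example, copied from the output of nvidia-smi).

```python
# Minimum driver version for the CUDA 11.0 build, as stated in the release notes.
MIN_DRIVER = (450, 80)


def parse_version(text):
    """Turn a dotted version string such as '450.80.02' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_supports_cuda110_build(driver_version):
    """Return True when the driver is 450.80 or later.

    Only the major and minor components are compared; any patch component
    is ignored.
    """
    return parse_version(driver_version)[:2] >= MIN_DRIVER


if __name__ == "__main__":
    for version in ("450.80.02", "455.23", "440.33.01"):
        print(version, driver_supports_cuda110_build(version))
```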

Or use direct download links (CUDA 10.0):

Or use direct download links (CUDA 11.0):

FFmpeg source code:

This software uses code of FFmpeg licensed under the LGPLv2.1 and its source can be downloaded here

Libsndfile source code:

https://developer.download.nvidia.com/compute/redist/nvidia-dali/libsndfile-1.0.28.tar.gz
