Key Features and Enhancements
This DALI release includes the following key features and enhancements:
- Added GPU non-silent region detection operator (#3944, #4001).
- Added experimental support for the eager execution of stateful operators and arithmetic operators (#4016, #3952, #3969, #3990).
- Added antialias flag to Resize operator for improved control over the resampling mode used (#4032).
- Added experimental support for custom GPU Numba operators (#3891, #3998, #4006, #4013).
- Added support for processing video and handling of temporal arguments to color-manipulation operators and affine transform operators (#3937, #3946, #3917).
Fixed Issues
The following issues were fixed in this release:
- Fixed DALI + PyTorch Lightning iterator issue resulting in subsequent epochs terminating too early (#3923, #4048).
- Fixed scalars handling by the readers.tfrecord operator (#4024).
- Fixed variable batch size handling by the crop and coord_transform operators (#4045, #3958).
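
The iterator fix above (#3923, #4048) makes the DALI framework iterator call reset() whenever iter() is called on it. The toy class below is only a sketch of that pattern in plain Python, not the DALI implementation; the class name EpochIterator and its fields are hypothetical.

```python
class EpochIterator:
    """Toy stand-in for a framework iterator (hypothetical, not DALI code)."""

    def __init__(self, batches):
        self._batches = list(batches)
        self._pos = 0

    def reset(self):
        # Rewind to the first batch of the epoch.
        self._pos = 0

    def __iter__(self):
        # Calling reset() here means every `for` loop starts a full epoch,
        # even if the previous loop stopped halfway through.
        self.reset()
        return self

    def __next__(self):
        if self._pos >= len(self._batches):
            raise StopIteration
        batch = self._batches[self._pos]
        self._pos += 1
        return batch


it = EpochIterator(["b0", "b1", "b2"])
# Two consecutive epochs over the same iterator object.
epochs = [list(it) for _ in range(2)]
```

Without the reset() call in __iter__, the second loop in this sketch would see an already exhausted iterator and end immediately, which is the kind of early termination the fix addresses.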
Improvements
- Add little-endian and big-endian read functions for InputStreams (#4038)
- Add antialias flag to Resize (#4032)
- Reformat python files (#4026)
- Python formatting (#4035)
- Enable nose2 in Python Tests (#4033)
- Imgcodec module boilerplate (interfaces/placeholders/basic logic) (#4029)
- Remove deprecated option options.experimental_optimization.map_vectorization.enabled (#4027)
- Guided contribution tutorial (#4011)
- Fix python formatting (#3982)
- Add eager mode stateful operators (#4016)
- Disable Numba GPU op for incompatible Numba versions (#4025)
- Add missing quote marks to the DALI_AFFINITY_MASK usage example (#4020)
- Add abstract InputStream. Refactor existing FileStreams to use it. (#4019)
- Make DALI iterator call reset() when iter() is called upon it (#3923)
- Add eager mode operators coverage test (#3952)
- Add ack for Numba GPU op (#3998)
- Add eager mode arithm ops (#3969)
- Reduce DALI conda package installation time (#3995)
- Add Non-silent region GPU operator (#3944)
- Workaround for nosetests in Python 3.10 (#3986)
- Numba cuda operator (#3891)
- Fix Python formatting (#3992)
- Fix Python formatting (#3988)
- Add examples of processing video that utilize per-frame operator (#3917)
- Per frame affine transforms (#3946)
- Handle partially pruned multi-output external sources (#3975)
- Dependencies update (#3979)
- Doxygen typo (#3989)
- Add per frame parameters support to brightness_contrast and color_twist families (#3937)
- Fix missing return (#3985)
- Support vector alike output for OpSpec::TryGetRepeatedArgument (#3851)
- Fix Python formatting (#3962)
- Fix and reenable optimized Cast kernel (#3976)
Bug Fixes
- Fix lack of reset when iter() is called on the DALI framework iterator (#4048)
- Use actual batch size instead of max batch size in crop_attr.h (#4045)
- Support scalars in readers.tfrecord (#4024)
- Add const char* ctor to ThreadPool (#4005)
- Remove unconditional float16 type mapping in Numba GPU op (#4013)
- Change flake8 config (#4004)
- Fix Numba CI issues (#4006)
- Fix and simplify moving mean squares CPU kernel. (#4001)
- Fix nan check and unused external source arguments in debug mode (#3990)
- Fix fn.coord_transform handling of a default matrix in variable batch case (#3958)
- Fix test_dali_tf_dataset_mnist_eager test (#3991)
- Fix test_dali_tf_dataset_mnist_eager.py and test_dali_tf_dataset_mnist_graph.py tests (#3987)
- Improve handling of "dtype" arguments in OpSchema/OpSpec (#3981)
Breaking API changes
- The shape of scalars read by the readers.tfrecord operator is now () instead of (1,).
- For cubic and linear interpolation modes, the resize operator now applies the antialiasing filter by default. The antialiasing can be turned off with the antialias flag.
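
Code that indexed TFRecord scalars as value[0] will break on the new () shape. The helper below is a small NumPy sketch of one way to accept both shapes during a migration; the function name as_python_scalar is hypothetical and not part of DALI.

```python
import numpy as np


def as_python_scalar(value):
    """Return a Python scalar from an array of shape () or (1,).

    Hypothetical helper for illustration; not a DALI API.
    """
    arr = np.asarray(value)
    if arr.size != 1:
        raise ValueError(f"expected a single element, got shape {arr.shape}")
    # reshape(()) turns both () and (1,) into a 0-d array before extraction.
    return arr.reshape(()).item()


old_style = np.array([7])  # shape (1,), as returned before this release
new_style = np.array(7)    # shape (), as returned from this release on

label_old = as_python_scalar(old_style)
label_new = as_python_scalar(new_style)
```

Indexing new_style[0] raises an IndexError in NumPy, so any downstream code that relied on the (1,) shape is worth checking.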
Deprecated features
- The triangular interpolation for the resize operator has been deprecated, as it is equivalent to linear interpolation with antialiasing on.
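
To illustrate why linear interpolation with antialiasing amounts to a triangular filter, here is a 1-D NumPy sketch. It is a simplified model written for this note, not DALI's resampling code; the function name resize_linear_1d and the half-pixel-center convention are assumptions.

```python
import numpy as np


def resize_linear_1d(signal, out_size, antialias=True):
    """Resample a 1-D signal with a triangular (linear) kernel.

    Illustrative sketch only. With antialias=True the triangle is widened
    by the downscaling factor, so every input sample contributes.
    """
    signal = np.asarray(signal, dtype=np.float64)
    in_size = signal.shape[0]
    scale = in_size / out_size
    # Plain linear interpolation uses a triangle of radius 1 input sample.
    radius = max(scale, 1.0) if antialias else 1.0
    positions = np.arange(in_size)
    out = np.empty(out_size)
    for i in range(out_size):
        # Output sample center mapped to input coordinates (assumed convention).
        center = (i + 0.5) * scale - 0.5
        weights = np.clip(1.0 - np.abs(positions - center) / radius, 0.0, None)
        out[i] = np.dot(weights, signal) / weights.sum()
    return out


impulse = np.array([0, 0, 0, 0, 8, 0, 0, 0])
aliased = resize_linear_1d(impulse, 2, antialias=False)
smoothed = resize_linear_1d(impulse, 2, antialias=True)
```

In this sketch the non-antialiased result misses the impulse entirely, because each output sample only looks at its two nearest inputs, while the widened triangle spreads the impulse across both output samples.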
Known issues:
- The video loader operator requires that the key frames occur, at a minimum, every 10 to 15 frames of the video stream.
  If the key frames occur less frequently than that, the returned frames might be out of sync.
- Experimental VideoReaderDecoder does not support open GOP.
  It will not report an error and might produce invalid frames. VideoReader uses a heuristic approach to detect open GOP and should work in most common cases.
- The DALI TensorFlow plugin might not be compatible with TensorFlow versions 1.15.0 and later.
  To use DALI with a TensorFlow version that does not have a prebuilt plugin binary shipped with DALI, make sure that the compiler that is used to build TensorFlow exists on the system during the plugin installation. (Depending on the particular version, you can use GCC 4.8.4, GCC 4.8.5, or GCC 5.4.)
- In experimental debug and eager modes, GPU external source is not properly synchronized with DALI internal streams. As a workaround, the user may manually synchronize the device before returning the data from the callback.
- Due to some known issues with Meltdown/Spectre mitigations and DALI, DALI shows best performance when running in Docker with escalated privileges, for example:
  - privileged=yes in Extra Settings for AWS data points
  - --privileged or --security-opt seccomp=unconfined for bare Docker
Binary builds
Install via pip for CUDA 10.2:
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda102==1.16.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda102==1.16.0
or for CUDA 11:
CUDA 11.0 build uses CUDA toolkit enhanced compatibility. It is built with the latest CUDA 11.x toolkit
while it can run on the latest, stable CUDA 11.0 capable drivers (450.80 or later).
Using the latest driver may enable additional functionality.
More details can be found in enhanced CUDA compatibility guide.
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-cuda110==1.16.0
pip install --extra-index-url https://developer.download.nvidia.com/compute/redist/ nvidia-dali-tf-plugin-cuda110==1.16.0
Or use direct download links (CUDA 10.2):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda102/nvidia_dali_cuda102-1.16.0-5323000-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda102/nvidia-dali-tf-plugin-cuda102-1.16.0.tar.gz
Or use direct download links (CUDA 11.0):
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.16.0-5322998-py3-none-manylinux2014_x86_64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-cuda110/nvidia_dali_cuda110-1.16.0-5322998-py3-none-manylinux2014_aarch64.whl
- https://developer.download.nvidia.com/compute/redist/nvidia-dali-tf-plugin-cuda110/nvidia-dali-tf-plugin-cuda110-1.16.0.tar.gz
FFmpeg source code:
Libsndfile source code: