Notes
- This release adds new reduction operations, Winograd algorithm performance improvements, bug fixes, and various host-side performance improvements.
Changes
- Added a GPU reference kernel implementation for faster testing.
- Added TargetID support for new AMD GPU architectures.
- Added four generic tensor reduction operations: AVG, AMAX, NORM1, and NORM2.
- Fixed a bug where Batchnorm would give incorrect results when the product of image height and image width is not a multiple of four.
- Various host-side improvements for better find and tuning performance.
- Added support for AMD Code Object V4.
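
For readers unfamiliar with the four reduction operations named above (AVG, AMAX, NORM1, NORM2), the following is a minimal plain-Python sketch of what each one conventionally computes over a flat list of values. It is an illustration only and does not use the library's API; the function name `reduce_values` and the string operation names are hypothetical, and the definitions assumed are the usual mathematical ones (mean, maximum absolute value, sum of absolute values, square root of the sum of squares).

```python
import math


def reduce_values(values, op):
    """Reduce a flat list of floats with one of four named operations.

    Hypothetical helper for illustration; not part of any library API.
    """
    if not values:
        raise ValueError("cannot reduce an empty sequence")
    if op == "AVG":
        # Arithmetic mean of the elements.
        return sum(values) / len(values)
    if op == "AMAX":
        # Largest absolute value among the elements.
        return max(abs(v) for v in values)
    if op == "NORM1":
        # L1 norm: sum of absolute values.
        return sum(abs(v) for v in values)
    if op == "NORM2":
        # L2 norm: square root of the sum of squares.
        return math.sqrt(sum(v * v for v in values))
    raise ValueError(f"unknown reduction: {op}")


data = [3.0, -4.0]
results = {op: reduce_values(data, op) for op in ("AVG", "AMAX", "NORM1", "NORM2")}
```

With the sample input `[3.0, -4.0]`, this sketch yields -0.5 for AVG, 4.0 for AMAX, 7.0 for NORM1, and 5.0 for NORM2. A real tensor reduction would apply the same arithmetic along selected dimensions rather than over a flat list.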