This is a maintenance release that focuses (again) on the DNN API. It also includes bug fixes for a number of severe issues found in various domains (SMM, DNN, SPMDM, and the library in general).
INTRODUCED
- Documented header-only implementation of LIBXSMM
- DNN: introduced routine to check code gen. (libxsmm_dnn_get_codegen_success)
- DNN: introduced routine for explicit transpose (libxsmm_dnn_transpose_filter)
- DNN: introduced routine to query the number of tasks (libxsmm_dnn_get_parallel_tasks)
- DNN: support external filter reduction in case of parallelization over the minibatch
- MEM: exposed routine to query size of buffer allocated by libxsmm_[aligned_]malloc
- SPMDM: introduced support for beta, code optimizations
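For readers unfamiliar with the term, the "beta" mentioned for SPMDM is assumed here to have the conventional BLAS meaning, i.e. C = alpha * A * B + beta * C. The sketch below is a plain dense reference loop written only to illustrate that semantic; `ref_gemm_beta` is a hypothetical helper and not a LIBXSMM routine, and it says nothing about how SPMDM implements the feature internally.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference helper (NOT part of LIBXSMM):
 * row-major C(m x n) = alpha * A(m x k) * B(k x n) + beta * C(m x n). */
void ref_gemm_beta(size_t m, size_t n, size_t k,
                   float alpha, const float *a, const float *b,
                   float beta, float *c)
{
  size_t i, j, p;
  assert(a != NULL && b != NULL && c != NULL);
  for (i = 0; i < m; ++i) {
    for (j = 0; j < n; ++j) {
      float acc = 0.0f;
      for (p = 0; p < k; ++p) {
        acc += a[i * k + p] * b[p * n + j];
      }
      if (beta == 0.0f) {
        /* beta = 0: previous content of C is ignored (overwrite) */
        c[i * n + j] = alpha * acc;
      }
      else {
        /* beta != 0: previous content of C is scaled and accumulated */
        c[i * n + j] = alpha * acc + beta * c[i * n + j];
      }
    }
  }
}
```

With beta = 1 the product is accumulated into an existing C; with beta = 0 the result simply replaces C.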
CHANGES
- SPMDM: improved static code path selection (no CPUID dispatch)
- SMM: raised THRESHOLD up to which JIT code is automatically generated
- Raised baseline code path to SSE4.2 to avoid CPUID-dispatched CRC32;
  fixed (again) controlling the static code path according to documentation
- Adjusted separation between gen-library and main library
- MEM/debug: checksum for internal bookkeeping structure
- MEM: streamlined internal bookkeeping structures
- Improved reliability of library initialization
FIXES
- SMM: possibly wrong code version under concurrent dispatch in case of a hash key collision
- DNN: raised/fixed weight update performance to the expected level (AVX-512)
- DNN: fixed a bug which was introduced by code refactoring (fwd. convolution)
- DNN: fixed bug in Deepbench and refactored backward convolution code
- DNN: corrected setting up the handle for the weight update convolution
- MEM: fixed kernel-dump related console output (print correct address)
- Avoid certain (pseudo-)AVX-512 intrinsics, which might not be present (GCC)
- Avoid AVX-512/Core intrinsics prior to Clang 3.8 (3.9 brings them in)
- Avoid applying AVX-512/Core flags with earlier versions of Clang (IDEs)
- Updated C++ entry points for code dispatch (remainder of issue #105);
  this change fixed a performance issue with the CP2K/intel branch
- SPMDM: fixed issue when N is not a multiple of 16
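The notes do not describe the SPMDM fix itself. As a generic illustration of the class of problem (a dimension that is not a multiple of the block width), the following sketch processes full blocks of 16 elements and then handles the remainder separately. The function `scale_row` and the constant `BLOCK` are hypothetical and chosen for illustration only; they are not taken from LIBXSMM.

```c
#include <assert.h>
#include <stddef.h>

#define BLOCK 16 /* assumed block width, e.g. 16 floats per 512-bit vector */

/* Hypothetical helper (NOT part of LIBXSMM): scales n floats in place. */
void scale_row(size_t n, float factor, float *row)
{
  size_t j, jj;
  /* largest multiple of BLOCK that does not exceed n */
  const size_t n_full = n - (n % BLOCK);
  assert(row != NULL);
  /* main loop: full blocks only, never touches elements at or beyond n_full */
  for (j = 0; j < n_full; j += BLOCK) {
    for (jj = 0; jj < BLOCK; ++jj) {
      row[j + jj] *= factor;
    }
  }
  /* remainder loop: the trailing n % BLOCK elements */
  for (j = n_full; j < n; ++j) {
    row[j] *= factor;
  }
}
```

Omitting the remainder loop, or rounding the trip count up instead of down, are the typical ways such code goes wrong when N is not a multiple of the block width.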