This is a bug-fix release that resolves a severe issue with concurrent modification of the code registry. The related code (macro-based) had not received much development in the past, but is now cleanly implemented and covered by a rigorous test case. There is also enough of an API to determine some basic registry properties (capacity, size), and a guarantee to receive JIT-code under reasonable conditions (e.g., if the registry is not exhausted). This guarantee can be relaxed: libxsmm_[get|set]_dispatch_trylock allows returning no code if access to the code registry is contended.
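The dispatch behavior described above can be illustrated with a minimal sketch. It is not LIBXSMM code: the class CodeRegistry, its methods, and the generate callback are hypothetical names chosen for illustration, and a single lock stands in for whatever locking the library actually uses. The sketch only shows the two conditions under which no code is returned: an exhausted registry, and (in trylock mode) a contended lock.

```python
import threading


class CodeRegistry:
    """Toy stand-in for a JIT-code registry (hypothetical, not LIBXSMM's implementation)."""

    def __init__(self, capacity, trylock=False):
        self.capacity = capacity      # maximum number of registered kernels
        self.trylock = trylock        # if True, give up instead of waiting for the lock
        self._lock = threading.Lock()
        self._entries = {}

    def info(self):
        """Return basic registry metrics (capacity, size)."""
        with self._lock:
            return {"capacity": self.capacity, "size": len(self._entries)}

    def dispatch(self, key, generate):
        """Return the code for `key`, generating it on first use.

        Returns None if the registry is exhausted, or if trylock mode is
        enabled and the lock is currently held by someone else.
        """
        if self.trylock:
            if not self._lock.acquire(blocking=False):
                return None           # contended: caller is expected to fall back
        else:
            self._lock.acquire()      # default: wait until the registry is available
        try:
            code = self._entries.get(key)
            if code is None:
                if len(self._entries) >= self.capacity:
                    return None       # exhausted: no room for another kernel
                code = generate(key)
                self._entries[key] = code
            return code
        finally:
            self._lock.release()


registry = CodeRegistry(capacity=2, trylock=True)
kernel = registry.dispatch((4, 4, 4), lambda shape: "kernel_%dx%dx%d" % shape)
```

A caller in trylock mode has to treat None as "use a fallback path now", whereas in the default mode None only signals an exhausted registry.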
INTRODUCED
- Slightly improved multi-target functions, but GCC 4.9 (and later) is needed to avoid "legacy" support.
- Introduced libxsmm_get_registry_info to receive basic metrics about the (GEMM-)code registry.
- Implemented parallelization threshold for the libxsmm_otrans_omp routines.
- Improved code generation of sparse matrix domain and JIT-support (A-sparse/reg., CSR); libxsmm_create_dgemm_descriptor routine to ease language binding (pyfr sample code).
- Cover a wider range of compiler versions, see our build status page.
- More errors/warnings covered in release builds (LIBXSMM_VERBOSE=1).
- Build all possible sample codes as part of the CI tests.
- Implemented sync/lock abstraction (Windows).
- Optimized access to thread-local code cache.
CHANGES
- TF configured THRESHOLD=0, but explicit JIT does not fall back and THRESHOLD is an upper limit; a 0-threshold is now prevented by choosing the default (128**3) if THRESHOLD=0 is requested.
- Suppress warnings about unused functions when our build system is not used.
- Reduced lock-contention in JIT-code generation (more locks).
- Use relaxed atomics in JIT-code thread synchronization.
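The THRESHOLD change listed above can be sketched as follows. This is an illustration only: the function names are hypothetical, and it is an assumption that the threshold is compared against the product M*N*K and that the limit is inclusive.

```python
DEFAULT_THRESHOLD = 128 ** 3  # default named in the note above


def effective_threshold(requested):
    """Return the upper limit actually used; a requested 0 selects the default."""
    return DEFAULT_THRESHOLD if requested == 0 else requested


def within_threshold(m, n, k, requested=0):
    """True if the problem size does not exceed the upper limit.

    Assumption: size is measured as m * n * k and the limit is inclusive.
    """
    return m * n * k <= effective_threshold(requested)
```

With this rule, a configuration that asks for THRESHOLD=0 no longer disables JIT-code generation for every problem size.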
FIXES
- Fixed severe issue with concurrent JIT-code generation (code registry); new/rigorous test case.
- Fixed an issue when building the DNN sample code using GCC 6.3 (linker error).
- SPMDM: avoid some duplicated symbols under Windows.