Version 1.5.1

This (minor) release is mainly a bugfix release; its urgency stems from a bug in the Fortran interface (SMM functionality) where requesting a JIT kernel never returned a suitable PROCEDURE POINTER (it was always NULL). The fix completes v1.5's goal of supporting a wider variety of Fortran compilers (GNU, Intel, CRAY, and PGI), while the Fortran interface code remains compatible with GNU Fortran 4.5 (the oldest supported Fortran compiler).
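
As an illustration, below is a minimal sketch of the dispatch-and-check pattern using the C API (libxsmm_dmmdispatch); the Fortran interface exposes the same idea through a PROCEDURE POINTER. The matrix shape is arbitrary and argument types may vary slightly between releases.

    /* Minimal sketch of the dispatch-and-check pattern via libxsmm's C API;
     * the Fortran interface exposes the same idea as a PROCEDURE POINTER.
     * Shape values are arbitrary; exact argument types may differ between
     * library versions. */
    #include <libxsmm.h>
    #include <stdio.h>

    int main(void)
    {
      const libxsmm_blasint m = 23, n = 23, k = 23; /* small GEMM shape */
      const double alpha = 1.0, beta = 1.0;
      /* Request a JIT-compiled kernel for C += A * B; NULL leading
       * dimensions fall back to the defaults for a dense column-major layout. */
      const libxsmm_dmmfunction kernel = libxsmm_dmmdispatch(
        m, n, k, NULL/*lda*/, NULL/*ldb*/, NULL/*ldc*/,
        &alpha, &beta, NULL/*flags*/, NULL/*prefetch*/);

      if (NULL == kernel) { /* the check that the broken Fortran pointer defeated */
        fprintf(stderr, "JIT dispatch failed; the caller decides on a fallback\n");
        return 1;
      }
      /* kernel(a, b, c) would now execute the specialized multiplication. */
      return 0;
    }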

Beyond the above bugfix, there are four fixes for the new DNN functionality, and the console output of the DNN sample code has been improved and corrected. Furthermore, the out-of-place transpose code now detects when the input and output matrices point to the same array (alias). Instead of returning an error code in every such case, the most common special case (M=N, LDin=LDout) is now handled (a high-performance in-place transpose is still pending for a future release).
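
For illustration only, the following plain C sketch (not libxsmm's implementation) shows what handling the aliased special case amounts to: a square matrix with equal leading dimensions can be transposed in place by swapping elements across the diagonal.

    /* Illustrative sketch of the aliased special case (M == N, LDin == LDout):
     * transpose a square matrix in place by swapping across the diagonal.
     * This is not libxsmm's code, only a demonstration of the handled case. */
    #include <stddef.h>

    static void transpose_inplace_square(double* a, size_t n, size_t ld)
    {
      size_t i, j;
      for (i = 0; i < n; ++i) {
        for (j = i + 1; j < n; ++j) { /* visit only the upper triangle */
          const double tmp = a[i * ld + j];
          a[i * ld + j] = a[j * ld + i];
          a[j * ld + i] = tmp;
        }
      }
    }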

INTRODUCED

  • SC'16 paper "LIBXSMM: Accelerating Small Matrix Multiplications by Runtime Code Generation"
    => Please consider attending the presentation!
  • Self-contained Linux perf support (see PR #100): removed the dependency on Linux kernel headers
  • Additional sample code (spmdm) for sparse matrix multiplication (see PR #101)

CHANGES

  • Improved reliability of the out-of-place transpose, and support for the in-place corner case
  • Additional test infrastructure, e.g., allowing tests with the Intel Compiler
  • New script (.travis.sh) to build/run the Travis test set (.travis.yml; "script:" section)
  • DNN backend: expanded support for 8- and 16-bit integer instructions

FIXES

  • Fixed the Fortran interface, where requesting a JIT kernel never returned a suitable PROCEDURE POINTER (always NULL)
    => This issue was introduced in v1.5, which aimed to support a wider variety of compilers
  • DNN backend: fixed a bug in int16 convolutions (2D register blocking)
  • DNN: fixed a bug in the nhwc/rsck fallback code (forward convolutions)
  • DNN: fixed a bug in the unrolling calculation of the int16 implementation
  • DNN: fixed the case of fewer than 16 input channels (int16)
  • DNN sample code: fixed GOP and GFLOP output
