Version 1.5.1

This (minor) release is mainly a bugfix release; its urgency stems from a bug in the Fortran interface (SMM functionality) where requesting a JIT kernel never returned a suitable PROCEDURE POINTER (it was always NULL). The fix completes v1.5's goal of supporting a wider variety of Fortran compilers (GNU, Intel, CRAY, and PGI), while the Fortran interface code remains compatible with GNU Fortran 4.5 (the oldest supported Fortran compiler).
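
As an illustration, below is a minimal sketch of the dispatch-and-check pattern using the C API (libxsmm_dmmdispatch); the Fortran interface exposes the same idea through a PROCEDURE POINTER. The matrix shape is arbitrary and argument types may vary slightly between releases.

    /* Minimal sketch of the dispatch-and-check pattern via libxsmm's C API;
     * the Fortran interface exposes the same idea as a PROCEDURE POINTER.
     * Shape values are arbitrary; exact argument types may differ between
     * library versions. */
    #include <libxsmm.h>
    #include <stdio.h>

    int main(void)
    {
      const libxsmm_blasint m = 23, n = 23, k = 23; /* small GEMM shape */
      const double alpha = 1.0, beta = 1.0;
      /* Request a JIT-compiled kernel for C += A * B; NULL leading
       * dimensions fall back to the defaults for a dense column-major layout. */
      const libxsmm_dmmfunction kernel = libxsmm_dmmdispatch(
        m, n, k, NULL/*lda*/, NULL/*ldb*/, NULL/*ldc*/,
        &alpha, &beta, NULL/*flags*/, NULL/*prefetch*/);

      if (NULL == kernel) { /* the check that the broken Fortran pointer defeated */
        fprintf(stderr, "JIT dispatch failed; the caller decides on a fallback\n");
        return 1;
      }
      /* kernel(a, b, c) would now execute the specialized multiplication. */
      return 0;
    }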

Beyond the above bugfix, there are four fixes for the new DNN functionality, and the console output of the DNN sample code has been improved and corrected. Furthermore, the out-of-place transpose code now detects when the input and output matrices point to the same array (alias). Instead of returning an error code in every such case, the most common special case (M=N, LDin=LDout) is now handled (a high-performance in-place transpose is still pending for a future release).
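
For illustration only, the following plain C sketch (not libxsmm's implementation) shows what handling the aliased special case amounts to: a square matrix with equal leading dimensions can be transposed in place by swapping elements across the diagonal.

    /* Illustrative sketch of the aliased special case (M == N, LDin == LDout):
     * transpose a square matrix in place by swapping across the diagonal.
     * This is not libxsmm's code, only a demonstration of the handled case. */
    #include <stddef.h>

    static void transpose_inplace_square(double* a, size_t n, size_t ld)
    {
      size_t i, j;
      for (i = 0; i < n; ++i) {
        for (j = i + 1; j < n; ++j) { /* visit only the upper triangle */
          const double tmp = a[i * ld + j];
          a[i * ld + j] = a[j * ld + i];
          a[j * ld + i] = tmp;
        }
      }
    }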

INTRODUCED

  • SC'16 paper "LIBXSMM: Accelerating Small Matrix Multiplications by Runtime Code Generation"
    => Please consider attending the presentation!
  • Self-contained Linux perf support (see PR #100): removed the dependency on Linux kernel headers
  • Additional sample code (spmdm) for sparse matrix multiplication (see PR #101)

CHANGES

  • Improved reliability of the out-of-place transpose, and support for the in-place corner case
  • Additional test infrastructure, e.g., allowing tests with the Intel Compiler
  • New script (.travis.sh) to build/run the Travis test set (.travis.yml; "script:" section)
  • DNN backend: expanded support for 8- and 16-bit integer instructions

FIXES

  • Fixed the Fortran interface, where requesting a JIT kernel never returned a suitable PROCEDURE POINTER (always NULL)
    => This issue was introduced in v1.5, which aimed to support a wider variety of compilers
  • DNN backend: fixed a bug in int16 convolutions (2D register blocking)
  • DNN: fixed a bug in the nhwc/rsck fallback code (forward convolutions)
  • DNN: fixed a bug in the unrolling calculation of the int16 implementation
  • DNN: fixed the case of fewer than 16 input channels (int16)
  • DNN sample code: fixed GOP and GFLOP output
