github libxsmm/libxsmm 1.8.3
Version 1.8.3

6 years ago

Overview: while v1.9 is in the works, this release fixes two issues and improves OS/compiler coverage: support for OSX with the Intel Compiler is updated, and validation now extends to MinGW and BSD (see Compatibility). Among the minor or exotic issues resolved in this release, the stand-alone JIT-generated matrix transposes (out-of-place) are now limited to matrix shapes for which only a reasonable amount of code is generated. There was also a rare synchronization issue, reproduced with CP2K/smp in LIBXSMM v1.8.1 (and likely earlier), which has been resolved since the previous release (v1.8.2).

JIT code generation/dispatch performance: JIT-generating code (non-transposed GEMMs) is known to be very fast, which this release (re-)confirms with the extended dispatch microbenchmark. Single-threaded (uncontended) code generation of matrix kernels with M,N,K := 4...64 (uniformly distributed random numbers) takes less than 25 µs on typical systems. Non-cached code dispatch takes less than 50x longer than calling a function that does nothing, whereas cached code dispatch takes less than 15x longer than an empty function. Code dispatch is therefore roughly three orders of magnitude faster than code generation, i.e., nanoseconds vs. microseconds.
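The gap between generation and dispatch comes from generating a kernel once per shape and only looking it up afterwards. The following is a minimal Python sketch of that pattern, not LIBXSMM's API or its microbenchmark: the names `generate_kernel`, `dispatch`, and `_registry` are hypothetical, the "generated" kernel is a plain closure rather than machine code, and the timings it prints say nothing about LIBXSMM itself.

```python
import random
import time

# Hypothetical code registry: maps a shape (m, n, k) to its kernel.
_registry = {}


def generate_kernel(m, n, k):
    """Stand-in for JIT code generation: build a kernel for one fixed shape."""
    def kernel(a, b, c):
        # c (m x n) += a (m x k) * b (k x n), all row-major nested lists.
        for i in range(m):
            for j in range(n):
                acc = 0.0
                for p in range(k):
                    acc += a[i][p] * b[p][j]
                c[i][j] += acc
    return kernel


def dispatch(m, n, k):
    """Return the kernel for (m, n, k), generating it only on first request."""
    key = (m, n, k)
    kernel = _registry.get(key)
    if kernel is None:
        kernel = generate_kernel(m, n, k)
        _registry[key] = kernel
    return kernel


def empty():
    """Baseline: a function that does nothing."""
    pass


def time_per_call(fn, args_list):
    """Average wall-clock seconds per call of fn over args_list."""
    start = time.perf_counter()
    for args in args_list:
        fn(*args)
    return (time.perf_counter() - start) / len(args_list)


if __name__ == "__main__":
    rng = random.Random(0)
    # Shapes drawn uniformly from 4...64, as in the text above.
    shapes = [tuple(rng.randint(4, 64) for _ in range(3)) for _ in range(10000)]
    first = time_per_call(dispatch, shapes)   # includes generation on misses
    again = time_per_call(dispatch, shapes)   # lookups only, all shapes known
    base = time_per_call(empty, [()] * len(shapes))
    print("first pass: %.1f ns/call" % (first * 1e9))
    print("second pass: %.1f ns/call" % (again * 1e9))
    print("empty function: %.1f ns/call" % (base * 1e9))
```

Dividing the second-pass time by the empty-function time gives a ratio comparable in spirit to the "x longer than an empty function" figures above, although the absolute numbers depend entirely on the Python interpreter.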

INTRODUCED

  • Support for mixing C and C++ code when using header-only based LIBXSMM.
  • Issue 202: reintroduced copy-update with LIBXSMM's install target (make).
  • Experimental: sketched Python support built into LIBXSMM (PYMOD=1).

IMPROVEMENTS / CHANGES

  • Completed revision of synchronization layer (started in v1.8.2); initial documentation.
  • Reduced TRACE output due to self-watching (internal) initialization/termination.
  • Wider OS validation incl. more exotic sets (MinGW in addition to Cygwin, BSD).
  • Production (non-debug) builds on 32-bit platforms are now prevented (compilation error).
  • Increased test variety while staying within same turnaround time limit.
  • Continued to close implementation gaps (synchronization primitives).
  • Sparse SOA domain received fixes/improvements driven by EDGE.
  • More readable code snippets in documentation (reduced width).
  • Initial preparation for JIT-generating SSE code (disabled).
  • Improved detection of OpenBLAS library (Makefile.inc).
  • Updated (outdated) support for Intel Compiler (OSX).
  • Compliant soname under Linux and OSX.

FIXES

  • Fixed selection of statically generated code targeting Skylake server (SKX).
  • Sparse SOA domain: resolved issues pointed out by static analysis.
  • Fixed support for JIT-generated matrix transpose (code size).
  • Fixed selecting an incorrect prefetch strategy (BGEMM).
