github libxsmm/libxsmm 1.4.4
Version 1.4.4

This release improves and stabilizes previously released features and contains the changes needed for upcoming functionality (generalized VTune profiling support, JIT buffer management, and a restructured code registry). It also contains a number of new, not yet documented preview features, such as sparse SoA matrix multiplication in the frontend and stand-alone out-of-place general matrix transposes.

CHANGES

  • Introduced a SONAME for shared objects (dynamic libraries) under Linux and OS X (see issue #79). This may ease the inclusion of the library in Linux distributions (package repositories). The Python utility script has been adjusted to output the various version-number formats used to form the SONAME. The installation target (Makefile) now installs symbolic links rather than duplicating shared libraries.
  • Made PREFETCH=1 the default, since it already selects auto-prefetch based on the CPUID. This complements previous efforts to reduce the "need" for different compile-time configurations and specializations. Performance-related needs are now mostly served by CPUID-dispatched code paths.
  • LIBXSMM_VERBOSE mode now reports accurate heap memory consumption for the code registry and for the JIT'ted code buffers, and it can also dump the JIT'ted code to files for manual inspection (issue #88).
  • Improved FORTRAN 2003 conformance (a larger set of warnings is enabled under the PEDANTIC=2 umbrella flag), and resolved an issue with the Intel Compiler 2011 SP1 (by avoiding the MERGE intrinsic in PARAMETER declarations).
  • Deprecated (in fact, removed) ROW_MAJOR support in preparation for a regular CBLAS interface. The associated configuration flags were removed from the interface as well, while some support remains for deployed applications, which fortunately only check for COL_MAJOR.
  • Initial sparse matrix support arrived in the interface; such a kernel is not managed by the code registry, but is instead created (libxsmm_create_dcsr_soa) and released (libxsmm_destroy) manually.
  • Internal library services have been ported in preparation for Windows support. This includes VTune support for executable buffers in general, including manually managed kernels (sparse SoA kernels).
  • Initial stand-alone support for out-of-place matrix transposes (libxsmm_*transpose_oop) in C/C++ and FORTRAN. A CPUID-dispatched code path and an implementation of the in-place transpose are still missing.
  • Enabled JIT code generation under Windows, although it does not work yet because of an incorrect calling convention. All code that previously prevented the JIT facility from being used under Windows has been removed, so one may call into JIT'ted code (which then fails due to the different calling convention). Prefetch signatures are still avoided under Windows, although this does not help with the calling convention. Under Cygwin, JIT is still avoided, except that the related code is exercised when building a DEBUG version.
  • Improved Clang support; in particular, the library now accounts somewhat better for Clang's broken intrinsics support when the static code path is below the code path "needed" by the intrinsics. This also turned out to be an improvement for the GCC-based tool chain, which supports the intrinsics use case (target attribute) somewhat better. Under OS X, SSE 4.2 is now enabled as the baseline (static) code path, in particular due to broken support for the CRC32 intrinsics. Note that under Linux, the CRC32 instructions are CPUID-dispatched.
  • Allowed for a header-only implementation of LIBXSMM to ease adoption by certain header-only C++ libraries (Eigen, etc.); see issue #86. This facility also works for C (which is quite notable); however, the header-only implementation currently does not allow linking C and C++ objects into a single binary.
  • Code that does not call any BLAS-related routines in LIBXSMM (e.g., the sparse SoA kernels) may now link against libxsmmext to avoid the BLAS dependency. For details, see issue #82.
  • Updated the documentation (it still lags behind newer/development features), including the CP2K guide (documentation folder).
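
The out-of-place transpose listed above can be illustrated with a minimal Python sketch. It assumes column-major storage (consistent with the removal of ROW_MAJOR) and leading dimensions `ld_in`/`ld_out`; the function name `transpose_oop` and its parameters are hypothetical and do not reflect the actual signature of `libxsmm_*transpose_oop`.

```python
def transpose_oop(m, n, a, ld_in, ld_out):
    """Hypothetical out-of-place transpose (not the LIBXSMM API).

    a holds an m x n matrix in column-major order in a flat list with
    leading dimension ld_in (>= m). Returns a new flat list holding the
    n x m transpose, column-major with leading dimension ld_out (>= n).
    """
    if ld_in < m or ld_out < n:
        raise ValueError("leading dimension too small")
    # The output has m columns of length ld_out; padding entries stay 0.0.
    b = [0.0] * (ld_out * m)
    for j in range(n):          # column index of the input
        for i in range(m):      # row index of the input
            # a(i, j) becomes b(j, i); column-major index is row + col * ld.
            b[j + i * ld_out] = a[i + j * ld_in]
    return b


# 2 x 3 input [[1, 2, 3], [4, 5, 6]] stored column-major with ld_in = 2.
a = [1.0, 4.0, 2.0, 5.0, 3.0, 6.0]
print(transpose_oop(2, 3, a, ld_in=2, ld_out=3))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```

Because the result is written to a separate buffer, input and output may use different leading dimensions (padding), which an in-place transpose cannot offer in general.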
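
The sparse kernels mentioned above operate on a sparse matrix in CSR (compressed sparse row) form, as the name libxsmm_create_dcsr_soa suggests. The Python sketch below shows only the generic CSR-times-dense product C += A * B using nested lists; it does not model LIBXSMM's SoA data layout or its code generation, and the function `csr_times_dense` is hypothetical.

```python
def csr_times_dense(row_ptr, col_idx, values, b, c):
    """Hypothetical sketch: c += A * b, with A (m x k) stored in CSR form.

    row_ptr has m + 1 entries; the nonzeros of row i are
    values[row_ptr[i]:row_ptr[i + 1]], located in the columns given by
    col_idx over the same range. b is k x n and c is m x n (lists of rows).
    """
    m = len(row_ptr) - 1
    for i in range(m):
        c_row = c[i]
        for p in range(row_ptr[i], row_ptr[i + 1]):
            a_ik = values[p]
            b_row = b[col_idx[p]]
            # Only stored nonzeros contribute; zeros of A are never touched.
            for j in range(len(c_row)):
                c_row[j] += a_ik * b_row[j]
    return c


# A = [[2, 0, 0], [0, 0, 3]] in CSR form.
row_ptr = [0, 1, 2]
col_idx = [0, 2]
values = [2.0, 3.0]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
c = [[0.0, 0.0], [0.0, 0.0]]
print(csr_times_dense(row_ptr, col_idx, values, b, c))  # [[2.0, 4.0], [15.0, 18.0]]
```

The work is proportional to the number of nonzeros times the number of columns of B, which is why a dedicated sparse kernel pays off for very sparse matrices.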
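
Several of the changes above rely on CPUID dispatch: auto-prefetch (PREFETCH=1) and, under Linux, the CRC32 instructions are selected at runtime, while a static code path serves as the baseline. The Python sketch below illustrates only the selection logic; the path names, feature names, and `select_code_path` are hypothetical, and the detected features are passed in rather than read via CPUID.

```python
from typing import Iterable, List, Tuple

# Hypothetical code paths, ordered from most to least capable. Each entry
# lists the CPU features it requires; the names are illustrative only.
CODE_PATHS: List[Tuple[str, Tuple[str, ...]]] = [
    ("avx2", ("avx", "avx2", "fma")),
    ("avx", ("avx",)),
    ("sse42", ("sse4_2",)),
]
STATIC_BASELINE = "generic"  # path compiled without special ISA flags


def select_code_path(cpu_features: Iterable[str]) -> str:
    """Pick the first (best) path whose required features are all present.

    A real implementation would query CPUID once at startup and cache the
    result; here the features are a parameter so the logic runs anywhere.
    """
    available = set(cpu_features)
    for name, required in CODE_PATHS:
        if available.issuperset(required):
            return name
    return STATIC_BASELINE


print(select_code_path({"sse4_2", "avx"}))  # avx
```

Dispatching at runtime lets a single binary use the best available instructions instead of requiring a separate build per target, which is the motivation the release notes give for moving performance-related choices away from compile-time configuration.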

FIXES

  • libxsmm_xmmdispatch now properly falls back to BLAS if the requested kernel is not supported.
  • Numerous smaller improvements and changes can also be regarded as fixes.

UPCOMING

  • Initial support for convolutions as commonly used in machine learning
  • High-performance stand-alone in-place transpose
  • Windows JIT support
