github libxsmm/libxsmm 1.8.2
Version 1.8.2

6 years ago

This last release of the 1.8.x line (before 1.9) accumulates a large number of changes that tweak interfaces and generally improve usability. The documentation has been vastly improved and extended, is more structured, and is also available on Read the Docs (with online full-text search). In preparation for a fully revised implementation of the DNN API (rewrite), the interface of the DNN domain (Tensor API) changed in an incompatible way (our policy should have delayed this to v1.9). However, the current main user of the DNN API (the integration with TensorFlow) has been updated. Also notable, v1.8.2 introduces JIT code generation for the Windows calling convention (support is limited to 4-argument kernels, i.e., no prefetch signature for the MM domain and no support for DNN/convolution kernels).

INTRODUCED

  • Introduced kernel introspection/query API for registered code: full GEMM descriptor, and code size.
  • Introduced explicit batch interface (and an experimental auto-batch option); parallelized/sequential.
  • Introduced BGEMM interface for handle-based GEMM using optimized format (copy-in/out).
  • More comprehensive sparse support (EDGE: Extreme Scale Fused Seismic Simulations).
  • More comprehensive collection of DNN test cases (DeepBench, ResNet50, etc.).
  • Implemented CI for DNN domain, and infrastructure for validation (libxsmm_matdiff).
  • Support for scheduling CI/tests into a Slurm-based cluster environment (.travis.sh).
  • Introduced "make INTRINSICS=0" to allow building with outdated Binutils.
  • Generate preprocessor symbols for statically generated code (presence check).
  • Allow FORTRAN to access (static-)configuration values using preprocessor.
  • FORTRAN 77 support for a much wider set of functionality (MM domain).
  • Introduced MHD file I/O to e.g., aid visual inspection and validation.
  • Cleaned up type-definitions and FE-macros (lower precision GEMM).
  • More comprehensive set of prefetch strategies (SMM domain).
  • Extended LIBXSMM_VERBOSE=2 to show library version, etc.
  • Wider use of QFMA across domains (MM, SpMM, DNN).
  • Updated application recipe for CP2K and TensorFlow.
  • Initial Eigen related code sample (batched SMMs).
  • CPUID for CPUs codenamed "Icelake".
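
The validation infrastructure mentioned above (libxsmm_matdiff) compares a test result against a reference. As a rough illustration of that kind of check, and explicitly not the library's implementation or signature, the following minimal sketch computes the largest absolute element-wise difference between two column-major matrices. The function name `sketch_linf_abs_diff` and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (NOT part of LIBXSMM): returns the largest absolute
 * element-wise difference between a reference matrix and a test matrix.
 * Both are m-by-n, column-major, with leading dimensions ldref and ldtst,
 * i.e., element (i, j) is stored at index j * ld + i. */
double sketch_linf_abs_diff(const double* ref, const double* tst,
                            size_t m, size_t n, size_t ldref, size_t ldtst)
{
  double result = 0.0;
  size_t i, j;
  assert(ref != NULL && tst != NULL && ldref >= m && ldtst >= m);
  for (j = 0; j < n; ++j) {   /* walk over columns */
    for (i = 0; i < m; ++i) { /* walk over rows within a column */
      double d = ref[j * ldref + i] - tst[j * ldtst + i];
      if (d < 0.0) d = -d;    /* absolute value without needing libm */
      if (d > result) result = d;
    }
  }
  return result;
}
```

A test harness would typically compare the returned value against a tolerance chosen for the precision in use, and fail the run if the tolerance is exceeded.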

CHANGES

  • Revised/unified API attribute decoration, and cleaned up header-only header.
  • Removed script for regenerating documentation bits (README.sh); now only via make.
  • Changed matcopy kernels to have column-major semantics (similar to transpose).
  • Support const/non-const GEMM prototypes that interfere with LIBXSMM's header-only form.
  • Slightly revised and based all F2K3 interfaces on lower-level F77 (implicit) routines.
  • Incorporated/enabled new/additional instructions in the code generator (BE).
  • Reshuffled properties/sizes in GEMM descriptor for future extensions.
  • Portable build-locks for improved turnaround time in parallel CI builds.
  • Comprehensive validation of DNN domain (all major benchmarks).
  • Consistent use of libxsmm_blasint (libxsmm_dmmdispatch).
  • Revised error/warning messages (LIBXSMM_VERBOSE=1).
  • Initial support for some fused operations (DNN domain).
  • Removed support for small GEMM descriptors (BIG=0).
  • Removed libxsmm_timer_xtick (libxsmm_timer.h).
  • Improved turnaround time in Travis CI testing.
  • Thread-safe scratch memory allocation.
  • Support for VS 2017 (startup script, etc.).
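
To make the "column-major semantics" of the matcopy change above concrete, here is a minimal reference sketch. It assumes the usual BLAS-style convention (m rows, n columns, leading dimension at least m) and is not LIBXSMM's kernel or API. The name `sketch_matcopy_colmajor` and its argument order are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine (NOT LIBXSMM's matcopy kernel): copies an
 * m-by-n matrix stored in column-major order. Element (i, j) of the input
 * lives at in[j * ldi + i], and is written to out[j * ldo + i].
 * Assumption: ldi >= m and ldo >= m (BLAS-style leading dimensions). */
void sketch_matcopy_colmajor(double* out, const double* in,
                             size_t m, size_t n, size_t ldi, size_t ldo)
{
  size_t i, j;
  assert(out != NULL && in != NULL && ldi >= m && ldo >= m);
  for (j = 0; j < n; ++j) {   /* outer loop over columns */
    for (i = 0; i < m; ++i) { /* inner loop is contiguous in memory */
      out[j * ldo + i] = in[j * ldi + i];
    }
  }
}
```

With these semantics, copying a 2-by-2 matrix out of a buffer with leading dimension 3 skips the padding element at the end of each column.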

FIXES

  • Fixed potential issue with GEMM flags being incorrectly created (GEMM wrapper).
  • Several fixes for improved FORTRAN interface compatibility (optional arguments, etc.).
  • Disabled AVX-512 code generation with Intel Compiler 2013 (SP1 brings the req. bits).
  • Fixed code gen. issue with SOA sparse kernels; corrected precision of SOA sample code.
  • Fixed index calculation in tiled libxsmm_matcopy; updated test case accordingly.
  • Fixed a number of issues in several DNN code paths unveiled by better testing.
  • Several fixes in sparse SOA domain (unveiled by LIBXSMM's integration into PyFR).
  • Improved support for (legacy) Clang wrt AVX-512 code generation (intrinsics).
  • Ported bit-scan intrinsics abstraction to yield same result with all compilers.
  • Allow static code generation to target SKX and KNM (Makefile).
  • Fixed several code generation issues for SMMs on KNM.
