This is a patch release containing the following changes to v1.6.4:

* Fixed issue with memory descriptor size computations (fc836a3)
* Reduced required scratchpad size for RNNs (c7e165a)
* Improved performance of fp16 convolution with bias on GPUs (943760e)
* Fixed segmentation fault for convolution weight gradient on systems with Intel AVX512 support (85e92b3)