- On ARM64 (AArch64), replaced the inline assembly implementation of
128-bit multiplication with a __uint128_t-based version.
This avoids potential miscompilations of the experimental MuseAir
hash observed with aggressive compiler optimizations (e.g. -O2/-O3)
on some toolchains (notably Apple Silicon), while still generating
optimal code on modern compilers.
- Fixed the speed computation of the undocumented -T option for the
recovery functions. The reported values are now comparable to those
of the generation functions. Note that the real speed is unchanged
from the previous version; only the way it is measured has changed.
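
For the ARM64 change, a __uint128_t-based 64x64 -> 128-bit multiply typically looks like the sketch below. The helper name mul64x64_128 and its signature are illustrative assumptions, not the project's actual code; __uint128_t is a GCC/Clang extension available on 64-bit targets.

```c
#include <stdint.h>

/* Hypothetical helper (name and signature are assumptions for illustration).
 * Computes the full 128-bit product of two 64-bit values and returns it
 * as two 64-bit halves. */
static inline void mul64x64_128(uint64_t a, uint64_t b,
                                uint64_t *lo, uint64_t *hi)
{
    /* Widen one operand first so the multiplication is done in 128 bits. */
    __uint128_t p = (__uint128_t)a * b;

    *lo = (uint64_t)p;          /* low 64 bits of the product  */
    *hi = (uint64_t)(p >> 64);  /* high 64 bits of the product */
}
```

Because the multiplication is written in plain C, the compiler is free to choose the instruction sequence and to schedule it alongside the surrounding code, instead of treating it as an opaque inline assembly block.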