Added
-
API functions now return more precise error codes in some cases.
-
Out-of-memory errors contain more information.
Fixed
-
Yet another in-place lowering issue (#1569).
-
Removed unnecessary bounds checks in register tiling, giving about
1.8x speedup on e.g. matrix multiplication on newer NVIDIA GPUs.
-
A parser bug erroneously demanded whitespace in some type
expressions (#1573).
-
Some memory was not being freed correctly when shutting down OpenCL
and CUDA contexts, which could lead to memory leaks in processes
that created and freed many contexts.
-
An incorrect copy-removal in some exotic cases (#1572).
-
'restore' functions could perform undefined pointer arithmetic when
passed garbage.