Added
- Faster index calculations for very tight GPU kernels (such as the
  ones corresponding to 2D tiling).
- `scan` with vectorised operators (e.g. `map2 (+)`) is now faster
  in some cases.
- The C API has now been documented and stabilized, including
  obtaining profiling information (although this is still
  unstructured).
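
For illustration, the kind of `scan` referred to above is one whose elements are themselves arrays, so the operator is an elementwise `map2`. The following is a minimal sketch, not taken from the compiler's test suite; the function name `vector_prefix_sums` and the choice of `i32` are assumptions made for the example.

```futhark
-- Inclusive prefix sums over a sequence of n vectors, each of length m.
-- The scan operator is `map2 (+)`, i.e. elementwise vector addition,
-- and its neutral element is the all-zero vector of length m.
-- (`vector_prefix_sums` is a hypothetical name used only for this sketch.)
let vector_prefix_sums [n][m] (xss: [n][m]i32): [n][m]i32 =
  scan (map2 (+)) (replicate m 0) xss
```

Applied to `[[1,2],[3,4],[5,6]]`, this evaluates to `[[1,2],[4,6],[9,12]]`. Whether a given program of this shape benefits from the speed-up depends on the cases the compiler recognises.
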
Fixed
- Fixed some cases of missing fusion (#953).
- Context deinitialisation is now more complete, and should not leak
  memory (or at least not nearly as much, if any). This makes it
  viable to repeatedly create and free Futhark contexts in the same
  process (although this can still be quite slow).