- Faster device-to-device copies on CUDA.
- "More correctly" detect L2 cache size for OpenCL backend on AMD GPUs.
Fixed
- Handling of `..` in `import` paths (again).
- Detection of impossible loop parameter sizes (#2144).
- Rare case where GPU histograms would use slightly too much shared
  memory and fail at run-time.
- Rare crash in layout optimisation.