Added
- The `cuda` and `hip` backends now generate faster code for `scan`s
  that have been fused with `map`s that internally produce arrays.
  Work by Anders Holst and Christian Påbøl Jacobsen.
- `f16.ldexp`, `f32.ldexp`, `f64.ldexp`, corresponding to the
  functions in the C math library.
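As a reminder of what the C `ldexp` family computes, here is a small sketch using Python's `math.ldexp`, which wraps the same C operation on double-precision floats. It most closely matches `f64.ldexp`. The assumption is that the new Futhark functions take their arguments in the same order as C (value, then exponent); `f16` and `f32` results may round differently near the limits of their ranges.

```python
import math

# ldexp(x, n) computes x * 2**n by adjusting the binary exponent,
# so the result is exact as long as it stays in the normal range.
print(math.ldexp(0.75, 4))   # 0.75 * 16
print(math.ldexp(3.0, -2))   # 3.0 * 0.25

# frexp is the inverse: it splits a float into a mantissa in [0.5, 1)
# and an integer exponent, which ldexp recombines.
m, e = math.frexp(12.0)
assert math.ldexp(m, e) == 12.0
```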
Fixed
- Incorrect data dependency information for `scatter` and `vjp` could
  cause invalid simplification.
- Barrier divergence in certain complicated kernels that contain both
  bounds checks and intragroup scans.