Halide v17.0.0

9 months ago

Changes Of Note

  • ParamMap has been removed entirely from the public API. All users of ParamMap should migrate to Callable instead.
  • Halide::Parameter has been moved to the public Halide API (it was formerly "internal" and not intended for public use).
  • New scheduling primitives:
    • Func::partition() and friends: Set the loop partition policy, which controls how/whether a loop is split into three loops (prologue/steady-state/epilogue). Loop partitioning can be useful to optimize boundary conditions (e.g. clamp_edge).
    • Func::hoist_storage() and friends: Allows a function's storage to be moved to a given loop level. Unlike Func::store_at(), no optimizations are triggered (e.g. sliding window).
  • New TailStrategy options for existing scheduling directives:
    • ShiftInwardsAndBlend: Equivalent to ShiftInwards, but protects values that would be re-evaluated by loading the memory location that would be stored to, modifying only the elements not contained within the overlap, and then storing the blended result. Unlike ShiftInwards, this is valid to use in update definitions.
    • RoundUpAndBlend: Equivalent to RoundUp, but protects values that would be written beyond the end by loading the memory location that would be stored to, modifying only the elements within the region being computed, and then storing the blended result. Unlike RoundUp, this is valid to use on non-outermost splits in update definitions.
  • Substantially improved performance and display in the VizIR output.
  • Profiler improvements:
    • Substantially nicer text output
    • Injects timing into calls for copy_to_host and copy_to_device so you can measure host<->device copy overhead
    • Allows optional sorting via the HL_PROFILER_SORT env var
  • Substantially faster codegen for several GPU backends.
  • Experimental serialization/deserialization feature allows for saving of Halide IR code.
  • Various bug fixes and improvements in the Anderson2021 autoscheduler.
  • Improved ARM codegen, including: better patterns for sdot/udot; improved shift/mul codegen.
  • Support for Zen4 architecture in the x86 backend.
  • Updates to the ONNX app.
  • Various fixes and improvements to sliding-window and storage-folding.
  • Improvements to slow gather operations for some x86 variants.
  • Improvements to correctness for the .async() scheduling directive.
  • Improved codegen for float16 conversion, especially on x86.
  • Several compile-time warnings of dubious usefulness disabled.
  • WebAssembly codegen now defaults to assuming that the saturating-float-to-int and sign-extension instruction sets are always available.
  • Target now does some reality-checking that it doesn't contain obviously nonsensical Feature combinations.
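
The loop partitioning and ShiftInwardsAndBlend notes above can be modelled in plain C++. This is a rough sketch of the ideas only, not Halide code and not how Halide implements them: the helper names (sum3_clamped, sum3_partitioned, increment_shift_inwards) are hypothetical, the 3-tap sum stands in for a stencil with a clamp_edge-style boundary condition, and buf[x] += 1 stands in for an update definition.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Plain C++ model for illustration; none of these helpers are Halide API.

// Reference: 3-tap sum with clamp-to-edge applied on every iteration.
std::vector<int> sum3_clamped(const std::vector<int> &in) {
    const int n = static_cast<int>(in.size());
    std::vector<int> out(n);
    for (int x = 0; x < n; x++) {
        out[x] = in[std::max(x - 1, 0)] + in[x] + in[std::min(x + 1, n - 1)];
    }
    return out;
}

// Same result with the loop partitioned by hand into prologue, steady state
// and epilogue. Only the first and last iterations pay for clamping.
std::vector<int> sum3_partitioned(const std::vector<int> &in) {
    const int n = static_cast<int>(in.size());
    std::vector<int> out(n);
    if (n == 0) return out;
    auto clamped = [&](int x) {
        return in[std::max(x - 1, 0)] + in[x] + in[std::min(x + 1, n - 1)];
    };
    out[0] = clamped(0);                      // prologue
    for (int x = 1; x < n - 1; x++) {         // steady state: no clamps needed
        out[x] = in[x - 1] + in[x] + in[x + 1];
    }
    out[n - 1] = clamped(n - 1);              // epilogue
    return out;
}

// In-place update buf[x] += 1, processed in chunks of `width` elements.
// The final chunk is shifted inwards so it never runs past the end.
// blend == false: the whole shifted chunk is rewritten, so elements in the
//                 overlap are updated twice (wrong for an update).
// blend == true:  the chunk is loaded, only lanes outside the overlap are
//                 modified, and the blended chunk is stored back.
void increment_shift_inwards(std::vector<int> &buf, int width, bool blend) {
    const int n = static_cast<int>(buf.size());
    assert(width > 0 && n >= width);  // need at least one full chunk
    int done = 0;                     // elements [0, done) are already updated
    for (int base = 0; base < n; base += width) {
        const int start = std::min(base, n - width);  // shift last chunk inwards
        std::vector<int> lanes(width);
        for (int i = 0; i < width; i++) {
            const int old_value = buf[start + i];     // load current memory
            const bool overlap = start + i < done;
            lanes[i] = (blend && overlap) ? old_value : old_value + 1;
        }
        std::copy(lanes.begin(), lanes.end(), buf.begin() + start);  // store
        done = start + width;
    }
}
```

With a 10-element buffer and a chunk width of 4, the last chunk covers elements 6 to 9. Without blending, elements 6 and 7 are incremented twice. With blending, every element is incremented exactly once, which is why the blended variant is safe for updates that read their own output.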

What's Changed

Full Changelog: v16.0.0...v17.0.0
